Commit Graph

61 Commits

Author SHA1 Message Date
b3f4e17935 [Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) 2024-08-16 13:59:16 -07:00
f020a6297e [Docs] Update readme (#7316) 2024-08-11 17:13:37 -07:00
6a1e25b151 [Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
87525fab92 [bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
a27f87da34 [Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
2d23b42d92 [doc] update pipeline parallel in readme (#6347) 2024-07-11 11:38:40 -07:00
9389380015 [Doc] Move guide for multimodal model and other improvements (#6168) 2024-07-06 17:18:59 +08:00
83bdcb6ac3 add FAQ doc under 'serving' (#5946) 2024-07-01 14:11:36 -07:00
57f09a419c [Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
5cbe8d155c [Core] Registry for processing model inputs (#5214)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
5b15bde539 [Doc] Documentation on supported hardware for quantization methods (#5745) 2024-06-21 12:44:29 -04:00
728c4c8a06 [Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
6e2527a7cb [Doc] Update documentation on Tensorizer (#5471) 2024-06-14 11:27:57 -07:00
1a8bfd92d5 [Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00
9fde251bf0 [Doc] Add an automatic prefix caching section in vllm documentation (#5324)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-11 10:24:59 -07:00
4c2ffb28ff [Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00
d8f31f2f8b [Doc] add debugging tips (#5409) 2024-06-10 23:21:43 -07:00
77c87beb06 [Doc] Add documentation for FP8 W8A8 (#5388) 2024-06-10 18:55:12 -06:00
7a64d24aad [Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
5ae5ed1e60 [Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
e941f88584 [Docs] Add acknowledgment for sponsors (#4925) 2024-05-21 00:17:25 -07:00
c579b750a0 [Doc] Add meetups to the doc (#4798) 2024-05-13 18:48:00 -07:00
4bfa7e7f75 [Doc] Add API reference for offline inference (#4710) 2024-05-13 17:47:42 -07:00
36fb68f947 [Doc] Chunked Prefill Documentation (#4580) 2024-05-04 00:18:00 -07:00
2d7bce9cd5 [Doc] add env vars to the doc (#4572) 2024-05-03 05:13:49 +00:00
b31a1fb63c [Doc] add visualization for multi-stage dockerfile (#4456)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-04-30 17:41:59 +00:00
3d925165f2 Add example scripts to documentation (#4225)
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-22 16:36:54 +00:00
2ff767b513 Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
0e3f06fe9c [Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
d8658c8cc1 Usage Stats Collection (#2852) 2024-03-28 22:16:12 -07:00
ef65dcfa6f [Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
b0925b3878 docs: Add BentoML deployment doc (#3336)
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
2024-03-12 10:34:30 -07:00
27a7b070db Add document for vllm paged attention kernel. (#2978) 2024-03-04 09:23:34 -08:00
d0fae88114 [DOC] add setup document to support neuron backend (#2777) 2024-03-04 01:03:51 +00:00
49d849b3ab docs: Add tutorial on deploying vLLM model with KServe (#2586)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-03-01 11:04:14 -08:00
f964493274 [CI] Ensure documentation build is checked in CI (#2842) 2024-02-12 22:53:07 -08:00
4ca2c358b1 Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
1af090b57d Bump up version to v0.3.0 (#2656) 2024-01-31 00:07:07 -08:00
6549aef245 [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) 2024-01-11 19:26:49 -08:00
26c52a5ea6 [Docs] Add CUDA graph support to docs (#2148) 2023-12-17 01:49:20 -08:00
b81a6a6bb3 [Docs] Add supported quantization methods to docs (#2135) 2023-12-15 13:29:22 -08:00
6ccc0bfffb Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
5313c2cb8b Add Production Metrics in Prometheus format (#1890) 2023-12-02 16:37:44 -08:00
05a38612b0 docs: add instruction for langchain (#1162) 2023-11-30 10:57:44 -08:00
a921d8be9d [DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
8516999495 Add Quantization and AutoAWQ to docs (#1235) 2023-11-04 22:43:39 -07:00
9cabcb7645 Add Dockerfile (#1350) 2023-10-31 12:36:47 -07:00
6f2dd6c37e Add documentation to Triton server tutorial (#983) 2023-09-20 10:32:40 -07:00
eda1a7cad3 Announce paper release (#1036) 2023-09-13 17:38:13 -07:00
58df2883cb [Doc] Add doc for running vLLM on the cloud (#426)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-16 13:37:14 -07:00