youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Michael Goin	b3f4e17935	[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444)	2024-08-16 13:59:16 -07:00
Simon Mo	f020a6297e	[Docs] Update readme (#7316)	2024-08-11 17:13:37 -07:00
Kuntai Du	6a1e25b151	[Doc] Add documentations for nightly benchmarks (#6412)	2024-07-25 11:57:16 -07:00
dongmao zhang	87525fab92	[bitsandbytes]: support read bnb pre-quantized model (#5753) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-23 23:45:09 +00:00
Saliya Ekanayake	a27f87da34	[Doc] Fix Typo in Doc (#6392) Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>	2024-07-13 00:48:23 +00:00
youkaichao	2d23b42d92	[doc] update pipeline parallel in readme (#6347)	2024-07-11 11:38:40 -07:00
Cyrus Leung	9389380015	[Doc] Move guide for multimodal model and other improvements (#6168)	2024-07-06 17:18:59 +08:00
ning.zhang	83bdcb6ac3	add FAQ doc under 'serving' (#5946)	2024-07-01 14:11:36 -07:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
Cyrus Leung	5cbe8d155c	[Core] Registry for processing model inputs (#5214) Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-28 12:09:56 +00:00
Michael Goin	5b15bde539	[Doc] Documentation on supported hardware for quantization methods (#5745)	2024-06-21 12:44:29 -04:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
Sanger Steel	6e2527a7cb	[Doc] Update documentation on Tensorizer (#5471)	2024-06-14 11:27:57 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292)	2024-06-12 11:53:03 -07:00
Kuntai Du	9fde251bf0	[Doc] Add an automatic prefix caching section in vllm documentation (#5324) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-06-11 10:24:59 -07:00
Cade Daniel	4c2ffb28ff	[Speculative decoding] Initial spec decode docs (#5400)	2024-06-11 10:15:40 -07:00
youkaichao	d8f31f2f8b	[Doc] add debugging tips (#5409)	2024-06-10 23:21:43 -07:00
Michael Goin	77c87beb06	[Doc] Add documentation for FP8 W8A8 (#5388)	2024-06-10 18:55:12 -06:00
Cyrus Leung	7a64d24aad	[Core] Support image processor (#4197)	2024-06-02 22:56:41 -07:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Simon Mo	e941f88584	[Docs] Add acknowledgment for sponsors (#4925)	2024-05-21 00:17:25 -07:00
Zhuohan Li	c579b750a0	[Doc] Add meetups to the doc (#4798)	2024-05-13 18:48:00 -07:00
Cyrus Leung	4bfa7e7f75	[Doc] Add API reference for offline inference (#4710)	2024-05-13 17:47:42 -07:00
SangBin Cho	36fb68f947	[Doc] Chunked Prefill Documentation (#4580)	2024-05-04 00:18:00 -07:00
youkaichao	2d7bce9cd5	[Doc] add env vars to the doc (#4572)	2024-05-03 05:13:49 +00:00
Prashant Gupta	b31a1fb63c	[Doc] add visualization for multi-stage dockerfile (#4456) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-04-30 17:41:59 +00:00
Harry Mellor	3d925165f2	Add example scripts to documentation (#4225) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-22 16:36:54 +00:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
yhu422	d8658c8cc1	Usage Stats Collection (#2852)	2024-03-28 22:16:12 -07:00
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288)	2024-03-18 22:05:34 -07:00
Sherlock Xu	b0925b3878	docs: Add BentoML deployment doc (#3336) Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>	2024-03-12 10:34:30 -07:00
Jialun Lyu	27a7b070db	Add document for vllm paged attention kernel. (#2978)	2024-03-04 09:23:34 -08:00
Liangfu Chen	d0fae88114	[DOC] add setup document to support neuron backend (#2777)	2024-03-04 01:03:51 +00:00
Yuan Tang	49d849b3ab	docs: Add tutorial on deploying vLLM model with KServe (#2586) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-03-01 11:04:14 -08:00
Simon Mo	f964493274	[CI] Ensure documentation build is checked in CI (#2842)	2024-02-12 22:53:07 -08:00
Philipp Moritz	4ca2c358b1	Add documentation section about LoRA (#2834)	2024-02-12 17:24:45 +01:00
Zhuohan Li	1af090b57d	Bump up version to v0.3.0 (#2656)	2024-01-31 00:07:07 -08:00
Jiaxiang	6549aef245	[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011)	2024-01-11 19:26:49 -08:00
Woosuk Kwon	26c52a5ea6	[Docs] Add CUDA graph support to docs (#2148)	2023-12-17 01:49:20 -08:00
Woosuk Kwon	b81a6a6bb3	[Docs] Add supported quantization methods to docs (#2135)	2023-12-15 13:29:22 -08:00
TJian	6ccc0bfffb	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: Amir Balwel <amoooori04@gmail.com> Co-authored-by: root <kuanfu.liu@akirakan.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com> Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>	2023-12-07 23:16:52 -08:00
Simon Mo	5313c2cb8b	Add Production Metrics in Prometheus format (#1890)	2023-12-02 16:37:44 -08:00
Massimiliano Pronesti	05a38612b0	docs: add instruction for langchain (#1162)	2023-11-30 10:57:44 -08:00
Casper	a921d8be9d	[DOCS] Add engine args documentation (#1741)	2023-11-22 12:31:27 -08:00
Casper	8516999495	Add Quantization and AutoAWQ to docs (#1235)	2023-11-04 22:43:39 -07:00
Stephen Krider	9cabcb7645	Add Dockerfile (#1350)	2023-10-31 12:36:47 -07:00
Tanmay Verma	6f2dd6c37e	Add documentation to Triton server tutorial (#983)	2023-09-20 10:32:40 -07:00
Woosuk Kwon	eda1a7cad3	Announce paper release (#1036)	2023-09-13 17:38:13 -07:00
Zhanghao Wu	58df2883cb	[Doc] Add doc for running vLLM on the cloud (#426) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2023-07-16 13:37:14 -07:00


61 Commits