youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Harry Mellor	72a3f6b898	Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 11:25:33 -07:00
Xu Wenqing	3a5ea75129	[Feature] Support DeepSeekV3 Function Call (#17784 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-05-12 00:45:21 -07:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
Frieda Huang	9cea90eab4	[Frontend] Add /classify endpoint (#17032 ) Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>	2025-05-11 07:57:07 +00:00
Reid	4c31218f80	[Misc] remove --model from vllm serve usage (#17944 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-10 13:23:31 +00:00
Mark McLoughlin	7e3571134f	[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-09 13:32:36 -07:00
Rui Qiao	c44c384b1c	[Misc] Add references in ray_serve_deepseek example (#17907 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-09 16:59:36 +00:00
Cyrus Leung	a1e19b635d	[Doc] Fix a typo in the file name (#17836 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-08 18:04:18 +08:00
Rick Yuan	ca04b97c93	[Bugfix] Fix tool call template validation for Mistral models (#17644 ) Signed-off-by: Rick Yuan <yuan821120@gmail.com> Signed-off-by: RIck Yuan <yuan821120@gmail.com> Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>	2025-05-08 09:47:19 +00:00
Cyrus Leung	96722aa81d	[Frontend] Chat template fallbacks for multimodal models (#17805 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 23:05:54 -07:00
Aaron Pham	a8238bbdb0	[Chore][Doc] uses model id determined from OpenAI client (#17815 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-05-08 01:48:57 +00:00
Harry Mellor	646a31e51e	Fix and simplify `deprecated=True` CLI `kwarg` (#17781 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-07 16:51:06 +01:00
Cyrus Leung	8a15c2603a	[Frontend] Add missing chat templates for various MLLMs (#17758 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-07 00:10:01 -07:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Jee Jee Li	ba7703e659	[Misc] Remove qlora_adapter_name_or_path (#17699 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-06 23:10:37 -07:00
Jevin Jiang	621ca2c0ab	[TPU] Increase block size and reset block shapes (#16458 )	2025-05-06 13:55:04 -04:00
Cyrus Leung	5b8c390747	[Bugfix] Fix modality limits in vision language example (#17721 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-06 16:12:28 +00:00
Reid	7525d5f3d5	[doc] Add RAG Integration example (#17692 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-06 16:10:23 +00:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Reid	7423cf0a9b	[Misc] refactor example - cpu_offload_lmcache (#17460 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-01 15:05:24 +00:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Reid	7169f87ad0	[doc] add streamlit integration (#17522 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-01 13:34:02 +00:00
zh Wang	d586ddc691	[BugFix] Fix authorization of openai_transcription_client.py (#17321 ) Signed-off-by: zh Wang <rekind133@outlook.com>	2025-04-30 09:51:05 -07:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marco	54072f315f	[MODEL ADDITION] Ovis2 Model Addition (#15826 ) Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-30 07:33:29 +00:00
Huy Do	2c4f59afc3	Update PyTorch to 2.7.0 (#16859 )	2025-04-29 19:08:04 -07:00
Bryan Lu	70788bdbdc	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-29 21:10:00 +00:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
Chauncey	96e06e3cb7	[Misc] Add a Jinja template to support Mistral3 function calling (#17195 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-28 19:53:44 -07:00
Alex Brooks	fa93cd9f60	[Model] Add Granite Speech Support (#16246 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-28 10:05:00 +00:00
Kuntai Du	9053d0b134	[Doc] Fix wrong github link in LMCache examples (#17274 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-04-28 03:09:11 +00:00
Russell Bryant	f8acd01ff7	[V1] Add `structural_tag` support using xgrammar (#17085 )	2025-04-26 14:06:37 +00:00
Isotr0py	8c1c926d00	[Bugfix] Fix missing int type for `-n` in multi-image example (#17223 )	2025-04-26 08:49:52 +00:00
Yihua Cheng	5e83a7277f	[v1] [P/D] Adding LMCache KV connector for v1 (#16625 )	2025-04-26 03:03:38 +00:00
Rui Qiao	c53e0730cb	[Misc] Refine ray_serve_deepseek example (#17204 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-25 16:06:59 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Rui Qiao	583e900996	[Misc] Add example to run DeepSeek with Ray Serve LLM (#17134 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 22:25:21 +00:00
Maximilien de Bayser	05e1fbfc52	Add chat template for Llama 4 models (#16428 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-24 20:19:36 +00:00
Reid	1bcbcbf574	[Misc] refactor example series - structured outputs (#17040 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-24 07:49:48 -07:00
wang.yuqi	67309a1cb5	[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970 )	2025-04-24 07:06:28 -07:00
Reid	db2f8d915c	[V1] Update structured output (#16812 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-23 23:57:17 -07:00
Reid	4b91c927f6	[Misc] refactor example series (#16972 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-22 11:44:21 +00:00
Cyrus Leung	205d84aaa9	[VLM] Clean up models (#16873 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 12:13:06 +00:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
Nicolò Lucchesi	2ef0dc53b8	[Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (#16591 ) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>	2025-04-19 07:03:54 +00:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00

415 Commits