youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Alexander Matveev	cfbca8a2f2	[V1] TPU - Tensor parallel MP support (#15059 )	2025-03-20 00:55:18 +00:00
Simon Mo	0fe5609874	[Docs] Annouce Ollama and Singapore Meetups (#15161 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-19 16:18:04 -07:00
Nick Hill	22d33baca2	[FrontEnd][Perf] `merge_async_iterators` fast-path for single-prompt requests (#15150 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-19 21:04:41 +00:00
iefgnoix	b0e96aaebb	[V1][TPU] Change kv cache shape. (#15145 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-03-19 12:16:42 -07:00
Wang Ran (汪然)	8310e0b59b	simple bugfix: Update stats.py (#15139 )	2025-03-19 18:26:27 +00:00
maobaolong	26dd972adb	[FEAT]Support reset prefix cache by specified device (#15003 )	2025-03-19 10:54:41 -07:00
Murali Andoorveedu	61c7a1b856	[V1] Minor V1 async engine test refactor (#15075 ) Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca> Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca> (tag: v0.8.1)	2025-03-19 10:37:17 -07:00
Alessandro Sangiorgi	374ee287d8	[Frontend] Remove custom_cache_manager (#13791 ) Signed-off-by: fulvius31 <asangior@redhat.com>	2025-03-20 00:13:50 +08:00
Kero Liang	a4d83661d7	[Misc] Update the "the first vLLM China Meetup" slides link to point to the first page (#15134 ) Signed-off-by: imkero <kerorek@outlook.com>	2025-03-19 15:07:39 +00:00
Jan Kaniecki	8363cd093d	[Bugfix] Adjust mllama to regional compilation (#15112 ) Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>	2025-03-19 07:57:25 -07:00
Aaron Pham	6c5a3195db	[Misc][Benchmark] Add support for different `tokenizer_mode` (#15040 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-03-19 14:56:50 +00:00
Marc-Alexandre Côté	073d1ed354	[Doc] Update tip info on using latest transformers when creating a custom Dockerfile (#15070 )	2025-03-19 13:33:40 +00:00
Cyrus Leung	3d446433ec	[Bugfix] Fix size calculation of processing cache (#15114 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 05:53:19 -07:00
Cyrus Leung	1fe0fd12d3	[Misc] Avoid unnecessary HF `do_rescale` warning when passing dummy data (#15107 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 03:42:31 -07:00
Roger Wang	dafb4e504a	[V1][Bugfix] Fix oracle for device checking (#15104 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-19 18:35:32 +08:00
Kunshang Ji	68cf1601d3	[CI][Intel GPU] update XPU dockerfile and CI script (#15109 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-03-19 01:29:25 -07:00
Cyrus Leung	61f412187d	[Bugfix] Re-enable Gemma3 for V1 (#14980 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-18 23:58:22 -07:00
Woosuk Kwon	05ccd0aa35	[V1] Ensure using int64 for sampled token ids (#15065 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 23:52:19 -07:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
Brayden Zhong	8b3e94a357	[Model] Remove duplicated message check in Mistral chat completion request (#15069 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-19 05:09:32 +00:00
Julien Denize	437f9162d0	[Model] Pixtral: Remove layer instantiation duplication (#15053 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-03-19 10:34:03 +08:00
Cody Yu	4f065f12f5	[Misc][V1] Skip device checking if not available (#15061 ) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2025-03-18 19:33:43 -07:00
Jennifer Zhao	228b768db6	[Doc] Minor v1_user_guide update (#15064 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2025-03-18 16:10:45 -07:00
Chujie Zheng	027827cc1d	fix long dtype in topk sampling (#15049 )	2025-03-18 15:57:31 -07:00
Alexander Matveev	72a8639b68	[V1] TPU - CI/CD use smaller model (#15054 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-18 21:39:21 +00:00
Woosuk Kwon	99abb8b650	[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-18 14:31:54 -07:00
Russell Bryant	3a1e648158	[V1] Refactor Structured Output for multiple backends (#14694 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-18 19:49:15 +00:00
Jee Jee Li	46c759c165	[Bugfix] Fix LoRA extra vocab size (#15047 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 09:40:29 -07:00
Isotr0py	179a619c21	[Bugfix] Fix broken CPU quantization due to triton import (#15038 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-18 08:57:39 -07:00
yury-tokpanov	452e8fd968	[MODEL] Add support for Zamba2 models (#13185 ) Signed-off-by: Yury Tokpanov <yury@zyphra.com> Signed-off-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Quentin Anthony <qganthony@yahoo.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-18 08:56:21 -07:00
ekuznetsov139	8b793f7ec6	MI325 configs, fused_moe_kernel bugfix (#14987 ) Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com>	2025-03-18 08:05:18 -07:00
Nicolò Lucchesi	af35d3a3cc	[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-18 07:34:45 -07:00
Simon Mo	3b457143d2	[Bugfix] Register serializers for V0 MQ Engine (#15009 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-03-18 09:14:47 -04:00
Cyrus Leung	ab656f2c2f	[Bugfix] Loosen type check to avoid errors in V1 (#15021 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-18 12:54:40 +00:00
Serena	64fc2193dc	[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347 )	2025-03-18 05:50:19 -07:00
Sebastian Schoennenbeck	dd732028f5	[Bugfix][Frontend] Fix validation of `logprobs` in `ChatCompletionRequest` (#14352 ) Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>	2025-03-18 05:50:05 -07:00
hoshi-hiyouga	414919138b	[Bugfix] torchrun compatibility (#14899 ) Signed-off-by: hiyouga <hiyouga@buaa.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-03-18 05:49:27 -07:00
Jee Jee Li	db7c8ca910	[Misc] Embedding model support LoRA (#14935 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 12:07:00 +00:00
Patrick von Platen	f863ffc965	[Mistral-Small 3.1] Update docs and tests (#14977 ) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-18 03:29:42 -07:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
Shanshan Shen	d1695758b2	[Doc][V1] Fix V1 APC doc (#14920 )	2025-03-18 08:15:46 +00:00
Liangfu Chen	53a0cf8b95	[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-03-18 15:05:52 +08:00
Tristan Leclercq	5eeabc2a44	[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950 )	2025-03-17 23:27:26 +00:00
Alexander Matveev	18551e820c	[V1] TPU - Fix CI/CD runner (#14974 )	2025-03-17 21:07:07 +00:00
Robert Shaw	e41e160263	[V1] Guard Against Main Thread Usage (#14972 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-17 13:23:02 -07:00
Cyrus Leung	b89fb2a4a1	[CI/Build] Use `AutoModelForImageTextToText` to load VLMs in tests (#14945 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-17 18:35:17 +00:00
Roger Wang	5340b0e221	[Bugfix] Fix interface for Olmo2 on V1 (#14976 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-17 11:26:38 -07:00
Roger Wang	37e3806132	[Bugfix] Make Gemma3 MM V0 only for now (#14971 ) Signed-off-by: Roger Wang <ywang@roblox.com> (tag: v0.8.0rc2)	2025-03-17 10:04:21 -07:00
Aaron Pham	c0efdd655b	[Fix][Structured Output] using vocab_size to construct matcher (#14868 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-03-17 11:42:45 -04:00
Quentin	aaaec52ad9	[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961 ) Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>	2025-03-17 07:44:18 -07:00

5271 Commits