youngkingdom/vllm - vllm

Author	SHA1	Message	Date
XiongfeiWei	9765940824	[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-05-05 14:19:58 -07:00
Nick Hill	5ea5c514da	[BugFix] Increase timeout for startup failure test (#17642 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-05 20:53:19 +00:00
Jinzhen Lin	1d0c9d6b2d	[Kernel] some optimizations for dense marlin and moe marlin (#16850 ) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>	2025-05-05 09:39:30 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Isotr0py	f66f1e0fa3	[Bugfix] Fix broken Qwen2.5-omni tests (#17613 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-03 17:08:14 +00:00
Cyrus Leung	887d7af882	[Core] Gate `prompt_embeds` behind a feature flag (#17607 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-04 00:19:20 +08:00
Richard Zou	b90b0852e9	[easy] Print number of needed GPUs in skip message (#17594 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-02 15:27:43 -07:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Cyrus Leung	cb234955df	[Misc] Clean up input processing (#17582 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:11:53 -07:00
Cyrus Leung	99404f53c7	[Security] Fix image hash collision (#17378 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:36:39 -04:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Robert Shaw	c777df79f7	[BugFix] Fix Memory Leak (#17567 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-02 01:07:03 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Michael Goin	292fc59d61	[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 04:05:04 +00:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Sage Moore	460a2b1100	[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-05-01 07:59:28 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
Huy Do	b74d888c63	Fix more broken speculative decode tests (#17450 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-05-01 06:05:58 -07:00
Cyrus Leung	48e925fab5	[Misc] Clean up test docstrings and names (#17521 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 05:19:32 -07:00
Russell Bryant	fbefc8a78d	[Core] Enable IPv6 with vllm.utils.make_zmq_socket() (#16506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-01 09:38:18 +00:00
Noah Yoshida	13cf6b6236	[BugFix] fix speculative decoding memory leak when speculation is disabled (#15506 ) Signed-off-by: Noah Yoshida <noahcy117@gmail.com>	2025-04-30 23:28:17 -07:00
Cyrus Leung	afb4429b4f	[CI/Build] Reorganize models tests (#17459 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-30 23:03:08 -07:00
Michael Goin	aa4502e7f3	[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 21:03:30 -07:00
Michael Goin	17b4d85f63	[CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 20:36:20 -07:00
Siyuan Liu	dbc18e7816	[CI][TPU] Skip Multimodal test (#17488 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com>	2025-04-30 19:51:39 -07:00
Chen Zhang	81ecf425f0	[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-30 18:25:53 +00:00
Russell Bryant	947f2f5375	[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-04-30 16:10:54 +00:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
Nicolò Lucchesi	a7d5b016bd	[TPU][V1][CI] Update regression test baseline for v6 CI (#17064 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-04-30 04:03:22 -07:00
Marco	54072f315f	[MODEL ADDITION] Ovis2 Model Addition (#15826 ) Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-30 07:33:29 +00:00
Huy Do	88fcf00dda	Fix some speculative decode tests with tl.dot (#17371 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-04-29 19:41:02 -07:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Gabriel Marinho	1c2bc7ead0	Truncation control for embedding models (#14776 ) Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-04-30 09:24:57 +08:00
Benjamin Chislett	34120f5acd	[V1][Feature] Enable Speculative Decoding with Structured Outputs (#14702 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-04-30 00:02:10 +00:00
Harry Mellor	7489ec0bab	Remove Bamba 9B from CI (#17407 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 21:10:31 +00:00
Harry Mellor	0350809f3a	Remove Falcon3 2x7B from CI (#17404 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:52:25 +00:00
Harry Mellor	a6977dbd15	Simplify (and fix) passing of guided decoding backend options (#17008 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 19:02:23 +00:00
mofanke	a39203f99e	[Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369 ) Signed-off-by: mofanke <mofanke@gmail.com>	2025-04-29 16:32:40 +00:00
Harry Mellor	2ef5d106bb	Improve literal dataclass field conversion to argparse argument (#17391 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-29 16:25:08 +00:00
Cyrus Leung	88ad9ec6b2	[Frontend] Support `chat_template_kwargs` in `LLM.chat` (#17356 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 22:03:35 +08:00
Cyrus Leung	00ee37efa2	[Bugfix] Clean up MiniMax-VL and fix processing (#17354 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-29 20:42:16 +08:00
Jee Jee Li	890f104cdf	[Doc] Fix QWen3MOE info (#17381 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-04-29 12:38:32 +00:00
ponix-j	bdb2cddafc	[Misc]Use a platform independent interface to obtain the device attributes (#17100 )	2025-04-29 06:59:13 +00:00
qscqesze	cde384cd92	[Model] support MiniMax-VL-01 model (#16328 ) Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-04-29 12:05:50 +08:00
Michał Moskal	86d9fc29cb	implement Structural Tag with Guidance backend (#17333 ) Signed-off-by: Michal Moskal <michal@moskal.me>	2025-04-29 02:21:32 +00:00
Harry Mellor	b6dd32aa07	Make name of `compressed-tensors` quant method consistent across vLLM (#17255 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-28 16:28:13 +00:00


2014 Commits