youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Gregory Shtrasberg	a92842454c	[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-02 22:25:47 -07:00
Tyler Michael Smith	c8386fa61d	[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-05-02 22:25:14 -07:00
Chenyaaang	87baebebd8	[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-02 21:42:44 -07:00
rasmith	e3d0a1d190	[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-02 21:41:10 -07:00
22quinn	d47b605eca	Update test requirements to CUDA 12.8 (#17576 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-05-02 21:40:15 -07:00
Liangfu Chen	22c6f6397f	[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603 ) Signed-off-by: Liangfu Chen <liangfc@amazon.com>	2025-05-03 02:41:59 +00:00
Kevin H. Luu	3ec97e2cc5	[release] Add command to clean up Docker containers/images in TPU release machine (#17606 )	2025-05-02 18:54:34 -07:00
Eric Hartford	9b103a1d76	fix typo in logging (#17605 )	2025-05-02 18:04:40 -07:00
Richard Zou	b90b0852e9	[easy] Print number of needed GPUs in skip message (#17594 ) Signed-off-by: rzou <zou3519@gmail.com>	2025-05-02 15:27:43 -07:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Zhiyu	182f40ea8b	Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561 )	2025-05-02 11:36:46 -07:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Lucas Wilkinson	0f87d8f7b2	[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-02 11:01:38 -07:00
Hui Liu	4c33d67321	[Bugfix] fix tmp_out and exp_sums dimensions (#17438 ) Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>	2025-05-02 16:44:07 +00:00
Cyrus Leung	cb234955df	[Misc] Clean up input processing (#17582 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:11:53 -07:00
Reid	3a500cd0b6	[doc] miss result (#17589 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-02 07:04:49 -07:00
Michael Goin	868c546da4	Support W8A8 INT8 MoE for compressed-tensors (#16745 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 10:03:32 -04:00
Cyrus Leung	99404f53c7	[Security] Fix image hash collision (#17378 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 08:36:39 -04:00
Harry Mellor	785d75a03b	Automatically tell users that dict args must be valid JSON in CLI (#17577 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-02 05:24:55 -07:00
Reid	6d1479ca4b	[doc] add the print result (#17584 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-02 05:24:45 -07:00
Yang Wang	b8b0859b5c	add more pytorch related tests for torch nightly (#17422 ) Signed-off-by: Yang Wang <elainewy@meta.com>	2025-05-02 03:29:59 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Robert Shaw	c777df79f7	[BugFix] Fix Memory Leak (#17567 ) Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-05-02 01:07:03 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Isotr0py	9e2de9b9e9	[Bugifx] Remove TritonPlaceholder from sys.modules (#17317 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-02 00:45:01 -07:00
Jerry Zhang	109e15a335	Add `pt_load_map_location` to allow loading to cuda (#16869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-01 23:23:42 -07:00
Michael Goin	f192ca90e6	Fix PixtralHF missing spatial_merge_size (#17571 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-01 22:14:09 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Michael Goin	b4003d11fc	Check if bitblas is installed during support check (#17572 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 04:32:54 +00:00
Michael Goin	292fc59d61	[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-02 04:05:04 +00:00
Lucas Wilkinson	afcb3f8863	[Attention] MLA move o_proj q_proj into cuda-graph region (#17484 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-02 03:16:26 +00:00
David Xia	afb12e4294	[Doc] note that not all unit tests pass on CPU platforms (#17554 ) Signed-off-by: David Xia <david@davidxia.com>	2025-05-02 02:57:21 +00:00
Michael Goin	24aebae177	[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-01 17:59:35 -07:00
qizixi	39c0813a7f	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-01 16:19:30 -07:00
Chenyaaang	9b70e2b4c1	[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-05-01 19:53:03 +00:00
Chen Xia	173daac19d	[Bug]change the position of cuda_graph_sizes in dataclasses (#17548 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com>	2025-05-01 11:52:37 -07:00
sstamenk	04f2cfc894	Remove duplicate code from dbrx.py (#17550 )	2025-05-01 11:51:58 -07:00
Juan Villamizar	811a6c0972	[ROCM] Add gfx950 to the custom attention archs (#16034 ) Signed-off-by: jpvillam <Juan.Villamizar@amd.com> Signed-off-by: seungrokjung <seungrok.jung@amd.com> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: seungrokjung <seungrok.jung@amd.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-01 11:18:28 -07:00
Cyrus Leung	9b1769dd9a	[Bugfix] Fix lint error (#17547 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 11:12:19 -07:00
Chen Xia	61c299f81f	[Misc]add configurable cuda graph size (#17201 ) Signed-off-by: CXIAAAAA <cxia0209@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 11:04:50 -07:00
Hongxia Yang	4acfa3354a	[ROCm] update installation guide to include build aiter from source instructions (#17542 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-01 11:01:28 -07:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Harry Mellor	6768ff4a22	Move the last arguments in `arg_utils.py` to be in their final groups (#17531 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-01 10:31:44 -07:00
Cyrus Leung	f2e7af9b86	[CI/Build] Remove `awscli` dependency (#17532 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 09:20:54 -07:00
Reid	7423cf0a9b	[Misc] refactor example - cpu_offload_lmcache (#17460 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-01 15:05:24 +00:00
Sage Moore	460a2b1100	[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-05-01 07:59:28 -07:00
Hongxia Yang	28566d73b3	[ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-05-01 07:54:25 -07:00
Chauncey	98060b001d	[Feature][Frontend]: Deprecate --enable-reasoning (#17452 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-05-01 06:46:16 -07:00
TJian	f5a3c655b2	[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config (#17535 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-01 06:37:17 -07:00
Reid	7169f87ad0	[doc] add streamlit integration (#17522 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-01 13:34:02 +00:00

6768 Commits