youngkingdom/vllm - vllm

Author	SHA1	Message	Date
yurhett	11c0198615	[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-07-11 20:52:43 -07:00
Li, Jiang	b1235c3e10	[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices (#20822) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-11 20:52:05 -07:00
Jee Jee Li	44d02f54db	[Misc] Restrict deep_gemm's log output (#20827) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 20:50:42 -07:00
Trevor Morris	a8593237c0	Add pynccl all-gatherv and reducescatterv (#20154) Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 18:59:23 -07:00
Ilya Markov	fc0f41d10a	Integration SM100 FlashInfer fused allreduce RMSNorm (#20691) Signed-off-by: ilmarkov <imarkov@redhat.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-11 18:58:15 -07:00
Wentao Ye	7b828e30d5	[CI Bug] Fix Async Engine, Inputs, Utils, Worker Test: 'State' object has no attribute 'enable_server_load_tracking' (#20845) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-11 18:57:24 -07:00
bigmoyan	5f0af36af5	Update kimi-k2 tool calling docs, enable unit tests (#20821) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 20:16:14 +00:00
Isotr0py	0d21b2664c	[Bugfix] Fix OOM in language generation test (#20814) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-11 11:21:52 -07:00
Nick Hill	9907fc4494	[Docs] Data Parallel deployment documentation (#20768) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-11 09:42:10 -07:00
Michael Goin	d47661f0cd	[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 10:05:33 -06:00
Varun Sundar Rabindranath	53fa457391	[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-11 07:51:46 -07:00
Reid	6fb162447b	[doc] fix ordered list issue (#20819) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-11 06:49:46 -07:00
Li, Jiang	66177189c5	[Bugfix] Add missing field to TritonLanguagePlaceholder (#20812) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-11 05:25:11 -07:00
QiliangCui	b4f0b5f9aa	Temporarily suspend google/gemma-3-1b-it. (#20722) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-11 11:21:26 +00:00
Cyrus Leung	cbd14ed561	[Bugfix] Refactor `/invocations` to be task-agnostic (#20764) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-11 03:20:54 -07:00
Pavani Majety	7bd4c37ae7	[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: shuw <shuw@nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 09:23:23 +00:00
Jee Jee Li	8020e98c9f	[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 08:01:13 +00:00
Luka Govedič	762be26a8e	[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com>	2025-07-11 00:15:22 -07:00
Reid	6a9e6b2abf	[doc] fold long code block (#20795) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-10 23:16:41 -07:00
nopperl	5d09152ff1	[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-07-11 05:53:31 +00:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Ratnam Parikh	35514b682a	[XPU] XCCL support enabled in torch 2.8.0.dev nightly builds (#20705) Signed-off-by: ratnampa <ratnam.parikh@intel.com>	2025-07-10 20:39:52 -07:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087)	2025-07-10 20:18:05 -07:00
Alexander Matveev	5b032352cc	[Attention] MLA - Flashinfer Ragged Prefill (#20034)	2025-07-10 20:17:47 -07:00
Michael Goin	922f316441	[Model] Support HF format of minimax (#20211) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 02:55:21 +00:00
Duncan Moss	5923ab9524	[fix]: disable cutlass block scaled group gemm for EP (#20781) Signed-off-by: Duncan Moss <djm.moss@gmail.com>	2025-07-11 02:39:18 +00:00
bigmoyan	0cf893cae1	Add kimi-k2 tool parser (#20789) Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn> Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-07-11 10:36:23 +08:00
Michael Goin	cf75cd2098	[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install (#20772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-11 01:16:01 +00:00
Simon Mo	b854321ffe	[Docs] Lazy import gguf (#20785) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-07-10 16:06:37 -07:00
Kuntai Du	5b6fe23d05	[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-10 14:52:46 -07:00
Varun Sundar Rabindranath	f0c98cae27	[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 14:40:38 -07:00
Nick Hill	574ad60db9	[KVConnector] Always call connector `clear_metadata()` at end of step (#20756) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: David Ben-David <sdavidbd@gmail.com>	2025-07-10 22:37:27 +01:00
Varun Sundar Rabindranath	fdadb6f43a	[Bugfix] Fused MoE Modular Kernel chunking loop (#20392) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-10 20:31:10 +00:00
Alex Brooks	41060c6e08	[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-07-10 21:09:37 +01:00
Ming Yang	3de2ed767f	[Bugfix] Remove assertion of expert_map being None (#20714) Signed-off-by: Ming Yang <yming@meta.com> Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-07-10 19:55:22 +00:00
Wentao Ye	299252ea82	[CI] Fix pre commit issue (#20782) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-10 12:48:13 -07:00
Nathan Hoos	d6902ce79f	[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975) Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>	2025-07-10 15:30:26 -04:00
Sanger Steel	5e53c89a74	[Bugfix] [CI] Fix Tensorizer LoRA test (#20760) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-07-10 19:07:06 +00:00
QiliangCui	c66e38ea4c	[Test] Remove docker build from test. (#20542) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-07-10 11:21:58 -07:00
sfbemerk	251595368f	Fix DeepSeek-R1-0528 chat template (#20717) Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>	2025-07-10 17:47:36 +00:00
shineran96	4bed167768	[Model][VLM] Support JinaVL Reranker (#20260) Signed-off-by: shineran96 <shinewang96@gmail.com>	2025-07-10 10:43:43 -07:00
Asher	b140416abf	[Model] Add reason parser for Hunyuan A13B Model. (#20625) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-10 16:33:26 +00:00
Gregory Shtrasberg	5b8366b61a	[ROCm][Regression] Remove tensor creation that harms performance on ROCm (#20741) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-10 09:22:23 -07:00
nishith-fujitsu	c7753a9809	[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU (#14129) Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>	2025-07-10 15:59:04 +00:00
Michael Goin	4b9a9435bb	Update Dockerfile FlashInfer to v0.2.8rc1 (#20718) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-10 08:09:02 -07:00
Harry Mellor	3482fd7e4e	[Doc] Add engine args back in to the docs (#20674) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-10 08:02:40 -07:00
Isotr0py	77f77a951e	[Misc] Clean up mark to fork process in BNB tests (#20692) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-10 13:59:40 +00:00
Michael Goin	1a4f35e2ea	Normalize lm-eval command between baseline and correctness test (#18560) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-10 13:27:32 +00:00
Michael Goin	be1e128dfb	[CI Bugfix] Skip failing Tensorizer+LoRA test (#20724)	2025-07-10 21:15:03 +09:00
Reid	65393ee064	[doc] fix ordered list (#20749) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-10 03:13:52 -07:00

7646 Commits