Commit Graph

1540 Commits

Author SHA1 Message Date
2bb0e1a799 [Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00
8d6cf89526 [V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 22:00:20 -07:00
a73e183e36 [Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
8a5a9b70d7 [CI/Build] Update defaults for test reproducibility (#14893)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 10:38:15 +08:00
bb3aeddfaf [CI] Nightly Tests (#14898)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-03-17 02:06:43 +00:00
aecc780dba [V1] Enable Entrypoints Tests (#14903) 2025-03-16 17:56:16 -07:00
b9b5bdfc7d [Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847) 2025-03-16 15:46:42 -07:00
fc1f67715d [BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-16 14:53:34 -07:00
d1ad2a57af [V1] [Spec Decode] Fix ngram tests (#14878) 2025-03-16 00:29:22 -07:00
def232e122 [VLM] Clean up Phi-4-MM ViT implementation (#14812)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-15 18:53:52 -07:00
61c6a5a796 [VLM] Merged multi-modal processor for Pixtral (#12211)
Signed-off-by: remi <remi@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 06:28:27 -07:00
74bc397b0a [Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
3556a41434 [VLM] Limit multimodal input cache by memory (#14805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 02:52:05 -07:00
e0fdfa1608 [CI/Build] Delete LoRA bias test (#14849)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 22:09:25 -07:00
5952d8ab61 [Attention] Get rid of mla cache alignment (#14842)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-15 05:08:25 +00:00
a2ae496589 [CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
d4d93db2c5 [V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
14f301b541 Update to torch==2.6.0 (#12721)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-14 16:58:30 -04:00
46f98893dd [V1] Fix model parameterization for structured output tests (#14833)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 20:55:18 +00:00
73deea2fdb [Frontend] track server_load (#13950) 2025-03-14 09:53:17 -07:00
613c5bb945 [Bugfix] Fix Aria test loading (#14823)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 09:11:23 -07:00
601bd3268e [Misc] Clean up type annotation for SupportsMultiModal (#14794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 00:59:56 -07:00
0c2af17c76 [CI] Fix missing example model id in processor test (#14787)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-14 13:52:15 +08:00
d3d4956261 [Neuron] flatten test parameterization for neuron attention kernels (#14712) 2025-03-13 20:46:56 -07:00
0b1cfa6180 [Kernel] LoRA - Enable CUDAGraphs for V1 (#14626)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-13 20:42:04 -07:00
02fcaa3d0a [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00
8e9ffd37d6 [Misc] Clean up processor tests (#14771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 18:25:37 +00:00
f53a0586b9 [Bugfix] Fix prompt format of GLM4V (#14539)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 11:37:17 +00:00
382403921f [VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00
bd44b812cb [CI/Build] Delete ultravox LoRA test (#14730)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 07:57:39 +00:00
f5d3acd474 [BugFix][V1] Fix parallel sampling finishing/aborts (#14512)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-12 10:29:48 -07:00
916836bbfb [FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-12 09:31:19 -07:00
c0c25e25fa [Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00
ff47aab056 [CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
debd6bbf09 [Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00
5c538c37b2 [V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-11 22:12:41 -07:00
e22ee1e7a2 [Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00
e392d85831 [Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 20:12:52 -07:00
77a318bd01 [V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-12 10:40:09 +08:00
80e78d02ac [Model] Extend Ultravox to accept audio longer than 30s (#13631)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-03-12 10:27:10 +08:00
47532cd9f4 [core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-12 01:15:15 +00:00
4bf82d4b90 [V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 23:03:44 +08:00
af295e9b01 [Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-11 07:59:43 -07:00
a1c8f3796c dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
1fc973c0b5 [V1][Core] Fix memory issue with logits & sampling (#14508)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
2025-03-11 04:03:41 +00:00
c91b64f749 [neuron] add reshape_and_cache (#14391) 2025-03-10 18:37:29 -07:00
d6123170d5 [Neuron] Add Neuron device communicator for vLLM v1 (#14085) 2025-03-10 18:37:04 -07:00
5ff0d32580 [V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
3b352a2f92 Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
89cdaa83e7 [Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00