youngkingdom/vllm - vllm - Gitea

Author	SHA1	Message	Date
vllmellm	2bb0e1a799	[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-17 11:33:35 +00:00
Lily Liu	8d6cf89526	[V1] [Spec Decode] Support random sampling for spec decode (#13933) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-16 22:00:20 -07:00
Sibi	a73e183e36	[Misc] Replace os environ to monkeypatch in test suite (#14516) Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-16 20:35:57 -07:00
Cyrus Leung	8a5a9b70d7	[CI/Build] Update defaults for test reproducibility (#14893) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-17 10:38:15 +08:00
Robert Shaw	bb3aeddfaf	[CI] Nightly Tests (#14898) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-03-17 02:06:43 +00:00
Robert Shaw	aecc780dba	[V1] Enable Entrypoints Tests (#14903)	2025-03-16 17:56:16 -07:00
Rui Qiao	b9b5bdfc7d	[Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847)	2025-03-16 15:46:42 -07:00
Nick Hill	fc1f67715d	[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-16 14:53:34 -07:00
Lily Liu	d1ad2a57af	[V1] [Spec Decode] Fix ngram tests (#14878)	2025-03-16 00:29:22 -07:00
Isotr0py	def232e122	[VLM] Clean up Phi-4-MM ViT implementation (#14812) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-03-15 18:53:52 -07:00
Rémi Delacourt	61c6a5a796	[VLM] Merged multi-modal processor for Pixtral (#12211) Signed-off-by: remi <remi@mistral.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-15 06:28:27 -07:00
Jun Duan	74bc397b0a	[Core] Expose API endpoint `/is_sleeping` (#14312) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-15 06:28:14 -07:00
Cyrus Leung	3556a41434	[VLM] Limit multimodal input cache by memory (#14805) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-15 02:52:05 -07:00
Jee Jee Li	e0fdfa1608	[CI/Build] Delete LoRA bias test (#14849) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-14 22:09:25 -07:00
Lucas Wilkinson	5952d8ab61	[Attention] Get rid of mla cache alignment (#14842) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-15 05:08:25 +00:00
Li, Jiang	a2ae496589	[CPU] Support FP8 KV cache (#14741) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-03-14 22:07:36 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Michael Goin	14f301b541	Update to torch==2.6.0 (#12721) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: luka <luka@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-14 16:58:30 -04:00
Russell Bryant	46f98893dd	[V1] Fix model parameterization for structured output tests (#14833) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-14 20:55:18 +00:00
daniel-salib	73deea2fdb	[Frontend] track server_load (#13950)	2025-03-14 09:53:17 -07:00
Cyrus Leung	613c5bb945	[Bugfix] Fix Aria test loading (#14823) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 09:11:23 -07:00
Cyrus Leung	601bd3268e	[Misc] Clean up type annotation for `SupportsMultiModal` (#14794) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-14 00:59:56 -07:00
Roger Wang	0c2af17c76	[CI] Fix missing example model id in processor test (#14787) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-03-14 13:52:15 +08:00
Liangfu Chen	d3d4956261	[Neuron] flatten test parameterization for neuron attention kernels (#14712)	2025-03-13 20:46:56 -07:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
afeldman-nm	02fcaa3d0a	[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>	2025-03-13 19:07:34 +00:00
Cyrus Leung	8e9ffd37d6	[Misc] Clean up processor tests (#14771) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-13 18:25:37 +00:00
Cyrus Leung	f53a0586b9	[Bugfix] Fix prompt format of GLM4V (#14539) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-13 11:37:17 +00:00
Cyrus Leung	382403921f	[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-03-13 02:23:12 -07:00
Jee Jee Li	bd44b812cb	[CI/Build] Delete ultravox LoRA test (#14730) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-13 07:57:39 +00:00
Nick Hill	f5d3acd474	[BugFix][V1] Fix parallel sampling finishing/aborts (#14512) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-12 10:29:48 -07:00
TJian	916836bbfb	[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-03-12 09:31:19 -07:00
Woosuk Kwon	c0c25e25fa	[Model] Add support for Gemma 3 (#14660) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-12 08:36:33 -07:00
Li, Jiang	ff47aab056	[CPU] Upgrade CPU backend to torch-2.6 (#13381) Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-03-12 10:41:13 +00:00
Pavani Majety	debd6bbf09	[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-03-12 05:13:11 +00:00
Benjamin Chislett	5c538c37b2	[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-11 22:12:41 -07:00
Szymon Ożóg	e22ee1e7a2	[Kernel] GGUF MoE kernel (#14613) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>	2025-03-12 03:33:27 +00:00
Isotr0py	e392d85831	[Core] Refactor `QKVCrossParallelLinear` implementation to support BNB 4-bit quantization (#14545) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-11 20:12:52 -07:00
Aaron Pham	77a318bd01	[V1][Core] Support MistralTokenizer for Structured Output (#14625) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-12 10:40:09 +08:00
Farzad Abdolhosseini	80e78d02ac	[Model] Extend Ultravox to accept audio longer than 30s (#13631) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-03-12 10:27:10 +08:00
Joe Runde	47532cd9f4	[core][V1] pluggable scheduler (#14466) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-03-12 01:15:15 +00:00
Russell Bryant	4bf82d4b90	[V1] Add regex structured output support with xgrammar (#14590) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-03-11 23:03:44 +08:00
Cyrus Leung	af295e9b01	[Bugfix] Update `--hf-overrides` for `Alibaba-NLP/gte-Qwen2` (#14609) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-11 07:59:43 -07:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Roger Wang	1fc973c0b5	[V1][Core] Fix memory issue with logits & sampling (#14508) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>	2025-03-11 04:03:41 +00:00
Liangfu Chen	c91b64f749	[neuron] add reshape_and_cache (#14391)	2025-03-10 18:37:29 -07:00
gnovack	d6123170d5	[Neuron] Add Neuron device communicator for vLLM v1 (#14085)	2025-03-10 18:37:04 -07:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Harry Mellor	3b352a2f92	Correct capitalisation: `VLLM` -> `vLLM` (#14562) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-10 16:36:21 +00:00
Szymon Ożóg	89cdaa83e7	[Kernel] Add more dtype support for GGUF kernels (#14043) Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com> Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>	2025-03-10 07:30:04 -07:00


1540 Commits