youngkingdom/vllm

Author	SHA1	Message	Date
Russell Bryant	78aa341d12	[CI] Fix race condition in test_kv_cache_events test (#18169 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 16:27:48 -07:00
Jerry Zhang	7974736740	Add support for loading torchao models with `AOPerModuleConfig` (#17826 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-05-14 16:24:59 -07:00
Aaron Pham	2fc9075b82	[V1] Structured Outputs + Thinking compatibility (#16577 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 15:45:24 -07:00
Lucas Wilkinson	d93c976a0d	[Kernel] Have rotary embeddings support tensors (#18046 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-14 15:43:55 -07:00
David Xia	749f792553	[Frontend] decrease import time of vllm.multimodal (#18031 ) Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>	2025-05-14 15:43:32 -07:00
Robert Shaw	856865008e	[CI] Disable Failing Tests (#18165 )	2025-05-14 13:49:56 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956 )	2025-05-14 13:11:54 -07:00
Ekagra Ranjan	418d2f8bfb	[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326 ) Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-14 12:31:46 -07:00
Chen Zhang	964472b966	[Doc] Update prefix cache metrics to counting tokens (#18138 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-14 15:23:30 +00:00
Nick Hill	59dd311cf5	[KVConnector] Keep KVTransferParams as a dict (#18033 )	2025-05-14 08:05:57 -07:00
Cyrus Leung	d066e52013	[Bugfix] Fix chat utils tests (#18139 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 05:38:21 -07:00
Harry Mellor	c8ea982d9b	Update deprecated type hinting in `platform`, `plugins`, `triton_utils`, `vllm_flash_attn` (#18129 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 05:28:16 -07:00
Harry Mellor	dc372b9c8a	Update deprecated type hinting in `vllm/device_allocator` and `vllm/distributed` (#18126 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 04:07:57 -07:00
Harry Mellor	9b5b39b650	Update deprecated type hinting in `vllm/lora` (#18128 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 03:57:59 -07:00
Reid	9ccc6ded42	[doc] add missing import (#18133 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-14 10:57:34 +00:00
Cyrus Leung	d62a076e84	[Model] GritLM supports other attention backends (#18109 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 03:33:19 -07:00
Jee Jee Li	259127f8b8	[Bugfix] Fix LoRA test (#18123 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-14 10:25:47 +00:00
TJian	612c2edb4f	[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support (#17110 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-14 03:03:11 -07:00
Andrzej Kotłowski	38fe728d60	[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844 ) Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>	2025-05-14 09:39:51 +00:00
rongfu.leng	82e7f9bb03	[Misc] replace does not exist model (#18119 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-14 02:13:47 -07:00
Jee Jee Li	63dc3426e0	[Model] Add packed_modules_mapping for Qwen3-MOE (#18118 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-14 02:13:19 -07:00
Cyrus Leung	8f5dc41481	[Bugfix] Fix entrypoints audio test failure (#18111 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-14 09:08:07 +00:00
wang.yuqi	63ad622233	[New Model]: support GTE NewModel (#17986 )	2025-05-14 01:31:31 -07:00
majianpeng	e7ef61c1f0	[Bugfix][Example] make lmcache v0 work. (#18051 ) Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com>	2025-05-13 23:43:44 -07:00
Jinzhen Lin	d4154c35a2	[Bugfix] fix moe marlin `topk_weight` loading (#18080 ) Co-authored-by: mgoin <mgoin64@gmail.com>	2025-05-13 23:31:57 -07:00
lkchen	6685890d11	[Fix] Move "model_config" as keyword args in chat_utils.py (#18098 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-13 23:27:26 -07:00
Ecthlion_zyy	33011318c2	Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117 )	2025-05-13 23:19:14 -07:00
qli88	4f8b373225	[BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912 ) Signed-off-by: Qiang Li <qiang.li2@amd.com>	2025-05-13 23:05:20 -07:00
Charlie Fu	7b2f28deba	[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-13 22:13:56 -07:00
vllmellm	2d912fb66f	[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 (#17955 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 22:03:47 -07:00
Michael Goin	12e6c0b41c	[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig (#18086 )	2025-05-13 20:36:17 -07:00
Michael Goin	9a2a6357de	[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models (#18026 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-13 19:48:33 -07:00
youkaichao	6266c57bae	[core][distributed] add ep group and all2all interface (#18077 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-14 10:46:49 +08:00
Jon Gill	754b699cbe	[Bug]: Fix S3 model/tokenizer path resolution (#18083 ) Signed-off-by: Jon Gill <jon@yurts.ai>	2025-05-13 19:34:17 -07:00
Roger Wang	6e27c6d86b	[Misc] Remove unused numpy tensor (#18084 ) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-05-13 19:33:40 -07:00
Nick Hill	d5af47a149	[P/D] Add some more debug logs to `NixlConnector` (#18102 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 19:33:03 -07:00
Pavani Majety	65f0f74b66	[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile (#18101 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-05-13 19:33:00 -07:00
Luka Govedič	176a95c670	[Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-05-13 19:31:42 -07:00
Chen Zhang	f2ae883b67	[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager (#18001 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-13 19:09:39 -07:00
vllmellm	40de1ef455	[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature (#14968 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-13 19:08:20 -07:00
Russell Bryant	0189a65a2e	[Docs] Expand security doc with firewall info (#18081 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-13 19:36:00 +00:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Harry Mellor	0b217da646	Update deprecated type hinting in `vllm/adapter_commons` (#18073 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:51 -07:00
Harry Mellor	19324d660c	Update deprecated type hinting in `vllm/compilation` (#18072 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 08:32:48 -07:00
Harry Mellor	fc407a1425	Give auto-merge label workflow permission to add labels to issues (#18078 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 07:53:13 -07:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
Cyrus Leung	b922c2ebd2	[Bugfix] Fix entrypoints metrics tests (#18063 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-13 06:42:43 -07:00
Russell Bryant	00b14e0f16	[CI] set token permissions for pre-commit CI job (#17729 ) Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 13:38:30 +00:00
Russell Bryant	54e467e6f8	[CI] Add token permissions for add-ready-label CI job (#17730 ) Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 13:38:13 +00:00
Russell Bryant	79a1d25bbd	[CI] Add workflow permissions for helm CI job (#17727 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-05-13 12:49:07 +00:00

6506 Commits