youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Mengqing Cao	238dfc8ac3	[MISC] tiny fixes (#13378 )	2025-02-17 00:57:13 -08:00
Huy Do	45186834a0	Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-17 08:16:32 +00:00
yankooo	f857311d13	Fix spelling error in index.md (#13369 )	2025-02-17 06:53:20 +00:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Jee Jee Li	2010f04c17	[V1][Misc] Avoid unnecessary log output (#13289 )	2025-02-16 19:26:24 -08:00
Woosuk Kwon	69e1d23e1e	[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-16 12:25:29 -08:00
Isotr0py	d67cc21b78	[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-16 18:55:27 +00:00
Woosuk Kwon	e18227b04a	[V1][PP] Cache Intermediate Tensors (#13353 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-16 10:02:27 -08:00
Woosuk Kwon	7b89386553	[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-16 09:39:08 -08:00
凌	da833b0aee	[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325 )	2025-02-16 16:04:21 +00:00
Cyrus Leung	5d2965b7d7	[Bugfix] Fix 2 Node and Spec Decode tests (#13341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-16 22:20:22 +08:00
youkaichao	a0231b7c25	[platform] add base class for communicators (#13208 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-16 22:14:22 +08:00
youkaichao	124776ebd5	[ci] skip failed tests for flashinfer (#13352 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-16 22:09:15 +08:00
Roger Wang	b7d309860e	[V1] Update doc and examples for H2O-VL (#13349 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-16 10:35:54 +00:00
wchen61	dc0f7ccf8b	[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187 ) Signed-off-by: wchen61 <wchen61@foxmail.com>	2025-02-16 08:59:49 +00:00
Michael Goin	d3d547e057	[Bugfix] Pin xgrammar to 0.1.11 (#13338 )	2025-02-15 19:42:25 -08:00
Kyle Sayers	12913d17ba	[Quant] Add `SupportsQuant` to phi3 and clip (#13104 )	2025-02-15 19:28:33 -08:00
Lily Liu	80f63a3966	[V1][Spec Decode] Ngram Spec Decode (#12193 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-15 18:05:11 -08:00
Cyrus Leung	367cb8ce8c	[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331 )	2025-02-15 07:06:23 -08:00
youkaichao	54ed913f34	[ci/build] update flashinfer (#13323 )	2025-02-15 05:33:13 -08:00
Cody Yu	9206b3d7ec	[V1][PP] Run engine busy loop with batch queue (#13064 )	2025-02-15 03:59:01 -08:00
rasmith	ed0de3e4b8	[AMD] [Model] DeepSeek tunings (#13199 )	2025-02-15 03:58:09 -08:00
Mark McLoughlin	2ad1bc7afe	[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288 )	2025-02-15 03:56:19 -08:00
Isotr0py	7fdaaf48ef	[Bugfix] Fix qwen2.5-vl image processor (#13286 )	2025-02-15 03:00:11 -08:00
Xu Song	067fa2255b	[Bugfix]Fix search start_index of stop_checker (#13280 )	2025-02-14 21:39:42 -08:00
Nick Hill	9076325677	[BugFix] Don't scan entire cache dir when loading model (#13302 )	2025-02-14 21:33:31 -08:00
Tyler Michael Smith	97a3d6d995	[Bugfix] Massage MLA's usage of flash attn for RoCM (#13310 )	2025-02-14 21:33:25 -08:00
Nicolò Lucchesi	579d7a63b2	[Bugfix][Docs] Fix offline Whisper (#13274 )	2025-02-14 21:32:37 -08:00
Sage Moore	c9f9d5b397	[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't build on ROCm (#13235 )	2025-02-14 20:30:42 -08:00
Woosuk Kwon	0c73026844	[V1][PP] Fix memory profiling in PP (#13315 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-14 20:17:25 -08:00
Nick Hill	6a854c7a2b	[V1][Sampler] Don't apply temp for greedy-only (#13311 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-14 18:10:53 -08:00
Woosuk Kwon	e7eea5a520	[V1][CI] Fix failed v1-test because of min_p (#13316 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-14 17:29:51 -08:00
Aoyu	a12934d3ec	[V1][Core] min_p sampling support (#13191 ) Signed-off-by: Aoyu <aoyuzhan@amazon.com> Co-authored-by: Aoyu <aoyuzhan@amazon.com>	2025-02-14 15:50:05 -08:00
Joe Runde	3bcb8c75da	[Core] Reduce TTFT with concurrent partial prefills (#10235 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-14 15:36:07 -08:00
Michael Goin	5e5c8e091e	[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-14 12:53:42 -08:00
Yu-Zhou	c9e2d644e7	[Hardware][Gaudi][Bugfix] Fix error for guided decoding (#12317 )	2025-02-14 04:36:49 -08:00
Russell Bryant	7734e9a291	[Core] choice-based structured output with xgrammar (#12632 )	2025-02-14 04:36:05 -08:00
Lu Fang	6224a9f620	Support logit_bias in v1 Sampler (#13079 )	2025-02-14 04:34:59 -08:00
Nick Hill	085b7b2d6c	[V1] Simplify GPUModelRunner._update_states check (#13265 )	2025-02-14 04:33:43 -08:00
Cyrus Leung	4da1f667e9	[VLM] Keep track of whether prompt replacements have been applied (#13215 )	2025-02-14 04:20:46 -08:00
Jun Duan	556ef7f714	[Misc] Log time consumption of sleep and wake-up (#13115 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-02-14 20:10:21 +08:00
Xu Song	83481ceb49	[Bugfix] Fix missing parentheses (#13263 )	2025-02-14 01:07:10 -08:00
Pooya Davoodi	185cc19f92	[Frontend] Optionally remove memory buffer used for uploading to URLs in run_batch (#12927 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2025-02-14 08:22:42 +00:00
Alexander Matveev	45f90bcbba	[WIP] TPU V1 Support Refactored (#13049 )	2025-02-14 00:21:53 -08:00
Kero Liang	b0ccfc565a	[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126 )	2025-02-13 22:39:20 -08:00
Sage Moore	ba59b78a9c	[ROCm][V1] Add intial ROCm support to V1 (#12790 )	2025-02-13 22:21:50 -08:00
Varun Sundar Rabindranath	cbc40128eb	[V1] LoRA - Enable Serving Usecase (#12883 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-14 14:21:12 +08:00
Michael Goin	f0b2da72a8	Expand MLA to support most types of quantization (#13181 )	2025-02-13 22:19:22 -08:00
Harry Mellor	f2b20fe491	Consolidate Llama model usage in tests (#13094 )	2025-02-13 22:18:03 -08:00
Wang Ran (汪然)	40932d7a05	[Misc] Remove redundant statements in scheduler.py (#13229 )	2025-02-13 22:07:25 -08:00


4638 Commits