youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Sage Moore	f8848bb201	misc fixes. lm_eval still gets a wrong answer but it no longer hangs Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-06-04 22:46:18 +00:00
Sage Moore	2e3484c237	debugging Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-06-03 19:25:01 +00:00
Sage Moore	18e7d6c7b8	Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing	2025-06-03 00:52:39 +00:00
Sage Moore	8332924320	dp format Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-06-02 19:15:23 +00:00
Sage Moore	8ea80fca4a	revert offline_inference/basic.py Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-06-02 18:05:48 +00:00
Sage Moore	21d9529a79	revert offline_inference/basic.py Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-06-02 18:05:26 +00:00
Nick Hill	9a1b9b99d7	[BugFix] Fix multi-node offline data-parallel (#18981) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-05-31 08:34:52 -07:00
Satyajith Chilappagari	2a50ef5760	[Neuron] Add Multi-Modal model support for Neuron (#18921) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com> Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com> Co-authored-by: FeliciaLuo <luof@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-31 10:39:11 +00:00
Sage Moore	62da375465	more fixes	2025-05-30 21:17:06 +00:00
Mark McLoughlin	0e98964e94	[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-28 18:54:12 +00:00
Reid	435fa95444	[Frontend] add run batch to CLI (#18804) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-28 07:08:57 -07:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755)	2025-05-27 20:29:53 -07:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
Reid	fc6d0c290f	[Misc] improve docs (#18734) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-27 07:07:01 +00:00
Cyrus Leung	753944fa9b	[Doc] Update reproducibility doc and example (#18741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-27 07:03:13 +00:00
Harry Mellor	27bebcd897	Convert `examples` to `ruff-format` (#18400) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-26 16:57:54 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
AlexZhao	8820821b59	[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (#18644) Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com> Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com> Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>	2025-05-26 13:51:27 +08:00
Isotr0py	75f81750f3	[VLM] Initialize video input support for InternVL models (#18499) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-25 04:51:25 +00:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Lucas Wilkinson	18bf91e6a8	wip Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-23 03:31:49 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Lucas Wilkinson	9c60a6299d	tp1 working multistream tp > 1 broken Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:36 +00:00
Lucas Wilkinson	04f11d97a0	working but only on the same stream Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:36 +00:00
Lucas Wilkinson	ffb740ae95	manually manage stream Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:36 +00:00
Sage Moore	020269c4c5	added multithreading support Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-05-22 20:51:36 +00:00
Lucas Wilkinson	9ccfd094ff	fix dummy mode Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:35 +00:00
Lucas Wilkinson	f93bdd3151	support more args in dp example Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:35 +00:00
Lucas Wilkinson	df8f889f37	support MLA Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:35 +00:00
Lucas Wilkinson	37c9babaa0	enable naive microbatching Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-05-22 20:51:35 +00:00
Reid	cb506ecb5a	[Misc] improve Automatic Prefix Caching example (#18554) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-22 14:50:46 +00:00
Calvin Chen	3f505233fd	[Doc] Add stream flag for chat completion example (#18524) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-22 14:07:10 +00:00
CYJiang	71075029f2	[Doc] Support --stream arg in openai_completion_client.py script (#18388) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-05-22 13:20:17 +00:00
Reid	107f5fc4cb	[Misc] refactor disaggregated-prefill-v1 example (#18474) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-21 11:10:14 +00:00
Reid	8f55962a7f	[Misc] refactor prompt embedding examples (#18405) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-20 15:26:12 +00:00
Gong Shufan	8171221834	[Misc] Fix typo (#18330)	2025-05-19 09:51:01 -07:00
Reid	27d0952600	[Misc] extract parser.parse_args() (#18323) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-19 04:06:26 +00:00
David Xia	5c04bb8b86	[doc] fix multimodal example script (#18089) Signed-off-by: David Xia <david@davidxia.com>	2025-05-16 06:05:34 +00:00
Lucia Fang	3d2779c29a	[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827) Signed-off-by: Lucia Fang <fanglu@fb.com>	2025-05-15 22:28:27 -07:00
Harry Mellor	51ff154639	Improve examples rendering in docs and GitHub (#18203) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-15 15:57:49 +00:00
omahs	a9944aabfa	fix: typos (#18151) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956)	2025-05-14 13:11:54 -07:00
Ekagra Ranjan	418d2f8bfb	[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326) Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-14 12:31:46 -07:00
majianpeng	e7ef61c1f0	[Bugfix][Example] make lmcache v0 work. (#18051) Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com>	2025-05-13 23:43:44 -07:00
Ecthlion_zyy	33011318c2	Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117)	2025-05-13 23:19:14 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)	2025-05-12 19:52:47 -07:00
Harry Mellor	72a3f6b898	Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 11:25:33 -07:00
Xu Wenqing	3a5ea75129	[Feature] Support DeepSeekV3 Function Call (#17784) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com> Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-05-12 00:45:21 -07:00

462 Commits