youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Yuqi Zhang	d0bc2f810b	[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430 ) Signed-off-by: Yuqi Zhang <yuqizhang@google.com> Co-authored-by: Yuqi Zhang <yuqizhang@google.com>	2025-05-23 01:41:37 -07:00
Chauncey	b046cf792d	[Feature][V1]: supports cached_tokens in response usage (#18149 ) Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-05-23 01:41:03 -07:00
Michael Goin	54af915949	[Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (#18505 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 08:36:37 +00:00
cascade	71ea614d4a	[Feature]Add async tensor parallelism using compilation pass (#17882 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-23 01:03:34 -07:00
RonaldBXu	4c611348a7	[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034 ) Signed-off-by: Ronald Xu <ronaldxu@amazon.com>	2025-05-23 00:37:18 -07:00
Ning Xie	60cad94b86	[Hardware] correct method signatures for HPU,ROCm,XPU (#18551 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-22 22:31:59 -07:00
Shanshan Shen	9c1baa5bc6	[Misc] Replace `cuda` hard code with `current_platform` (#16983 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-05-23 04:38:50 +00:00
Teruaki Ishizaki	4be2255c81	[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291 ) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-23 12:30:47 +08:00
aws-elaineyz	ed5d408255	[Neuron] Remove bypass on EAGLEConfig and add a test (#18514 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-22 21:26:32 -07:00
Benjamin Chislett	583507d130	[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-22 20:17:39 -07:00
lkchen	e44d8ce8c7	[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576 ) Signed-off-by: Linkun Chen <github@lkchen.net>	2025-05-23 02:54:42 +00:00
Nick Hill	93ecb8139c	[BugFix] Increase TP execute_model timeout (#18558 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-23 10:22:11 +08:00
CYJiang	fae453f8ce	[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482 ) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-05-23 10:15:32 +08:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
rasmith	46791e1b4b	[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-05-22 18:45:35 -07:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926 ) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Kai Wu	c91fe7b1b9	[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917 ) Signed-off-by: Kai Wu <kaiwu@meta.com>	2025-05-22 16:44:08 -07:00
Ekagra Ranjan	a04720bc36	[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290 )	2025-05-22 15:17:33 -07:00
lkchen	7b9d832c80	[Tool] Add NIXL installation script (#18172 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 14:33:16 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
Mengqing Cao	f8d2cc5f55	[Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076 ) Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-05-22 12:11:53 -07:00
wangxiyuan	721fb9b181	[Platform] Move platform check to right place (#18470 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-22 12:11:28 -07:00
David Xia	1f3a1200e4	[Bugfix] make `test_openai_schema.py` pass (#18224 ) Signed-off-by: David Xia <david@davidxia.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 18:34:06 +00:00
Lukas Geiger	54631f8262	[Misc] Call `ndarray.tobytes()` directly instead of `ndarray.data.tobytes()` (#18347 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-22 09:00:13 -07:00
Reid	cb506ecb5a	[Misc] improve Automatic Prefix Caching example (#18554 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-05-22 14:50:46 +00:00
Li, Jiang	93f71673ce	[BugFix][CPU] Fix x86 SHM distributed module initialization (#18536 ) Signed-off-by: jiang.li <jiang1.li@intel.com>	2025-05-22 07:35:00 -07:00
Calvin Chen	3f505233fd	[Doc] Add stream flag for chat completion example (#18524 ) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-22 14:07:10 +00:00
Bowen Wang	4e04eceb58	[Bugfix] Use random hidden states in dummy sampler run (#18543 ) Signed-off-by: Bowen Wang <abmfy@icloud.com>	2025-05-22 06:48:56 -07:00
CYJiang	71075029f2	[Doc] Support --stream arg in openai_completion_client.py script (#18388 ) Signed-off-by: googs1025 <googs1025@gmail.com>	2025-05-22 13:20:17 +00:00
Harry Mellor	ca86a7cf6e	[CI/Build] Update bamba test model location (#18544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-22 06:01:07 -07:00
lkchen	a35a494745	[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513 ) Signed-off-by: Linkun <github@lkchen.net>	2025-05-22 05:24:43 -07:00
燃	f6037d1907	[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18526 ) Co-authored-by: 松灵 <wpf272043@alibaba-inc.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-22 05:22:53 -07:00
aws-elaineyz	fa72f9a812	Order sequence ids + config update to support specifying custom quantization layers (#18279 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:20:36 -07:00
aws-elaineyz	ebed81fbf5	Update default neuron config for speculation (#18274 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:18:55 -07:00
Satyajith Chilappagari	e2d7d31244	[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (#18512 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-22 02:17:34 -07:00
Cyrus Leung	23b67b37b2	[Doc] Fix invalid JSON in example args (#18527 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-22 07:11:46 +00:00
Jee Jee Li	db5a29ba19	[Bugfix] Fix LoRA test (#18518 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-21 21:48:53 -07:00
Shane A	51797775c3	[Bugfix][Model] Make Olmo2Model weight loading return loaded weights (#18504 ) Signed-off-by: Shane A <shanea@allenai.org>	2025-05-21 21:17:03 -07:00
Nick Hill	cf5984b2fe	[BugFix][DP] Send DP wave completion only from `dp_rank==0` (#18502 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com>	2025-05-21 20:25:25 -07:00
youngrok cha	d022115cc6	[Bugfix] Inconsistent token calculation compared to HF in llava family (#18479 ) Signed-off-by: jaycha <jaycha@ncsoft.com>	2025-05-21 20:21:47 -07:00
Rabi Mishra	acb54ca8e1	Initialize io_thread_pool attribute in the beginning. (#18331 ) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-21 20:21:14 -07:00
Russell Bryant	6e0fd34d3c	[CI] Fix race condition with StatelessProcessGroup.barrier (#18506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-21 20:19:13 -07:00
Ning Xie	176d62e4ea	[MISC] update project urls in pyproject.toml (#18519 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-21 20:17:34 -07:00
Dhia Eddine Rhaiem	20bd6f4d2e	[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 19:23:59 -07:00
Sebastian Schoennenbeck	1f079540db	[Bugfix] Consistent ascii handling in tool parsers (#17704 ) Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>	2025-05-21 20:41:23 +00:00
vllmellm	94d8ec8d2b	[FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-05-21 10:34:28 -07:00
Mark McLoughlin	bb0a311213	Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945 )" (#18459 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-21 10:25:23 -07:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00

6768 Commits