|
|
1f1b1bc03b
|
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-27 04:40:28 +00:00 |
|
|
|
82e2339b06
|
[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 07:38:04 -07:00 |
|
|
|
5a2c76cbe1
|
[CI] fix dump_input for str type (#18697)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-26 18:23:35 +08:00 |
|
|
|
38b13dfe78
|
[CI/Build] Replace math.isclose with pytest.approx (#18703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-26 02:05:17 -07:00 |
|
|
|
4ea62c0ea0
|
[CI] add missing argument (#18694)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-26 00:22:04 -07:00 |
|
|
|
fba0642704
|
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage (#18683)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-05-25 20:27:50 -07:00 |
|
|
|
57fd13a707
|
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-25 14:05:30 +00:00 |
|
|
|
63934543a0
|
Speed up the kernels/quantization/ tests (#18669)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-25 05:02:59 +00:00 |
|
|
|
75f81750f3
|
[VLM] Initialize video input support for InternVL models (#18499)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-25 04:51:25 +00:00 |
|
|
|
6ab681bcbe
|
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-05-25 04:51:21 +00:00 |
|
|
|
c1e4a4052d
|
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 09:45:34 +00:00 |
|
|
|
a859320575
|
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647)
|
2025-05-24 09:15:36 +00:00 |
|
|
|
d55e446d13
|
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (#18424)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-24 06:51:22 +00:00 |
|
|
|
2b10ba7491
|
[Bugfix][Nixl] Fix Preemption Bug (#18631)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-05-23 23:30:16 +00:00 |
|
|
|
4fc1bf813a
|
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454)
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com>
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>
|
2025-05-23 16:16:26 -07:00 |
|
|
|
0ddf88e16e
|
[CI] Enable test_initialization to run on V1 (#16736)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-23 15:09:44 -07:00 |
|
|
|
6550114c9c
|
[v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-05-23 09:39:47 -07:00 |
|
|
|
cd821ea5d2
|
[CI] fix kv_cache_type argument (#18594)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-05-23 04:49:18 -07:00 |
|
|
|
b046cf792d
|
[Feature][V1]: supports cached_tokens in response usage (#18149)
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-05-23 01:41:03 -07:00 |
|
|
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
|
|
ed5d408255
|
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-22 21:26:32 -07:00 |
|
|
|
e44d8ce8c7
|
[Bugfix] Set KVTransferConfig.engine_id in post_init (#18576)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-05-23 02:54:42 +00:00 |
|
|
|
4b0da7b60e
|
Enable hybrid attention models for Transformers backend (#18494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-23 10:12:08 +08:00 |
|
|
|
c6b636f9fb
|
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-23 02:05:44 +00:00 |
|
|
|
04eb88dc80
|
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-05-23 01:59:18 +00:00 |
|
|
|
46791e1b4b
|
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-22 18:45:35 -07:00 |
|
|
|
c32e249a23
|
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-05-22 18:44:18 -07:00 |
|
|
|
c91fe7b1b9
|
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917)
Signed-off-by: Kai Wu <kaiwu@meta.com>
|
2025-05-22 16:44:08 -07:00 |
|
|
|
6e588da0f4
|
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-22 12:13:54 -07:00 |
|
|
|
1f3a1200e4
|
[Bugfix] make test_openai_schema.py pass (#18224)
Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-22 18:34:06 +00:00 |
|
|
|
ca86a7cf6e
|
[CI/Build] Update bamba test model location (#18544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-22 06:01:07 -07:00 |
|
|
|
a35a494745
|
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513)
Signed-off-by: Linkun <github@lkchen.net>
|
2025-05-22 05:24:43 -07:00 |
|
|
|
fa72f9a812
|
Order sequence ids + config update to support specifying custom quantization layers (#18279)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Tailin Pan <tailinpa@amazon.com>
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Maxwell Goldberg <mgld@amazon.com>
Co-authored-by: Aakash Shetty <sheaak@amazon.com>
|
2025-05-22 02:20:36 -07:00 |
|
|
|
db5a29ba19
|
[Bugfix] Fix LoRA test (#18518)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-21 21:48:53 -07:00 |
|
|
|
6e0fd34d3c
|
[CI] Fix race condition with StatelessProcessGroup.barrier (#18506)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-21 20:19:13 -07:00 |
|
|
|
bb0a311213
|
Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945)" (#18459)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-21 10:25:23 -07:00 |
|
|
|
dd5fa7e04f
|
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
|
2025-05-21 08:35:00 -07:00 |
|
|
|
c6c10ca920
|
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-21 06:46:39 -07:00 |
|
|
|
eca18691d2
|
[MODEL] FalconH1 (#18406)
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae>
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>
|
2025-05-21 04:59:06 -07:00 |
|
|
|
61acfc45bc
|
[Bugfix][Failing Test] Fix test_events.py (#18460)
Signed-off-by: rabi <ramishra@redhat.com>
|
2025-05-21 04:57:28 -07:00 |
|
|
|
92247c522e
|
[Bug] Fix moe_sum signature (#18440)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-05-20 22:37:08 -07:00 |
|
|
|
f4a8a37465
|
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-20 09:08:37 -07:00 |
|
|
|
86847700d7
|
[CI] Add mteb testing to test the accuracy of the embedding model (#17175)
|
2025-05-20 06:51:12 -07:00 |
|
|
|
6b35cb10a0
|
[Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-20 03:27:30 -07:00 |
|
|
|
9609327fa4
|
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
|
2025-05-19 20:21:27 -07:00 |
|
|
|
f07a673eb2
|
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-19 20:20:12 -07:00 |
|
|
|
dc1440cf9f
|
Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
|
2025-05-19 09:54:47 -07:00 |
|
|
|
e2ee1e8e9e
|
[Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
|
2025-05-19 09:38:53 -07:00 |
|
|
|
6781af5608
|
[Quantization] Pool model support bitsandbytes (#18087)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-05-19 09:03:43 -07:00 |
|
|
|
221cfc2fea
|
Feature/vllm/input embedding completion api (#17590)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-18 20:18:05 -07:00 |
|