Commit Graph

6626 Commits

Author SHA1 Message Date
f0b66d6929 prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:37:43 +00:00
a743a35948 fixes
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-27 18:14:59 +00:00
7b31e8a8ff wip seperate comm and compute threads
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 16:51:27 +00:00
2f3920638c add comment
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-27 14:45:02 +00:00
020d9b05bc fix dp=2 tp=2 hang
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-26 18:37:03 +00:00
37bdf9f324 better logging
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:34:08 +00:00
e4419df256 better debug utils
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:23:29 +00:00
952f3c5c1e tone down prints
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 18:18:05 +00:00
9edd08231b debugging hang
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 15:22:50 +00:00
2dc3b8b0a2 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:32:25 +00:00
18bf91e6a8 wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-23 03:31:49 +00:00
00f526f55b seperate gpu wait
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:52:27 +00:00
a8439e2fd4 dp working no yields
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 21:49:14 +00:00
2a7f25fbe2 fix hang 2025-05-22 20:51:36 +00:00
9c60a6299d tp1 working multistream tp > 1 broken
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
2259b47951 use vllm current_stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
04f11d97a0 working but only on the same stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
ffb740ae95 manually manage stream
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:36 +00:00
020269c4c5 added multhreading support
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-22 20:51:36 +00:00
9ccfd094ff fix dummy mode
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
f93bdd3151 support more args in dp example
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
df8f889f37 support MLA
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
37c9babaa0 enable naive microbatching
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
8293182c8c wip
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-05-22 20:51:35 +00:00
980a172474 [Kernel] update comment for KV shape in unified triton attn (#18099)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-05-20 11:19:34 -07:00
e1f5a71ed7 [Model] use AutoWeightsLoader for bloom (#18300)
Signed-off-by: calvin chen <120380290@qq.com>
2025-05-20 09:40:05 -07:00
f4a8a37465 [Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-20 09:08:37 -07:00
8f55962a7f [Misc] refactor prompt embedding examples (#18405)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 15:26:12 +00:00
be48360c1f [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-05-20 06:59:48 -07:00
86847700d7 [CI] Add mteb testing to test the accuracy of the embedding model (#17175) 2025-05-20 06:51:12 -07:00
d6c86d09ae Update cpu.txt (#18398)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-05-20 10:53:23 +00:00
6b35cb10a0 [Misc] Add LoRA code owner (#18387)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-20 03:27:30 -07:00
1b1e8e05ff [doc] update env variable export (#18391)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-20 08:53:27 +00:00
bca55b556f [Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
Signed-off-by: rand-fly <randfly@outlook.com>
2025-05-20 00:54:33 -07:00
d981396778 [release] Change dockerhub username for TPU release (#18389) 2025-05-19 23:49:23 -07:00
9609327fa4 [Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Signed-off-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Andrew Sansom <andrew@protopia.ai>
2025-05-19 20:21:27 -07:00
f07a673eb2 [Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00
d565e0976f [neuron] fix authorization issue (#18364)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-05-19 23:30:32 +00:00
258bf621d5 fix CUDA_check redefinition in #17918 (#18287)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-05-19 13:42:35 -07:00
dc1440cf9f Neuron up mistral (#18222)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-19 09:54:47 -07:00
8171221834 [Misc] Fix typo (#18330) 2025-05-19 09:51:01 -07:00
7937c2fd52 Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (#18337) 2025-05-19 09:49:57 -07:00
e2ee1e8e9e [Feature]Add support for models quantized with AutoRound (#17850)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-19 09:38:53 -07:00
20d8ce81eb [Frontend] add --quick option for vllm chat/complete (#18297)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-19 09:36:13 -07:00
84ab4feb7e [Doc] Fix typo (#18355) 2025-05-19 16:05:16 +00:00
6781af5608 [Quantization] Pool model support bitsandbytes (#18087)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-19 09:03:43 -07:00
1b15df2546 [BugFix] Fix handling of num_computed_tokens with connector (#18232)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-05-19 09:03:25 -07:00
43b5f61dce [Doc] Move input-related docs to Features (#18353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-19 15:08:39 +00:00
c5bb0ebdc6 [Doc] Fix prompt embedding examples (#18350)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-05-19 06:48:16 -07:00
d637b96099 [BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS (#18319)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com>
Co-authored-by: cascade <cascade812@outlook.com>
2025-05-19 01:31:23 -07:00