|
|
a92842454c
|
[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-02 22:25:47 -07:00 |
|
|
|
c8386fa61d
|
[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-05-02 22:25:14 -07:00 |
|
|
|
87baebebd8
|
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-02 21:42:44 -07:00 |
|
|
|
e3d0a1d190
|
[Quantization] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-05-02 21:41:10 -07:00 |
|
|
|
d47b605eca
|
Update test requirements to CUDA 12.8 (#17576)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-05-02 21:40:15 -07:00 |
|
|
|
22c6f6397f
|
[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-05-03 02:41:59 +00:00 |
|
|
|
3ec97e2cc5
|
[release] Add command to clean up Docker containers/images in TPU release machine (#17606)
|
2025-05-02 18:54:34 -07:00 |
|
|
|
9b103a1d76
|
fix typo in logging (#17605)
|
2025-05-02 18:04:40 -07:00 |
|
|
|
b90b0852e9
|
[easy] Print number of needed GPUs in skip message (#17594)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-02 15:27:43 -07:00 |
|
|
|
9352cdb56d
|
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Lu Fang <lufang@fb.com>
|
2025-05-02 19:44:19 +00:00 |
|
|
|
182f40ea8b
|
Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)
|
2025-05-02 11:36:46 -07:00 |
|
|
|
3e887d2e0c
|
permute/unpermute kernel for moe optimization (#14568)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-05-02 11:31:55 -07:00 |
|
|
|
0f87d8f7b2
|
[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-02 11:01:38 -07:00 |
|
|
|
4c33d67321
|
[Bugfix] fix tmp_out and exp_sums dimensions (#17438)
Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>
|
2025-05-02 16:44:07 +00:00 |
|
|
|
cb234955df
|
[Misc] Clean up input processing (#17582)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 08:11:53 -07:00 |
|
|
|
3a500cd0b6
|
[doc] miss result (#17589)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 07:04:49 -07:00 |
|
|
|
868c546da4
|
Support W8A8 INT8 MoE for compressed-tensors (#16745)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 10:03:32 -04:00 |
|
|
|
99404f53c7
|
[Security] Fix image hash collision (#17378)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 08:36:39 -04:00 |
|
|
|
785d75a03b
|
Automatically tell users that dict args must be valid JSON in CLI (#17577)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-02 05:24:55 -07:00 |
|
|
|
6d1479ca4b
|
[doc] add the print result (#17584)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-02 05:24:45 -07:00 |
|
|
|
b8b0859b5c
|
add more pytorch related tests for torch nightly (#17422)
Signed-off-by: Yang Wang <elainewy@meta.com>
|
2025-05-02 03:29:59 -07:00 |
|
|
|
d7543862bd
|
[Misc] Rename assets for testing (#17575)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 03:29:25 -07:00 |
|
|
|
c777df79f7
|
[BugFix] Fix Memory Leak (#17567)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-05-02 01:07:03 -07:00 |
|
|
|
cc2a77d7f1
|
[Core] [Bugfix] Add Input Embeddings (#15428)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-02 01:06:39 -07:00 |
|
|
|
9e2de9b9e9
|
[Bugfix] Remove TritonPlaceholder from sys.modules (#17317)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-02 00:45:01 -07:00 |
|
|
|
109e15a335
|
Add pt_load_map_location to allow loading to cuda (#16869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-05-01 23:23:42 -07:00 |
|
|
|
f192ca90e6
|
Fix PixtralHF missing spatial_merge_size (#17571)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-01 22:14:09 -07:00 |
|
|
|
f89d0e11bf
|
[Misc] Continue refactoring model tests (#17573)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 22:06:08 -07:00 |
|
|
|
b4003d11fc
|
Check if bitblas is installed during support check (#17572)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 04:32:54 +00:00 |
|
|
|
292fc59d61
|
[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-02 04:05:04 +00:00 |
|
|
|
afcb3f8863
|
[Attention] MLA move o_proj q_proj into cuda-graph region (#17484)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-02 03:16:26 +00:00 |
|
|
|
afb12e4294
|
[Doc] note that not all unit tests pass on CPU platforms (#17554)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-05-02 02:57:21 +00:00 |
|
|
|
24aebae177
|
[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-01 17:59:35 -07:00 |
|
|
|
39c0813a7f
|
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-05-01 16:19:30 -07:00 |
|
|
|
9b70e2b4c1
|
[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-05-01 19:53:03 +00:00 |
|
|
|
173daac19d
|
[Bug]change the position of cuda_graph_sizes in dataclasses (#17548)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-05-01 11:52:37 -07:00 |
|
|
|
04f2cfc894
|
Remove duplicate code from dbrx.py (#17550)
|
2025-05-01 11:51:58 -07:00 |
|
|
|
811a6c0972
|
[ROCM] Add gfx950 to the custom attention archs (#16034)
Signed-off-by: jpvillam <Juan.Villamizar@amd.com>
Signed-off-by: seungrokjung <seungrok.jung@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: seungrokjung <seungrok.jung@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-05-01 11:18:28 -07:00 |
|
|
|
9b1769dd9a
|
[Bugfix] Fix lint error (#17547)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 11:12:19 -07:00 |
|
|
|
61c299f81f
|
[Misc]add configurable cuda graph size (#17201)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-01 11:04:50 -07:00 |
|
|
|
4acfa3354a
|
[ROCm] update installation guide to include build aiter from source instructions (#17542)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-05-01 11:01:28 -07:00 |
|
|
|
88c8304104
|
[Model] Refactor Ovis2 to support original tokenizer (#17537)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-01 11:00:53 -07:00 |
|
|
|
6768ff4a22
|
Move the last arguments in arg_utils.py to be in their final groups (#17531)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-01 10:31:44 -07:00 |
|
|
|
f2e7af9b86
|
[CI/Build] Remove awscli dependency (#17532)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-05-01 09:20:54 -07:00 |
|
|
|
7423cf0a9b
|
[Misc] refactor example - cpu_offload_lmcache (#17460)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-01 15:05:24 +00:00 |
|
|
|
460a2b1100
|
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-05-01 07:59:28 -07:00 |
|
|
|
28566d73b3
|
[ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2025-05-01 07:54:25 -07:00 |
|
|
|
98060b001d
|
[Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-01 06:46:16 -07:00 |
|
|
|
f5a3c655b2
|
[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config (#17535)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-05-01 06:37:17 -07:00 |
|
|
|
7169f87ad0
|
[doc] add streamlit integration (#17522)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-01 13:34:02 +00:00 |
|