Commit Graph

7475 Commits

Author SHA1 Message Date
631be12edb refactoring pplx_prepare_finalize.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:16:34 +00:00
a9d47e8652 remove always_microbatch_if_enabled
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:09:33 +00:00
fc562e22e2 cleanup gpu_worker.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:59 +00:00
1ca65412b8 cleanup backends/utils.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:07:33 +00:00
3112714bdc cleanup logger.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:05:38 +00:00
0c03d154b5 cleanup config.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:03:26 +00:00
9b7edc0343 cleanup data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:02:12 +00:00
be2e1632fd delete basic-ub.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-03 13:01:01 +00:00
ce3ef95c11 turn yields on for pplx
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:34:02 +00:00
18f7bfb501 ubatching fix
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 22:22:41 +00:00
3d833aa759 cleanup
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:20:21 +00:00
0e499c4f4d first round of cleanups
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 21:11:28 +00:00
0767d9863f fix data_parallel.py
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 19:25:59 +00:00
c0efbbb5de misc changes
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:56:30 +00:00
f7a3ee0ea1 Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-02 16:52:19 +00:00
57d404bbb8 misc
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-02 16:37:58 +00:00
e28533a16f [Bugfix] Fix include prompt in stream response when echo=true (#15233)
Signed-off-by: Yuan Fang <yuanfang@alauda.io>
2025-07-01 01:30:14 +00:00
6d42ce8315 [CLI] Improve CLI arg parsing for -O/--compilation-config (#20156)
Signed-off-by: luka <luka@neuralmagic.com>
2025-07-01 01:03:13 +00:00
ded1fb635b [Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector (#20263)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-30 16:45:14 -07:00
97d9524fe9 [Refactor] Remove useless pdb comment (#20266)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-30 18:15:24 +00:00
d8cf819a9a [Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models (#20058)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-30 17:26:49 +00:00
d833982e48 random push
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-30 17:08:51 +00:00
551ef1631a [Unit Test] Add unit test for deep gemm (#20090)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-30 10:26:42 -06:00
2863befce3 [Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 09:07:50 -07:00
2965c99c86 [Spec Decode] Clean up spec decode example (#20240)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:28:13 -07:00
2062c0723d [Spec Decode] Refactor spec decoding into a separate function (#20238)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-30 08:13:50 -07:00
1c50e100a9 [Bugfix] fix quark ptpc (#20251)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: Haoyang Li <307790822@qq.com>
2025-06-30 22:24:50 +09:00
3ee56e26be [Docs] Fix 1-2-3 list in v1/prefix_caching.md (#20243)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-06-30 11:20:51 +00:00
8fe7fc8634 [Quantization] Improve BitsAndBytesModelLoader (#20242)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-30 18:22:09 +08:00
e936e401de [Bugfix] Fix processor initialization in transformers 4.53.0 (#20244)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-30 10:16:16 +00:00
f5dfa07531 [Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
Signed-off-by: noiji <>
2025-06-30 18:21:56 +09:00
022c58b80f [doc] Add Slack and Forum to the top navigation (#20208)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-06-30 07:53:45 +00:00
19108ef311 [Misc] Fix import (#20233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-29 20:34:54 -07:00
5a52f389dd [BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-06-29 19:46:19 -07:00
65b1cbb138 [Model] support dots1 (#18254)
Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
2025-06-29 19:34:36 -07:00
6c9837a761 Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-06-29 16:52:34 -07:00
6f2f53a82d [Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2025-06-29 22:05:40 +00:00
7b1895e6ce [CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-29 10:31:37 +08:00
4d36693687 [Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-28 22:06:38 +00:00
4672c72f44 capture works replay does not
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-06-28 19:14:48 +00:00
daec9dea6e [Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-06-28 08:16:41 -07:00
daceac57c7 [Frontend] Generalize v1/audio/transcriptions endpoint (#20179)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-28 08:15:26 -07:00
8615d9776f [CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-06-27 23:00:25 -07:00
7b460c25f9 [BugFix] Fix the incorrect func name in the comments. (config.py) (#20185) 2025-06-27 22:51:16 -07:00
f719772281 [Bugfix] Properly reject requests with empty list guided_choice (#20195)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 22:50:52 -07:00
d45417b804 fix ci issue distributed 4 gpu test (#20204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-27 22:50:00 -07:00
a29e62ea34 Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-27 22:48:13 -07:00
e53be6f00a [Misc] Add type assertion of request_id for LLMEngine.add_request (#19700)
Signed-off-by: n2ptr <xuzhanchaomail@163.com>
2025-06-27 22:47:36 -07:00
c329ceca6d [CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-28 13:43:06 +08:00
3c545c0c3b [CI/Build] Allow hermetic builds (#18064)
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Fabien Dupont <fabiendupont@pm.me>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Elias Levy <eliaslevy@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-06-27 09:04:39 -07:00