cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
f61528d46d
[Misc][Chore] Clean Up AsyncOutputProcessing Logs ( #13780 )
2025-02-24 16:39:07 -08:00
1f0ae3ed0a
[Misc] Clean Up EngineArgs.create_engine_config ( #13734 )
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-02-24 13:52:21 -05:00
db986c19ea
Fix precommit fail in fused_moe intermediate_cache2 chunking ( #13772 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-24 09:25:47 -08:00
227578480d
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #13775 )
2025-02-24 09:16:05 -08:00
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-24 08:29:41 -08:00
444b0f0f62
[Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set ( #12513 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-02-24 10:43:21 -05:00
ccc00515fd
[BugFix] Illegal memory access for MoE On H20 ( #13693 )
2025-02-24 07:37:32 -08:00
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
7940d8a6a7
[CI/Build] add python-json-logger to requirements-common ( #12842 )
2025-02-24 06:10:33 -08:00
c0e3ecd6d2
[Bugfix] fix(logging): add missing opening square bracket ( #13011 )
2025-02-24 06:10:25 -08:00
23eca9cf68
[model][refactor] remove cuda hard code in models and layers ( #13658 )
2025-02-24 06:10:14 -08:00
437b76ff59
[V1][Core] Fix memory issue with logits & sampling ( #13721 )
2025-02-24 06:10:06 -08:00
f90a375593
[ci] Add logic to change model to S3 path only when S3 CI env var is on ( #13727 )
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>
2025-02-24 06:32:11 +00:00
e7ef74e26e
Fix some issues with benchmark data output ( #13641 )
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-24 10:23:18 +08:00
cbae7af552
[V1][BugFix] Fix engine core client shutdown hangs ( #13298 )
Although ZMQ's context.destroy() is meant to close open sockets before terminating the context, it appears necessary to close them explicitly; otherwise shutdown can hang in context.term().
Close ZMQ sockets explicitly before terminating the context, make shutdown of client resources more robust, and shut down the engine core process before terminating the ZMQ context.
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-23 13:07:43 -08:00
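The shutdown order described in #13298 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the helper name `shutdown_client`, the `linger_ms` parameter, and the endpoint in `demo` are hypothetical. The helper relies only on the pyzmq members `Socket.closed`, `Socket.close(linger=...)`, and `Context.term()`.

```python
def shutdown_client(ctx, sockets, linger_ms=0):
    """Close every socket explicitly, then terminate the context.

    Hypothetical helper. Closing each socket first (with a short linger)
    means ctx.term() has no open sockets left to wait on.
    """
    for sock in sockets:
        if not sock.closed:
            # linger=0 drops any queued, undelivered messages on close.
            sock.close(linger=linger_ms)
    # With all sockets closed, term() should return promptly.
    ctx.term()


def demo():
    """Exercise the helper against real pyzmq objects."""
    import zmq  # pyzmq; imported here so the helper itself has no dependency

    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    # Hypothetical endpoint; no peer needs to be listening for this demo.
    sock.connect("tcp://127.0.0.1:5555")
    # The message is queued locally and never delivered.
    sock.send(b"pending", zmq.NOBLOCK)
    shutdown_client(ctx, [sock])
    return ctx.closed
```

Calling `demo()` requires pyzmq to be installed. In a real client, the engine core process would also be stopped before the context is terminated, as the commit message notes; that step is omitted here.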
eb24dc4a45
[v1] torchrun compatibility ( #13642 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-23 22:47:24 +08:00
9bebc9512f
[Misc] Deprecate --dataset from benchmark_serving.py ( #13708 )
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-23 13:32:20 +00:00
5a2ba16f5c
[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms ( #13688 )
2025-02-23 02:54:29 -08:00
ba5106e519
[LMM] Implement merged multimodal processor for whisper ( #13278 )
2025-02-23 01:46:03 -08:00
d5ca2110f1
[Quant] BaiChuan SupportsQuant ( #13710 )
2025-02-22 19:21:15 -08:00
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI ( #13634 )
2025-02-22 19:19:45 -08:00
322d2a27d6
[BugFix] Minor: logger import in attention backend ( #13706 )
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-02-22 16:51:13 -08:00
82e0d601fc
[CI/Build] Fix pre-commit errors from #13571 ( #13709 )
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-22 16:50:38 -08:00
78ac0f591d
[CI/Build] fix uv caching in Dockerfile ( #13611 )
2025-02-22 08:25:20 -08:00
b56155e7f3
[XPU] fix setuptools version for xpu ( #13548 )
2025-02-22 08:05:35 -08:00
382f66fb08
[Bugfix] Fix boolean conversion for OpenVINO env variable ( #13615 )
2025-02-22 08:04:12 -08:00
8354f6640c
[Doc] Dockerfile instructions for optional dependencies and dev transformers ( #13699 )
2025-02-22 06:04:31 -08:00
c904fdddf6
[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm ( #13231 )
2025-02-22 05:54:38 -08:00
558db8083c
[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths ( #13095 )
2025-02-22 05:25:41 -08:00
e109e598c7
[NVIDIA] Support nvfp4 cutlass gemm ( #13571 )
2025-02-22 05:24:05 -08:00
8db1b9d0a1
Support SSL Key Rotation in HTTP Server ( #13495 )
2025-02-22 05:17:44 -08:00
2382ad29d1
[ci] fix linter ( #13701 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 20:28:59 +08:00
3e472d882a
[core] set up data parallel communication ( #13591 )
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-22 19:28:59 +08:00
7f6bae561c
[CI/Build] Fix pre-commit errors ( #13696 )
2025-02-22 00:31:26 -08:00
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
2cb8c1540e
[Metrics] Add --show-hidden-metrics-for-version CLI arg ( #13295 )
2025-02-22 00:20:45 -08:00
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info ( #13299 )
2025-02-22 00:20:00 -08:00
fca20841c2
Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size ( #13660 )
2025-02-22 00:19:10 -08:00
da31b5333e
[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler ( #13594 )
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-02-22 00:08:29 -08:00
bb78fb318e
[v1] Support allowed_token_ids in v1 Sampler ( #13210 )
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-22 14:13:05 +08:00
8aca27fa11
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len ( #13691 )
Signed-off-by: WangErXiao <863579016@qq.com>
2025-02-22 14:10:38 +08:00
95c617e04b
[Misc] Bump compressed-tensors ( #13619 )
2025-02-21 22:09:04 -08:00
9a1f1da5d1
[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA ( #13687 )
2025-02-21 22:07:45 -08:00
68d630a0c7
[ROCM] fix native attention function call ( #13650 )
2025-02-21 22:07:04 -08:00
68d535ef44
[Misc] Capture and log the time of loading weights ( #13666 )
2025-02-21 22:06:34 -08:00
c6ed93860f
[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… ( #13672 )
2025-02-21 22:05:28 -08:00
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
8c0dd3d4df
docs: Add a note on full CI run in contributing guide ( #13646 )
2025-02-21 21:53:59 -08:00
ada7c780d5
[Misc] Fix yapf linting tools etc not running on pre-commit ( #13695 )
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-22 13:10:43 +08:00