youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Junlin Zhou	75e9d49796	[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468)	2025-02-25 02:13:09 -08:00
Chen1022	32c3b6bfd1	[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724) Signed-off-by: Chen-0210 <chenjincong11@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-02-25 10:12:19 +00:00
Jee Jee Li	37b6cb4985	[CI/Build] Fix V1 LoRA failure (#13767)	2025-02-25 02:01:15 -08:00
Gregory Shtrasberg	aabeb2688f	[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593)	2025-02-25 00:39:59 -08:00
Jiayi Yao	2f42a4888c	[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953)	2025-02-25 00:38:42 -08:00
Rui Qiao	3173c3b34e	[misc] Clean up ray compiled graph type hints (#13731)	2025-02-25 00:37:08 -08:00
Shanshan Shen	2d87d7d1ac	[Bugfix] Modify modelscope api usage in transformer_utils (#13807)	2025-02-25 00:36:07 -08:00
Russell Bryant	aab392774b	[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-25 08:21:25 +00:00
Cyrus Leung	6724e79164	[Misc] Check that the model can be inspected upon registration (#13743)	2025-02-25 00:18:19 -08:00
Varun Sundar Rabindranath	03f48b3db6	[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705)	2025-02-25 00:18:02 -08:00
Michael Goin	4d251ad00e	Fix CompressedTensorsWNA16MoE with grouped scales (#13769)	2025-02-25 00:17:14 -08:00
Michael Goin	18e505930d	[Bugfix] Support MLA for CompressedTensorsWNA16 (#13725) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-25 06:10:31 +00:00
Lucas Wilkinson	4a8cfc7551	[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802)	2025-02-24 20:33:59 -08:00
Mark McLoughlin	bc32bc73aa	[V1][Metrics] Implement vllm:lora_requests_info metric (#13504)	2025-02-24 20:01:33 -08:00
wangxiyuan	ab1091d5f2	[Misc][Attention][Quantization] init property earlier (#13733) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-25 03:19:30 +00:00
Tyler Michael Smith	1e15aaef56	[Bugfix][Quantization] Fix FP8 + EP (#13784) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-25 10:54:17 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Eli Boyarski	7196a3b1db	[Doc] arg_utils.py: fixed a typo (#13785)	2025-02-24 18:23:04 -08:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555)	2025-02-24 17:13:52 -08:00
Robert Shaw	f61528d46d	[Misc][Chore] Clean Up `AsyncOutputProcessing` Logs (#13780)	2025-02-24 16:39:07 -08:00
Robert Shaw	1f0ae3ed0a	[Misc] Clean Up `EngineArgs.create_engine_config` (#13734) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-02-24 13:52:21 -05:00
Michael Goin	db986c19ea	Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-24 09:25:47 -08:00
Roger Wang	227578480d	Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775)	2025-02-24 09:16:05 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Zhonghua Deng	ccc00515fd	[BugFix] Illegal memory access for MoE On H20 (#13693)	2025-02-24 07:37:32 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
Roger Meier	7940d8a6a7	[CI/Build] add python-json-logger to requirements-common (#12842)	2025-02-24 06:10:33 -08:00
Roger Meier	c0e3ecd6d2	[Bugfix] fix(logging): add missing opening square bracket (#13011)	2025-02-24 06:10:25 -08:00
Mengqing Cao	23eca9cf68	[model][refactor] remove cuda hard code in models and layers (#13658)	2025-02-24 06:10:14 -08:00
Roger Wang	437b76ff59	[V1][Core] Fix memory issue with logits & sampling (#13721)	2025-02-24 06:10:06 -08:00
Kevin H. Luu	f90a375593	[ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>	2025-02-24 06:32:11 +00:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Nick Hill	5a2ba16f5c	[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688)	2025-02-23 02:54:29 -08:00
Isotr0py	ba5106e519	[LMM] Implement merged multimodal processor for whisper (#13278)	2025-02-23 01:46:03 -08:00
Kyle Sayers	d5ca2110f1	[Quant] BaiChuan SupportsQuant (#13710)	2025-02-22 19:21:15 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634)	2025-02-22 19:19:45 -08:00
Andy Lo	322d2a27d6	[BugFix] Minor: logger import in attention backend (#13706) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-02-22 16:51:13 -08:00
Roger Wang	82e0d601fc	[CI/Build] Fix pre-commit errors from #13571 (#13709) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-22 16:50:38 -08:00
Daniele	78ac0f591d	[CI/Build] fix uv caching in Dockerfile (#13611)	2025-02-22 08:25:20 -08:00
Yan Ma	b56155e7f3	[XPU]fix setuptools version for xpu (#13548)	2025-02-22 08:05:35 -08:00
Helena Kloosterman	382f66fb08	[Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)	2025-02-22 08:04:12 -08:00
Cyrus Leung	8354f6640c	[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699)	2025-02-22 06:04:31 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)	2025-02-22 05:54:38 -08:00
Sage Moore	558db8083c	[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095)	2025-02-22 05:25:41 -08:00
Kaixi Hou	e109e598c7	[NVIDIA] Support nvfp4 cutlass gemm (#13571)	2025-02-22 05:24:05 -08:00
Keyun Tong	8db1b9d0a1	Support SSL Key Rotation in HTTP Server (#13495)	2025-02-22 05:17:44 -08:00

4795 Commits