09318caeba
Combine loader _process_weights_after_loading
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-11 19:55:18 +00:00
d56ef8b685
Support AWQMarlin with MLA
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-10 20:53:29 +00:00
2ae889052c
Fix seed parameter behavior in vLLM (#13007)
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-02-10 23:26:50 +08:00
51f0b5f7f6
[Bugfix] Clean up and fix multi-modal processors (#13012)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-10 10:45:21 +00:00
fde71262e0
[misc] Add retries with exponential backoff for HF file existence check (#13008)
2025-02-10 01:15:02 -08:00
243137143c
[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 06:09:33 +00:00
b2496bb07f
[core] fix sleep mode and pytorch checkpoint compatibility (#13001)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 13:03:43 +08:00
44607e07d3
Check if selected backend is None in get_attn_backend_cls() (#12975)
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 11:45:07 +08:00
67c4637ccf
[V1] Use msgpack for core request serialization (#12918)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-10 11:35:56 +08:00
aa0ca5ebb7
[core][rlhf] add colocate example for RLHF (#12984)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 10:28:59 +08:00
59fff4a01a
[core] improve error handling when wake up from sleep mode (#12981)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-10 09:38:57 +08:00
29f1d47e73
[MISC] Always import version library first in the vllm package (#12979)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-09 18:56:40 +08:00
cf797aa856
[core] port pynvml into vllm codebase (#12963)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-09 15:00:00 +08:00
24700c346b
[V1] Cache uses_mrope in GPUModelRunner (#12969)
2025-02-08 15:32:32 -08:00
d366ccc4e3
[RFC] [Mistral] FP8 format (#10130)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-02-08 14:12:53 -07:00
870c37481e
[V1][Minor] Remove outdated comment (#12968)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 12:48:30 -08:00
86222a3dab
[VLM] Merged multi-modal processor for GLM4V (#12449)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-08 20:32:16 +00:00
fe743b798d
[bugfix] fix early import of flash attention (#12959)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-09 00:06:56 +08:00
913df14da3
[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU (#12935)
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-08 14:46:19 +00:00
8a69e0e20e
[CI/Build] Auto-fix Markdown files (#12941)
2025-02-08 04:25:15 -08:00
4c8dd12ef3
[Misc] Add qwen2.5-vl BNB support (#12944)
2025-02-08 04:24:47 -08:00
256a2d29dc
[Doc] Correct HF repository for TeleChat2 models (#12949)
2025-02-08 01:42:15 -08:00
c45d398e6f
[CI] Resolve transformers-neuronx version conflict (#12925)
2025-02-08 01:41:35 -08:00
011e612d92
[Misc] Log time consumption on weight downloading (#12926)
2025-02-08 09:16:42 +00:00
7e1837676a
[misc] Add LoRA to benchmark_serving (#12898)
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-08 17:15:44 +08:00
2880e21e3d
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812)
...
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2025-02-08 17:15:30 +08:00
407b5537db
[Build] Make PyPI install work on CPU platform (#12874)
2025-02-08 01:15:15 -08:00
4ea48fb35c
[V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 00:39:09 -08:00
e31498bdcb
[Misc] Add offline test for disaggregated prefill (#12418)
2025-02-08 08:38:20 +00:00
91dd8f7aa6
[bugfix] respect distributed_executor_backend in world_size=1 (#12934)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-08 16:17:08 +08:00
d01f66b039
[Bugfix] Fix multi-round chat error when mistral tokenizer is used (#12859)
...
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-08 07:04:34 +00:00
cc01223f3b
[Misc] Fix typo in the example file (#12896)
...
Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>
2025-02-08 06:56:43 +00:00
306923da82
[Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping (#12905)
2025-02-07 21:02:53 -08:00
3243158336
[V1] Move KV block hashes from Request to KVCacheManager (#12922)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:14:10 -08:00
b21f0f9d17
[V1][Minor] Remove outdated comment (#12928)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:07:37 -08:00
45cbc4991d
[Bugfix] Fix disagg hang caused by the prefill and decode communication issues (#12723)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 16:39:50 -08:00
932c6b7461
[V1] LM Eval With Streaming Integration Tests (#11590)
2025-02-07 15:07:03 -08:00
eaa92d4437
[ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501)
2025-02-07 08:13:43 -08:00
0630d4537a
[V1] Logprobs and prompt logprobs support (#9880)
...
This PR adds support for sample logprobs and prompt logprobs to vLLM v1.
New behavior:
- During model execution, the model runner computes sample logprobs (if the user-provided logprobs setting is not None) and prompt logprobs (if the user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token ids, token logprob values, and token ranks. A rank is a token's 1-indexed position in the vocabulary after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized together with the EngineCoreOutput data structure.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprob values, and token ranks into the OpenAI-compatible List[Dict[token id, Logprob]] format (for sample and prompt logprobs respectively).
- Each Logprob instance (whether sample or prompt) consists of a token's log probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor, not the detokenizer. A usage sketch follows this entry.
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-07 07:26:20 -08:00
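As a reader's aid for the entry above: a minimal sketch of requesting sample and prompt logprobs through vLLM's offline LLM/SamplingParams API. The model name and prompt are arbitrary examples, not taken from the commit, and exact field availability may vary by vLLM version.

from vllm import LLM, SamplingParams

# Arbitrary example model, not specified by the commit above.
llm = LLM(model="facebook/opt-125m")

# logprobs / prompt_logprobs request the top-k candidates per position;
# leaving them as None (the default) skips the computation entirely.
params = SamplingParams(max_tokens=8, logprobs=5, prompt_logprobs=5)

output = llm.generate(["The capital of France is"], params)[0]

# Sample logprobs: one Dict[token_id, Logprob] per generated token.
for pos, candidates in enumerate(output.outputs[0].logprobs):
    for token_id, lp in candidates.items():
        # Each Logprob carries the log probability, the token's 1-indexed
        # rank in the probability-sorted vocabulary, and the detokenized
        # string (detokenized by the LogprobsProcessor, per the entry above).
        print(pos, token_id, lp.logprob, lp.rank, lp.decoded_token)

# Prompt logprobs are aligned with the prompt tokens; the first position is
# None because no distribution precedes the first prompt token.
print(output.prompt_logprobs)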
538fab93cd
PR #12718 (#12718)
2025-02-07 06:22:37 -08:00
ce26b16268
[Misc] Remove unnecessary detokenization in multimodal processing (#12868)
2025-02-07 06:21:17 -08:00
1918aa1b80
[MISC][EASY] Break check file names into entry and args in the pre-commit hooks (#12880)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 13:04:39 +00:00
6e1fc61f0f
Prevent unnecessary requests to Hugging Face Hub (#12837)
2025-02-06 21:37:41 -08:00
aa375dca9f
[Bugfix] Missing quant_config in deepseek embedding layer (#12836)
2025-02-06 21:35:09 -08:00
433c4a4923
Make vLLM compatible with verl (#12824)
...
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
2025-02-07 11:54:20 +08:00
ef533d25fb
[Bugfix] FA2 illegal memory access (#12848)
2025-02-06 19:54:07 -08:00
b260782357
[misc] Revert #12833 (#12857)
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 16:29:12 -08:00
741429a4cd
[MISC] Check space in the file names in the pre-commit checks (#12804)
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 15:36:21 -08:00
aff404571b
Add Bamba Model (#10909)
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00
467a96a541
[V1] LoRA Support (#10957)
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00