Commit Graph

69 Commits

Author SHA1 Message Date
3a500cd0b6 [doc] miss result (#17589)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-02 07:04:49 -07:00
6d1479ca4b [doc] add the print result (#17584)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-02 05:24:45 -07:00
98060b001d [Feature][Frontend]: Deprecate --enable-reasoning (#17452)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-01 06:46:16 -07:00
1144a8efe7 [Bugfix] Temporarily disable gptq_bitblas on ROCm (#17411)
Signed-off-by: Yan Cangang <nalanzeyu@gmail.com>
2025-04-30 19:51:45 -07:00
2ac74d098e [doc] add install tips (#17373)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-30 17:02:41 +00:00
a39203f99e [Bugfix] add qwen3 reasoning-parser fix content is None when disable … (#17369)
Signed-off-by: mofanke <mofanke@gmail.com>
2025-04-29 16:32:40 +00:00
3ad986c28b [doc] update wrong model id (#17287)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-28 04:20:51 -07:00
52b4f4a8d7 [Docs] Update structured output doc for V1 (#17135)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-26 15:12:18 +00:00
df5c879527 [doc] update wrong hf model links (#17184)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-25 16:40:54 +00:00
f851b84266 [Doc] Add two links to disagg_prefill.md (#17168)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 10:23:57 +00:00
ef19e67d2c [Doc] Add headings to improve gptqmodel.md (#17164)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 01:13:13 -07:00
05e1fbfc52 Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-04-24 20:19:36 +00:00
9c1244de57 [doc] update to hyperlink (#17096)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-24 00:58:08 -07:00
db2f8d915c [V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 23:57:17 -07:00
f7912cba3d [Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 07:32:16 -07:00
8d32dc603d [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00
490b1698a5 [Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857)
Signed-off-by: jmho <jaylenho734@gmail.com>
2025-04-18 23:28:53 +00:00
fdcb850f14 [Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
2025-04-15 22:31:38 +00:00
16eda8c43a [Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
2025-04-12 06:26:17 +08:00
ed37599544 Update supported_hardware.md for TPU INT8 (#16437) 2025-04-11 12:28:07 +08:00
652907b354 Torchao (#14231)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-04-07 19:39:28 -04:00
95d63f38c0 doc: fix some typos in doc (#16154)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-07 05:32:06 +00:00
4285e423a6 [Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-04-04 23:30:45 -07:00
cefb9e5a28 [Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-02 07:45:45 -07:00
2041c0e360 [Doc] Quark quantization documentation (#15861)
Signed-off-by: chaow <chaow@amd.com>
2025-04-01 08:32:45 -07:00
239b7befdd [V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-31 09:19:35 -07:00
762b424a52 [Docs] Document v0 engine support in reasoning outputs (#15739)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-29 03:46:57 +00:00
1711b929b6 [Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
2025-03-26 14:28:07 +00:00
3892e58ad7 [Misc] Upgrade BNB version (#15183) 2025-03-24 05:51:42 +00:00
d6cd59f122 [Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
50c9636d87 [V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
10f55fe6c5 [Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-20 19:17:12 -07:00
9ed6ee92d6 [Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-03-15 06:50:33 +00:00
3fb17d26c8 [Doc] Fix typo in documentation (#14783)
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
2025-03-13 20:33:09 -07:00
c908a07f57 [Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 07:07:32 +00:00
7b6fd6e486 [Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 06:58:46 +00:00
0ddc991f5c [Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-06 13:20:37 +00:00
f5f7f00cd9 [Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114) 2025-03-06 03:49:20 +00:00
cd1d3c3df8 [Docs] Add GPTQModel (#14056)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-03 21:59:09 +00:00
cf069aa8aa Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
bf33700ecd [v0][structured output] Support reasoning output (#12955)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-02 14:49:42 -05:00
f58f8b5c96 Update AutoAWQ docs (#14042)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-28 15:20:29 +00:00
7f0be2aa24 [Model] Deepseek GGUF support (#13167) 2025-02-27 02:08:35 -08:00
5157338ed9 [Misc] Improve LoRA spelling (#13831) 2025-02-25 23:43:01 -08:00
bfbc0b32c6 [Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
00b69c2d27 [Misc] Remove dangling references to --use-v2-block-manager (#13492)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-19 03:37:26 +00:00
2358ca527b [Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-18 18:52:39 +08:00
46cdd59577 [Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-16 19:32:26 -08:00
51f0b5f7f6 [Bugfix] Clean up and fix multi-modal processors (#13012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-10 10:45:21 +00:00
243137143c [Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 06:09:33 +00:00