e97f802b2d
[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00

528dbcac7d
[Model][Bugfix]: correct Aria model output (#12309)
Signed-off-by: xffxff <1247714429@qq.com>
2025-01-22 11:39:19 +00:00

c222f47992
[core][bugfix] configure env var during import vllm (#12209)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 19:35:59 +08:00

b37d82791e
[Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:58:48 +08:00

02798ecabe
[Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00

d75ab55f10
[Misc] Add deepseek_vl2 chat template (#12143)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-17 06:34:48 +00:00

ebc73f2828
[Bugfix] Fix a path bug in disaggregated prefill example script. (#12121)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-01-17 11:12:41 +08:00

d06e824006
[Bugfix] Set enforce_eager automatically for mllama (#12127)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-16 15:30:08 -05:00

62b06ba23d
[Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 17:14:48 +00:00

92e793d91a
[core] LLM.collective_rpc interface and RLHF example (#12084)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 20:19:52 +08:00

bf53e0c70b
Support torchrun and SPMD-style offline inference (#12071)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 19:58:53 +08:00

d14e98d924
[Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00

f967e51f38
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00

482cdc494e
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00

d85c47d6ad
Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00

9a228348d2
[Misc] Provide correct Pixtral-HF chat template (#11891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00

aba8d6ee00
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00

91445c7bc8
[Bugfix] Fix image input for Pixtral-HF (#11741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 10:17:16 +08:00

5950f555a1
[Doc] Group examples into categories (#11782)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 09:20:12 +08:00

898cdf033e
[CI] Fix neuron CI and run offline tests (#11779)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-01-06 21:36:10 -08:00

e1a5c2f0a1
[Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-01-03 16:39:19 +08:00

68d37809b9
[Misc] Minimum requirements for SageMaker compatibility (#11576)
2025-01-02 15:59:25 -08:00

e7c7c5e822
[V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-31 21:17:22 +00:00

a60731247f
[Doc] Update mllama example based on official doc (#11567)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2024-12-28 00:31:10 +00:00

b85a977822
[Doc] Add video example to openai client for multimodal (#11521)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-26 17:31:29 +00:00

9edca6bf8f
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00

e24113a8fe
[Model] Refactor Qwen2-VL to use merged multimodal processor (#11258)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 16:28:00 +00:00

5aef49806d
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00

6142ef0ada
[VLM] Merged multimodal processor for Qwen2-Audio (#11303)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-19 06:14:17 +00:00

fdea8ec167
[V1] VLM - enable processor cache by default (#11305)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
2024-12-18 18:54:46 -05:00

66d4b16724
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00

efbce85f4d
[misc] Layerwise profile updates (#10242)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-16 18:14:57 +00:00

2ca830dbaa
[Doc] Reorder vision language examples in alphabet order (#11228)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-16 11:23:33 +00:00

d927dbcd88
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00

b10609e6a1
[Misc] Clean up multi-modal processor (#11207)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-15 06:30:28 +00:00

93abf23a64
[VLM] Fully dynamic prompt replacement in merged input processor (#11199)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 17:52:18 +00:00

0920ab9131
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00

eeec9e3390
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00

7cd7409142
PaliGemma 2 support (#11142)
2024-12-13 07:40:07 +00:00

4816d20aa4
[V1] Fix torch profiling for offline inference (#11125)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-12 15:51:53 +00:00

4e11683368
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-12 00:55:30 +00:00

8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00

fe2e10c71b
Add example of helm chart for vllm deployment on k8s (#9199)
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2024-12-10 09:19:27 +00:00

39e227c7ae
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 17:10:05 +00:00

1c768fe537
[Doc] Explicitly state that InternVL 2.5 is supported (#10978)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-07 16:58:02 +00:00

39c89e71a8
[Misc] Update llama 3.2 template to support system prompt with images (#10901)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-05 05:54:06 +00:00

0590ec3fd9
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2024-12-01 19:01:00 -06:00

d2f058e76c
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 14:36:51 +08:00

a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228)
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00

b1d920531f
[Model]: Add support for Aria model (#10514)
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-25 18:10:55 +00:00