|
|
f19da64871
|
[Core] Refactor GGUF parameters packing and forwarding (#8859)
|
2024-10-07 10:01:46 +00:00 |
|
|
|
4f95ffee6f
|
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089)
|
2024-10-07 06:50:35 +00:00 |
|
|
|
8c6de96ea1
|
[Model] Explicit interface for vLLM models and support OOT embedding models (#9108)
|
2024-10-07 06:10:35 +00:00 |
|
|
|
cfadb9c687
|
[Bugfix] Deprecate registration of custom configs to huggingface (#9083)
|
2024-10-05 21:56:40 +08:00 |
|
|
|
15986f598c
|
[Model] Support Gemma2 embedding model (#9004)
|
2024-10-05 06:57:05 +00:00 |
|
|
|
26aa325f4f
|
[Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:38:25 -07:00 |
|
|
|
0e36fd4909
|
[Misc] Move registry to its own file (#9064)
|
2024-10-04 10:01:37 +00:00 |
|
|
|
0f6d7a9a34
|
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:56:58 +08:00 |
|
|
|
19f0d25796
|
[Model] Adding Granite MoE. (#8206)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-03 09:33:57 +08:00 |
|
|
|
f13a07b1f8
|
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)
|
2024-09-29 17:35:58 -04:00 |
|
|
|
26a68d5d7e
|
[CI/Build] Add test decorator for minimum GPU memory (#8925)
|
2024-09-29 02:50:51 +00:00 |
|
|
|
e1a3f5e831
|
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-28 09:54:35 -07:00 |
|
|
|
6d792d2f31
|
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892)
|
2024-09-27 01:15:58 -07:00 |
|
|
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
|
|
770ec6024f
|
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-25 13:29:32 -07:00 |
|
|
|
8ff7ced996
|
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-24 07:36:46 +00:00 |
|
|
|
9b8c8ba119
|
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-23 07:44:48 +00:00 |
|
|
|
5b59532760
|
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-22 10:51:44 -07:00 |
|
|
|
b4e4eda92e
|
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)
|
2024-09-20 14:33:03 -07:00 |
|
|
|
a8c1d161a7
|
[Core] *Prompt* logprobs support in Multi-step (#8199)
|
2024-09-18 08:38:43 -07:00 |
|
|
|
6ffa3f314c
|
[CI/Build] Avoid CUDA initialization (#8534)
|
2024-09-18 10:38:11 +00:00 |
|
|
|
a54ed80249
|
[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-17 17:50:37 +00:00 |
|
|
|
8a0cf1ddc3
|
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-14 14:50:26 +00:00 |
|
|
|
a84e598e21
|
[CI/Build] Reorganize models tests (#7820)
|
2024-09-13 10:20:06 -07:00 |
|
|
|
8427550488
|
[CI/Build] Update pixtral tests to use JSON (#8436)
|
2024-09-13 03:47:52 +00:00 |
|
|
|
d31174a4e1
|
[Hotfix][Pixtral] Fix multiple images bugs (#8415)
|
2024-09-12 15:21:51 -07:00 |
|
|
|
c6202daeed
|
[Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-12 10:10:54 -07:00 |
|
|
|
e56bf27741
|
[Bugfix] Fix InternVL2 inference with various num_patches (#8375)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-12 10:10:35 -07:00 |
|
|
|
d394787e52
|
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-11 14:41:55 -07:00 |
|
|
|
73202dbe77
|
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2024-09-11 12:52:19 -07:00 |
|
|
|
3b7fea770f
|
[Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-11 09:31:19 -07:00 |
|
|
|
6a512a00df
|
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-10 22:21:36 -07:00 |
|
|
|
efcf946a15
|
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)
|
2024-09-11 00:38:40 -04:00 |
|
|
|
e807125936
|
[Model][VLM] Support multi-images inputs for InternVL2 models (#8201)
|
2024-09-07 16:38:23 +08:00 |
|
|
|
2f707fcb35
|
[Model] Multi-input support for LLaVA (#8238)
|
2024-09-07 02:57:24 +00:00 |
|
|
|
29f49cd6e3
|
[Model] Allow loading from original Mistral format (#8168)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-09-06 17:02:05 -06:00 |
|
|
|
9da25a88aa
|
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-05 12:48:10 +00:00 |
|
|
|
2ad2e5608e
|
[MISC] Consolidate FP8 kv-cache tests (#8131)
|
2024-09-04 18:53:25 +00:00 |
|
|
|
2be8ec6e71
|
[Model] Add Ultravox support for multiple audio chunks (#7963)
|
2024-09-04 04:38:21 +00:00 |
|
|
|
f8d60145b4
|
[Model] Add Granite model (#7436)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-09-01 18:37:18 -07:00 |
|
|
|
5b86b19954
|
[Misc] Optional installation of audio related packages (#8063)
|
2024-09-01 14:46:57 -07:00 |
|
|
|
622f8abff8
|
[Bugfix] bugfix and add model test for flashinfer fp8 kv cache. (#8013)
|
2024-08-30 22:18:50 -07:00 |
|
|
|
1248e8506a
|
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
|
2024-08-30 13:42:57 -06:00 |
|
|
|
f97be32d1d
|
[VLM][Model] TP support for ViTs (#7186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-30 08:19:27 -07:00 |
|
|
|
428dd1445e
|
[Core] Logprobs support in Multi-step (#7652)
|
2024-08-29 19:19:08 -07:00 |
|
|
|
4abed65c58
|
[VLM] Disallow overflowing max_model_len for multimodal models (#7998)
|
2024-08-29 17:49:04 -07:00 |
|
|
|
5340a2dccf
|
[Model] Add multi-image input support for LLaVA-Next offline inference (#7230)
|
2024-08-28 07:09:02 +08:00 |
|
|
|
9db642138b
|
[CI/Build][VLM] Cleanup multiple images inputs model test (#7897)
|
2024-08-27 15:28:30 +00:00 |
|
|
|
6fc4e6e07a
|
[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)
|
2024-08-27 12:40:02 +00:00 |
|
|
|
2059b8d9ca
|
[Misc] Remove snapshot_download usage in InternVL2 test (#7835)
|
2024-08-25 15:53:09 +00:00 |
|