Commit Graph

33 Commits

Author SHA1 Message Date
e97f802b2d [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
64ea24d0b3 [ci/lint] Add back default arg for pre-commit (#12279)
Signed-off-by: kevin <kevin@anyscale.com>
2025-01-22 01:15:27 +00:00
5fe6bf29d6 [BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-21 12:23:14 +08:00
59a0192fb9 [Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
d14e98d924 [Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
aa1e77a19c [Hardware][CPU] Support MOE models on x86 CPU (#11831)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00
8b79f9e107 [Bugfix] Fix guided decoding with tokenizer mode mistral (#11046) 2024-12-17 22:34:08 -08:00
be39e3cd18 [core] clean up cudagraph batchsize padding logic (#10996)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-13 06:57:50 +00:00
dc5ce861bf [torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-03 06:19:02 +00:00
197b4484a3 [Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-11-27 19:02:27 +00:00
b40cf6402e [Model] Support Qwen2 embeddings and use tags to select model tests (#10184) 2024-11-14 20:23:09 -08:00
11cd1ae6ad [Tool parsing] Improve / correct mistral tool parsing (#10333) 2024-11-15 00:42:49 +00:00
51c2e1fcef [CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
a4b3e0c1e9 [Hardware][CPU] Update torch 2.5 (#9911)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
2bcbae704c [Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-06 04:28:29 +00:00
02462465ea [CI] Prune tests/models/decoder_only/language/* tests (#9940)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-05 16:02:23 -05:00
cc98f1e079 [CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
3cb07a36a2 [Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
3ddbe25502 [Hardware][CPU] using current_platform.is_cpu (#9536) 2024-10-22 00:50:43 -07:00
f6b97293aa [Model] FalconMamba Support (#9325) 2024-10-21 12:50:16 -04:00
696b01af8f [CI/Build] Split up decoder-only LM tests (#9488)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-20 21:27:50 -07:00
fb60ae9b91 [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) 2024-10-16 12:12:43 -04:00
7342a7d7f8 [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
f19da64871 [Core] Refactor GGUF parameters packing and forwarding (#8859) 2024-10-07 10:01:46 +00:00
19f0d25796 [Model] Adding Granite MoE. (#8206)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-03 09:33:57 +08:00
f13a07b1f8 [Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533) 2024-09-29 17:35:58 -04:00
26a68d5d7e [CI/Build] Add test decorator for minimum GPU memory (#8925) 2024-09-29 02:50:51 +00:00
4b377d6feb [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
b4e4eda92e [Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640) 2024-09-20 14:33:03 -07:00
6ffa3f314c [CI/Build] Avoid CUDA initialization (#8534) 2024-09-18 10:38:11 +00:00
a54ed80249 [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-17 17:50:37 +00:00
8a0cf1ddc3 [Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
a84e598e21 [CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00