|
|
a5bba7d234
|
[Model] Add Idefics3 support (#9767)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
|
2024-11-06 11:41:17 +00:00 |
|
|
|
a5fda50a10
|
[CI/Build] Fix large_gpu_mark reason (#10070)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-06 08:50:37 +00:00 |
|
|
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
|
|
4be3a45158
|
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 22:35:03 -08:00 |
|
|
|
2bcbae704c
|
[Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer (#10051)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-11-06 04:28:29 +00:00 |
|
|
|
0c63c34f72
|
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode (#9730)
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2024-11-06 01:45:45 +00:00 |
|
|
|
966e31697b
|
[Bugfix] Fix pickle of input when async output processing is on (#9931)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-11-06 00:39:26 +00:00 |
|
|
|
ca9844b340
|
[bugfix] fix weak ref in piecewise cudagraph and tractable test (#10048)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 14:49:20 -08:00 |
|
|
|
235366fe2e
|
[CI] Prune back the number of tests in tests/kernels/* (#9932)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 16:02:32 -05:00 |
|
|
|
02462465ea
|
[CI] Prune tests/models/decoder_only/language/* tests (#9940)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 16:02:23 -05:00 |
|
|
|
bbc3619dc8
|
[Core] Make encoder-decoder inputs a nested structure to be more composable (#9604)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-05 10:07:31 +08:00 |
|
|
|
ac04a97a9f
|
[Frontend] Add max_tokens prometheus metric (#9881)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-11-04 22:53:24 +00:00 |
|
|
|
5208dc7a20
|
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs (#9279)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
|
2024-11-04 11:37:46 -08:00 |
|
|
|
1c45f4c385
|
[CI] Basic Integration Test For TPU (#9968)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-11-04 11:34:26 -08:00 |
|
|
|
ac6b8f19b9
|
[Frontend] Multi-Modality Support for Loading Local Image Files (#9915)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-04 15:34:57 +00:00 |
|
|
|
54597724f4
|
[Model] Add support for H2OVL-Mississippi models (#9747)
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-11-04 00:15:36 +00:00 |
|
|
|
cea808f325
|
[3/N] model runner pass the whole config to model (#9958)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-02 12:08:49 -07:00 |
|
|
|
e893795443
|
[2/N] executor pass the complete config to worker/modelrunner (#9938)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2024-11-02 07:35:05 -07:00 |
|
|
|
a78dd3303e
|
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559)
|
2024-11-01 23:22:49 -07:00 |
|
|
|
6c0b7f548d
|
[Core][VLM] Add precise multi-modal placeholder tracking (#8346)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2024-11-01 16:21:10 -07:00 |
|
|
|
598b6d7b07
|
[Bugfix/Core] Flashinfer k_scale and v_scale (#9861)
|
2024-11-01 12:15:05 -07:00 |
|
|
|
1dd4cb2935
|
[Bugfix] Fix edge cases for MistralTokenizer (#9625)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2024-11-01 10:33:15 -07:00 |
|
|
|
ba0d892074
|
[Frontend] Use a proper chat template for VLM2Vec (#9912)
|
2024-11-01 14:09:07 +00:00 |
|
|
|
30a2e80742
|
[CI/Build] Add Model Tests for PixtralHF (#9813)
|
2024-11-01 07:55:29 -06:00 |
|
|
|
06386a64dd
|
[Frontend] Chat-based Embeddings API (#9759)
|
2024-11-01 08:13:35 +00:00 |
|
|
|
2b5bf20988
|
[torch.compile] Adding torch compile annotations to some models (#9876)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 00:25:47 -07:00 |
|
|
|
566cd27797
|
[torch.compile] rework test plans (#9866)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-31 22:20:17 -07:00 |
|
|
|
96e0c9cbbd
|
[torch.compile] directly register custom op (#9896)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-31 21:56:09 -07:00 |
|
|
|
031a7995f3
|
[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-11-01 01:09:46 +00:00 |
|
|
|
9fb12f7848
|
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-10-31 20:06:25 +00:00 |
|
|
|
55650c83a0
|
[Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532)
Signed-off-by: sasha0552 <admin@sasha0552.org>
|
2024-10-31 11:46:36 -07:00 |
|
|
|
16b8f7a86f
|
[CI/Build] Add Model Tests for Qwen2-VL (#9846)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-31 09:10:52 -07:00 |
|
|
|
abbfb6134d
|
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)
|
2024-10-30 18:15:56 -07:00 |
|
|
|
64384bbcdf
|
[torch.compile] upgrade tests (#9858)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-30 16:34:22 -07:00 |
|
|
|
00d91c8a2c
|
[CI/Build] Simplify exception trace in api server tests (#9787)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-10-30 14:52:05 -07:00 |
|
|
|
3b3f1e7436
|
[Bugfix][core] replace heartbeat with pid check (#9818)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-10-30 09:34:07 -07:00 |
|
|
|
9ff4511e43
|
[Misc] Add chunked-prefill support on FlashInfer. (#9781)
|
2024-10-30 09:33:53 -07:00 |
|
|
|
cc98f1e079
|
[CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-30 09:32:17 -07:00 |
|
|
|
ff5ed6e1bc
|
[torch.compile] rework compile control with piecewise cudagraph (#9715)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-29 23:03:49 -07:00 |
|
|
|
882a1ad0de
|
[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
|
2024-10-29 15:07:37 -07:00 |
|
|
|
67bdf8e523
|
[Bugfix][Frontend] Guard against bad token ids (#9634)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-10-29 14:13:20 -07:00 |
|
|
|
ab6f981671
|
[CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808)
|
2024-10-29 11:12:43 -07:00 |
|
|
|
622b7ab955
|
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-29 14:47:44 +00:00 |
|
|
|
ef7865b4f9
|
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai <Ferdinandzhong@gmail.com>
|
2024-10-29 11:49:47 +00:00 |
|
|
|
5f8d8075f9
|
[Model][VLM] Add multi-video support for LLaVA-Onevision (#8905)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-28 18:04:10 +00:00 |
|
|
|
32176fee73
|
[torch.compile] support moe models (#9632)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 21:58:04 -07:00 |
|
|
|
4e2d95e372
|
[Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-28 04:07:00 +00:00 |
|
|
|
34a9941620
|
[Bugfix] Fix load config when using bools (#9533)
|
2024-10-27 13:46:41 -04:00 |
|
|
|
3cb07a36a2
|
[Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 09:44:24 +00:00 |
|
|
|
6650e6a930
|
[Model] Add classification Task with Qwen2ForSequenceClassification (#9704)
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
|
2024-10-26 17:53:35 +00:00 |
|