06386a64dd  2024-11-01 08:13:35 +00:00
    [Frontend] Chat-based Embeddings API (#9759)

031a7995f3  2024-11-01 01:09:46 +00:00
    [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

abbfb6134d  2024-10-30 18:15:56 -07:00
    [Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)

c2cd1a2142  2024-10-30 13:36:51 -07:00
    [doc] update pp support (#9853)
    Signed-off-by: youkaichao <youkaichao@gmail.com>

33d257735f  2024-10-30 17:28:29 +00:00
    [Doc] link bug for multistep guided decoding (#9843)
    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

882a1ad0de  2024-10-29 15:07:37 -07:00
    [Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
    Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
    Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
    Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>

33bab41060  2024-10-24 05:05:49 +00:00
    [Bugfix]: Make chat content text allow type content (#9358)
    Signed-off-by: Vinay Damodaran <vrdn@hey.com>

208cb34c81  2024-10-22 15:43:25 -07:00
    [Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889)
    Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>

32a1ee74a0  2024-10-22 10:38:04 -07:00
    [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212)
    Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
    Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
    Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
    Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>

d2b1bf55ec  2024-10-18 10:27:48 +00:00
    [Frontend][Feature] Add jamba tool parser (#9154)

8baf85e4e9  2024-10-11 11:18:50 -07:00
    [Doc] Compatibility matrix for mutual exclusive features (#8512)
    Signed-off-by: Wallas Santos <wallashss@ibm.com>

acce7630c1  2024-10-09 03:58:49 +00:00
    Update link to KServe deployment guide (#9173)

93cf74a8a7  2024-10-07 13:31:45 -07:00
    [Doc]: Add deploying_with_k8s guide (#8451)

5df1834895  2024-10-05 17:35:11 +00:00
    [Bugfix] Fix order of arguments matters in config.yaml (#8960)

3dbb215b38  2024-10-04 10:36:39 +08:00
    [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)

344cd2b6f4  2024-09-26 17:01:42 -07:00
    [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
    Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

2febcf2777  2024-09-05 16:25:29 -04:00
    [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962)

e02ce498be  2024-09-04 13:18:13 -07:00
    [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
    Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
    Co-authored-by: Kyle Mistele <kyle@constellate.ai>

058344f89a  2024-08-30 08:21:02 -07:00
    [Frontend]-config-cli-args (#7737)
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>

22b39e11f2  2024-08-14 15:38:37 -07:00
    llama_index serving integration documentation (#6973)
    Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>

fc912e0886  2024-08-01 12:40:43 -07:00
    [Models] Support Qwen model with PP (#6974)
    Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>

150a1ffbfd  2024-07-26 14:39:10 -07:00
    [Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283)

f3ff63c3f4  2024-07-25 15:38:32 -07:00
    [doc][distributed] improve multinode serving doc (#6804)

71950af726  2024-07-23 08:55:33 -07:00
    [doc][distributed] fix doc argument order (#6691)

c051bfe4eb  2024-07-22 21:22:09 -07:00
    [doc][distributed] doc for setting up multi-node environment (#6529)
    [doc][distributed] add more doc for setting up multi-node environment (#6529)

45ceb85a0c  2024-07-19 16:38:21 -07:00
    [Docs] Update PP docs (#6598)

a38524f338  2024-07-17 10:22:53 -07:00
    [DOC] - Add docker image to Cerebrium Integration (#6510)

5bf35a91e4  2024-07-17 07:43:21 +00:00
    [Doc][CI/Build] Update docs and tests to use vllm serve (#6431)

94b82e8c18  2024-07-15 09:45:51 -07:00
    [doc][distributed] add suggestion for distributed inference (#6418)

dbfe254eda  2024-07-14 15:36:43 -07:00
    [Feature] vLLM CLI (#5090)
    Co-authored-by: simon-mo <simon.mo@hey.com>

673dd4cae9  2024-07-09 16:24:58 -07:00
    [Docs] Docs update for Pipeline Parallel (#6222)
    Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
    Co-authored-by: Simon Mo <simon.mo@hey.com>

8e0817c262  2024-07-01 15:09:11 -07:00
    [Bugfix][Doc] Fix Doc Formatting (#6048)

83bdcb6ac3  2024-07-01 14:11:36 -07:00
    add FAQ doc under 'serving' (#5946)

4050d646e5  2024-07-01 12:52:43 -04:00
    [doc][misc] remove deprecated api server in doc (#6037)

3fd02bda51  2024-06-27 10:07:07 -07:00
    [doc][misc] add note for Kubernetes users (#5916)

294104c3f9  2024-06-26 17:57:12 -04:00
    [doc] update usage of env var to avoid conflict (#5873)

c246212952  2024-06-24 15:37:42 +08:00
    [doc][faq] add warning to download models for every nodes (#5783)

e83db9e7e3  2024-06-19 15:01:45 -07:00
    [Doc] Update docker references (#5614)
    Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

2bd231a7b7  2024-06-18 15:56:59 -07:00
    [Doc] Added cerebrium as Integration option (#5553)

6e2527a7cb  2024-06-14 11:27:57 -07:00
    [Doc] Update documentation on Tensorizer (#5471)

99dac099ab  2024-06-11 11:10:41 -07:00
    [Core][Doc] Default to multiprocessing for single-node distributed case (#5230)
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

7a9cb294ae  2024-06-07 11:23:32 -07:00
    [Frontend] Add OpenAI Vision API Support (#5237)
    Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

f775a07e30  2024-06-03 18:25:29 -05:00
    [FRONTEND] OpenAI tools support named functions (#5032)

429d89720e  2024-05-30 10:11:07 -07:00
    add doc about serving option on dstack (#3074)
    Co-authored-by: Roger Wang <ywang@roblox.com>

4fbcb0f27e  2024-05-29 23:51:18 +00:00
    [Doc][Build] update after removing vllm-nccl (#5103)
    Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

5ae5ed1e60  2024-05-28 13:29:31 -07:00
    [Core] Consolidate prompt arguments to LLM engines (#4328)
    Co-authored-by: Roger Wang <ywang@roblox.com>

8e7fb5d43a  2024-05-16 16:37:29 -07:00
    Support to serve vLLM on Kubernetes with LWS (#4829)
    Signed-off-by: kerthcet <kerthcet@gmail.com>

4bfa7e7f75  2024-05-13 17:47:42 -07:00
    [Doc] Add API reference for offline inference (#4710)

a3c124570a  2024-05-09 09:53:14 -07:00
    [Bugfix] Fix CLI arguments in OpenAI server docs (#4709)

2d7bce9cd5  2024-05-03 05:13:49 +00:00
    [Doc] add env vars to the doc (#4572)