81fbb3655f
[CI/Build] Test both text and token IDs in batched OpenAI Completions API ( #5568 )
2024-06-15 07:29:42 -04:00
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
39873476f8
[CI/Build] Simplify OpenAI server setup in tests ( #5100 )
2024-06-13 11:21:53 -07:00
640052b069
[Bugfix][Frontend] Cleanup "fix chat logprobs" ( #5026 )
2024-06-10 22:36:46 -07:00
351d5e7b82
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs ( #5312 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-11 10:30:31 +08:00
774d1035e4
[Feature][Frontend]: Continued stream_options implementation also in CompletionRequest ( #5319 )
2024-06-10 14:22:09 +00:00
7a9cb294ae
[Frontend] Add OpenAI Vision API Support ( #5237 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
baa15a9ec3
[Feature][Frontend]: Add support for stream_options in ChatCompletionRequest ( #5135 )
2024-06-07 03:29:24 +00:00
828da0d44e
[Frontend] enable passing multiple LoRA adapters at once to generate() ( #5300 )
2024-06-06 15:48:13 -05:00
7b0a0dfb22
[Frontend][Core] Update Outlines Integration from FSM to Guide ( #4109 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-06-05 16:49:12 -07:00
06b2550cbb
[Bugfix] Support prompt_logprobs==0 ( #5217 )
2024-06-03 17:59:30 -07:00
f775a07e30
[FRONTEND] OpenAI tools support named functions ( #5032 )
2024-06-03 18:25:29 -05:00
87d41c849d
[BUGFIX] [FRONTEND] Correct chat logprobs ( #5029 )
...
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-05-30 02:52:14 -07:00
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines ( #4328 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
52f8107cf2
[Frontend] Support OpenAI batch file format ( #4794 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-15 19:13:36 -04:00
fc0d9dfc3a
[Frontend] Re-enable custom roles in Chat Completions API ( #4758 )
2024-05-15 14:58:46 -07:00
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py ( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)
Also, I have moved the test utilities file (test_utils.py) into the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported relatively.
2024-05-13 23:50:09 +09:00
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API ( #3734 )
2024-05-11 11:30:37 -07:00
f12b20decc
[Frontend] Move async logic outside of constructor ( #4674 )
2024-05-08 22:48:33 -07:00
f8e7adda21
Fix/async chat serving ( #2727 )
2024-05-03 11:04:14 -07:00
c47ba4aaa9
[Bugfix] Add validation for seed ( #4529 )
2024-05-01 19:31:22 +00:00
c3845d82dc
Allow user to define whitespace pattern for outlines ( #4305 )
2024-04-30 20:48:39 -07:00
a494140433
[Frontend] Support complex message content for chat completions endpoint ( #3467 )
...
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-04-30 16:28:46 -07:00
8947bc3c15
[Frontend][Bugfix] Disallow extra fields in OpenAI API ( #4355 )
2024-04-27 05:08:24 +00:00
91528575ec
[Frontend] multiple sampling params support ( #3570 )
2024-04-20 00:11:57 -07:00
138485a82d
[Bugfix] Add fix for JSON whitespace ( #4189 )
...
Co-authored-by: Ubuntu <ubuntu@ip-172-31-13-147.ec2.internal>
2024-04-19 20:49:22 -07:00
e1bb2fd52d
[Bugfix] Support logprobs when using guided_json and other constrained decoding fields ( #4149 )
2024-04-18 21:12:55 +00:00
05434764cd
LM Format Enforcer Guided Decoding Support ( #3868 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-16 05:54:57 +00:00
95e7d4a97c
Fix echo/logprob OpenAI completion bug ( #3441 )
...
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
2024-04-11 22:15:50 +00:00
67b4221a61
[Core][5/N] Fully working chunked prefill e2e ( #3884 )
2024-04-10 17:56:48 -07:00
95baec828f
[Core] enable out-of-tree model register ( #3871 )
2024-04-06 17:11:41 -07:00
f510395bbf
[BugFix][Frontend] Fix completion logprobs=0 error ( #3731 )
2024-03-29 09:38:21 -07:00
0b4997e05c
[Bugfix] API stream returning two stops ( #3450 )
...
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
2024-03-25 10:14:34 -07:00
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
120157fd2a
Support arbitrary json_object in OpenAI and Context Free Grammar ( #3211 )
2024-03-16 13:35:27 -07:00
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
22de45235c
Push logprob generation to LLMEngine ( #3065 )
...
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-03-04 19:54:06 +00:00
703e42ee4b
Add guided decoding for OpenAI API server ( #2819 )
...
Co-authored-by: br3no <breno@veltefaria.de>
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-02-29 22:13:08 +00:00
e0ade06d63
Support logit bias for OpenAI API ( #3027 )
2024-02-27 11:51:53 +08:00
70f3e8e3a1
Add LogProbs for Chat Completions in OpenAI ( #2918 )
2024-02-26 10:39:34 +08:00
8f36444c4f
multi-LoRA as extra models in OpenAI server ( #2775 )
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate models if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case, sql-lora and sql-lora2 point to the same underlying LoRA, but this need not be the case. The LoRA config options take the same values they do in EngineArgs.
No work has been done here to scope client permissions to specific models.
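The listing behavior described above can be sketched as follows. This is an illustrative helper only, not vLLM's actual implementation; the function name and the assumption that each module argument takes the form `name=path` mirror the CLI usage shown above:

```python
def served_model_names(base_model: str, lora_modules: list[str]) -> list[str]:
    """Illustrative sketch (not vLLM's actual code): the names a /models
    query would list, given --model and --lora-modules name=path arguments."""
    names = [base_model]  # the base served model is always listed first
    for module in lora_modules:
        name, _, _path = module.partition("=")  # each module is "name=path"
        names.append(name)  # one entry per LoRA name, even if paths repeat
    return names

# Mirrors the server launch above: the base model plus sql-lora and sql-lora2,
# both backed by the same underlying LoRA path.
print(served_model_names(
    "meta-llama/Llama-2-7b-hf",
    ["sql-lora=/path/to/lora", "sql-lora2=/path/to/lora"],
))
```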
2024-02-17 12:00:48 -08:00
3a7dd7e367
Support Batch Completion in Server ( #2529 )
2024-01-24 17:11:07 -08:00
dd7e8f5f64
refactor completion api for readability ( #2499 )
2024-01-18 16:45:14 -08:00
14cc317ba4
OpenAI Server refactoring ( #2360 )
2024-01-16 21:33:14 -08:00