Commit Graph

28 Commits

Author SHA1 Message Date
87d41c849d [BUGFIX] [FRONTEND] Correct chat logprobs (#5029)
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-05-30 02:52:14 -07:00
eecd864388 [Bugfix][CI/Build] Fix test and improve code for merge_async_iterators (#5096) 2024-05-29 16:02:25 -07:00
5ae5ed1e60 [Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
350f9e107f [CI/Build] Move test_utils.py to tests/utils.py (#4425)
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)

Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
f12b20decc [Frontend] Move async logic outside of constructor (#4674) 2024-05-08 22:48:33 -07:00
f8e7adda21 Fix/async chat serving (#2727) 2024-05-03 11:04:14 -07:00
dfea173148 [Bugfix] Abort requests when the connection to /v1/completions is interrupted (#4363) 2024-04-27 09:48:37 -07:00
Roy
7134303cbb [Bugfix][Core] Fix get decoding config from ray (#4335) 2024-04-27 11:30:08 +00:00
1e8f4252aa [Bugfix][Frontend] Raise exception when file-like chat template fails to be opened (#4292) 2024-04-23 18:19:03 +00:00
4e7ee664e2 [Core] Fix engine-use-ray broken (#4105) 2024-04-16 05:24:53 +00:00
01bfb22b41 [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
fb96c1e98c Asynchronous tokenization (#2879) 2024-03-15 23:37:01 +00:00
2f8844ba08 Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Roy
9e8744a545 [BugFix] Fix get tokenizer when using ray (#3301) 2024-03-10 19:17:16 -07:00
ff578cae54 Add health check, make async Engine more robust (#3015)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-04 22:01:40 +00:00
017d9f1515 Add metrics to RequestOutput (#2876) 2024-02-20 21:55:57 -08:00
9b945daaf1 [Experimental] Add multi-LoRA support (#1804)
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-01-23 15:26:37 -08:00
14cc317ba4 OpenAI Server refactoring (#2360) 2024-01-16 21:33:14 -08:00
6e01e8c1c8 [CI] Add Buildkite (#2355) 2024-01-14 12:37:58 -08:00
fd4ea8ef5c Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
66b108d142 [BUGFIX] Fix API server test (#2270) 2023-12-26 10:37:06 -08:00
66785cc05c Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
5ffc0d13a2 Migrate linter from pylint to ruff (#1665) 2023-11-20 11:58:01 -08:00
9d9072a069 Implement prompt logprobs & Batched topk for computing logprobs (#1328)
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
2023-10-16 10:56:50 -07:00
ba0bfd40e2 TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
ff36139ffc Remove AsyncLLMEngine busy loop, shield background task (#1059) 2023-09-17 00:29:08 -07:00
080438477f Start background task in AsyncLLMEngine.generate (#988)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-08 00:03:39 -07:00
c07ece5ca4 Make AsyncLLMEngine more robust & fix batched abort (#969)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00