d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2025-03-14 22:02:20 -07:00
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
df5dafaa5b
[Misc] Remove deprecated code ( #12383 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-01-24 14:45:20 -05:00
d11bf435a0
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
3b00b9c26c
[Core] renamePromptInputs and inputs ( #8876 )
2024-09-26 20:35:15 -07:00
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 ) ( #8810 )
2024-09-25 10:36:26 -07:00
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2024-09-18 13:56:58 +00:00
acd5511b6d
[BugFix] Fix clean shutdown issues ( #8492 )
2024-09-16 09:33:46 -07:00
18e9e1f7b3
[HotFix] Fix final output truncation with stop string + streaming ( #8468 )
2024-09-13 11:31:12 -07:00
551ce01078
[Core] Add engine option to return only deltas or final output ( #7381 )
2024-09-12 12:02:00 -07:00
f842a7aff1
[misc] remove engine_use_ray ( #8126 )
2024-09-11 18:23:36 -07:00
8c054b7a62
[Frontend] Clean up type annotations for mistral tokenizer ( #8314 )
2024-09-10 16:49:11 +00:00
39178c7fbc
[Tests] Disable retries and use context manager for openai client ( #7565 )
2024-08-26 21:33:17 -07:00
c75363fbc0
[BugFix] Avoid premature async generator exit and raise all exception variations ( #7698 )
2024-08-21 11:45:55 -04:00
70b746efcf
[Misc] Deprecation Warning when setting --engine-use-ray ( #7424 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
Co-authored-by: youkaichao <youkaichao@126.com >
2024-08-14 09:44:27 -07:00
b4e9528f95
[Core] Streamline stream termination in AsyncLLMEngine ( #7336 )
2024-08-09 07:06:36 +00:00
66d617e343
[Frontend] Gracefully handle missing chat template and fix CI failure ( #7238 )
...
Co-authored-by: Roger Wang <ywang@roblox.com >
2024-08-07 09:12:05 +00:00
9a3f49ae07
[BugFix] Overhaul async request cancellation ( #7111 )
2024-08-07 13:21:41 +08:00
d7f4178dd9
[Frontend] Move chat utils ( #6602 )
...
Co-authored-by: Roger Wang <ywang@roblox.com >
2024-07-21 08:38:17 +08:00
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs ( #6227 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2024-07-18 15:13:30 +08:00
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve ( #6431 )
2024-07-17 07:43:21 +00:00
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint ( #5923 )
2024-07-16 20:18:09 +08:00
41708e5034
[ci] try to add multi-node tests ( #6280 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai >
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai >
2024-07-12 21:51:48 -07:00
c5832d2ae9
[Core] Pipeline Parallel Support ( #4412 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai >
2024-07-02 10:58:08 -07:00
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes ( #5422 )
2024-06-25 15:56:15 -07:00
8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names ( #5718 )
2024-06-20 17:00:13 -06:00
78687504f7
[Bugfix] AsyncLLMEngine hangs with asyncio.run ( #5654 )
2024-06-19 13:57:12 -07:00
39873476f8
[CI/Build] Simplify OpenAI server setup in tests ( #5100 )
2024-06-13 11:21:53 -07:00
640052b069
[Bugfix][Frontend] Cleanup "fix chat logprobs" ( #5026 )
2024-06-10 22:36:46 -07:00
87d41c849d
[BUGFIX] [FRONTEND] Correct chat logprobs ( #5029 )
...
Co-authored-by: Breno Faria <breno.faria@intrafind.com >
2024-05-30 02:52:14 -07:00
eecd864388
[Bugfix][CI/Build] Fix test and improve code for merge_async_iterators ( #5096 )
2024-05-29 16:02:25 -07:00
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines ( #4328 )
...
Co-authored-by: Roger Wang <ywang@roblox.com >
2024-05-28 13:29:31 -07:00
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py ( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)
Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
f12b20decc
[Frontend] Move async logic outside of constructor ( #4674 )
2024-05-08 22:48:33 -07:00
f8e7adda21
Fix/async chat serving ( #2727 )
2024-05-03 11:04:14 -07:00
dfea173148
[Bugfix] Abort requests when the connection to /v1/completions is interrupted ( #4363 )
2024-04-27 09:48:37 -07:00
7134303cbb
[Bugfix][Core] Fix get decoding config from ray ( #4335 )
2024-04-27 11:30:08 +00:00
1e8f4252aa
[Bugfix][Frontend] Raise exception when file-like chat template fails to be opened ( #4292 )
2024-04-23 18:19:03 +00:00
4e7ee664e2
[Core] Fix engine-use-ray broken ( #4105 )
2024-04-16 05:24:53 +00:00
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
fb96c1e98c
Asynchronous tokenization ( #2879 )
2024-03-15 23:37:01 +00:00
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
9e8744a545
[BugFix] Fix get tokenizer when using ray ( #3301 )
2024-03-10 19:17:16 -07:00
ff578cae54
Add health check, make async Engine more robust ( #3015 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2024-03-04 22:01:40 +00:00
017d9f1515
Add metrics to RequestOutput ( #2876 )
2024-02-20 21:55:57 -08:00
9b945daaf1
[Experimental] Add multi-LoRA support ( #1804 )
...
Co-authored-by: Chen Shen <scv119@gmail.com >
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com >
Co-authored-by: Avnish Narayan <avnish@anyscale.com >
2024-01-23 15:26:37 -08:00
14cc317ba4
OpenAI Server refactoring ( #2360 )
2024-01-16 21:33:14 -08:00
6e01e8c1c8
[CI] Add Buildkite ( #2355 )
2024-01-14 12:37:58 -08:00