52 Commits

SHA1 Message Date
a5fda50a10 [CI/Build] Fix large_gpu_mark reason (#10070)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-06 08:50:37 +00:00
566cd27797 [torch.compile] rework test plans (#9866)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 22:20:17 -07:00
00d91c8a2c [CI/Build] Simplify exception trace in api server tests (#9787)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-30 14:52:05 -07:00
cc98f1e079 [CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
4e2d95e372 [Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
f085995a7b [CI/Build] Remove unnecessary fork_new_process (#9484) 2024-10-21 19:47:29 -07:00
051eaf6db3 [Model] Add user-configurable task for models that support both generation and embedding (#9424) 2024-10-18 11:31:58 -07:00
81ede99ca4 [Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
f3a507f1d3 [Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
c8627cd41b [ci][test] use load dummy for testing (#9165) 2024-10-09 00:38:40 -07:00
04c12f8157 [misc] update utils to support comparing multiple settings (#9140) 2024-10-08 02:51:49 +00:00
b22b798471 [Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
0f6d7a9a34 [Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
26a68d5d7e [CI/Build] Add test decorator for minimum GPU memory (#8925) 2024-09-29 02:50:51 +00:00
19d02ff938 [Bugfix] Fix PP for Multi-Step (#8887) 2024-09-28 08:52:46 -07:00
a8c1d161a7 [Core] *Prompt* logprobs support in Multi-step (#8199) 2024-09-18 08:38:43 -07:00
7c7714d856 [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
a84e598e21 [CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
40c396533d [Bugfix] Mapping physical device indices for e2e test utils (#8290) 2024-09-13 11:06:28 +08:00
9f68e00d27 [Bugfix] Fix broken OpenAI tensorizer test (#8258) 2024-09-07 08:02:39 +00:00
8685ba1a1e Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860) 2024-09-05 11:33:37 +00:00
428dd1445e [Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
39178c7fbc [Tests] Disable retries and use context manager for openai client (#7565) 2024-08-26 21:33:17 -07:00
029c71de11 [CI/Build] Avoid downloading all HF files in RemoteOpenAIServer (#7836) 2024-08-26 05:31:10 +00:00
aab0fcdb63 [ci][test] fix RemoteOpenAIServer (#7838) 2024-08-24 17:31:28 +00:00
ea9fa160e3 [ci][test] exclude model download time in server start time (#7834) 2024-08-24 01:03:27 -07:00
9e51b6a626 [ci][test] adjust max wait time for cpu offloading test (#7709) 2024-08-20 17:12:44 -07:00
832163b875 [ci][test] allow longer wait time for api server (#7629) 2024-08-17 11:26:38 -07:00
4706eb628e [aDAG] Unflake aDAG + PP tests (#7600) 2024-08-16 20:49:30 -07:00
ea49e6a3c8 [misc][ci] fix cpu test with plugins (#7489) 2024-08-13 19:27:46 -07:00
9ba85bc152 [mypy] Misc. typing improvements (#7417) 2024-08-13 09:20:20 +08:00
66d617e343 [Frontend] Gracefully handle missing chat template and fix CI failure (#7238)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-07 09:12:05 +00:00
f9a5600649 [Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225) 2024-08-06 18:34:26 -07:00
44dcb52e39 [ci][test] finalize fork_new_process_for_each_test (#7114) 2024-08-03 10:44:53 -07:00
69ea15e5cc [ci][distributed] shorten wait time if server hangs (#7098) 2024-08-02 21:05:16 -07:00
05308891e2 [Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
252357793d [ci][distributed] try to fix pp test (#7054) 2024-08-01 22:03:12 -07:00
5cf9254a9c [BugFix] Fix use of per-request seed with pipeline parallel (#6698) 2024-07-30 10:40:08 -07:00
f53b8f0d05 [ci][test] add correctness test for cpu offloading (#6549) 2024-07-18 23:41:06 +00:00
5bf35a91e4 [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) 2024-07-17 07:43:21 +00:00
dbfe254eda [Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
41708e5034 [ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
c5832d2ae9 [Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
9d47f64eb6 [CI/Build] [3/3] Reorganize entrypoints tests (#5966) 2024-06-30 12:58:49 +08:00
dd793d1de5 [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) 2024-06-25 15:56:15 -07:00
5d4d90536f [Distributed] Add send and recv helpers (#5719) 2024-06-23 14:42:28 -07:00
78687504f7 [Bugfix] AsyncLLMEngine hangs with asyncio.run (#5654) 2024-06-19 13:57:12 -07:00
0e9164b40a [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
39873476f8 [CI/Build] Simplify OpenAI server setup in tests (#5100) 2024-06-13 11:21:53 -07:00
f775a07e30 [FRONTEND] OpenAI tools support named functions (#5032) 2024-06-03 18:25:29 -05:00