6a0f617210
[Core] Fix circular reference which leaked llm instance in local dev env ( #4737 )
...
Storing exception frame is extremely prone to circular refernece because it contains the reference to objects.
When tensorizer is not installed, it leaks llm instance because error frame has references to various modules which cause circular reference problem.
I also found spec decoding has a circular reference issue, and I solved it using weakref.proxy.
2024-05-10 23:54:32 +09:00
43c413ec57
[Kernel] Use flashinfer for decoding ( #4353 )
...
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com >
2024-05-03 15:51:27 -07:00
0f8a91401c
[Core] Ignore infeasible swap requests. ( #4557 )
2024-05-02 14:31:20 -07:00
0d62fe58db
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption ( #4451 )
2024-05-01 19:24:13 -07:00
36729bac13
[Test] Test multiple attn backend for chunked prefill. ( #4023 )
2024-04-12 09:56:57 -07:00
e42df7227d
[Test] Add xformer and flash attn tests ( #3961 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com >
2024-04-11 03:09:50 +00:00
67b4221a61
[Core][5/N] Fully working chunked prefill e2e ( #3884 )
2024-04-10 17:56:48 -07:00
26422e477b
[Test] Make model tests run again and remove --forked from pytest ( #3631 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com >
2024-03-28 21:06:40 -07:00
6e435de766
[1/n][Chunked Prefill] Refactor input query shapes ( #3236 )
2024-03-20 14:46:05 -07:00
a61f0521b8
[Test] Add basic correctness test ( #2908 )
2024-02-18 16:44:50 -08:00