youngkingdom/vllm
1,272 Commits 157 Branches 93 Tags
021b1a2ab7497769dae8a67ea3467e4bafb474c5
Commit Graph

9 Commits

Author | SHA1 | Message | Date
Lily Liu | 43c413ec57 | [Kernel] Use flashinfer for decoding (#4353); Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com> | 2024-05-03 15:51:27 -07:00
SangBin Cho | 0f8a91401c | [Core] Ignore infeasible swap requests. (#4557) | 2024-05-02 14:31:20 -07:00
SangBin Cho | 0d62fe58db | [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451) | 2024-05-01 19:24:13 -07:00
SangBin Cho | 36729bac13 | [Test] Test multiple attn backend for chunked prefill. (#4023) | 2024-04-12 09:56:57 -07:00
SangBin Cho | e42df7227d | [Test] Add xformer and flash attn tests (#3961); Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-04-11 03:09:50 +00:00
SangBin Cho | 67b4221a61 | [Core][5/N] Fully working chunked prefill e2e (#3884) | 2024-04-10 17:56:48 -07:00
SangBin Cho | 26422e477b | [Test] Make model tests run again and remove --forked from pytest (#3631); Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-03-28 21:06:40 -07:00
SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00
Zhuohan Li | a61f0521b8 | [Test] Add basic correctness test (#2908) | 2024-02-18 16:44:50 -08:00