8 Commits

Author SHA1 Message Date
81ede99ca4 [Core] Deprecating block manager v1 and make block manager v2 default (#8704) 2024-10-17 11:38:15 -05:00
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
8e836d982a [Doc] Fix code formatting in spec_decode.rst (#9348) 2024-10-14 21:29:11 -07:00
2febcf2777 [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) 2024-09-05 16:25:29 -04:00
0e12cd67a8 [Doc] add online speculative decoding example (#7243) 2024-08-07 09:58:02 -07:00
789937af2e [Doc] [SpecDecode] Update MLPSpeculator documentation (#7100) 2024-08-05 23:29:43 +00:00
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
6ef3bf912c Remove unnecessary trailing period in spec_decode.rst (#6405) 2024-07-14 07:58:09 +00:00
89ec06c33b [Docs] [Spec decode] Fix docs error in code example (#5427) 2024-06-11 10:31:56 -07:00
4c2ffb28ff [Speculative decoding] Initial spec decode docs (#5400) 2024-06-11 10:15:40 -07:00