[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@@ -29,7 +29,7 @@ def multimodal_server():  # noqa: F811
         "--dtype",
         "half",
         "--max-model-len",
-        "12800",
+        "4096",
         "--enforce-eager",
         # lora config below
         "--enable-lora",
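
Note: the commit title says the short-vs-long LongRoPE choice is now driven by max_model_len. A minimal sketch of that kind of selection follows, assuming the long factors are picked only when the configured length exceeds the model's original context window. The function name, parameter names, and the exact comparison are hypothetical and are not taken from the vLLM source.

```python
from typing import Sequence


def select_longrope_factors(
    max_model_len: int,
    original_max_position_embeddings: int,
    short_factor: Sequence[float],
    long_factor: Sequence[float],
) -> Sequence[float]:
    """Pick one set of rescale factors for the whole server lifetime.

    Hypothetical helper: the decision depends only on the configured
    max_model_len, not on the length of any individual request.
    """
    # Assumption: "long" applies only when the configured context is
    # larger than the context window the model was originally trained on.
    use_long = max_model_len > original_max_position_embeddings
    return long_factor if use_long else short_factor


# Illustrative values only; real factor lists come from the model config.
short = [1.0, 1.0]
long = [2.0, 4.0]
chosen_small = select_longrope_factors(4096, 4096, short, long)
chosen_large = select_longrope_factors(12800, 4096, short, long)
```

Under this assumption, a test server started with a smaller --max-model-len would stay on the short factors, while a larger value would switch to the long ones.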