[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Matthew Bonanni
2025-10-28 08:00:56 -04:00
committed by GitHub
parent 7a865f2325
commit 44b5ce956d
3 changed files with 39 additions and 11 deletions

@@ -29,7 +29,7 @@ def multimodal_server(): # noqa: F811
"--dtype",
"half",
"--max-model-len",
-"12800",
+"4096",
"--enforce-eager",
# lora config below
"--enable-lora",