[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
@ -83,10 +83,24 @@ def run_phi3v(question, modality):
|
||||
|
||||
# In this example, we override max_num_seqs to 5 while
|
||||
# keeping the original context length of 128k.
|
||||
|
||||
# num_crops is an override kwarg to the multimodal image processor;
|
||||
# For some models, e.g., Phi-3.5-vision-instruct, it is recommended
|
||||
# to use 16 for single frame scenarios, and 4 for multi-frame.
|
||||
#
|
||||
# Generally speaking, a larger value for num_crops results in more
|
||||
# tokens per image instance, because it may scale the image more in
|
||||
# the image preprocessing. Some references in the model docs and the
|
||||
# formula for image tokens after the preprocessing
|
||||
# transform can be found below.
|
||||
#
|
||||
# https://huggingface.co/microsoft/Phi-3.5-vision-instruct#loading-the-model-locally
|
||||
# https://huggingface.co/microsoft/Phi-3.5-vision-instruct/blob/main/processing_phi3_v.py#L194
|
||||
llm = LLM(
|
||||
model="microsoft/Phi-3-vision-128k-instruct",
|
||||
trust_remote_code=True,
|
||||
max_num_seqs=5,
|
||||
mm_processor_kwargs={"num_crops": 16},
|
||||
)
|
||||
stop_token_ids = None
|
||||
return llm, prompt, stop_token_ids
|
||||
|
||||
Reference in New Issue
Block a user