[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)
@ -17,4 +17,4 @@ Input Processing Pipeline
6. If the data contains multi-modal data, convert it into keyword arguments using :meth:`MULTIMODAL_REGISTRY.map_input <vllm.multimodal.MultiModalRegistry.map_input>`.
- For example, convert a :class:`PIL.Image.Image` input to its pixel values for a vision language model.
- For example, convert a :class:`PIL.Image.Image` input to its pixel values for a vision model.
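
The mapping step above can be illustrated with a toy stand-in. This is not the actual vLLM mapper (real mappers typically go through the model's HF processor for resizing and normalization), and ``map_image_input`` is a hypothetical helper used only for illustration:

.. code-block:: python

    import numpy as np
    from PIL import Image

    def map_image_input(image: Image.Image) -> dict:
        """Toy input mapper: turn a PIL image into model keyword
        arguments (here, pixel values normalized to [0, 1])."""
        pixel_values = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
        return {"pixel_values": pixel_values}

    image = Image.new("RGB", (336, 336), color=(255, 0, 0))
    kwargs = map_image_input(image)
    print(kwargs["pixel_values"].shape)  # (336, 336, 3)
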
@ -15,6 +15,9 @@ by following :ref:`this guide <adding_multimodal_plugin>`.
Looking to add your own multi-modal model? Please follow the instructions listed :ref:`here <enabling_multimodal_inputs>`.
..
    TODO: Add usage of --limit-mm-per-prompt when multi-image input is officially supported
Guides
++++++
@ -66,7 +66,7 @@ A default mapper is available for each modality in the core vLLM library. This i
3. Register maximum number of multi-modal tokens
------------------------------------------------
For each modality type that the model accepts as input, calculate the maximum possible number of tokens
For each modality type that the model accepts as input, calculate the maximum possible number of tokens per data instance
and register it via :meth:`INPUT_REGISTRY.register_max_multimodal_tokens <vllm.inputs.registry.InputRegistry.register_max_multimodal_tokens>`.
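
The value to register is model-specific. As a sketch, for a ViT-style image encoder the worst case per image is one token per non-overlapping patch; the 336-px input and 14-px patch sizes below are illustrative assumptions, not taken from any particular model:

.. code-block:: python

    def max_image_tokens(image_size: int = 336, patch_size: int = 14) -> int:
        # One token per non-overlapping patch; some encoders add a CLS token
        # or merge patches, so check the model's config for the exact count.
        return (image_size // patch_size) ** 2

    print(max_image_tokens())  # 576
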
.. code-block:: diff