Files
ragflow/api/utils/tenant_utils.py
NeedmeFordev 78c3583964 Fix memory resolution regression for multimodal Gemini models (#14209)
### What problem does this PR solve?

Fixes #14206.

This issue is a regression. PR #9520 previously changed Gemini models
from `image2text` to `chat` to fix chat-side resolution, but PR #13073
later restored those Gemini entries to `image2text` during model-list
updates, which reintroduced the bug.

The underlying problem is that Gemini models are multimodal and
advertise both `CHAT` and `IMAGE2TEXT`, while tenant model resolution
still depends on a single stored `model_type`. That makes chat-only
flows such as memory extraction fragile when a compatible model is
stored as `image2text`.

This PR fixes the issue at the model resolution layer instead of
changing `llm_factories.json` again:
- keep the stored tenant model type unchanged
- try exact `model_type` lookup first
- if no exact match is found, fall back only when the model metadata
shows the requested capability is supported
- coerce the runtime config to the requested type for chat callers
- fail fast in memory creation instead of silently persisting
`tenant_llm_id=0`

This preserves existing multimodal and `image2text` behavior while
restoring chat compatibility for memory-related flows.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Testing

- Re-checked the current memory creation and memory message extraction
paths against the updated resolution logic
- Verified locally that a Gemini-style tenant model stored as
`image2text` but tagged with `CHAT` can still be resolved for `chat`
- Verified `get_model_config_by_type_and_name(..., CHAT, ...)` returns a
chat-compatible runtime config
- Verified `get_model_config_by_id(..., CHAT)` also returns a
chat-compatible runtime config
- Verified strict resolution still fails when the model metadata does
not advertise chat capability
2026-04-20 16:37:36 +08:00

46 lines
2.0 KiB
Python

#
# Copyright 2026 The InfiniFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from common.constants import LLMType
from common.exceptions import ArgumentException
from api.db.services.tenant_llm_service import TenantLLMService
_KEY_TO_MODEL_TYPE = {
"llm_id": LLMType.CHAT,
"embd_id": LLMType.EMBEDDING,
"asr_id": LLMType.SPEECH2TEXT,
"img2txt_id": LLMType.IMAGE2TEXT,
"rerank_id": LLMType.RERANK,
"tts_id": LLMType.TTS,
}
def ensure_tenant_model_id_for_params(tenant_id: str, param_dict: dict, *, strict: bool = False) -> dict:
for key in ["llm_id", "embd_id", "asr_id", "img2txt_id", "rerank_id", "tts_id"]:
if param_dict.get(key) and not param_dict.get(f"tenant_{key}"):
model_type = _KEY_TO_MODEL_TYPE.get(key)
tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key], model_type)
if not tenant_model and model_type == LLMType.CHAT:
tenant_model = TenantLLMService.get_api_key(tenant_id, param_dict[key])
if tenant_model:
param_dict.update({f"tenant_{key}": tenant_model.id})
else:
if strict:
model_type_val = model_type.value if hasattr(model_type, "value") else model_type
raise ArgumentException(
f"Tenant Model with name {param_dict[key]} and type {model_type_val} not found"
)
param_dict.update({f"tenant_{key}": 0})
return param_dict