feat: update styling and improve accessibility in retrieval modal; add translation for retrieval method

feat: enhance styling and add history icon to dataset components
Merge branch 'main' into feat/knowledge-dark-mode
2026-02-02 09:56:36 +08:00 · 2025-02-08 14:08:53 +08:00 · 2025-02-08 11:52:57 +08:00 · 2025-02-08 10:53:46 +08:00 · 2025-02-08 10:39:28 +08:00 · 2025-02-08 10:28:31 +08:00
260 changed files with 7507 additions and 7007 deletions
--- a/.github/workflows/docker-build.yml
+++ b/.github/workflows/docker-build.yml
@ -0,0 +1,47 @@
+name: Build docker image
+
+on:
+  pull_request:
+    branches:
+      - "main"
+    paths:
+      - api/Dockerfile
+      - web/Dockerfile
+
+concurrency:
+  group: docker-build-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build-docker:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        include:
+          - service_name: "api-amd64"
+            platform: linux/amd64
+            context: "api"
+          - service_name: "api-arm64"
+            platform: linux/arm64
+            context: "api"
+          - service_name: "web-amd64"
+            platform: linux/amd64
+            context: "web"
+          - service_name: "web-arm64"
+            platform: linux/arm64
+            context: "web"
+    steps:
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Build Docker Image
+        uses: docker/build-push-action@v6
+        with:
+          push: false
+          context: "{{defaultContext}}:${{ matrix.context }}"
+          platforms: ${{ matrix.platform }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
--- a/README.md
+++ b/README.md
@ -25,6 +25,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_AR.md
+++ b/README_AR.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_CN.md
+++ b/README_CN.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_ES.md
+++ b/README_ES.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="seguir en X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="seguir en LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Descargas de Docker" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_FR.md
+++ b/README_FR.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="suivre sur X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="suivre sur LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Tirages Docker" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_JA.md
+++ b/README_JA.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="X(Twitter)でフォロー"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="LinkedInでフォロー"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_KL.md
+++ b/README_KL.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_KR.md
+++ b/README_KR.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_PT.md
+++ b/README_PT.md
@ -25,6 +25,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_SI.md
+++ b/README_SI.md
@ -22,6 +22,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="follow on X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="follow on LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/README_TR.md
+++ b/README_TR.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="X(Twitter)'da takip et"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="LinkedIn'da takip et"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Çekmeleri" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
@ -62,8 +65,6 @@ Görsel bir arayüz üzerinde güçlü AI iş akışları oluşturun ve test edi
 ![providers-v5](https://github.com/langgenius/dify/assets/13230914/5a17bdbe-097a-4100-8363-40255b70f6e3)


-Özür dilerim, haklısınız. Daha anlamlı ve akıcı bir çeviri yapmaya çalışayım. İşte güncellenmiş çeviri:
-
 **3. Prompt IDE**: 
  Komut istemlerini oluşturmak, model performansını karşılaştırmak ve sohbet tabanlı uygulamalara metin-konuşma gibi ek özellikler eklemek için kullanıcı dostu bir arayüz.

@ -150,8 +151,6 @@ Görsel bir arayüz üzerinde güçlü AI iş akışları oluşturun ve test edi
 ## Dify'ı Kullanma

 - **Cloud </br>**
-İşte verdiğiniz metnin Türkçe çevirisi, kod bloğu içinde:
- 
 Herkesin sıfır kurulumla denemesi için bir [Dify Cloud](https://dify.ai) hizmeti sunuyoruz. Bu hizmet, kendi kendine dağıtılan versiyonun tüm yeteneklerini sağlar ve sandbox planında 200 ücretsiz GPT-4 çağrısı içerir.

 - **Dify Topluluk Sürümünü Kendi Sunucunuzda Barındırma</br>**
@ -177,8 +176,6 @@ GitHub'da Dify'a yıldız verin ve yeni sürümlerden anında haberdar olun.
 >- RAM >= 4GB

 </br>
-İşte verdiğiniz metnin Türkçe çevirisi, kod bloğu içinde:
-
 Dify sunucusunu başlatmanın en kolay yolu, [docker-compose.yml](docker/docker-compose.yaml) dosyamızı çalıştırmaktır. Kurulum komutunu çalıştırmadan önce, makinenizde [Docker](https://docs.docker.com/get-docker/) ve [Docker Compose](https://docs.docker.com/compose/install/)'un kurulu olduğundan emin olun:

 ```bash
--- a/README_VI.md
+++ b/README_VI.md
@ -21,6 +21,9 @@
    <a href="https://twitter.com/intent/follow?screen_name=dify_ai" target="_blank">
        <img src="https://img.shields.io/twitter/follow/dify_ai?logo=X&color=%20%23f5f5f5"
            alt="theo dõi trên X(Twitter)"></a>
+    <a href="https://www.linkedin.com/company/langgenius/" target="_blank">
+        <img src="https://custom-icon-badges.demolab.com/badge/LinkedIn-0A66C2?logo=linkedin-white&logoColor=fff"
+            alt="theo dõi trên LinkedIn"></a>
    <a href="https://hub.docker.com/u/langgenius" target="_blank">
        <img alt="Docker Pulls" src="https://img.shields.io/docker/pulls/langgenius/dify-web?labelColor=%20%23FDB062&color=%20%23f79009"></a>
    <a href="https://github.com/langgenius/dify/graphs/commit-activity" target="_blank">
--- a/api/Dockerfile
+++ b/api/Dockerfile
@ -48,18 +48,18 @@ ENV TZ=UTC

 WORKDIR /app/api

-RUN apt-get update \
-    && apt-get install -y --no-install-recommends curl nodejs libgmp-dev libmpfr-dev libmpc-dev \
-    # if you located in China, you can use aliyun mirror to speed up
-    # && echo "deb http://mirrors.aliyun.com/debian testing main" > /etc/apt/sources.list \
-    && echo "deb http://deb.debian.org/debian bookworm main" > /etc/apt/sources.list \
-    && apt-get update \
-    # For Security
-    && apt-get install -y --no-install-recommends expat libldap-2.5-0 perl libsqlite3-0 zlib1g \
-    # install a chinese font to support the use of tools like matplotlib
-    && apt-get install -y fonts-noto-cjk \
-    # install libmagic to support the use of python-magic guess MIMETYPE
-    && apt-get install -y libmagic1 \
+RUN \
+    apt-get update \
+    # Install dependencies
+    && apt-get install -y --no-install-recommends \
+        # basic environment
+        curl nodejs libgmp-dev libmpfr-dev libmpc-dev \
+        # For Security
+        expat libldap-2.5-0 perl libsqlite3-0 zlib1g \
+        # install a chinese font to support the use of tools like matplotlib
+        fonts-noto-cjk \
+        # install libmagic to support the use of python-magic guess MIMETYPE
+        libmagic1 \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

@ -78,7 +78,6 @@ COPY . /app/api/
 COPY docker/entrypoint.sh /entrypoint.sh
 RUN chmod +x /entrypoint.sh

-
 ARG COMMIT_SHA
 ENV COMMIT_SHA=${COMMIT_SHA}

--- a/api/configs/feature/init.py
+++ b/api/configs/feature/init.py
@ -498,6 +498,11 @@ class AuthConfig(BaseSettings):
        default=86400,
    )

+    FORGOT_PASSWORD_LOCKOUT_DURATION: PositiveInt = Field(
+        description="Time (in seconds) a user must wait before retrying password reset after exceeding the rate limit.",
+        default=86400,
+    )
+

 class ModerationConfig(BaseSettings):
    """
--- a/api/configs/feature/hosted_service/init.py
+++ b/api/configs/feature/hosted_service/init.py
@ -1,9 +1,40 @@
 from typing import Optional

-from pydantic import Field, NonNegativeInt
+from pydantic import Field, NonNegativeInt, computed_field
 from pydantic_settings import BaseSettings


+class HostedCreditConfig(BaseSettings):
+    HOSTED_MODEL_CREDIT_CONFIG: str = Field(
+        description="Model credit configuration in format 'model:credits,model:credits', e.g., 'gpt-4:20,gpt-4o:10'",
+        default="",
+    )
+
+    def get_model_credits(self, model_name: str) -> int:
+        """
+        Get credit value for a specific model name.
+        Returns 1 if model is not found in configuration (default credit).
+
+        :param model_name: The name of the model to search for
+        :return: The credit value for the model
+        """
+        if not self.HOSTED_MODEL_CREDIT_CONFIG:
+            return 1
+
+        try:
+            credit_map = dict(
+                item.strip().split(":", 1) for item in self.HOSTED_MODEL_CREDIT_CONFIG.split(",") if ":" in item
+            )
+
+            # Search for matching model pattern
+            for pattern, credit in credit_map.items():
+                if pattern.strip() == model_name:
+                    return int(credit)
+            return 1  # Default quota if no match found
+        except (ValueError, AttributeError):
+            return 1  # Return default quota if parsing fails
+
+
 class HostedOpenAiConfig(BaseSettings):
    """
    Configuration for hosted OpenAI service
@ -202,5 +233,7 @@ class HostedServiceConfig(
    HostedZhipuAIConfig,
    # moderation
    HostedModerationConfig,
+    # credit config
+    HostedCreditConfig,
 ):
    pass
--- a/api/configs/packaging/init.py
+++ b/api/configs/packaging/init.py
@ -9,7 +9,7 @@ class PackagingInfo(BaseSettings):

    CURRENT_VERSION: str = Field(
        description="Dify version",
-        default="0.15.2",
+        default="0.15.3",
    )

    COMMIT_SHA: str = Field(
--- a/api/controllers/console/auth/error.py
+++ b/api/controllers/console/auth/error.py
@ -59,3 +59,9 @@ class EmailCodeAccountDeletionRateLimitExceededError(BaseHTTPException):
    error_code = "email_code_account_deletion_rate_limit_exceeded"
    description = "Too many account deletion emails have been sent. Please try again in 5 minutes."
    code = 429
+
+
+class EmailPasswordResetLimitError(BaseHTTPException):
+    error_code = "email_password_reset_limit"
+    description = "Too many failed password reset attempts. Please try again in 24 hours."
+    code = 429
--- a/api/controllers/console/auth/forgot_password.py
+++ b/api/controllers/console/auth/forgot_password.py
@ -6,7 +6,13 @@ from flask_restful import Resource, reqparse  # type: ignore

 from constants.languages import languages
 from controllers.console import api
-from controllers.console.auth.error import EmailCodeError, InvalidEmailError, InvalidTokenError, PasswordMismatchError
+from controllers.console.auth.error import (
+    EmailCodeError,
+    EmailPasswordResetLimitError,
+    InvalidEmailError,
+    InvalidTokenError,
+    PasswordMismatchError,
+)
 from controllers.console.error import AccountInFreezeError, AccountNotFound, EmailSendIpLimitError
 from controllers.console.wraps import setup_required
 from events.tenant_event import tenant_was_created
@ -62,6 +68,10 @@ class ForgotPasswordCheckApi(Resource):

        user_email = args["email"]

+        is_forgot_password_error_rate_limit = AccountService.is_forgot_password_error_rate_limit(args["email"])
+        if is_forgot_password_error_rate_limit:
+            raise EmailPasswordResetLimitError()
+
        token_data = AccountService.get_reset_password_data(args["token"])
        if token_data is None:
            raise InvalidTokenError()
@ -70,8 +80,10 @@ class ForgotPasswordCheckApi(Resource):
            raise InvalidEmailError()

        if args["code"] != token_data.get("code"):
+            AccountService.add_forgot_password_error_rate_limit(args["email"])
            raise EmailCodeError()

+        AccountService.reset_forgot_password_error_rate_limit(args["email"])
        return {"is_valid": True, "email": token_data.get("email")}


--- a/api/core/helper/ssrf_proxy.py
+++ b/api/core/helper/ssrf_proxy.py
@ -11,15 +11,6 @@ from configs import dify_config

 SSRF_DEFAULT_MAX_RETRIES = dify_config.SSRF_DEFAULT_MAX_RETRIES

-proxy_mounts = (
-    {
-        "http://": httpx.HTTPTransport(proxy=dify_config.SSRF_PROXY_HTTP_URL),
-        "https://": httpx.HTTPTransport(proxy=dify_config.SSRF_PROXY_HTTPS_URL),
-    }
-    if dify_config.SSRF_PROXY_HTTP_URL and dify_config.SSRF_PROXY_HTTPS_URL
-    else None
-)
-
 BACKOFF_FACTOR = 0.5
 STATUS_FORCELIST = [429, 500, 502, 503, 504]

@ -51,7 +42,11 @@ def make_request(method, url, max_retries=SSRF_DEFAULT_MAX_RETRIES, **kwargs):
            if dify_config.SSRF_PROXY_ALL_URL:
                with httpx.Client(proxy=dify_config.SSRF_PROXY_ALL_URL) as client:
                    response = client.request(method=method, url=url, **kwargs)
-            elif proxy_mounts:
+            elif dify_config.SSRF_PROXY_HTTP_URL and dify_config.SSRF_PROXY_HTTPS_URL:
+                proxy_mounts = {
+                    "http://": httpx.HTTPTransport(proxy=dify_config.SSRF_PROXY_HTTP_URL),
+                    "https://": httpx.HTTPTransport(proxy=dify_config.SSRF_PROXY_HTTPS_URL),
+                }
                with httpx.Client(mounts=proxy_mounts) as client:
                    response = client.request(method=method, url=url, **kwargs)
            else:
--- a/api/core/model_runtime/entities/init.py
+++ b/api/core/model_runtime/entities/init.py
@ -1,4 +1,4 @@
-from .llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
+from .llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
 from .message_entities import (
    AssistantPromptMessage,
    AudioPromptMessageContent,
@ -23,6 +23,7 @@ __all__ = [
    "AudioPromptMessageContent",
    "DocumentPromptMessageContent",
    "ImagePromptMessageContent",
+    "LLMMode",
    "LLMResult",
    "LLMResultChunk",
    "LLMResultChunkDelta",
--- a/api/core/model_runtime/entities/llm_entities.py
+++ b/api/core/model_runtime/entities/llm_entities.py
@ -1,5 +1,5 @@
 from decimal import Decimal
-from enum import Enum
+from enum import StrEnum
 from typing import Optional

 from pydantic import BaseModel
@ -8,7 +8,7 @@ from core.model_runtime.entities.message_entities import AssistantPromptMessage,
 from core.model_runtime.entities.model_entities import ModelUsage, PriceInfo


-class LLMMode(Enum):
+class LLMMode(StrEnum):
    """
    Enum class for large language model mode.
    """
--- a/api/core/model_runtime/model_providers/__base/large_language_model.py
+++ b/api/core/model_runtime/model_providers/__base/large_language_model.py
@ -30,6 +30,11 @@ from core.model_runtime.model_providers.__base.ai_model import AIModel

 logger = logging.getLogger(__name__)

+HTML_THINKING_TAG = (
+    '<details style="color:gray;background-color: #f8f8f8;padding: 8px;border-radius: 4px;" open> '
+    "<summary> Thinking... </summary>"
+)
+

 class LargeLanguageModel(AIModel):
    """
@ -400,6 +405,40 @@ if you are not sure about the structure.
                    ),
                )

+    def _wrap_thinking_by_reasoning_content(self, delta: dict, is_reasoning: bool) -> tuple[str, bool]:
+        """
+        If the reasoning response is from delta.get("reasoning_content"), we wrap
+        it with HTML details tag.
+
+        :param delta: delta dictionary from LLM streaming response
+        :param is_reasoning: is reasoning
+        :return: tuple of (processed_content, is_reasoning)
+        """
+
+        content = delta.get("content") or ""
+        reasoning_content = delta.get("reasoning_content")
+
+        if reasoning_content:
+            if not is_reasoning:
+                content = HTML_THINKING_TAG + reasoning_content
+                is_reasoning = True
+            else:
+                content = reasoning_content
+        elif is_reasoning:
+            content = "</details>" + content
+            is_reasoning = False
+        return content, is_reasoning
+
+    def _wrap_thinking_by_tag(self, content: str) -> str:
+        """
+        if the reasoning response is a <think>...</think> block from delta.get("content"),
+        we replace <think> to <detail>.
+
+        :param content: delta.get("content")
+        :return: processed_content
+        """
+        return content.replace("<think>", HTML_THINKING_TAG).replace("</think>", "</details>")
+
    def _invoke_result_generator(
        self,
        model: str,
--- a/api/core/model_runtime/model_providers/_position.yaml
+++ b/api/core/model_runtime/model_providers/_position.yaml
@ -1,4 +1,5 @@
 - openai
+- deepseek
 - anthropic
 - azure_openai
 - google
@ -32,7 +33,6 @@
 - localai
 - volcengine_maas
 - openai_api_compatible
- deepseek
 - hunyuan
 - siliconflow
 - perfxcloud
--- a/api/core/model_runtime/model_providers/azure_ai_studio/azure_ai_studio.yaml
+++ b/api/core/model_runtime/model_providers/azure_ai_studio/azure_ai_studio.yaml
@ -51,6 +51,40 @@ model_credential_schema:
      show_on:
        - variable: __model_type
          value: llm
+    - variable: mode
+      show_on:
+        - variable: __model_type
+          value: llm
+      label:
+        en_US: Completion mode
+      type: select
+      required: false
+      default: chat
+      placeholder:
+        zh_Hans: 选择对话类型
+        en_US: Select completion mode
+      options:
+        - value: completion
+          label:
+            en_US: Completion
+            zh_Hans: 补全
+        - value: chat
+          label:
+            en_US: Chat
+            zh_Hans: 对话
+    - variable: context_size
+      label:
+        zh_Hans: 模型上下文长度
+        en_US: Model context size
+      required: true
+      show_on:
+        - variable: __model_type
+          value: llm
+      type: text-input
+      default: "4096"
+      placeholder:
+        zh_Hans: 在此输入您的模型上下文长度
+        en_US: Enter your Model context size
    - variable: jwt_token
      required: true
      label:
--- a/api/core/model_runtime/model_providers/azure_ai_studio/llm/llm.py
+++ b/api/core/model_runtime/model_providers/azure_ai_studio/llm/llm.py
@ -1,9 +1,9 @@
 import logging
-from collections.abc import Generator
+from collections.abc import Generator, Sequence
 from typing import Any, Optional, Union

 from azure.ai.inference import ChatCompletionsClient
-from azure.ai.inference.models import StreamingChatCompletionsUpdate
+from azure.ai.inference.models import StreamingChatCompletionsUpdate, SystemMessage, UserMessage
 from azure.core.credentials import AzureKeyCredential
 from azure.core.exceptions import (
    ClientAuthenticationError,
@ -20,7 +20,7 @@ from azure.core.exceptions import (
 )

 from core.model_runtime.callbacks.base_callback import Callback
-from core.model_runtime.entities.llm_entities import LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
+from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta, LLMUsage
 from core.model_runtime.entities.message_entities import (
    AssistantPromptMessage,
    PromptMessage,
@ -30,6 +30,7 @@ from core.model_runtime.entities.model_entities import (
    AIModelEntity,
    FetchFrom,
    I18nObject,
+    ModelPropertyKey,
    ModelType,
    ParameterRule,
    ParameterType,
@ -60,10 +61,10 @@ class AzureAIStudioLargeLanguageModel(LargeLanguageModel):
        self,
        model: str,
        credentials: dict,
-        prompt_messages: list[PromptMessage],
+        prompt_messages: Sequence[PromptMessage],
        model_parameters: dict,
-        tools: Optional[list[PromptMessageTool]] = None,
-        stop: Optional[list[str]] = None,
+        tools: Optional[Sequence[PromptMessageTool]] = None,
+        stop: Optional[Sequence[str]] = None,
        stream: bool = True,
        user: Optional[str] = None,
    ) -> Union[LLMResult, Generator]:
@ -82,8 +83,8 @@ class AzureAIStudioLargeLanguageModel(LargeLanguageModel):
        """

        if not self.client:
-            endpoint = credentials.get("endpoint")
-            api_key = credentials.get("api_key")
+            endpoint = str(credentials.get("endpoint"))
+            api_key = str(credentials.get("api_key"))
            self.client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(api_key))

        messages = [{"role": msg.role.value, "content": msg.content} for msg in prompt_messages]
@ -94,6 +95,7 @@ class AzureAIStudioLargeLanguageModel(LargeLanguageModel):
            "temperature": model_parameters.get("temperature", 0),
            "top_p": model_parameters.get("top_p", 1),
            "stream": stream,
+            "model": model,
        }

        if stop:
@ -255,10 +257,16 @@ class AzureAIStudioLargeLanguageModel(LargeLanguageModel):
        :return:
        """
        try:
-            endpoint = credentials.get("endpoint")
-            api_key = credentials.get("api_key")
+            endpoint = str(credentials.get("endpoint"))
+            api_key = str(credentials.get("api_key"))
            client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(api_key))
-            client.get_model_info()
+            client.complete(
+                messages=[
+                    SystemMessage(content="I say 'ping', you say 'pong'"),
+                    UserMessage(content="ping"),
+                ],
+                model=model,
+            )
        except Exception as ex:
            raise CredentialsValidateFailedError(str(ex))

@ -327,7 +335,10 @@ class AzureAIStudioLargeLanguageModel(LargeLanguageModel):
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.LLM,
            features=[],
-            model_properties={},
+            model_properties={
+                ModelPropertyKey.CONTEXT_SIZE: int(credentials.get("context_size", "4096")),
+                ModelPropertyKey.MODE: credentials.get("mode", LLMMode.CHAT),
+            },
            parameter_rules=rules,
        )

--- a/api/core/model_runtime/model_providers/azure_openai/azure_openai.yaml
+++ b/api/core/model_runtime/model_providers/azure_openai/azure_openai.yaml
@ -138,6 +138,18 @@ model_credential_schema:
          show_on:
            - variable: __model_type
              value: llm
+        - label:
+            en_US: o3-mini
+          value: o3-mini
+          show_on:
+            - variable: __model_type
+              value: llm
+        - label:
+            en_US: o3-mini-2025-01-31
+          value: o3-mini-2025-01-31
+          show_on:
+            - variable: __model_type
+              value: llm
        - label:
            en_US: o1-preview
          value: o1-preview
--- a/api/core/model_runtime/model_providers/bedrock/bedrock.yaml
+++ b/api/core/model_runtime/model_providers/bedrock/bedrock.yaml
@ -123,6 +123,15 @@ provider_credential_schema:
            en_US: AWS GovCloud (US-West)
            zh_Hans: AWS GovCloud (US-West)
            ja_JP: AWS GovCloud (米国西部)
+    - variable: bedrock_endpoint_url
+      label:
+        zh_Hans: Bedrock Endpoint URL
+        en_US: Bedrock Endpoint URL
+      type: text-input
+      required: false
+      placeholder:
+        zh_Hans: 在此输入您的 Bedrock Endpoint URL, 如：https://123456.cloudfront.net
+        en_US: Enter your Bedrock Endpoint URL, e.g. https://123456.cloudfront.net
    - variable: model_for_validation
      required: false
      label:
--- a/api/core/model_runtime/model_providers/bedrock/get_bedrock_client.py
+++ b/api/core/model_runtime/model_providers/bedrock/get_bedrock_client.py
@ -13,6 +13,7 @@ def get_bedrock_client(service_name: str, credentials: Mapping[str, str]):
    client_config = Config(region_name=region_name)
    aws_access_key_id = credentials.get("aws_access_key_id")
    aws_secret_access_key = credentials.get("aws_secret_access_key")
+    bedrock_endpoint_url = credentials.get("bedrock_endpoint_url")

    if aws_access_key_id and aws_secret_access_key:
        # use aksk to call bedrock
@ -21,6 +22,7 @@ def get_bedrock_client(service_name: str, credentials: Mapping[str, str]):
            config=client_config,
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
+            **({"endpoint_url": bedrock_endpoint_url} if bedrock_endpoint_url else {}),
        )
    else:
        # use iam without aksk to call
--- a/api/core/model_runtime/model_providers/deepseek/llm/llm.py
+++ b/api/core/model_runtime/model_providers/deepseek/llm/llm.py
@ -1,13 +1,10 @@
-import json
 from collections.abc import Generator
 from typing import Optional, Union

-import requests
 from yarl import URL

-from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta
+from core.model_runtime.entities.llm_entities import LLMMode, LLMResult
 from core.model_runtime.entities.message_entities import (
-    AssistantPromptMessage,
    PromptMessage,
    PromptMessageTool,
 )
@ -39,208 +36,3 @@ class DeepseekLargeLanguageModel(OAIAPICompatLargeLanguageModel):
        credentials["mode"] = LLMMode.CHAT.value
        credentials["function_calling_type"] = "tool_call"
        credentials["stream_function_calling"] = "support"
-
-    def _handle_generate_stream_response(
-        self, model: str, credentials: dict, response: requests.Response, prompt_messages: list[PromptMessage]
-    ) -> Generator:
-        """
-        Handle llm stream response
-
-        :param model: model name
-        :param credentials: model credentials
-        :param response: streamed response
-        :param prompt_messages: prompt messages
-        :return: llm response chunk generator
-        """
-        full_assistant_content = ""
-        chunk_index = 0
-        is_reasoning_started = False  # Add flag to track reasoning state
-
-        def create_final_llm_result_chunk(
-            id: Optional[str], index: int, message: AssistantPromptMessage, finish_reason: str, usage: dict
-        ) -> LLMResultChunk:
-            # calculate num tokens
-            prompt_tokens = usage and usage.get("prompt_tokens")
-            if prompt_tokens is None:
-                prompt_tokens = self._num_tokens_from_string(model, prompt_messages[0].content)
-            completion_tokens = usage and usage.get("completion_tokens")
-            if completion_tokens is None:
-                completion_tokens = self._num_tokens_from_string(model, full_assistant_content)
-
-            # transform usage
-            usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
-
-            return LLMResultChunk(
-                id=id,
-                model=model,
-                prompt_messages=prompt_messages,
-                delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
-            )
-
-        # delimiter for stream response, need unicode_escape
-        import codecs
-
-        delimiter = credentials.get("stream_mode_delimiter", "\n\n")
-        delimiter = codecs.decode(delimiter, "unicode_escape")
-
-        tools_calls: list[AssistantPromptMessage.ToolCall] = []
-
-        def increase_tool_call(new_tool_calls: list[AssistantPromptMessage.ToolCall]):
-            def get_tool_call(tool_call_id: str):
-                if not tool_call_id:
-                    return tools_calls[-1]
-
-                tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
-                if tool_call is None:
-                    tool_call = AssistantPromptMessage.ToolCall(
-                        id=tool_call_id,
-                        type="function",
-                        function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
-                    )
-                    tools_calls.append(tool_call)
-
-                return tool_call
-
-            for new_tool_call in new_tool_calls:
-                # get tool call
-                tool_call = get_tool_call(new_tool_call.function.name)
-                # update tool call
-                if new_tool_call.id:
-                    tool_call.id = new_tool_call.id
-                if new_tool_call.type:
-                    tool_call.type = new_tool_call.type
-                if new_tool_call.function.name:
-                    tool_call.function.name = new_tool_call.function.name
-                if new_tool_call.function.arguments:
-                    tool_call.function.arguments += new_tool_call.function.arguments
-
-        finish_reason = None  # The default value of finish_reason is None
-        message_id, usage = None, None
-        for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
-            chunk = chunk.strip()
-            if chunk:
-                # ignore sse comments
-                if chunk.startswith(":"):
-                    continue
-                decoded_chunk = chunk.strip().removeprefix("data:").lstrip()
-                if decoded_chunk == "[DONE]":  # Some provider returns "data: [DONE]"
-                    continue
-
-                try:
-                    chunk_json: dict = json.loads(decoded_chunk)
-                # stream ended
-                except json.JSONDecodeError as e:
-                    yield create_final_llm_result_chunk(
-                        id=message_id,
-                        index=chunk_index + 1,
-                        message=AssistantPromptMessage(content=""),
-                        finish_reason="Non-JSON encountered.",
-                        usage=usage,
-                    )
-                    break
-                # handle the error here. for issue #11629
-                if chunk_json.get("error") and chunk_json.get("choices") is None:
-                    raise ValueError(chunk_json.get("error"))
-
-                if chunk_json:
-                    if u := chunk_json.get("usage"):
-                        usage = u
-                if not chunk_json or len(chunk_json["choices"]) == 0:
-                    continue
-
-                choice = chunk_json["choices"][0]
-                finish_reason = chunk_json["choices"][0].get("finish_reason")
-                message_id = chunk_json.get("id")
-                chunk_index += 1
-
-                if "delta" in choice:
-                    delta = choice["delta"]
-                    is_reasoning = delta.get("reasoning_content")
-                    delta_content = delta.get("content") or delta.get("reasoning_content")
-
-                    assistant_message_tool_calls = None
-
-                    if "tool_calls" in delta and credentials.get("function_calling_type", "no_call") == "tool_call":
-                        assistant_message_tool_calls = delta.get("tool_calls", None)
-                    elif (
-                        "function_call" in delta
-                        and credentials.get("function_calling_type", "no_call") == "function_call"
-                    ):
-                        assistant_message_tool_calls = [
-                            {"id": "tool_call_id", "type": "function", "function": delta.get("function_call", {})}
-                        ]
-
-                    # assistant_message_function_call = delta.delta.function_call
-
-                    # extract tool calls from response
-                    if assistant_message_tool_calls:
-                        tool_calls = self._extract_response_tool_calls(assistant_message_tool_calls)
-                        increase_tool_call(tool_calls)
-
-                    if delta_content is None or delta_content == "":
-                        continue
-
-                    # Add markdown quote markers for reasoning content
-                    if is_reasoning:
-                        if not is_reasoning_started:
-                            delta_content = "> 💭 " + delta_content
-                            is_reasoning_started = True
-                        elif "\n\n" in delta_content:
-                            delta_content = delta_content.replace("\n\n", "\n> ")
-                        elif "\n" in delta_content:
-                            delta_content = delta_content.replace("\n", "\n> ")
-                    elif is_reasoning_started:
-                        # If we were in reasoning mode but now getting regular content,
-                        # add \n\n to close the reasoning block
-                        delta_content = "\n\n" + delta_content
-                        is_reasoning_started = False
-
-                    # transform assistant message to prompt message
-                    assistant_prompt_message = AssistantPromptMessage(
-                        content=delta_content,
-                    )
-
-                    # reset tool calls
-                    tool_calls = []
-                    full_assistant_content += delta_content
-                elif "text" in choice:
-                    choice_text = choice.get("text", "")
-                    if choice_text == "":
-                        continue
-
-                    # transform assistant message to prompt message
-                    assistant_prompt_message = AssistantPromptMessage(content=choice_text)
-                    full_assistant_content += choice_text
-                else:
-                    continue
-
-                yield LLMResultChunk(
-                    id=message_id,
-                    model=model,
-                    prompt_messages=prompt_messages,
-                    delta=LLMResultChunkDelta(
-                        index=chunk_index,
-                        message=assistant_prompt_message,
-                    ),
-                )
-
-            chunk_index += 1
-
-        if tools_calls:
-            yield LLMResultChunk(
-                id=message_id,
-                model=model,
-                prompt_messages=prompt_messages,
-                delta=LLMResultChunkDelta(
-                    index=chunk_index,
-                    message=AssistantPromptMessage(tool_calls=tools_calls, content=""),
-                ),
-            )
-
-        yield create_final_llm_result_chunk(
-            id=message_id,
-            index=chunk_index,
-            message=AssistantPromptMessage(content=""),
-            finish_reason=finish_reason,
-            usage=usage,
-        )
--- a/api/core/model_runtime/model_providers/google/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/google/llm/_position.yaml
@ -1,4 +1,6 @@
+- gemini-2.0-flash-001
 - gemini-2.0-flash-exp
+- gemini-2.0-pro-exp-02-05
 - gemini-2.0-flash-thinking-exp-1219
 - gemini-2.0-flash-thinking-exp-01-21
 - gemini-1.5-pro
--- a/api/core/model_runtime/model_providers/google/llm/gemini-2.0-flash-001.yaml
+++ b/api/core/model_runtime/model_providers/google/llm/gemini-2.0-flash-001.yaml
@ -0,0 +1,41 @@
+model: gemini-2.0-flash-001
+label:
+  en_US: Gemini 2.0 Flash 001
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 1048576
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/google/llm/gemini-2.0-pro-exp-02-05.yaml
+++ b/api/core/model_runtime/model_providers/google/llm/gemini-2.0-pro-exp-02-05.yaml
@ -0,0 +1,41 @@
+model: gemini-2.0-pro-exp-02-05
+label:
+  en_US: Gemini 2.0 pro exp 02-05
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 1048576
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/groq/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/groq/llm/_position.yaml
@ -1,3 +1,4 @@
+- deepseek-r1-distill-llama-70b
 - llama-3.1-405b-reasoning
 - llama-3.3-70b-versatile
 - llama-3.1-70b-versatile
--- a/api/core/model_runtime/model_providers/groq/llm/deepseek-r1-distill-llama-70b.yaml
+++ b/api/core/model_runtime/model_providers/groq/llm/deepseek-r1-distill-llama-70b.yaml
@ -0,0 +1,36 @@
+model: deepseek-r1-distill-llama-70b
+label:
+  en_US: DeepSeek R1 Distill Llama 70b
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: max_tokens
+    use_template: max_tokens
+    default: 512
+    min: 1
+    max: 8192
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '3.00'
+  output: '3.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/nvidia/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/nvidia/llm/_position.yaml
@ -1,3 +1,4 @@
+- deepseek-ai/deepseek-r1
 - google/gemma-7b
 - google/codegemma-7b
 - google/recurrentgemma-2b
--- a/api/core/model_runtime/model_providers/nvidia/llm/deepseek-r1.yaml
+++ b/api/core/model_runtime/model_providers/nvidia/llm/deepseek-r1.yaml
@ -0,0 +1,35 @@
+model: deepseek-ai/deepseek-r1
+label:
+  en_US: deepseek-ai/deepseek-r1
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    min: 0
+    max: 1
+    default: 0.5
+  - name: top_p
+    use_template: top_p
+    min: 0
+    max: 1
+    default: 1
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 1024
+    default: 1024
+  - name: frequency_penalty
+    use_template: frequency_penalty
+    min: -2
+    max: 2
+    default: 0
+  - name: presence_penalty
+    use_template: presence_penalty
+    min: -2
+    max: 2
+    default: 0
--- a/api/core/model_runtime/model_providers/nvidia/llm/llm.py
+++ b/api/core/model_runtime/model_providers/nvidia/llm/llm.py
@ -83,7 +83,7 @@ class NVIDIALargeLanguageModel(OAIAPICompatLargeLanguageModel):
    def _add_custom_parameters(self, credentials: dict, model: str) -> None:
        credentials["mode"] = "chat"

-        if self.MODEL_SUFFIX_MAP[model]:
+        if self.MODEL_SUFFIX_MAP.get(model):
            credentials["server_url"] = f"https://ai.api.nvidia.com/v1/{self.MODEL_SUFFIX_MAP[model]}"
            credentials.pop("endpoint_url")
        else:
--- a/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-08-2024.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-08-2024.yaml
@ -0,0 +1,52 @@
+model: cohere.command-r-08-2024
+label:
+  en_US: cohere.command-r-08-2024 v1.7
+model_type: llm
+features:
+  - multi-tool-call
+  - agent-thought
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    default: 1
+    max: 1.0
+  - name: topP
+    use_template: top_p
+    default: 0.75
+    min: 0
+    max: 1
+  - name: topK
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+    default: 0
+    min: 0
+    max: 500
+  - name: presencePenalty
+    use_template: presence_penalty
+    min: 0
+    max: 1
+    default: 0
+  - name: frequencyPenalty
+    use_template: frequency_penalty
+    min: 0
+    max: 1
+    default: 0
+  - name: maxTokens
+    use_template: max_tokens
+    default: 600
+    max: 4000
+pricing:
+  input: '0.0009'
+  output: '0.0009'
+  unit: '0.0001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-16k.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-16k.yaml
@ -50,3 +50,4 @@ pricing:
  output: '0.004'
  unit: '0.0001'
  currency: USD
+deprecated: true
--- a/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-plus-08-2024.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-plus-08-2024.yaml
@ -0,0 +1,52 @@
+model: cohere.command-r-plus-08-2024
+label:
+  en_US: cohere.command-r-plus-08-2024 v1.6
+model_type: llm
+features:
+  - multi-tool-call
+  - agent-thought
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 128000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    default: 1
+    max: 1.0
+  - name: topP
+    use_template: top_p
+    default: 0.75
+    min: 0
+    max: 1
+  - name: topK
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+    default: 0
+    min: 0
+    max: 500
+  - name: presencePenalty
+    use_template: presence_penalty
+    min: 0
+    max: 1
+    default: 0
+  - name: frequencyPenalty
+    use_template: frequency_penalty
+    min: 0
+    max: 1
+    default: 0
+  - name: maxTokens
+    use_template: max_tokens
+    default: 600
+    max: 4000
+pricing:
+  input: '0.0156'
+  output: '0.0156'
+  unit: '0.0001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-plus.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/cohere.command-r-plus.yaml
@ -50,3 +50,4 @@ pricing:
  output: '0.0219'
  unit: '0.0001'
  currency: USD
+deprecated: true
--- a/api/core/model_runtime/model_providers/oci/llm/llm.py
+++ b/api/core/model_runtime/model_providers/oci/llm/llm.py
@ -33,7 +33,7 @@ logger = logging.getLogger(__name__)

 request_template = {
    "compartmentId": "",
-    "servingMode": {"modelId": "cohere.command-r-plus", "servingType": "ON_DEMAND"},
+    "servingMode": {"modelId": "cohere.command-r-plus-08-2024", "servingType": "ON_DEMAND"},
    "chatRequest": {
        "apiFormat": "COHERE",
        # "preambleOverride": "You are a helpful assistant.",
@ -60,19 +60,19 @@ oci_config_template = {
 class OCILargeLanguageModel(LargeLanguageModel):
    # https://docs.oracle.com/en-us/iaas/Content/generative-ai/pretrained-models.htm
    _supported_models = {
-        "meta.llama-3-70b-instruct": {
+        "meta.llama-3.1-70b-instruct": {
            "system": True,
            "multimodal": False,
            "tool_call": False,
            "stream_tool_call": False,
        },
-        "cohere.command-r-16k": {
+        "cohere.command-r-08-2024": {
            "system": True,
            "multimodal": False,
            "tool_call": True,
            "stream_tool_call": False,
        },
-        "cohere.command-r-plus": {
+        "cohere.command-r-plus-08-2024": {
            "system": True,
            "multimodal": False,
            "tool_call": True,
--- a/api/core/model_runtime/model_providers/oci/llm/meta.llama-3-70b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/meta.llama-3-70b-instruct.yaml
@ -49,3 +49,4 @@ pricing:
  output: '0.015'
  unit: '0.0001'
  currency: USD
+deprecated: true
--- a/api/core/model_runtime/model_providers/oci/llm/meta.llama-3.1-70b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/oci/llm/meta.llama-3.1-70b-instruct.yaml
@ -0,0 +1,51 @@
+model: meta.llama-3.1-70b-instruct
+label:
+  zh_Hans: meta.llama-3.1-70b-instruct
+  en_US: meta.llama-3.1-70b-instruct
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 131072
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    default: 1
+    max: 2.0
+  - name: topP
+    use_template: top_p
+    default: 0.75
+    min: 0
+    max: 1
+  - name: topK
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+    default: 0
+    min: 0
+    max: 500
+  - name: presencePenalty
+    use_template: presence_penalty
+    min: -2
+    max: 2
+    default: 0
+  - name: frequencyPenalty
+    use_template: frequency_penalty
+    min: -2
+    max: 2
+    default: 0
+  - name: maxTokens
+    use_template: max_tokens
+    default: 600
+    max: 4000
+pricing:
+  input: '0.0075'
+  output: '0.0075'
+  unit: '0.0001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/oci/oci.py
+++ b/api/core/model_runtime/model_providers/oci/oci.py
@ -19,8 +19,8 @@ class OCIGENAIProvider(ModelProvider):
        try:
            model_instance = self.get_model_instance(ModelType.LLM)

-            # Use `cohere.command-r-plus` model for validate,
-            model_instance.validate_credentials(model="cohere.command-r-plus", credentials=credentials)
+            # Use `cohere.command-r-plus-08-2024` model for validate,
+            model_instance.validate_credentials(model="cohere.command-r-plus-08-2024", credentials=credentials)
        except CredentialsValidateFailedError as ex:
            raise ex
        except Exception as ex:
--- a/api/core/model_runtime/model_providers/ollama/llm/llm.py
+++ b/api/core/model_runtime/model_providers/ollama/llm/llm.py
@ -367,6 +367,7 @@ class OllamaLargeLanguageModel(LargeLanguageModel):

                # transform assistant message to prompt message
                text = chunk_json["response"]
+            text = self._wrap_thinking_by_tag(text)

            assistant_prompt_message = AssistantPromptMessage(content=text)

--- a/api/core/model_runtime/model_providers/openai/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/openai/llm/_position.yaml
@ -2,6 +2,8 @@
 - o1-2024-12-17
 - o1-mini
 - o1-mini-2024-09-12
+- o3-mini
+- o3-mini-2025-01-31
 - gpt-4
 - gpt-4o
 - gpt-4o-2024-05-13
--- a/api/core/model_runtime/model_providers/openai/llm/llm.py
+++ b/api/core/model_runtime/model_providers/openai/llm/llm.py
@ -619,9 +619,9 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
        # clear illegal prompt messages
        prompt_messages = self._clear_illegal_prompt_messages(model, prompt_messages)

-        # o1 compatibility
+        # o1, o3 compatibility
        block_as_stream = False
-        if model.startswith("o1"):
+        if model.startswith(("o1", "o3")):
            if "max_tokens" in model_parameters:
                model_parameters["max_completion_tokens"] = model_parameters["max_tokens"]
                del model_parameters["max_tokens"]
@ -941,7 +941,7 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
                                ]
                            )

-        if model.startswith("o1"):
+        if model.startswith(("o1", "o3")):
            system_message_count = len([m for m in prompt_messages if isinstance(m, SystemPromptMessage)])
            if system_message_count > 0:
                new_prompt_messages = []
@ -1053,7 +1053,7 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
            model = model.split(":")[1]

        # Currently, we can use gpt4o to calculate chatgpt-4o-latest's token.
-        if model == "chatgpt-4o-latest" or model.startswith("o1"):
+        if model == "chatgpt-4o-latest" or model.startswith(("o1", "o3")):
            model = "gpt-4o"

        try:
@ -1068,7 +1068,7 @@ class OpenAILargeLanguageModel(_CommonOpenAI, LargeLanguageModel):
            tokens_per_message = 4
            # if there's a name, the role is omitted
            tokens_per_name = -1
-        elif model.startswith("gpt-3.5-turbo") or model.startswith("gpt-4") or model.startswith("o1"):
+        elif model.startswith("gpt-3.5-turbo") or model.startswith("gpt-4") or model.startswith(("o1", "o3")):
            tokens_per_message = 3
            tokens_per_name = 1
        else:
--- a/api/core/model_runtime/model_providers/openai/llm/o1-2024-12-17.yaml
+++ b/api/core/model_runtime/model_providers/openai/llm/o1-2024-12-17.yaml
@ -16,6 +16,19 @@ parameter_rules:
    default: 50000
    min: 1
    max: 50000
+  - name: reasoning_effort
+    label:
+      zh_Hans: 推理工作
+      en_US: reasoning_effort
+    type: string
+    help:
+      zh_Hans: 限制推理模型的推理工作
+      en_US: constrains effort on reasoning for reasoning models
+    required: false
+    options:
+      - low
+      - medium
+      - high
  - name: response_format
    label:
      zh_Hans: 回复格式
--- a/api/core/model_runtime/model_providers/openai/llm/o1.yaml
+++ b/api/core/model_runtime/model_providers/openai/llm/o1.yaml
@ -17,6 +17,19 @@ parameter_rules:
    default: 50000
    min: 1
    max: 50000
+  - name: reasoning_effort
+    label:
+      zh_Hans: 推理工作
+      en_US: reasoning_effort
+    type: string
+    help:
+      zh_Hans: 限制推理模型的推理工作
+      en_US: constrains effort on reasoning for reasoning models
+    required: false
+    options:
+      - low
+      - medium
+      - high
  - name: response_format
    label:
      zh_Hans: 回复格式
--- a/api/core/model_runtime/model_providers/openai/llm/o3-mini-2025-01-31.yaml
+++ b/api/core/model_runtime/model_providers/openai/llm/o3-mini-2025-01-31.yaml
@ -0,0 +1,46 @@
+model: o3-mini-2025-01-31
+label:
+  zh_Hans: o3-mini-2025-01-31
+  en_US: o3-mini-2025-01-31
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 200000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    default: 100000
+    min: 1
+    max: 100000
+  - name: reasoning_effort
+    label:
+      zh_Hans: 推理工作
+      en_US: reasoning_effort
+    type: string
+    help:
+      zh_Hans: 限制推理模型的推理工作
+      en_US: constrains effort on reasoning for reasoning models
+    required: false
+    options:
+      - low
+      - medium
+      - high
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: response_format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '1.10'
+  output: '4.40'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/openai/llm/o3-mini.yaml
+++ b/api/core/model_runtime/model_providers/openai/llm/o3-mini.yaml
@ -0,0 +1,46 @@
+model: o3-mini
+label:
+  zh_Hans: o3-mini
+  en_US: o3-mini
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 200000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    default: 100000
+    min: 1
+    max: 100000
+  - name: reasoning_effort
+    label:
+      zh_Hans: 推理工作
+      en_US: reasoning_effort
+    type: string
+    help:
+      zh_Hans: 限制推理模型的推理工作
+      en_US: constrains effort on reasoning for reasoning models
+    required: false
+    options:
+      - low
+      - medium
+      - high
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: response_format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '1.10'
+  output: '4.40'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py
+++ b/api/core/model_runtime/model_providers/openai_api_compatible/llm/llm.py
@ -1,5 +1,5 @@
+import codecs
 import json
-import logging
 from collections.abc import Generator
 from decimal import Decimal
 from typing import Optional, Union, cast
@ -38,8 +38,6 @@ from core.model_runtime.model_providers.__base.large_language_model import Large
 from core.model_runtime.model_providers.openai_api_compatible._common import _CommonOaiApiCompat
 from core.model_runtime.utils import helper

-logger = logging.getLogger(__name__)
-

 class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
    """
@ -99,7 +97,7 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
        :param tools: tools for tool calling
        :return:
        """
-        return self._num_tokens_from_messages(model, prompt_messages, tools, credentials)
+        return self._num_tokens_from_messages(prompt_messages, tools, credentials)

    def validate_credentials(self, model: str, credentials: dict) -> None:
        """
@ -398,6 +396,73 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):

        return self._handle_generate_response(model, credentials, response, prompt_messages)

+    def _create_final_llm_result_chunk(
+        self,
+        index: int,
+        message: AssistantPromptMessage,
+        finish_reason: str,
+        usage: dict,
+        model: str,
+        prompt_messages: list[PromptMessage],
+        credentials: dict,
+        full_content: str,
+    ) -> LLMResultChunk:
+        # calculate num tokens
+        prompt_tokens = usage and usage.get("prompt_tokens")
+        if prompt_tokens is None:
+            prompt_tokens = self._num_tokens_from_string(text=prompt_messages[0].content)
+        completion_tokens = usage and usage.get("completion_tokens")
+        if completion_tokens is None:
+            completion_tokens = self._num_tokens_from_string(text=full_content)
+
+        # transform usage
+        usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
+
+        return LLMResultChunk(
+            model=model,
+            prompt_messages=prompt_messages,
+            delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
+        )
+
+    def _get_tool_call(self, tool_call_id: str, tools_calls: list[AssistantPromptMessage.ToolCall]):
+        """
+        Get or create a tool call by ID
+
+        :param tool_call_id: tool call ID
+        :param tools_calls: list of existing tool calls
+        :return: existing or new tool call, updated tools_calls
+        """
+        if not tool_call_id:
+            return tools_calls[-1], tools_calls
+
+        tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
+        if tool_call is None:
+            tool_call = AssistantPromptMessage.ToolCall(
+                id=tool_call_id,
+                type="function",
+                function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
+            )
+            tools_calls.append(tool_call)
+
+        return tool_call, tools_calls
+
+    def _increase_tool_call(
+        self, new_tool_calls: list[AssistantPromptMessage.ToolCall], tools_calls: list[AssistantPromptMessage.ToolCall]
+    ) -> list[AssistantPromptMessage.ToolCall]:
+        for new_tool_call in new_tool_calls:
+            # get tool call
+            tool_call, tools_calls = self._get_tool_call(new_tool_call.function.name, tools_calls)
+            # update tool call
+            if new_tool_call.id:
+                tool_call.id = new_tool_call.id
+            if new_tool_call.type:
+                tool_call.type = new_tool_call.type
+            if new_tool_call.function.name:
+                tool_call.function.name = new_tool_call.function.name
+            if new_tool_call.function.arguments:
+                tool_call.function.arguments += new_tool_call.function.arguments
+        return tools_calls
+
    def _handle_generate_stream_response(
        self, model: str, credentials: dict, response: requests.Response, prompt_messages: list[PromptMessage]
    ) -> Generator:
@ -410,69 +475,15 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
        :param prompt_messages: prompt messages
        :return: llm response chunk generator
        """
-        full_assistant_content = ""
        chunk_index = 0
-
-        def create_final_llm_result_chunk(
-            id: Optional[str], index: int, message: AssistantPromptMessage, finish_reason: str, usage: dict
-        ) -> LLMResultChunk:
-            # calculate num tokens
-            prompt_tokens = usage and usage.get("prompt_tokens")
-            if prompt_tokens is None:
-                prompt_tokens = self._num_tokens_from_string(model, prompt_messages[0].content)
-            completion_tokens = usage and usage.get("completion_tokens")
-            if completion_tokens is None:
-                completion_tokens = self._num_tokens_from_string(model, full_assistant_content)
-
-            # transform usage
-            usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
-
-            return LLMResultChunk(
-                id=id,
-                model=model,
-                prompt_messages=prompt_messages,
-                delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
-            )
-
+        full_assistant_content = ""
+        tools_calls: list[AssistantPromptMessage.ToolCall] = []
+        finish_reason = None
+        usage = None
+        is_reasoning_started = False
        # delimiter for stream response, need unicode_escape
-        import codecs
-
        delimiter = credentials.get("stream_mode_delimiter", "\n\n")
        delimiter = codecs.decode(delimiter, "unicode_escape")
-
-        tools_calls: list[AssistantPromptMessage.ToolCall] = []
-
-        def increase_tool_call(new_tool_calls: list[AssistantPromptMessage.ToolCall]):
-            def get_tool_call(tool_call_id: str):
-                if not tool_call_id:
-                    return tools_calls[-1]
-
-                tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
-                if tool_call is None:
-                    tool_call = AssistantPromptMessage.ToolCall(
-                        id=tool_call_id,
-                        type="function",
-                        function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
-                    )
-                    tools_calls.append(tool_call)
-
-                return tool_call
-
-            for new_tool_call in new_tool_calls:
-                # get tool call
-                tool_call = get_tool_call(new_tool_call.function.name)
-                # update tool call
-                if new_tool_call.id:
-                    tool_call.id = new_tool_call.id
-                if new_tool_call.type:
-                    tool_call.type = new_tool_call.type
-                if new_tool_call.function.name:
-                    tool_call.function.name = new_tool_call.function.name
-                if new_tool_call.function.arguments:
-                    tool_call.function.arguments += new_tool_call.function.arguments
-
-        finish_reason = None  # The default value of finish_reason is None
-        message_id, usage = None, None
        for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
            chunk = chunk.strip()
            if chunk:
@ -487,12 +498,15 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
                    chunk_json: dict = json.loads(decoded_chunk)
                # stream ended
                except json.JSONDecodeError as e:
-                    yield create_final_llm_result_chunk(
-                        id=message_id,
+                    yield self._create_final_llm_result_chunk(
                        index=chunk_index + 1,
                        message=AssistantPromptMessage(content=""),
                        finish_reason="Non-JSON encountered.",
                        usage=usage,
+                        model=model,
+                        credentials=credentials,
+                        prompt_messages=prompt_messages,
+                        full_content=full_assistant_content,
                    )
                    break
                # handle the error here. for issue #11629
@ -507,12 +521,14 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):

                choice = chunk_json["choices"][0]
                finish_reason = chunk_json["choices"][0].get("finish_reason")
-                message_id = chunk_json.get("id")
                chunk_index += 1

                if "delta" in choice:
                    delta = choice["delta"]
-                    delta_content = delta.get("content")
+                    delta_content, is_reasoning_started = self._wrap_thinking_by_reasoning_content(
+                        delta, is_reasoning_started
+                    )
+                    delta_content = self._wrap_thinking_by_tag(delta_content)

                    assistant_message_tool_calls = None

@ -526,12 +542,10 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
                            {"id": "tool_call_id", "type": "function", "function": delta.get("function_call", {})}
                        ]

-                    # assistant_message_function_call = delta.delta.function_call
-
                    # extract tool calls from response
                    if assistant_message_tool_calls:
                        tool_calls = self._extract_response_tool_calls(assistant_message_tool_calls)
-                        increase_tool_call(tool_calls)
+                        tools_calls = self._increase_tool_call(tool_calls, tools_calls)

                    if delta_content is None or delta_content == "":
                        continue
@ -556,7 +570,6 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
                    continue

                yield LLMResultChunk(
-                    id=message_id,
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
@ -569,7 +582,6 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):

        if tools_calls:
            yield LLMResultChunk(
-                id=message_id,
                model=model,
                prompt_messages=prompt_messages,
                delta=LLMResultChunkDelta(
@ -578,12 +590,15 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
                ),
            )

-        yield create_final_llm_result_chunk(
-            id=message_id,
+        yield self._create_final_llm_result_chunk(
            index=chunk_index,
            message=AssistantPromptMessage(content=""),
            finish_reason=finish_reason,
            usage=usage,
+            model=model,
+            credentials=credentials,
+            prompt_messages=prompt_messages,
+            full_content=full_assistant_content,
        )

    def _handle_generate_response(
@ -697,12 +712,11 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):
        return message_dict

    def _num_tokens_from_string(
-        self, model: str, text: Union[str, list[PromptMessageContent]], tools: Optional[list[PromptMessageTool]] = None
+        self, text: Union[str, list[PromptMessageContent]], tools: Optional[list[PromptMessageTool]] = None
    ) -> int:
        """
        Approximate num tokens for model with gpt2 tokenizer.

-        :param model: model name
        :param text: prompt text
        :param tools: tools for tool calling
        :return: number of tokens
@ -725,7 +739,6 @@ class OAIAPICompatLargeLanguageModel(_CommonOaiApiCompat, LargeLanguageModel):

    def _num_tokens_from_messages(
        self,
-        model: str,
        messages: list[PromptMessage],
        tools: Optional[list[PromptMessageTool]] = None,
        credentials: Optional[dict] = None,
--- a/api/core/model_runtime/model_providers/openrouter/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/openrouter/llm/_position.yaml
@ -1,5 +1,7 @@
 - openai/o1-preview
 - openai/o1-mini
+- openai/o3-mini
+- openai/o3-mini-2025-01-31
 - openai/gpt-4o
 - openai/gpt-4o-mini
 - openai/gpt-4
@ -28,5 +30,6 @@
 - mistralai/mistral-7b-instruct
 - qwen/qwen-2.5-72b-instruct
 - qwen/qwen-2-72b-instruct
+- deepseek/deepseek-r1
 - deepseek/deepseek-chat
 - deepseek/deepseek-coder
--- a/api/core/model_runtime/model_providers/openrouter/llm/deepseek-chat.yaml
+++ b/api/core/model_runtime/model_providers/openrouter/llm/deepseek-chat.yaml
@ -53,7 +53,7 @@ parameter_rules:
      zh_Hans: 介于 -2.0 和 2.0 之间的数字。如果该值为正，那么新 token 会根据其在已有文本中的出现频率受到相应的惩罚，降低模型重复相同内容的可能性。
      en_US: A number between -2.0 and 2.0. If the value is positive, new tokens are penalized based on their frequency of occurrence in existing text, reducing the likelihood that the model will repeat the same content.
 pricing:
-  input: "0.14"
-  output: "0.28"
+  input: "0.49"
+  output: "0.89"
  unit: "0.000001"
  currency: USD
--- a/api/core/model_runtime/model_providers/openrouter/llm/deepseek-r1.yaml
+++ b/api/core/model_runtime/model_providers/openrouter/llm/deepseek-r1.yaml
@ -0,0 +1,59 @@
+model: deepseek/deepseek-r1
+label:
+  en_US: deepseek-r1
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 163840
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 1
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 控制生成结果的多样性和随机性。数值越小，越严谨；数值越大，越发散。
+      en_US: Control the diversity and randomness of generated results. The smaller the value, the more rigorous it is; the larger the value, the more divergent it is.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 4096
+    min: 1
+    max: 4096
+    help:
+      zh_Hans: 指定生成结果长度的上限。如果生成结果截断，可以调大该参数。
+      en_US: Specifies the upper limit on the length of generated results. If the generated results are truncated, you can increase this parameter.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 1
+    min: 0.01
+    max: 1.00
+    help:
+      zh_Hans: 控制生成结果的随机性。数值越小，随机性越弱；数值越大，随机性越强。一般而言，top_p 和 temperature 两个参数选择一个进行调整即可。
+      en_US: Control the randomness of generated results. The smaller the value, the weaker the randomness; the larger the value, the stronger the randomness. Generally speaking, you can adjust one of the two parameters top_p and temperature.
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: frequency_penalty
+    use_template: frequency_penalty
+    default: 0
+    min: -2.0
+    max: 2.0
+    help:
+      zh_Hans: 介于 -2.0 和 2.0 之间的数字。如果该值为正，那么新 token 会根据其在已有文本中的出现频率受到相应的惩罚，降低模型重复相同内容的可能性。
+      en_US: A number between -2.0 and 2.0. If the value is positive, new tokens are penalized based on their frequency of occurrence in existing text, reducing the likelihood that the model will repeat the same content.
+pricing:
+  input: "3"
+  output: "8"
+  unit: "0.000001"
+  currency: USD
--- a/api/core/model_runtime/model_providers/openrouter/llm/o3-mini-2025-01-31.yaml
+++ b/api/core/model_runtime/model_providers/openrouter/llm/o3-mini-2025-01-31.yaml
@ -0,0 +1,49 @@
+model: openai/o3-mini-2025-01-31
+label:
+  en_US: o3-mini-2025-01-31
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 200000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: presence_penalty
+    use_template: presence_penalty
+  - name: frequency_penalty
+    use_template: frequency_penalty
+  - name: max_tokens
+    use_template: max_tokens
+    default: 512
+    min: 1
+    max: 100000
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: response_format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: "1.10"
+  output: "4.40"
+  unit: "0.000001"
+  currency: USD
--- a/api/core/model_runtime/model_providers/openrouter/llm/o3-mini.yaml
+++ b/api/core/model_runtime/model_providers/openrouter/llm/o3-mini.yaml
@ -0,0 +1,49 @@
+model: openai/o3-mini
+label:
+  en_US: o3-mini
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 200000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: presence_penalty
+    use_template: presence_penalty
+  - name: frequency_penalty
+    use_template: frequency_penalty
+  - name: max_tokens
+    use_template: max_tokens
+    default: 512
+    min: 1
+    max: 100000
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: response_format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: "1.10"
+  output: "4.40"
+  unit: "0.000001"
+  currency: USD
--- a/api/core/model_runtime/model_providers/siliconflow/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/_position.yaml
@ -12,7 +12,18 @@
 - Pro/Qwen/Qwen2-VL-7B-Instruct
 - OpenGVLab/InternVL2-26B
 - Pro/OpenGVLab/InternVL2-8B
+- deepseek-ai/DeepSeek-R1
+- deepseek-ai/DeepSeek-V2-Chat
 - deepseek-ai/DeepSeek-V2.5
+- deepseek-ai/DeepSeek-V3
+- deepseek-ai/DeepSeek-Coder-V2-Instruct
+- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+- deepseek-ai/Janus-Pro-7B
 - THUDM/glm-4-9b-chat
 - 01-ai/Yi-1.5-34B-Chat-16K
 - 01-ai/Yi-1.5-9B-Chat-16K
@ -25,3 +36,4 @@
 - meta-llama/Meta-Llama-3.1-8B-Instruct
 - google/gemma-2-27b-it
 - google/gemma-2-9b-it
+- Tencent/Hunyuan-A52B-Instruct
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-llama-70B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-llama-70B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "4.3"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-llama-8B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-llama-8B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "0.00"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-1.5B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-1.5B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "1.26"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-14B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-14B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "0.70"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-32B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-32B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "1.26"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-7B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1-distill-qwen-7B.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+  en_US: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "0.00"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-r1.yaml
@ -0,0 +1,21 @@
+model: deepseek-ai/DeepSeek-R1
+label:
+  zh_Hans: deepseek-ai/DeepSeek-R1
+  en_US: deepseek-ai/DeepSeek-R1
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 64000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "4"
+  output: "16"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v3.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v3.yaml
@ -0,0 +1,53 @@
+model: deepseek-ai/DeepSeek-V3
+label:
+  en_US: deepseek-ai/DeepSeek-V3
+model_type: llm
+features:
+  - agent-thought
+  - tool-call
+  - stream-tool-call
+model_properties:
+  mode: chat
+  context_size: 64000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 512
+    min: 1
+    max: 4096
+    help:
+      zh_Hans: 指定生成结果长度的上限。如果生成结果截断，可以调大该参数。
+      en_US: Specifies the upper limit on the length of generated results. If the generated results are truncated, you can increase this parameter.
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: frequency_penalty
+    use_template: frequency_penalty
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: "1"
+  output: "2"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/janus-pro-7B.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/janus-pro-7B.yaml
@ -0,0 +1,22 @@
+model: deepseek-ai/Janus-Pro-7B
+label:
+  zh_Hans: deepseek-ai/Janus-Pro-7B
+  en_US: deepseek-ai/Janus-Pro-7B
+model_type: llm
+features:
+  - agent-thought
+  - vision
+model_properties:
+  mode: chat
+  context_size: 32000
+parameter_rules:
+  - name: max_tokens
+    use_template: max_tokens
+    min: 1
+    max: 8192
+    default: 4096
+pricing:
+  input: "0.00"
+  output: "0.00"
+  unit: "0.000001"
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/llm.py
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/llm.py
@ -1,13 +1,9 @@
-import json
 from collections.abc import Generator
 from typing import Optional, Union

-import requests
-
 from core.model_runtime.entities.common_entities import I18nObject
-from core.model_runtime.entities.llm_entities import LLMMode, LLMResult, LLMResultChunk, LLMResultChunkDelta
+from core.model_runtime.entities.llm_entities import LLMMode, LLMResult
 from core.model_runtime.entities.message_entities import (
-    AssistantPromptMessage,
    PromptMessage,
    PromptMessageTool,
 )
@ -96,208 +92,3 @@ class SiliconflowLargeLanguageModel(OAIAPICompatLargeLanguageModel):
                ),
            ],
        )
-
-    def _handle_generate_stream_response(
-        self, model: str, credentials: dict, response: requests.Response, prompt_messages: list[PromptMessage]
-    ) -> Generator:
-        """
-        Handle llm stream response
-
-        :param model: model name
-        :param credentials: model credentials
-        :param response: streamed response
-        :param prompt_messages: prompt messages
-        :return: llm response chunk generator
-        """
-        full_assistant_content = ""
-        chunk_index = 0
-        is_reasoning_started = False  # Add flag to track reasoning state
-
-        def create_final_llm_result_chunk(
-            id: Optional[str], index: int, message: AssistantPromptMessage, finish_reason: str, usage: dict
-        ) -> LLMResultChunk:
-            # calculate num tokens
-            prompt_tokens = usage and usage.get("prompt_tokens")
-            if prompt_tokens is None:
-                prompt_tokens = self._num_tokens_from_string(model, prompt_messages[0].content)
-            completion_tokens = usage and usage.get("completion_tokens")
-            if completion_tokens is None:
-                completion_tokens = self._num_tokens_from_string(model, full_assistant_content)
-
-            # transform usage
-            usage = self._calc_response_usage(model, credentials, prompt_tokens, completion_tokens)
-
-            return LLMResultChunk(
-                id=id,
-                model=model,
-                prompt_messages=prompt_messages,
-                delta=LLMResultChunkDelta(index=index, message=message, finish_reason=finish_reason, usage=usage),
-            )
-
-        # delimiter for stream response, need unicode_escape
-        import codecs
-
-        delimiter = credentials.get("stream_mode_delimiter", "\n\n")
-        delimiter = codecs.decode(delimiter, "unicode_escape")
-
-        tools_calls: list[AssistantPromptMessage.ToolCall] = []
-
-        def increase_tool_call(new_tool_calls: list[AssistantPromptMessage.ToolCall]):
-            def get_tool_call(tool_call_id: str):
-                if not tool_call_id:
-                    return tools_calls[-1]
-
-                tool_call = next((tool_call for tool_call in tools_calls if tool_call.id == tool_call_id), None)
-                if tool_call is None:
-                    tool_call = AssistantPromptMessage.ToolCall(
-                        id=tool_call_id,
-                        type="function",
-                        function=AssistantPromptMessage.ToolCall.ToolCallFunction(name="", arguments=""),
-                    )
-                    tools_calls.append(tool_call)
-
-                return tool_call
-
-            for new_tool_call in new_tool_calls:
-                # get tool call
-                tool_call = get_tool_call(new_tool_call.function.name)
-                # update tool call
-                if new_tool_call.id:
-                    tool_call.id = new_tool_call.id
-                if new_tool_call.type:
-                    tool_call.type = new_tool_call.type
-                if new_tool_call.function.name:
-                    tool_call.function.name = new_tool_call.function.name
-                if new_tool_call.function.arguments:
-                    tool_call.function.arguments += new_tool_call.function.arguments
-
-        finish_reason = None  # The default value of finish_reason is None
-        message_id, usage = None, None
-        for chunk in response.iter_lines(decode_unicode=True, delimiter=delimiter):
-            chunk = chunk.strip()
-            if chunk:
-                # ignore sse comments
-                if chunk.startswith(":"):
-                    continue
-                decoded_chunk = chunk.strip().removeprefix("data:").lstrip()
-                if decoded_chunk == "[DONE]":  # Some provider returns "data: [DONE]"
-                    continue
-
-                try:
-                    chunk_json: dict = json.loads(decoded_chunk)
-                # stream ended
-                except json.JSONDecodeError as e:
-                    yield create_final_llm_result_chunk(
-                        id=message_id,
-                        index=chunk_index + 1,
-                        message=AssistantPromptMessage(content=""),
-                        finish_reason="Non-JSON encountered.",
-                        usage=usage,
-                    )
-                    break
-                # handle the error here. for issue #11629
-                if chunk_json.get("error") and chunk_json.get("choices") is None:
-                    raise ValueError(chunk_json.get("error"))
-
-                if chunk_json:
-                    if u := chunk_json.get("usage"):
-                        usage = u
-                if not chunk_json or len(chunk_json["choices"]) == 0:
-                    continue
-
-                choice = chunk_json["choices"][0]
-                finish_reason = chunk_json["choices"][0].get("finish_reason")
-                message_id = chunk_json.get("id")
-                chunk_index += 1
-
-                if "delta" in choice:
-                    delta = choice["delta"]
-                    delta_content = delta.get("content")
-
-                    assistant_message_tool_calls = None
-
-                    if "tool_calls" in delta and credentials.get("function_calling_type", "no_call") == "tool_call":
-                        assistant_message_tool_calls = delta.get("tool_calls", None)
-                    elif (
-                        "function_call" in delta
-                        and credentials.get("function_calling_type", "no_call") == "function_call"
-                    ):
-                        assistant_message_tool_calls = [
-                            {"id": "tool_call_id", "type": "function", "function": delta.get("function_call", {})}
-                        ]
-
-                    # assistant_message_function_call = delta.delta.function_call
-
-                    # extract tool calls from response
-                    if assistant_message_tool_calls:
-                        tool_calls = self._extract_response_tool_calls(assistant_message_tool_calls)
-                        increase_tool_call(tool_calls)
-
-                    if delta_content is None or delta_content == "":
-                        continue
-
-                    # Check for think tags
-                    if "<think>" in delta_content:
-                        is_reasoning_started = True
-                        # Remove <think> tag and add markdown quote
-                        delta_content = "> 💭 " + delta_content.replace("<think>", "")
-                    elif "</think>" in delta_content:
-                        # Remove </think> tag and add newlines to end quote block
-                        delta_content = delta_content.replace("</think>", "") + "\n\n"
-                        is_reasoning_started = False
-                    elif is_reasoning_started:
-                        # Add quote markers for content within thinking block
-                        if "\n\n" in delta_content:
-                            delta_content = delta_content.replace("\n\n", "\n> ")
-                        elif "\n" in delta_content:
-                            delta_content = delta_content.replace("\n", "\n> ")
-
-                    # transform assistant message to prompt message
-                    assistant_prompt_message = AssistantPromptMessage(
-                        content=delta_content,
-                    )
-
-                    # reset tool calls
-                    tool_calls = []
-                    full_assistant_content += delta_content
-                elif "text" in choice:
-                    choice_text = choice.get("text", "")
-                    if choice_text == "":
-                        continue
-
-                    # transform assistant message to prompt message
-                    assistant_prompt_message = AssistantPromptMessage(content=choice_text)
-                    full_assistant_content += choice_text
-                else:
-                    continue
-
-                yield LLMResultChunk(
-                    id=message_id,
-                    model=model,
-                    prompt_messages=prompt_messages,
-                    delta=LLMResultChunkDelta(
-                        index=chunk_index,
-                        message=assistant_prompt_message,
-                    ),
-                )
-
-            chunk_index += 1
-
-        if tools_calls:
-            yield LLMResultChunk(
-                id=message_id,
-                model=model,
-                prompt_messages=prompt_messages,
-                delta=LLMResultChunkDelta(
-                    index=chunk_index,
-                    message=AssistantPromptMessage(tool_calls=tools_calls, content=""),
-                ),
-            )
-
-        yield create_final_llm_result_chunk(
-            id=message_id,
-            index=chunk_index,
-            message=AssistantPromptMessage(content=""),
-            finish_reason=finish_reason,
-            usage=usage,
-        )
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0107.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0107.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0403.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0403.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0428.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0428.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0919.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-0919.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-1201.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-1201.yaml
@ -68,6 +68,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-latest.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-latest.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-longcontext.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-max-longcontext.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0206.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0206.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0624.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0624.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0723.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0723.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0806.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0806.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0919.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-0919.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-chat.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-chat.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-latest.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-plus-latest.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0206.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0206.yaml
@ -68,6 +68,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0624.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0624.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0919.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-0919.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-chat.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-chat.yaml
@ -69,6 +69,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-latest.yaml
+++ b/api/core/model_runtime/model_providers/tongyi/llm/qwen-turbo-latest.yaml
@ -67,6 +67,15 @@ parameter_rules:
    help:
      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: enable_search
+    type: boolean
+    default: false
+    label:
+      zh_Hans: 联网搜索
+      en_US: Web Search
+    help:
+      zh_Hans: 模型内置了互联网搜索服务，该参数控制模型在生成文本时是否参考使用互联网搜索结果。启用互联网搜索，模型会将搜索结果作为文本生成过程中的参考信息，但模型会基于其内部逻辑“自行判断”是否使用互联网搜索结果。
+      en_US: The model has a built-in Internet search service. This parameter controls whether the model refers to Internet search results when generating text. When Internet search is enabled, the model will use the search results as reference information in the text generation process, but the model will "judge" whether to use Internet search results based on its internal logic.
  - name: response_format
    use_template: response_format
 pricing:
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-001.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-001.yaml
@ -0,0 +1,41 @@
+model: gemini-2.0-flash-001
+label:
+  en_US: Gemini 2.0 Flash 001
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 1048576
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-lite-preview-02-05.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-lite-preview-02-05.yaml
@ -0,0 +1,41 @@
+model: gemini-2.0-flash-lite-preview-02-05
+label:
+  en_US: Gemini 2.0 Flash Lite Preview 0205
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 1048576
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-thinking-exp-01-21.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-thinking-exp-01-21.yaml
@ -0,0 +1,39 @@
+model: gemini-2.0-flash-thinking-exp-01-21
+label:
+  en_US: Gemini 2.0 Flash Thinking Exp 0121
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 32767
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-thinking-exp-1219.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-flash-thinking-exp-1219.yaml
@ -0,0 +1,39 @@
+model: gemini-2.0-flash-thinking-exp-1219
+label:
+  en_US: Gemini 2.0 Flash Thinking Exp 1219
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 32767
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-pro-exp-02-05.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-2.0-pro-exp-02-05.yaml
@ -0,0 +1,37 @@
+model: gemini-2.0-pro-exp-02-05
+label:
+  en_US: Gemini 2.0 Pro Exp 0205
+model_type: llm
+features:
+  - agent-thought
+  - document
+model_properties:
+  mode: chat
+  context_size: 2000000
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      en_US: Top k
+    type: int
+    help:
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: presence_penalty
+    use_template: presence_penalty
+  - name: frequency_penalty
+    use_template: frequency_penalty
+  - name: max_output_tokens
+    use_template: max_tokens
+    required: true
+    default: 8192
+    min: 1
+    max: 8192
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1114.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1114.yaml
@ -0,0 +1,41 @@
+model: gemini-exp-1114
+label:
+  en_US: Gemini exp 1114
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 32767
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1121.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1121.yaml
@ -0,0 +1,41 @@
+model: gemini-exp-1121
+label:
+  en_US: Gemini exp 1121
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 32767
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1206.yaml
+++ b/api/core/model_runtime/model_providers/vertex_ai/llm/gemini-exp-1206.yaml
@ -0,0 +1,41 @@
+model: gemini-exp-1206
+label:
+  en_US: Gemini exp 1206
+model_type: llm
+features:
+  - agent-thought
+  - vision
+  - tool-call
+  - stream-tool-call
+  - document
+  - video
+  - audio
+model_properties:
+  mode: chat
+  context_size: 2097152
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+  - name: top_p
+    use_template: top_p
+  - name: top_k
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    type: int
+    help:
+      zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
+      en_US: Only sample from the top K options for each subsequent token.
+    required: false
+  - name: max_output_tokens
+    use_template: max_tokens
+    default: 8192
+    min: 1
+    max: 8192
+  - name: json_schema
+    use_template: json_schema
+pricing:
+  input: '0.00'
+  output: '0.00'
+  unit: '0.000001'
+  currency: USD
--- a/api/core/model_runtime/model_providers/volcengine_maas/llm/llm.py
+++ b/api/core/model_runtime/model_providers/volcengine_maas/llm/llm.py
@ -1,4 +1,5 @@
 import logging
+import re
 from collections.abc import Generator
 from typing import Optional

@ -247,15 +248,34 @@ class VolcengineMaaSLargeLanguageModel(LargeLanguageModel):
            req_params["tools"] = tools

        def _handle_stream_chat_response(chunks: Generator[ChatCompletionChunk]) -> Generator:
+            is_reasoning_started = False
            for chunk in chunks:
+                content = ""
+                if chunk.choices:
+                    delta = chunk.choices[0].delta
+                    if is_reasoning_started and not hasattr(delta, "reasoning_content") and not delta.content:
+                        content = ""
+                    elif hasattr(delta, "reasoning_content"):
+                        if not is_reasoning_started:
+                            is_reasoning_started = True
+                            content = "> 💭 " + delta.reasoning_content
+                        else:
+                            content = delta.reasoning_content
+
+                        if "\n" in content:
+                            content = re.sub(r"\n(?!(>|\n))", "\n> ", content)
+                    elif is_reasoning_started:
+                        content = "\n\n" + delta.content
+                        is_reasoning_started = False
+                    else:
+                        content = delta.content
+
                yield LLMResultChunk(
                    model=model,
                    prompt_messages=prompt_messages,
                    delta=LLMResultChunkDelta(
                        index=0,
-                        message=AssistantPromptMessage(
-                            content=chunk.choices[0].delta.content if chunk.choices else "", tool_calls=[]
-                        ),
+                        message=AssistantPromptMessage(content=content, tool_calls=[]),
                        usage=self._calc_response_usage(
                            model=model,
                            credentials=credentials,
--- a/api/core/model_runtime/model_providers/volcengine_maas/llm/models.py
+++ b/api/core/model_runtime/model_providers/volcengine_maas/llm/models.py
@ -18,6 +18,22 @@ class ModelConfig(BaseModel):


 configs: dict[str, ModelConfig] = {
+    "DeepSeek-R1-Distill-Qwen-32B": ModelConfig(
+        properties=ModelProperties(context_size=64000, max_tokens=8192, mode=LLMMode.CHAT),
+        features=[ModelFeature.AGENT_THOUGHT],
+    ),
+    "DeepSeek-R1-Distill-Qwen-7B": ModelConfig(
+        properties=ModelProperties(context_size=64000, max_tokens=8192, mode=LLMMode.CHAT),
+        features=[ModelFeature.AGENT_THOUGHT],
+    ),
+    "DeepSeek-R1": ModelConfig(
+        properties=ModelProperties(context_size=64000, max_tokens=8192, mode=LLMMode.CHAT),
+        features=[ModelFeature.AGENT_THOUGHT],
+    ),
+    "DeepSeek-V3": ModelConfig(
+        properties=ModelProperties(context_size=64000, max_tokens=8192, mode=LLMMode.CHAT),
+        features=[ModelFeature.AGENT_THOUGHT, ModelFeature.TOOL_CALL, ModelFeature.STREAM_TOOL_CALL],
+    ),
    "Doubao-1.5-vision-pro-32k": ModelConfig(
        properties=ModelProperties(context_size=32768, max_tokens=12288, mode=LLMMode.CHAT),
        features=[ModelFeature.AGENT_THOUGHT, ModelFeature.VISION],
--- a/api/core/model_runtime/model_providers/volcengine_maas/volcengine_maas.yaml
+++ b/api/core/model_runtime/model_providers/volcengine_maas/volcengine_maas.yaml
@ -118,6 +118,30 @@ model_credential_schema:
      type: select
      required: true
      options:
+        - label:
+            en_US: DeepSeek-R1-Distill-Qwen-32B
+          value: DeepSeek-R1-Distill-Qwen-32B
+          show_on:
+            - variable: __model_type
+              value: llm
+        - label:
+            en_US: DeepSeek-R1-Distill-Qwen-7B
+          value: DeepSeek-R1-Distill-Qwen-7B
+          show_on:
+            - variable: __model_type
+              value: llm
+        - label:
+            en_US: DeepSeek-R1
+          value: DeepSeek-R1
+          show_on:
+            - variable: __model_type
+              value: llm
+        - label:
+            en_US: DeepSeek-V3
+          value: DeepSeek-V3
+          show_on:
+            - variable: __model_type
+              value: llm
        - label:
            en_US: Doubao-1.5-vision-pro-32k
          value: Doubao-1.5-vision-pro-32k
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
twwu	b31ee1f6f7	feat: update styling and improve accessibility in retrieval modal; add translation for retrieval method	2025-02-08 14:08:53 +08:00
twwu	bb45f646dc	feat: enhance styling and add history icon to dataset components	2025-02-08 11:52:57 +08:00
twwu	26bd253c2d	Merge branch 'main' into feat/knowledge-dark-mode	2025-02-08 10:53:46 +08:00
twwu	0756b49a7c	feat: improve styling and accessibility of dataset components	2025-02-08 10:39:28 +08:00
Xin Zhang	982bca5d40	fix: add rate limiting to prevent brute force on password reset (#13292 )	2025-02-08 10:28:31 +08:00
Kalo Chin	c8dcde6cd0	fix: Gemini 2.0 Flash 001 model yaml file naming (#13372 )	2025-02-08 09:12:42 +08:00
Riddhimaan-Senapati	8f9db61688	feat: added new silicon flow models (#13369 )	2025-02-08 09:12:22 +08:00
github-actions[bot]	ebdbaf34e6	chore: translate i18n files (#13349 ) Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com>	2025-02-07 22:41:25 +08:00
zhu-an	a081b1e79e	fix: add compatibility config for third-party S3-compatible providers (#13354 ) Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com>	2025-02-07 22:35:24 +08:00
Steven sun	38c31e64db	add enable_search parameter to qwen_max, plus, turbo (#13335 ) Co-authored-by: steven <sunzwj@digitalchina.com>	2025-02-07 22:16:26 +08:00
Yi Xiao	ae6f67420c	Chore: update app detail panel (#13337 )	2025-02-07 18:56:43 +08:00
twwu	25711ffae2	feat: enhance UI components with improved styling and icon updates	2025-02-07 16:49:10 +08:00
-LAN-	ca19bd31d4	chore(*): Bump version to 0.15.3 (#13308 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 15:20:05 +08:00
-LAN-	413dfd5628	feat: add completion mode and context size options for LLM configuration (#13325 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 15:08:53 +08:00
-LAN-	f9515901cc	fix: Azure AI Foundry model cannot be used in the workflow (#13323 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 14:52:57 +08:00
twwu	f127e10e0c	Merge branch 'main' into feat/knowledge-dark-mode	2025-02-07 14:30:14 +08:00
呆萌闷油瓶	3f42fabff8	chore:improve thinking display for llm from xinference and ollama pro… (#13318 )	2025-02-07 14:29:29 +08:00
-LAN-	1caa578771	chore(*): Update style of thinking (#13319 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 14:06:35 +08:00
Lazy_Frog	b7c11c1818	Fix the problem of Workflow terminates after parallel tasks execution, merge node not triggered (#12498 ) Co-authored-by: Novice Lee <novicelee@NoviPro.local>	2025-02-07 13:56:08 +08:00
非法操作	3eb3db0663	chore: refactor the OpenAICompatible and improve thinking display (#13299 )	2025-02-07 13:28:46 +08:00
-LAN-	be46f32056	fix(credits): require model name equals (#13314 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 13:28:17 +08:00
sino	6e5c915f96	feat(model): add deepseek-r1 for openrouter (#13312 )	2025-02-07 12:39:13 +08:00
-LAN-	04d13a8116	feat(credits): Allow to configure model-credit mapping (#13274 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-07 11:01:31 +08:00
Kemal	e638ede3f2	Update README_TR.md (#13294 )	2025-02-07 09:11:39 +08:00
Riddhimaan-Senapati	2348abe4bf	feat: added a couple of models not defined in vertex ai, that were already … (#13296 )	2025-02-07 09:11:25 +08:00
呆萌闷油瓶	f7e7a399d9	feat:add think tag display for xinference deepseek r1 (#13291 )	2025-02-06 22:04:58 +08:00
le0zh	ba91f34636	fix: incorrect transferMethod assignment for remote file (#13286 )	2025-02-06 19:32:21 +08:00
zhu-an	16865d43a8	feat: add deepseek models for volcengine provider (#13283 ) Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com>	2025-02-06 18:20:03 +08:00
呆萌闷油瓶	0d13aee15c	feat:add deepseek r1 think display for ollama provider (#13272 )	2025-02-06 15:32:10 +08:00
Wu Tianwei	49b4144ffd	fix: add dataset edit permissions (#13223 )	2025-02-06 14:26:16 +08:00
dependabot[bot]	186e2d972e	chore(deps): bump katex from 0.16.10 to 0.16.21 in /web (#13270 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-06 13:27:07 +08:00
engchina	40dd63ecef	Upgrade oracle models (#13174 ) Co-authored-by: engchina <atjapan2015@gmail.com>	2025-02-06 13:24:27 +08:00
-LAN-	6d66d6da15	feat(model_providers): Support deepseek-r1 for Nvidia Catalog (#13269 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-06 13:03:19 +08:00
weiwenyan-dev	03ec3513f3	Fix bug large data no render (#12683 ) Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com>	2025-02-06 13:00:04 +08:00
-LAN-	87763fc234	feat(model_providers): Support deepseek for Azure AI Foundry (#13267 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-06 12:45:48 +08:00
JasonVV	f6c44cae2e	feat(model): add gemini-2.0 model (#13266 )	2025-02-06 12:28:59 +08:00
xhe	da2ee04fce	fix: correct linewrap think display in generic openai api (#13260 ) Signed-off-by: xhe <xw897002528@gmail.com>	2025-02-06 10:53:08 +08:00
JasonVV	7673c36af3	feat(model): add gemini-2.0-flash-thinking-exp-01-21 (#13230 )	2025-02-06 10:01:00 +08:00
Riddhimaan-Senapati	9457b2af2f	feat: added models :gemini 2.0 flash 001 and gemini 2.0 pro exp 02-05 (#13247 )	2025-02-06 09:58:39 +08:00
k-zaku	7203991032	feat: add parameter "reasoning_effort" and Openai o3-mini (#13243 )	2025-02-06 09:29:48 +08:00
xhe	5a685f7156	feat: add think display for volcengine and generic openapi (#13234 ) Signed-off-by: xhe <xw897002528@gmail.com>	2025-02-06 09:24:40 +08:00
Riddhimaan-Senapati	a6a25030ad	fix: updated _position.yaml to include the latest model already integ… (#13245 )	2025-02-06 09:21:51 +08:00
Riddhimaan-Senapati	00458a31d5	feat: added deepseek r1 and v3 to siliconflow (#13238 )	2025-02-05 21:59:18 +08:00
-LAN-	c6ddf6d6cc	feat(model_providers): Add Groq DeepSeek-R1-Distill-Llama-70b (#13229 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-05 19:15:29 +08:00
Joshbly	34b21b3065	feat: Add o3-mini and o3-mini-2025-01-31 model variants (#13129 ) Co-authored-by: crazywoola <427733928@qq.com>	2025-02-05 17:04:45 +08:00
Bowen Liang	8fbb355cd2	chore: squash system dependencies installation steps (#13206 )	2025-02-05 16:42:53 +08:00
HQidea	e8b3b7e578	Fix new variables in the conversation opener would override prompt_variables (#13191 )	2025-02-05 16:16:00 +08:00
-LAN-	59ca44f493	chore(model_runtime): Move deepseek ahead in the providers list. (#13197 ) Signed-off-by: -LAN- <laipz8200@outlook.com>	2025-02-05 16:08:28 +08:00
Bowen Liang	9e1457c2c3	fix: mypy checks violation in AzureBlobStorage (#13215 )	2025-02-05 15:56:23 +08:00
te-chan	fac83e14bc	Use DefaultAzureCredential for managed identity in azure blob extention (#11559 )	2025-02-05 13:43:43 +08:00
Nam Vu	a97cec57e4	fix: SSRF proxy file descriptor leak in concurrent requests (#13108 )	2025-02-05 13:10:27 +08:00
Riddhimaan-Senapati	38c10b47d3	Feat: add linkedin to readme (#13203 )	2025-02-05 12:27:58 +08:00
MaFee921	1a2523fd15	feat: bedrock_endpoint_url (#12838 )	2025-02-05 12:24:24 +08:00
Warren Chen	03243cb422	Modify params for bedrock retrieve generate (#13182 )	2025-02-05 12:17:42 +08:00
Bowen Liang	2ad7ee0344	chore: add tests for build docker image when dockerfile changed (#10732 )	2025-02-05 11:40:22 +08:00
twwu	7616ef8c22	feat: enhance document picker styles for dark mode	2025-01-24 10:06:47 +08:00
twwu	6c69baf025	feat: update icons and styles in dataset components for improved UI consistency	2025-01-23 16:47:26 +08:00
twwu	08bd96f170	feat: update styling for dataset creation components and replace error message background	2025-01-23 15:46:12 +08:00
twwu	684f7188f4	Merge branch 'main' into feat/knowledge-dark-mode	2025-01-23 15:10:46 +08:00
twwu	ebad19c9f7	feat: update error message styles and add background gradients for dataset crawler	2025-01-23 15:09:35 +08:00
twwu	49674507c6	Merge branch 'main' into feat/knowledge-dark-mode	2025-01-22 14:31:44 +08:00
twwu	80ad81471b	refactor: remove unused CSS files and update translations for Firecrawl and Jina Reader	2025-01-22 14:30:14 +08:00