# Test Generation Checklist
Use this checklist when generating or reviewing tests for Dify frontend components.
## Pre-Generation
- [ ] Read the component source code completely
- [ ] Identify component type (component, hook, utility, page)
- [ ] Run `pnpm analyze-component <path>` if available
- [ ] Note complexity score and features detected
- [ ] Check for existing tests in the same directory
- [ ] **Identify ALL files in the directory** that need testing (not just index)
## Testing Strategy
### ⚠️ Incremental Workflow (CRITICAL for Multi-File)
- [ ] **NEVER generate all tests at once** - process one file at a time
- [ ] Order files by complexity: utilities → hooks → simple → complex → integration
- [ ] Create a todo list to track progress before starting
- [ ] For EACH file: write → run test → verify pass → then next
- [ ] **DO NOT proceed** to next file until current one passes
### Path-Level Coverage
- [ ] **Test ALL files** in the assigned directory/path
- [ ] List all components, hooks, utilities that need coverage
- [ ] Decide: single spec file (integration) or multiple spec files (unit)
### Complexity Assessment
- [ ] Run `pnpm analyze-component <path>` for complexity score
- [ ] **Complexity > 50**: Consider refactoring before testing
- [ ] **500+ lines**: Consider splitting before testing
- [ ] **30-50 complexity**: Use multiple describe blocks, organized structure
### Integration vs Mocking
- [ ] **DO NOT mock base components** (`Loading`, `Button`, `Tooltip`, etc.)
- [ ] Import real project components instead of mocking
- [ ] Only mock: API calls, complex context providers, third-party libs with side effects
- [ ] Prefer integration testing when using single spec file
## Required Test Sections
### All Components MUST Have
- [ ] **Rendering tests** - Component renders without crashing
- [ ] **Props tests** - Required props, optional props, default values
- [ ] **Edge cases** - null, undefined, empty values, boundaries
### Conditional Sections (Add When Feature Present)
| Feature | Add Tests For |
|---------|---------------|
| `useState` | Initial state, transitions, cleanup |
| `useEffect` | Execution, dependencies, cleanup |
| Event handlers | onClick, onChange, onSubmit, keyboard |
| API calls | Loading, success, error states |
| Routing | Navigation, params, query strings |
| `useCallback`/`useMemo` | Referential equality |
| Context | Provider values, consumer behavior |
| Forms | Validation, submission, error display |
## Code Quality Checklist
### Structure
- [ ] Uses `describe` blocks to group related tests
- [ ] Test names follow `should <behavior> when <condition>` pattern
- [ ] AAA pattern (Arrange-Act-Assert) is clear
- [ ] Comments explain complex test scenarios
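A minimal sketch of the AAA pattern for reference (the `Greeting` component and its `name` prop are hypothetical):
```typescript
it('should greet the user when a name is provided', () => {
  // Arrange
  render(<Greeting name="Ada" />)
  // Act - query the rendered output (no user interaction needed here)
  const heading = screen.getByRole('heading')
  // Assert
  expect(heading).toHaveTextContent(/ada/i)
})
```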
### Mocks
- [ ] **DO NOT mock base components** (`@/app/components/base/*`)
- [ ] `jest.clearAllMocks()` in `beforeEach` (not `afterEach`)
- [ ] Shared mock state reset in `beforeEach`
- [ ] i18n uses shared mock (auto-loaded); only override locally for custom translations
- [ ] Router mocks match actual Next.js API
- [ ] Mocks reflect actual component conditional behavior
- [ ] Only mock: API services, complex context providers, third-party libs
### Queries
- [ ] Prefer semantic queries (`getByRole`, `getByLabelText`)
- [ ] Use `queryBy*` for absence assertions
- [ ] Use `findBy*` for async elements
- [ ] `getByTestId` only as last resort
### Async
- [ ] All async tests use `async/await`
- [ ] `waitFor` wraps async assertions
- [ ] Fake timers properly setup/teardown
- [ ] No floating promises
### TypeScript
- [ ] No `any` types without justification
- [ ] Mock data uses actual types from source
- [ ] Factory functions have proper return types
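For example, a typed factory keeps mock data aligned with the source types (a sketch; the `Doc` type and its import path are hypothetical):
```typescript
import type { Doc } from './types' // hypothetical source type

const createMockDoc = (overrides: Partial<Doc> = {}): Doc => ({
  id: 'doc-1',
  name: 'Test Doc',
  ...overrides,
})
```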
## Coverage Goals (Per File)
For the current file being tested:
- [ ] 100% function coverage
- [ ] 100% statement coverage
- [ ] >95% branch coverage
- [ ] >95% line coverage
## Post-Generation (Per File)
**Run these checks after EACH test file, not just at the end:**
- [ ] Run `pnpm test -- path/to/file.spec.tsx` - **MUST PASS before next file**
- [ ] Fix any failures immediately
- [ ] Mark file as complete in todo list
- [ ] Only then proceed to next file
### After All Files Complete
- [ ] Run full directory test: `pnpm test -- path/to/directory/`
- [ ] Check coverage report: `pnpm test -- --coverage`
- [ ] Run `pnpm lint:fix` on all test files
- [ ] Run `pnpm type-check:tsgo`
## Common Issues to Watch
### False Positives
```typescript
// ❌ Mock doesn't match actual behavior
jest.mock('./Component', () => () => <div>Mocked</div>)
// ✅ Mock matches actual conditional logic
jest.mock('./Component', () => ({ isOpen }: any) =>
  isOpen ? <div>Content</div> : null)
```
### State Leakage
```typescript
// ❌ Shared state not reset
let mockState = false
jest.mock('./useHook', () => () => mockState)
// ✅ Reset in beforeEach
beforeEach(() => {
  mockState = false
})
```
### Async Race Conditions
```typescript
// ❌ Not awaited
it('loads data', () => {
  render(<Component />)
  expect(screen.getByText('Data')).toBeInTheDocument()
})

// ✅ Properly awaited
it('loads data', async () => {
  render(<Component />)
  await waitFor(() => {
    expect(screen.getByText('Data')).toBeInTheDocument()
  })
})
```
### Missing Edge Cases
Always test these scenarios:
- `null` / `undefined` inputs
- Empty strings / arrays / objects
- Boundary values (0, -1, MAX_INT)
- Error states
- Loading states
- Disabled states
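A minimal sketch of such an edge-case block (the `List` component and its empty-state copy are hypothetical):
```typescript
describe('Edge Cases', () => {
  it.each([null, undefined, []])('should show empty state for %p', (items) => {
    render(<List items={items as any} />)
    expect(screen.getByText(/no items/i)).toBeInTheDocument()
  })

  it('should render a single item at the boundary', () => {
    render(<List items={[{ id: '0', name: 'Only' }]} />)
    expect(screen.getAllByRole('listitem')).toHaveLength(1)
  })
})
```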
## Quick Commands
```bash
# Run specific test
pnpm test -- path/to/file.spec.tsx
# Run with coverage
pnpm test -- --coverage path/to/file.spec.tsx
# Watch mode
pnpm test -- --watch path/to/file.spec.tsx
# Update snapshots (use sparingly)
pnpm test -- -u path/to/file.spec.tsx
# Analyze component
pnpm analyze-component path/to/component.tsx
# Review existing test
pnpm analyze-component path/to/component.tsx --review
```

---
name: Dify Frontend Testing
description: Generate Jest + React Testing Library tests for Dify frontend components, hooks, and utilities. Triggers on testing, spec files, coverage, Jest, RTL, unit tests, integration tests, or write/review test requests.
---
# Dify Frontend Testing Skill
This skill enables Claude to generate high-quality, comprehensive frontend tests for the Dify project following established conventions and best practices.
> **⚠️ Authoritative Source**: This skill is derived from `web/testing/testing.md`. When in doubt, always refer to that document as the canonical specification.
## When to Apply This Skill
Apply this skill when the user:
- Asks to **write tests** for a component, hook, or utility
- Asks to **review existing tests** for completeness
- Mentions **Jest**, **React Testing Library**, **RTL**, or **spec files**
- Requests **test coverage** improvement
- Uses `pnpm analyze-component` output as context
- Mentions **testing**, **unit tests**, or **integration tests** for frontend code
- Wants to understand **testing patterns** in the Dify codebase
**Do NOT apply** when:
- User is asking about backend/API tests (Python/pytest)
- User is asking about E2E tests (Playwright/Cypress)
- User is only asking conceptual questions without code context
## Quick Reference
### Tech Stack
| Tool | Version | Purpose |
|------|---------|---------|
| Jest | 29.7 | Test runner |
| React Testing Library | 16.0 | Component testing |
| happy-dom | - | Test environment |
| nock | 14.0 | HTTP mocking |
| TypeScript | 5.x | Type safety |
### Key Commands
```bash
# Run all tests
pnpm test
# Watch mode
pnpm test -- --watch
# Run specific file
pnpm test -- path/to/file.spec.tsx
# Generate coverage report
pnpm test -- --coverage
# Analyze component complexity
pnpm analyze-component <path>
# Review existing test
pnpm analyze-component <path> --review
```
### File Naming
- Test files: `ComponentName.spec.tsx` (same directory as component)
- Integration tests: `web/__tests__/` directory
## Test Structure Template
```typescript
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import Component from './index'

// ✅ Import real project components (DO NOT mock these)
// import Loading from '@/app/components/base/loading'
// import { ChildComponent } from './child-component'

// ✅ Mock external dependencies only
jest.mock('@/service/api')
jest.mock('next/navigation', () => ({
  useRouter: () => ({ push: jest.fn() }),
  usePathname: () => '/test',
}))

// Shared state for mocks (if needed)
let mockSharedState = false

describe('ComponentName', () => {
  beforeEach(() => {
    jest.clearAllMocks() // ✅ Reset mocks BEFORE each test
    mockSharedState = false // ✅ Reset shared state
  })

  // Rendering tests (REQUIRED)
  describe('Rendering', () => {
    it('should render without crashing', () => {
      // Arrange
      const props = { title: 'Test' }
      // Act
      render(<Component {...props} />)
      // Assert
      expect(screen.getByText('Test')).toBeInTheDocument()
    })
  })

  // Props tests (REQUIRED)
  describe('Props', () => {
    it('should apply custom className', () => {
      render(<Component className="custom" />)
      expect(screen.getByRole('button')).toHaveClass('custom')
    })
  })

  // User Interactions
  describe('User Interactions', () => {
    it('should handle click events', () => {
      const handleClick = jest.fn()
      render(<Component onClick={handleClick} />)
      fireEvent.click(screen.getByRole('button'))
      expect(handleClick).toHaveBeenCalledTimes(1)
    })
  })

  // Edge Cases (REQUIRED)
  describe('Edge Cases', () => {
    it('should handle null data', () => {
      render(<Component data={null} />)
      expect(screen.getByText(/no data/i)).toBeInTheDocument()
    })

    it('should handle empty array', () => {
      render(<Component items={[]} />)
      expect(screen.getByText(/empty/i)).toBeInTheDocument()
    })
  })
})
```
## Testing Workflow (CRITICAL)
### ⚠️ Incremental Approach Required
**NEVER generate all test files at once.** For complex components or multi-file directories:
1. **Analyze & Plan**: List all files, order by complexity (simple → complex)
2. **Process ONE at a time**: Write test → Run test → Fix if needed → Next
3. **Verify before proceeding**: Do NOT continue to next file until current passes
```
For each file:
┌────────────────────────────────────────┐
│ 1. Write test │
│ 2. Run: pnpm test -- <file>.spec.tsx │
│ 3. PASS? → Mark complete, next file │
│ FAIL? → Fix first, then continue │
└────────────────────────────────────────┘
```
### Complexity-Based Order
Process in this order for multi-file testing:
1. 🟢 Utility functions (simplest)
2. 🟢 Custom hooks
3. 🟡 Simple components (presentational)
4. 🟡 Medium components (state, effects)
5. 🔴 Complex components (API, routing)
6. 🔴 Integration tests (index files - last)
### When to Refactor First
- **Complexity > 50**: Break into smaller pieces before testing
- **500+ lines**: Consider splitting before testing
- **Many dependencies**: Extract logic into hooks first
> 📖 See `guides/workflow.md` for complete workflow details and todo list format.
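For the "extract logic into hooks" case, the point is to pull hard-to-reach branching out of the component so it can be tested in isolation. A sketch, assuming a hypothetical `useSelection` hook extracted from a large list component:
```typescript
import { useCallback, useState } from 'react'

// Extracted from the component; now testable directly with renderHook
export const useSelection = <T,>(initial: T[] = []) => {
  const [selected, setSelected] = useState<T[]>(initial)
  const toggle = useCallback((item: T) => {
    setSelected(prev =>
      prev.includes(item) ? prev.filter(i => i !== item) : [...prev, item])
  }, [])
  return { selected, toggle }
}
```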
## Testing Strategy
### Path-Level Testing (Directory Testing)
When assigned to test a directory/path, test **ALL content** within that path:
- Test all components, hooks, utilities in the directory (not just `index` file)
- Use incremental approach: one file at a time, verify each before proceeding
- Goal: 100% coverage of ALL files in the directory
### Integration Testing First
**Prefer integration testing** when writing tests for a directory:
- **Import real project components** directly (including base components and siblings)
- **Only mock**: API services (`@/service/*`), `next/navigation`, complex context providers
- **DO NOT mock** base components (`@/app/components/base/*`)
- **DO NOT mock** sibling/child components in the same directory
> See [Test Structure Template](#test-structure-template) for correct import/mock patterns.
## Core Principles
### 1. AAA Pattern (Arrange-Act-Assert)
Every test should clearly separate:
- **Arrange**: Setup test data and render component
- **Act**: Perform user actions
- **Assert**: Verify expected outcomes
### 2. Black-Box Testing
- Test observable behavior, not implementation details
- Use semantic queries (getByRole, getByLabelText)
- Avoid testing internal state directly
- **Prefer pattern matching over hardcoded strings** in assertions:
```typescript
// ❌ Avoid: hardcoded text assertions
expect(screen.getByText('Loading...')).toBeInTheDocument()
// ✅ Better: role-based queries
expect(screen.getByRole('status')).toBeInTheDocument()
// ✅ Better: pattern matching
expect(screen.getByText(/loading/i)).toBeInTheDocument()
```
### 3. Single Behavior Per Test
Each test verifies ONE user-observable behavior:
```typescript
// ✅ Good: One behavior
it('should disable button when loading', () => {
  render(<Button loading />)
  expect(screen.getByRole('button')).toBeDisabled()
})

// ❌ Bad: Multiple behaviors
it('should handle loading state', () => {
  render(<Button loading />)
  expect(screen.getByRole('button')).toBeDisabled()
  expect(screen.getByText('Loading...')).toBeInTheDocument()
  expect(screen.getByRole('button')).toHaveClass('loading')
})
```
### 4. Semantic Naming
Use `should <behavior> when <condition>`:
```typescript
it('should show error message when validation fails')
it('should call onSubmit when form is valid')
it('should disable input when isReadOnly is true')
```
## Required Test Scenarios
### Always Required (All Components)
1. **Rendering**: Component renders without crashing
2. **Props**: Required props, optional props, default values
3. **Edge Cases**: null, undefined, empty values, boundary conditions
### Conditional (When Present)
| Feature | Test Focus |
|---------|-----------|
| `useState` | Initial state, transitions, cleanup |
| `useEffect` | Execution, dependencies, cleanup |
| Event handlers | All onClick, onChange, onSubmit, keyboard |
| API calls | Loading, success, error states |
| Routing | Navigation, params, query strings |
| `useCallback`/`useMemo` | Referential equality |
| Context | Provider values, consumer behavior |
| Forms | Validation, submission, error display |
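For the `useCallback`/`useMemo` row, referential equality can be verified with `renderHook` from React Testing Library; a minimal sketch with a hypothetical `useHandlers` hook:
```typescript
import { renderHook } from '@testing-library/react'

it('should keep the same callback reference across rerenders', () => {
  const { result, rerender } = renderHook(() => useHandlers())
  const firstRef = result.current.onSave
  rerender()
  expect(result.current.onSave).toBe(firstRef) // useCallback preserved identity
})
```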
## Coverage Goals (Per File)
For each test file generated, aim for:
- **100%** function coverage
- **100%** statement coverage
- **>95%** branch coverage
- **>95%** line coverage
> **Note**: For multi-file directories, process one file at a time with full coverage each. See `guides/workflow.md`.
## Detailed Guides
For more detailed information, refer to:
- `guides/workflow.md` - **Incremental testing workflow** (MUST READ for multi-file testing)
- `guides/mocking.md` - Mock patterns and best practices
- `guides/async-testing.md` - Async operations and API calls
- `guides/domain-components.md` - Workflow, Dataset, Configuration testing
- `guides/common-patterns.md` - Frequently used testing patterns
## Authoritative References
### Primary Specification (MUST follow)
- **`web/testing/testing.md`** - The canonical testing specification. This skill is derived from this document.
### Reference Examples in Codebase
- `web/utils/classnames.spec.ts` - Utility function tests
- `web/app/components/base/button/index.spec.tsx` - Component tests
- `web/__mocks__/provider-context.ts` - Mock factory example
### Project Configuration
- `web/jest.config.ts` - Jest configuration
- `web/jest.setup.ts` - Test environment setup
- `web/testing/analyze-component.js` - Component analysis tool
- `web/__mocks__/react-i18next.ts` - Shared i18n mock (auto-loaded by Jest, no explicit mock needed; override locally only for custom translations)

# Async Testing Guide
## Core Async Patterns
### 1. waitFor - Wait for Condition
```typescript
import { render, screen, waitFor } from '@testing-library/react'
it('should load and display data', async () => {
  render(<DataComponent />)
  // Wait for element to appear
  await waitFor(() => {
    expect(screen.getByText('Loaded Data')).toBeInTheDocument()
  })
})

it('should hide loading spinner after load', async () => {
  render(<DataComponent />)
  // Wait for element to disappear
  await waitFor(() => {
    expect(screen.queryByText('Loading...')).not.toBeInTheDocument()
  })
})
```
### 2. findBy\* - Async Queries
```typescript
it('should show user name after fetch', async () => {
  render(<UserProfile />)

  // findBy returns a promise, auto-waits up to 1000ms
  const userName = await screen.findByText('John Doe')
  expect(userName).toBeInTheDocument()

  // findByRole with options
  const button = await screen.findByRole('button', { name: /submit/i })
  expect(button).toBeEnabled()
})
```
### 3. userEvent for Async Interactions
```typescript
import userEvent from '@testing-library/user-event'
it('should submit form', async () => {
  const user = userEvent.setup()
  const onSubmit = jest.fn()
  render(<Form onSubmit={onSubmit} />)

  // userEvent methods are async
  await user.type(screen.getByLabelText('Email'), 'test@example.com')
  await user.click(screen.getByRole('button', { name: /submit/i }))

  await waitFor(() => {
    expect(onSubmit).toHaveBeenCalledWith({ email: 'test@example.com' })
  })
})
```
## Fake Timers
### When to Use Fake Timers
- Testing components with `setTimeout`/`setInterval`
- Testing debounce/throttle behavior
- Testing animations or delayed transitions
- Testing polling or retry logic
### Basic Fake Timer Setup
```typescript
describe('Debounced Search', () => {
  beforeEach(() => {
    jest.useFakeTimers()
  })

  afterEach(() => {
    jest.useRealTimers()
  })

  it('should debounce search input', () => {
    const onSearch = jest.fn()
    render(<SearchInput onSearch={onSearch} debounceMs={300} />)

    // Type in the input
    fireEvent.change(screen.getByRole('textbox'), { target: { value: 'query' } })

    // Search not called immediately
    expect(onSearch).not.toHaveBeenCalled()

    // Advance timers
    jest.advanceTimersByTime(300)

    // Now search is called
    expect(onSearch).toHaveBeenCalledWith('query')
  })
})
```
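When a test drives the debounced input through `userEvent` instead of `fireEvent`, the user instance must be configured to advance the fake clock, otherwise its internal key delays never resolve. A sketch using the same hypothetical `SearchInput`:
```typescript
it('should debounce search typed via userEvent', async () => {
  jest.useFakeTimers()
  // Let userEvent advance the fake clock during its internal delays
  const user = userEvent.setup({ advanceTimers: jest.advanceTimersByTime })
  const onSearch = jest.fn()
  render(<SearchInput onSearch={onSearch} debounceMs={300} />)

  await user.type(screen.getByRole('textbox'), 'query')
  jest.advanceTimersByTime(300)

  expect(onSearch).toHaveBeenCalledWith('query')
  jest.useRealTimers()
})
```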
### Fake Timers with Async Code
```typescript
it('should retry on failure', async () => {
  jest.useFakeTimers()
  const fetchData = jest.fn()
    .mockRejectedValueOnce(new Error('Network error'))
    .mockResolvedValueOnce({ data: 'success' })
  render(<RetryComponent fetchData={fetchData} retryDelayMs={1000} />)

  // First call fails
  await waitFor(() => {
    expect(fetchData).toHaveBeenCalledTimes(1)
  })

  // Advance timer for retry
  jest.advanceTimersByTime(1000)

  // Second call succeeds
  await waitFor(() => {
    expect(fetchData).toHaveBeenCalledTimes(2)
    expect(screen.getByText('success')).toBeInTheDocument()
  })

  jest.useRealTimers()
})
```
### Common Fake Timer Utilities
```typescript
// Run all pending timers
jest.runAllTimers()
// Run only pending timers (not new ones created during execution)
jest.runOnlyPendingTimers()
// Advance by specific time
jest.advanceTimersByTime(1000)
// Get current fake time
jest.now()
// Clear all timers
jest.clearAllTimers()
```
## API Testing Patterns
### Loading → Success → Error States
```typescript
describe('DataFetcher', () => {
  beforeEach(() => {
    jest.clearAllMocks()
  })

  it('should show loading state', () => {
    mockedApi.fetchData.mockImplementation(() => new Promise(() => {})) // Never resolves
    render(<DataFetcher />)
    expect(screen.getByTestId('loading-spinner')).toBeInTheDocument()
  })

  it('should show data on success', async () => {
    mockedApi.fetchData.mockResolvedValue({ items: ['Item 1', 'Item 2'] })
    render(<DataFetcher />)

    // Use findBy* for multiple async elements (better error messages than waitFor with multiple assertions)
    const item1 = await screen.findByText('Item 1')
    const item2 = await screen.findByText('Item 2')
    expect(item1).toBeInTheDocument()
    expect(item2).toBeInTheDocument()
    expect(screen.queryByTestId('loading-spinner')).not.toBeInTheDocument()
  })

  it('should show error on failure', async () => {
    mockedApi.fetchData.mockRejectedValue(new Error('Failed to fetch'))
    render(<DataFetcher />)
    await waitFor(() => {
      expect(screen.getByText(/failed to fetch/i)).toBeInTheDocument()
    })
  })

  it('should retry on error', async () => {
    mockedApi.fetchData.mockRejectedValue(new Error('Network error'))
    render(<DataFetcher />)
    await waitFor(() => {
      expect(screen.getByRole('button', { name: /retry/i })).toBeInTheDocument()
    })

    mockedApi.fetchData.mockResolvedValue({ items: ['Item 1'] })
    fireEvent.click(screen.getByRole('button', { name: /retry/i }))

    await waitFor(() => {
      expect(screen.getByText('Item 1')).toBeInTheDocument()
    })
  })
})
```
### Testing Mutations
```typescript
it('should submit form and show success', async () => {
  const user = userEvent.setup()
  mockedApi.createItem.mockResolvedValue({ id: '1', name: 'New Item' })
  render(<CreateItemForm />)

  await user.type(screen.getByLabelText('Name'), 'New Item')
  await user.click(screen.getByRole('button', { name: /create/i }))

  // Button should be disabled during submission
  expect(screen.getByRole('button', { name: /creating/i })).toBeDisabled()

  await waitFor(() => {
    expect(screen.getByText(/created successfully/i)).toBeInTheDocument()
  })

  expect(mockedApi.createItem).toHaveBeenCalledWith({ name: 'New Item' })
})
```
## useEffect Testing
### Testing Effect Execution
```typescript
it('should fetch data on mount', async () => {
  const fetchData = jest.fn().mockResolvedValue({ data: 'test' })
  render(<ComponentWithEffect fetchData={fetchData} />)

  await waitFor(() => {
    expect(fetchData).toHaveBeenCalledTimes(1)
  })
})
```
### Testing Effect Dependencies
```typescript
it('should refetch when id changes', async () => {
  const fetchData = jest.fn().mockResolvedValue({ data: 'test' })
  const { rerender } = render(<ComponentWithEffect id="1" fetchData={fetchData} />)

  await waitFor(() => {
    expect(fetchData).toHaveBeenCalledWith('1')
  })

  rerender(<ComponentWithEffect id="2" fetchData={fetchData} />)

  await waitFor(() => {
    expect(fetchData).toHaveBeenCalledWith('2')
    expect(fetchData).toHaveBeenCalledTimes(2)
  })
})
```
### Testing Effect Cleanup
```typescript
it('should cleanup subscription on unmount', () => {
  const subscribe = jest.fn()
  const unsubscribe = jest.fn()
  subscribe.mockReturnValue(unsubscribe)

  const { unmount } = render(<SubscriptionComponent subscribe={subscribe} />)
  expect(subscribe).toHaveBeenCalledTimes(1)

  unmount()
  expect(unsubscribe).toHaveBeenCalledTimes(1)
})
```
## Common Async Pitfalls
### ❌ Don't: Forget to await
```typescript
// Bad - test may pass even if assertion fails
it('should load data', () => {
  render(<Component />)
  waitFor(() => {
    expect(screen.getByText('Data')).toBeInTheDocument()
  })
})

// Good - properly awaited
it('should load data', async () => {
  render(<Component />)
  await waitFor(() => {
    expect(screen.getByText('Data')).toBeInTheDocument()
  })
})
```
### ❌ Don't: Use multiple assertions in single waitFor
```typescript
// Bad - if first assertion fails, won't know about second
await waitFor(() => {
  expect(screen.getByText('Title')).toBeInTheDocument()
  expect(screen.getByText('Description')).toBeInTheDocument()
})

// Good - separate waitFor or use findBy
const title = await screen.findByText('Title')
const description = await screen.findByText('Description')
expect(title).toBeInTheDocument()
expect(description).toBeInTheDocument()
```
### ❌ Don't: Mix fake timers with real async
```typescript
// Bad - fake timers don't work well with real Promises
jest.useFakeTimers()
await waitFor(() => {
  expect(screen.getByText('Data')).toBeInTheDocument()
}) // May timeout!

// Good - use runAllTimers or advanceTimersByTime
jest.useFakeTimers()
render(<Component />)
jest.runAllTimers()
expect(screen.getByText('Data')).toBeInTheDocument()
```
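If advancing the timers schedules promise resolutions (for example a `setTimeout` that kicks off a fetch), Jest 29's async timer helpers flush both in one step; a sketch:
```typescript
// Async variant: flushes timers AND awaits the microtasks they schedule
jest.useFakeTimers()
render(<Component />)
await jest.runAllTimersAsync()
expect(screen.getByText('Data')).toBeInTheDocument()
```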

# Common Testing Patterns
## Query Priority
Use queries in this order (most to least preferred):
```typescript
// 1. getByRole - Most recommended (accessibility)
screen.getByRole('button', { name: /submit/i })
screen.getByRole('textbox', { name: /email/i })
screen.getByRole('heading', { level: 1 })
// 2. getByLabelText - Form fields
screen.getByLabelText('Email address')
screen.getByLabelText(/password/i)
// 3. getByPlaceholderText - When no label
screen.getByPlaceholderText('Search...')
// 4. getByText - Non-interactive elements
screen.getByText('Welcome to Dify')
screen.getByText(/loading/i)
// 5. getByDisplayValue - Current input value
screen.getByDisplayValue('current value')
// 6. getByAltText - Images
screen.getByAltText('Company logo')
// 7. getByTitle - Tooltip elements
screen.getByTitle('Close')
// 8. getByTestId - Last resort only!
screen.getByTestId('custom-element')
```
## Event Handling Patterns
### Click Events
```typescript
// Basic click
fireEvent.click(screen.getByRole('button'))
// With userEvent (preferred for realistic interaction)
const user = userEvent.setup()
await user.click(screen.getByRole('button'))
// Double click
await user.dblClick(screen.getByRole('button'))
// Right click
await user.pointer({ keys: '[MouseRight]', target: screen.getByRole('button') })
```
### Form Input
```typescript
const user = userEvent.setup()
// Type in input
await user.type(screen.getByRole('textbox'), 'Hello World')
// Clear and type
await user.clear(screen.getByRole('textbox'))
await user.type(screen.getByRole('textbox'), 'New value')
// Select option
await user.selectOptions(screen.getByRole('combobox'), 'option-value')
// Check checkbox
await user.click(screen.getByRole('checkbox'))
// Upload file
const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
await user.upload(screen.getByLabelText(/upload/i), file)
```
### Keyboard Events
```typescript
const user = userEvent.setup()
// Press Enter
await user.keyboard('{Enter}')
// Press Escape
await user.keyboard('{Escape}')
// Keyboard shortcut
await user.keyboard('{Control>}a{/Control}') // Ctrl+A
// Tab navigation
await user.tab()
// Arrow keys
await user.keyboard('{ArrowDown}')
await user.keyboard('{ArrowUp}')
```
## Component State Testing
### Testing State Transitions
```typescript
describe('Counter', () => {
  it('should increment count', async () => {
    const user = userEvent.setup()
    render(<Counter initialCount={0} />)

    // Initial state
    expect(screen.getByText('Count: 0')).toBeInTheDocument()

    // Trigger transition
    await user.click(screen.getByRole('button', { name: /increment/i }))

    // New state
    expect(screen.getByText('Count: 1')).toBeInTheDocument()
  })
})
```
### Testing Controlled Components
```typescript
describe('ControlledInput', () => {
  it('should call onChange with new value', async () => {
    const user = userEvent.setup()
    const handleChange = jest.fn()
    render(<ControlledInput value="" onChange={handleChange} />)

    await user.type(screen.getByRole('textbox'), 'a')
    expect(handleChange).toHaveBeenCalledWith('a')
  })

  it('should display controlled value', () => {
    render(<ControlledInput value="controlled" onChange={jest.fn()} />)
    expect(screen.getByRole('textbox')).toHaveValue('controlled')
  })
})
```
## Conditional Rendering Testing
```typescript
describe('ConditionalComponent', () => {
  it('should show loading state', () => {
    render(<DataDisplay isLoading={true} data={null} />)
    expect(screen.getByText(/loading/i)).toBeInTheDocument()
    expect(screen.queryByTestId('data-content')).not.toBeInTheDocument()
  })

  it('should show error state', () => {
    render(<DataDisplay isLoading={false} data={null} error="Failed to load" />)
    expect(screen.getByText(/failed to load/i)).toBeInTheDocument()
  })

  it('should show data when loaded', () => {
    render(<DataDisplay isLoading={false} data={{ name: 'Test' }} />)
    expect(screen.getByText('Test')).toBeInTheDocument()
  })

  it('should show empty state when no data', () => {
    render(<DataDisplay isLoading={false} data={[]} />)
    expect(screen.getByText(/no data/i)).toBeInTheDocument()
  })
})
```
## List Rendering Testing
```typescript
describe('ItemList', () => {
  const items = [
    { id: '1', name: 'Item 1' },
    { id: '2', name: 'Item 2' },
    { id: '3', name: 'Item 3' },
  ]

  it('should render all items', () => {
    render(<ItemList items={items} />)
    expect(screen.getAllByRole('listitem')).toHaveLength(3)
    items.forEach(item => {
      expect(screen.getByText(item.name)).toBeInTheDocument()
    })
  })

  it('should handle item selection', async () => {
    const user = userEvent.setup()
    const onSelect = jest.fn()
    render(<ItemList items={items} onSelect={onSelect} />)

    await user.click(screen.getByText('Item 2'))
    expect(onSelect).toHaveBeenCalledWith(items[1])
  })

  it('should handle empty list', () => {
    render(<ItemList items={[]} />)
    expect(screen.getByText(/no items/i)).toBeInTheDocument()
  })
})
```
## Modal/Dialog Testing
```typescript
describe('Modal', () => {
  it('should not render when closed', () => {
    render(<Modal isOpen={false} onClose={jest.fn()} />)
    expect(screen.queryByRole('dialog')).not.toBeInTheDocument()
  })

  it('should render when open', () => {
    render(<Modal isOpen={true} onClose={jest.fn()} />)
    expect(screen.getByRole('dialog')).toBeInTheDocument()
  })

  it('should call onClose when clicking overlay', async () => {
    const user = userEvent.setup()
    const handleClose = jest.fn()
    render(<Modal isOpen={true} onClose={handleClose} />)

    await user.click(screen.getByTestId('modal-overlay'))
    expect(handleClose).toHaveBeenCalled()
  })

  it('should call onClose when pressing Escape', async () => {
    const user = userEvent.setup()
    const handleClose = jest.fn()
    render(<Modal isOpen={true} onClose={handleClose} />)

    await user.keyboard('{Escape}')
    expect(handleClose).toHaveBeenCalled()
  })

  it('should trap focus inside modal', async () => {
    const user = userEvent.setup()
    render(
      <Modal isOpen={true} onClose={jest.fn()}>
        <button>First</button>
        <button>Second</button>
      </Modal>,
    )

    // Focus should cycle within modal
    await user.tab()
    expect(screen.getByText('First')).toHaveFocus()
    await user.tab()
    expect(screen.getByText('Second')).toHaveFocus()
    await user.tab()
    expect(screen.getByText('First')).toHaveFocus() // Cycles back
  })
})
```
## Form Testing
```typescript
describe('LoginForm', () => {
  it('should submit valid form', async () => {
    const user = userEvent.setup()
    const onSubmit = jest.fn()
    render(<LoginForm onSubmit={onSubmit} />)

    await user.type(screen.getByLabelText(/email/i), 'test@example.com')
    await user.type(screen.getByLabelText(/password/i), 'password123')
    await user.click(screen.getByRole('button', { name: /sign in/i }))

    expect(onSubmit).toHaveBeenCalledWith({
      email: 'test@example.com',
      password: 'password123',
    })
  })

  it('should show validation errors', async () => {
    const user = userEvent.setup()
    render(<LoginForm onSubmit={jest.fn()} />)

    // Submit empty form
    await user.click(screen.getByRole('button', { name: /sign in/i }))

    expect(screen.getByText(/email is required/i)).toBeInTheDocument()
    expect(screen.getByText(/password is required/i)).toBeInTheDocument()
  })

  it('should validate email format', async () => {
    const user = userEvent.setup()
    render(<LoginForm onSubmit={jest.fn()} />)

    await user.type(screen.getByLabelText(/email/i), 'invalid-email')
    await user.click(screen.getByRole('button', { name: /sign in/i }))

    expect(screen.getByText(/invalid email/i)).toBeInTheDocument()
  })

  it('should disable submit button while submitting', async () => {
    const user = userEvent.setup()
    const onSubmit = jest.fn(() => new Promise(resolve => setTimeout(resolve, 100)))
    render(<LoginForm onSubmit={onSubmit} />)

    await user.type(screen.getByLabelText(/email/i), 'test@example.com')
    await user.type(screen.getByLabelText(/password/i), 'password123')
    await user.click(screen.getByRole('button', { name: /sign in/i }))

    expect(screen.getByRole('button', { name: /signing in/i })).toBeDisabled()

    await waitFor(() => {
      expect(screen.getByRole('button', { name: /sign in/i })).toBeEnabled()
    })
  })
})
```
## Data-Driven Tests with test.each
```typescript
describe('StatusBadge', () => {
  test.each([
    ['success', 'bg-green-500'],
    ['warning', 'bg-yellow-500'],
    ['error', 'bg-red-500'],
    ['info', 'bg-blue-500'],
  ])('should apply correct class for %s status', (status, expectedClass) => {
    render(<StatusBadge status={status} />)
    expect(screen.getByTestId('status-badge')).toHaveClass(expectedClass)
  })

  test.each([
    { input: null, expected: 'Unknown' },
    { input: undefined, expected: 'Unknown' },
    { input: '', expected: 'Unknown' },
    { input: 'invalid', expected: 'Unknown' },
  ])('should show "Unknown" for invalid input: $input', ({ input, expected }) => {
    render(<StatusBadge status={input} />)
    expect(screen.getByText(expected)).toBeInTheDocument()
  })
})
```
## Debugging Tips
```typescript
// Print entire DOM
screen.debug()
// Print specific element
screen.debug(screen.getByRole('button'))
// Log testing playground URL
screen.logTestingPlaygroundURL()
// Pretty print DOM
import { prettyDOM } from '@testing-library/react'
console.log(prettyDOM(screen.getByRole('dialog')))
// Check available roles
import { getRoles } from '@testing-library/react'
console.log(getRoles(container))
```
## Common Mistakes to Avoid
### ❌ Don't Use Implementation Details
```typescript
// Bad - testing implementation
expect(component.state.isOpen).toBe(true)
expect(wrapper.find('.internal-class').length).toBe(1)
// Good - testing behavior
expect(screen.getByRole('dialog')).toBeInTheDocument()
```
### ❌ Don't Forget Cleanup
```typescript
// Bad - may leak state between tests
it('test 1', () => {
  render(<Component />)
})

// Good - cleanup is automatic with RTL, but reset mocks
beforeEach(() => {
  jest.clearAllMocks()
})
```
### ❌ Don't Use Exact String Matching (Prefer Black-Box Assertions)
```typescript
// ❌ Bad - hardcoded strings are brittle
expect(screen.getByText('Submit Form')).toBeInTheDocument()
expect(screen.getByText('Loading...')).toBeInTheDocument()
// ✅ Good - role-based queries (most semantic)
expect(screen.getByRole('button', { name: /submit/i })).toBeInTheDocument()
expect(screen.getByRole('status')).toBeInTheDocument()
// ✅ Good - pattern matching (flexible)
expect(screen.getByText(/submit/i)).toBeInTheDocument()
expect(screen.getByText(/loading/i)).toBeInTheDocument()
// ✅ Good - test behavior, not exact UI text
expect(screen.getByRole('button')).toBeDisabled()
expect(screen.getByRole('alert')).toBeInTheDocument()
```
**Why prefer black-box assertions?**
- Text content may change (i18n, copy updates)
- Role-based queries test accessibility
- Pattern matching is resilient to minor changes
- Tests focus on behavior, not implementation details
### ❌ Don't Assert on Absence Without Query
```typescript
// Bad - throws if not found
expect(screen.getByText('Error')).not.toBeInTheDocument() // Error!
// Good - use queryBy for absence assertions
expect(screen.queryByText('Error')).not.toBeInTheDocument()
```

# Domain-Specific Component Testing
This guide covers testing patterns for Dify's domain-specific components.
## Workflow Components (`workflow/`)
Workflow components handle node configuration, data flow, and graph operations.
### Key Test Areas
1. **Node Configuration**
2. **Data Validation**
3. **Variable Passing**
4. **Edge Connections**
5. **Error Handling**
### Example: Node Configuration Panel
```typescript
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import NodeConfigPanel from './node-config-panel'
import { createMockNode, createMockWorkflowContext } from '@/__mocks__/workflow'

// Mock workflow context
jest.mock('@/app/components/workflow/hooks', () => ({
  useWorkflowStore: () => mockWorkflowStore,
  useNodesInteractions: () => mockNodesInteractions,
}))

let mockWorkflowStore = {
  nodes: [],
  edges: [],
  updateNode: jest.fn(),
}
let mockNodesInteractions = {
  handleNodeSelect: jest.fn(),
  handleNodeDelete: jest.fn(),
}

describe('NodeConfigPanel', () => {
  beforeEach(() => {
    jest.clearAllMocks()
    mockWorkflowStore = {
      nodes: [],
      edges: [],
      updateNode: jest.fn(),
    }
  })

  describe('Node Configuration', () => {
    it('should render node type selector', () => {
      const node = createMockNode({ type: 'llm' })
      render(<NodeConfigPanel node={node} />)
      expect(screen.getByLabelText(/model/i)).toBeInTheDocument()
    })

    it('should update node config on change', async () => {
      const user = userEvent.setup()
      const node = createMockNode({ type: 'llm' })
      render(<NodeConfigPanel node={node} />)

      await user.selectOptions(screen.getByLabelText(/model/i), 'gpt-4')

      expect(mockWorkflowStore.updateNode).toHaveBeenCalledWith(
        node.id,
        expect.objectContaining({ model: 'gpt-4' }),
      )
    })
  })

  describe('Data Validation', () => {
    it('should show error for invalid input', async () => {
      const user = userEvent.setup()
      const node = createMockNode({ type: 'code' })
      render(<NodeConfigPanel node={node} />)

      // Enter invalid code ("{{" escapes a literal "{" in userEvent.type)
      const codeInput = screen.getByLabelText(/code/i)
      await user.clear(codeInput)
      await user.type(codeInput, 'invalid syntax {{{{{{')

      await waitFor(() => {
        expect(screen.getByText(/syntax error/i)).toBeInTheDocument()
      })
    })

    it('should validate required fields', async () => {
      const node = createMockNode({ type: 'http', data: { url: '' } })
      render(<NodeConfigPanel node={node} />)

      fireEvent.click(screen.getByRole('button', { name: /save/i }))

      await waitFor(() => {
        expect(screen.getByText(/url is required/i)).toBeInTheDocument()
      })
    })
  })

  describe('Variable Passing', () => {
    it('should display available variables from upstream nodes', () => {
      const upstreamNode = createMockNode({
        id: 'node-1',
        type: 'start',
        data: { outputs: [{ name: 'user_input', type: 'string' }] },
      })
      const currentNode = createMockNode({
        id: 'node-2',
        type: 'llm',
      })
      mockWorkflowStore.nodes = [upstreamNode, currentNode]
      mockWorkflowStore.edges = [{ source: 'node-1', target: 'node-2' }]

      render(<NodeConfigPanel node={currentNode} />)

      // Variable selector should show upstream variables
      fireEvent.click(screen.getByRole('button', { name: /add variable/i }))
      expect(screen.getByText('user_input')).toBeInTheDocument()
    })

    it('should insert variable into prompt template', async () => {
      const user = userEvent.setup()
      const node = createMockNode({ type: 'llm' })
      render(<NodeConfigPanel node={node} />)

      // Click variable button
      await user.click(screen.getByRole('button', { name: /insert variable/i }))
      await user.click(screen.getByText('user_input'))

      const promptInput = screen.getByLabelText(/prompt/i)
      expect(promptInput).toHaveValue(expect.stringContaining('{{user_input}}'))
    })
  })
})
```
## Dataset Components (`dataset/`)
Dataset components handle file uploads, data display, and search/filter operations.
### Key Test Areas
1. **File Upload**
2. **File Type Validation**
3. **Pagination**
4. **Search & Filtering**
5. **Data Format Handling**
### Example: Document Uploader
```typescript
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import DocumentUploader from './document-uploader'

jest.mock('@/service/datasets', () => ({
  uploadDocument: jest.fn(),
  parseDocument: jest.fn(),
}))

import * as datasetService from '@/service/datasets'
const mockedService = datasetService as jest.Mocked<typeof datasetService>

describe('DocumentUploader', () => {
  beforeEach(() => {
    jest.clearAllMocks()
  })

  describe('File Upload', () => {
    it('should accept valid file types', async () => {
      const user = userEvent.setup()
      const onUpload = jest.fn()
      mockedService.uploadDocument.mockResolvedValue({ id: 'doc-1' })
      render(<DocumentUploader onUpload={onUpload} />)

      const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
      const input = screen.getByLabelText(/upload/i)
      await user.upload(input, file)

      await waitFor(() => {
        expect(mockedService.uploadDocument).toHaveBeenCalledWith(
          expect.any(FormData),
        )
      })
    })

    it('should reject invalid file types', async () => {
      const user = userEvent.setup()
      render(<DocumentUploader />)

      const file = new File(['content'], 'test.exe', { type: 'application/x-msdownload' })
      const input = screen.getByLabelText(/upload/i)
      await user.upload(input, file)

      expect(screen.getByText(/unsupported file type/i)).toBeInTheDocument()
      expect(mockedService.uploadDocument).not.toHaveBeenCalled()
    })

    it('should show upload progress', async () => {
      const user = userEvent.setup()
      // Mock upload with progress
      mockedService.uploadDocument.mockImplementation(() => {
        return new Promise((resolve) => {
          setTimeout(() => resolve({ id: 'doc-1' }), 100)
        })
      })
      render(<DocumentUploader />)

      const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
      await user.upload(screen.getByLabelText(/upload/i), file)

      expect(screen.getByRole('progressbar')).toBeInTheDocument()

      await waitFor(() => {
        expect(screen.queryByRole('progressbar')).not.toBeInTheDocument()
      })
    })
  })

  describe('Error Handling', () => {
    it('should handle upload failure', async () => {
      const user = userEvent.setup()
      mockedService.uploadDocument.mockRejectedValue(new Error('Upload failed'))
      render(<DocumentUploader />)

      const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
      await user.upload(screen.getByLabelText(/upload/i), file)

      await waitFor(() => {
        expect(screen.getByText(/upload failed/i)).toBeInTheDocument()
      })
    })

    it('should allow retry after failure', async () => {
      const user = userEvent.setup()
      mockedService.uploadDocument
        .mockRejectedValueOnce(new Error('Network error'))
        .mockResolvedValueOnce({ id: 'doc-1' })
      render(<DocumentUploader />)

      const file = new File(['content'], 'test.pdf', { type: 'application/pdf' })
      await user.upload(screen.getByLabelText(/upload/i), file)

      await waitFor(() => {
        expect(screen.getByRole('button', { name: /retry/i })).toBeInTheDocument()
      })

      await user.click(screen.getByRole('button', { name: /retry/i }))

      await waitFor(() => {
        expect(screen.getByText(/uploaded successfully/i)).toBeInTheDocument()
      })
    })
  })
})
```
### Example: Document List with Pagination
```typescript
describe('DocumentList', () => {
  describe('Pagination', () => {
    it('should load first page on mount', async () => {
      mockedService.getDocuments.mockResolvedValue({
        data: [{ id: '1', name: 'Doc 1' }],
        total: 50,
        page: 1,
        pageSize: 10,
      })
      render(<DocumentList datasetId="ds-1" />)

      await waitFor(() => {
        expect(screen.getByText('Doc 1')).toBeInTheDocument()
      })
      expect(mockedService.getDocuments).toHaveBeenCalledWith('ds-1', { page: 1 })
    })

    it('should navigate to next page', async () => {
      const user = userEvent.setup()
      mockedService.getDocuments.mockResolvedValue({
        data: [{ id: '1', name: 'Doc 1' }],
        total: 50,
        page: 1,
        pageSize: 10,
      })
      render(<DocumentList datasetId="ds-1" />)

      await waitFor(() => {
        expect(screen.getByText('Doc 1')).toBeInTheDocument()
      })

      mockedService.getDocuments.mockResolvedValue({
        data: [{ id: '11', name: 'Doc 11' }],
        total: 50,
        page: 2,
        pageSize: 10,
      })
      await user.click(screen.getByRole('button', { name: /next/i }))

      await waitFor(() => {
        expect(screen.getByText('Doc 11')).toBeInTheDocument()
      })
    })
  })

  describe('Search & Filtering', () => {
    it('should filter by search query', async () => {
      // Enable fake timers BEFORE setup, and let userEvent advance them
      jest.useFakeTimers()
      const user = userEvent.setup({ advanceTimers: jest.advanceTimersByTime })
      render(<DocumentList datasetId="ds-1" />)

      await user.type(screen.getByPlaceholderText(/search/i), 'test query')

      // Debounce
      jest.advanceTimersByTime(300)

      await waitFor(() => {
        expect(mockedService.getDocuments).toHaveBeenCalledWith(
          'ds-1',
          expect.objectContaining({ search: 'test query' }),
        )
      })

      jest.useRealTimers()
    })
  })
})
```
## Configuration Components (`app/configuration/`, `config/`)
Configuration components handle forms, validation, and data persistence.
### Key Test Areas
1. **Form Validation**
2. **Save/Reset**
3. **Required vs Optional Fields**
4. **Configuration Persistence**
5. **Error Feedback**
### Example: App Configuration Form
```typescript
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import AppConfigForm from './app-config-form'

jest.mock('@/service/apps', () => ({
  updateAppConfig: jest.fn(),
  getAppConfig: jest.fn(),
}))

import * as appService from '@/service/apps'
const mockedService = appService as jest.Mocked<typeof appService>

describe('AppConfigForm', () => {
  const defaultConfig = {
    name: 'My App',
    description: '',
    icon: 'default',
    openingStatement: '',
  }

  beforeEach(() => {
    jest.clearAllMocks()
    mockedService.getAppConfig.mockResolvedValue(defaultConfig)
  })

  describe('Form Validation', () => {
    it('should require app name', async () => {
      const user = userEvent.setup()
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      // Clear name field
      await user.clear(screen.getByLabelText(/name/i))
      await user.click(screen.getByRole('button', { name: /save/i }))

      expect(screen.getByText(/name is required/i)).toBeInTheDocument()
      expect(mockedService.updateAppConfig).not.toHaveBeenCalled()
    })

    it('should validate name length', async () => {
      const user = userEvent.setup()
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toBeInTheDocument()
      })

      // Enter very long name
      await user.clear(screen.getByLabelText(/name/i))
      await user.type(screen.getByLabelText(/name/i), 'a'.repeat(101))

      expect(screen.getByText(/name must be less than 100 characters/i)).toBeInTheDocument()
    })

    it('should allow empty optional fields', async () => {
      const user = userEvent.setup()
      mockedService.updateAppConfig.mockResolvedValue({ success: true })
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      // Leave description empty (optional)
      await user.click(screen.getByRole('button', { name: /save/i }))

      await waitFor(() => {
        expect(mockedService.updateAppConfig).toHaveBeenCalled()
      })
    })
  })

  describe('Save/Reset Functionality', () => {
    it('should save configuration', async () => {
      const user = userEvent.setup()
      mockedService.updateAppConfig.mockResolvedValue({ success: true })
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      await user.clear(screen.getByLabelText(/name/i))
      await user.type(screen.getByLabelText(/name/i), 'Updated App')
      await user.click(screen.getByRole('button', { name: /save/i }))

      await waitFor(() => {
        expect(mockedService.updateAppConfig).toHaveBeenCalledWith(
          'app-1',
          expect.objectContaining({ name: 'Updated App' }),
        )
      })
      expect(screen.getByText(/saved successfully/i)).toBeInTheDocument()
    })

    it('should reset to default values', async () => {
      const user = userEvent.setup()
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      // Make changes
      await user.clear(screen.getByLabelText(/name/i))
      await user.type(screen.getByLabelText(/name/i), 'Changed Name')

      // Reset
      await user.click(screen.getByRole('button', { name: /reset/i }))
      expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
    })

    it('should show unsaved changes warning', async () => {
      const user = userEvent.setup()
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      // Make changes
      await user.type(screen.getByLabelText(/name/i), ' Updated')

      expect(screen.getByText(/unsaved changes/i)).toBeInTheDocument()
    })
  })

  describe('Error Handling', () => {
    it('should show error on save failure', async () => {
      const user = userEvent.setup()
      mockedService.updateAppConfig.mockRejectedValue(new Error('Server error'))
      render(<AppConfigForm appId="app-1" />)

      await waitFor(() => {
        expect(screen.getByLabelText(/name/i)).toHaveValue('My App')
      })

      await user.click(screen.getByRole('button', { name: /save/i }))

      await waitFor(() => {
        expect(screen.getByText(/failed to save/i)).toBeInTheDocument()
      })
    })
  })
})
```

# Mocking Guide for Dify Frontend Tests
## ⚠️ Important: What NOT to Mock
### DO NOT Mock Base Components
**Never mock components from `@/app/components/base/`** such as:
- `Loading`, `Spinner`
- `Button`, `Input`, `Select`
- `Tooltip`, `Modal`, `Dropdown`
- `Icon`, `Badge`, `Tag`
**Why?**
- Base components will have their own dedicated tests
- Mocking them creates false positives (tests pass but real integration fails)
- Using real components tests actual integration behavior
```typescript
// ❌ WRONG: Don't mock base components
jest.mock('@/app/components/base/loading', () => () => <div>Loading</div>)
jest.mock('@/app/components/base/button', () => ({ children }: any) => <button>{children}</button>)
// ✅ CORRECT: Import and use real base components
import Loading from '@/app/components/base/loading'
import Button from '@/app/components/base/button'
// They will render normally in tests
```
### What TO Mock
Only mock these categories:
1. **API services** (`@/service/*`) - Network calls
2. **Complex context providers** - When setup is too difficult
3. **Third-party libraries with side effects** - `next/navigation`, external SDKs
4. **i18n** - Handled by the shared auto-loaded mock that returns keys; override only for custom translations
## Mock Placement
| Location | Purpose |
|----------|---------|
| `web/__mocks__/` | Reusable mocks shared across multiple test files |
| Test file | Test-specific mocks, inline with `jest.mock()` |
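A reusable entry in `web/__mocks__/` is typically a small factory module; a sketch (the file name and factory are hypothetical):
```typescript
// web/__mocks__/app-context.ts (hypothetical shared mock factory)
export const createMockAppContextValue = (overrides: Record<string, unknown> = {}) => ({
  appId: 'app-1',
  isEditor: true,
  ...overrides,
})
```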
## Essential Mocks
### 1. i18n (Auto-loaded via Shared Mock)
A shared mock is available at `web/__mocks__/react-i18next.ts` and is auto-loaded by Jest.
**No explicit mock needed** for most tests - it returns translation keys as-is.
For tests requiring custom translations, override the mock:
```typescript
jest.mock('react-i18next', () => ({
  useTranslation: () => ({
    t: (key: string) => {
      const translations: Record<string, string> = {
        'my.custom.key': 'Custom translation',
      }
      return translations[key] || key
    },
  }),
}))
```
### 2. Next.js Router
```typescript
const mockPush = jest.fn()
const mockReplace = jest.fn()

jest.mock('next/navigation', () => ({
  useRouter: () => ({
    push: mockPush,
    replace: mockReplace,
    back: jest.fn(),
    prefetch: jest.fn(),
  }),
  usePathname: () => '/current-path',
  useSearchParams: () => new URLSearchParams('?key=value'),
}))

describe('Component', () => {
  beforeEach(() => {
    jest.clearAllMocks()
  })

  it('should navigate on click', () => {
    render(<Component />)
    fireEvent.click(screen.getByRole('button'))
    expect(mockPush).toHaveBeenCalledWith('/expected-path')
  })
})
```
### 3. Portal Components (with Shared State)
```typescript
// ⚠️ Important: Use shared state for components that depend on each other
let mockPortalOpenState = false

jest.mock('@/app/components/base/portal-to-follow-elem', () => ({
  PortalToFollowElem: ({ children, open, ...props }: any) => {
    mockPortalOpenState = open || false // Update shared state
    return <div data-testid="portal" data-open={open}>{children}</div>
  },
  PortalToFollowElemContent: ({ children }: any) => {
    // ✅ Matches actual: returns null when portal is closed
    if (!mockPortalOpenState) return null
    return <div data-testid="portal-content">{children}</div>
  },
  PortalToFollowElemTrigger: ({ children }: any) => (
    <div data-testid="portal-trigger">{children}</div>
  ),
}))

describe('Component', () => {
  beforeEach(() => {
    jest.clearAllMocks()
    mockPortalOpenState = false // ✅ Reset shared state
  })
})
```
### 4. API Service Mocks
```typescript
import * as api from '@/service/api'

jest.mock('@/service/api')
const mockedApi = api as jest.Mocked<typeof api>

describe('Component', () => {
  beforeEach(() => {
    jest.clearAllMocks()
    // Setup default mock implementation
    mockedApi.fetchData.mockResolvedValue({ data: [] })
  })

  it('should show data on success', async () => {
    mockedApi.fetchData.mockResolvedValue({ data: [{ id: 1 }] })
    render(<Component />)
    await waitFor(() => {
      expect(screen.getByText('1')).toBeInTheDocument()
    })
  })

  it('should show error on failure', async () => {
    mockedApi.fetchData.mockRejectedValue(new Error('Network error'))
    render(<Component />)
    await waitFor(() => {
      expect(screen.getByText(/error/i)).toBeInTheDocument()
    })
  })
})
```
### 5. HTTP Mocking with Nock
```typescript
import nock from 'nock'

const GITHUB_HOST = 'https://api.github.com'
const GITHUB_PATH = '/repos/owner/repo'

const mockGithubApi = (status: number, body: Record<string, unknown>, delayMs = 0) => {
  return nock(GITHUB_HOST)
    .get(GITHUB_PATH)
    .delay(delayMs)
    .reply(status, body)
}

describe('GithubComponent', () => {
  afterEach(() => {
    nock.cleanAll()
  })

  it('should display repo info', async () => {
    mockGithubApi(200, { name: 'dify', stars: 1000 })
    render(<GithubComponent />)
    await waitFor(() => {
      expect(screen.getByText('dify')).toBeInTheDocument()
    })
  })

  it('should handle API error', async () => {
    mockGithubApi(500, { message: 'Server error' })
    render(<GithubComponent />)
    await waitFor(() => {
      expect(screen.getByText(/error/i)).toBeInTheDocument()
    })
  })
})
```
### 6. Context Providers
```typescript
import { ProviderContext } from '@/context/provider-context'
import { createMockProviderContextValue, createMockPlan } from '@/__mocks__/provider-context'

describe('Component with Context', () => {
  it('should render for free plan', () => {
    const mockContext = createMockPlan('sandbox')
    render(
      <ProviderContext.Provider value={mockContext}>
        <Component />
      </ProviderContext.Provider>,
    )
    expect(screen.getByText('Upgrade')).toBeInTheDocument()
  })

  it('should render for pro plan', () => {
    const mockContext = createMockPlan('professional')
    render(
      <ProviderContext.Provider value={mockContext}>
        <Component />
      </ProviderContext.Provider>,
    )
    expect(screen.queryByText('Upgrade')).not.toBeInTheDocument()
  })
})
```
### 7. SWR / React Query
```typescript
// SWR
jest.mock('swr', () => ({
  __esModule: true,
  default: jest.fn(),
}))

import useSWR from 'swr'
const mockedUseSWR = useSWR as jest.Mock

describe('Component with SWR', () => {
  it('should show loading state', () => {
    mockedUseSWR.mockReturnValue({
      data: undefined,
      error: undefined,
      isLoading: true,
    })
    render(<Component />)
    expect(screen.getByText(/loading/i)).toBeInTheDocument()
  })
})

// React Query
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'

const createTestQueryClient = () => new QueryClient({
  defaultOptions: {
    queries: { retry: false },
    mutations: { retry: false },
  },
})

const renderWithQueryClient = (ui: React.ReactElement) => {
  const queryClient = createTestQueryClient()
  return render(
    <QueryClientProvider client={queryClient}>
      {ui}
    </QueryClientProvider>,
  )
}
```
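Tests then render through the helper so each case gets a fresh client; a brief usage sketch with a hypothetical `Component`:
```typescript
it('should render data fetched via React Query', async () => {
  renderWithQueryClient(<Component />)
  expect(await screen.findByText(/loaded/i)).toBeInTheDocument()
})
```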
## Mock Best Practices
### ✅ DO
1. **Use real base components** - Import from `@/app/components/base/` directly
2. **Use real project components** - Prefer importing over mocking
3. **Reset mocks in `beforeEach`**, not `afterEach`
4. **Match actual component behavior** in mocks (when mocking is necessary)
5. **Use factory functions** for complex mock data
6. **Import actual types** for type safety
7. **Reset shared mock state** in `beforeEach`
### ❌ DON'T
1. **Don't mock base components** (`Loading`, `Button`, `Tooltip`, etc.)
2. Don't mock components you can import directly
3. Don't create overly simplified mocks that miss conditional logic
4. Don't forget to clean up nock after each test
5. Don't use `any` types in mocks without necessity
### Mock Decision Tree
```
Need to use a component in test?
├─ Is it from @/app/components/base/*?
│ └─ YES → Import real component, DO NOT mock
├─ Is it a project component?
│ └─ YES → Prefer importing real component
│ Only mock if setup is extremely complex
├─ Is it an API service (@/service/*)?
│ └─ YES → Mock it
├─ Is it a third-party lib with side effects?
│ └─ YES → Mock it (next/navigation, external SDKs)
└─ Is it i18n?
   └─ YES → Uses shared mock (auto-loaded). Override only for custom translations
```
## Factory Function Pattern
```typescript
// __mocks__/data-factories.ts
import type { User, Project } from '@/types'

export const createMockUser = (overrides: Partial<User> = {}): User => ({
  id: 'user-1',
  name: 'Test User',
  email: 'test@example.com',
  role: 'member',
  createdAt: new Date().toISOString(),
  ...overrides,
})

export const createMockProject = (overrides: Partial<Project> = {}): Project => ({
  id: 'project-1',
  name: 'Test Project',
  description: 'A test project',
  owner: createMockUser(),
  members: [],
  createdAt: new Date().toISOString(),
  ...overrides,
})

// Usage in tests
it('should display project owner', () => {
  const project = createMockProject({
    owner: createMockUser({ name: 'John Doe' }),
  })
  render(<ProjectCard project={project} />)
  expect(screen.getByText('John Doe')).toBeInTheDocument()
})
```

# Testing Workflow Guide
This guide defines the workflow for generating tests, especially for complex components or directories with multiple files.
## Scope Clarification
This guide addresses **multi-file workflow** (how to process multiple test files). For coverage requirements within a single test file, see `web/testing/testing.md` § Coverage Goals.
| Scope | Rule |
|-------|------|
| **Single file** | Complete coverage in one generation (100% function, >95% branch) |
| **Multi-file directory** | Process one file at a time, verify each before proceeding |
## ⚠️ Critical Rule: Incremental Approach for Multi-File Testing
When testing a **directory with multiple files**, **NEVER generate all test files at once.** Use an incremental, verify-as-you-go approach.
### Why Incremental?
| Batch Approach (❌) | Incremental Approach (✅) |
|---------------------|---------------------------|
| Generate 5+ tests at once | Generate 1 test at a time |
| Run tests only at the end | Run test immediately after each file |
| Multiple failures compound | Single point of failure, easy to debug |
| Hard to identify root cause | Clear cause-effect relationship |
| Mock issues affect many files | Mock issues caught early |
| Messy git history | Clean, atomic commits possible |
## Single File Workflow
When testing a **single component, hook, or utility**:
```
1. Read source code completely
2. Run `pnpm analyze-component <path>` (if available)
3. Check complexity score and features detected
4. Write the test file
5. Run test: `pnpm test -- <file>.spec.tsx`
6. Fix any failures
7. Verify coverage meets goals (100% function, >95% branch)
```
## Directory/Multi-File Workflow (MUST FOLLOW)
When testing a **directory or multiple files**, follow this strict workflow:
### Step 1: Analyze and Plan
1. **List all files** that need tests in the directory
1. **Categorize by complexity**:
- 🟢 **Simple**: Utility functions, simple hooks, presentational components
- 🟡 **Medium**: Components with state, effects, or event handlers
- 🔴 **Complex**: Components with API calls, routing, or many dependencies
1. **Order by dependency**: Test dependencies before dependents
1. **Create a todo list** to track progress
### Step 2: Determine Processing Order
Process files in this recommended order:
```
1. Utility functions (simplest, no React)
2. Custom hooks (isolated logic)
3. Simple presentational components (few/no props)
4. Medium complexity components (state, effects)
5. Complex components (API, routing, many deps)
6. Container/index components (integration tests - last)
```
**Rationale**:
- Simpler files help establish mock patterns
- Hooks used by components should be tested first
- Integration tests (index files) depend on child components working
### Step 3: Process Each File Incrementally
**For EACH file in the ordered list:**
```
┌─────────────────────────────────────────────┐
│ 1. Write test file │
│ 2. Run: pnpm test -- <file>.spec.tsx │
│ 3. If FAIL → Fix immediately, re-run │
│ 4. If PASS → Mark complete in todo list │
│ 5. ONLY THEN proceed to next file │
└─────────────────────────────────────────────┘
```
**DO NOT proceed to the next file until the current one passes.**
### Step 4: Final Verification
After all individual tests pass:
```bash
# Run all tests in the directory together
pnpm test -- path/to/directory/
# Check coverage
pnpm test -- --coverage path/to/directory/
```
## Component Complexity Guidelines
Use `pnpm analyze-component <path>` to assess complexity before testing.
### 🔴 Very Complex Components (Complexity > 50)
**Consider refactoring BEFORE testing:**
- Break component into smaller, testable pieces
- Extract complex logic into custom hooks
- Separate container and presentational layers
**If testing as-is:**
- Use integration tests for complex workflows
- Use `test.each()` for data-driven testing (see the sketch after this list)
- Multiple `describe` blocks for organization
- Consider testing major sections separately
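For the `test.each()` point above, a minimal data-driven sketch (the helper under test is hypothetical):
```typescript
// Hypothetical pure helper extracted from a complex component
const normalize = (value: string | null | undefined): string => value?.trim() ?? ''

test.each([
  [null, ''],
  [undefined, ''],
  ['  padded  ', 'padded'],
])('should normalize %p to %p', (input, expected) => {
  expect(normalize(input)).toBe(expected)
})
```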
### 🟡 Medium Complexity (Complexity 30-50)
- Group related tests in `describe` blocks
- Test integration scenarios between internal parts
- Focus on state transitions and side effects
- Use helper functions to reduce test complexity
### 🟢 Simple Components (Complexity < 30)
- Standard test structure
- Focus on props, rendering, and edge cases
- Usually straightforward to test
### 📏 Large Files (500+ lines)
Regardless of complexity score:
- **Strongly consider refactoring** before testing
- If testing as-is, test major sections separately
- Create helper functions for test setup
- May need multiple test files
## Todo List Format
When testing multiple files, use a todo list like this:
```
Testing: path/to/directory/
Ordered by complexity (simple → complex):
☐ utils/helper.ts [utility, simple]
☐ hooks/use-custom-hook.ts [hook, simple]
☐ empty-state.tsx [component, simple]
☐ item-card.tsx [component, medium]
☐ list.tsx [component, complex]
☐ index.tsx [integration]
Progress: 0/6 complete
```
Update status as you complete each:
- ☐ → ⏳ (in progress)
- ⏳ → ✅ (complete and verified)
- ⏳ → ❌ (blocked, needs attention)
## When to Stop and Verify
**Always run tests after:**
- Completing a test file
- Making changes to fix a failure
- Modifying shared mocks
- Updating test utilities or helpers
**Signs you should pause:**
- More than 2 consecutive test failures
- Mock-related errors appearing
- Unclear why a test is failing
- Test passing but coverage unexpectedly low
## Common Pitfalls to Avoid
### ❌ Don't: Generate Everything First
```
# BAD: Writing all files then testing
Write component-a.spec.tsx
Write component-b.spec.tsx
Write component-c.spec.tsx
Write component-d.spec.tsx
Run pnpm test ← Multiple failures, hard to debug
```
### ✅ Do: Verify Each Step
```
# GOOD: Incremental with verification
Write component-a.spec.tsx
Run pnpm test -- component-a.spec.tsx ✅
Write component-b.spec.tsx
Run pnpm test -- component-b.spec.tsx ✅
...continue...
```
### ❌ Don't: Skip Verification for "Simple" Components
Even simple components can have:
- Import errors
- Missing mock setup
- Incorrect assumptions about props
**Always verify, regardless of perceived simplicity.**
### ❌ Don't: Continue When Tests Fail
Failing tests compound:
- A mock issue in file A affects files B, C, D
- Fixing A later requires revisiting all dependent tests
- Time wasted on debugging cascading failures
**Fix failures immediately before proceeding.**
## Integration with Claude's Todo Feature
When using Claude for multi-file testing:
1. **Ask Claude to create a todo list** before starting
1. **Request one file at a time** or ensure Claude processes incrementally
1. **Verify each test passes** before asking for the next
1. **Mark todos complete** as you progress
Example prompt:
```
Test all components in `path/to/directory/`.
First, analyze the directory and create a todo list ordered by complexity.
Then, process ONE file at a time, waiting for my confirmation that tests pass
before proceeding to the next.
```
## Summary Checklist
Before starting multi-file testing:
- [ ] Listed all files needing tests
- [ ] Ordered by complexity (simple → complex)
- [ ] Created todo list for tracking
- [ ] Understand dependencies between files
During testing:
- [ ] Processing ONE file at a time
- [ ] Running tests after EACH file
- [ ] Fixing failures BEFORE proceeding
- [ ] Updating todo list progress
After completion:
- [ ] All individual tests pass
- [ ] Full directory test run passes
- [ ] Coverage goals met
- [ ] Todo list shows all complete

View File

@ -1,296 +0,0 @@
/**
* Test Template for React Components
*
* WHY THIS STRUCTURE?
* - Organized sections make tests easy to navigate and maintain
* - Mocks at top ensure consistent test isolation
* - Factory functions reduce duplication and improve readability
* - describe blocks group related scenarios for better debugging
*
* INSTRUCTIONS:
* 1. Replace `ComponentName` with your component name
* 2. Update import path
* 3. Add/remove test sections based on component features (use analyze-component)
* 4. Follow AAA pattern: Arrange → Act → Assert
*
* RUN FIRST: pnpm analyze-component <path> to identify required test scenarios
*/
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
// import ComponentName from './index'
// ============================================================================
// Mocks
// ============================================================================
// WHY: Mocks must be hoisted to top of file (Jest requirement).
// They run BEFORE imports, so keep them before component imports.
// i18n (automatically mocked)
// WHY: Shared mock at web/__mocks__/react-i18next.ts is auto-loaded by Jest
// No explicit mock needed - it returns translation keys as-is
// Override only if custom translations are required:
// jest.mock('react-i18next', () => ({
// useTranslation: () => ({
// t: (key: string) => {
// const customTranslations: Record<string, string> = {
// 'my.custom.key': 'Custom Translation',
// }
// return customTranslations[key] || key
// },
// }),
// }))
// Router (if component uses useRouter, usePathname, useSearchParams)
// WHY: Isolates tests from Next.js routing, enables testing navigation behavior
// const mockPush = jest.fn()
// jest.mock('next/navigation', () => ({
// useRouter: () => ({ push: mockPush }),
// usePathname: () => '/test-path',
// }))
// API services (if component fetches data)
// WHY: Prevents real network calls, enables testing all states (loading/success/error)
// jest.mock('@/service/api')
// import * as api from '@/service/api'
// const mockedApi = api as jest.Mocked<typeof api>
// Shared mock state (for portal/dropdown components)
// WHY: Portal components like PortalToFollowElem need shared state between
// parent and child mocks to correctly simulate open/close behavior
// let mockOpenState = false
// ============================================================================
// Test Data Factories
// ============================================================================
// WHY FACTORIES?
// - Avoid hard-coded test data scattered across tests
// - Easy to create variations with overrides
// - Type-safe when using actual types from source
// - Single source of truth for default test values
// const createMockProps = (overrides = {}) => ({
// // Default props that make component render successfully
// ...overrides,
// })
// const createMockItem = (overrides = {}) => ({
// id: 'item-1',
// name: 'Test Item',
// ...overrides,
// })
// ============================================================================
// Test Helpers
// ============================================================================
// const renderComponent = (props = {}) => {
// return render(<ComponentName {...createMockProps(props)} />)
// }
// ============================================================================
// Tests
// ============================================================================
describe('ComponentName', () => {
// WHY beforeEach with clearAllMocks?
// - Ensures each test starts with clean slate
// - Prevents mock call history from leaking between tests
// - MUST be beforeEach (not afterEach) to reset BEFORE assertions like toHaveBeenCalledTimes
beforeEach(() => {
jest.clearAllMocks()
// Reset shared mock state if used (CRITICAL for portal/dropdown tests)
// mockOpenState = false
})
// --------------------------------------------------------------------------
// Rendering Tests (REQUIRED - Every component MUST have these)
// --------------------------------------------------------------------------
// WHY: Catches import errors, missing providers, and basic render issues
describe('Rendering', () => {
it('should render without crashing', () => {
// Arrange - Setup data and mocks
// const props = createMockProps()
// Act - Render the component
// render(<ComponentName {...props} />)
// Assert - Verify expected output
// Prefer getByRole for accessibility; it's what users "see"
// expect(screen.getByRole('...')).toBeInTheDocument()
})
it('should render with default props', () => {
// WHY: Verifies component works without optional props
// render(<ComponentName />)
// expect(screen.getByText('...')).toBeInTheDocument()
})
})
// --------------------------------------------------------------------------
// Props Tests (REQUIRED - Every component MUST test prop behavior)
// --------------------------------------------------------------------------
// WHY: Props are the component's API contract. Test them thoroughly.
describe('Props', () => {
it('should apply custom className', () => {
// WHY: Common pattern in Dify - components should merge custom classes
// render(<ComponentName className="custom-class" />)
// expect(screen.getByTestId('component')).toHaveClass('custom-class')
})
it('should use default values for optional props', () => {
// WHY: Verifies TypeScript defaults work at runtime
// render(<ComponentName />)
// expect(screen.getByRole('...')).toHaveAttribute('...', 'default-value')
})
})
// --------------------------------------------------------------------------
// User Interactions (if component has event handlers - on*, handle*)
// --------------------------------------------------------------------------
// WHY: Event handlers are core functionality. Test from user's perspective.
describe('User Interactions', () => {
it('should call onClick when clicked', async () => {
// WHY userEvent over fireEvent?
// - userEvent simulates real user behavior (focus, hover, then click)
// - fireEvent is lower-level, doesn't trigger all browser events
// const user = userEvent.setup()
// const handleClick = jest.fn()
// render(<ComponentName onClick={handleClick} />)
//
// await user.click(screen.getByRole('button'))
//
// expect(handleClick).toHaveBeenCalledTimes(1)
})
it('should call onChange when value changes', async () => {
// const user = userEvent.setup()
// const handleChange = jest.fn()
// render(<ComponentName onChange={handleChange} />)
//
// await user.type(screen.getByRole('textbox'), 'new value')
//
// expect(handleChange).toHaveBeenCalled()
})
})
// --------------------------------------------------------------------------
// State Management (if component uses useState/useReducer)
// --------------------------------------------------------------------------
// WHY: Test state through observable UI changes, not internal state values
describe('State Management', () => {
it('should update state on interaction', async () => {
// WHY test via UI, not state?
// - State is implementation detail; UI is what users see
// - If UI works correctly, state must be correct
// const user = userEvent.setup()
// render(<ComponentName />)
//
// // Initial state - verify what user sees
// expect(screen.getByText('Initial')).toBeInTheDocument()
//
// // Trigger state change via user action
// await user.click(screen.getByRole('button'))
//
// // New state - verify UI updated
// expect(screen.getByText('Updated')).toBeInTheDocument()
})
})
// --------------------------------------------------------------------------
// Async Operations (if component fetches data - useSWR, useQuery, fetch)
// --------------------------------------------------------------------------
// WHY: Async operations have 3 states users experience: loading, success, error
describe('Async Operations', () => {
it('should show loading state', () => {
// WHY never-resolving promise?
// - Keeps component in loading state for assertion
// - Alternative: use fake timers
// mockedApi.fetchData.mockImplementation(() => new Promise(() => {}))
// render(<ComponentName />)
//
// expect(screen.getByText(/loading/i)).toBeInTheDocument()
})
it('should show data on success', async () => {
// WHY waitFor?
// - Component updates asynchronously after fetch resolves
// - waitFor retries assertion until it passes or times out
// mockedApi.fetchData.mockResolvedValue({ items: ['Item 1'] })
// render(<ComponentName />)
//
// await waitFor(() => {
// expect(screen.getByText('Item 1')).toBeInTheDocument()
// })
})
it('should show error on failure', async () => {
// mockedApi.fetchData.mockRejectedValue(new Error('Network error'))
// render(<ComponentName />)
//
// await waitFor(() => {
// expect(screen.getByText(/error/i)).toBeInTheDocument()
// })
})
})
// --------------------------------------------------------------------------
// Edge Cases (REQUIRED - Every component MUST handle edge cases)
// --------------------------------------------------------------------------
// WHY: Real-world data is messy. Components must handle:
// - Null/undefined from API failures or optional fields
// - Empty arrays/strings from user clearing data
// - Boundary values (0, MAX_INT, special characters)
describe('Edge Cases', () => {
it('should handle null value', () => {
// WHY test null specifically?
// - API might return null for missing data
// - Prevents "Cannot read property of null" in production
// render(<ComponentName value={null} />)
// expect(screen.getByText(/no data/i)).toBeInTheDocument()
})
it('should handle undefined value', () => {
// WHY test undefined separately from null?
// - TypeScript treats them differently
// - Optional props are undefined, not null
// render(<ComponentName value={undefined} />)
// expect(screen.getByText(/no data/i)).toBeInTheDocument()
})
it('should handle empty array', () => {
// WHY: Empty state often needs special UI (e.g., "No items yet")
// render(<ComponentName items={[]} />)
// expect(screen.getByText(/empty/i)).toBeInTheDocument()
})
it('should handle empty string', () => {
// WHY: Empty strings are falsy in JS but still valid string values that render as visually empty
// render(<ComponentName text="" />)
// expect(screen.getByText(/placeholder/i)).toBeInTheDocument()
})
})
// --------------------------------------------------------------------------
// Accessibility (optional but recommended for Dify's enterprise users)
// --------------------------------------------------------------------------
// WHY: Dify has enterprise customers who may require accessibility compliance
describe('Accessibility', () => {
it('should have accessible name', () => {
// WHY getByRole with name?
// - Tests that screen readers can identify the element
// - Enforces proper labeling practices
// render(<ComponentName label="Test Label" />)
// expect(screen.getByRole('button', { name: /test label/i })).toBeInTheDocument()
})
it('should support keyboard navigation', async () => {
// WHY: Some users can't use a mouse
// const user = userEvent.setup()
// render(<ComponentName />)
//
// await user.tab()
// expect(screen.getByRole('button')).toHaveFocus()
})
})
})

View File

@ -1,207 +0,0 @@
/**
* Test Template for Custom Hooks
*
* Instructions:
* 1. Replace `useHookName` with your hook name
* 2. Update import path
* 3. Add/remove test sections based on hook features
*/
import { renderHook, act, waitFor } from '@testing-library/react'
// import { useHookName } from './use-hook-name'
// ============================================================================
// Mocks
// ============================================================================
// API services (if hook fetches data)
// jest.mock('@/service/api')
// import * as api from '@/service/api'
// const mockedApi = api as jest.Mocked<typeof api>
// ============================================================================
// Test Helpers
// ============================================================================
// Wrapper for hooks that need context
// const createWrapper = (contextValue = {}) => {
// return ({ children }: { children: React.ReactNode }) => (
// <SomeContext.Provider value={contextValue}>
// {children}
// </SomeContext.Provider>
// )
// }
// ============================================================================
// Tests
// ============================================================================
describe('useHookName', () => {
beforeEach(() => {
jest.clearAllMocks()
})
// --------------------------------------------------------------------------
// Initial State
// --------------------------------------------------------------------------
describe('Initial State', () => {
it('should return initial state', () => {
// const { result } = renderHook(() => useHookName())
//
// expect(result.current.value).toBe(initialValue)
// expect(result.current.isLoading).toBe(false)
})
it('should accept initial value from props', () => {
// const { result } = renderHook(() => useHookName({ initialValue: 'custom' }))
//
// expect(result.current.value).toBe('custom')
})
})
// --------------------------------------------------------------------------
// State Updates
// --------------------------------------------------------------------------
describe('State Updates', () => {
it('should update value when setValue is called', () => {
// const { result } = renderHook(() => useHookName())
//
// act(() => {
// result.current.setValue('new value')
// })
//
// expect(result.current.value).toBe('new value')
})
it('should reset to initial value', () => {
// const { result } = renderHook(() => useHookName({ initialValue: 'initial' }))
//
// act(() => {
// result.current.setValue('changed')
// })
// expect(result.current.value).toBe('changed')
//
// act(() => {
// result.current.reset()
// })
// expect(result.current.value).toBe('initial')
})
})
// --------------------------------------------------------------------------
// Async Operations
// --------------------------------------------------------------------------
describe('Async Operations', () => {
it('should fetch data on mount', async () => {
// mockedApi.fetchData.mockResolvedValue({ data: 'test' })
//
// const { result } = renderHook(() => useHookName())
//
// // Initially loading
// expect(result.current.isLoading).toBe(true)
//
// // Wait for data
// await waitFor(() => {
// expect(result.current.isLoading).toBe(false)
// })
//
// expect(result.current.data).toEqual({ data: 'test' })
})
it('should handle fetch error', async () => {
// mockedApi.fetchData.mockRejectedValue(new Error('Network error'))
//
// const { result } = renderHook(() => useHookName())
//
// await waitFor(() => {
// expect(result.current.error).toBeTruthy()
// })
//
// expect(result.current.error?.message).toBe('Network error')
})
it('should refetch when dependency changes', async () => {
// mockedApi.fetchData.mockResolvedValue({ data: 'test' })
//
// const { result, rerender } = renderHook(
// ({ id }) => useHookName(id),
// { initialProps: { id: '1' } }
// )
//
// await waitFor(() => {
// expect(mockedApi.fetchData).toHaveBeenCalledWith('1')
// })
//
// rerender({ id: '2' })
//
// await waitFor(() => {
// expect(mockedApi.fetchData).toHaveBeenCalledWith('2')
// })
})
})
// --------------------------------------------------------------------------
// Side Effects
// --------------------------------------------------------------------------
describe('Side Effects', () => {
it('should call callback when value changes', () => {
// const callback = jest.fn()
// const { result } = renderHook(() => useHookName({ onChange: callback }))
//
// act(() => {
// result.current.setValue('new value')
// })
//
// expect(callback).toHaveBeenCalledWith('new value')
})
it('should cleanup on unmount', () => {
// const cleanup = jest.fn()
// jest.spyOn(window, 'addEventListener')
// jest.spyOn(window, 'removeEventListener')
//
// const { unmount } = renderHook(() => useHookName())
//
// expect(window.addEventListener).toHaveBeenCalled()
//
// unmount()
//
// expect(window.removeEventListener).toHaveBeenCalled()
})
})
// --------------------------------------------------------------------------
// Edge Cases
// --------------------------------------------------------------------------
describe('Edge Cases', () => {
it('should handle null input', () => {
// const { result } = renderHook(() => useHookName(null))
//
// expect(result.current.value).toBeNull()
})
it('should handle rapid updates', () => {
// const { result } = renderHook(() => useHookName())
//
// act(() => {
// result.current.setValue('1')
// result.current.setValue('2')
// result.current.setValue('3')
// })
//
// expect(result.current.value).toBe('3')
})
})
// --------------------------------------------------------------------------
// With Context (if hook uses context)
// --------------------------------------------------------------------------
describe('With Context', () => {
it('should use context value', () => {
// const wrapper = createWrapper({ someValue: 'context-value' })
// const { result } = renderHook(() => useHookName(), { wrapper })
//
// expect(result.current.contextValue).toBe('context-value')
})
})
})

View File

@ -1,154 +0,0 @@
/**
* Test Template for Utility Functions
*
* Instructions:
* 1. Replace `utilityFunction` with your function name
* 2. Update import path
* 3. Use test.each for data-driven tests
*/
// import { utilityFunction } from './utility'
// ============================================================================
// Tests
// ============================================================================
describe('utilityFunction', () => {
// --------------------------------------------------------------------------
// Basic Functionality
// --------------------------------------------------------------------------
describe('Basic Functionality', () => {
it('should return expected result for valid input', () => {
// expect(utilityFunction('input')).toBe('expected-output')
})
it('should handle multiple arguments', () => {
// expect(utilityFunction('a', 'b', 'c')).toBe('abc')
})
})
// --------------------------------------------------------------------------
// Data-Driven Tests
// --------------------------------------------------------------------------
describe('Input/Output Mapping', () => {
test.each([
// [input, expected]
['input1', 'output1'],
['input2', 'output2'],
['input3', 'output3'],
])('should map input %p to %p', (input, expected) => {
// expect(utilityFunction(input)).toBe(expected)
})
})
// --------------------------------------------------------------------------
// Edge Cases
// --------------------------------------------------------------------------
describe('Edge Cases', () => {
it('should handle empty string', () => {
// expect(utilityFunction('')).toBe('')
})
it('should handle null', () => {
// expect(utilityFunction(null)).toBe(null)
// or
// expect(() => utilityFunction(null)).toThrow()
})
it('should handle undefined', () => {
// expect(utilityFunction(undefined)).toBe(undefined)
// or
// expect(() => utilityFunction(undefined)).toThrow()
})
it('should handle empty array', () => {
// expect(utilityFunction([])).toEqual([])
})
it('should handle empty object', () => {
// expect(utilityFunction({})).toEqual({})
})
})
// --------------------------------------------------------------------------
// Boundary Conditions
// --------------------------------------------------------------------------
describe('Boundary Conditions', () => {
it('should handle minimum value', () => {
// expect(utilityFunction(0)).toBe(0)
})
it('should handle maximum value', () => {
// expect(utilityFunction(Number.MAX_SAFE_INTEGER)).toBe(...)
})
it('should handle negative numbers', () => {
// expect(utilityFunction(-1)).toBe(...)
})
})
// --------------------------------------------------------------------------
// Type Coercion (if applicable)
// --------------------------------------------------------------------------
describe('Type Handling', () => {
it('should handle numeric string', () => {
// expect(utilityFunction('123')).toBe(123)
})
it('should handle boolean', () => {
// expect(utilityFunction(true)).toBe(...)
})
})
// --------------------------------------------------------------------------
// Error Cases
// --------------------------------------------------------------------------
describe('Error Handling', () => {
it('should throw for invalid input', () => {
// expect(() => utilityFunction('invalid')).toThrow('Error message')
})
it('should throw with specific error type', () => {
// expect(() => utilityFunction('invalid')).toThrow(ValidationError)
})
})
// --------------------------------------------------------------------------
// Complex Objects (if applicable)
// --------------------------------------------------------------------------
describe('Object Handling', () => {
it('should preserve object structure', () => {
// const input = { a: 1, b: 2 }
// expect(utilityFunction(input)).toEqual({ a: 1, b: 2 })
})
it('should handle nested objects', () => {
// const input = { nested: { deep: 'value' } }
// expect(utilityFunction(input)).toEqual({ nested: { deep: 'transformed' } })
})
it('should not mutate input', () => {
// const input = { a: 1 }
// const inputCopy = { ...input }
// utilityFunction(input)
// expect(input).toEqual(inputCopy)
})
})
// --------------------------------------------------------------------------
// Array Handling (if applicable)
// --------------------------------------------------------------------------
describe('Array Handling', () => {
it('should process all elements', () => {
// expect(utilityFunction([1, 2, 3])).toEqual([2, 4, 6])
})
it('should handle single element array', () => {
// expect(utilityFunction([1])).toEqual([2])
})
it('should preserve order', () => {
// expect(utilityFunction(['c', 'a', 'b'])).toEqual(['c', 'a', 'b'])
})
})
})

View File

@ -1,5 +0,0 @@
[run]
omit =
api/tests/*
api/migrations/*
api/core/rag/datasource/vdb/*

View File

@ -1,5 +1,6 @@
# Cursor Rules for Dify Project
## Automated Test Generation
- Use `web/testing/testing.md` as the canonical instruction set for generating frontend automated tests.
- When proposing or saving tests, re-read that document and follow every requirement.
- All frontend tests MUST also comply with the `frontend-testing` skill. Treat the skill as a mandatory constraint, not optional guidance.

8
.github/CODEOWNERS vendored
View File

@ -9,14 +9,6 @@
# Backend (default owner, more specific rules below will override)
api/ @QuantumGhost
# Backend - MCP
api/core/mcp/ @Nov1c444
api/core/entities/mcp_provider.py @Nov1c444
api/services/tools/mcp_tools_manage_service.py @Nov1c444
api/controllers/mcp/ @Nov1c444
api/controllers/console/app/mcp_server.py @Nov1c444
api/tests/**/*mcp* @Nov1c444
# Backend - Workflow - Engine (Core graph execution engine)
api/core/workflow/graph_engine/ @laipz8200 @QuantumGhost
api/core/workflow/runtime/ @laipz8200 @QuantumGhost

View File

@ -1,6 +1,8 @@
name: "✨ Refactor or Chore"
description: Refactor existing code or perform maintenance chores to improve readability and reliability.
title: "[Refactor/Chore] "
name: "✨ Refactor"
description: Refactor existing code for improved readability and maintainability.
title: "[Chore/Refactor] "
labels:
- refactor
body:
- type: checkboxes
attributes:
@ -9,7 +11,7 @@ body:
options:
- label: I have read the [Contributing Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) and [Language Policy](https://github.com/langgenius/dify/issues/1542).
required: true
- label: This is only for refactors or chores; if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- label: This is only for refactoring; if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
required: true
- label: I have searched for existing issues [search for existing issues](https://github.com/langgenius/dify/issues), including closed ones.
required: true
@ -23,14 +25,14 @@ body:
id: description
attributes:
label: Description
placeholder: "Describe the refactor or chore you are proposing."
placeholder: "Describe the refactor you are proposing."
validations:
required: true
- type: textarea
id: motivation
attributes:
label: Motivation
placeholder: "Explain why this refactor or chore is necessary."
placeholder: "Explain why this refactor is necessary."
validations:
required: false
- type: textarea

13
.github/ISSUE_TEMPLATE/tracker.yml vendored Normal file
View File

@ -0,0 +1,13 @@
name: "👾 Tracker"
description: For internal use only; please do not use this template.
title: "[Tracker] "
labels:
- tracker
body:
- type: textarea
id: content
attributes:
label: Blockers
placeholder: "- [ ] ..."
validations:
required: true

12
.github/copilot-instructions.md vendored Normal file
View File

@ -0,0 +1,12 @@
# Copilot Instructions
GitHub Copilot must follow the unified frontend testing requirements documented in `web/testing/testing.md`.
Key reminders:
- Generate tests using the mandated tech stack, naming, and code style (AAA pattern, `fireEvent`, descriptive test names, mock cleanup).
- Cover rendering, prop combinations, and edge cases by default; extend coverage for hooks, routing, async flows, and domain-specific components when applicable.
- Target >95% line and branch coverage and 100% function/statement coverage.
- Apply the project's mocking conventions for i18n, toast notifications, and Next.js utilities.
Any suggestions from Copilot that conflict with `web/testing/testing.md` should be revised before acceptance.
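A minimal sketch of those mocking conventions, with shapes assumed for illustration (the canonical setup lives in `web/testing/testing.md`):
```typescript
// Assumed mock shapes; follow web/testing/testing.md for the canonical versions.
jest.mock('react-i18next', () => ({
  useTranslation: () => ({ t: (key: string) => key }),
}))

jest.mock('next/navigation', () => ({
  useRouter: () => ({ push: jest.fn() }),
  usePathname: () => '/test-path',
}))
```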

View File

@ -71,18 +71,18 @@ jobs:
run: |
cp api/tests/integration_tests/.env.example api/tests/integration_tests/.env
- name: Run API Tests
env:
STORAGE_TYPE: opendal
OPENDAL_SCHEME: fs
OPENDAL_FS_ROOT: /tmp/dify-storage
- name: Run Workflow
run: uv run --project api bash dev/pytest/pytest_workflow.sh
- name: Run Tool
run: uv run --project api bash dev/pytest/pytest_tools.sh
- name: Run TestContainers
run: uv run --project api bash dev/pytest/pytest_testcontainers.sh
- name: Run Unit tests
run: |
uv run --project api pytest \
--timeout "${PYTEST_TIMEOUT:-180}" \
api/tests/integration_tests/workflow \
api/tests/integration_tests/tools \
api/tests/test_containers_integration_tests \
api/tests/unit_tests
uv run --project api bash dev/pytest/pytest_unit_tests.sh
- name: Coverage Summary
run: |
@ -93,12 +93,5 @@ jobs:
# Create a detailed coverage summary
echo "### Test Coverage Summary :test_tube:" >> $GITHUB_STEP_SUMMARY
echo "Total Coverage: ${TOTAL_COVERAGE}%" >> $GITHUB_STEP_SUMMARY
{
echo ""
echo "<details><summary>File-level coverage (click to expand)</summary>"
echo ""
echo '```'
uv run --project api coverage report -m
echo '```'
echo "</details>"
} >> $GITHUB_STEP_SUMMARY
uv run --project api coverage report --format=markdown >> $GITHUB_STEP_SUMMARY

View File

@ -13,12 +13,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
# Use uv to ensure we have the same ruff version in CI and locally.
- uses: astral-sh/setup-uv@v6
with:
python-version: "3.11"
- uses: astral-sh/setup-uv@v6
- run: |
cd api
uv sync --dev
@ -36,11 +35,10 @@ jobs:
- name: ast-grep
run: |
# ast-grep exits 1 if no matches are found; allow idempotent runs.
uvx --from ast-grep-cli ast-grep --pattern 'db.session.query($WHATEVER).filter($HERE)' --rewrite 'db.session.query($WHATEVER).where($HERE)' -l py --update-all || true
uvx --from ast-grep-cli ast-grep --pattern 'session.query($WHATEVER).filter($HERE)' --rewrite 'session.query($WHATEVER).where($HERE)' -l py --update-all || true
uvx --from ast-grep-cli ast-grep -p '$A = db.Column($$$B)' -r '$A = mapped_column($$$B)' -l py --update-all || true
uvx --from ast-grep-cli ast-grep -p '$A : $T = db.Column($$$B)' -r '$A : $T = mapped_column($$$B)' -l py --update-all || true
uvx --from ast-grep-cli sg --pattern 'db.session.query($WHATEVER).filter($HERE)' --rewrite 'db.session.query($WHATEVER).where($HERE)' -l py --update-all
uvx --from ast-grep-cli sg --pattern 'session.query($WHATEVER).filter($HERE)' --rewrite 'session.query($WHATEVER).where($HERE)' -l py --update-all
uvx --from ast-grep-cli sg -p '$A = db.Column($$$B)' -r '$A = mapped_column($$$B)' -l py --update-all
uvx --from ast-grep-cli sg -p '$A : $T = db.Column($$$B)' -r '$A : $T = mapped_column($$$B)' -l py --update-all
# Convert Optional[T] to T | None (ignoring quoted types)
cat > /tmp/optional-rule.yml << 'EOF'
id: convert-optional-to-union
@ -58,15 +56,14 @@ jobs:
pattern: $T
fix: $T | None
EOF
uvx --from ast-grep-cli ast-grep scan . --inline-rules "$(cat /tmp/optional-rule.yml)" --update-all
uvx --from ast-grep-cli sg scan --inline-rules "$(cat /tmp/optional-rule.yml)" --update-all
# Fix forward references that were incorrectly converted (Python doesn't support "Type" | None syntax)
find . -name "*.py" -type f -exec sed -i.bak -E 's/"([^"]+)" \| None/Optional["\1"]/g; s/'"'"'([^'"'"']+)'"'"' \| None/Optional['"'"'\1'"'"']/g' {} \;
find . -name "*.py.bak" -type f -delete
# mdformat breaks YAML front matter in markdown files. Add --exclude for directories containing YAML front matter.
- name: mdformat
run: |
uvx --python 3.13 mdformat . --exclude ".claude/skills/**"
uvx mdformat .
- name: Install pnpm
uses: pnpm/action-setup@v4
@ -87,6 +84,7 @@ jobs:
- name: oxlint
working-directory: ./web
run: pnpm exec oxlint --config .oxlintrc.json --fix .
run: |
pnpx oxlint --fix
- uses: autofix-ci/action@635ffb0c9798bd160680f18fd73371e355b85f27

View File

@ -1,21 +0,0 @@
name: Semantic Pull Request
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
jobs:
lint:
name: Validate PR title
permissions:
pull-requests: read
runs-on: ubuntu-latest
steps:
- name: Check title
uses: amannn/action-semantic-pull-request@v6.1.1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

1
.gitignore vendored
View File

@ -189,7 +189,6 @@ docker/volumes/matrixone/*
docker/volumes/mysql/*
docker/volumes/seekdb/*
!docker/volumes/oceanbase/init.d
docker/volumes/iris/*
docker/nginx/conf.d/default.conf
docker/nginx/ssl/*

1
.nvmrc
View File

@ -1 +0,0 @@
22.11.0

View File

@ -0,0 +1,5 @@
# Windsurf Testing Rules
- Use `web/testing/testing.md` as the single source of truth for frontend automated testing.
- Honor every requirement in that document when generating or accepting tests.
- When proposing or saving tests, re-read that document and follow every requirement.

View File

@ -543,25 +543,6 @@ APP_MAX_EXECUTION_TIME=1200
APP_DEFAULT_ACTIVE_REQUESTS=0
APP_MAX_ACTIVE_REQUESTS=0
# Aliyun SLS Logstore Configuration
# Aliyun Access Key ID
ALIYUN_SLS_ACCESS_KEY_ID=
# Aliyun Access Key Secret
ALIYUN_SLS_ACCESS_KEY_SECRET=
# Aliyun SLS Endpoint (e.g., cn-hangzhou.log.aliyuncs.com)
ALIYUN_SLS_ENDPOINT=
# Aliyun SLS Region (e.g., cn-hangzhou)
ALIYUN_SLS_REGION=
# Aliyun SLS Project Name
ALIYUN_SLS_PROJECT_NAME=
# Number of days to retain workflow run logs (default: 365; set to 3650 for effectively permanent storage)
ALIYUN_SLS_LOGSTORE_TTL=365
# Enable dual-write to both SLS LogStore and SQL database (default: false)
LOGSTORE_DUAL_WRITE_ENABLED=false
# Enable dual-read fallback to SQL database when LogStore returns no results (default: true)
# Useful for migration scenarios where historical data exists only in SQL database
LOGSTORE_DUAL_READ_ENABLED=true
# Celery beat configuration
CELERY_BEAT_SCHEDULER_TIME=1
@ -673,20 +654,3 @@ TENANT_ISOLATED_TASK_CONCURRENCY=1
# Maximum number of segments for dataset segments API (0 for unlimited)
DATASET_MAX_SEGMENTS_PER_REQUEST=0
# Multimodal knowledgebase limit
SINGLE_CHUNK_ATTACHMENT_LIMIT=10
ATTACHMENT_IMAGE_FILE_SIZE_LIMIT=2
ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT=60
IMAGE_FILE_BATCH_LIMIT=10
# Maximum allowed CSV file size for annotation import in megabytes
ANNOTATION_IMPORT_FILE_SIZE_LIMIT=2
# Maximum number of annotation records allowed in a single import
ANNOTATION_IMPORT_MAX_RECORDS=10000
# Minimum number of annotation records required in a single import
ANNOTATION_IMPORT_MIN_RECORDS=1
ANNOTATION_IMPORT_RATE_LIMIT_PER_MINUTE=5
ANNOTATION_IMPORT_RATE_LIMIT_PER_HOUR=20
# Maximum number of concurrent annotation import tasks per tenant
ANNOTATION_IMPORT_MAX_CONCURRENT=5

View File

@ -75,7 +75,6 @@ def initialize_extensions(app: DifyApp):
ext_import_modules,
ext_logging,
ext_login,
ext_logstore,
ext_mail,
ext_migrate,
ext_orjson,
@ -84,7 +83,6 @@ def initialize_extensions(app: DifyApp):
ext_redis,
ext_request_logging,
ext_sentry,
ext_session_factory,
ext_set_secretkey,
ext_storage,
ext_timezone,
@ -106,7 +104,6 @@ def initialize_extensions(app: DifyApp):
ext_migrate,
ext_redis,
ext_storage,
ext_logstore, # Initialize logstore after storage, before celery
ext_celery,
ext_login,
ext_mail,
@ -117,7 +114,6 @@ def initialize_extensions(app: DifyApp):
ext_commands,
ext_otel,
ext_request_logging,
ext_session_factory,
]
for ext in extensions:
short_name = ext.__name__.split(".")[-1]

View File

@ -360,57 +360,6 @@ class FileUploadConfig(BaseSettings):
default=10,
)
IMAGE_FILE_BATCH_LIMIT: PositiveInt = Field(
description="Maximum number of files allowed in a image batch upload operation",
default=10,
)
SINGLE_CHUNK_ATTACHMENT_LIMIT: PositiveInt = Field(
description="Maximum number of files allowed in a single chunk attachment",
default=10,
)
ATTACHMENT_IMAGE_FILE_SIZE_LIMIT: NonNegativeInt = Field(
description="Maximum allowed image file size for attachments in megabytes",
default=2,
)
ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT: NonNegativeInt = Field(
description="Timeout for downloading image attachments in seconds",
default=60,
)
# Annotation Import Security Configurations
ANNOTATION_IMPORT_FILE_SIZE_LIMIT: NonNegativeInt = Field(
description="Maximum allowed CSV file size for annotation import in megabytes",
default=2,
)
ANNOTATION_IMPORT_MAX_RECORDS: PositiveInt = Field(
description="Maximum number of annotation records allowed in a single import",
default=10000,
)
ANNOTATION_IMPORT_MIN_RECORDS: PositiveInt = Field(
description="Minimum number of annotation records required in a single import",
default=1,
)
ANNOTATION_IMPORT_RATE_LIMIT_PER_MINUTE: PositiveInt = Field(
description="Maximum number of annotation import requests per minute per tenant",
default=5,
)
ANNOTATION_IMPORT_RATE_LIMIT_PER_HOUR: PositiveInt = Field(
description="Maximum number of annotation import requests per hour per tenant",
default=20,
)
ANNOTATION_IMPORT_MAX_CONCURRENT: PositiveInt = Field(
description="Maximum number of concurrent annotation import tasks per tenant",
default=2,
)
inner_UPLOAD_FILE_EXTENSION_BLACKLIST: str = Field(
description=(
"Comma-separated list of file extensions that are blocked from upload. "

View File

@ -26,7 +26,6 @@ from .vdb.clickzetta_config import ClickzettaConfig
from .vdb.couchbase_config import CouchbaseConfig
from .vdb.elasticsearch_config import ElasticsearchConfig
from .vdb.huawei_cloud_config import HuaweiCloudConfig
from .vdb.iris_config import IrisVectorConfig
from .vdb.lindorm_config import LindormConfig
from .vdb.matrixone_config import MatrixoneConfig
from .vdb.milvus_config import MilvusConfig
@ -107,7 +106,7 @@ class KeywordStoreConfig(BaseSettings):
class DatabaseConfig(BaseSettings):
# Database type selector
DB_TYPE: Literal["postgresql", "mysql", "oceanbase", "seekdb"] = Field(
DB_TYPE: Literal["postgresql", "mysql", "oceanbase"] = Field(
description="Database type to use. OceanBase is MySQL-compatible.",
default="postgresql",
)
@ -337,7 +336,6 @@ class MiddlewareConfig(
ChromaConfig,
ClickzettaConfig,
HuaweiCloudConfig,
IrisVectorConfig,
MilvusConfig,
AlibabaCloudMySQLConfig,
MyScaleConfig,

View File

@ -1,91 +0,0 @@
"""Configuration for InterSystems IRIS vector database."""
from pydantic import Field, PositiveInt, model_validator
from pydantic_settings import BaseSettings
class IrisVectorConfig(BaseSettings):
"""Configuration settings for IRIS vector database connection and pooling."""
IRIS_HOST: str | None = Field(
description="Hostname or IP address of the IRIS server.",
default="localhost",
)
IRIS_SUPER_SERVER_PORT: PositiveInt | None = Field(
description="Port number for IRIS connection.",
default=1972,
)
IRIS_USER: str | None = Field(
description="Username for IRIS authentication.",
default="_SYSTEM",
)
IRIS_PASSWORD: str | None = Field(
description="Password for IRIS authentication.",
default="Dify@1234",
)
IRIS_SCHEMA: str | None = Field(
description="Schema name for IRIS tables.",
default="dify",
)
IRIS_DATABASE: str | None = Field(
description="Database namespace for IRIS connection.",
default="USER",
)
IRIS_CONNECTION_URL: str | None = Field(
description="Full connection URL for IRIS (overrides individual fields if provided).",
default=None,
)
IRIS_MIN_CONNECTION: PositiveInt = Field(
description="Minimum number of connections in the pool.",
default=1,
)
IRIS_MAX_CONNECTION: PositiveInt = Field(
description="Maximum number of connections in the pool.",
default=3,
)
IRIS_TEXT_INDEX: bool = Field(
description="Enable full-text search index using %iFind.Index.Basic.",
default=True,
)
IRIS_TEXT_INDEX_LANGUAGE: str = Field(
description="Language for full-text search index (e.g., 'en', 'ja', 'zh', 'de').",
default="en",
)
@model_validator(mode="before")
@classmethod
def validate_config(cls, values: dict) -> dict:
"""Validate IRIS configuration values.
Args:
values: Configuration dictionary
Returns:
Validated configuration dictionary
Raises:
ValueError: If required fields are missing or pool settings are invalid
"""
# Only validate required fields if IRIS is being used as the vector store
# This allows the config to be loaded even when IRIS is not in use
# vector_store = os.environ.get("VECTOR_STORE", "")
# We rely on Pydantic defaults for required fields if they are missing from env.
# Strict existence check is removed to allow defaults to work.
min_conn = values.get("IRIS_MIN_CONNECTION", 1)
max_conn = values.get("IRIS_MAX_CONNECTION", 3)
if min_conn > max_conn:
raise ValueError("IRIS_MIN_CONNECTION must be less than or equal to IRIS_MAX_CONNECTION")
return values

View File

@ -20,7 +20,6 @@ language_timezone_mapping = {
"sl-SI": "Europe/Ljubljana",
"th-TH": "Asia/Bangkok",
"id-ID": "Asia/Jakarta",
"ar-TN": "Africa/Tunis",
}
languages = list(language_timezone_mapping.keys())

View File

@ -6,20 +6,19 @@ from flask import request
from flask_restx import Resource
from pydantic import BaseModel, Field, field_validator
from sqlalchemy import select
from sqlalchemy.orm import Session
from werkzeug.exceptions import NotFound, Unauthorized
P = ParamSpec("P")
R = TypeVar("R")
from configs import dify_config
from constants.languages import supported_language
from controllers.console import console_ns
from controllers.console.wraps import only_edition_cloud
from core.db.session_factory import session_factory
from extensions.ext_database import db
from libs.token import extract_access_token
from models.model import App, InstalledApp, RecommendedApp
P = ParamSpec("P")
R = TypeVar("R")
DEFAULT_REF_TEMPLATE_SWAGGER_2_0 = "#/definitions/{model}"
@ -91,7 +90,7 @@ class InsertExploreAppListApi(Resource):
privacy_policy = site.privacy_policy or payload.privacy_policy or ""
custom_disclaimer = site.custom_disclaimer or payload.custom_disclaimer or ""
with session_factory.create_session() as session:
with Session(db.engine) as session:
recommended_app = session.execute(
select(RecommendedApp).where(RecommendedApp.app_id == payload.app_id)
).scalar_one_or_none()
@ -139,7 +138,7 @@ class InsertExploreAppApi(Resource):
@only_edition_cloud
@admin_required
def delete(self, app_id):
with session_factory.create_session() as session:
with Session(db.engine) as session:
recommended_app = session.execute(
select(RecommendedApp).where(RecommendedApp.app_id == str(app_id))
).scalar_one_or_none()
@ -147,13 +146,13 @@ class InsertExploreAppApi(Resource):
if not recommended_app:
return {"result": "success"}, 204
with session_factory.create_session() as session:
with Session(db.engine) as session:
app = session.execute(select(App).where(App.id == recommended_app.app_id)).scalar_one_or_none()
if app:
app.is_public = False
with session_factory.create_session() as session:
with Session(db.engine) as session:
installed_apps = (
session.execute(
select(InstalledApp).where(

View File

@ -1,6 +1,6 @@
from typing import Any, Literal
from flask import abort, make_response, request
from flask import request
from flask_restx import Resource, fields, marshal, marshal_with
from pydantic import BaseModel, Field, field_validator
@ -8,8 +8,6 @@ from controllers.common.errors import NoFileUploadedError, TooManyFilesError
from controllers.console import console_ns
from controllers.console.wraps import (
account_initialization_required,
annotation_import_concurrency_limit,
annotation_import_rate_limit,
cloud_edition_billing_resource_check,
edit_permission_required,
setup_required,
@ -259,7 +257,7 @@ class AnnotationApi(Resource):
@console_ns.route("/apps/<uuid:app_id>/annotations/export")
class AnnotationExportApi(Resource):
@console_ns.doc("export_annotations")
@console_ns.doc(description="Export all annotations for an app with CSV injection protection")
@console_ns.doc(description="Export all annotations for an app")
@console_ns.doc(params={"app_id": "Application ID"})
@console_ns.response(
200,
@ -274,14 +272,8 @@ class AnnotationExportApi(Resource):
def get(self, app_id):
app_id = str(app_id)
annotation_list = AppAnnotationService.export_annotation_list_by_app_id(app_id)
response_data = {"data": marshal(annotation_list, annotation_fields)}
# Create response with secure headers for CSV export
response = make_response(response_data, 200)
response.headers["Content-Type"] = "application/json; charset=utf-8"
response.headers["X-Content-Type-Options"] = "nosniff"
return response
response = {"data": marshal(annotation_list, annotation_fields)}
return response, 200
@console_ns.route("/apps/<uuid:app_id>/annotations/<uuid:annotation_id>")
@ -322,25 +314,18 @@ class AnnotationUpdateDeleteApi(Resource):
@console_ns.route("/apps/<uuid:app_id>/annotations/batch-import")
class AnnotationBatchImportApi(Resource):
@console_ns.doc("batch_import_annotations")
@console_ns.doc(description="Batch import annotations from CSV file with rate limiting and security checks")
@console_ns.doc(description="Batch import annotations from CSV file")
@console_ns.doc(params={"app_id": "Application ID"})
@console_ns.response(200, "Batch import started successfully")
@console_ns.response(403, "Insufficient permissions")
@console_ns.response(400, "No file uploaded or too many files")
@console_ns.response(413, "File too large")
@console_ns.response(429, "Too many requests or concurrent imports")
@setup_required
@login_required
@account_initialization_required
@cloud_edition_billing_resource_check("annotation")
@annotation_import_rate_limit
@annotation_import_concurrency_limit
@edit_permission_required
def post(self, app_id):
from configs import dify_config
app_id = str(app_id)
# check file
if "file" not in request.files:
raise NoFileUploadedError()
@ -350,27 +335,9 @@ class AnnotationBatchImportApi(Resource):
# get file from request
file = request.files["file"]
# check file type
if not file.filename or not file.filename.lower().endswith(".csv"):
raise ValueError("Invalid file type. Only CSV files are allowed")
# Check file size before processing
file.seek(0, 2) # Seek to end of file
file_size = file.tell()
file.seek(0) # Reset to beginning
max_size_bytes = dify_config.ANNOTATION_IMPORT_FILE_SIZE_LIMIT * 1024 * 1024
if file_size > max_size_bytes:
abort(
413,
f"File size exceeds maximum limit of {dify_config.ANNOTATION_IMPORT_FILE_SIZE_LIMIT}MB. "
f"Please reduce the file size and try again.",
)
if file_size == 0:
raise ValueError("The uploaded file is empty")
return AppAnnotationService.batch_import_app_annotations(app_id, file)

View File

@ -61,7 +61,6 @@ class ChatMessagesQuery(BaseModel):
class MessageFeedbackPayload(BaseModel):
message_id: str = Field(..., description="Message ID")
rating: Literal["like", "dislike"] | None = Field(default=None, description="Feedback rating")
content: str | None = Field(default=None, description="Feedback content")
@field_validator("message_id")
@classmethod
@ -325,7 +324,6 @@ class MessageFeedbackApi(Resource):
db.session.delete(feedback)
elif args.rating and feedback:
feedback.rating = args.rating
feedback.content = args.content
elif not args.rating and not feedback:
raise ValueError("rating cannot be None when feedback not exists")
else:
@ -337,7 +335,6 @@ class MessageFeedbackApi(Resource):
conversation_id=message.conversation_id,
message_id=message.id,
rating=rating_value,
content=args.content,
from_source="admin",
from_account_id=current_user.id,
)

View File

@ -114,7 +114,7 @@ class AppTriggersApi(Resource):
@console_ns.route("/apps/<uuid:app_id>/trigger-enable")
class AppTriggerEnableApi(Resource):
@console_ns.expect(console_ns.models[ParserEnable.__name__])
@console_ns.expect(console_ns.models[ParserEnable.__name__], validate=True)
@setup_required
@login_required
@account_initialization_required

View File

@ -22,12 +22,7 @@ from controllers.console.error import (
NotAllowedCreateWorkspace,
WorkspacesLimitExceeded,
)
from controllers.console.wraps import (
decrypt_code_field,
decrypt_password_field,
email_password_login_enabled,
setup_required,
)
from controllers.console.wraps import email_password_login_enabled, setup_required
from events.tenant_event import tenant_was_created
from libs.helper import EmailStr, extract_remote_ip
from libs.login import current_account_with_tenant
@ -84,7 +79,6 @@ class LoginApi(Resource):
@setup_required
@email_password_login_enabled
@console_ns.expect(console_ns.models[LoginPayload.__name__])
@decrypt_password_field
def post(self):
"""Authenticate user and login."""
args = LoginPayload.model_validate(console_ns.payload)
@ -224,7 +218,6 @@ class EmailCodeLoginSendEmailApi(Resource):
class EmailCodeLoginApi(Resource):
@setup_required
@console_ns.expect(console_ns.models[EmailCodeLoginPayload.__name__])
@decrypt_code_field
def post(self):
args = EmailCodeLoginPayload.model_validate(console_ns.payload)

View File

@ -140,18 +140,6 @@ class DataSourceNotionListApi(Resource):
credential_id = request.args.get("credential_id", default=None, type=str)
if not credential_id:
raise ValueError("Credential id is required.")
# Get datasource_parameters from query string (optional, for GitHub and other datasources)
datasource_parameters_str = request.args.get("datasource_parameters", default=None, type=str)
datasource_parameters = {}
if datasource_parameters_str:
try:
datasource_parameters = json.loads(datasource_parameters_str)
if not isinstance(datasource_parameters, dict):
raise ValueError("datasource_parameters must be a JSON object.")
except json.JSONDecodeError:
raise ValueError("Invalid datasource_parameters JSON format.")
datasource_provider_service = DatasourceProviderService()
credential = datasource_provider_service.get_datasource_credentials(
tenant_id=current_tenant_id,
@ -199,7 +187,7 @@ class DataSourceNotionListApi(Resource):
online_document_result: Generator[OnlineDocumentPagesMessage, None, None] = (
datasource_runtime.get_online_document_pages(
user_id=current_user.id,
datasource_parameters=datasource_parameters,
datasource_parameters={},
provider_type=datasource_runtime.datasource_provider_type(),
)
)
@ -230,14 +218,14 @@ class DataSourceNotionListApi(Resource):
@console_ns.route(
"/notion/pages/<uuid:page_id>/<string:page_type>/preview",
"/notion/workspaces/<uuid:workspace_id>/pages/<uuid:page_id>/<string:page_type>/preview",
"/datasets/notion-indexing-estimate",
)
class DataSourceNotionApi(Resource):
@setup_required
@login_required
@account_initialization_required
def get(self, page_id, page_type):
def get(self, workspace_id, page_id, page_type):
_, current_tenant_id = current_account_with_tenant()
credential_id = request.args.get("credential_id", default=None, type=str)
@ -251,10 +239,11 @@ class DataSourceNotionApi(Resource):
plugin_id="langgenius/notion_datasource",
)
workspace_id = str(workspace_id)
page_id = str(page_id)
extractor = NotionExtractor(
notion_workspace_id="",
notion_workspace_id=workspace_id,
notion_obj_id=page_id,
notion_page_type=page_type,
notion_access_token=credential.get("integration_secret"),

View File

@ -151,7 +151,6 @@ class DatasetUpdatePayload(BaseModel):
external_knowledge_id: str | None = None
external_knowledge_api_id: str | None = None
icon_info: dict[str, Any] | None = None
is_multimodal: bool | None = False
@field_validator("indexing_technique")
@classmethod
@ -223,7 +222,6 @@ def _get_retrieval_methods_by_vector_type(vector_type: str | None, is_mock: bool
VectorType.COUCHBASE,
VectorType.OPENGAUSS,
VectorType.OCEANBASE,
VectorType.SEEKDB,
VectorType.TABLESTORE,
VectorType.HUAWEI_CLOUD,
VectorType.TENCENT,
@ -231,7 +229,6 @@ def _get_retrieval_methods_by_vector_type(vector_type: str | None, is_mock: bool
VectorType.CLICKZETTA,
VectorType.BAIDU,
VectorType.ALIBABACLOUD_MYSQL,
VectorType.IRIS,
}
semantic_methods = {"retrieval_method": [RetrievalMethod.SEMANTIC_SEARCH.value]}
@ -424,18 +421,19 @@ class DatasetApi(Resource):
raise NotFound("Dataset not found.")
payload = DatasetUpdatePayload.model_validate(console_ns.payload or {})
payload_data = payload.model_dump(exclude_unset=True)
current_user, current_tenant_id = current_account_with_tenant()
# check embedding model setting
if (
payload.indexing_technique == "high_quality"
and payload.embedding_model_provider is not None
and payload.embedding_model is not None
):
is_multimodal = DatasetService.check_is_multimodal_model(
DatasetService.check_embedding_model_setting(
dataset.tenant_id, payload.embedding_model_provider, payload.embedding_model
)
payload.is_multimodal = is_multimodal
payload_data = payload.model_dump(exclude_unset=True)
# The role of the current user in the ta table must be admin, owner, editor, or dataset_operator
DatasetPermissionService.check_permission(
current_user, dataset, payload.permission, payload.partial_member_list

View File

@ -424,10 +424,6 @@ class DatasetInitApi(Resource):
model_type=ModelType.TEXT_EMBEDDING,
model=knowledge_config.embedding_model,
)
is_multimodal = DatasetService.check_is_multimodal_model(
current_tenant_id, knowledge_config.embedding_model_provider, knowledge_config.embedding_model
)
knowledge_config.is_multimodal = is_multimodal
except InvokeAuthorizationError:
raise ProviderNotInitializeError(
"No Embedding Model available. Please configure a valid provider in the Settings -> Model Provider."

View File

@ -51,7 +51,6 @@ class SegmentCreatePayload(BaseModel):
content: str
answer: str | None = None
keywords: list[str] | None = None
attachment_ids: list[str] | None = None
class SegmentUpdatePayload(BaseModel):
@ -59,7 +58,6 @@ class SegmentUpdatePayload(BaseModel):
answer: str | None = None
keywords: list[str] | None = None
regenerate_child_chunks: bool = False
attachment_ids: list[str] | None = None
class BatchImportPayload(BaseModel):

View File

@ -1,7 +1,7 @@
import logging
from typing import Any
from flask_restx import marshal, reqparse
from flask_restx import marshal
from pydantic import BaseModel, Field
from werkzeug.exceptions import Forbidden, InternalServerError, NotFound
@ -33,7 +33,6 @@ class HitTestingPayload(BaseModel):
query: str = Field(max_length=250)
retrieval_model: dict[str, Any] | None = None
external_retrieval_model: dict[str, Any] | None = None
attachment_ids: list[str] | None = None
class DatasetsHitTestingBase:
@ -55,28 +54,16 @@ class DatasetsHitTestingBase:
def hit_testing_args_check(args: dict[str, Any]):
HitTestingService.hit_testing_args_check(args)
@staticmethod
def parse_args():
parser = (
reqparse.RequestParser()
.add_argument("query", type=str, required=False, location="json")
.add_argument("attachment_ids", type=list, required=False, location="json")
.add_argument("retrieval_model", type=dict, required=False, location="json")
.add_argument("external_retrieval_model", type=dict, required=False, location="json")
)
return parser.parse_args()
@staticmethod
def perform_hit_testing(dataset, args):
assert isinstance(current_user, Account)
try:
response = HitTestingService.retrieve(
dataset=dataset,
query=args.get("query"),
query=args["query"],
account=current_user,
retrieval_model=args.get("retrieval_model"),
external_retrieval_model=args.get("external_retrieval_model"),
attachment_ids=args.get("attachment_ids"),
retrieval_model=args["retrieval_model"],
external_retrieval_model=args["external_retrieval_model"],
limit=10,
)
return {"query": response["query"], "records": marshal(response["records"], hit_testing_record_fields)}

View File

@ -26,7 +26,7 @@ console_ns.schema_model(Parser.__name__, Parser.model_json_schema(ref_template=D
@console_ns.route("/rag/pipelines/<uuid:pipeline_id>/workflows/published/datasource/nodes/<string:node_id>/preview")
class DataSourceContentPreviewApi(Resource):
@console_ns.expect(console_ns.models[Parser.__name__])
@console_ns.expect(console_ns.models[Parser.__name__], validate=True)
@setup_required
@login_required
@account_initialization_required
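`validate=True` turns the expected model from documentation into enforcement: flask-restx rejects non-conforming bodies with a 400 before the view body runs. A hedged sketch with an illustrative model:

```python
from flask import Flask
from flask_restx import Api, Resource, fields

app = Flask(__name__)
api = Api(app)
parser_model = api.model("Parser", {"name": fields.String(required=True)})

@api.route("/preview")
class Preview(Resource):
    @api.expect(parser_model, validate=True)  # schema mismatch -> 400, post() never runs
    def post(self):
        return {"ok": True}
```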

View File

@ -4,7 +4,7 @@ from typing import Any, Literal, cast
from uuid import UUID
from flask import abort, request
from flask_restx import Resource, marshal_with, reqparse # type: ignore
from flask_restx import Resource, marshal_with # type: ignore
from pydantic import BaseModel, Field
from sqlalchemy.orm import Session
from werkzeug.exceptions import Forbidden, InternalServerError, NotFound
@ -975,11 +975,6 @@ class RagPipelineRecommendedPluginApi(Resource):
@login_required
@account_initialization_required
def get(self):
parser = reqparse.RequestParser()
parser.add_argument("type", type=str, location="args", required=False, default="all")
args = parser.parse_args()
type = args["type"]
rag_pipeline_service = RagPipelineService()
recommended_plugins = rag_pipeline_service.get_recommended_plugins(type)
recommended_plugins = rag_pipeline_service.get_recommended_plugins()
return recommended_plugins

View File

@ -2,7 +2,7 @@ import logging
from typing import Any, Literal
from uuid import UUID
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel, Field
from werkzeug.exceptions import InternalServerError, NotFound
import services
@ -52,24 +52,10 @@ class ChatMessagePayload(BaseModel):
inputs: dict[str, Any]
query: str
files: list[dict[str, Any]] | None = None
conversation_id: str | None = None
parent_message_id: str | None = None
conversation_id: UUID | None = None
parent_message_id: UUID | None = None
retriever_from: str = Field(default="explore_app")
@field_validator("conversation_id", "parent_message_id", mode="before")
@classmethod
def normalize_uuid(cls, value: str | UUID | None) -> str | None:
"""
Accept blank IDs and validate UUID format when provided.
"""
if not value:
return None
try:
return helper.uuid_value(value)
except ValueError as exc:
raise ValueError("must be a valid UUID") from exc
register_schema_models(console_ns, CompletionMessagePayload, ChatMessagePayload)
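The change above is behavioral, not cosmetic: typing the fields as `UUID | None` means a blank string now fails validation instead of being normalized to None as the removed validator did. A small sketch with an illustrative model:

```python
from uuid import UUID
from pydantic import BaseModel, ValidationError

class Msg(BaseModel):
    conversation_id: UUID | None = None

print(Msg.model_validate({}).conversation_id)         # None: field omitted
print(Msg.model_validate({"conversation_id": None}))  # None is still accepted
try:
    Msg.model_validate({"conversation_id": ""})       # "" used to become None
except ValidationError as e:
    print("blank id now rejected:", e.error_count(), "error(s)")
```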

View File

@ -3,7 +3,7 @@ from uuid import UUID
from flask import request
from flask_restx import marshal_with
from pydantic import BaseModel, Field, model_validator
from pydantic import BaseModel, Field
from sqlalchemy.orm import Session
from werkzeug.exceptions import NotFound
@ -30,16 +30,9 @@ class ConversationListQuery(BaseModel):
class ConversationRenamePayload(BaseModel):
name: str | None = None
name: str
auto_generate: bool = False
@model_validator(mode="after")
def validate_name_requirement(self):
if not self.auto_generate:
if self.name is None or not self.name.strip():
raise ValueError("name is required when auto_generate is false")
return self
register_schema_models(console_ns, ConversationListQuery, ConversationRenamePayload)
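The removed `model_validator` enforced a cross-field rule that a single required `name: str` cannot express exactly: name was optional whenever auto_generate was true. A hedged reconstruction of that rule in isolation:

```python
from pydantic import BaseModel, ValidationError, model_validator

class RenamePayload(BaseModel):
    name: str | None = None
    auto_generate: bool = False

    @model_validator(mode="after")
    def require_name_unless_auto(self):
        if not self.auto_generate and not (self.name or "").strip():
            raise ValueError("name is required when auto_generate is false")
        return self

RenamePayload(name="Q3 report")    # ok
RenamePayload(auto_generate=True)  # ok: name may stay None
try:
    RenamePayload()                # neither name nor auto_generate
except ValidationError as e:
    print(e.errors()[0]["msg"])
```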

View File

@ -45,9 +45,6 @@ class FileApi(Resource):
"video_file_size_limit": dify_config.UPLOAD_VIDEO_FILE_SIZE_LIMIT,
"audio_file_size_limit": dify_config.UPLOAD_AUDIO_FILE_SIZE_LIMIT,
"workflow_file_upload_limit": dify_config.WORKFLOW_FILE_UPLOAD_LIMIT,
"image_file_batch_limit": dify_config.IMAGE_FILE_BATCH_LIMIT,
"single_chunk_attachment_limit": dify_config.SINGLE_CHUNK_ATTACHMENT_LIMIT,
"attachment_image_file_size_limit": dify_config.ATTACHMENT_IMAGE_FILE_SIZE_LIMIT,
}, 200
@setup_required

View File

@ -230,7 +230,7 @@ class ModelProviderModelApi(Resource):
return {"result": "success"}, 200
@console_ns.expect(console_ns.models[ParserDeleteModels.__name__])
@console_ns.expect(console_ns.models[ParserDeleteModels.__name__], validate=True)
@setup_required
@login_required
@is_admin_or_owner_required
@ -282,10 +282,9 @@ class ModelProviderModelCredentialApi(Resource):
tenant_id=tenant_id, provider_name=provider
)
else:
# Normalize model_type to the origin value stored in DB (e.g., "text-generation" for LLM)
normalized_model_type = args.model_type.to_origin_model_type()
model_type = args.model_type
available_credentials = model_provider_service.provider_manager.get_provider_model_available_credentials(
tenant_id=tenant_id, provider_name=provider, model_type=normalized_model_type, model_name=args.model
tenant_id=tenant_id, provider_name=provider, model_type=model_type, model_name=args.model
)
return jsonable_encoder(

View File

@ -46,8 +46,8 @@ class PluginDebuggingKeyApi(Resource):
class ParserList(BaseModel):
page: int = Field(default=1, ge=1, description="Page number")
page_size: int = Field(default=256, ge=1, le=256, description="Page size (1-256)")
page: int = Field(default=1)
page_size: int = Field(default=256)
reg(ParserList)
@ -106,8 +106,8 @@ class ParserPluginIdentifierQuery(BaseModel):
class ParserTasks(BaseModel):
page: int = Field(default=1, ge=1, description="Page number")
page_size: int = Field(default=256, ge=1, le=256, description="Page size (1-256)")
page: int
page_size: int
class ParserMarketplaceUpgrade(BaseModel):
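Dropping `ge`/`le` means out-of-range paging values now reach the query layer instead of failing at parse time. A quick sketch of what the removed bounds did:

```python
from pydantic import BaseModel, Field, ValidationError

class PageQuery(BaseModel):
    page: int = Field(default=1, ge=1)
    page_size: int = Field(default=256, ge=1, le=256)

print(PageQuery())                       # page=1 page_size=256
try:
    PageQuery(page=0, page_size=10_000)  # both violate the old bounds
except ValidationError as e:
    print(e.error_count(), "validation errors")
```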

View File

@ -9,12 +9,10 @@ from typing import ParamSpec, TypeVar
from flask import abort, request
from configs import dify_config
from controllers.console.auth.error import AuthenticationFailedError, EmailCodeError
from controllers.console.workspace.error import AccountNotInitializedError
from enums.cloud_plan import CloudPlan
from extensions.ext_database import db
from extensions.ext_redis import redis_client
from libs.encryption import FieldEncryption
from libs.login import current_account_with_tenant
from models.account import AccountStatus
from models.dataset import RateLimitLog
@ -27,14 +25,6 @@ from .error import NotInitValidateError, NotSetupError, UnauthorizedAndForceLogo
P = ParamSpec("P")
R = TypeVar("R")
# Field names for decryption
FIELD_NAME_PASSWORD = "password"
FIELD_NAME_CODE = "code"
# Error messages for decryption failures
ERROR_MSG_INVALID_ENCRYPTED_DATA = "Invalid encrypted data"
ERROR_MSG_INVALID_ENCRYPTED_CODE = "Invalid encrypted code"
def account_initialization_required(view: Callable[P, R]):
@wraps(view)
@ -341,163 +331,3 @@ def is_admin_or_owner_required(f: Callable[P, R]):
return f(*args, **kwargs)
return decorated_function
def annotation_import_rate_limit(view: Callable[P, R]):
"""
Rate limiting decorator for annotation import operations.
Implements sliding window rate limiting with two tiers:
- Short-term: Configurable requests per minute (default: 5)
- Long-term: Configurable requests per hour (default: 20)
Uses Redis ZSET for distributed rate limiting across multiple instances.
"""
@wraps(view)
def decorated(*args: P.args, **kwargs: P.kwargs):
_, current_tenant_id = current_account_with_tenant()
current_time = int(time.time() * 1000)
# Check per-minute rate limit
minute_key = f"annotation_import_rate_limit:{current_tenant_id}:1min"
redis_client.zadd(minute_key, {current_time: current_time})
redis_client.zremrangebyscore(minute_key, 0, current_time - 60000)
minute_count = redis_client.zcard(minute_key)
redis_client.expire(minute_key, 120) # 2 minutes TTL
if minute_count > dify_config.ANNOTATION_IMPORT_RATE_LIMIT_PER_MINUTE:
abort(
429,
f"Too many annotation import requests. Maximum {dify_config.ANNOTATION_IMPORT_RATE_LIMIT_PER_MINUTE} "
f"requests per minute allowed. Please try again later.",
)
# Check per-hour rate limit
hour_key = f"annotation_import_rate_limit:{current_tenant_id}:1hour"
redis_client.zadd(hour_key, {current_time: current_time})
redis_client.zremrangebyscore(hour_key, 0, current_time - 3600000)
hour_count = redis_client.zcard(hour_key)
redis_client.expire(hour_key, 7200) # 2 hours TTL
if hour_count > dify_config.ANNOTATION_IMPORT_RATE_LIMIT_PER_HOUR:
abort(
429,
f"Too many annotation import requests. Maximum {dify_config.ANNOTATION_IMPORT_RATE_LIMIT_PER_HOUR} "
f"requests per hour allowed. Please try again later.",
)
return view(*args, **kwargs)
return decorated
def annotation_import_concurrency_limit(view: Callable[P, R]):
"""
Concurrency control decorator for annotation import operations.
Limits the number of concurrent import tasks per tenant to prevent
resource exhaustion and ensure fair resource allocation.
Uses Redis ZSET to track active import jobs with automatic cleanup
of stale entries (jobs older than 2 minutes).
"""
@wraps(view)
def decorated(*args: P.args, **kwargs: P.kwargs):
_, current_tenant_id = current_account_with_tenant()
current_time = int(time.time() * 1000)
active_jobs_key = f"annotation_import_active:{current_tenant_id}"
# Clean up stale entries (jobs that should have completed or timed out)
stale_threshold = current_time - 120000 # 2 minutes ago
redis_client.zremrangebyscore(active_jobs_key, 0, stale_threshold)
# Check current active job count
active_count = redis_client.zcard(active_jobs_key)
if active_count >= dify_config.ANNOTATION_IMPORT_MAX_CONCURRENT:
abort(
429,
f"Too many concurrent import tasks. Maximum {dify_config.ANNOTATION_IMPORT_MAX_CONCURRENT} "
f"concurrent imports allowed per workspace. Please wait for existing imports to complete.",
)
# Allow the request to proceed
# The actual job registration will happen in the service layer
return view(*args, **kwargs)
return decorated
def _decrypt_field(field_name: str, error_class: type[Exception], error_message: str) -> None:
"""
Helper to decode a Base64 encoded field in the request payload.
Args:
field_name: Name of the field to decode
error_class: Exception class to raise on decoding failure
error_message: Error message to include in the exception
"""
if not request or not request.is_json:
return
# Get the payload dict - it's cached and mutable
payload = request.get_json()
if not payload or field_name not in payload:
return
encoded_value = payload[field_name]
decoded_value = FieldEncryption.decrypt_field(encoded_value)
# If decoding failed, raise error immediately
if decoded_value is None:
raise error_class(error_message)
# Update payload dict in-place with decoded value
# Since payload is a mutable dict and get_json() returns the cached reference,
# modifying it will affect all subsequent accesses including console_ns.payload
payload[field_name] = decoded_value
def decrypt_password_field(view: Callable[P, R]):
"""
Decorator to decrypt password field in request payload.
Automatically decrypts the 'password' field if encryption is enabled.
If decryption fails, raises AuthenticationFailedError.
Usage:
@decrypt_password_field
def post(self):
args = LoginPayload.model_validate(console_ns.payload)
# args.password is now decrypted
"""
@wraps(view)
def decorated(*args: P.args, **kwargs: P.kwargs):
_decrypt_field(FIELD_NAME_PASSWORD, AuthenticationFailedError, ERROR_MSG_INVALID_ENCRYPTED_DATA)
return view(*args, **kwargs)
return decorated
def decrypt_code_field(view: Callable[P, R]):
"""
Decorator to decrypt verification code field in request payload.
Automatically decrypts the 'code' field if encryption is enabled.
If decryption fails, raises EmailCodeError.
Usage:
@decrypt_code_field
def post(self):
args = EmailCodeLoginPayload.model_validate(console_ns.payload)
# args.code is now decrypted
"""
@wraps(view)
def decorated(*args: P.args, **kwargs: P.kwargs):
_decrypt_field(FIELD_NAME_CODE, EmailCodeError, ERROR_MSG_INVALID_ENCRYPTED_CODE)
return view(*args, **kwargs)
return decorated
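For context, the removed decorators were meant to stack on a flask-restx view; the ZSET scores are epoch milliseconds, so `zremrangebyscore(key, 0, now - 60000)` trims everything older than the one-minute window before counting. A hedged usage sketch, where the route and payload are illustrative and the decorators are the ones defined above:

```python
from flask_restx import Resource

class AnnotationImportApi(Resource):
    @annotation_import_rate_limit         # sliding window: per-minute and per-hour caps
    @annotation_import_concurrency_limit  # cap on concurrent imports per tenant
    def post(self, app_id):
        # Enqueue the import; the service layer is expected to register the
        # job in the same annotation_import_active:{tenant} ZSET checked above.
        return {"result": "queued"}, 202
```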

View File

@ -4,7 +4,7 @@ from uuid import UUID
from flask import request
from flask_restx import Resource
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel, Field
from werkzeug.exceptions import BadRequest, InternalServerError, NotFound
import services
@ -52,26 +52,11 @@ class ChatRequestPayload(BaseModel):
query: str
files: list[dict[str, Any]] | None = None
response_mode: Literal["blocking", "streaming"] | None = None
conversation_id: str | None = Field(default=None, description="Conversation UUID")
conversation_id: UUID | None = None
retriever_from: str = Field(default="dev")
auto_generate_name: bool = Field(default=True, description="Auto generate conversation name")
workflow_id: str | None = Field(default=None, description="Workflow ID for advanced chat")
@field_validator("conversation_id", mode="before")
@classmethod
def normalize_conversation_id(cls, value: str | UUID | None) -> str | None:
"""Allow missing or blank conversation IDs; enforce UUID format when provided."""
if isinstance(value, str):
value = value.strip()
if not value:
return None
try:
return helper.uuid_value(value)
except ValueError as exc:
raise ValueError("conversation_id must be a valid UUID") from exc
register_schema_models(service_api_ns, CompletionRequestPayload, ChatRequestPayload)

View File

@ -4,7 +4,7 @@ from uuid import UUID
from flask import request
from flask_restx import Resource
from flask_restx._http import HTTPStatus
from pydantic import BaseModel, Field, model_validator
from pydantic import BaseModel, Field
from sqlalchemy.orm import Session
from werkzeug.exceptions import BadRequest, NotFound
@ -37,16 +37,9 @@ class ConversationListQuery(BaseModel):
class ConversationRenamePayload(BaseModel):
name: str | None = Field(default=None, description="New conversation name (required if auto_generate is false)")
name: str = Field(description="New conversation name")
auto_generate: bool = Field(default=False, description="Auto-generate conversation name")
@model_validator(mode="after")
def validate_name_requirement(self):
if not self.auto_generate:
if self.name is None or not self.name.strip():
raise ValueError("name is required when auto_generate is false")
return self
class ConversationVariablesQuery(BaseModel):
last_id: UUID | None = Field(default=None, description="Last variable ID for pagination")

View File

@ -33,7 +33,7 @@ def trigger_endpoint(endpoint_id: str):
if response:
break
if not response:
logger.info("Endpoint not found for %s", endpoint_id)
logger.error("Endpoint not found for {endpoint_id}")
return jsonify({"error": "Endpoint not found"}), 404
return response
except ValueError as e:

View File

@ -62,7 +62,8 @@ from core.app.task_pipeline.message_cycle_manager import MessageCycleManager
from core.base.tts import AppGeneratorTTSPublisher, AudioTrunk
from core.model_runtime.entities.llm_entities import LLMUsage
from core.model_runtime.utils.encoders import jsonable_encoder
from core.ops.ops_trace_manager import TraceQueueManager
from core.ops.entities.trace_entity import TraceTaskName
from core.ops.ops_trace_manager import TraceQueueManager, TraceTask
from core.workflow.enums import WorkflowExecutionStatus
from core.workflow.nodes import NodeType
from core.workflow.repositories.draft_variable_repository import DraftVariableSaverFactory
@ -72,7 +73,7 @@ from extensions.ext_database import db
from libs.datetime_utils import naive_utc_now
from models import Account, Conversation, EndUser, Message, MessageFile
from models.enums import CreatorUserRole
from models.workflow import Workflow
from models.workflow import Workflow, WorkflowNodeExecutionModel
logger = logging.getLogger(__name__)
@ -580,7 +581,7 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
with self._database_session() as session:
# Save message
self._save_message(session=session, graph_runtime_state=resolved_state)
self._save_message(session=session, graph_runtime_state=resolved_state, trace_manager=trace_manager)
yield workflow_finish_resp
elif event.stopped_by in (
@ -590,7 +591,7 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
# When hitting input-moderation or annotation-reply, the workflow will not start
with self._database_session() as session:
# Save message
self._save_message(session=session)
self._save_message(session=session, trace_manager=trace_manager)
yield self._message_end_to_stream_response()
@ -599,6 +600,7 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
event: QueueAdvancedChatMessageEndEvent,
*,
graph_runtime_state: GraphRuntimeState | None = None,
trace_manager: TraceQueueManager | None = None,
**kwargs,
) -> Generator[StreamResponse, None, None]:
"""Handle advanced chat message end events."""
@ -616,7 +618,7 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
# Save message
with self._database_session() as session:
self._save_message(session=session, graph_runtime_state=resolved_state)
self._save_message(session=session, graph_runtime_state=resolved_state, trace_manager=trace_manager)
yield self._message_end_to_stream_response()
@ -770,7 +772,13 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
if self._conversation_name_generate_thread:
logger.debug("Conversation name generation running as daemon thread")
def _save_message(self, *, session: Session, graph_runtime_state: GraphRuntimeState | None = None):
def _save_message(
self,
*,
session: Session,
graph_runtime_state: GraphRuntimeState | None = None,
trace_manager: TraceQueueManager | None = None,
):
message = self._get_message(session=session)
# If there are assistant files, remove markdown image links from answer
@ -809,6 +817,14 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
metadata = self._task_state.metadata.model_dump()
message.message_metadata = json.dumps(jsonable_encoder(metadata))
# Extract model provider and model_id from workflow node executions for tracing
if message.workflow_run_id:
model_info = self._extract_model_info_from_workflow(session, message.workflow_run_id)
if model_info:
message.model_provider = model_info.get("provider")
message.model_id = model_info.get("model")
message_files = [
MessageFile(
message_id=message.id,
@ -826,6 +842,68 @@ class AdvancedChatAppGenerateTaskPipeline(GraphRuntimeStateSupport):
]
session.add_all(message_files)
# Trigger MESSAGE_TRACE for tracing integrations
if trace_manager:
trace_manager.add_trace_task(
TraceTask(
TraceTaskName.MESSAGE_TRACE, conversation_id=self._conversation_id, message_id=self._message_id
)
)
def _extract_model_info_from_workflow(self, session: Session, workflow_run_id: str) -> dict[str, str] | None:
"""
Extract model provider and model_id from workflow node executions.
Returns dict with 'provider' and 'model' keys, or None if not found.
"""
try:
# Query workflow node executions for LLM or Agent nodes
stmt = (
select(WorkflowNodeExecutionModel)
.where(WorkflowNodeExecutionModel.workflow_run_id == workflow_run_id)
.where(WorkflowNodeExecutionModel.node_type.in_(["llm", "agent"]))
.order_by(WorkflowNodeExecutionModel.created_at.desc())
.limit(1)
)
node_execution = session.scalar(stmt)
if not node_execution:
return None
# Try to extract from execution_metadata for agent nodes
if node_execution.execution_metadata:
try:
metadata = json.loads(node_execution.execution_metadata)
agent_log = metadata.get("agent_log", [])
# Look for the first agent thought with provider info
for log_entry in agent_log:
entry_metadata = log_entry.get("metadata", {})
provider_str = entry_metadata.get("provider")
if provider_str:
# Parse format like "langgenius/deepseek/deepseek"
parts = provider_str.split("/")
if len(parts) >= 3:
return {"provider": parts[1], "model": parts[2]}
elif len(parts) == 2:
return {"provider": parts[0], "model": parts[1]}
except (json.JSONDecodeError, KeyError, AttributeError) as e:
logger.debug("Failed to parse execution_metadata: %s", e)
# Try to extract from process_data for llm nodes
if node_execution.process_data:
try:
process_data = json.loads(node_execution.process_data)
provider = process_data.get("model_provider")
model = process_data.get("model_name")
if provider and model:
return {"provider": provider, "model": model}
except (json.JSONDecodeError, KeyError) as e:
logger.debug("Failed to parse process_data: %s", e)
return None
except Exception as e:
logger.warning("Failed to extract model info from workflow: %s", e)
return None
def _seed_graph_runtime_state_from_queue_manager(self) -> None:
"""Bootstrap the cached runtime state from the queue manager when present."""
candidate = self._base_task_pipeline.queue_manager.graph_runtime_state

View File

@ -83,7 +83,6 @@ class AppRunner:
context: str | None = None,
memory: TokenBufferMemory | None = None,
image_detail_config: ImagePromptMessageContent.DETAIL | None = None,
context_files: list["File"] | None = None,
) -> tuple[list[PromptMessage], list[str] | None]:
"""
Organize prompt messages
@ -112,7 +111,6 @@ class AppRunner:
memory=memory,
model_config=model_config,
image_detail_config=image_detail_config,
context_files=context_files,
)
else:
memory_config = MemoryConfig(window=MemoryConfig.WindowConfig(enabled=False))

View File

@ -11,7 +11,6 @@ from core.app.entities.app_invoke_entities import (
)
from core.app.entities.queue_entities import QueueAnnotationReplyEvent
from core.callback_handler.index_tool_callback_handler import DatasetIndexToolCallbackHandler
from core.file import File
from core.memory.token_buffer_memory import TokenBufferMemory
from core.model_manager import ModelInstance
from core.model_runtime.entities.message_entities import ImagePromptMessageContent
@ -147,7 +146,6 @@ class ChatAppRunner(AppRunner):
# get context from datasets
context = None
context_files: list[File] = []
if app_config.dataset and app_config.dataset.dataset_ids:
hit_callback = DatasetIndexToolCallbackHandler(
queue_manager,
@ -158,7 +156,7 @@ class ChatAppRunner(AppRunner):
)
dataset_retrieval = DatasetRetrieval(application_generate_entity)
context, retrieved_files = dataset_retrieval.retrieve(
context = dataset_retrieval.retrieve(
app_id=app_record.id,
user_id=application_generate_entity.user_id,
tenant_id=app_record.tenant_id,
@ -173,11 +171,7 @@ class ChatAppRunner(AppRunner):
memory=memory,
message_id=message.id,
inputs=inputs,
vision_enabled=application_generate_entity.app_config.app_model_config_dict.get("file_upload", {}).get(
"enabled", False
),
)
context_files = retrieved_files or []
# reorganize all inputs and template to prompt messages
# Include: prompt template, inputs, query(optional), files(optional)
@ -192,7 +186,6 @@ class ChatAppRunner(AppRunner):
context=context,
memory=memory,
image_detail_config=image_detail_config,
context_files=context_files,
)
# check hosting moderation

View File

@ -10,7 +10,6 @@ from core.app.entities.app_invoke_entities import (
CompletionAppGenerateEntity,
)
from core.callback_handler.index_tool_callback_handler import DatasetIndexToolCallbackHandler
from core.file import File
from core.model_manager import ModelInstance
from core.model_runtime.entities.message_entities import ImagePromptMessageContent
from core.moderation.base import ModerationError
@ -103,7 +102,6 @@ class CompletionAppRunner(AppRunner):
# get context from datasets
context = None
context_files: list[File] = []
if app_config.dataset and app_config.dataset.dataset_ids:
hit_callback = DatasetIndexToolCallbackHandler(
queue_manager,
@ -118,7 +116,7 @@ class CompletionAppRunner(AppRunner):
query = inputs.get(dataset_config.retrieve_config.query_variable, "")
dataset_retrieval = DatasetRetrieval(application_generate_entity)
context, retrieved_files = dataset_retrieval.retrieve(
context = dataset_retrieval.retrieve(
app_id=app_record.id,
user_id=application_generate_entity.user_id,
tenant_id=app_record.tenant_id,
@ -132,11 +130,7 @@ class CompletionAppRunner(AppRunner):
hit_callback=hit_callback,
message_id=message.id,
inputs=inputs,
vision_enabled=application_generate_entity.app_config.app_model_config_dict.get("file_upload", {}).get(
"enabled", False
),
)
context_files = retrieved_files or []
# reorganize all inputs and template to prompt messages
# Include: prompt template, inputs, query(optional), files(optional)
@ -150,7 +144,6 @@ class CompletionAppRunner(AppRunner):
query=query,
context=context,
image_detail_config=image_detail_config,
context_files=context_files,
)
# check hosting moderation

View File

@ -40,6 +40,9 @@ class EasyUITaskState(TaskState):
"""
llm_result: LLMResult
first_token_time: float | None = None
last_token_time: float | None = None
is_streaming_response: bool = False
class WorkflowTaskState(TaskState):

View File

@ -332,6 +332,12 @@ class EasyUIBasedGenerateTaskPipeline(BasedGenerateTaskPipeline):
if not self._task_state.llm_result.prompt_messages:
self._task_state.llm_result.prompt_messages = chunk.prompt_messages
# Track streaming response times
if self._task_state.first_token_time is None:
self._task_state.first_token_time = time.perf_counter()
self._task_state.is_streaming_response = True
self._task_state.last_token_time = time.perf_counter()
# handle output moderation chunk
should_direct_answer = self._handle_output_moderation_chunk(cast(str, delta_text))
if should_direct_answer:
@ -398,6 +404,18 @@ class EasyUIBasedGenerateTaskPipeline(BasedGenerateTaskPipeline):
message.total_price = usage.total_price
message.currency = usage.currency
self._task_state.llm_result.usage.latency = message.provider_response_latency
# Add streaming metrics to usage if available
if self._task_state.is_streaming_response and self._task_state.first_token_time:
start_time = self.start_at
first_token_time = self._task_state.first_token_time
last_token_time = self._task_state.last_token_time or first_token_time
usage.time_to_first_token = round(first_token_time - start_time, 3)
usage.time_to_generate = round(last_token_time - first_token_time, 3)
# Update metadata with the complete usage info
self._task_state.metadata.usage = usage
message.message_metadata = self._task_state.metadata.model_dump_json()
if trace_manager:
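The two metrics are plain `perf_counter` differences anchored at pipeline start: time to first token captures queueing plus prefill, time to generate captures the streaming tail. A worked sketch with made-up timings:

```python
import time

start_at = time.perf_counter()
first_token_time = start_at + 0.42  # first chunk observed 420 ms in
last_token_time = start_at + 2.17   # final chunk 2.17 s in

time_to_first_token = round(first_token_time - start_at, 3)      # 0.42
time_to_generate = round(last_token_time - first_token_time, 3)  # 1.75
print(time_to_first_token, time_to_generate)
```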

View File

@ -7,7 +7,7 @@ from core.app.apps.base_app_queue_manager import AppQueueManager, PublishFrom
from core.app.entities.app_invoke_entities import InvokeFrom
from core.app.entities.queue_entities import QueueRetrieverResourcesEvent
from core.rag.entities.citation_metadata import RetrievalSourceMetadata
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.models.document import Document
from extensions.ext_database import db
from models.dataset import ChildChunk, DatasetQuery, DocumentSegment
@ -59,7 +59,7 @@ class DatasetIndexToolCallbackHandler:
document_id,
)
continue
if dataset_document.doc_form == IndexStructureType.PARENT_CHILD_INDEX:
if dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX:
child_chunk_stmt = select(ChildChunk).where(
ChildChunk.index_node_id == document.metadata["doc_id"],
ChildChunk.dataset_id == dataset_document.dataset_id,

View File

@ -1,38 +0,0 @@
from sqlalchemy import Engine
from sqlalchemy.orm import Session, sessionmaker
_session_maker: sessionmaker | None = None
def configure_session_factory(engine: Engine, expire_on_commit: bool = False):
"""Configure the global session factory"""
global _session_maker
_session_maker = sessionmaker(bind=engine, expire_on_commit=expire_on_commit)
def get_session_maker() -> sessionmaker:
if _session_maker is None:
raise RuntimeError("Session factory not configured. Call configure_session_factory() first.")
return _session_maker
def create_session() -> Session:
return get_session_maker()()
# Class wrapper for convenience
class SessionFactory:
@staticmethod
def configure(engine: Engine, expire_on_commit: bool = False):
configure_session_factory(engine, expire_on_commit)
@staticmethod
def get_session_maker() -> sessionmaker:
return get_session_maker()
@staticmethod
def create_session() -> Session:
return create_session()
session_factory = SessionFactory()
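For reference, the deleted module was used roughly like this (engine URL illustrative); `Session` objects from the factory are context managers in SQLAlchemy 1.4+:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")
configure_session_factory(engine)

with create_session() as session:
    print(session.execute(text("select 1")).scalar_one())
```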

View File

@ -1,4 +1,4 @@
from pydantic import BaseModel, Field, field_validator
from pydantic import BaseModel
class PreviewDetail(BaseModel):
@ -20,17 +20,9 @@ class IndexingEstimate(BaseModel):
class PipelineDataset(BaseModel):
id: str
name: str
description: str = Field(default="", description="knowledge dataset description")
description: str
chunk_structure: str
@field_validator("description", mode="before")
@classmethod
def normalize_description(cls, value: str | None) -> str:
"""Coerce None to empty string so description is always a string."""
if value is None:
return ""
return value
class PipelineDocument(BaseModel):
id: str

View File

@ -213,23 +213,12 @@ class MCPProviderEntity(BaseModel):
return None
def retrieve_tokens(self) -> OAuthTokens | None:
"""Retrieve OAuth tokens if authentication is complete.
Returns:
OAuthTokens if the provider has been authenticated, None otherwise.
"""
"""OAuth tokens if available"""
if not self.credentials:
return None
credentials = self.decrypt_credentials()
access_token = credentials.get("access_token", "")
# Return None if access_token is empty to avoid generating invalid "Authorization: Bearer " header.
# Note: We don't check for whitespace-only strings here because:
# 1. OAuth servers don't return whitespace-only access tokens in practice
# 2. Even if they did, the server would return 401, triggering the OAuth flow correctly
if not access_token:
return None
return OAuthTokens(
access_token=access_token,
access_token=credentials.get("access_token", ""),
token_type=credentials.get("token_type", DEFAULT_TOKEN_TYPE),
expires_in=int(credentials.get("expires_in", str(DEFAULT_EXPIRES_IN)) or DEFAULT_EXPIRES_IN),
refresh_token=credentials.get("refresh_token", ""),
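The guard this hunk removes existed to avoid emitting a header of literally `Authorization: Bearer ` when the stored token is empty. A minimal sketch of the idea in isolation:

```python
def build_auth_header(access_token: str) -> dict[str, str] | None:
    if not access_token:
        return None  # signal the caller to fall back to the OAuth flow
    return {"Authorization": f"Bearer {access_token}"}

assert build_auth_header("") is None
assert build_auth_header("abc123") == {"Authorization": "Bearer abc123"}
```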

View File

@ -1,89 +0,0 @@
"""CSV sanitization utilities to prevent formula injection attacks."""
from typing import Any
class CSVSanitizer:
"""
Sanitizer for CSV export to prevent formula injection attacks.
This class provides methods to sanitize data before CSV export by escaping
characters that could be interpreted as formulas by spreadsheet applications
(Excel, LibreOffice, Google Sheets).
Formula injection occurs when user-controlled data starting with special
characters (=, +, -, @, tab, carriage return) is exported to CSV and opened
in a spreadsheet application, potentially executing malicious commands.
"""
# Characters that can start a formula in Excel/LibreOffice/Google Sheets
FORMULA_CHARS = frozenset({"=", "+", "-", "@", "\t", "\r"})
@classmethod
def sanitize_value(cls, value: Any) -> str:
"""
Sanitize a value for safe CSV export.
Prefixes formula-initiating characters with a single quote to prevent
Excel/LibreOffice/Google Sheets from treating them as formulas.
Args:
value: The value to sanitize (will be converted to string)
Returns:
Sanitized string safe for CSV export
Examples:
>>> CSVSanitizer.sanitize_value("=1+1")
"'=1+1"
>>> CSVSanitizer.sanitize_value("Hello World")
"Hello World"
>>> CSVSanitizer.sanitize_value(None)
""
"""
if value is None:
return ""
# Convert to string
str_value = str(value)
# If empty, return as is
if not str_value:
return ""
# Check if first character is a formula initiator
if str_value[0] in cls.FORMULA_CHARS:
# Prefix with single quote to escape
return f"'{str_value}"
return str_value
@classmethod
def sanitize_dict(cls, data: dict[str, Any], fields_to_sanitize: list[str] | None = None) -> dict[str, Any]:
"""
Sanitize specified fields in a dictionary.
Args:
data: Dictionary containing data to sanitize
fields_to_sanitize: List of field names to sanitize.
If None, sanitizes all string fields.
Returns:
Dictionary with sanitized values (creates a shallow copy)
Examples:
>>> data = {"question": "=1+1", "answer": "+calc", "id": "123"}
>>> CSVSanitizer.sanitize_dict(data, ["question", "answer"])
{"question": "'=1+1", "answer": "'+calc", "id": "123"}
"""
sanitized = data.copy()
if fields_to_sanitize is None:
# Sanitize all string fields
fields_to_sanitize = [k for k, v in data.items() if isinstance(v, str)]
for field in fields_to_sanitize:
if field in sanitized:
sanitized[field] = cls.sanitize_value(sanitized[field])
return sanitized
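A hedged usage sketch for the deleted sanitizer, paired with the standard csv module (rows are illustrative):

```python
import csv
import io

rows = [
    {"question": '=HYPERLINK("http://evil","click")', "answer": "safe text"},
    {"question": "@SUM(1,2)", "answer": "-2+3"},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["question", "answer"])
writer.writeheader()
for row in rows:
    writer.writerow(CSVSanitizer.sanitize_dict(row, ["question", "answer"]))
print(buf.getvalue())  # cells starting with =, @, +, - come out prefixed with '
```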

View File

@ -9,7 +9,6 @@ import httpx
from configs import dify_config
from core.helper.http_client_pooling import get_pooled_http_client
from core.tools.errors import ToolSSRFError
logger = logging.getLogger(__name__)
@ -94,18 +93,6 @@ def make_request(method, url, max_retries=SSRF_DEFAULT_MAX_RETRIES, **kwargs):
while retries <= max_retries:
try:
response = client.request(method=method, url=url, **kwargs)
# Check for SSRF protection by Squid proxy
if response.status_code in (401, 403):
# Check if this is a Squid SSRF rejection
server_header = response.headers.get("server", "").lower()
via_header = response.headers.get("via", "").lower()
# Squid typically identifies itself in Server or Via headers
if "squid" in server_header or "squid" in via_header:
raise ToolSSRFError(
f"Access to '{url}' was blocked by SSRF protection. "
f"The URL may point to a private or local network address. "
)
if response.status_code not in STATUS_FORCELIST:
return response

View File

@ -7,7 +7,7 @@ import time
import uuid
from typing import Any
from flask import Flask, current_app
from flask import current_app
from sqlalchemy import select
from sqlalchemy.orm.exc import ObjectDeletedError
@ -21,7 +21,7 @@ from core.rag.datasource.keyword.keyword_factory import Keyword
from core.rag.docstore.dataset_docstore import DatasetDocumentStore
from core.rag.extractor.entity.datasource_type import DatasourceType
from core.rag.extractor.entity.extract_setting import ExtractSetting, NotionInfo, WebsiteInfo
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.index_processor.index_processor_base import BaseIndexProcessor
from core.rag.index_processor.index_processor_factory import IndexProcessorFactory
from core.rag.models.document import ChildDocument, Document
@ -36,7 +36,6 @@ from extensions.ext_redis import redis_client
from extensions.ext_storage import storage
from libs import helper
from libs.datetime_utils import naive_utc_now
from models import Account
from models.dataset import ChildChunk, Dataset, DatasetProcessRule, DocumentSegment
from models.dataset import Document as DatasetDocument
from models.model import UploadFile
@ -90,17 +89,8 @@ class IndexingRunner:
text_docs = self._extract(index_processor, requeried_document, processing_rule.to_dict())
# transform
current_user = db.session.query(Account).filter_by(id=requeried_document.created_by).first()
if not current_user:
raise ValueError("no current user found")
current_user.set_tenant_id(dataset.tenant_id)
documents = self._transform(
index_processor,
dataset,
text_docs,
requeried_document.doc_language,
processing_rule.to_dict(),
current_user=current_user,
index_processor, dataset, text_docs, requeried_document.doc_language, processing_rule.to_dict()
)
# save segment
self._load_segments(dataset, requeried_document, documents)
@ -146,7 +136,7 @@ class IndexingRunner:
for document_segment in document_segments:
db.session.delete(document_segment)
if requeried_document.doc_form == IndexStructureType.PARENT_CHILD_INDEX:
if requeried_document.doc_form == IndexType.PARENT_CHILD_INDEX:
# delete child chunks
db.session.query(ChildChunk).where(ChildChunk.segment_id == document_segment.id).delete()
db.session.commit()
@ -162,17 +152,8 @@ class IndexingRunner:
text_docs = self._extract(index_processor, requeried_document, processing_rule.to_dict())
# transform
current_user = db.session.query(Account).filter_by(id=requeried_document.created_by).first()
if not current_user:
raise ValueError("no current user found")
current_user.set_tenant_id(dataset.tenant_id)
documents = self._transform(
index_processor,
dataset,
text_docs,
requeried_document.doc_language,
processing_rule.to_dict(),
current_user=current_user,
index_processor, dataset, text_docs, requeried_document.doc_language, processing_rule.to_dict()
)
# save segment
self._load_segments(dataset, requeried_document, documents)
@ -228,7 +209,7 @@ class IndexingRunner:
"dataset_id": document_segment.dataset_id,
},
)
if requeried_document.doc_form == IndexStructureType.PARENT_CHILD_INDEX:
if requeried_document.doc_form == IndexType.PARENT_CHILD_INDEX:
child_chunks = document_segment.get_child_chunks()
if child_chunks:
child_documents = []
@ -321,7 +302,6 @@ class IndexingRunner:
text_docs = index_processor.extract(extract_setting, process_rule_mode=tmp_processing_rule["mode"])
documents = index_processor.transform(
text_docs,
current_user=None,
embedding_model_instance=embedding_model_instance,
process_rule=processing_rule.to_dict(),
tenant_id=tenant_id,
@ -571,10 +551,7 @@ class IndexingRunner:
indexing_start_at = time.perf_counter()
tokens = 0
create_keyword_thread = None
if (
dataset_document.doc_form != IndexStructureType.PARENT_CHILD_INDEX
and dataset.indexing_technique == "economy"
):
if dataset_document.doc_form != IndexType.PARENT_CHILD_INDEX and dataset.indexing_technique == "economy":
# create keyword index
create_keyword_thread = threading.Thread(
target=self._process_keyword_index,
@ -613,7 +590,7 @@ class IndexingRunner:
for future in futures:
tokens += future.result()
if (
dataset_document.doc_form != IndexStructureType.PARENT_CHILD_INDEX
dataset_document.doc_form != IndexType.PARENT_CHILD_INDEX
and dataset.indexing_technique == "economy"
and create_keyword_thread is not None
):
@ -658,13 +635,7 @@ class IndexingRunner:
db.session.commit()
def _process_chunk(
self,
flask_app: Flask,
index_processor: BaseIndexProcessor,
chunk_documents: list[Document],
dataset: Dataset,
dataset_document: DatasetDocument,
embedding_model_instance: ModelInstance | None,
self, flask_app, index_processor, chunk_documents, dataset, dataset_document, embedding_model_instance
):
with flask_app.app_context():
# check document is paused
@ -675,15 +646,8 @@ class IndexingRunner:
page_content_list = [document.page_content for document in chunk_documents]
tokens += sum(embedding_model_instance.get_text_embedding_num_tokens(page_content_list))
multimodal_documents = []
for document in chunk_documents:
if document.attachments and dataset.is_multimodal:
multimodal_documents.extend(document.attachments)
# load index
index_processor.load(
dataset, chunk_documents, multimodal_documents=multimodal_documents, with_keywords=False
)
index_processor.load(dataset, chunk_documents, with_keywords=False)
document_ids = [document.metadata["doc_id"] for document in chunk_documents]
db.session.query(DocumentSegment).where(
@ -746,7 +710,6 @@ class IndexingRunner:
text_docs: list[Document],
doc_language: str,
process_rule: dict,
current_user: Account | None = None,
) -> list[Document]:
# get embedding model instance
embedding_model_instance = None
@ -766,7 +729,6 @@ class IndexingRunner:
documents = index_processor.transform(
text_docs,
current_user,
embedding_model_instance=embedding_model_instance,
process_rule=process_rule,
tenant_id=dataset.tenant_id,
@ -775,16 +737,14 @@ class IndexingRunner:
return documents
def _load_segments(self, dataset: Dataset, dataset_document: DatasetDocument, documents: list[Document]):
def _load_segments(self, dataset, dataset_document, documents):
# save node to document segment
doc_store = DatasetDocumentStore(
dataset=dataset, user_id=dataset_document.created_by, document_id=dataset_document.id
)
# add document segments
doc_store.add_documents(
docs=documents, save_child=dataset_document.doc_form == IndexStructureType.PARENT_CHILD_INDEX
)
doc_store.add_documents(docs=documents, save_child=dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX)
# update document status to indexing
cur_time = naive_utc_now()

View File

@ -72,22 +72,15 @@ class LLMGenerator:
prompt_messages=list(prompts), model_parameters={"max_tokens": 500, "temperature": 1}, stream=False
)
answer = cast(str, response.message.content)
if answer is None:
cleaned_answer = re.sub(r"^.*(\{.*\}).*$", r"\1", answer, flags=re.DOTALL)
if cleaned_answer is None:
return ""
try:
result_dict = json.loads(answer)
result_dict = json.loads(cleaned_answer)
answer = result_dict["Your Output"]
except json.JSONDecodeError:
result_dict = json_repair.loads(answer)
if not isinstance(result_dict, dict):
logger.exception("Failed to generate name after answer, use query instead")
answer = query
else:
output = result_dict.get("Your Output")
if isinstance(output, str) and output.strip():
answer = output.strip()
else:
answer = query
name = answer.strip()
if len(name) > 75:
@ -561,16 +554,11 @@ class LLMGenerator:
prompt_messages=list(prompt_messages), model_parameters=model_parameters, stream=False
)
generated_raw = response.message.get_text_content()
generated_raw = cast(str, response.message.content)
first_brace = generated_raw.find("{")
last_brace = generated_raw.rfind("}")
if first_brace == -1 or last_brace == -1 or last_brace < first_brace:
raise ValueError(f"Could not find a valid JSON object in response: {generated_raw}")
json_str = generated_raw[first_brace : last_brace + 1]
data = json_repair.loads(json_str)
if not isinstance(data, dict):
raise TypeError(f"Expected a JSON object, but got {type(data).__name__}")
return data
return {**json.loads(generated_raw[first_brace : last_brace + 1])}
except InvokeError as e:
error = str(e)
return {"error": f"Failed to generate code. Error: {error}"}

View File

@ -10,9 +10,9 @@ from core.errors.error import ProviderTokenNotInitError
from core.model_runtime.callbacks.base_callback import Callback
from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.entities.model_entities import ModelFeature, ModelType
from core.model_runtime.entities.model_entities import ModelType
from core.model_runtime.entities.rerank_entities import RerankResult
from core.model_runtime.entities.text_embedding_entities import EmbeddingResult
from core.model_runtime.entities.text_embedding_entities import TextEmbeddingResult
from core.model_runtime.errors.invoke import InvokeAuthorizationError, InvokeConnectionError, InvokeRateLimitError
from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel
from core.model_runtime.model_providers.__base.moderation_model import ModerationModel
@ -200,7 +200,7 @@ class ModelInstance:
def invoke_text_embedding(
self, texts: list[str], user: str | None = None, input_type: EmbeddingInputType = EmbeddingInputType.DOCUMENT
) -> EmbeddingResult:
) -> TextEmbeddingResult:
"""
Invoke text embedding model
@ -212,7 +212,7 @@ class ModelInstance:
if not isinstance(self.model_type_instance, TextEmbeddingModel):
raise Exception("Model type instance is not TextEmbeddingModel")
return cast(
EmbeddingResult,
TextEmbeddingResult,
self._round_robin_invoke(
function=self.model_type_instance.invoke,
model=self.model,
@ -223,34 +223,6 @@ class ModelInstance:
),
)
def invoke_multimodal_embedding(
self,
multimodel_documents: list[dict],
user: str | None = None,
input_type: EmbeddingInputType = EmbeddingInputType.DOCUMENT,
) -> EmbeddingResult:
"""
Invoke multimodal embedding model
:param multimodel_documents: multimodal documents to embed
:param user: unique user id
:param input_type: input type
:return: embeddings result
"""
if not isinstance(self.model_type_instance, TextEmbeddingModel):
raise Exception("Model type instance is not TextEmbeddingModel")
return cast(
EmbeddingResult,
self._round_robin_invoke(
function=self.model_type_instance.invoke,
model=self.model,
credentials=self.credentials,
multimodel_documents=multimodel_documents,
user=user,
input_type=input_type,
),
)
def get_text_embedding_num_tokens(self, texts: list[str]) -> list[int]:
"""
Get number of tokens for text embedding
@ -304,40 +276,6 @@ class ModelInstance:
),
)
def invoke_multimodal_rerank(
self,
query: dict,
docs: list[dict],
score_threshold: float | None = None,
top_n: int | None = None,
user: str | None = None,
) -> RerankResult:
"""
Invoke multimodal rerank model
:param query: search query
:param docs: docs for reranking
:param score_threshold: score threshold
:param top_n: top n
:param user: unique user id
:return: rerank result
"""
if not isinstance(self.model_type_instance, RerankModel):
raise Exception("Model type instance is not RerankModel")
return cast(
RerankResult,
self._round_robin_invoke(
function=self.model_type_instance.invoke_multimodal_rerank,
model=self.model,
credentials=self.credentials,
query=query,
docs=docs,
score_threshold=score_threshold,
top_n=top_n,
user=user,
),
)
def invoke_moderation(self, text: str, user: str | None = None) -> bool:
"""
Invoke moderation model
@ -523,32 +461,6 @@ class ModelManager:
model=default_model_entity.model,
)
def check_model_support_vision(self, tenant_id: str, provider: str, model: str, model_type: ModelType) -> bool:
"""
Check if model supports vision
:param tenant_id: tenant id
:param provider: provider name
:param model: model name
:return: True if model supports vision, False otherwise
"""
model_instance = self.get_model_instance(tenant_id, provider, model_type, model)
model_type_instance = model_instance.model_type_instance
match model_type:
case ModelType.LLM:
model_type_instance = cast(LargeLanguageModel, model_type_instance)
case ModelType.TEXT_EMBEDDING:
model_type_instance = cast(TextEmbeddingModel, model_type_instance)
case ModelType.RERANK:
model_type_instance = cast(RerankModel, model_type_instance)
case _:
raise ValueError(f"Model type {model_type} is not supported")
model_schema = model_type_instance.get_model_schema(model, model_instance.credentials)
if not model_schema:
return False
if model_schema.features and ModelFeature.VISION in model_schema.features:
return True
return False
class LBModelManager:
def __init__(

View File

@ -19,7 +19,7 @@ class EmbeddingUsage(ModelUsage):
latency: float
class EmbeddingResult(BaseModel):
class TextEmbeddingResult(BaseModel):
"""
Model class for text embedding result.
"""
@ -27,13 +27,3 @@ class EmbeddingResult(BaseModel):
model: str
embeddings: list[list[float]]
usage: EmbeddingUsage
class FileEmbeddingResult(BaseModel):
"""
Model class for file embedding result.
"""
model: str
embeddings: list[list[float]]
usage: EmbeddingUsage

View File

@ -50,43 +50,3 @@ class RerankModel(AIModel):
)
except Exception as e:
raise self._transform_invoke_error(e)
def invoke_multimodal_rerank(
self,
model: str,
credentials: dict,
query: dict,
docs: list[dict],
score_threshold: float | None = None,
top_n: int | None = None,
user: str | None = None,
) -> RerankResult:
"""
Invoke multimodal rerank model
:param model: model name
:param credentials: model credentials
:param query: search query
:param docs: docs for reranking
:param score_threshold: score threshold
:param top_n: top n
:param user: unique user id
:return: rerank result
"""
try:
from core.plugin.impl.model import PluginModelClient
plugin_model_manager = PluginModelClient()
return plugin_model_manager.invoke_multimodal_rerank(
tenant_id=self.tenant_id,
user_id=user or "unknown",
plugin_id=self.plugin_id,
provider=self.provider_name,
model=model,
credentials=credentials,
query=query,
docs=docs,
score_threshold=score_threshold,
top_n=top_n,
)
except Exception as e:
raise self._transform_invoke_error(e)

View File

@ -2,7 +2,7 @@ from pydantic import ConfigDict
from core.entities.embedding_type import EmbeddingInputType
from core.model_runtime.entities.model_entities import ModelPropertyKey, ModelType
from core.model_runtime.entities.text_embedding_entities import EmbeddingResult
from core.model_runtime.entities.text_embedding_entities import TextEmbeddingResult
from core.model_runtime.model_providers.__base.ai_model import AIModel
@ -20,18 +20,16 @@ class TextEmbeddingModel(AIModel):
self,
model: str,
credentials: dict,
texts: list[str] | None = None,
multimodel_documents: list[dict] | None = None,
texts: list[str],
user: str | None = None,
input_type: EmbeddingInputType = EmbeddingInputType.DOCUMENT,
) -> EmbeddingResult:
) -> TextEmbeddingResult:
"""
Invoke text embedding model
:param model: model name
:param credentials: model credentials
:param texts: texts to embed
:param multimodel_documents: multimodal documents to embed
:param user: unique user id
:param input_type: input type
:return: embeddings result
@ -40,29 +38,16 @@ class TextEmbeddingModel(AIModel):
try:
plugin_model_manager = PluginModelClient()
if texts:
return plugin_model_manager.invoke_text_embedding(
tenant_id=self.tenant_id,
user_id=user or "unknown",
plugin_id=self.plugin_id,
provider=self.provider_name,
model=model,
credentials=credentials,
texts=texts,
input_type=input_type,
)
if multimodel_documents:
return plugin_model_manager.invoke_multimodal_embedding(
tenant_id=self.tenant_id,
user_id=user or "unknown",
plugin_id=self.plugin_id,
provider=self.provider_name,
model=model,
credentials=credentials,
documents=multimodel_documents,
input_type=input_type,
)
raise ValueError("No texts or files provided")
return plugin_model_manager.invoke_text_embedding(
tenant_id=self.tenant_id,
user_id=user or "unknown",
plugin_id=self.plugin_id,
provider=self.provider_name,
model=model,
credentials=credentials,
texts=texts,
input_type=input_type,
)
except Exception as e:
raise self._transform_invoke_error(e)

View File

@ -6,13 +6,7 @@ from datetime import datetime, timedelta
from typing import Any, Union, cast
from urllib.parse import urlparse
from openinference.semconv.trace import (
MessageAttributes,
OpenInferenceMimeTypeValues,
OpenInferenceSpanKindValues,
SpanAttributes,
ToolCallAttributes,
)
from openinference.semconv.trace import OpenInferenceMimeTypeValues, OpenInferenceSpanKindValues, SpanAttributes
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter as GrpcOTLPSpanExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter as HttpOTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
@ -101,14 +95,14 @@ def setup_tracer(arize_phoenix_config: ArizeConfig | PhoenixConfig) -> tuple[tra
def datetime_to_nanos(dt: datetime | None) -> int:
"""Convert datetime to nanoseconds since epoch for Arize/Phoenix."""
"""Convert datetime to nanoseconds since epoch. If None, use current time."""
if dt is None:
dt = datetime.now()
return int(dt.timestamp() * 1_000_000_000)
def error_to_string(error: Exception | str | None) -> str:
"""Convert an error to a string with traceback information for Arize/Phoenix."""
"""Convert an error to a string with traceback information."""
error_message = "Empty Stack Trace"
if error:
if isinstance(error, Exception):
@ -120,7 +114,7 @@ def error_to_string(error: Exception | str | None) -> str:
def set_span_status(current_span: Span, error: Exception | str | None = None):
"""Set the status of the current span based on the presence of an error for Arize/Phoenix."""
"""Set the status of the current span based on the presence of an error."""
if error:
error_string = error_to_string(error)
current_span.set_status(Status(StatusCode.ERROR, error_string))
@ -144,17 +138,10 @@ def set_span_status(current_span: Span, error: Exception | str | None = None):
def safe_json_dumps(obj: Any) -> str:
"""A convenience wrapper to ensure that any object can be safely encoded for Arize/Phoenix."""
"""A convenience wrapper around `json.dumps` that ensures that any object can be safely encoded."""
return json.dumps(obj, default=str, ensure_ascii=False)
def wrap_span_metadata(metadata, **kwargs):
"""Add common metatada to all trace entity types for Arize/Phoenix."""
metadata["created_from"] = "Dify"
metadata.update(kwargs)
return metadata
class ArizePhoenixDataTrace(BaseTraceInstance):
def __init__(
self,
@ -196,27 +183,16 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
raise
def workflow_trace(self, trace_info: WorkflowTraceInfo):
file_list = trace_info.file_list if isinstance(trace_info.file_list, list) else []
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.workflow_run_status or "",
status_message=trace_info.error or "",
level="ERROR" if trace_info.error else "DEFAULT",
trace_entity_type="workflow",
conversation_id=trace_info.conversation_id or "",
workflow_app_log_id=trace_info.workflow_app_log_id or "",
workflow_id=trace_info.workflow_id or "",
tenant_id=trace_info.tenant_id or "",
workflow_run_id=trace_info.workflow_run_id or "",
workflow_run_elapsed_time=trace_info.workflow_run_elapsed_time or 0,
workflow_run_version=trace_info.workflow_run_version or "",
total_tokens=trace_info.total_tokens or 0,
file_list=safe_json_dumps(file_list),
query=trace_info.query or "",
)
workflow_metadata = {
"workflow_run_id": trace_info.workflow_run_id or "",
"message_id": trace_info.message_id or "",
"workflow_app_log_id": trace_info.workflow_app_log_id or "",
"status": trace_info.workflow_run_status or "",
"status_message": trace_info.error or "",
"level": "ERROR" if trace_info.error else "DEFAULT",
"total_tokens": trace_info.total_tokens or 0,
}
workflow_metadata.update(trace_info.metadata)
dify_trace_id = trace_info.trace_id or trace_info.message_id or trace_info.workflow_run_id
self.ensure_root_span(dify_trace_id)
@ -225,12 +201,10 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
workflow_span = self.tracer.start_span(
name=TraceTaskName.WORKFLOW_TRACE.value,
attributes={
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.workflow_run_inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: json.dumps(trace_info.workflow_run_outputs, ensure_ascii=False),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.workflow_run_inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps(trace_info.workflow_run_outputs),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.METADATA: json.dumps(workflow_metadata, ensure_ascii=False),
SpanAttributes.SESSION_ID: trace_info.conversation_id or "",
},
start_time=datetime_to_nanos(trace_info.start_time),
@ -283,7 +257,6 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
"app_id": app_id,
"app_name": node_execution.title,
"status": node_execution.status,
"status_message": node_execution.error or "",
"level": "ERROR" if node_execution.status == "failed" else "DEFAULT",
}
)
@ -317,11 +290,11 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
node_span = self.tracer.start_span(
name=node_execution.node_type,
attributes={
SpanAttributes.OPENINFERENCE_SPAN_KIND: span_kind.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(inputs_value),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps(outputs_value),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OPENINFERENCE_SPAN_KIND: span_kind.value,
SpanAttributes.METADATA: safe_json_dumps(node_metadata),
SpanAttributes.SESSION_ID: trace_info.conversation_id or "",
},
@@ -366,37 +339,30 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
def message_trace(self, trace_info: MessageTraceInfo):
if trace_info.message_data is None:
logger.warning("[Arize/Phoenix] Message data is None, skipping message trace.")
return
file_list = trace_info.file_list if isinstance(trace_info.file_list, list) else []
file_list = cast(list[str], trace_info.file_list) or []
message_file_data: MessageFile | None = trace_info.message_file_data
if message_file_data is not None:
file_url = f"{self.file_base_url}/{message_file_data.url}" if message_file_data else ""
file_list.append(file_url)
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.message_data.status or "",
status_message=trace_info.error or "",
level="ERROR" if trace_info.error else "DEFAULT",
trace_entity_type="message",
conversation_model=trace_info.conversation_model or "",
message_tokens=trace_info.message_tokens or 0,
answer_tokens=trace_info.answer_tokens or 0,
total_tokens=trace_info.total_tokens or 0,
conversation_mode=trace_info.conversation_mode or "",
gen_ai_server_time_to_first_token=trace_info.gen_ai_server_time_to_first_token or 0,
llm_streaming_time_to_generate=trace_info.llm_streaming_time_to_generate or 0,
is_streaming_request=trace_info.is_streaming_request or False,
user_id=trace_info.message_data.from_account_id or "",
file_list=safe_json_dumps(file_list),
model_provider=trace_info.message_data.model_provider or "",
model_id=trace_info.message_data.model_id or "",
)
message_metadata = {
"message_id": trace_info.message_id or "",
"conversation_mode": str(trace_info.conversation_mode or ""),
"user_id": trace_info.message_data.from_account_id or "",
"file_list": json.dumps(file_list),
"status": trace_info.message_data.status or "",
"status_message": trace_info.error or "",
"level": "ERROR" if trace_info.error else "DEFAULT",
"total_tokens": trace_info.total_tokens or 0,
"prompt_tokens": trace_info.message_tokens or 0,
"completion_tokens": trace_info.answer_tokens or 0,
"ls_provider": trace_info.message_data.model_provider or "",
"ls_model_name": trace_info.message_data.model_id or "",
}
message_metadata.update(trace_info.metadata)
# Add end user data if available
if trace_info.message_data.from_end_user_id:
@@ -404,16 +370,14 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
db.session.query(EndUser).where(EndUser.id == trace_info.message_data.from_end_user_id).first()
)
if end_user_data is not None:
metadata["end_user_id"] = end_user_data.session_id
message_metadata["end_user_id"] = end_user_data.session_id
attributes = {
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.INPUT_VALUE: trace_info.message_data.query,
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.TEXT.value,
SpanAttributes.OUTPUT_VALUE: trace_info.message_data.answer,
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.TEXT.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.SESSION_ID: trace_info.message_data.conversation_id or "",
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.METADATA: json.dumps(message_metadata, ensure_ascii=False),
SpanAttributes.SESSION_ID: trace_info.message_data.conversation_id,
}
dify_trace_id = trace_info.trace_id or trace_info.message_id
@@ -429,10 +393,8 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
try:
# Convert outputs to string based on type
outputs_mime_type = OpenInferenceMimeTypeValues.TEXT.value
if isinstance(trace_info.outputs, dict | list):
outputs_str = safe_json_dumps(trace_info.outputs)
outputs_mime_type = OpenInferenceMimeTypeValues.JSON.value
outputs_str = json.dumps(trace_info.outputs, ensure_ascii=False)
elif isinstance(trace_info.outputs, str):
outputs_str = trace_info.outputs
else:
@@ -440,12 +402,10 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
llm_attributes = {
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.LLM.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: outputs_str,
SpanAttributes.OUTPUT_MIME_TYPE: outputs_mime_type,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.SESSION_ID: trace_info.message_data.conversation_id or "",
SpanAttributes.METADATA: json.dumps(message_metadata, ensure_ascii=False),
SpanAttributes.SESSION_ID: trace_info.message_data.conversation_id,
}
llm_attributes.update(self._construct_llm_attributes(trace_info.inputs))
if trace_info.total_tokens is not None and trace_info.total_tokens > 0:
@@ -489,20 +449,16 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
def moderation_trace(self, trace_info: ModerationTraceInfo):
if trace_info.message_data is None:
logger.warning("[Arize/Phoenix] Message data is None, skipping moderation trace.")
return
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.message_data.status or "",
status_message=trace_info.message_data.error or "",
level="ERROR" if trace_info.message_data.error else "DEFAULT",
trace_entity_type="moderation",
model_provider=trace_info.message_data.model_provider or "",
model_id=trace_info.message_data.model_id or "",
)
metadata = {
"message_id": trace_info.message_id,
"tool_name": "moderation",
"status": trace_info.message_data.status,
"status_message": trace_info.message_data.error or "",
"level": "ERROR" if trace_info.message_data.error else "DEFAULT",
}
metadata.update(trace_info.metadata)
dify_trace_id = trace_info.trace_id or trace_info.message_id
self.ensure_root_span(dify_trace_id)
@@ -511,19 +467,18 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
span = self.tracer.start_span(
name=TraceTaskName.MODERATION_TRACE.value,
attributes={
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.TOOL.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps(
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: json.dumps(
{
"flagged": trace_info.flagged,
"action": trace_info.action,
"flagged": trace_info.flagged,
"preset_response": trace_info.preset_response,
"query": trace_info.query,
}
"inputs": trace_info.inputs,
},
ensure_ascii=False,
),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.METADATA: json.dumps(metadata, ensure_ascii=False),
},
start_time=datetime_to_nanos(trace_info.start_time),
context=root_span_context,
@@ -539,28 +494,22 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
def suggested_question_trace(self, trace_info: SuggestedQuestionTraceInfo):
if trace_info.message_data is None:
logger.warning("[Arize/Phoenix] Message data is None, skipping suggested question trace.")
return
start_time = trace_info.start_time or trace_info.message_data.created_at
end_time = trace_info.end_time or trace_info.message_data.updated_at
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.status or "",
status_message=trace_info.status_message or "",
level=trace_info.level or "",
trace_entity_type="suggested_question",
total_tokens=trace_info.total_tokens or 0,
from_account_id=trace_info.from_account_id or "",
agent_based=trace_info.agent_based or False,
from_source=trace_info.from_source or "",
model_provider=trace_info.model_provider or "",
model_id=trace_info.model_id or "",
workflow_run_id=trace_info.workflow_run_id or "",
)
metadata = {
"message_id": trace_info.message_id,
"tool_name": "suggested_question",
"status": trace_info.status,
"status_message": trace_info.error or "",
"level": "ERROR" if trace_info.error else "DEFAULT",
"total_tokens": trace_info.total_tokens,
"ls_provider": trace_info.model_provider or "",
"ls_model_name": trace_info.model_id or "",
}
metadata.update(trace_info.metadata)
dify_trace_id = trace_info.trace_id or trace_info.message_id
self.ensure_root_span(dify_trace_id)
@@ -569,12 +518,10 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
span = self.tracer.start_span(
name=TraceTaskName.SUGGESTED_QUESTION_TRACE.value,
attributes={
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.TOOL.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps(trace_info.suggested_question),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: json.dumps(trace_info.suggested_question, ensure_ascii=False),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.METADATA: json.dumps(metadata, ensure_ascii=False),
},
start_time=datetime_to_nanos(start_time),
context=root_span_context,
@@ -590,23 +537,21 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
def dataset_retrieval_trace(self, trace_info: DatasetRetrievalTraceInfo):
if trace_info.message_data is None:
logger.warning("[Arize/Phoenix] Message data is None, skipping dataset retrieval trace.")
return
start_time = trace_info.start_time or trace_info.message_data.created_at
end_time = trace_info.end_time or trace_info.message_data.updated_at
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.message_data.status or "",
status_message=trace_info.error or "",
level="ERROR" if trace_info.error else "DEFAULT",
trace_entity_type="dataset_retrieval",
model_provider=trace_info.message_data.model_provider or "",
model_id=trace_info.message_data.model_id or "",
)
metadata = {
"message_id": trace_info.message_id,
"tool_name": "dataset_retrieval",
"status": trace_info.message_data.status,
"status_message": trace_info.message_data.error or "",
"level": "ERROR" if trace_info.message_data.error else "DEFAULT",
"ls_provider": trace_info.message_data.model_provider or "",
"ls_model_name": trace_info.message_data.model_id or "",
}
metadata.update(trace_info.metadata)
dify_trace_id = trace_info.trace_id or trace_info.message_id
self.ensure_root_span(dify_trace_id)
@@ -615,20 +560,20 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
span = self.tracer.start_span(
name=TraceTaskName.DATASET_RETRIEVAL_TRACE.value,
attributes={
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: json.dumps({"documents": trace_info.documents}, ensure_ascii=False),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.RETRIEVER.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps({"documents": trace_info.documents}),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.METADATA: json.dumps(metadata, ensure_ascii=False),
"start_time": start_time.isoformat() if start_time else "",
"end_time": end_time.isoformat() if end_time else "",
},
start_time=datetime_to_nanos(start_time),
context=root_span_context,
)
try:
if trace_info.error:
set_span_status(span, trace_info.error)
if trace_info.message_data.error:
set_span_status(span, trace_info.message_data.error)
else:
set_span_status(span)
finally:
@@ -639,34 +584,30 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
logger.warning("[Arize/Phoenix] Message data is None, skipping tool trace.")
return
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.message_data.status or "",
status_message=trace_info.error or "",
level="ERROR" if trace_info.error else "DEFAULT",
trace_entity_type="tool",
tool_config=safe_json_dumps(trace_info.tool_config),
time_cost=trace_info.time_cost or 0,
file_url=trace_info.file_url or "",
)
metadata = {
"message_id": trace_info.message_id,
"tool_config": json.dumps(trace_info.tool_config, ensure_ascii=False),
}
dify_trace_id = trace_info.trace_id or trace_info.message_id
self.ensure_root_span(dify_trace_id)
root_span_context = self.propagator.extract(carrier=self.carrier)
tool_params_str = (
json.dumps(trace_info.tool_parameters, ensure_ascii=False)
if isinstance(trace_info.tool_parameters, dict)
else str(trace_info.tool_parameters)
)
span = self.tracer.start_span(
name=trace_info.tool_name,
attributes={
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.TOOL.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.tool_inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.tool_inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: trace_info.tool_outputs,
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.TEXT.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.TOOL.value,
SpanAttributes.METADATA: json.dumps(metadata, ensure_ascii=False),
SpanAttributes.TOOL_NAME: trace_info.tool_name,
SpanAttributes.TOOL_PARAMETERS: safe_json_dumps(trace_info.tool_parameters),
SpanAttributes.TOOL_PARAMETERS: tool_params_str,
},
start_time=datetime_to_nanos(trace_info.start_time),
context=root_span_context,
@@ -682,22 +623,16 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
def generate_name_trace(self, trace_info: GenerateNameTraceInfo):
if trace_info.message_data is None:
logger.warning("[Arize/Phoenix] Message data is None, skipping generate name trace.")
return
metadata = wrap_span_metadata(
trace_info.metadata,
trace_id=trace_info.trace_id or "",
message_id=trace_info.message_id or "",
status=trace_info.message_data.status or "",
status_message=trace_info.message_data.error or "",
level="ERROR" if trace_info.message_data.error else "DEFAULT",
trace_entity_type="generate_name",
model_provider=trace_info.message_data.model_provider or "",
model_id=trace_info.message_data.model_id or "",
conversation_id=trace_info.conversation_id or "",
tenant_id=trace_info.tenant_id,
)
metadata = {
"project_name": self.project,
"message_id": trace_info.message_id,
"status": trace_info.message_data.status,
"status_message": trace_info.message_data.error or "",
"level": "ERROR" if trace_info.message_data.error else "DEFAULT",
}
metadata.update(trace_info.metadata)
dify_trace_id = trace_info.trace_id or trace_info.message_id or trace_info.conversation_id
self.ensure_root_span(dify_trace_id)
@@ -706,13 +641,13 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
span = self.tracer.start_span(
name=TraceTaskName.GENERATE_NAME_TRACE.value,
attributes={
SpanAttributes.INPUT_VALUE: json.dumps(trace_info.inputs, ensure_ascii=False),
SpanAttributes.OUTPUT_VALUE: json.dumps(trace_info.outputs, ensure_ascii=False),
SpanAttributes.OPENINFERENCE_SPAN_KIND: OpenInferenceSpanKindValues.CHAIN.value,
SpanAttributes.INPUT_VALUE: safe_json_dumps(trace_info.inputs),
SpanAttributes.INPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.OUTPUT_VALUE: safe_json_dumps(trace_info.outputs),
SpanAttributes.OUTPUT_MIME_TYPE: OpenInferenceMimeTypeValues.JSON.value,
SpanAttributes.METADATA: safe_json_dumps(metadata),
SpanAttributes.SESSION_ID: trace_info.conversation_id or "",
SpanAttributes.METADATA: json.dumps(metadata, ensure_ascii=False),
SpanAttributes.SESSION_ID: trace_info.message_data.conversation_id,
"start_time": trace_info.start_time.isoformat() if trace_info.start_time else "",
"end_time": trace_info.end_time.isoformat() if trace_info.end_time else "",
},
start_time=datetime_to_nanos(trace_info.start_time),
context=root_span_context,
@@ -753,85 +688,32 @@ class ArizePhoenixDataTrace(BaseTraceInstance):
raise ValueError(f"[Arize/Phoenix] API check failed: {str(e)}")
def get_project_url(self):
"""Build a redirect URL that forwards the user to the correct project for Arize/Phoenix."""
try:
project_name = self.arize_phoenix_config.project
endpoint = self.arize_phoenix_config.endpoint.rstrip("/")
# Arize
if isinstance(self.arize_phoenix_config, ArizeConfig):
return f"https://app.arize.com/?redirect_project_name={project_name}"
# Phoenix
return f"{endpoint}/projects/?redirect_project_name={project_name}"
if self.arize_phoenix_config.endpoint == "https://otlp.arize.com":
return "https://app.arize.com/"
else:
return f"{self.arize_phoenix_config.endpoint}/projects/"
except Exception as e:
logger.info("[Arize/Phoenix] Failed to construct project URL: %s", str(e), exc_info=True)
raise ValueError(f"[Arize/Phoenix] Failed to construct project URL: {str(e)}")
logger.info("[Arize/Phoenix] Get run url failed: %s", str(e), exc_info=True)
raise ValueError(f"[Arize/Phoenix] Get run url failed: {str(e)}")
def _construct_llm_attributes(self, prompts: dict | list | str | None) -> dict[str, str]:
"""Construct LLM attributes with passed prompts for Arize/Phoenix."""
attributes: dict[str, str] = {}
def set_attribute(path: str, value: object) -> None:
"""Store an attribute safely as a string."""
if value is None:
return
try:
if isinstance(value, (dict, list)):
value = safe_json_dumps(value)
attributes[path] = str(value)
except Exception:
attributes[path] = str(value)
def set_message_attribute(message_index: int, key: str, value: object) -> None:
path = f"{SpanAttributes.LLM_INPUT_MESSAGES}.{message_index}.{key}"
set_attribute(path, value)
def set_tool_call_attributes(message_index: int, tool_index: int, tool_call: dict | object | None) -> None:
"""Extract and assign tool call details safely."""
if not tool_call:
return
def safe_get(obj, key, default=None):
if isinstance(obj, dict):
return obj.get(key, default)
return getattr(obj, key, default)
function_obj = safe_get(tool_call, "function", {})
function_name = safe_get(function_obj, "name", "")
function_args = safe_get(function_obj, "arguments", {})
call_id = safe_get(tool_call, "id", "")
base_path = (
f"{SpanAttributes.LLM_INPUT_MESSAGES}."
f"{message_index}.{MessageAttributes.MESSAGE_TOOL_CALLS}.{tool_index}"
)
set_attribute(f"{base_path}.{ToolCallAttributes.TOOL_CALL_FUNCTION_NAME}", function_name)
set_attribute(f"{base_path}.{ToolCallAttributes.TOOL_CALL_FUNCTION_ARGUMENTS_JSON}", function_args)
set_attribute(f"{base_path}.{ToolCallAttributes.TOOL_CALL_ID}", call_id)
# Handle list of messages
"""Helper method to construct LLM attributes with passed prompts."""
attributes = {}
if isinstance(prompts, list):
for message_index, message in enumerate(prompts):
if not isinstance(message, dict):
continue
role = message.get("role", "user")
content = message.get("text") or message.get("content") or ""
set_message_attribute(message_index, MessageAttributes.MESSAGE_ROLE, role)
set_message_attribute(message_index, MessageAttributes.MESSAGE_CONTENT, content)
tool_calls = message.get("tool_calls") or []
if isinstance(tool_calls, list):
for tool_index, tool_call in enumerate(tool_calls):
set_tool_call_attributes(message_index, tool_index, tool_call)
# Handle single dict or plain string prompt
elif isinstance(prompts, (dict, str)):
set_message_attribute(0, MessageAttributes.MESSAGE_CONTENT, prompts)
set_message_attribute(0, MessageAttributes.MESSAGE_ROLE, "user")
for i, msg in enumerate(prompts):
if isinstance(msg, dict):
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.{i}.message.content"] = msg.get("text", "")
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.{i}.message.role"] = msg.get("role", "user")
# todo: handle assistant and tool role messages, as they don't always
# have a text field, but may have a tool_calls field instead
# e.g. 'tool_calls': [{'id': '98af3a29-b066-45a5-b4b1-46c74ddafc58',
# 'type': 'function', 'function': {'name': 'current_time', 'arguments': '{}'}}]}
elif isinstance(prompts, dict):
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.0.message.content"] = json.dumps(prompts)
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.0.message.role"] = "user"
elif isinstance(prompts, str):
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.0.message.content"] = prompts
attributes[f"{SpanAttributes.LLM_INPUT_MESSAGES}.0.message.role"] = "user"
return attributes
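For reference, a two-message prompt list flattens into indexed keys. A sketch of the expected mapping, assuming the conventional OpenInference value "llm.input_messages" for SpanAttributes.LLM_INPUT_MESSAGES:
# Hypothetical input for _construct_llm_attributes:
prompts = [
    {"role": "system", "text": "You are a helpful assistant."},
    {"role": "user", "text": "What time is it?"},
]
# Expected flattened attributes (keys assume the OpenInference prefix):
# "llm.input_messages.0.message.role"    -> "system"
# "llm.input_messages.0.message.content" -> "You are a helpful assistant."
# "llm.input_messages.1.message.role"    -> "user"
# "llm.input_messages.1.message.content" -> "What time is it?"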


@@ -222,6 +222,59 @@ class TencentSpanBuilder:
links=links,
)
@staticmethod
def build_message_llm_span(
trace_info: MessageTraceInfo, trace_id: int, parent_span_id: int, user_id: str
) -> SpanData:
"""Build LLM span for message traces with detailed LLM attributes."""
status = Status(StatusCode.OK)
if trace_info.error:
status = Status(StatusCode.ERROR, trace_info.error)
# Extract model information from `metadata` or `message_data`
trace_metadata = trace_info.metadata or {}
message_data = trace_info.message_data or {}
model_provider = trace_metadata.get("ls_provider") or (
message_data.get("model_provider", "") if isinstance(message_data, dict) else ""
)
model_name = trace_metadata.get("ls_model_name") or (
message_data.get("model_id", "") if isinstance(message_data, dict) else ""
)
inputs_str = str(trace_info.inputs or "")
outputs_str = str(trace_info.outputs or "")
attributes = {
GEN_AI_SESSION_ID: trace_metadata.get("conversation_id", ""),
GEN_AI_USER_ID: str(user_id),
GEN_AI_SPAN_KIND: GenAISpanKind.GENERATION.value,
GEN_AI_FRAMEWORK: "dify",
GEN_AI_MODEL_NAME: str(model_name),
GEN_AI_PROVIDER: str(model_provider),
GEN_AI_USAGE_INPUT_TOKENS: str(trace_info.message_tokens or 0),
GEN_AI_USAGE_OUTPUT_TOKENS: str(trace_info.answer_tokens or 0),
GEN_AI_USAGE_TOTAL_TOKENS: str(trace_info.total_tokens or 0),
GEN_AI_PROMPT: inputs_str,
GEN_AI_COMPLETION: outputs_str,
INPUT_VALUE: inputs_str,
OUTPUT_VALUE: outputs_str,
}
if trace_info.is_streaming_request:
attributes[GEN_AI_IS_STREAMING_REQUEST] = "true"
return SpanData(
trace_id=trace_id,
parent_span_id=parent_span_id,
span_id=TencentTraceUtils.convert_to_span_id(trace_info.message_id, "llm"),
name="GENERATION",
start_time=TencentSpanBuilder._get_time_nanoseconds(trace_info.start_time),
end_time=TencentSpanBuilder._get_time_nanoseconds(trace_info.end_time),
attributes=attributes,
status=status,
)
@staticmethod
def build_tool_span(trace_info: ToolTraceInfo, trace_id: int, parent_span_id: int) -> SpanData:
"""Build tool span."""


@@ -107,9 +107,13 @@ class TencentDataTrace(BaseTraceInstance):
links.append(TencentTraceUtils.create_link(trace_info.trace_id))
message_span = TencentSpanBuilder.build_message_span(trace_info, trace_id, str(user_id), links)
self.trace_client.add_span(message_span)
# Add LLM child span with detailed attributes
parent_span_id = TencentTraceUtils.convert_to_span_id(trace_info.message_id, "message")
llm_span = TencentSpanBuilder.build_message_llm_span(trace_info, trace_id, parent_span_id, str(user_id))
self.trace_client.add_span(llm_span)
self._record_message_llm_metrics(trace_info)
# Record trace duration for entry span


@@ -6,7 +6,7 @@ from core.model_runtime.entities.llm_entities import LLMResultChunk
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool
from core.model_runtime.entities.model_entities import AIModelEntity
from core.model_runtime.entities.rerank_entities import RerankResult
from core.model_runtime.entities.text_embedding_entities import EmbeddingResult
from core.model_runtime.entities.text_embedding_entities import TextEmbeddingResult
from core.model_runtime.utils.encoders import jsonable_encoder
from core.plugin.entities.plugin_daemon import (
PluginBasicBooleanResponse,
@@ -243,14 +243,14 @@ class PluginModelClient(BasePluginClient):
credentials: dict,
texts: list[str],
input_type: str,
) -> EmbeddingResult:
) -> TextEmbeddingResult:
"""
Invoke text embedding
"""
response = self._request_with_plugin_daemon_response_stream(
method="POST",
path=f"plugin/{tenant_id}/dispatch/text_embedding/invoke",
type_=EmbeddingResult,
type_=TextEmbeddingResult,
data=jsonable_encoder(
{
"user_id": user_id,
@@ -275,48 +275,6 @@ class PluginModelClient(BasePluginClient):
raise ValueError("Failed to invoke text embedding")
def invoke_multimodal_embedding(
self,
tenant_id: str,
user_id: str,
plugin_id: str,
provider: str,
model: str,
credentials: dict,
documents: list[dict],
input_type: str,
) -> EmbeddingResult:
"""
Invoke multimodal embedding
"""
response = self._request_with_plugin_daemon_response_stream(
method="POST",
path=f"plugin/{tenant_id}/dispatch/multimodal_embedding/invoke",
type_=EmbeddingResult,
data=jsonable_encoder(
{
"user_id": user_id,
"data": {
"provider": provider,
"model_type": "text-embedding",
"model": model,
"credentials": credentials,
"documents": documents,
"input_type": input_type,
},
}
),
headers={
"X-Plugin-ID": plugin_id,
"Content-Type": "application/json",
},
)
for resp in response:
return resp
raise ValueError("Failed to invoke file embedding")
def get_text_embedding_num_tokens(
self,
tenant_id: str,
@@ -403,51 +361,6 @@ class PluginModelClient(BasePluginClient):
raise ValueError("Failed to invoke rerank")
def invoke_multimodal_rerank(
self,
tenant_id: str,
user_id: str,
plugin_id: str,
provider: str,
model: str,
credentials: dict,
query: dict,
docs: list[dict],
score_threshold: float | None = None,
top_n: int | None = None,
) -> RerankResult:
"""
Invoke multimodal rerank
"""
response = self._request_with_plugin_daemon_response_stream(
method="POST",
path=f"plugin/{tenant_id}/dispatch/multimodal_rerank/invoke",
type_=RerankResult,
data=jsonable_encoder(
{
"user_id": user_id,
"data": {
"provider": provider,
"model_type": "rerank",
"model": model,
"credentials": credentials,
"query": query,
"docs": docs,
"score_threshold": score_threshold,
"top_n": top_n,
},
}
),
headers={
"X-Plugin-ID": plugin_id,
"Content-Type": "application/json",
},
)
for resp in response:
return resp
raise ValueError("Failed to invoke multimodal rerank")
def invoke_tts(
self,
tenant_id: str,


@@ -49,7 +49,6 @@ class SimplePromptTransform(PromptTransform):
memory: TokenBufferMemory | None,
model_config: ModelConfigWithCredentialsEntity,
image_detail_config: ImagePromptMessageContent.DETAIL | None = None,
context_files: list["File"] | None = None,
) -> tuple[list[PromptMessage], list[str] | None]:
inputs = {key: str(value) for key, value in inputs.items()}
@@ -65,7 +64,6 @@ class SimplePromptTransform(PromptTransform):
memory=memory,
model_config=model_config,
image_detail_config=image_detail_config,
context_files=context_files,
)
else:
prompt_messages, stops = self._get_completion_model_prompt_messages(
@@ -78,7 +76,6 @@ class SimplePromptTransform(PromptTransform):
memory=memory,
model_config=model_config,
image_detail_config=image_detail_config,
context_files=context_files,
)
return prompt_messages, stops
@@ -190,7 +187,6 @@ class SimplePromptTransform(PromptTransform):
memory: TokenBufferMemory | None,
model_config: ModelConfigWithCredentialsEntity,
image_detail_config: ImagePromptMessageContent.DETAIL | None = None,
context_files: list["File"] | None = None,
) -> tuple[list[PromptMessage], list[str] | None]:
prompt_messages: list[PromptMessage] = []
@@ -220,9 +216,9 @@ class SimplePromptTransform(PromptTransform):
)
if query:
prompt_messages.append(self._get_last_user_message(query, files, image_detail_config, context_files))
prompt_messages.append(self._get_last_user_message(query, files, image_detail_config))
else:
prompt_messages.append(self._get_last_user_message(prompt, files, image_detail_config, context_files))
prompt_messages.append(self._get_last_user_message(prompt, files, image_detail_config))
return prompt_messages, None
@@ -237,7 +233,6 @@ class SimplePromptTransform(PromptTransform):
memory: TokenBufferMemory | None,
model_config: ModelConfigWithCredentialsEntity,
image_detail_config: ImagePromptMessageContent.DETAIL | None = None,
context_files: list["File"] | None = None,
) -> tuple[list[PromptMessage], list[str] | None]:
# get prompt
prompt, prompt_rules = self._get_prompt_str_and_rules(
@@ -280,27 +275,20 @@ class SimplePromptTransform(PromptTransform):
if stops is not None and len(stops) == 0:
stops = None
return [self._get_last_user_message(prompt, files, image_detail_config, context_files)], stops
return [self._get_last_user_message(prompt, files, image_detail_config)], stops
def _get_last_user_message(
self,
prompt: str,
files: Sequence["File"],
image_detail_config: ImagePromptMessageContent.DETAIL | None = None,
context_files: list["File"] | None = None,
) -> UserPromptMessage:
prompt_message_contents: list[PromptMessageContentUnionTypes] = []
if files:
prompt_message_contents: list[PromptMessageContentUnionTypes] = []
for file in files:
prompt_message_contents.append(
file_manager.to_prompt_message_content(file, image_detail_config=image_detail_config)
)
if context_files:
for file in context_files:
prompt_message_contents.append(
file_manager.to_prompt_message_content(file, image_detail_config=image_detail_config)
)
if prompt_message_contents:
prompt_message_contents.append(TextPromptMessageContent(data=prompt))
prompt_message = UserPromptMessage(content=prompt_message_contents)


@@ -2,7 +2,6 @@ from core.model_manager import ModelInstance, ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.model_runtime.errors.invoke import InvokeAuthorizationError
from core.rag.data_post_processor.reorder import ReorderRunner
from core.rag.index_processor.constant.query_type import QueryType
from core.rag.models.document import Document
from core.rag.rerank.entity.weight import KeywordSetting, VectorSetting, Weights
from core.rag.rerank.rerank_base import BaseRerankRunner
@@ -31,10 +30,9 @@ class DataPostProcessor:
score_threshold: float | None = None,
top_n: int | None = None,
user: str | None = None,
query_type: QueryType = QueryType.TEXT_QUERY,
) -> list[Document]:
if self.rerank_runner:
documents = self.rerank_runner.run(query, documents, score_threshold, top_n, user, query_type)
documents = self.rerank_runner.run(query, documents, score_threshold, top_n, user)
if self.reorder_runner:
documents = self.reorder_runner.run(documents)


@@ -1,30 +1,23 @@
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor
from typing import Any
from flask import Flask, current_app
from sqlalchemy import select
from sqlalchemy.orm import Session, load_only
from configs import dify_config
from core.model_manager import ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.rag.data_post_processor.data_post_processor import DataPostProcessor
from core.rag.datasource.keyword.keyword_factory import Keyword
from core.rag.datasource.vdb.vector_factory import Vector
from core.rag.embedding.retrieval import RetrievalSegments
from core.rag.entities.metadata_entities import MetadataCondition
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.query_type import QueryType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.models.document import Document
from core.rag.rerank.rerank_type import RerankMode
from core.rag.retrieval.retrieval_methods import RetrievalMethod
from core.tools.signature import sign_upload_file
from extensions.ext_database import db
from models.dataset import ChildChunk, Dataset, DocumentSegment, SegmentAttachmentBinding
from models.dataset import ChildChunk, Dataset, DocumentSegment
from models.dataset import Document as DatasetDocument
from models.model import UploadFile
from services.external_knowledge_service import ExternalDatasetService
default_retrieval_model = {
@@ -44,15 +37,14 @@ class RetrievalService:
retrieval_method: RetrievalMethod,
dataset_id: str,
query: str,
top_k: int = 4,
top_k: int,
score_threshold: float | None = 0.0,
reranking_model: dict | None = None,
reranking_mode: str = "reranking_model",
weights: dict | None = None,
document_ids_filter: list[str] | None = None,
attachment_ids: list | None = None,
):
if not query and not attachment_ids:
if not query:
return []
dataset = cls._get_dataset(dataset_id)
if not dataset:
@@ -64,52 +56,69 @@ class RetrievalService:
# Optimize multithreading with thread pools
with ThreadPoolExecutor(max_workers=dify_config.RETRIEVAL_SERVICE_EXECUTORS) as executor: # type: ignore
futures = []
retrieval_service = RetrievalService()
if query:
if retrieval_method == RetrievalMethod.KEYWORD_SEARCH:
futures.append(
executor.submit(
retrieval_service._retrieve,
cls.keyword_search,
flask_app=current_app._get_current_object(), # type: ignore
retrieval_method=retrieval_method,
dataset=dataset,
dataset_id=dataset_id,
query=query,
top_k=top_k,
all_documents=all_documents,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
)
)
if RetrievalMethod.is_support_semantic_search(retrieval_method):
futures.append(
executor.submit(
cls.embedding_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset_id,
query=query,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
reranking_mode=reranking_mode,
weights=weights,
document_ids_filter=document_ids_filter,
attachment_id=None,
all_documents=all_documents,
retrieval_method=retrieval_method,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
)
)
if attachment_ids:
for attachment_id in attachment_ids:
futures.append(
executor.submit(
retrieval_service._retrieve,
flask_app=current_app._get_current_object(), # type: ignore
retrieval_method=retrieval_method,
dataset=dataset,
query=None,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
reranking_mode=reranking_mode,
weights=weights,
document_ids_filter=document_ids_filter,
attachment_id=attachment_id,
all_documents=all_documents,
exceptions=exceptions,
)
if RetrievalMethod.is_support_fulltext_search(retrieval_method):
futures.append(
executor.submit(
cls.full_text_index_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset_id,
query=query,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
all_documents=all_documents,
retrieval_method=retrieval_method,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
)
concurrent.futures.wait(futures, timeout=3600, return_when=concurrent.futures.ALL_COMPLETED)
)
concurrent.futures.wait(futures, timeout=30, return_when=concurrent.futures.ALL_COMPLETED)
if exceptions:
raise ValueError(";\n".join(exceptions))
# Deduplicate documents for hybrid search to avoid duplicate chunks
if retrieval_method == RetrievalMethod.HYBRID_SEARCH:
all_documents = cls._deduplicate_documents(all_documents)
data_post_processor = DataPostProcessor(
str(dataset.tenant_id), reranking_mode, reranking_model, weights, False
)
all_documents = data_post_processor.invoke(
query=query,
documents=all_documents,
score_threshold=score_threshold,
top_n=top_k,
)
return all_documents
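_deduplicate_documents itself is outside this hunk; a minimal sketch of the idea, assuming each chunk carries doc_id and score in its metadata and the highest-scoring duplicate wins:
# Hypothetical sketch only; the real helper is defined elsewhere in this file.
def _deduplicate_documents_sketch(documents: list[Document]) -> list[Document]:
    best: dict[str, Document] = {}
    for doc in documents:
        meta = doc.metadata or {}
        key = meta.get("doc_id", "")
        kept = best.get(key)
        if kept is None or meta.get("score", 0.0) > (kept.metadata or {}).get("score", 0.0):
            best[key] = doc
    return list(best.values())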
@classmethod
@@ -214,7 +223,6 @@ class RetrievalService:
retrieval_method: RetrievalMethod,
exceptions: list,
document_ids_filter: list[str] | None = None,
query_type: QueryType = QueryType.TEXT_QUERY,
):
with flask_app.app_context():
try:
@@ -223,30 +231,14 @@ class RetrievalService:
raise ValueError("dataset not found")
vector = Vector(dataset=dataset)
documents = []
if query_type == QueryType.TEXT_QUERY:
documents.extend(
vector.search_by_vector(
query,
search_type="similarity_score_threshold",
top_k=top_k,
score_threshold=score_threshold,
filter={"group_id": [dataset.id]},
document_ids_filter=document_ids_filter,
)
)
if query_type == QueryType.IMAGE_QUERY:
if not dataset.is_multimodal:
return
documents.extend(
vector.search_by_file(
file_id=query,
top_k=top_k,
score_threshold=score_threshold,
filter={"group_id": [dataset.id]},
document_ids_filter=document_ids_filter,
)
)
documents = vector.search_by_vector(
query,
search_type="similarity_score_threshold",
top_k=top_k,
score_threshold=score_threshold,
filter={"group_id": [dataset.id]},
document_ids_filter=document_ids_filter,
)
if documents:
if (
@@ -258,37 +250,14 @@ class RetrievalService:
data_post_processor = DataPostProcessor(
str(dataset.tenant_id), str(RerankMode.RERANKING_MODEL), reranking_model, None, False
)
if dataset.is_multimodal:
model_manager = ModelManager()
is_support_vision = model_manager.check_model_support_vision(
tenant_id=dataset.tenant_id,
provider=reranking_model.get("reranking_provider_name") or "",
model=reranking_model.get("reranking_model_name") or "",
model_type=ModelType.RERANK,
)
if is_support_vision:
all_documents.extend(
data_post_processor.invoke(
query=query,
documents=documents,
score_threshold=score_threshold,
top_n=len(documents),
query_type=query_type,
)
)
else:
# not effective, return original documents
all_documents.extend(documents)
else:
all_documents.extend(
data_post_processor.invoke(
query=query,
documents=documents,
score_threshold=score_threshold,
top_n=len(documents),
query_type=query_type,
)
all_documents.extend(
data_post_processor.invoke(
query=query,
documents=documents,
score_threshold=score_threshold,
top_n=len(documents),
)
)
else:
all_documents.extend(documents)
except Exception as e:
@@ -370,161 +339,103 @@ class RetrievalService:
records = []
include_segment_ids = set()
segment_child_map = {}
segment_file_map = {}
with Session(bind=db.engine, expire_on_commit=False) as session:
# Process documents
for document in documents:
segment_id = None
attachment_info = None
child_chunk = None
document_id = document.metadata.get("document_id")
if document_id not in dataset_documents:
# Process documents
for document in documents:
document_id = document.metadata.get("document_id")
if document_id not in dataset_documents:
continue
dataset_document = dataset_documents[document_id]
if not dataset_document:
continue
if dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX:
# Handle parent-child documents
child_index_node_id = document.metadata.get("doc_id")
child_chunk_stmt = select(ChildChunk).where(ChildChunk.index_node_id == child_index_node_id)
child_chunk = db.session.scalar(child_chunk_stmt)
if not child_chunk:
continue
dataset_document = dataset_documents[document_id]
if not dataset_document:
continue
if dataset_document.doc_form == IndexStructureType.PARENT_CHILD_INDEX:
# Handle parent-child documents
if document.metadata.get("doc_type") == DocType.IMAGE:
attachment_info_dict = cls.get_segment_attachment_info(
dataset_document.dataset_id,
dataset_document.tenant_id,
document.metadata.get("doc_id") or "",
session,
)
if attachment_info_dict:
attachment_info = attachment_info_dict["attachment_info"]
segment_id = attachment_info_dict["segment_id"]
else:
child_index_node_id = document.metadata.get("doc_id")
child_chunk_stmt = select(ChildChunk).where(ChildChunk.index_node_id == child_index_node_id)
child_chunk = session.scalar(child_chunk_stmt)
if not child_chunk:
continue
segment_id = child_chunk.segment_id
if not segment_id:
continue
segment = (
session.query(DocumentSegment)
.where(
DocumentSegment.dataset_id == dataset_document.dataset_id,
DocumentSegment.enabled == True,
DocumentSegment.status == "completed",
DocumentSegment.id == segment_id,
)
.first()
segment = (
db.session.query(DocumentSegment)
.where(
DocumentSegment.dataset_id == dataset_document.dataset_id,
DocumentSegment.enabled == True,
DocumentSegment.status == "completed",
DocumentSegment.id == child_chunk.segment_id,
)
.options(
load_only(
DocumentSegment.id,
DocumentSegment.content,
DocumentSegment.answer,
)
)
.first()
)
if not segment:
continue
if not segment:
continue
if segment.id not in include_segment_ids:
include_segment_ids.add(segment.id)
if child_chunk:
child_chunk_detail = {
"id": child_chunk.id,
"content": child_chunk.content,
"position": child_chunk.position,
"score": document.metadata.get("score", 0.0),
}
map_detail = {
"max_score": document.metadata.get("score", 0.0),
"child_chunks": [child_chunk_detail],
}
segment_child_map[segment.id] = map_detail
record = {
"segment": segment,
}
if attachment_info:
segment_file_map[segment.id] = [attachment_info]
records.append(record)
else:
if child_chunk:
child_chunk_detail = {
"id": child_chunk.id,
"content": child_chunk.content,
"position": child_chunk.position,
"score": document.metadata.get("score", 0.0),
}
if segment.id in segment_child_map:
segment_child_map[segment.id]["child_chunks"].append(child_chunk_detail)
segment_child_map[segment.id]["max_score"] = max(
segment_child_map[segment.id]["max_score"], document.metadata.get("score", 0.0)
)
else:
segment_child_map[segment.id] = {
"max_score": document.metadata.get("score", 0.0),
"child_chunks": [child_chunk_detail],
}
if attachment_info:
if segment.id in segment_file_map:
segment_file_map[segment.id].append(attachment_info)
else:
segment_file_map[segment.id] = [attachment_info]
if segment.id not in include_segment_ids:
include_segment_ids.add(segment.id)
child_chunk_detail = {
"id": child_chunk.id,
"content": child_chunk.content,
"position": child_chunk.position,
"score": document.metadata.get("score", 0.0),
}
map_detail = {
"max_score": document.metadata.get("score", 0.0),
"child_chunks": [child_chunk_detail],
}
segment_child_map[segment.id] = map_detail
record = {
"segment": segment,
}
records.append(record)
else:
# Handle normal documents
segment = None
if document.metadata.get("doc_type") == DocType.IMAGE:
attachment_info_dict = cls.get_segment_attachment_info(
dataset_document.dataset_id,
dataset_document.tenant_id,
document.metadata.get("doc_id") or "",
session,
)
if attachment_info_dict:
attachment_info = attachment_info_dict["attachment_info"]
segment_id = attachment_info_dict["segment_id"]
document_segment_stmt = select(DocumentSegment).where(
DocumentSegment.dataset_id == dataset_document.dataset_id,
DocumentSegment.enabled == True,
DocumentSegment.status == "completed",
DocumentSegment.id == segment_id,
)
segment = session.scalar(document_segment_stmt)
if segment:
segment_file_map[segment.id] = [attachment_info]
else:
index_node_id = document.metadata.get("doc_id")
if not index_node_id:
continue
document_segment_stmt = select(DocumentSegment).where(
DocumentSegment.dataset_id == dataset_document.dataset_id,
DocumentSegment.enabled == True,
DocumentSegment.status == "completed",
DocumentSegment.index_node_id == index_node_id,
)
segment = session.scalar(document_segment_stmt)
child_chunk_detail = {
"id": child_chunk.id,
"content": child_chunk.content,
"position": child_chunk.position,
"score": document.metadata.get("score", 0.0),
}
segment_child_map[segment.id]["child_chunks"].append(child_chunk_detail)
segment_child_map[segment.id]["max_score"] = max(
segment_child_map[segment.id]["max_score"], document.metadata.get("score", 0.0)
)
else:
# Handle normal documents
index_node_id = document.metadata.get("doc_id")
if not index_node_id:
continue
document_segment_stmt = select(DocumentSegment).where(
DocumentSegment.dataset_id == dataset_document.dataset_id,
DocumentSegment.enabled == True,
DocumentSegment.status == "completed",
DocumentSegment.index_node_id == index_node_id,
)
segment = db.session.scalar(document_segment_stmt)
if not segment:
continue
if segment.id not in include_segment_ids:
include_segment_ids.add(segment.id)
record = {
"segment": segment,
"score": document.metadata.get("score"), # type: ignore
}
if attachment_info:
segment_file_map[segment.id] = [attachment_info]
records.append(record)
else:
if attachment_info:
attachment_infos = segment_file_map.get(segment.id, [])
if attachment_info not in attachment_infos:
attachment_infos.append(attachment_info)
segment_file_map[segment.id] = attachment_infos
if not segment:
continue
include_segment_ids.add(segment.id)
record = {
"segment": segment,
"score": document.metadata.get("score"), # type: ignore
}
records.append(record)
# Add child chunks information to records
for record in records:
if record["segment"].id in segment_child_map:
record["child_chunks"] = segment_child_map[record["segment"].id].get("child_chunks") # type: ignore
record["score"] = segment_child_map[record["segment"].id]["max_score"]
if record["segment"].id in segment_file_map:
record["files"] = segment_file_map[record["segment"].id] # type: ignore[assignment]
result = []
for record in records:
@@ -536,11 +447,6 @@ class RetrievalService:
if not isinstance(child_chunks, list):
child_chunks = None
# Extract files, ensuring it's a list or None
files = record.get("files")
if not isinstance(files, list):
files = None
# Extract score, ensuring it's a float or None
score_value = record.get("score")
score = (
@@ -550,149 +456,10 @@ class RetrievalService:
)
# Create RetrievalSegments object
retrieval_segment = RetrievalSegments(
segment=segment, child_chunks=child_chunks, score=score, files=files
)
retrieval_segment = RetrievalSegments(segment=segment, child_chunks=child_chunks, score=score)
result.append(retrieval_segment)
return result
except Exception as e:
db.session.rollback()
raise e
def _retrieve(
self,
flask_app: Flask,
retrieval_method: RetrievalMethod,
dataset: Dataset,
query: str | None = None,
top_k: int = 4,
score_threshold: float | None = 0.0,
reranking_model: dict | None = None,
reranking_mode: str = "reranking_model",
weights: dict | None = None,
document_ids_filter: list[str] | None = None,
attachment_id: str | None = None,
all_documents: list[Document] = [],
exceptions: list[str] = [],
):
if not query and not attachment_id:
return
with flask_app.app_context():
all_documents_item: list[Document] = []
# Optimize multithreading with thread pools
with ThreadPoolExecutor(max_workers=dify_config.RETRIEVAL_SERVICE_EXECUTORS) as executor: # type: ignore
futures = []
if retrieval_method == RetrievalMethod.KEYWORD_SEARCH and query:
futures.append(
executor.submit(
self.keyword_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset.id,
query=query,
top_k=top_k,
all_documents=all_documents_item,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
)
)
if RetrievalMethod.is_support_semantic_search(retrieval_method):
if query:
futures.append(
executor.submit(
self.embedding_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset.id,
query=query,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
all_documents=all_documents_item,
retrieval_method=retrieval_method,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
query_type=QueryType.TEXT_QUERY,
)
)
if attachment_id:
futures.append(
executor.submit(
self.embedding_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset.id,
query=attachment_id,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
all_documents=all_documents_item,
retrieval_method=retrieval_method,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
query_type=QueryType.IMAGE_QUERY,
)
)
if RetrievalMethod.is_support_fulltext_search(retrieval_method) and query:
futures.append(
executor.submit(
self.full_text_index_search,
flask_app=current_app._get_current_object(), # type: ignore
dataset_id=dataset.id,
query=query,
top_k=top_k,
score_threshold=score_threshold,
reranking_model=reranking_model,
all_documents=all_documents_item,
retrieval_method=retrieval_method,
exceptions=exceptions,
document_ids_filter=document_ids_filter,
)
)
concurrent.futures.wait(futures, timeout=300, return_when=concurrent.futures.ALL_COMPLETED)
if exceptions:
raise ValueError(";\n".join(exceptions))
# Deduplicate documents for hybrid search to avoid duplicate chunks
if retrieval_method == RetrievalMethod.HYBRID_SEARCH:
if attachment_id and reranking_mode == RerankMode.WEIGHTED_SCORE:
all_documents.extend(all_documents_item)
all_documents_item = self._deduplicate_documents(all_documents_item)
data_post_processor = DataPostProcessor(
str(dataset.tenant_id), reranking_mode, reranking_model, weights, False
)
query = query or attachment_id
if not query:
return
all_documents_item = data_post_processor.invoke(
query=query,
documents=all_documents_item,
score_threshold=score_threshold,
top_n=top_k,
query_type=QueryType.TEXT_QUERY if query else QueryType.IMAGE_QUERY,
)
all_documents.extend(all_documents_item)
@classmethod
def get_segment_attachment_info(
cls, dataset_id: str, tenant_id: str, attachment_id: str, session: Session
) -> dict[str, Any] | None:
upload_file = session.query(UploadFile).where(UploadFile.id == attachment_id).first()
if upload_file:
attachment_binding = (
session.query(SegmentAttachmentBinding)
.where(SegmentAttachmentBinding.attachment_id == upload_file.id)
.first()
)
if attachment_binding:
attachment_info = {
"id": upload_file.id,
"name": upload_file.name,
"extension": "." + upload_file.extension,
"mime_type": upload_file.mime_type,
"source_url": sign_upload_file(upload_file.id, upload_file.extension),
"size": upload_file.size,
}
return {"attachment_info": attachment_info, "segment_id": attachment_binding.segment_id}
return None


@@ -1,407 +0,0 @@
"""InterSystems IRIS vector database implementation for Dify.
This module provides vector storage and retrieval using IRIS native VECTOR type
with HNSW indexing for efficient similarity search.
"""
from __future__ import annotations
import json
import logging
import threading
import uuid
from contextlib import contextmanager
from typing import TYPE_CHECKING, Any
from configs import dify_config
from configs.middleware.vdb.iris_config import IrisVectorConfig
from core.rag.datasource.vdb.vector_base import BaseVector
from core.rag.datasource.vdb.vector_factory import AbstractVectorFactory
from core.rag.datasource.vdb.vector_type import VectorType
from core.rag.embedding.embedding_base import Embeddings
from core.rag.models.document import Document
from extensions.ext_redis import redis_client
from models.dataset import Dataset
if TYPE_CHECKING:
import iris
else:
try:
import iris
except ImportError:
iris = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
# Singleton connection pool to minimize IRIS license usage
_pool_lock = threading.Lock()
_pool_instance: IrisConnectionPool | None = None
def get_iris_pool(config: IrisVectorConfig) -> IrisConnectionPool:
"""Get or create the global IRIS connection pool (singleton pattern)."""
global _pool_instance # pylint: disable=global-statement
with _pool_lock:
if _pool_instance is None:
logger.info("Initializing IRIS connection pool")
_pool_instance = IrisConnectionPool(config)
return _pool_instance
class IrisConnectionPool:
"""Thread-safe connection pool for IRIS database."""
def __init__(self, config: IrisVectorConfig) -> None:
self.config = config
self._pool: list[Any] = []
self._lock = threading.Lock()
self._min_size = config.IRIS_MIN_CONNECTION
self._max_size = config.IRIS_MAX_CONNECTION
self._in_use = 0
self._schemas_initialized: set[str] = set() # Cache for initialized schemas
self._initialize_pool()
def _initialize_pool(self) -> None:
for _ in range(self._min_size):
self._pool.append(self._create_connection())
def _create_connection(self) -> Any:
return iris.connect(
hostname=self.config.IRIS_HOST,
port=self.config.IRIS_SUPER_SERVER_PORT,
namespace=self.config.IRIS_DATABASE,
username=self.config.IRIS_USER,
password=self.config.IRIS_PASSWORD,
)
def get_connection(self) -> Any:
"""Get a connection from pool or create new if available."""
with self._lock:
if self._pool:
conn = self._pool.pop()
self._in_use += 1
return conn
if self._in_use < self._max_size:
conn = self._create_connection()
self._in_use += 1
return conn
raise RuntimeError("Connection pool exhausted")
def return_connection(self, conn: Any) -> None:
"""Return connection to pool after validating it."""
if not conn:
return
# Validate connection health
is_valid = False
try:
cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.close()
is_valid = True
except (OSError, RuntimeError) as e:
logger.debug("Connection validation failed: %s", e)
try:
conn.close()
except (OSError, RuntimeError):
pass
with self._lock:
self._pool.append(conn if is_valid else self._create_connection())
self._in_use -= 1
def ensure_schema_exists(self, schema: str) -> None:
"""Ensure schema exists in IRIS database.
This method is idempotent and thread-safe. It uses a memory cache to avoid
redundant database queries for already-verified schemas.
Args:
schema: Schema name to ensure exists
Raises:
Exception: If schema creation fails
"""
# Fast path: check cache first (no lock needed for read-only set lookup)
if schema in self._schemas_initialized:
return
# Slow path: acquire lock and check again (double-checked locking)
with self._lock:
if schema in self._schemas_initialized:
return
# Get a connection to check/create schema
conn = self._pool[0] if self._pool else self._create_connection()
cursor = conn.cursor()
try:
# Check if schema exists using INFORMATION_SCHEMA
check_sql = """
SELECT COUNT(*) FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = ?
"""
cursor.execute(check_sql, (schema,)) # Must be tuple or list
exists = cursor.fetchone()[0] > 0
if not exists:
# Schema doesn't exist, create it
cursor.execute(f"CREATE SCHEMA {schema}")
conn.commit()
logger.info("Created schema: %s", schema)
else:
logger.debug("Schema already exists: %s", schema)
# Add to cache to skip future checks
self._schemas_initialized.add(schema)
except Exception as e:
conn.rollback()
logger.exception("Failed to ensure schema %s exists", schema)
raise
finally:
cursor.close()
def close_all(self) -> None:
"""Close all connections (application shutdown only)."""
with self._lock:
for conn in self._pool:
try:
conn.close()
except (OSError, RuntimeError):
pass
self._pool.clear()
self._in_use = 0
self._schemas_initialized.clear()
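A usage sketch of the pool API above (illustrative; `config` stands for an IrisVectorConfig instance):
# Checked-out connections must be returned even on error, or the pool leaks.
pool = get_iris_pool(config)  # process-wide singleton
conn = pool.get_connection()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    cursor.close()
finally:
    pool.return_connection(conn)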
class IrisVector(BaseVector):
"""IRIS vector database implementation using native VECTOR type and HNSW indexing."""
def __init__(self, collection_name: str, config: IrisVectorConfig) -> None:
super().__init__(collection_name)
self.config = config
self.table_name = f"embedding_{collection_name}".upper()
self.schema = config.IRIS_SCHEMA or "dify"
self.pool = get_iris_pool(config)
def get_type(self) -> str:
return VectorType.IRIS
@contextmanager
def _get_cursor(self):
"""Context manager for database cursor with connection pooling."""
conn = self.pool.get_connection()
cursor = conn.cursor()
try:
yield cursor
conn.commit()
except Exception:
conn.rollback()
raise
finally:
cursor.close()
self.pool.return_connection(conn)
def create(self, texts: list[Document], embeddings: list[list[float]], **kwargs) -> list[str]:
dimension = len(embeddings[0])
self._create_collection(dimension)
return self.add_texts(texts, embeddings)
def add_texts(self, documents: list[Document], embeddings: list[list[float]], **_kwargs) -> list[str]:
"""Add documents with embeddings to the collection."""
added_ids = []
with self._get_cursor() as cursor:
for i, doc in enumerate(documents):
doc_id = doc.metadata.get("doc_id", str(uuid.uuid4())) if doc.metadata else str(uuid.uuid4())
metadata = json.dumps(doc.metadata) if doc.metadata else "{}"
embedding_str = json.dumps(embeddings[i])
sql = f"INSERT INTO {self.schema}.{self.table_name} (id, text, meta, embedding) VALUES (?, ?, ?, ?)"
cursor.execute(sql, (doc_id, doc.page_content, metadata, embedding_str))
added_ids.append(doc_id)
return added_ids
def text_exists(self, id: str) -> bool: # pylint: disable=redefined-builtin
try:
with self._get_cursor() as cursor:
sql = f"SELECT 1 FROM {self.schema}.{self.table_name} WHERE id = ?"
cursor.execute(sql, (id,))
return cursor.fetchone() is not None
except (OSError, RuntimeError, ValueError):
return False
def delete_by_ids(self, ids: list[str]) -> None:
if not ids:
return
with self._get_cursor() as cursor:
placeholders = ",".join(["?" for _ in ids])
sql = f"DELETE FROM {self.schema}.{self.table_name} WHERE id IN ({placeholders})"
cursor.execute(sql, ids)
def delete_by_metadata_field(self, key: str, value: str) -> None:
"""Delete documents by metadata field (JSON LIKE pattern matching)."""
with self._get_cursor() as cursor:
pattern = f'%"{key}": "{value}"%'
sql = f"DELETE FROM {self.schema}.{self.table_name} WHERE meta LIKE ?"
cursor.execute(sql, (pattern,))
def search_by_vector(self, query_vector: list[float], **kwargs: Any) -> list[Document]:
"""Search similar documents using VECTOR_COSINE with HNSW index."""
top_k = kwargs.get("top_k", 4)
score_threshold = float(kwargs.get("score_threshold") or 0.0)
embedding_str = json.dumps(query_vector)
with self._get_cursor() as cursor:
sql = f"""
SELECT TOP {top_k} id, text, meta, VECTOR_COSINE(embedding, ?) as score
FROM {self.schema}.{self.table_name}
ORDER BY score DESC
"""
cursor.execute(sql, (embedding_str,))
docs = []
for row in cursor.fetchall():
if len(row) >= 4:
text, meta_str, score = row[1], row[2], float(row[3])
if score >= score_threshold:
metadata = json.loads(meta_str) if meta_str else {}
metadata["score"] = score
docs.append(Document(page_content=text, metadata=metadata))
return docs
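For concrete parameters the rendered statement looks like the sketch below; note that score_threshold is applied in Python after the rows come back, not in the SQL itself. The schema and table names here are illustrative:

```python
# Illustration of the statement rendered for top_k=4; the query vector
# travels separately as a JSON-encoded bind parameter, and the score
# threshold is only applied after fetchall().
import json

top_k, schema, table_name = 4, "dify", "EMBEDDING_MYCOLLECTION"
sql = f"""
    SELECT TOP {top_k} id, text, meta, VECTOR_COSINE(embedding, ?) as score
    FROM {schema}.{table_name}
    ORDER BY score DESC
"""
params = (json.dumps([0.1, 0.2, 0.3]),)
print(sql.strip())
print(params)
```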
def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
"""Search documents by full-text using iFind index or fallback to LIKE search."""
top_k = kwargs.get("top_k", 5)
with self._get_cursor() as cursor:
if self.config.IRIS_TEXT_INDEX:
# Use iFind full-text search with index
text_index_name = f"idx_{self.table_name}_text"
sql = f"""
SELECT TOP {top_k} id, text, meta
FROM {self.schema}.{self.table_name}
WHERE %ID %FIND search_index({text_index_name}, ?)
"""
cursor.execute(sql, (query,))
else:
# Fallback to LIKE search (inefficient for large datasets)
query_pattern = f"%{query}%"
sql = f"""
SELECT TOP {top_k} id, text, meta
FROM {self.schema}.{self.table_name}
WHERE text LIKE ?
"""
cursor.execute(sql, (query_pattern,))
docs = []
for row in cursor.fetchall():
if len(row) >= 3:
metadata = json.loads(row[2]) if row[2] else {}
docs.append(Document(page_content=row[1], metadata=metadata))
if not docs:
logger.info("Full-text search for '%s' returned no results", query)
return docs
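The `%ID %FIND search_index(...)` predicate is IRIS-specific; the fallback branch is plain wildcard matching. A runnable illustration of the fallback, with sqlite3 standing in for IRIS when no iFind text index exists:

```python
# Runnable illustration of the LIKE fallback path above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT, text TEXT, meta TEXT)")
conn.execute("INSERT INTO docs VALUES ('1', 'reset your password', '{}')")

query = "password"
rows = conn.execute(
    "SELECT id, text, meta FROM docs WHERE text LIKE ?", (f"%{query}%",)
).fetchall()
print(rows)  # [('1', 'reset your password', '{}')]
```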
def delete(self) -> None:
"""Delete the entire collection (drop table - permanent)."""
with self._get_cursor() as cursor:
sql = f"DROP TABLE {self.schema}.{self.table_name}"
cursor.execute(sql)
def _create_collection(self, dimension: int) -> None:
"""Create table with VECTOR column and HNSW index.
Uses Redis lock to prevent concurrent creation attempts across multiple
API server instances (api, worker, worker_beat).
"""
cache_key = f"vector_indexing_{self._collection_name}"
lock_name = f"{cache_key}_lock"
with redis_client.lock(lock_name, timeout=20): # pylint: disable=not-context-manager
if redis_client.get(cache_key):
return
# Ensure schema exists (idempotent, cached after first call)
self.pool.ensure_schema_exists(self.schema)
with self._get_cursor() as cursor:
# Create table with VECTOR column
sql = f"""
CREATE TABLE {self.schema}.{self.table_name} (
id VARCHAR(255) PRIMARY KEY,
text CLOB,
meta CLOB,
embedding VECTOR(DOUBLE, {dimension})
)
"""
logger.info("Creating table: %s.%s", self.schema, self.table_name)
cursor.execute(sql)
# Create HNSW index for vector similarity search
index_name = f"idx_{self.table_name}_embedding"
sql_index = (
f"CREATE INDEX {index_name} ON {self.schema}.{self.table_name} "
"(embedding) AS HNSW(Distance='Cosine')"
)
logger.info("Creating HNSW index: %s", index_name)
cursor.execute(sql_index)
logger.info("HNSW index created successfully: %s", index_name)
# Create full-text search index if enabled
logger.info(
"IRIS_TEXT_INDEX config value: %s (type: %s)",
self.config.IRIS_TEXT_INDEX,
type(self.config.IRIS_TEXT_INDEX),
)
if self.config.IRIS_TEXT_INDEX:
text_index_name = f"idx_{self.table_name}_text"
language = self.config.IRIS_TEXT_INDEX_LANGUAGE
# Fixed: Removed extra parentheses and corrected syntax
sql_text_index = f"""
CREATE INDEX {text_index_name} ON {self.schema}.{self.table_name} (text)
AS %iFind.Index.Basic
(LANGUAGE = '{language}', LOWER = 1, INDEXOPTION = 0)
"""
logger.info("Creating text index: %s with language: %s", text_index_name, language)
logger.info("SQL for text index: %s", sql_text_index)
cursor.execute(sql_text_index)
logger.info("Text index created successfully: %s", text_index_name)
else:
logger.warning("Text index creation skipped - IRIS_TEXT_INDEX is disabled")
redis_client.set(cache_key, 1, ex=3600)
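The lock-then-check sequence makes creation idempotent across processes: the Redis lock serializes concurrent creators, and the cache key short-circuits every later call for an hour. A minimal single-process sketch of the same pattern, with a threading.Lock and a dict standing in for redis_client:

```python
# Single-process sketch of the lock-then-check idempotence pattern;
# threading.Lock and a dict stand in for redis_client.lock/get/set.
import threading

_lock = threading.Lock()
_cache: dict[str, int] = {}

def create_collection_once(collection_name: str) -> bool:
    cache_key = f"vector_indexing_{collection_name}"
    with _lock:                    # redis_client.lock(lock_name, timeout=20)
        if _cache.get(cache_key):  # another caller already finished
            return False
        # ... CREATE TABLE and index creation would run here ...
        _cache[cache_key] = 1      # redis_client.set(cache_key, 1, ex=3600)
        return True

print(create_collection_once("demo"))  # True: first caller creates
print(create_collection_once("demo"))  # False: later callers skip
```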
class IrisVectorFactory(AbstractVectorFactory):
"""Factory for creating IrisVector instances."""
def init_vector(self, dataset: Dataset, attributes: list, embeddings: Embeddings) -> IrisVector:
if dataset.index_struct_dict:
class_prefix: str = dataset.index_struct_dict["vector_store"]["class_prefix"]
collection_name = class_prefix
else:
dataset_id = dataset.id
collection_name = Dataset.gen_collection_name_by_id(dataset_id)
index_struct_dict = self.gen_index_struct_dict(VectorType.IRIS, collection_name)
dataset.index_struct = json.dumps(index_struct_dict)
return IrisVector(
collection_name=collection_name,
config=IrisVectorConfig(
IRIS_HOST=dify_config.IRIS_HOST,
IRIS_SUPER_SERVER_PORT=dify_config.IRIS_SUPER_SERVER_PORT,
IRIS_USER=dify_config.IRIS_USER,
IRIS_PASSWORD=dify_config.IRIS_PASSWORD,
IRIS_DATABASE=dify_config.IRIS_DATABASE,
IRIS_SCHEMA=dify_config.IRIS_SCHEMA,
IRIS_CONNECTION_URL=dify_config.IRIS_CONNECTION_URL,
IRIS_MIN_CONNECTION=dify_config.IRIS_MIN_CONNECTION,
IRIS_MAX_CONNECTION=dify_config.IRIS_MAX_CONNECTION,
IRIS_TEXT_INDEX=dify_config.IRIS_TEXT_INDEX,
IRIS_TEXT_INDEX_LANGUAGE=dify_config.IRIS_TEXT_INDEX_LANGUAGE,
),
)
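For reference, this is roughly how a dataset id becomes the physical table name, assuming Dataset.gen_collection_name_by_id produces its usual `Vector_index_<id>_Node` prefix (the exact format is an assumption here, since it is not shown in this diff):

```python
# Assumed collection-name format plus the table-name derivation from
# IrisVector.__init__ above.
dataset_id = "0b0c9d5e-8f21-4a5b-9c3d-1f2e3a4b5c6d"
collection_name = f"Vector_index_{dataset_id.replace('-', '_')}_Node"
table_name = f"embedding_{collection_name}".upper()
print(table_name)
# EMBEDDING_VECTOR_INDEX_0B0C9D5E_8F21_4A5B_9C3D_1F2E3A4B5C6D_NODE
```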

View File

@ -1,4 +1,3 @@
import base64
import logging
import time
from abc import ABC, abstractmethod
@ -13,13 +12,10 @@ from core.rag.datasource.vdb.vector_base import BaseVector
from core.rag.datasource.vdb.vector_type import VectorType
from core.rag.embedding.cached_embedding import CacheEmbedding
from core.rag.embedding.embedding_base import Embeddings
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.models.document import Document
from extensions.ext_database import db
from extensions.ext_redis import redis_client
from extensions.ext_storage import storage
from models.dataset import Dataset, Whitelist
from models.model import UploadFile
logger = logging.getLogger(__name__)
@ -163,7 +159,7 @@ class Vector:
from core.rag.datasource.vdb.lindorm.lindorm_vector import LindormVectorStoreFactory
return LindormVectorStoreFactory
case VectorType.OCEANBASE | VectorType.SEEKDB:
case VectorType.OCEANBASE:
from core.rag.datasource.vdb.oceanbase.oceanbase_vector import OceanBaseVectorFactory
return OceanBaseVectorFactory
@ -187,10 +183,6 @@ class Vector:
from core.rag.datasource.vdb.clickzetta.clickzetta_vector import ClickzettaVectorFactory
return ClickzettaVectorFactory
case VectorType.IRIS:
from core.rag.datasource.vdb.iris.iris_vector import IrisVectorFactory
return IrisVectorFactory
case _:
raise ValueError(f"Vector store {vector_type} is not supported.")
@ -211,47 +203,6 @@ class Vector:
self._vector_processor.create(texts=batch, embeddings=batch_embeddings, **kwargs)
logger.info("Embedding %s texts took %s s", len(texts), time.time() - start)
def create_multimodal(self, file_documents: list | None = None, **kwargs):
if file_documents:
start = time.time()
logger.info("start embedding %s files %s", len(file_documents), start)
batch_size = 1000
            total_batches = (len(file_documents) + batch_size - 1) // batch_size
for i in range(0, len(file_documents), batch_size):
batch = file_documents[i : i + batch_size]
batch_start = time.time()
logger.info("Processing batch %s/%s (%s files)", i // batch_size + 1, total_batches, len(batch))
# Batch query all upload files to avoid N+1 queries
attachment_ids = [doc.metadata["doc_id"] for doc in batch]
stmt = select(UploadFile).where(UploadFile.id.in_(attachment_ids))
upload_files = db.session.scalars(stmt).all()
upload_file_map = {str(f.id): f for f in upload_files}
file_base64_list = []
real_batch = []
for document in batch:
attachment_id = document.metadata["doc_id"]
doc_type = document.metadata["doc_type"]
upload_file = upload_file_map.get(attachment_id)
if upload_file:
blob = storage.load_once(upload_file.key)
file_base64_str = base64.b64encode(blob).decode()
file_base64_list.append(
{
"content": file_base64_str,
"content_type": doc_type,
"file_id": attachment_id,
}
)
real_batch.append(document)
batch_embeddings = self._embeddings.embed_multimodal_documents(file_base64_list)
logger.info(
"Embedding batch %s/%s took %s s", i // batch_size + 1, total_batches, time.time() - batch_start
)
self._vector_processor.create(texts=real_batch, embeddings=batch_embeddings, **kwargs)
logger.info("Embedding %s files took %s s", len(file_documents), time.time() - start)
def add_texts(self, documents: list[Document], **kwargs):
if kwargs.get("duplicate_check", False):
documents = self._filter_duplicate_texts(documents)
@ -272,22 +223,6 @@ class Vector:
query_vector = self._embeddings.embed_query(query)
return self._vector_processor.search_by_vector(query_vector, **kwargs)
def search_by_file(self, file_id: str, **kwargs: Any) -> list[Document]:
upload_file: UploadFile | None = db.session.query(UploadFile).where(UploadFile.id == file_id).first()
if not upload_file:
return []
blob = storage.load_once(upload_file.key)
file_base64_str = base64.b64encode(blob).decode()
multimodal_vector = self._embeddings.embed_multimodal_query(
{
"content": file_base64_str,
"content_type": DocType.IMAGE,
"file_id": file_id,
}
)
return self._vector_processor.search_by_vector(multimodal_vector, **kwargs)
def search_by_full_text(self, query: str, **kwargs: Any) -> list[Document]:
return self._vector_processor.search_by_full_text(query, **kwargs)

View File

@ -27,10 +27,8 @@ class VectorType(StrEnum):
UPSTASH = "upstash"
TIDB_ON_QDRANT = "tidb_on_qdrant"
OCEANBASE = "oceanbase"
SEEKDB = "seekdb"
OPENGAUSS = "opengauss"
TABLESTORE = "tablestore"
HUAWEI_CLOUD = "huawei_cloud"
MATRIXONE = "matrixone"
CLICKZETTA = "clickzetta"
IRIS = "iris"

View File

@ -5,9 +5,9 @@ from sqlalchemy import func, select
from core.model_manager import ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.rag.models.document import AttachmentDocument, Document
from core.rag.models.document import Document
from extensions.ext_database import db
from models.dataset import ChildChunk, Dataset, DocumentSegment, SegmentAttachmentBinding
from models.dataset import ChildChunk, Dataset, DocumentSegment
class DatasetDocumentStore:
@ -120,9 +120,6 @@ class DatasetDocumentStore:
db.session.add(segment_document)
db.session.flush()
self.add_multimodel_documents_binding(
segment_id=segment_document.id, multimodel_documents=doc.attachments
)
if save_child:
if doc.children:
for position, child in enumerate(doc.children, start=1):
@ -147,9 +144,6 @@ class DatasetDocumentStore:
segment_document.index_node_hash = doc.metadata.get("doc_hash")
segment_document.word_count = len(doc.page_content)
segment_document.tokens = tokens
self.add_multimodel_documents_binding(
segment_id=segment_document.id, multimodel_documents=doc.attachments
)
if save_child and doc.children:
# delete the existing child chunks
db.session.query(ChildChunk).where(
@ -239,15 +233,3 @@ class DatasetDocumentStore:
document_segment = db.session.scalar(stmt)
return document_segment
def add_multimodel_documents_binding(self, segment_id: str, multimodel_documents: list[AttachmentDocument] | None):
if multimodel_documents:
for multimodel_document in multimodel_documents:
binding = SegmentAttachmentBinding(
tenant_id=self._dataset.tenant_id,
dataset_id=self._dataset.id,
document_id=self._document_id,
segment_id=segment_id,
attachment_id=multimodel_document.metadata["doc_id"],
)
db.session.add(binding)

View File

@ -104,88 +104,6 @@ class CacheEmbedding(Embeddings):
return text_embeddings
def embed_multimodal_documents(self, multimodel_documents: list[dict]) -> list[list[float]]:
"""Embed file documents."""
# use doc embedding cache or store if not exists
multimodel_embeddings: list[Any] = [None for _ in range(len(multimodel_documents))]
embedding_queue_indices = []
for i, multimodel_document in enumerate(multimodel_documents):
file_id = multimodel_document["file_id"]
embedding = (
db.session.query(Embedding)
.filter_by(
model_name=self._model_instance.model, hash=file_id, provider_name=self._model_instance.provider
)
.first()
)
if embedding:
multimodel_embeddings[i] = embedding.get_embedding()
else:
embedding_queue_indices.append(i)
# NOTE: avoid closing the shared scoped session here; downstream code may still have pending work
if embedding_queue_indices:
embedding_queue_multimodel_documents = [multimodel_documents[i] for i in embedding_queue_indices]
embedding_queue_embeddings = []
try:
model_type_instance = cast(TextEmbeddingModel, self._model_instance.model_type_instance)
model_schema = model_type_instance.get_model_schema(
self._model_instance.model, self._model_instance.credentials
)
max_chunks = (
model_schema.model_properties[ModelPropertyKey.MAX_CHUNKS]
if model_schema and ModelPropertyKey.MAX_CHUNKS in model_schema.model_properties
else 1
)
for i in range(0, len(embedding_queue_multimodel_documents), max_chunks):
batch_multimodel_documents = embedding_queue_multimodel_documents[i : i + max_chunks]
embedding_result = self._model_instance.invoke_multimodal_embedding(
multimodel_documents=batch_multimodel_documents,
user=self._user,
input_type=EmbeddingInputType.DOCUMENT,
)
for vector in embedding_result.embeddings:
try:
# FIXME: type ignore for numpy here
normalized_embedding = (vector / np.linalg.norm(vector)).tolist() # type: ignore
                            # Recommended NaN check for a Python list: https://stackoverflow.com/questions/20319813/how-to-check-list-containing-nan
if np.isnan(normalized_embedding).any():
# for issue #11827 float values are not json compliant
logger.warning("Normalized embedding is nan: %s", normalized_embedding)
continue
embedding_queue_embeddings.append(normalized_embedding)
except IntegrityError:
db.session.rollback()
except Exception:
logger.exception("Failed transform embedding")
cache_embeddings = []
try:
for i, n_embedding in zip(embedding_queue_indices, embedding_queue_embeddings):
multimodel_embeddings[i] = n_embedding
file_id = multimodel_documents[i]["file_id"]
if file_id not in cache_embeddings:
embedding_cache = Embedding(
model_name=self._model_instance.model,
hash=file_id,
provider_name=self._model_instance.provider,
embedding=pickle.dumps(n_embedding, protocol=pickle.HIGHEST_PROTOCOL),
)
embedding_cache.set_embedding(n_embedding)
db.session.add(embedding_cache)
cache_embeddings.append(file_id)
db.session.commit()
except IntegrityError:
db.session.rollback()
except Exception as ex:
db.session.rollback()
logger.exception("Failed to embed documents")
raise ex
return multimodel_embeddings
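Each raw vector is L2-normalized and rejected if the result contains NaN (a zero vector normalizes to NaN, which is not JSON-compliant). A runnable sketch of just that step:

```python
# Runnable sketch of the normalize-then-NaN-check step above.
import numpy as np

def normalize_or_skip(vector: list[float]) -> list[float] | None:
    arr = np.asarray(vector, dtype=float)
    normalized = (arr / np.linalg.norm(arr)).tolist()
    if np.isnan(normalized).any():  # zero vectors normalize to NaN
        return None                 # NaN is not JSON-compliant; skip caching
    return normalized

print(normalize_or_skip([3.0, 4.0]))  # [0.6, 0.8]
print(normalize_or_skip([0.0, 0.0]))  # None
```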
def embed_query(self, text: str) -> list[float]:
"""Embed query text."""
# use doc embedding cache or store if not exists
@ -228,46 +146,3 @@ class CacheEmbedding(Embeddings):
raise ex
return embedding_results # type: ignore
def embed_multimodal_query(self, multimodel_document: dict) -> list[float]:
"""Embed multimodal documents."""
# use doc embedding cache or store if not exists
file_id = multimodel_document["file_id"]
embedding_cache_key = f"{self._model_instance.provider}_{self._model_instance.model}_{file_id}"
embedding = redis_client.get(embedding_cache_key)
if embedding:
redis_client.expire(embedding_cache_key, 600)
decoded_embedding = np.frombuffer(base64.b64decode(embedding), dtype="float")
return [float(x) for x in decoded_embedding]
try:
embedding_result = self._model_instance.invoke_multimodal_embedding(
multimodel_documents=[multimodel_document], user=self._user, input_type=EmbeddingInputType.QUERY
)
embedding_results = embedding_result.embeddings[0]
# FIXME: type ignore for numpy here
embedding_results = (embedding_results / np.linalg.norm(embedding_results)).tolist() # type: ignore
if np.isnan(embedding_results).any():
raise ValueError("Normalized embedding is nan please try again")
except Exception as ex:
if dify_config.DEBUG:
logger.exception("Failed to embed multimodal document '%s'", multimodel_document["file_id"])
raise ex
try:
# encode embedding to base64
embedding_vector = np.array(embedding_results)
vector_bytes = embedding_vector.tobytes()
# Transform to Base64
encoded_vector = base64.b64encode(vector_bytes)
# Transform to string
encoded_str = encoded_vector.decode("utf-8")
redis_client.setex(embedding_cache_key, 600, encoded_str)
except Exception as ex:
if dify_config.DEBUG:
logger.exception(
"Failed to add embedding to redis for the multimodal document '%s'", multimodel_document["file_id"]
)
raise ex
return embedding_results # type: ignore
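The query-side cache stores the vector as base64-encoded float64 bytes with a 10-minute TTL. A runnable round trip of the same encoding, with a dict in place of Redis:

```python
# Round-trip sketch of the base64 vector cache above; a dict replaces
# redis_client, the encoding itself is unchanged.
import base64
import numpy as np

cache: dict[str, str] = {}

embedding = [0.6, 0.8]
vector_bytes = np.array(embedding).tobytes()  # float64 bytes
cache["provider_model_fileid"] = base64.b64encode(vector_bytes).decode("utf-8")
# real code: redis_client.setex(embedding_cache_key, 600, encoded_str)

decoded = np.frombuffer(base64.b64decode(cache["provider_model_fileid"]), dtype="float")
print([float(x) for x in decoded])  # [0.6, 0.8]
```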

View File

@ -9,21 +9,11 @@ class Embeddings(ABC):
"""Embed search docs."""
raise NotImplementedError
@abstractmethod
def embed_multimodal_documents(self, multimodel_documents: list[dict]) -> list[list[float]]:
"""Embed file documents."""
raise NotImplementedError
@abstractmethod
def embed_query(self, text: str) -> list[float]:
"""Embed query text."""
raise NotImplementedError
@abstractmethod
def embed_multimodal_query(self, multimodel_document: dict) -> list[float]:
"""Embed multimodal query."""
raise NotImplementedError
async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
"""Asynchronous Embed search docs."""
raise NotImplementedError

View File

@ -19,4 +19,3 @@ class RetrievalSegments(BaseModel):
segment: DocumentSegment
child_chunks: list[RetrievalChildChunk] | None = None
score: float | None = None
files: list[dict[str, str | int]] | None = None

View File

@ -21,4 +21,3 @@ class RetrievalSourceMetadata(BaseModel):
page: int | None = None
doc_metadata: dict[str, Any] | None = None
title: str | None = None
files: list[dict[str, Any]] | None = None

View File

@ -10,7 +10,7 @@ class NotionInfo(BaseModel):
"""
credential_id: str | None = None
notion_workspace_id: str | None = ""
notion_workspace_id: str
notion_obj_id: str
notion_page_type: str
document: Document | None = None

View File

@ -1,7 +1,7 @@
"""Abstract interface for document loader implementations."""
import os
from typing import TypedDict
from typing import cast
import pandas as pd
from openpyxl import load_workbook
@ -10,12 +10,6 @@ from core.rag.extractor.extractor_base import BaseExtractor
from core.rag.models.document import Document
class Candidate(TypedDict):
idx: int
count: int
map: dict[int, str]
class ExcelExtractor(BaseExtractor):
"""Load Excel files.
@ -36,38 +30,32 @@ class ExcelExtractor(BaseExtractor):
file_extension = os.path.splitext(self._file_path)[-1].lower()
if file_extension == ".xlsx":
wb = load_workbook(self._file_path, read_only=True, data_only=True)
try:
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
header_row_idx, column_map, max_col_idx = self._find_header_and_columns(sheet)
if not column_map:
continue
start_row = header_row_idx + 1
for row in sheet.iter_rows(min_row=start_row, max_col=max_col_idx, values_only=False):
if all(cell.value is None for cell in row):
continue
page_content = []
for col_idx, cell in enumerate(row):
value = cell.value
if col_idx in column_map:
col_name = column_map[col_idx]
if hasattr(cell, "hyperlink") and cell.hyperlink:
target = getattr(cell.hyperlink, "target", None)
if target:
value = f"[{value}]({target})"
if value is None:
value = ""
elif not isinstance(value, str):
value = str(value)
value = value.strip().replace('"', '\\"')
page_content.append(f'"{col_name}":"{value}"')
if page_content:
documents.append(
Document(page_content=";".join(page_content), metadata={"source": self._file_path})
)
finally:
wb.close()
wb = load_workbook(self._file_path, data_only=True)
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
data = sheet.values
cols = next(data, None)
if cols is None:
continue
df = pd.DataFrame(data, columns=cols)
df.dropna(how="all", inplace=True)
for index, row in df.iterrows():
page_content = []
for col_index, (k, v) in enumerate(row.items()):
if pd.notna(v):
cell = sheet.cell(
row=cast(int, index) + 2, column=col_index + 1
) # +2 to account for header and 1-based index
if cell.hyperlink:
value = f"[{v}]({cell.hyperlink.target})"
page_content.append(f'"{k}":"{value}"')
else:
page_content.append(f'"{k}":"{v}"')
documents.append(
Document(page_content=";".join(page_content), metadata={"source": self._file_path})
)
elif file_extension == ".xls":
excel_file = pd.ExcelFile(self._file_path, engine="xlrd")
@ -75,9 +63,9 @@ class ExcelExtractor(BaseExtractor):
df = excel_file.parse(sheet_name=excel_sheet_name)
df.dropna(how="all", inplace=True)
for _, series_row in df.iterrows():
for _, row in df.iterrows():
page_content = []
for k, v in series_row.items():
for k, v in row.items():
if pd.notna(v):
page_content.append(f'"{k}":"{v}"')
documents.append(
@ -87,61 +75,3 @@ class ExcelExtractor(BaseExtractor):
raise ValueError(f"Unsupported file extension: {file_extension}")
return documents
def _find_header_and_columns(self, sheet, scan_rows=10) -> tuple[int, dict[int, str], int]:
"""
Scan first N rows to find the most likely header row.
Returns:
header_row_idx: 1-based index of the header row
column_map: Dict mapping 0-based column index to column name
max_col_idx: 1-based index of the last valid column (for iter_rows boundary)
"""
# Store potential candidates: (row_index, non_empty_count, column_map)
candidates: list[Candidate] = []
# Limit scan to avoid performance issues on huge files
# We iterate manually to control the read scope
for current_row_idx, row in enumerate(sheet.iter_rows(min_row=1, max_row=scan_rows, values_only=True), start=1):
# Filter out empty cells and build a temp map for this row
# col_idx is 0-based
row_map = {}
for col_idx, cell_value in enumerate(row):
if cell_value is not None and str(cell_value).strip():
row_map[col_idx] = str(cell_value).strip().replace('"', '\\"')
if not row_map:
continue
non_empty_count = len(row_map)
# Header selection heuristic (implemented):
# - Prefer the first row with at least 2 non-empty columns.
# - Fallback: choose the row with the most non-empty columns
# (tie-breaker: smaller row index).
candidates.append({"idx": current_row_idx, "count": non_empty_count, "map": row_map})
if not candidates:
return 0, {}, 0
# Choose the best candidate header row.
best_candidate: Candidate | None = None
# Strategy: prefer the first row with >= 2 non-empty columns; otherwise fallback.
for cand in candidates:
if cand["count"] >= 2:
best_candidate = cand
break
# Fallback: if no row has >= 2 columns, or all have 1, just take the one with max columns
if not best_candidate:
# Sort by count desc, then index asc
candidates.sort(key=lambda x: (-x["count"], x["idx"]))
best_candidate = candidates[0]
# Determine max_col_idx (1-based for openpyxl)
# It is the index of the last valid column in our map + 1
max_col_idx = max(best_candidate["map"].keys()) + 1
return best_candidate["idx"], best_candidate["map"], max_col_idx

View File

@ -166,7 +166,7 @@ class ExtractProcessor:
elif extract_setting.datasource_type == DatasourceType.NOTION:
assert extract_setting.notion_info is not None, "notion_info is required"
extractor = NotionExtractor(
notion_workspace_id=extract_setting.notion_info.notion_workspace_id or "",
notion_workspace_id=extract_setting.notion_info.notion_workspace_id,
notion_obj_id=extract_setting.notion_info.notion_obj_id,
notion_page_type=extract_setting.notion_info.notion_page_type,
document_model=extract_setting.notion_info.document,

View File

@ -45,6 +45,6 @@ def detect_file_encodings(file_path: str, timeout: int = 5, sample_size: int = 1
except concurrent.futures.TimeoutError:
raise TimeoutError(f"Timeout reached while detecting encoding for {file_path}")
if all(encoding.encoding is None for encoding in encodings):
if all(encoding["encoding"] is None for encoding in encodings):
raise RuntimeError(f"Could not detect encoding for {file_path}")
return [enc for enc in encodings if enc.encoding is not None]
return [FileEncoding(**enc) for enc in encodings if enc["encoding"] is not None]

View File

@ -84,45 +84,22 @@ class WordExtractor(BaseExtractor):
image_count = 0
image_map = {}
for r_id, rel in doc.part.rels.items():
for rel in doc.part.rels.values():
if "image" in rel.target_ref:
image_count += 1
if rel.is_external:
url = rel.target_ref
if not self._is_valid_url(url):
continue
try:
response = ssrf_proxy.get(url)
except Exception as e:
logger.warning("Failed to download image from URL: %s: %s", url, str(e))
continue
response = ssrf_proxy.get(url)
if response.status_code == 200:
image_ext = mimetypes.guess_extension(response.headers.get("Content-Type", ""))
image_ext = mimetypes.guess_extension(response.headers["Content-Type"])
if image_ext is None:
continue
file_uuid = str(uuid.uuid4())
file_key = "image_files/" + self.tenant_id + "/" + file_uuid + image_ext
file_key = "image_files/" + self.tenant_id + "/" + file_uuid + "." + image_ext
mime_type, _ = mimetypes.guess_type(file_key)
storage.save(file_key, response.content)
# save file to db
upload_file = UploadFile(
tenant_id=self.tenant_id,
storage_type=dify_config.STORAGE_TYPE,
key=file_key,
name=file_key,
size=0,
extension=str(image_ext),
mime_type=mime_type or "",
created_by=self.user_id,
created_by_role=CreatorUserRole.ACCOUNT,
created_at=naive_utc_now(),
used=True,
used_by=self.user_id,
used_at=naive_utc_now(),
)
db.session.add(upload_file)
# Use r_id as key for external images since target_part is undefined
image_map[r_id] = f"![image]({dify_config.FILES_URL}/files/{upload_file.id}/file-preview)"
else:
continue
else:
image_ext = rel.target_ref.split(".")[-1]
if image_ext is None:
@ -133,28 +110,27 @@ class WordExtractor(BaseExtractor):
mime_type, _ = mimetypes.guess_type(file_key)
storage.save(file_key, rel.target_part.blob)
# save file to db
upload_file = UploadFile(
tenant_id=self.tenant_id,
storage_type=dify_config.STORAGE_TYPE,
key=file_key,
name=file_key,
size=0,
extension=str(image_ext),
mime_type=mime_type or "",
created_by=self.user_id,
created_by_role=CreatorUserRole.ACCOUNT,
created_at=naive_utc_now(),
used=True,
used_by=self.user_id,
used_at=naive_utc_now(),
)
db.session.add(upload_file)
# Use target_part as key for internal images
image_map[rel.target_part] = (
f"![image]({dify_config.FILES_URL}/files/{upload_file.id}/file-preview)"
)
db.session.commit()
# save file to db
upload_file = UploadFile(
tenant_id=self.tenant_id,
storage_type=dify_config.STORAGE_TYPE,
key=file_key,
name=file_key,
size=0,
extension=str(image_ext),
mime_type=mime_type or "",
created_by=self.user_id,
created_by_role=CreatorUserRole.ACCOUNT,
created_at=naive_utc_now(),
used=True,
used_by=self.user_id,
used_at=naive_utc_now(),
)
db.session.add(upload_file)
db.session.commit()
image_map[rel.target_part] = f"![image]({dify_config.FILES_URL}/files/{upload_file.id}/file-preview)"
return image_map
def _table_to_markdown(self, table, image_map):
@ -210,17 +186,11 @@ class WordExtractor(BaseExtractor):
image_id = blip.get("{http://schemas.openxmlformats.org/officeDocument/2006/relationships}embed")
if not image_id:
continue
rel = paragraph.part.rels.get(image_id)
if rel is None:
continue
# For external images, use image_id as key; for internal, use target_part
if rel.is_external:
if image_id in image_map:
paragraph_content.append(image_map[image_id])
else:
image_part = rel.target_part
if image_part in image_map:
paragraph_content.append(image_map[image_part])
image_part = paragraph.part.rels[image_id].target_part
if image_part in image_map:
image_link = image_map[image_part]
paragraph_content.append(image_link)
else:
paragraph_content.append(run.text)
return "".join(paragraph_content).strip()
@ -257,18 +227,6 @@ class WordExtractor(BaseExtractor):
def parse_paragraph(paragraph):
paragraph_content = []
def append_image_link(image_id, has_drawing):
"""Helper to append image link from image_map based on relationship type."""
rel = doc.part.rels[image_id]
if rel.is_external:
if image_id in image_map and not has_drawing:
paragraph_content.append(image_map[image_id])
else:
image_part = rel.target_part
if image_part in image_map and not has_drawing:
paragraph_content.append(image_map[image_part])
for run in paragraph.runs:
if hasattr(run.element, "tag") and isinstance(run.element.tag, str) and run.element.tag.endswith("r"):
# Process drawing type images
@ -285,18 +243,10 @@ class WordExtractor(BaseExtractor):
"{http://schemas.openxmlformats.org/officeDocument/2006/relationships}embed"
)
if embed_id:
rel = doc.part.rels.get(embed_id)
if rel is not None and rel.is_external:
# External image: use embed_id as key
if embed_id in image_map:
has_drawing = True
paragraph_content.append(image_map[embed_id])
else:
# Internal image: use target_part as key
image_part = doc.part.related_parts.get(embed_id)
if image_part in image_map:
has_drawing = True
paragraph_content.append(image_map[image_part])
image_part = doc.part.related_parts.get(embed_id)
if image_part in image_map:
has_drawing = True
paragraph_content.append(image_map[image_part])
# Process pict type images
shape_elements = run.element.findall(
".//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pict"
@ -311,7 +261,9 @@ class WordExtractor(BaseExtractor):
"{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
)
if image_id and image_id in doc.part.rels:
append_image_link(image_id, has_drawing)
image_part = doc.part.rels[image_id].target_part
if image_part in image_map and not has_drawing:
paragraph_content.append(image_map[image_part])
# Find imagedata element in VML
image_data = shape.find(".//{urn:schemas-microsoft-com:vml}imagedata")
if image_data is not None:
@ -319,7 +271,9 @@ class WordExtractor(BaseExtractor):
"{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
)
if image_id and image_id in doc.part.rels:
append_image_link(image_id, has_drawing)
image_part = doc.part.rels[image_id].target_part
if image_part in image_map and not has_drawing:
paragraph_content.append(image_map[image_part])
if run.text.strip():
paragraph_content.append(run.text.strip())
return "".join(paragraph_content) if paragraph_content else ""

View File

@ -15,4 +15,3 @@ class MetadataDataSource(StrEnum):
notion_import = "notion"
local_file = "file_upload"
online_document = "online_document"
online_drive = "online_drive"

View File

@ -1,6 +0,0 @@
from enum import StrEnum
class DocType(StrEnum):
TEXT = "text"
IMAGE = "image"

View File

@ -1,12 +1,7 @@
from enum import StrEnum
class IndexStructureType(StrEnum):
class IndexType(StrEnum):
PARAGRAPH_INDEX = "text_model"
QA_INDEX = "qa_model"
PARENT_CHILD_INDEX = "hierarchical_model"
class IndexTechniqueType(StrEnum):
ECONOMY = "economy"
HIGH_QUALITY = "high_quality"

View File

@ -1,6 +0,0 @@
from enum import StrEnum
class QueryType(StrEnum):
TEXT_QUERY = "text_query"
IMAGE_QUERY = "image_query"

View File

@ -1,34 +1,20 @@
"""Abstract interface for document loader implementations."""
import cgi
import logging
import mimetypes
import os
import re
from abc import ABC, abstractmethod
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Optional
from urllib.parse import unquote, urlparse
import httpx
from configs import dify_config
from core.helper import ssrf_proxy
from core.rag.extractor.entity.extract_setting import ExtractSetting
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.models.document import AttachmentDocument, Document
from core.rag.models.document import Document
from core.rag.retrieval.retrieval_methods import RetrievalMethod
from core.rag.splitter.fixed_text_splitter import (
EnhanceRecursiveCharacterTextSplitter,
FixedRecursiveCharacterTextSplitter,
)
from core.rag.splitter.text_splitter import TextSplitter
from extensions.ext_database import db
from extensions.ext_storage import storage
from models import Account, ToolFile
from models.dataset import Dataset, DatasetProcessRule
from models.dataset import Document as DatasetDocument
from models.model import UploadFile
if TYPE_CHECKING:
from core.model_manager import ModelInstance
@ -42,18 +28,11 @@ class BaseIndexProcessor(ABC):
raise NotImplementedError
@abstractmethod
def transform(self, documents: list[Document], current_user: Account | None = None, **kwargs) -> list[Document]:
def transform(self, documents: list[Document], **kwargs) -> list[Document]:
raise NotImplementedError
@abstractmethod
def load(
self,
dataset: Dataset,
documents: list[Document],
multimodal_documents: list[AttachmentDocument] | None = None,
with_keywords: bool = True,
**kwargs,
):
def load(self, dataset: Dataset, documents: list[Document], with_keywords: bool = True, **kwargs):
raise NotImplementedError
@abstractmethod
@ -117,178 +96,3 @@ class BaseIndexProcessor(ABC):
)
return character_splitter # type: ignore
def _get_content_files(self, document: Document, current_user: Account | None = None) -> list[AttachmentDocument]:
"""
Get the content files from the document.
"""
multi_model_documents: list[AttachmentDocument] = []
text = document.page_content
images = self._extract_markdown_images(text)
if not images:
return multi_model_documents
upload_file_id_list = []
for image in images:
# Collect all upload_file_ids including duplicates to preserve occurrence count
# For data before v0.10.0
pattern = r"/files/([a-f0-9\-]+)/image-preview(?:\?.*?)?"
match = re.search(pattern, image)
if match:
upload_file_id = match.group(1)
upload_file_id_list.append(upload_file_id)
continue
# For data after v0.10.0
pattern = r"/files/([a-f0-9\-]+)/file-preview(?:\?.*?)?"
match = re.search(pattern, image)
if match:
upload_file_id = match.group(1)
upload_file_id_list.append(upload_file_id)
continue
# For tools directory - direct file formats (e.g., .png, .jpg, etc.)
# Match URL including any query parameters up to common URL boundaries (space, parenthesis, quotes)
pattern = r"/files/tools/([a-f0-9\-]+)\.([a-zA-Z0-9]+)(?:\?[^\s\)\"\']*)?"
match = re.search(pattern, image)
if match:
if current_user:
tool_file_id = match.group(1)
upload_file_id = self._download_tool_file(tool_file_id, current_user)
if upload_file_id:
upload_file_id_list.append(upload_file_id)
continue
if current_user:
upload_file_id = self._download_image(image.split(" ")[0], current_user)
if upload_file_id:
upload_file_id_list.append(upload_file_id)
if not upload_file_id_list:
return multi_model_documents
# Get unique IDs for database query
unique_upload_file_ids = list(set(upload_file_id_list))
upload_files = db.session.query(UploadFile).where(UploadFile.id.in_(unique_upload_file_ids)).all()
# Create a mapping from ID to UploadFile for quick lookup
upload_file_map = {upload_file.id: upload_file for upload_file in upload_files}
# Create a Document for each occurrence (including duplicates)
for upload_file_id in upload_file_id_list:
upload_file = upload_file_map.get(upload_file_id)
if upload_file:
multi_model_documents.append(
AttachmentDocument(
page_content=upload_file.name,
metadata={
"doc_id": upload_file.id,
"doc_hash": "",
"document_id": document.metadata.get("document_id"),
"dataset_id": document.metadata.get("dataset_id"),
"doc_type": DocType.IMAGE,
},
)
)
return multi_model_documents
def _extract_markdown_images(self, text: str) -> list[str]:
"""
Extract the markdown images from the text.
"""
pattern = r"!\[.*?\]\((.*?)\)"
return re.findall(pattern, text)
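A quick runnable check of that regex:

```python
# Runnable check of the markdown image regex above.
import re

text = "see ![diagram](/files/abc-123/file-preview) and ![logo](https://example.com/a.png)"
print(re.findall(r"!\[.*?\]\((.*?)\)", text))
# ['/files/abc-123/file-preview', 'https://example.com/a.png']
```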
def _download_image(self, image_url: str, current_user: Account) -> str | None:
"""
Download the image from the URL.
        Image size must not exceed the configured limit (ATTACHMENT_IMAGE_FILE_SIZE_LIMIT).
"""
from services.file_service import FileService
MAX_IMAGE_SIZE = dify_config.ATTACHMENT_IMAGE_FILE_SIZE_LIMIT * 1024 * 1024
DOWNLOAD_TIMEOUT = dify_config.ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT
try:
# Download with timeout
response = ssrf_proxy.get(image_url, timeout=DOWNLOAD_TIMEOUT)
response.raise_for_status()
# Check Content-Length header if available
content_length = response.headers.get("Content-Length")
if content_length and int(content_length) > MAX_IMAGE_SIZE:
logging.warning("Image from %s exceeds 2MB limit (size: %s bytes)", image_url, content_length)
return None
filename = None
content_disposition = response.headers.get("content-disposition")
if content_disposition:
_, params = cgi.parse_header(content_disposition)
if "filename" in params:
filename = params["filename"]
filename = unquote(filename)
if not filename:
parsed_url = urlparse(image_url)
                # unquote handles non-ASCII (e.g. Chinese) characters in the URL path
path = unquote(parsed_url.path)
filename = os.path.basename(path)
if not filename:
filename = "downloaded_image_file"
name, current_ext = os.path.splitext(filename)
content_type = response.headers.get("content-type", "").split(";")[0].strip()
real_ext = mimetypes.guess_extension(content_type)
            if (not current_ext and real_ext) or (current_ext in [".php", ".jsp", ".asp", ".html"] and real_ext):
filename = f"{name}{real_ext}"
# Download content with size limit
blob = b""
for chunk in response.iter_bytes(chunk_size=8192):
blob += chunk
if len(blob) > MAX_IMAGE_SIZE:
logging.warning("Image from %s exceeds 2MB limit during download", image_url)
return None
if not blob:
logging.warning("Image from %s is empty", image_url)
return None
upload_file = FileService(db.engine).upload_file(
filename=filename,
content=blob,
mimetype=content_type,
user=current_user,
)
return upload_file.id
except httpx.TimeoutException:
logging.warning("Timeout downloading image from %s after %s seconds", image_url, DOWNLOAD_TIMEOUT)
return None
except httpx.RequestError as e:
logging.warning("Error downloading image from %s: %s", image_url, str(e))
return None
except Exception:
logging.exception("Unexpected error downloading image from %s", image_url)
return None
def _download_tool_file(self, tool_file_id: str, current_user: Account) -> str | None:
"""
Download the tool file from the ID.
"""
from services.file_service import FileService
tool_file = db.session.query(ToolFile).where(ToolFile.id == tool_file_id).first()
if not tool_file:
return None
blob = storage.load_once(tool_file.file_key)
upload_file = FileService(db.engine).upload_file(
filename=tool_file.name,
content=blob,
mimetype=tool_file.mimetype,
user=current_user,
)
return upload_file.id

View File

@ -1,6 +1,6 @@
"""Abstract interface for document loader implementations."""
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.index_processor.index_processor_base import BaseIndexProcessor
from core.rag.index_processor.processor.paragraph_index_processor import ParagraphIndexProcessor
from core.rag.index_processor.processor.parent_child_index_processor import ParentChildIndexProcessor
@ -19,11 +19,11 @@ class IndexProcessorFactory:
if not self._index_type:
raise ValueError("Index type must be specified.")
if self._index_type == IndexStructureType.PARAGRAPH_INDEX:
if self._index_type == IndexType.PARAGRAPH_INDEX:
return ParagraphIndexProcessor()
elif self._index_type == IndexStructureType.QA_INDEX:
elif self._index_type == IndexType.QA_INDEX:
return QAIndexProcessor()
elif self._index_type == IndexStructureType.PARENT_CHILD_INDEX:
elif self._index_type == IndexType.PARENT_CHILD_INDEX:
return ParentChildIndexProcessor()
else:
raise ValueError(f"Index type {self._index_type} is not supported.")

View File

@ -11,17 +11,14 @@ from core.rag.datasource.vdb.vector_factory import Vector
from core.rag.docstore.dataset_docstore import DatasetDocumentStore
from core.rag.extractor.entity.extract_setting import ExtractSetting
from core.rag.extractor.extract_processor import ExtractProcessor
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.index_processor.index_processor_base import BaseIndexProcessor
from core.rag.models.document import AttachmentDocument, Document, MultimodalGeneralStructureChunk
from core.rag.models.document import Document
from core.rag.retrieval.retrieval_methods import RetrievalMethod
from core.tools.utils.text_processing_utils import remove_leading_symbols
from libs import helper
from models.account import Account
from models.dataset import Dataset, DatasetProcessRule
from models.dataset import Document as DatasetDocument
from services.account_service import AccountService
from services.entities.knowledge_entities.knowledge_entities import Rule
@ -36,7 +33,7 @@ class ParagraphIndexProcessor(BaseIndexProcessor):
return text_docs
def transform(self, documents: list[Document], current_user: Account | None = None, **kwargs) -> list[Document]:
def transform(self, documents: list[Document], **kwargs) -> list[Document]:
process_rule = kwargs.get("process_rule")
if not process_rule:
raise ValueError("No process rule found.")
@ -72,11 +69,6 @@ class ParagraphIndexProcessor(BaseIndexProcessor):
if document_node.metadata is not None:
document_node.metadata["doc_id"] = doc_id
document_node.metadata["doc_hash"] = hash
multimodal_documents = (
self._get_content_files(document_node, current_user) if document_node.metadata else None
)
if multimodal_documents:
document_node.attachments = multimodal_documents
# delete Splitter character
page_content = remove_leading_symbols(document_node.page_content).strip()
if len(page_content) > 0:
@ -85,19 +77,10 @@ class ParagraphIndexProcessor(BaseIndexProcessor):
all_documents.extend(split_documents)
return all_documents
def load(
self,
dataset: Dataset,
documents: list[Document],
multimodal_documents: list[AttachmentDocument] | None = None,
with_keywords: bool = True,
**kwargs,
):
def load(self, dataset: Dataset, documents: list[Document], with_keywords: bool = True, **kwargs):
if dataset.indexing_technique == "high_quality":
vector = Vector(dataset)
vector.create(documents)
if multimodal_documents and dataset.is_multimodal:
vector.create_multimodal(multimodal_documents)
with_keywords = False
if with_keywords:
keywords_list = kwargs.get("keywords_list")
@ -151,9 +134,8 @@ class ParagraphIndexProcessor(BaseIndexProcessor):
return docs
def index(self, dataset: Dataset, document: DatasetDocument, chunks: Any):
documents: list[Any] = []
all_multimodal_documents: list[Any] = []
if isinstance(chunks, list):
documents = []
for content in chunks:
metadata = {
"dataset_id": dataset.id,
@ -162,68 +144,26 @@ class ParagraphIndexProcessor(BaseIndexProcessor):
"doc_hash": helper.generate_text_hash(content),
}
doc = Document(page_content=content, metadata=metadata)
attachments = self._get_content_files(doc)
if attachments:
doc.attachments = attachments
all_multimodal_documents.extend(attachments)
documents.append(doc)
if documents:
# save node to document segment
doc_store = DatasetDocumentStore(dataset=dataset, user_id=document.created_by, document_id=document.id)
# add document segments
doc_store.add_documents(docs=documents, save_child=False)
if dataset.indexing_technique == "high_quality":
vector = Vector(dataset)
vector.create(documents)
elif dataset.indexing_technique == "economy":
keyword = Keyword(dataset)
keyword.add_texts(documents)
else:
multimodal_general_structure = MultimodalGeneralStructureChunk.model_validate(chunks)
for general_chunk in multimodal_general_structure.general_chunks:
metadata = {
"dataset_id": dataset.id,
"document_id": document.id,
"doc_id": str(uuid.uuid4()),
"doc_hash": helper.generate_text_hash(general_chunk.content),
}
doc = Document(page_content=general_chunk.content, metadata=metadata)
if general_chunk.files:
attachments = []
for file in general_chunk.files:
file_metadata = {
"doc_id": file.id,
"doc_hash": "",
"document_id": document.id,
"dataset_id": dataset.id,
"doc_type": DocType.IMAGE,
}
file_document = AttachmentDocument(
page_content=file.filename or "image_file", metadata=file_metadata
)
attachments.append(file_document)
all_multimodal_documents.append(file_document)
doc.attachments = attachments
else:
account = AccountService.load_user(document.created_by)
if not account:
raise ValueError("Invalid account")
doc.attachments = self._get_content_files(doc, current_user=account)
if doc.attachments:
all_multimodal_documents.extend(doc.attachments)
documents.append(doc)
if documents:
# save node to document segment
doc_store = DatasetDocumentStore(dataset=dataset, user_id=document.created_by, document_id=document.id)
# add document segments
doc_store.add_documents(docs=documents, save_child=False)
if dataset.indexing_technique == "high_quality":
vector = Vector(dataset)
vector.create(documents)
if all_multimodal_documents and dataset.is_multimodal:
vector.create_multimodal(all_multimodal_documents)
elif dataset.indexing_technique == "economy":
keyword = Keyword(dataset)
keyword.add_texts(documents)
raise ValueError("Chunks is not a list")
def format_preview(self, chunks: Any) -> Mapping[str, Any]:
if isinstance(chunks, list):
preview = []
for content in chunks:
preview.append({"content": content})
return {
"chunk_structure": IndexStructureType.PARAGRAPH_INDEX,
"preview": preview,
"total_segments": len(chunks),
}
return {"chunk_structure": IndexType.PARAGRAPH_INDEX, "preview": preview, "total_segments": len(chunks)}
else:
raise ValueError("Chunks is not a list")

View File

@ -13,17 +13,14 @@ from core.rag.datasource.vdb.vector_factory import Vector
from core.rag.docstore.dataset_docstore import DatasetDocumentStore
from core.rag.extractor.entity.extract_setting import ExtractSetting
from core.rag.extractor.extract_processor import ExtractProcessor
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.index_processor.index_processor_base import BaseIndexProcessor
from core.rag.models.document import AttachmentDocument, ChildDocument, Document, ParentChildStructureChunk
from core.rag.models.document import ChildDocument, Document, ParentChildStructureChunk
from core.rag.retrieval.retrieval_methods import RetrievalMethod
from extensions.ext_database import db
from libs import helper
from models import Account
from models.dataset import ChildChunk, Dataset, DatasetProcessRule, DocumentSegment
from models.dataset import Document as DatasetDocument
from services.account_service import AccountService
from services.entities.knowledge_entities.knowledge_entities import ParentMode, Rule
@ -38,7 +35,7 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
return text_docs
def transform(self, documents: list[Document], current_user: Account | None = None, **kwargs) -> list[Document]:
def transform(self, documents: list[Document], **kwargs) -> list[Document]:
process_rule = kwargs.get("process_rule")
if not process_rule:
raise ValueError("No process rule found.")
@ -80,9 +77,6 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
page_content = page_content
if len(page_content) > 0:
document_node.page_content = page_content
multimodel_documents = self._get_content_files(document_node, current_user)
if multimodel_documents:
document_node.attachments = multimodel_documents
# parse document to child nodes
child_nodes = self._split_child_nodes(
document_node, rules, process_rule.get("mode"), kwargs.get("embedding_model_instance")
@ -93,9 +87,6 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
elif rules.parent_mode == ParentMode.FULL_DOC:
page_content = "\n".join([document.page_content for document in documents])
document = Document(page_content=page_content, metadata=documents[0].metadata)
multimodel_documents = self._get_content_files(document)
if multimodel_documents:
document.attachments = multimodel_documents
# parse document to child nodes
child_nodes = self._split_child_nodes(
document, rules, process_rule.get("mode"), kwargs.get("embedding_model_instance")
@ -113,14 +104,7 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
return all_documents
def load(
self,
dataset: Dataset,
documents: list[Document],
multimodal_documents: list[AttachmentDocument] | None = None,
with_keywords: bool = True,
**kwargs,
):
def load(self, dataset: Dataset, documents: list[Document], with_keywords: bool = True, **kwargs):
if dataset.indexing_technique == "high_quality":
vector = Vector(dataset)
for document in documents:
@ -130,8 +114,6 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
Document.model_validate(child_document.model_dump()) for child_document in child_documents
]
vector.create(formatted_child_documents)
if multimodal_documents and dataset.is_multimodal:
vector.create_multimodal(multimodal_documents)
def clean(self, dataset: Dataset, node_ids: list[str] | None, with_keywords: bool = True, **kwargs):
# node_ids is segment's node_ids
@ -262,24 +244,6 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
}
child_documents.append(ChildDocument(page_content=child, metadata=child_metadata))
doc = Document(page_content=parent_child.parent_content, metadata=metadata, children=child_documents)
if parent_child.files and len(parent_child.files) > 0:
attachments = []
for file in parent_child.files:
file_metadata = {
"doc_id": file.id,
"doc_hash": "",
"document_id": document.id,
"dataset_id": dataset.id,
"doc_type": DocType.IMAGE,
}
file_document = AttachmentDocument(page_content=file.filename or "", metadata=file_metadata)
attachments.append(file_document)
doc.attachments = attachments
else:
account = AccountService.load_user(document.created_by)
if not account:
raise ValueError("Invalid account")
doc.attachments = self._get_content_files(doc, current_user=account)
documents.append(doc)
if documents:
# update document parent mode
@ -303,17 +267,12 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
doc_store.add_documents(docs=documents, save_child=True)
if dataset.indexing_technique == "high_quality":
all_child_documents = []
all_multimodal_documents = []
for doc in documents:
if doc.children:
all_child_documents.extend(doc.children)
if doc.attachments:
all_multimodal_documents.extend(doc.attachments)
vector = Vector(dataset)
if all_child_documents:
vector = Vector(dataset)
vector.create(all_child_documents)
if all_multimodal_documents and dataset.is_multimodal:
vector.create_multimodal(all_multimodal_documents)
def format_preview(self, chunks: Any) -> Mapping[str, Any]:
parent_childs = ParentChildStructureChunk.model_validate(chunks)
@ -321,7 +280,7 @@ class ParentChildIndexProcessor(BaseIndexProcessor):
for parent_child in parent_childs.parent_child_chunks:
preview.append({"content": parent_child.parent_content, "child_chunks": parent_child.child_contents})
return {
"chunk_structure": IndexStructureType.PARENT_CHILD_INDEX,
"chunk_structure": IndexType.PARENT_CHILD_INDEX,
"parent_mode": parent_childs.parent_mode,
"preview": preview,
"total_segments": len(parent_childs.parent_child_chunks),

View File

@ -18,13 +18,12 @@ from core.rag.datasource.vdb.vector_factory import Vector
from core.rag.docstore.dataset_docstore import DatasetDocumentStore
from core.rag.extractor.entity.extract_setting import ExtractSetting
from core.rag.extractor.extract_processor import ExtractProcessor
from core.rag.index_processor.constant.index_type import IndexStructureType
from core.rag.index_processor.constant.index_type import IndexType
from core.rag.index_processor.index_processor_base import BaseIndexProcessor
from core.rag.models.document import AttachmentDocument, Document, QAStructureChunk
from core.rag.models.document import Document, QAStructureChunk
from core.rag.retrieval.retrieval_methods import RetrievalMethod
from core.tools.utils.text_processing_utils import remove_leading_symbols
from libs import helper
from models.account import Account
from models.dataset import Dataset
from models.dataset import Document as DatasetDocument
from services.entities.knowledge_entities.knowledge_entities import Rule
@ -42,7 +41,7 @@ class QAIndexProcessor(BaseIndexProcessor):
)
return text_docs
def transform(self, documents: list[Document], current_user: Account | None = None, **kwargs) -> list[Document]:
def transform(self, documents: list[Document], **kwargs) -> list[Document]:
preview = kwargs.get("preview")
process_rule = kwargs.get("process_rule")
if not process_rule:
@ -117,7 +116,7 @@ class QAIndexProcessor(BaseIndexProcessor):
try:
# Skip the first row
df = pd.read_csv(file) # type: ignore
df = pd.read_csv(file)
text_docs = []
for _, row in df.iterrows():
data = Document(page_content=row.iloc[0], metadata={"answer": row.iloc[1]})
@ -129,19 +128,10 @@ class QAIndexProcessor(BaseIndexProcessor):
raise ValueError(str(e))
return text_docs
def load(
self,
dataset: Dataset,
documents: list[Document],
multimodal_documents: list[AttachmentDocument] | None = None,
with_keywords: bool = True,
**kwargs,
):
def load(self, dataset: Dataset, documents: list[Document], with_keywords: bool = True, **kwargs):
if dataset.indexing_technique == "high_quality":
vector = Vector(dataset)
vector.create(documents)
if multimodal_documents and dataset.is_multimodal:
vector.create_multimodal(multimodal_documents)
def clean(self, dataset: Dataset, node_ids: list[str] | None, with_keywords: bool = True, **kwargs):
vector = Vector(dataset)
@ -207,7 +197,7 @@ class QAIndexProcessor(BaseIndexProcessor):
for qa_chunk in qa_chunks.qa_chunks:
preview.append({"question": qa_chunk.question, "answer": qa_chunk.answer})
return {
"chunk_structure": IndexStructureType.QA_INDEX,
"chunk_structure": IndexType.QA_INDEX,
"qa_preview": preview,
"total_segments": len(qa_chunks.qa_chunks),
}

View File

@ -4,8 +4,6 @@ from typing import Any
from pydantic import BaseModel, Field
from core.file import File
class ChildDocument(BaseModel):
"""Class for storing a piece of text and associated metadata."""
@ -17,19 +15,7 @@ class ChildDocument(BaseModel):
"""Arbitrary metadata about the page content (e.g., source, relationships to other
documents, etc.).
"""
metadata: dict[str, Any] = Field(default_factory=dict)
class AttachmentDocument(BaseModel):
"""Class for storing a piece of text and associated metadata."""
page_content: str
provider: str | None = "dify"
vector: list[float] | None = None
metadata: dict[str, Any] = Field(default_factory=dict)
metadata: dict = Field(default_factory=dict)
class Document(BaseModel):
@ -42,31 +28,12 @@ class Document(BaseModel):
"""Arbitrary metadata about the page content (e.g., source, relationships to other
documents, etc.).
"""
metadata: dict[str, Any] = Field(default_factory=dict)
metadata: dict = Field(default_factory=dict)
provider: str | None = "dify"
children: list[ChildDocument] | None = None
attachments: list[AttachmentDocument] | None = None
class GeneralChunk(BaseModel):
"""
General Chunk.
"""
content: str
files: list[File] | None = None
class MultimodalGeneralStructureChunk(BaseModel):
"""
Multimodal General Structure Chunk.
"""
general_chunks: list[GeneralChunk]
class GeneralStructureChunk(BaseModel):
"""
@ -83,7 +50,6 @@ class ParentChildChunk(BaseModel):
parent_content: str
child_contents: list[str]
files: list[File] | None = None
class ParentChildStructureChunk(BaseModel):

View File

@ -1,6 +1,5 @@
from abc import ABC, abstractmethod
from core.rag.index_processor.constant.query_type import QueryType
from core.rag.models.document import Document
@ -13,7 +12,6 @@ class BaseRerankRunner(ABC):
score_threshold: float | None = None,
top_n: int | None = None,
user: str | None = None,
query_type: QueryType = QueryType.TEXT_QUERY,
) -> list[Document]:
"""
Run rerank model

View File

@ -1,15 +1,6 @@
import base64
from core.model_manager import ModelInstance, ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.model_runtime.entities.rerank_entities import RerankResult
from core.rag.index_processor.constant.doc_type import DocType
from core.rag.index_processor.constant.query_type import QueryType
from core.model_manager import ModelInstance
from core.rag.models.document import Document
from core.rag.rerank.rerank_base import BaseRerankRunner
from extensions.ext_database import db
from extensions.ext_storage import storage
from models.model import UploadFile
class RerankModelRunner(BaseRerankRunner):
@ -23,7 +14,6 @@ class RerankModelRunner(BaseRerankRunner):
score_threshold: float | None = None,
top_n: int | None = None,
user: str | None = None,
query_type: QueryType = QueryType.TEXT_QUERY,
) -> list[Document]:
"""
Run rerank model
@@ -34,31 +24,38 @@ class RerankModelRunner(BaseRerankRunner):
         :param user: unique user id if needed
         :return:
         """
-        model_manager = ModelManager()
-        is_support_vision = model_manager.check_model_support_vision(
-            tenant_id=self.rerank_model_instance.provider_model_bundle.configuration.tenant_id,
-            provider=self.rerank_model_instance.provider,
-            model=self.rerank_model_instance.model,
-            model_type=ModelType.RERANK,
-        )
+        docs = []
+        doc_ids = set()
+        unique_documents = []
+        for document in documents:
+            if (
+                document.provider == "dify"
+                and document.metadata is not None
+                and document.metadata["doc_id"] not in doc_ids
+            ):
+                doc_ids.add(document.metadata["doc_id"])
+                docs.append(document.page_content)
+                unique_documents.append(document)
+            elif document.provider == "external":
+                if document not in unique_documents:
+                    docs.append(document.page_content)
+                    unique_documents.append(document)
+
+        documents = unique_documents
+
+        rerank_result = self.rerank_model_instance.invoke_rerank(
+            query=query, docs=docs, score_threshold=score_threshold, top_n=top_n, user=user
+        )
-        if not is_support_vision:
-            if query_type == QueryType.TEXT_QUERY:
-                rerank_result, unique_documents = self.fetch_text_rerank(query, documents, score_threshold, top_n, user)
-            else:
-                return documents
-        else:
-            rerank_result, unique_documents = self.fetch_multimodal_rerank(
-                query, documents, score_threshold, top_n, user, query_type
-            )
         rerank_documents = []
         for result in rerank_result.docs:
             if score_threshold is None or result.score >= score_threshold:
                 # format document
                 rerank_document = Document(
                     page_content=result.text,
-                    metadata=unique_documents[result.index].metadata,
-                    provider=unique_documents[result.index].provider,
+                    metadata=documents[result.index].metadata,
+                    provider=documents[result.index].provider,
                 )
                 if rerank_document.metadata is not None:
                     rerank_document.metadata["score"] = result.score
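
Editor's note: together with the sort-and-slice tail kept in the next hunk, the restored `run()` is a four-step pipeline: dedupe, invoke, threshold-filter, sort/slice. A usage sketch under stated assumptions (tenant, provider, and model names are placeholders, and the `core.rag.rerank.rerank_model` module path is inferred, since this view does not show file headers):

```python
from core.model_manager import ModelManager
from core.model_runtime.entities.model_entities import ModelType
from core.rag.models.document import Document
from core.rag.rerank.rerank_model import RerankModelRunner

# Placeholder identifiers; substitute a real tenant and a configured rerank model.
model_instance = ModelManager().get_model_instance(
    tenant_id="tenant-id",
    provider="cohere",
    model_type=ModelType.RERANK,
    model="rerank-english-v3.0",
)
runner = RerankModelRunner(rerank_model_instance=model_instance)

documents = [
    Document(page_content="first candidate", metadata={"doc_id": "a"}),
    Document(page_content="second candidate", metadata={"doc_id": "b"}),
    Document(page_content="first candidate", metadata={"doc_id": "a"}),  # duplicate, dropped by doc_id
]
reranked = runner.run(query="which candidate?", documents=documents, score_threshold=0.2, top_n=1)
```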
@@ -66,126 +63,3 @@ class RerankModelRunner(BaseRerankRunner):
         rerank_documents.sort(key=lambda x: x.metadata.get("score", 0.0), reverse=True)
         return rerank_documents[:top_n] if top_n else rerank_documents
-
-    def fetch_text_rerank(
-        self,
-        query: str,
-        documents: list[Document],
-        score_threshold: float | None = None,
-        top_n: int | None = None,
-        user: str | None = None,
-    ) -> tuple[RerankResult, list[Document]]:
-        """
-        Fetch text rerank
-        :param query: search query
-        :param documents: documents for reranking
-        :param score_threshold: score threshold
-        :param top_n: top n
-        :param user: unique user id if needed
-        :return:
-        """
-        docs = []
-        doc_ids = set()
-        unique_documents = []
-        for document in documents:
-            if (
-                document.provider == "dify"
-                and document.metadata is not None
-                and document.metadata["doc_id"] not in doc_ids
-            ):
-                if not document.metadata.get("doc_type") or document.metadata.get("doc_type") == DocType.TEXT:
-                    doc_ids.add(document.metadata["doc_id"])
-                    docs.append(document.page_content)
-                    unique_documents.append(document)
-            elif document.provider == "external":
-                if document not in unique_documents:
-                    docs.append(document.page_content)
-                    unique_documents.append(document)
-        rerank_result = self.rerank_model_instance.invoke_rerank(
-            query=query, docs=docs, score_threshold=score_threshold, top_n=top_n, user=user
-        )
-        return rerank_result, unique_documents
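
Editor's note: the `fetch_text_rerank` removed above is nearly identical to the loop that now lives inline in `run()`; the one behavioral difference is that it also filtered on `doc_type`, so non-text documents never reached the model. A self-contained sketch of that guard (the `"text"` enum value is an assumption; the real constant lived in the now-removed `core.rag.index_processor.constant.doc_type`):

```python
from enum import StrEnum


class DocType(StrEnum):
    # Stand-in for the removed DocType constant; only the member used here is reproduced.
    TEXT = "text"


def is_rerankable_as_text(metadata: dict) -> bool:
    """Mirrors the extra check fetch_text_rerank applied inside its dedup loop:
    documents with no doc_type, or with doc_type TEXT, were sent to the rerank
    model; anything else was silently dropped from the candidate list."""
    doc_type = metadata.get("doc_type")
    return not doc_type or doc_type == DocType.TEXT
```

The inline loop in the new `run()` has no such guard, which is consistent with the rest of this diff: once it lands, every deduplicated document is assumed to be text.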
-    def fetch_multimodal_rerank(
-        self,
-        query: str,
-        documents: list[Document],
-        score_threshold: float | None = None,
-        top_n: int | None = None,
-        user: str | None = None,
-        query_type: QueryType = QueryType.TEXT_QUERY,
-    ) -> tuple[RerankResult, list[Document]]:
-        """
-        Fetch multimodal rerank
-        :param query: search query
-        :param documents: documents for reranking
-        :param score_threshold: score threshold
-        :param top_n: top n
-        :param user: unique user id if needed
-        :param query_type: query type
-        :return: rerank result
-        """
-        docs = []
-        doc_ids = set()
-        unique_documents = []
-        for document in documents:
-            if (
-                document.provider == "dify"
-                and document.metadata is not None
-                and document.metadata["doc_id"] not in doc_ids
-            ):
-                if document.metadata.get("doc_type") == DocType.IMAGE:
-                    # Query file info within db.session context to ensure thread-safe access
-                    upload_file = (
-                        db.session.query(UploadFile).where(UploadFile.id == document.metadata["doc_id"]).first()
-                    )
-                    if upload_file:
-                        blob = storage.load_once(upload_file.key)
-                        document_file_base64 = base64.b64encode(blob).decode()
-                        document_file_dict = {
-                            "content": document_file_base64,
-                            "content_type": document.metadata["doc_type"],
-                        }
-                        docs.append(document_file_dict)
-                else:
-                    document_text_dict = {
-                        "content": document.page_content,
-                        "content_type": document.metadata.get("doc_type") or DocType.TEXT,
-                    }
-                    docs.append(document_text_dict)
-                doc_ids.add(document.metadata["doc_id"])
-                unique_documents.append(document)
-            elif document.provider == "external":
-                if document not in unique_documents:
-                    docs.append(
-                        {
-                            "content": document.page_content,
-                            "content_type": document.metadata.get("doc_type") or DocType.TEXT,
-                        }
-                    )
-                    unique_documents.append(document)
-        documents = unique_documents
-        if query_type == QueryType.TEXT_QUERY:
-            rerank_result, unique_documents = self.fetch_text_rerank(query, documents, score_threshold, top_n, user)
-            return rerank_result, unique_documents
-        elif query_type == QueryType.IMAGE_QUERY:
-            # Query file info within db.session context to ensure thread-safe access
-            upload_file = db.session.query(UploadFile).where(UploadFile.id == query).first()
-            if upload_file:
-                blob = storage.load_once(upload_file.key)
-                file_query = base64.b64encode(blob).decode()
-                file_query_dict = {
-                    "content": file_query,
-                    "content_type": DocType.IMAGE,
-                }
-                rerank_result = self.rerank_model_instance.invoke_multimodal_rerank(
-                    query=file_query_dict, docs=docs, score_threshold=score_threshold, top_n=top_n, user=user
-                )
-                return rerank_result, unique_documents
-            else:
-                raise ValueError(f"Upload file not found for query: {query}")
-        else:
-            raise ValueError(f"Query type {query_type} is not supported")

Some files were not shown because too many files have changed in this diff.