ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-23 09:28:06 +08:00

Author	SHA1	Message	Date
Jin Hai	1d0519d025	Fix secret key inconsistency cross the RAGFlow servers (#14591 ) ### What problem does this PR solve? A and B, two API servers and a REDIS server. If A and REDIS restart, B will hold the obsolete secret key and will lead to error. TODO: app.config['SECRET_KEY'] and app.secret_key still hold obsolete secret key. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-07 10:10:02 +08:00
euvre	2846a93998	Fix: Remove hardcoded page limits causing parsing failures on large PDFs (>300 pages) (#14382 ) ### What problem does this PR solve? Fixes #14196 ## Problem When using DeepDOC to parse large PDFs (over 1000 pages), the parser silently truncated processing at 300 pages due to a hardcoded default `page_to=299` in `RAGFlowPdfParser.__images__()`. This caused: - Errors on pages beyond the limit - Poor image quality as the parser attempted to compensate with missing page data - Inconsistent chunk splitting between full PDF imports and partial imports Additionally, the codebase scattered magic numbers (`299`, `600`, `10000`, `100000`, `100000000`, `10000000000`, `10*9`) across 22 files as sentinel values for "parse all pages", making future maintenance error-prone. ## Root Cause ```python # deepdoc/parser/pdf_parser.py (before) def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None): # Only the first 300 pages were rendered; everything beyond was silently dropped ``` While most callers in `rag/app/.py` correctly passed `to_page=100000`, the base class `RAGFlowPdfParser.__call__()` and `parse_into_bboxes()` invoked `__images__` without forwarding `page_from`/`page_to`, falling back to the restrictive default of 299. ## Solution ### 1. Define constants in `common/constants.py` ```python MAXIMUM_PAGE_NUMBER = 100000 # Used by the parsing layer MAXIMUM_TASK_PAGE_NUMBER = MAXIMUM_PAGE_NUMBER * 1000 # Used by the task/DB layer ``` ### 2. Replace all hardcoded sentinel values \| Layer \| Files Changed \| Old Values \| New Value \| \|---\|---\|---\|---\| \| Deepdoc parsers \| `pdf_parser.py`, `mineru_parser.py`, `docling_parser.py`, `opendataloader_parser.py`, `paddleocr_parser.py`, `docx_parser.py` \| `299`, `600`, `109`, `100000000` \| `MAXIMUM_PAGE_NUMBER` \| \| Chunk parsers \| `naive.py`, `book.py`, `qa.py`, `one.py`, `manual.py`, `paper.py`, `presentation.py`, `laws.py`, `resume.py`, `email.py`, `table.py` \| `100000`, `10000`, `10000000000` \| `MAXIMUM_PAGE_NUMBER` \| \| Task/DB layer** \| `db_models.py`, `task_service.py`, `document_service.py`, `file_service.py` \| `100000000` \| `MAXIMUM_TASK_PAGE_NUMBER` \| ### 3. Fix `parse_into_bboxes()` missing parameters Added `from_page`/`to_page` parameters to `parse_into_bboxes()` so that the `rag/flow/parser/parser.py` DeepDOC path no longer falls back to the restrictive default. ## Files Changed (22) - `common/constants.py` - `deepdoc/parser/pdf_parser.py` - `deepdoc/parser/mineru_parser.py` - `deepdoc/parser/docling_parser.py` - `deepdoc/parser/opendataloader_parser.py` - `deepdoc/parser/paddleocr_parser.py` - `deepdoc/parser/docx_parser.py` - `rag/app/naive.py` - `rag/app/book.py` - `rag/app/qa.py` - `rag/app/one.py` - `rag/app/manual.py` - `rag/app/paper.py` - `rag/app/presentation.py` - `rag/app/laws.py` - `rag/app/resume.py` - `rag/app/email.py` - `rag/app/table.py` - `api/db/db_models.py` - `api/db/services/task_service.py` - `api/db/services/document_service.py` - `api/db/services/file_service.py` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 14:57:20 +08:00
euvre	9a785b26bd	fix: change file size column from IntegerField to BigIntegerField to support files > 2GB (#14148 ) ### What problem does this PR solve? Fixes #6034 Changes the `size` field in both `Document` and `File` models from `IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max ~9.2EB), and adds corresponding database migrations. ## Problem When uploading a file larger than 2GB, the `size` value overflows a 32-bit signed integer (max 2,147,483,647). This causes: - The stored `size` wraps around to an incorrect value (e.g., a 3GB file shows as 2,097,152 KB in File Management). - Subsequent file operations (e.g., download) fail because the corrupted size leads to invalid storage lookups. ## Changes - `Document.size`: `IntegerField` → `BigIntegerField` - `File.size`: `IntegerField` → `BigIntegerField` - Added `alter_db_column_type` migrations in `migrate_db()` for both `document.size` and `file.size` columns to ensure existing deployments are upgraded automatically. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-16 15:43:29 +08:00
bitloi	853021ff2a	feat: support multiple canvas_types for agent templates and remove duplicate files (#14030 ) ### What problem does this PR solve? Closes #13907 The template catalog had duplicate files (e.g. `*_r.json`) only to place the same template into multiple sidebar groups. This increases maintenance cost and makes template updates error-prone. This PR adds first-class support for multiple template categories in a single file via `canvas_types`, then removes duplicate template files. What changed: - Added `canvas_types` to `CanvasTemplate` model and DB migration. - Added normalization logic when loading templates: - accepts legacy `canvas_type` - accepts new `canvas_types` - merges/deduplicates values - preserves backward compatibility by keeping `canvas_type` as first normalized value. - Updated template import flow to load only `.json` files and in stable sorted order. - Updated frontend template filtering to match on `canvas_types` first, with fallback to legacy `canvas_type`. - Consolidated duplicated template pairs into single files and removed: - `deep_search_r.json` - `reflective_academic_paper_generator_r.json` - `seo_article_writer_r.json` - Added regression/edge-case tests for category normalization and route serialization expectations. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-04-13 20:26:30 +08:00
Yongteng Lei	1b29522279	Fix: migrate_add_unique_email silently skips unique constraint (#13744 ) ### What problem does this PR solve? Fix migrate_add_unique_email-silently-skips-unique-constraint-when-non-unique-user_email-index-exists. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-24 20:24:24 +08:00
balibabu	6cae364ac2	Feat: Export Agent Logs. (#13658 ) ### What problem does this PR solve? Feat: Export Agent Logs. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-17 18:51:26 +08:00
Ethan T.	71804bf5bc	fix(db_models): guard MySQL-specific SQL in migration with DB_TYPE check (fixes #13544 ) (#13582 ) ## Summary Fixes #13544: PostgreSQL startup crash because `update_tenant_llm_to_id_primary_key()` unconditionally uses MySQL-specific SQL. - Split `update_tenant_llm_to_id_primary_key()` into `_update_tenant_llm_to_id_primary_key_mysql()` and `_update_tenant_llm_to_id_primary_key_postgres()`, dispatching on `settings.DATABASE_TYPE` - MySQL path: unchanged (existing `DATABASE()`, `SET @row = 0`, `AUTO_INCREMENT`, `DROP PRIMARY KEY` logic) - PostgreSQL path: uses `current_database()`, `ROW_NUMBER() OVER (ORDER BY ...)` for sequential IDs, `CREATE SEQUENCE` + `nextval()` for auto-increment, and `information_schema.table_constraints` to find the PK constraint name - Also fix `migrate_add_unique_email()`: MySQL-only `information_schema.statistics` is replaced with `pg_indexes` on PostgreSQL ## Test plan - [ ] Start RAGFlow with `DB_TYPE=postgres` — startup should complete without `function database() does not exist` error - [ ] Start RAGFlow with `DB_TYPE=mysql` (default) — existing behaviour unchanged, migration runs as before - [ ] Fresh PostgreSQL install: verify `tenant_llm.id` column is created as a serial primary key after migration - [ ] Idempotency: running migration twice on PostgreSQL should be a no-op (column already exists check passes) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: gambletan <gambletan@github> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 11:53:01 +08:00
balibabu	aaf900cf16	Feat: Display release status in agent version history. (#13479 ) ### What problem does this PR solve? Feat: Display release status in agent version history. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-10 14:25:27 +08:00
BitToby	383986dc5f	fix: re-chunk documents when data source content is updated (#12918 ) Closes: #12889 ### What problem does this PR solve? When syncing external data sources (e.g., Jira, Confluence, Google Drive), updated documents were not being re-chunked. The raw content was correctly updated in blob storage, but the vector database retained stale chunks, causing search results to return outdated information. Root cause: The task digest used for chunk reuse optimization was calculated only from parser configuration fields (`parser_id`, `parser_config`, `kb_id`, etc.), without any content-dependent fields. When a document's content changed but the parser configuration remained the same, the system incorrectly reused old chunks instead of regenerating new ones. Example scenario: 1. User syncs a Jira issue: "Meeting scheduled for Monday" 2. User updates the Jira issue to: "Meeting rescheduled to Friday" 3. User triggers sync again 4. Raw content panel shows updated text ✓ 5. Chunk panel still shows old text "Monday" ✗ Solution: 1. Include `update_time` and `size` in the chunking config, so the task digest changes when document content is updated 2. Track updated documents separately in `upload_document()` and return them for processing 3. Process updated documents through the re-parsing pipeline to regenerate chunks [1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 12:48:47 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Magicbook1108	47540a4147	Feat: published agent version control (#13410 ) ### What problem does this PR solve? Feat: published agent version control ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-05 17:26:39 +08:00
as-ondewo	194e076e26	Fix: init superuser can create duplicate users (#13221 ) ### What problem does this PR solve? This PR fixes 2 bugs related to RAGFlow's init superuser functionality. #### Bug 1 When the RAGFlow server was started with the `--init-superuser` option it would always create a new admin user even if it already exists resulting in duplicate users. To fix this, I added an additional check before create the superuser and added the unique constraint to the email column of the database, to mitigate potential TOCTOU race conditions. Since existing databases could contain duplicate emails I added email de-duplication to the database migration. #### Bug 2 When the RAGFlow server was started with the `--init-superuser` option but without configured default LLM and embedding models it would fail to start because the `init_superuser` function would always make test request to the models even if they were not set. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 19:55:51 +08:00
Kevin Hu	1262533b74	Feat: support verify to set llm key and boost bigrams. (#12980 ) #12863 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-05 19:19:09 +08:00
Liu An	1b587013d8	Fix: remove unused imports and f-string formatting (#12935 ) ### What problem does this PR solve? - Remove unused imports (Mock, patch, MagicMock, json, os, RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files - Replace f-string formatting with regular strings for console output messages in cli.py - Clean up unnecessary imports that were no longer being used in the codebase ### Type of change - [x] Refactoring	2026-02-02 12:11:39 +08:00
NTLx	c4c3f744c0	feat: add Peewee ORM support for OceanBase as primary database (#12769 ) (#12926 ) ## Summary This PR adds Peewee ORM support for OceanBase as the primary database in RAGFlow, as requested in issue #12769. ## Changes ### Core Implementation 1. RetryingPooledOceanBaseDatabase Class - Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible) - Implements retry mechanism for connection issues - Handles MySQL-specific error codes (2013, 2006 for connection loss) - Provides connection pool management 2. PooledDatabase Enum - Added `OCEANBASE = RetryingPooledOceanBaseDatabase` 3. DatabaseLock Enum - Added `OCEANBASE = MysqlDatabaseLock` - OceanBase uses MySQL-style locking 4. TextFieldType Enum - Added `OCEANBASE = "LONGTEXT"` - OceanBase uses same text field type as MySQL 5. DatabaseMigrator Enum - Added `OCEANBASE = MySQLMigrator` - OceanBase uses MySQL migration tools ### Usage ```bash # Set environment variable to use OceanBase export DB_TYPE=oceanbase # Configure connection (in docker/.env or environment) OCEANBASE_HOST=localhost OCEANBASE_PORT=2881 OCEANBASE_USER=root OCEANBASE_PASSWORD=password OCEANBASE_DATABASE=ragflow ``` ### Technical Details - Location: `api/db/db_models.py` - Dependencies: No new dependencies (uses existing Peewee MySQL support) - Code Size: ~90 lines - Difficulty: Simple ### Testing - Added comprehensive unit tests in `tests/unit/test_oceanbase_peewee.py` - Tests cover: - OceanBase database class existence and inheritance - Enum values for PooledDatabase, DatabaseLock, TextFieldType - Initialization with custom retry settings - Environment variable configuration ### Acceptance Criteria ✅ Can switch to OceanBase database via `DB_TYPE=oceanbase` environment variable ✅ All database operations work normally in OceanBase environment ✅ OceanBase uses MySQL compatibility mode (no additional dependencies) ### Background This is part of the RAGFlow + OceanBase Hackathon to allow users to choose OceanBase as RAGFlow's primary database, leveraging OceanBase's high availability and scalability. --- ## Related Issues - Primary: https://github.com/infiniflow/ragflow/issues/12769 - Context: https://github.com/oceanbase/seekdb/issues/123 (OceanBase Developer Challenge) --- Closes infiniflow/ragflow#12769	2026-01-31 15:45:20 +08:00
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. Index name of meta data: ragflow_doc_meta_{tenant_id} ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
Zhichang Yu	fd11aca8e5	feat: Implement pluggable multi-provider sandbox architecture (#12820 ) ## Summary Implement a flexible sandbox provider system supporting both self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for secure code execution in agent workflows. Key Changes: - ✅ Aliyun Code Interpreter provider using official `agentrun-sdk>=0.0.16` - ✅ Self-managed provider with gVisor (runsc) security - ✅ Arguments parameter support for dynamic code execution - ✅ Database-only configuration (removed fallback logic) - ✅ Configuration scripts for quick setup Issue #12479 ## Features ### 🔌 Provider Abstraction Layer 1. Self-Managed Provider (`agent/sandbox/providers/self_managed.py`) - Wraps existing executor_manager HTTP API - gVisor (runsc) for secure container isolation - Configurable pool size, timeout, retry logic - Languages: Python, Node.js, JavaScript - ⚠️ Requires: gVisor installation, Docker, base images 2. Aliyun Code Interpreter (`agent/sandbox/providers/aliyun_codeinterpreter.py`) - SaaS integration using official agentrun-sdk - Serverless microVM execution with auto-authentication - Hard timeout: 30 seconds max - Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`, `AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION` - Automatically wraps code to call `main()` function 3. E2B Provider (`agent/sandbox/providers/e2b.py`) - Placeholder for future integration ### ⚙️ Configuration System - `conf/system_settings.json`: Default provider = `aliyun_codeinterpreter` - `agent/sandbox/client.py`: Enforces database-only configuration - Admin UI: `/admin/sandbox-settings` - Configuration validation via `validate_config()` method - Health checks for all providers ### 🎯 Key Capabilities Arguments Parameter Support: All providers support passing arguments to `main()` function: ```python # User code def main(name: str, count: int) -> dict: return {"message": f"Hello {name}!" * count} # Executed with: arguments={"name": "World", "count": 3} # Result: {"message": "Hello World!Hello World!Hello World!"} ``` Self-Describing Providers: Each provider implements `get_config_schema()` returning form configuration for Admin UI Error Handling: Structured `ExecutionResult` with stdout, stderr, exit_code, execution_time ## Configuration Scripts Two scripts for quick Aliyun sandbox setup: Shell Script (requires jq): ```bash source scripts/configure_aliyun_sandbox.sh ``` Python Script (interactive): ```bash python3 scripts/configure_aliyun_sandbox.py ``` ## Testing ```bash # Unit tests uv run pytest agent/sandbox/tests/test_providers.py -v # Aliyun provider tests uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v # Integration tests (requires credentials) uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v # Quick SDK validation python3 agent/sandbox/tests/verify_sdk.py ``` Test Coverage: - 30 unit tests for provider abstraction - Provider-specific tests for Aliyun - Integration tests with real API - Security tests for executor_manager ## Documentation - `docs/develop/sandbox_spec.md` - Complete architecture specification - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy sandbox - `agent/sandbox/tests/QUICKSTART.md` - Quick start guide - `agent/sandbox/tests/README.md` - Testing documentation ## Breaking Changes ⚠️ Migration Required: 1. Directory Move: `sandbox/` → `agent/sandbox/` - Update imports: `from sandbox.` → `from agent.sandbox.` 2. Mandatory Configuration: - SystemSettings must have `sandbox.provider_type` configured - Removed fallback default values - Configuration must exist in database (from `conf/system_settings.json`) 3. Aliyun Credentials: - Requires `AGENTRUN_` environment variables (not `ALIYUN_`) - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID) 4. Self-Managed Provider: - gVisor (runsc) must be installed for security - Install: `go install gvisor.dev/gvisor/runsc@latest` ## Database Schema Changes ```python # SystemSettings.value: CharField → TextField api/db/db_models.py: Changed for unlimited config length # SystemSettingsService.get_by_name(): Fixed query precision api/db/services/system_settings_service.py: startswith → exact match ``` ## Files Changed ### Backend (Python) - `agent/sandbox/providers/base.py` - SandboxProvider ABC interface - `agent/sandbox/providers/manager.py` - ProviderManager - `agent/sandbox/providers/self_managed.py` - Self-managed provider - `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider - `agent/sandbox/providers/e2b.py` - E2B provider (placeholder) - `agent/sandbox/client.py` - Unified client (enforces DB-only config) - `agent/tools/code_exec.py` - Updated to use provider system - `admin/server/services.py` - SandboxMgr with registry & validation - `admin/server/routes.py` - 5 sandbox API endpoints - `conf/system_settings.json` - Default: aliyun_codeinterpreter - `api/db/db_models.py` - TextField for SystemSettings.value - `api/db/services/system_settings_service.py` - Exact match query ### Frontend (TypeScript/React) - `web/src/pages/admin/sandbox-settings.tsx` - Settings UI - `web/src/services/admin-service.ts` - Sandbox service functions - `web/src/services/admin.service.d.ts` - Type definitions - `web/src/utils/api.ts` - Sandbox API endpoints ### Documentation - `docs/develop/sandbox_spec.md` - Architecture spec - `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide - `agent/sandbox/tests/QUICKSTART.md` - Quick start - `agent/sandbox/tests/README.md` - Testing guide ### Configuration Scripts - `scripts/configure_aliyun_sandbox.sh` - Shell script (jq) - `scripts/configure_aliyun_sandbox.py` - Python script ### Tests - `agent/sandbox/tests/test_providers.py` - 30 unit tests - `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests - `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` - Integration tests - `agent/sandbox/tests/verify_sdk.py` - SDK validation ## Architecture ``` Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged\|Aliyun\|E2B] ↓ SystemSettings ``` ## Usage ### 1. Configure Provider Via Admin UI: 1. Navigate to `/admin/sandbox-settings` 2. Select provider (Aliyun Code Interpreter / Self-Managed) 3. Fill in configuration 4. Click "Test Connection" to verify 5. Click "Save" to apply Via Configuration Scripts: ```bash # Aliyun provider export AGENTRUN_ACCESS_KEY_ID="xxx" export AGENTRUN_ACCESS_KEY_SECRET="yyy" export AGENTRUN_ACCOUNT_ID="zzz" export AGENTRUN_REGION="cn-shanghai" source scripts/configure_aliyun_sandbox.sh ``` ### 2. Restart Service ```bash cd docker docker compose restart ragflow-server ``` ### 3. Execute Code in Agent ```python from agent.sandbox.client import execute_code result = execute_code( code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}', language="python", timeout=30, arguments={"name": "World"} ) print(result.stdout) # {"message": "Hello World!"} ``` ## Troubleshooting ### "Container pool is busy" (Self-Managed) - Cause: Pool exhausted (default: 1 container in `.env`) - Fix: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+ ### "Sandbox provider type not configured" - Cause: Database missing configuration - Fix: Run config script or set via Admin UI ### "gVisor not found" - Cause: runsc not installed - Fix: `go install gvisor.dev/gvisor/runsc@latest && sudo cp ~/go/bin/runsc /usr/local/bin/` ### Aliyun authentication errors - Cause: Wrong environment variable names - Fix: Use `AGENTRUN_` prefix (not `ALIYUN_`) ## Checklist - [x] All tests passing (30 unit tests + integration tests) - [x] Documentation updated (spec, migration guide, quickstart) - [x] Type definitions added (TypeScript) - [x] Admin UI implemented - [x] Configuration validation - [x] Health checks implemented - [x] Error handling with structured results - [x] Breaking changes documented - [x] Configuration scripts created - [x] gVisor requirements documented Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-28 13:28:21 +08:00
Mohan	0a8eb11c3d	fix: Add proper error handling for database reconnection attempts (#12650 ) ## Problem When database connection is lost, the reconnection logic had a bug: if the first reconnect attempt failed, the second attempt was not wrapped in error handling, causing unhandled exceptions. ## Solution Added proper try-except blocks around the second reconnect attempt in both MySQL and PostgreSQL database classes to ensure errors are properly logged and handled. ## Changes - Fixed `_handle_connection_loss()` in `RetryingPooledMySQLDatabase` - Fixed `_handle_connection_loss()` in `RetryingPooledPostgresqlDatabase` Fixes #12294 --- Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=158349177 Co-authored-by: SID <158349177+0xsid0703@users.noreply.github.com>	2026-01-19 09:48:10 +08:00
Jin Hai	14c250e3d7	Fix adding column error (#12503 ) ### What problem does this PR solve? 1. Fix redundant column adding 2. Refactor the code ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-08 16:44:53 +08:00
Jin Hai	5ebe334a2f	Refactor setting type (#12425 ) ### What problem does this PR solve? Refactor setting type ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-04 20:26:12 +08:00
Jin Hai	ac9113b0ef	feature: add system setting service (#12408 ) ### What problem does this PR solve? #12409 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-04 14:21:39 +08:00
Jin Hai	6044314811	Fix text issue (#12221 ) ### What problem does this PR solve? Fix several text issues. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-26 11:18:08 +08:00
Lynn	a1164b9c89	Feat/memory (#11812 ) ### What problem does this PR solve? Manage and display memory datasets. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 13:34:08 +08:00
hsparks-codes	237a66913b	Feat: RAG evaluation (#11674 ) ### What problem does this PR solve? Feature: This PR implements a comprehensive RAG evaluation framework to address issue #11656. Problem: Developers using RAGFlow lack systematic ways to measure RAG accuracy and quality. They cannot objectively answer: 1. Are RAG results truly accurate? 2. How should configurations be adjusted to improve quality? 3. How to maintain and improve RAG performance over time? Solution: This PR adds a complete evaluation system with: - Dataset & test case management - Create ground truth datasets with questions and expected answers - Automated evaluation - Run RAG pipeline on test cases and compute metrics - Comprehensive metrics - Precision, recall, F1 score, MRR, hit rate for retrieval quality - Smart recommendations - Analyze results and suggest specific configuration improvements (e.g., "increase top_k", "enable reranking") - 20+ REST API endpoints - Full CRUD operations for datasets, test cases, and evaluation runs Impact: Enables developers to objectively measure RAG quality, identify issues, and systematically improve their RAG systems through data-driven configuration tuning. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:00:58 +08:00
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figure table. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` for demonstrating purpose. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
Jin Hai	61cf430dbb	Minor tweats (#11271 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-16 19:29:20 +08:00
Kevin Hu	d207291217	Fix: add download stats to kb logs. (#11112 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-10 13:28:07 +08:00
Kevin Hu	34283d4db4	Feat: add data source to pipleline logs . (#11075 ) ### What problem does this PR solve? #10953 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-07 11:43:59 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Jin Hai	16d2be623c	Minor tweaks (#10987 ) ### What problem does this PR solve? 1. Rename identifier name 2. Fix some return statement 3. Fix some typos ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 14:15:31 +08:00
Kevin Hu	3e5a39482e	Feat: Support multiple data sources synchronizations (#10954 ) ### What problem does this PR solve? #10953 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 19:59:18 +08:00
Jin Hai	6447b737ab	Move singleton to common directory (#10935 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 12:24:08 +08:00
Billy Bao	55eb525fdc	Feat: rename file to avoid package name conflict (#10863 ) ### What problem does this PR solve? Feat: rename file to avoid package name conflict ### Type of change - [x] Refactoring	2025-10-29 12:19:57 +08:00
Jin Hai	5a200f7652	Add time utils (#10849 ) ### What problem does this PR solve? - Add time utilities and unit tests ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-28 19:09:14 +08:00
Andrea Bugeja	8a41057236	Fix: Add RetryingPooledPostgresqlDatabase to handle max_retries param (#10524 ) ## What problem does this PR solve? Fixes the PostgreSQL connection error that prevents RAGFlow from starting: peewee.ProgrammingError: invalid dsn: invalid connection option "max_retries" ## Problem Analysis The `BaseDataBase` class in `api/db/db_models.py` adds `max_retries` and `retry_delay` to the database configuration dict before passing it to the database connection constructor. - MySQL: Has `RetryingPooledMySQLDatabase` class that properly extracts these custom parameters using `kwargs.pop()` before calling the parent constructor - PostgreSQL: Was using the base `PooledPostgresqlDatabase` class which passes all parameters directly to `psycopg2.connect()`, which doesn't recognize `max_retries` as a valid connection option ## Solution Created `RetryingPooledPostgresqlDatabase` class that: - Extracts `max_retries` and `retry_delay` parameters before initialization - Implements retry logic with exponential backoff for connection failures - Handles PostgreSQL-specific connection errors (connection refused, server closed, etc.) - Mirrors the existing `RetryingPooledMySQLDatabase` implementation Updated the `PooledDatabase` enum to use the new retrying class for PostgreSQL. ## Benefits ✅ Prevents invalid connection parameters from being passed to psycopg2 ✅ Adds automatic retry logic for PostgreSQL connection failures ✅ Provides better error logging for PostgreSQL-specific issues ✅ Maintains consistency between MySQL and PostgreSQL database handling ## Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ## Testing Tested with PostgreSQL database configuration and verified: - Server starts without the "invalid dsn" error - Database connections are established successfully - Retry logic works correctly on connection failures Co-authored-by: Andrea Bugeja <andrea.bugeja@gig.com>	2025-10-16 15:08:41 +08:00
Günter Lukas	0283e4098f	Fix #10408 (#10471 ) ### What problem does this PR solve? Google Cloud model does not work correctly with gemini-2.5 models Close #10408 ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-10-10 19:18:24 +08:00
Kevin Hu	cbf04ee470	Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423 ) ### What problem does this PR solve? #9869 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: chanx <1243304602@qq.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com> Co-authored-by: Lynn <lynn_inf@hotmail.com> Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com> Co-authored-by: huangzl <huangzl@shinemo.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Wilmer <33392318@qq.com> Co-authored-by: Adrian Weidig <adrianweidig@gmx.net> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: Liu An <asiro@qq.com> Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com> Co-authored-by: BadwomanCraZY <511528396@qq.com> Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com> Co-authored-by: Russell Valentine <russ@coldstonelabs.org> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Billy Bao <newyorkupperbay@gmail.com> Co-authored-by: Zhedong Cen <cenzhedong2@126.com> Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com> Co-authored-by: TensorNull <tensor.null@gmail.com> Co-authored-by: TeslaZY <TeslaZY@outlook.com> Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com> Co-authored-by: AB <aj@Ajays-MacBook-Air.local> Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com> Co-authored-by: He Wang <wanghechn@qq.com> Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com> Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box> Co-authored-by: Stephen Hu <stephenhu@seismic.com> Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com> Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com> Co-authored-by: mxc <mxc@example.com> Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com> Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com> Co-authored-by: mcoder6425 <mcoder64@gmail.com> Co-authored-by: lemsn <lemsn@msn.com> Co-authored-by: lemsn <lemsn@126.com> Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com> Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com> Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>	2025-10-09 12:36:19 +08:00
Jin Hai	b0b866c8fd	Refactor: move some functions out of api/utils/__init__.py (#10216 ) ### What problem does this PR solve? Refactor import modules. ### Type of change - [x] Refactoring --------- Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-09-25 18:04:49 +08:00
lemsn	3a831d0c28	Fixed the issue where database connections were interrupted under high concurrency (#10126 ) ### What problem does this PR solve? Fixed the issue where database connections were interrupted under high concurrency ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: lemsn <lemsn@126.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-09-25 17:03:43 +08:00
Yongteng Lei	c832e0b858	Feat: add `canvas_category` field for UserCanvas and CanvasTemplate (#9885 ) ### What problem does this PR solve? Add `canvas_category` field for UserCanvas and CanvasTemplate. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-03 14:55:24 +08:00
Kevin Hu	a1633e0a2f	Fix: second round value removal. (#9756 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-28 09:34:47 +08:00
Yongteng Lei	188c0f614b	Refa: refine search app (#9536 ) ### What problem does this PR solve? Refine search app. ### Type of change - [x] Refactoring	2025-08-19 09:33:33 +08:00
Yongteng Lei	ffc095bd50	Feat: conversation completion can specify different model (#9485 ) ### What problem does this PR solve? Conversation completion can specify different model ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-15 17:44:58 +08:00
writinwaters	6e862553cb	Docs: Deprecated 'Create session with agent' (#9464 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2025-08-14 12:13:11 +08:00
Kevin Hu	153e430b00	Feat: add meta data filter. (#9405 ) ### What problem does this PR solve? #8531 #7417 #6761 #6573 #6477 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-12 14:12:56 +08:00
Kevin Hu	d9fe279dde	Feat: Redesign and refactor agent module (#9113 ) ### What problem does this PR solve? #9082 #6365 <u> WARNING: it's not compatible with the older version of `Agent` module, which means that `Agent` from older versions can not work anymore.</u> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 19:41:09 +08:00
Yongteng Lei	5cc570f5e0	Refa: suppress DB migration error logs (#9043 ) ### What problem does this PR solve? Suppress DB migration error logs. ### Type of change - [x] Refactoring	2025-07-25 12:38:07 +08:00
Kevin Hu	f2909ea0c4	Perf: retryable mysql connection. (#8858 ) ### What problem does this PR solve? ### Type of change - [x] Performance Improvement	2025-07-15 19:05:48 +08:00

1 2 3

130 Commits