ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-06 08:06:43 +08:00

Author	SHA1	Message	Date
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
PandaMan	d43aebe701	Fix/13142 auto metadata (#13217 ) ### What problem does this PR solve? Close #13142 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-26 10:25:48 +08:00
Phives	4ceb668d40	feat(api/utils): Harden file_utils for robustness and edge cases (#12915 ) ## Summary Improves robustness and edge-case handling in `api.utils.file_utils` to avoid crashes, DoS/OOM risks, and timeouts when processing user-provided filenames, paths, and file blobs. ## Changes ### Resource limits & timeouts - `MAX_BLOB_SIZE_THUMBNAIL` (50 MiB) and `MAX_BLOB_SIZE_PDF` (100 MiB) to reject oversized inputs before thumbnail/PDF processing. - `GHOSTSCRIPT_TIMEOUT_SEC` (120 s) for `repair_pdf_with_ghostscript` subprocess to avoid hangs on malicious or broken PDFs. ### `filename_type` - Handles `None`, empty string, non-string (e.g. int/list), and path-only input via new `_normalize_filename_for_type()`. - Uses basename for type detection (e.g. `a/b/c.pdf` → PDF). - Enforces `FILE_NAME_LEN_LIMIT`; invalid input returns `FileType.OTHER`. ### `thumbnail_img` - Rejects `None`/empty/oversized blob and invalid filename; returns `None` instead of raising. - Wraps PDF, image, and PPT handling in try/except so corrupt or malformed files return `None`. - Ensures PDF has pages and PPT has slides before use. - Normalizes PIL image mode (RGBA/P/LA → RGB) for safe PNG export. ### `repair_pdf_with_ghostscript` - Handles `None`/empty input; skips repair when input size exceeds limit. - Uses `subprocess.run(..., timeout=GHOSTSCRIPT_TIMEOUT_SEC)` and catches `TimeoutExpired`. - Returns original bytes when Ghostscript output is empty. ### `read_potential_broken_pdf` - `None` → `b""`; non–sequence-like (no `len`) → `b""`; empty → return as-is. - Oversized blob returned as-is (no repair) to avoid DoS. ### `sanitize_path` - Explicit `None` and non-string check; strips whitespace before normalizing. ## Testing - `test/unit_test/utils/test_api_file_utils.py` added with 36 unit tests covering the above behavior (filename_type, sanitize_path, read_potential_broken_pdf, thumbnail_img, thumbnail, repair_pdf_with_ghostscript, constants). - All tests pass. --------- Co-authored-by: Gittensor Miner <miner@gittensor.io>	2026-02-25 14:34:47 +08:00
PandaMan	f4cbdc3a3b	fix(api): MinIO health check use dynamic scheme and verify (Closes #13159 and #13158 ) (#13197 ) ## Summary Fixes MinIO SSL/TLS support in two places: the MinIO client connection and the health check used by the Admin/Service Health dashboard. Both now respect the `secure` and `verify` settings from the MinIO configuration. Closes #13158 Closes #13159 --- ## Problem #13158 – MinIO client: The client in `rag/utils/minio_conn.py` was hardcoded with `secure=False`, so RAGFlow could not connect to MinIO over HTTPS even when `secure: true` was set in config. There was also no way to disable certificate verification for self-signed certs. #13159 – MinIO health check: In `api/utils/health_utils.py`, the MinIO liveness check always used `http://` for the health URL. When MinIO was configured with SSL, the health check failed and the dashboard showed "timeout" even though MinIO was reachable over HTTPS. --- ## Solution ### MinIO client (`rag/utils/minio_conn.py`) - Read `MINIO.secure` (default `false`) and pass it into the `Minio()` constructor so HTTPS is used when configured. - Add `_build_minio_http_client()` that reads `MINIO.verify` (default `true`). When `verify` is false, return an `urllib3.PoolManager` with `cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so self-signed certificates are accepted. - Support string values for `secure` and `verify` (e.g. `"true"`, `"false"`). ### MinIO health check (`api/utils/health_utils.py`) - Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and the `verify` flag from `MINIO.secure` and `MINIO.verify`. - Update `check_minio_alive()` to use the correct scheme, pass `verify` into `requests.get(..., verify=verify)`, and use `timeout=10`. ### Config template (`docker/service_conf.yaml.template`) - Add commented optional MinIO keys `secure` and `verify` (and env vars `MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS and optional cert verification. ### Tests - `test/unit_test/utils/test_health_utils_minio.py` – Tests for `_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify, status codes, timeout, errors). - `test/unit_test/utils/test_minio_conn_ssl.py` – Tests for `_build_minio_http_client()` (verify true/false/missing, string values, `CERT_NONE` when verify is false). --- ## Testing - Unit tests added/updated as above; run with the project's test runner. - Manually: configure MinIO with HTTPS and `secure: true` (and optionally `verify: false` for self-signed); confirm client operations work and the Service Health dashboard shows MinIO as alive instead of timeout.	2026-02-25 09:47:12 +08:00
Yongteng Lei	c292d617ca	Fix: stored XSS via HTML File upload and inline Rendering in file get (#13202 ) ### What problem does this PR solve? Fix stored XSS via HTML file upload and inline rendering in /v1/file/get/<id> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 09:46:48 +08:00
Magicbook1108	5de92e57d3	Fix: 'None None' in log (#13192 ) ### What problem does this PR solve? Fix: 'None None' in log ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-24 19:15:20 +08:00
Kevin Hu	f1c2fac03e	Refa: remove ppt image. (#12909 ) ### What problem does this PR solve? remove `aspose` ### Type of change - [x] Refactoring	2026-01-30 13:35:42 +08:00
Angel98518	98b6a0e6d1	feat: Add OceanBase Performance Monitoring and Health Check Integration (#12886 ) ## Description This PR implements comprehensive OceanBase performance monitoring and health check functionality as requested in issue #12772. The implementation follows the existing ES/Infinity health check patterns and provides detailed metrics for operations teams. ## Problem Currently, RAGFlow lacks detailed health monitoring for OceanBase when used as the document engine. Operations teams need visibility into: - Connection status and latency - Storage space usage - Query throughput (QPS) - Slow query statistics - Connection pool utilization ## Solution ### 1. Enhanced OBConnection Class (`rag/utils/ob_conn.py`) Added comprehensive performance monitoring methods: - `get_performance_metrics()` - Main method returning all performance metrics - `_get_storage_info()` - Retrieves database storage usage - `_get_connection_pool_stats()` - Gets connection pool statistics - `_get_slow_query_count()` - Counts queries exceeding threshold - `_estimate_qps()` - Estimates queries per second - Enhanced `health()` method with connection status ### 2. Health Check Utilities (`api/utils/health_utils.py`) Added two new functions following ES/Infinity patterns: - `get_oceanbase_status()` - Returns OceanBase status with health and performance metrics - `check_oceanbase_health()` - Comprehensive health check with detailed metrics ### 3. API Endpoint (`api/apps/system_app.py`) Added new endpoint: - `GET /v1/system/oceanbase/status` - Returns OceanBase health status and performance metrics ### 4. Comprehensive Unit Tests (`test/unit_test/utils/test_oceanbase_health.py`) Added 340+ lines of unit tests covering: - Health check success/failure scenarios - Performance metrics retrieval - Error handling and edge cases - Connection pool statistics - Storage information retrieval - QPS estimation - Slow query detection ## Metrics Provided - Connection Status: connected/disconnected - Latency: Query latency in milliseconds - Storage: Used and total storage space - QPS: Estimated queries per second - Slow Queries: Count of queries exceeding threshold - Connection Pool: Active connections, max connections, pool size ## Testing - All unit tests pass - Error handling tested for connection failures - Edge cases covered (missing tables, connection errors) - Follows existing code patterns and conventions ## Code Statistics - Total Lines Changed: 665+ lines - New Code: ~600 lines - Test Coverage: 340+ lines of comprehensive tests - Files Modified: 3 - Files Created: 1 (test file) ## Acceptance Criteria Met ✅ `/system/oceanbase/status` API returns OceanBase health status ✅ Monitoring metrics accurately reflect OceanBase running status ✅ Clear error messages when health checks fail ✅ Response time optimized (metrics cached where possible) ✅ Follows existing ES/Infinity health check patterns ✅ Comprehensive test coverage ## Related Files - `rag/utils/ob_conn.py` - OceanBase connection class - `api/utils/health_utils.py` - Health check utilities - `api/apps/system_app.py` - System API endpoints - `test/unit_test/utils/test_oceanbase_health.py` - Unit tests Fixes #12772 --------- Co-authored-by: Daniel <daniel@example.com>	2026-01-30 09:44:42 +08:00
Kevin Hu	927db0b373	Refa: asyncio.to_thread to ThreadPoolExecutor to break thread limitat… (#12716 ) ### Type of change - [x] Refactoring	2026-01-20 13:29:37 +08:00
Jin Hai	38f0a92da9	Use RAGFlow CLI to replace RAGFlow Admin CLI (#12653 ) ### What problem does this PR solve? ``` $ python admin/client/ragflow_cli.py -t user -u aaa@aaa.com -p 9380 ragflow> list datasets; ragflow> list default models; ragflow> show version; ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-17 17:52:38 +08:00
6ba3i	2b20d0b3bb	Fix : Web API tests by normalizing errors, validation, and uploads (#12620 ) ### What problem does this PR solve? Fixes web API behavior mismatches that caused test failures by normalizing error responses, tightening validations, correcting error messages, and closing upload file handles. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-16 11:09:22 +08:00
Magicbook1108	b40a7b2e7d	Feat: Hash doc id to avoid duplicate name. (#12573 ) ### What problem does this PR solve? Feat: Hash doc id to avoid duplicate name. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-15 14:02:15 +08:00
6ba3i	0795616b34	Align p3 HTTP/SDK tests with current backend behavior (#12563 ) ### What problem does this PR solve? Updates pre-existing HTTP API and SDK tests to align with current backend behavior (validation errors, 404s, and schema defaults). This ensures p3 regression coverage is accurate without changing production code. ### Type of change - [x] Other (please describe): align p3 HTTP/SDK tests with current backend behavior --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-13 19:22:47 +08:00
LIRUI YU	947e63ca14	Fixed typos and added pptx preview for frontend (#12577 ) ### What problem does this PR solve? Previously, we added support for previewing PPT and PPTX files in the backend. Now, we are adding it to the frontend, so when the slides in the chat interface are referenced, they will no longer be blank. ### Type of change - Bug Fix (non-breaking change which fixes an issue)	2026-01-13 17:02:36 +08:00
LIRUI YU	41c84fd78f	Add MIME types for PPT and PPTX files (#12562 ) Otherwise, slide files cannot be opened in Chat module ### What problem does this PR solve? Backend Reason (API): In the api/utils/web_utils.py file of the backend, the CONTENT_TYPE_MAP dictionary is missing ppt and pptx. MIME type mapping. This means that when the frontend requests a PPTX file, the backend cannot correctly inform the browser that it is a PPTX file, resulting in the file being displayed incorrectly. Type identification error. ### Type of change - Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-13 12:17:49 +08:00
lys1313013	37e4485415	feat: add MDX file support (#12261 ) Feat: add MDX file support #12057 ### What problem does this PR solve? <img width="1055" height="270" alt="image" src="https://github.com/user-attachments/assets/a0ab49f9-7806-41cd-8a96-f593591ab36b" /> The page states that MDX files are supported, but uploading fails with the error: "x.mdx: This type of file has not been supported yet!" <img width="381" height="110" alt="image" src="https://github.com/user-attachments/assets/4bbb7d08-cb47-416a-95fc-bc90b90fcc39" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-29 12:54:31 +08:00
Jin Hai	42f9ac997f	Remove Chinese comments and fix function arguments errors (#12052 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-22 12:59:37 +08:00
Kevin Hu	44dec89f1f	Fix: aspose-slide issue. (#11935 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-12 20:16:18 +08:00
Magicbook1108	50715ba332	Fix: forget-reset password (#11927 ) ### What problem does this PR solve? Fix: forget-reset password ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-12 16:16:17 +08:00
Magicbook1108	7db9045b74	Feat: Add box connector (#11845 ) ### What problem does this PR solve? Feat: Add box connector ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-12 10:23:40 +08:00
Lynn	a1164b9c89	Feat/memory (#11812 ) ### What problem does this PR solve? Manage and display memory datasets. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 13:34:08 +08:00
buua436	65a5a56d95	Refa:replace trio with asyncio (#11831 ) ### What problem does this PR solve? change: replace trio with asyncio ### Type of change - [x] Refactoring	2025-12-09 19:23:14 +08:00
hsparks-codes	4870d42949	feat: Auto-disable Raptor for structured data (Issue #11653 ) (#11676 ) ### What problem does this PR solve? Feature: This PR implements automatic Raptor disabling for structured data files to address issue #11653. Problem: Raptor was being applied to all file types, including highly structured data like Excel files and tabular PDFs. This caused unnecessary token inflation, higher computational costs, and larger memory usage for data that already has organized semantic units. Solution: Automatically skip Raptor processing for: - Excel files (.xls, .xlsx, .xlsm, .xlsb) - CSV files (.csv, .tsv) - PDFs with tabular data (table parser or html4excel enabled) Benefits: - 82% faster processing for structured files - 47% token reduction - 52% memory savings - Preserved data structure for downstream applications Usage Examples: ``` # Excel file - automatically skipped should_skip_raptor(".xlsx") # True # CSV file - automatically skipped should_skip_raptor(".csv") # True # Tabular PDF - automatically skipped should_skip_raptor(".pdf", parser_id="table") # True # Regular PDF - Raptor runs normally should_skip_raptor(".pdf", parser_id="naive") # False # Override for special cases should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False}) # False ``` Configuration: Includes `auto_disable_for_structured_data` toggle (default: true) to allow override for special use cases. Testing: 44 comprehensive tests, 100% passing ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:02:29 +08:00
Yongteng Lei	b6c4722687	Refa: make RAGFlow more asynchronous (#11601 ) ### What problem does this PR solve? Try to make this more asynchronous. Verified in chat and agent scenarios, reducing blocking behavior. #11551, #11579. However, the impact of these changes still requires further investigation to ensure everything works as expected. ### Type of change - [x] Refactoring	2025-12-01 14:24:06 +08:00
Billy Bao	fa9b7b259c	Feat: create datasets from http api supports ingestion pipeline (#11597 ) ### What problem does this PR solve? Feat: create datasets from http api supports ingestion pipeline ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:55:24 +08:00
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figure table. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` for demonstrating purpose. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444 ) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
Jin Hai	bd4bc57009	Refactor: move mcp connection utilities to common (#11304 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-17 15:34:17 +08:00
Jin Hai	61cf430dbb	Minor tweats (#11271 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-16 19:29:20 +08:00
Lynn	b5f2cf16bc	Fix: check task executor alive and display status (#11270 ) ### What problem does this PR solve? Correctly check task executor alive and display status. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-14 15:52:28 +08:00
YngvarHuang	bd5dda6b10	Feature/doc upload api add parent path 20251112 (#11231 ) ### What problem does this PR solve? Add the specified parent_path to the document upload api interface (#11230) ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: virgilwong <hyhvirgil@gmail.com>	2025-11-13 09:59:39 +08:00
Kevin Hu	d207291217	Fix: add download stats to kb logs. (#11112 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-10 13:28:07 +08:00
Lynn	d016a06fd5	Feat/monitor task (#11116 ) ### What problem does this PR solve? Show task executor. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-10 12:51:39 +08:00
Lynn	b7aa6d6c4f	Fix: add avatar for UI (#11080 ) ### What problem does this PR solve? Add avatar for admin UI. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-07 09:27:31 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Billy Bao	24335485bf	Fix: get_allowed_llm_factories() return type (#11031 ) ### What problem does this PR solve? Fix: get_allowed_llm_factories() return type #11003 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) <img width="2880" height="215" alt="截图 2025-11-05 17-02-01" src="https://github.com/user-attachments/assets/ee892077-21f9-4b1e-a1d2-b921fa7f6121" />	2025-11-05 17:32:12 +08:00
Jin Hai	02d10f8eda	Move var from rag.settings to common.globals (#11022 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 15:48:50 +08:00
Jin Hai	1a9215bc6f	Move some vars to globals (#11017 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 14:14:38 +08:00
buua436	89410d2381	fix:api /factories wrong return (#11015 ) ### What problem does this PR solve? change: api /factories wrong return ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-05 12:50:11 +08:00
Wanderson Pinto dos Santos	3654ae61c1	feat: add allowed factories variable to allow admins to restrict llms users can add (#11003 ) ### What problem does this PR solve? Currently, if we want to restrict the allowed factories users can use we need to delete from the database table manually. The proposal of this PR is to include a variable to that, if set, will restrict the LLM factories the users can see and add. This allow us to not touch the llm_factories.json or the database if the LLM factory is already inserted. Obs.: All the lint changes were from the pre-commit hook which I did not change. ### Type of change - [X] New Feature (non-breaking change which adds functionality)	2025-11-05 10:47:50 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Jin Hai	880a6a0428	Move some enumerate type to constants.py (#10998 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 19:25:25 +08:00
Jin Hai	03038c7d3d	Update RetCode to common.constants (#10984 ) ### What problem does this PR solve? 1. Update RetCode to common.constants 2. Decouple the admin and API modules ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 15:12:53 +08:00
Billy Bao	19f71a961a	Fix: Create dataset performance unmatched between HTTP api and web ui (#10960 ) ### What problem does this PR solve? Fix: Create dataset performance unmatched between HTTP api and web ui #10925 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-04 13:45:14 +08:00
Jin Hai	1e45137284	Move 'timeout' to common folder (#10983 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:51:12 +08:00
Jin Hai	d55344bc11	Remove unused code (#10981 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:10:29 +08:00
Jin Hai	378bdfccfc	Refactor log utils (#10973 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 20:25:02 +08:00
Jin Hai	9a486e0f51	Move some funcs from api to rag module (#10972 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 19:26:09 +08:00
Jin Hai	1284647694	Refactor file utils (#10970 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 18:54:55 +08:00

1 2 3 4 5

222 Commits