ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-12 10:39:00 +08:00

Author	SHA1	Message	Date
Magicbook1108	4f09b3e2a4	Fix: pipeline canvas category (#13319 ) ### What problem does this PR solve? Fix: pipeline canvas category ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 20:27:36 +08:00
Ahmad Intisar	184388879d	feat: Add `disable_password_login` configuration to support SSO-only authentication (#13151 ) ### What problem does this PR solve? Enterprise deployments that use an external Identity Provider (e.g., Microsoft Entra ID, Okta, Keycloak) need the ability to enforce SSO-only authentication by hiding the email/password login form. Currently, the login page always shows the password form alongside OAuth buttons, with no way to disable it. This PR adds a `disable_password_login` configuration option under the existing `authentication` section in `service_conf.yaml`. When set to `true`, the login page only displays configured OAuth/SSO buttons and hides the email/password form, "Remember me" checkbox, and "Sign up" link. The flag can be set via: - `service_conf.yaml` (`authentication.disable_password_login: true`) - Environment variable (`DISABLE_PASSWORD_LOGIN=true`) Default behavior is unchanged (`false`). ### Behavior \| `disable_password_login` \| OAuth configured \| Result \| \|---\|---\|---\| \| `false` (default) \| No \| Standard email/password form \| \| `false` \| Yes \| Email/password form + SSO buttons below \| \| `true` \| Yes \| SSO buttons only (no form, no sign up link) \| \| `true` \| No \| Empty card (admin should configure OAuth first) \| ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Files changed (5) 1. `docker/service_conf.yaml.template` — added `disable_password_login: false` under authentication 2. `common/settings.py` — added `DISABLE_PASSWORD_LOGIN` global variable and loader in `init_settings()` 3. `common/config_utils.py` — fixed `TypeError` in `show_configs()` when authentication section contains non-dict values (e.g., booleans) 4. `api/apps/system_app.py` — exposed `disablePasswordLogin` flag in `/config` endpoint 5. `web/src/pages/login/index.tsx` — conditionally render password form based on config flag; OAuth buttons always render when channels exist --------- Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-03-02 14:06:03 +08:00
Idriss Sbaaoui	860c4bd0bb	Feat: UI testing automation with playwright (#12749 ) ### What problem does this PR solve? This PR helps automate the testing of the ui interface using pytest Playwright ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test automation infrastructure --------- Co-authored-by: Liu An <asiro@qq.com>	2026-03-02 13:04:08 +08:00
Idriss Sbaaoui	9d78d3ddb1	Tests: fix failling http in CI (#13301 ) ### What problem does this PR solve? test_doc_sdk_routes_unit had two flaky/incorrect branch assumptions: 1. parse/stop_parsing production logic gates on doc.run, but tests used progress, causing branch mismatch and unintended fallthrough into mutation/DB paths. 2. stop_parsing invalid-state test asserted an outdated message fragment, making the contract brittle. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 10:44:33 +08:00
Magicbook1108	1027916bfe	Fix: inconsistent state handling for multi-user single-canvas access (#13267 ) ### What problem does this PR solve? <img width="700" alt="image" src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-02-28 15:09:21 +08:00
天海蒼灆	983150b936	Fix (api): fix the document parsing status check logic (#12504 ) ### What problem does this PR solve? When the original code terminates the parsing task halfway, the progress may not be 0 or 1, which will result in the inability to call the interface to parse again -Change the document parsing progress check to task status check, and use TaskStatus.RUNNING.value to judge -Update the condition judgment for stopping parsing documents, and check whether the task is running instead ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-28 14:38:55 +08:00
qinling0210	8b6d363a98	Use pagination in _search_metadata (#13238 ) ### What problem does this PR solve? Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210) Remove limit in _search_metadata, use pagination in _search_metadata. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 11:24:49 +08:00
6ba3i	22c4d72891	tests: improve RAGFlow coverage based on Codecov report (#13219 ) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-26 19:03:26 +08:00
PandaMan	d43aebe701	Fix/13142 auto metadata (#13217 ) ### What problem does this PR solve? Close #13142 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-26 10:25:48 +08:00
6ba3i	38011f2c16	tests: improve RAGFlow coverage based on Codecov report (#13200 ) ### What problem does this PR solve? Codecov’s coverage report shows that several RAGFlow code paths are currently untested or under-tested. This makes it easier for regressions to slip in during refactors and feature work. This PR adds targeted automated tests to cover the files and branches highlighted by Codecov, improving confidence in core behavior while keeping runtime functionality unchanged. ### Type of change - [x] Other (please describe): Test coverage improvement (adds/extends unit and integration tests to address Codecov-reported gaps)	2026-02-25 19:12:11 +08:00
Phives	4ceb668d40	feat(api/utils): Harden file_utils for robustness and edge cases (#12915 ) ## Summary Improves robustness and edge-case handling in `api.utils.file_utils` to avoid crashes, DoS/OOM risks, and timeouts when processing user-provided filenames, paths, and file blobs. ## Changes ### Resource limits & timeouts - `MAX_BLOB_SIZE_THUMBNAIL` (50 MiB) and `MAX_BLOB_SIZE_PDF` (100 MiB) to reject oversized inputs before thumbnail/PDF processing. - `GHOSTSCRIPT_TIMEOUT_SEC` (120 s) for `repair_pdf_with_ghostscript` subprocess to avoid hangs on malicious or broken PDFs. ### `filename_type` - Handles `None`, empty string, non-string (e.g. int/list), and path-only input via new `_normalize_filename_for_type()`. - Uses basename for type detection (e.g. `a/b/c.pdf` → PDF). - Enforces `FILE_NAME_LEN_LIMIT`; invalid input returns `FileType.OTHER`. ### `thumbnail_img` - Rejects `None`/empty/oversized blob and invalid filename; returns `None` instead of raising. - Wraps PDF, image, and PPT handling in try/except so corrupt or malformed files return `None`. - Ensures PDF has pages and PPT has slides before use. - Normalizes PIL image mode (RGBA/P/LA → RGB) for safe PNG export. ### `repair_pdf_with_ghostscript` - Handles `None`/empty input; skips repair when input size exceeds limit. - Uses `subprocess.run(..., timeout=GHOSTSCRIPT_TIMEOUT_SEC)` and catches `TimeoutExpired`. - Returns original bytes when Ghostscript output is empty. ### `read_potential_broken_pdf` - `None` → `b""`; non–sequence-like (no `len`) → `b""`; empty → return as-is. - Oversized blob returned as-is (no repair) to avoid DoS. ### `sanitize_path` - Explicit `None` and non-string check; strips whitespace before normalizing. ## Testing - `test/unit_test/utils/test_api_file_utils.py` added with 36 unit tests covering the above behavior (filename_type, sanitize_path, read_potential_broken_pdf, thumbnail_img, thumbnail, repair_pdf_with_ghostscript, constants). - All tests pass. --------- Co-authored-by: Gittensor Miner <miner@gittensor.io>	2026-02-25 14:34:47 +08:00
PandaMan	f4cbdc3a3b	fix(api): MinIO health check use dynamic scheme and verify (Closes #13159 and #13158 ) (#13197 ) ## Summary Fixes MinIO SSL/TLS support in two places: the MinIO client connection and the health check used by the Admin/Service Health dashboard. Both now respect the `secure` and `verify` settings from the MinIO configuration. Closes #13158 Closes #13159 --- ## Problem #13158 – MinIO client: The client in `rag/utils/minio_conn.py` was hardcoded with `secure=False`, so RAGFlow could not connect to MinIO over HTTPS even when `secure: true` was set in config. There was also no way to disable certificate verification for self-signed certs. #13159 – MinIO health check: In `api/utils/health_utils.py`, the MinIO liveness check always used `http://` for the health URL. When MinIO was configured with SSL, the health check failed and the dashboard showed "timeout" even though MinIO was reachable over HTTPS. --- ## Solution ### MinIO client (`rag/utils/minio_conn.py`) - Read `MINIO.secure` (default `false`) and pass it into the `Minio()` constructor so HTTPS is used when configured. - Add `_build_minio_http_client()` that reads `MINIO.verify` (default `true`). When `verify` is false, return an `urllib3.PoolManager` with `cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so self-signed certificates are accepted. - Support string values for `secure` and `verify` (e.g. `"true"`, `"false"`). ### MinIO health check (`api/utils/health_utils.py`) - Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and the `verify` flag from `MINIO.secure` and `MINIO.verify`. - Update `check_minio_alive()` to use the correct scheme, pass `verify` into `requests.get(..., verify=verify)`, and use `timeout=10`. ### Config template (`docker/service_conf.yaml.template`) - Add commented optional MinIO keys `secure` and `verify` (and env vars `MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS and optional cert verification. ### Tests - `test/unit_test/utils/test_health_utils_minio.py` – Tests for `_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify, status codes, timeout, errors). - `test/unit_test/utils/test_minio_conn_ssl.py` – Tests for `_build_minio_http_client()` (verify true/false/missing, string values, `CERT_NONE` when verify is false). --- ## Testing - Unit tests added/updated as above; run with the project's test runner. - Manually: configure MinIO with HTTPS and `secure: true` (and optionally `verify: false` for self-signed); confirm client operations work and the Service Health dashboard shows MinIO as alive instead of timeout.	2026-02-25 09:47:12 +08:00
Liu An	65ebc06956	Refa: test file location for better organization (#13107 ) ### What problem does this PR solve? Renamed test/unit/test_delete_query_construction.py to test/unit_test/common/test_delete_query_construction.py to align with the project's directory structure and improve test categorization. ### Type of change - [x] Refactoring	2026-02-12 10:15:09 +08:00
Jim Smith	7029b8ca81	Fix: Make time_utils tests timezone-independent (#13100 ) ## Summary - Replace hardcoded CST (UTC+8) expected values in `test_time_utils.py` with dynamically computed local-time expectations using `time.localtime()` and `time.mktime()` - Tests previously failed in any timezone other than UTC+8; they now pass regardless of the system's local timezone ## Test plan - [x] `uv run pytest test/unit_test/ -v` — 317 passed, 25 skipped 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Jim Smith <jhsmith0@me.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 10:51:53 +08:00
Liu An	392ec99651	Docs: Update version references to v0.24.0 in READMEs and docs (#13095 ) ### What problem does this PR solve? - Update version tags in README files (including translations) from v0.23.1 to v0.24.0 - Modify Docker image references and documentation to reflect new version - Update version badges and image descriptions - Maintain consistency across all language variants of README files ### Type of change - [x] Documentation Update	2026-02-10 17:24:03 +08:00
6ba3i	fabbfcab90	Fix: failing p3 test for SDK/HTTP APIs (#13062 ) ### What problem does this PR solve? Adjust highlight parsing, add row-count SQL override, tweak retrieval thresholding, and update tests with engine-aware skips/utilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 14:56:10 +08:00
Carve_	7b230aadf4	chore(tests): move oceanbase peewee test under test/ and fix enum check (#12969 ) ### What problem does this PR solve? This mistake was made by PR #12926 This PR makes the OceanBase peewee unit test discoverable by the default unit test runner/CI (by moving it under test/), so it’s included in the unified unit test suite. It also fixes `test_database_lock_enum_values` to correctly handle Enum alias members (DatabaseLock uses the same value for MYSQL and OCEANBASE). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Screenshots The original `test_oceanbase_peewee.py` was placed under tests/, which isn’t included in the default unit test runner’s testpaths, so it wasn’t picked up by the unit test suite. So we need to move it to correct path. <img width="670" height="540" alt="image" src="https://github.com/user-attachments/assets/69d39346-450f-46dc-8965-29c3d7b32bc9" /> When using old version in `test_oceanbase_peewee.py`: ``` def test_database_lock_enum_values(self): """Test DatabaseLock enum has all expected values.""" expected = {'MYSQL', 'OCEANBASE', 'POSTGRES'} actual = {e.name for e in DatabaseLock} assert expected.issubset(actual), f"Missing: {expected - actual}" ``` The old check iterated Enum members, so alias values were skipped and only `MYSQL/POSTGRES` were seen, making OCEANBASE appear missing. <img width="1998" height="931" alt="65e2837f23b7b298980a410c7d5c2f09" src="https://github.com/user-attachments/assets/d8e98c5a-2cfa-4182-ae35-a3ef03554a27" /> and new version uses `DatabaseLock.__members__` and passes: <img width="2024" height="1170" alt="1aa8c6facb28d24149270fe1bc4a9dd9" src="https://github.com/user-attachments/assets/d8688936-ccac-4a39-a389-23dc6f0fe276" />	2026-02-03 17:28:53 +08:00
Philipp Heyken Soares	ad06c042c4	Support operator constraints in semi-automatic metadata filtering (#12956 ) ### What problem does this PR solve? #### Summary This PR enhances the Semi-automatic metadata filtering mode by allowing users to explicitly pre-define operators (e.g., contains, =, >, etc.) for selected metadata keys. While the LLM still dynamically extracts the filter value from the user's query, it is now strictly constrained to use the operator specified in the UI configuration. Using this feature is optional. By default the operator selection is set to "automatic" resulting in the LLM choosing the operator (as presently). #### Rationale & Use Case This enhancement was driven by a concrete challenge I encountered while working with technical documentation. In my specific use case, I was trying to filter for software versions within a technical manual. In this dataset, a single document chunk often applies to multiple software versions. These versions are stored as a combined string within the metadata for each chunk. When using the standard semi-automatic filter, the LLM would inconsistently choose between the contains and equals operators. When it chose equals, it would exclude every chunk that applied to more than one version, even if the version I was searching for was clearly included in that metadata string. This led to incomplete and frustrating retrieval results. By extending the semi-automatic filter to allow pre-defining the operator for a specific key, I was able to force the use of contains for the version field. This change immediately led to significantly improved and more reliable results in my case. I believe this functionality will be equally useful for others dealing with "tagged" or multi-value metadata where the relationship between the query and the field is known, but the specific value needs to remain dynamic. #### Key Changes ##### Backend & Core Logic - `common/metadata_utils.py`: Updated apply_meta_data_filter to support a mixed data structure for semi_auto (handling both legacy string arrays and the new object-based format {"key": "...", "op": "..."}). - `rag/prompts/generator.py`: Extended gen_meta_filter to accept and pass operator constraints to the LLM. - `rag/prompts/meta_filter.md`: Updated the system prompt to instruct the LLM to strictly respect provided operator constraints. ##### Frontend - `web/src/components/metadata-filter/metadata-semi-auto-fields.tsx`: Enhanced the UI to include an operator dropdown for each selected metadata key, utilizing existing operator constants. - `web/src/components/metadata-filter/index.tsx`: Updated the validation schema to accommodate the new state structure. #### Test Plan - Backward Compatibility: Verified that existing semi-auto filters stored as simple strings still function correctly. - Prompt Verification: Confirmed that constraints are correctly rendered in the LLM system prompt when specified. - Added unit tests as `test/unit_test/common/test_apply_semi_auto_meta_data_filter.py` - Manual End-to-End: - Configured a "Semi-automatic" filter for a "Version" key with the "contains" operator. - Asked a version-specific query. - Result <img width="1173" height="704" alt="Screenshot 2026-02-02 145359" src="https://github.com/user-attachments/assets/510a6a61-a231-4dc2-a7fe-cdfc07219132" /> ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Philipp Heyken Soares <philipp.heyken-soares@am.ai>	2026-02-03 11:11:34 +08:00
Paul Y Hui	f028f74883	Fixed 12787 with syntax error in generated MySql json path expression (#12929 ) ### What problem does this PR solve? Fixed 12787 with syntax error in generated MySql json path expression https://github.com/infiniflow/ragflow/issues/12787 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) #### What was fixed: - Changed line 237 in ob_conn.py from value_str = get_value_str(value) if value else "" to value_str = get_value_str(value) - This fixes the bug where falsy but valid values (0, False, "", [], {}) were being converted to empty strings, causing invalid SQL syntax #### What was tested: - Comprehensive unit tests covering all edge cases - Regression tests specifically for the bug scenario --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-02-03 09:50:14 +08:00
Liu An	c4f60b349d	Fix(test): downgrade test priorities (#12913 ) ### What problem does this PR solve? Changed test priorities in multiple test files, downgrading from p1 to p2 and p2 to p3. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-30 20:02:56 +08:00
Liu An	4947e9473a	Fix(test): Update error message assertions for unsupported content type tests (#12901 ) ### What problem does this PR solve? This commit updates test cases for create, delete, and update dataset endpoints to expect consistent error messages when an unsupported content type is provided. ### Type of change - [x] Bug Fix (test)	2026-01-30 09:45:04 +08:00
Angel98518	98b6a0e6d1	feat: Add OceanBase Performance Monitoring and Health Check Integration (#12886 ) ## Description This PR implements comprehensive OceanBase performance monitoring and health check functionality as requested in issue #12772. The implementation follows the existing ES/Infinity health check patterns and provides detailed metrics for operations teams. ## Problem Currently, RAGFlow lacks detailed health monitoring for OceanBase when used as the document engine. Operations teams need visibility into: - Connection status and latency - Storage space usage - Query throughput (QPS) - Slow query statistics - Connection pool utilization ## Solution ### 1. Enhanced OBConnection Class (`rag/utils/ob_conn.py`) Added comprehensive performance monitoring methods: - `get_performance_metrics()` - Main method returning all performance metrics - `_get_storage_info()` - Retrieves database storage usage - `_get_connection_pool_stats()` - Gets connection pool statistics - `_get_slow_query_count()` - Counts queries exceeding threshold - `_estimate_qps()` - Estimates queries per second - Enhanced `health()` method with connection status ### 2. Health Check Utilities (`api/utils/health_utils.py`) Added two new functions following ES/Infinity patterns: - `get_oceanbase_status()` - Returns OceanBase status with health and performance metrics - `check_oceanbase_health()` - Comprehensive health check with detailed metrics ### 3. API Endpoint (`api/apps/system_app.py`) Added new endpoint: - `GET /v1/system/oceanbase/status` - Returns OceanBase health status and performance metrics ### 4. Comprehensive Unit Tests (`test/unit_test/utils/test_oceanbase_health.py`) Added 340+ lines of unit tests covering: - Health check success/failure scenarios - Performance metrics retrieval - Error handling and edge cases - Connection pool statistics - Storage information retrieval - QPS estimation - Slow query detection ## Metrics Provided - Connection Status: connected/disconnected - Latency: Query latency in milliseconds - Storage: Used and total storage space - QPS: Estimated queries per second - Slow Queries: Count of queries exceeding threshold - Connection Pool: Active connections, max connections, pool size ## Testing - All unit tests pass - Error handling tested for connection failures - Edge cases covered (missing tables, connection errors) - Follows existing code patterns and conventions ## Code Statistics - Total Lines Changed: 665+ lines - New Code: ~600 lines - Test Coverage: 340+ lines of comprehensive tests - Files Modified: 3 - Files Created: 1 (test file) ## Acceptance Criteria Met ✅ `/system/oceanbase/status` API returns OceanBase health status ✅ Monitoring metrics accurately reflect OceanBase running status ✅ Clear error messages when health checks fail ✅ Response time optimized (metrics cached where possible) ✅ Follows existing ES/Infinity health check patterns ✅ Comprehensive test coverage ## Related Files - `rag/utils/ob_conn.py` - OceanBase connection class - `api/utils/health_utils.py` - Health check utilities - `api/apps/system_app.py` - System API endpoints - `test/unit_test/utils/test_oceanbase_health.py` - Unit tests Fixes #12772 --------- Co-authored-by: Daniel <daniel@example.com>	2026-01-30 09:44:42 +08:00
Philipp Heyken Soares	6305c7e411	Fix metadata filter (#12861 ) ### What problem does this PR solve? ##### Summary This PR fixes a bug in the metadata filtering logic where the contains and not contains operators were behaving identically to the in and not in operators. It also standardizes the syntax for string-based operators. ##### The Issue On the main branch, the contains operator was implemented as: `matched = input in value if not isinstance(input, list) else all(i in value for i in input)` This logic is identical to the `in` operator. It checks if the metadata (`input`) exists within the filter (`value`). For a "contains" search, the logic should be reversed: _we want to check if the filter value exists within the metadata input_. ##### Solution Presented Here The operators have been rewritten using str.find(): Contains: `str(input).find(value) >= 0` Not Contains: `str(input).find(value) == -1` ##### Advantage This approach places the metadata (input) on the left side of the expression. This maintains stylistic consistency with the existing start with and end with operators in the same file, which also place the input on the left (e.g., str(input).lower().startswith(...)). ##### Considered Alternative In a previous PR we considered using the standard Python `in` operator: `value in str(input)`. The `in` operator is approximately 15% faster because it uses optimized Python bytecode (CONTAINS_OP) and avoids an attribute lookup. However following rejection of this PR we now propose the change presented here. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: Philipp Heyken Soares <philipp.heyken-soares@am.ai>	2026-01-29 09:59:48 +08:00
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. Index name of meta data: ragflow_doc_meta_{tenant_id} ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
Stephen Hu	52da81cf9e	Fix:Redis configuration template error in v0.22.1 (#12685 ) ### What problem does this PR solve? https://github.com/infiniflow/ragflow/issues/12674 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-27 12:47:46 +08:00
Julien Deveaux	6be197cbb6	Fix: Use tiktoken for proper token counting in OpenAI-compatible endpoint #7850 (#12760 ) ### What problem does this PR solve? The OpenAI-compatible chat endpoint (`/chats_openai/<chat_id>/chat/completions`) was not returning accurate token usage in streaming responses. The token counts were either missing or inaccurate because the underlying LLM API responses weren't being properly parsed for usage data. This PR adds proper token counting using tiktoken (cl100k_base encoding) as a fallback when the LLM API doesn't provide usage data in streaming chunks. This ensures clients always receive token usage information in the response, which is essential for billing and quota management. Changes: - Add tiktoken-based token counting for streaming responses in OpenAI-compatible endpoint - Ensure `usage` field is always populated in the final streaming chunk - Add unit tests for token usage calculation Fixes #7850 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-23 09:36:21 +08:00
Kevin Hu	3beb85efa0	Feat: enhance metadata arranging. (#12745 ) ### What problem does this PR solve? #11564 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-22 15:34:08 +08:00
Liu An	f98abf14a8	Refa(test): improve code formatting and remove debug prints (#12739 ) ### What problem does this PR solve? - Improving code formatting and consistency - Removing debug print statements ### Type of change - [x] Refactoring	2026-01-21 14:53:17 +08:00
Liu An	2a87778e10	Chore(ci): use new Web API test cases in CI (#12738 ) ### What problem does this PR solve? - Update pytest commands to use new test directory structure ### Type of change - [x] chore(ci)	2026-01-21 14:53:05 +08:00
Haipeng LI	7787085664	Doc: add README for test (#12728 ) ### What problem does this PR solve? We added instructions on how to test RAGFlow in test/README.md. ### Type of change - [x] Documentation Update	2026-01-20 19:12:35 +08:00
6ba3i	960ecd3158	Feat: update and add new tests for web api apps (#12714 ) ### What problem does this PR solve? This PR adds missing web API tests (system, search, KB, LLM, plugin, connector). It also addresses a contract mismatch that was causing test failures: metadata updates did not persist new keys (update‑only behavior). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): Test coverage expansion and test helper instrumentation	2026-01-20 19:12:15 +08:00
6ba3i	aee9860970	Make document change-status idempotent for Infinity doc store (#12717 ) ### What problem does this PR solve? This PR makes the document change‑status endpoint idempotent under the Infinity doc store. If a document already has the requested status, the handler returns success without touching the engine, preventing unnecessary updates and avoiding missing‑table errors while keeping responses consistent. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-20 19:11:21 +08:00
qinling0210	b40d639fdb	Add dataset with table parser type for Infinity and answer question in chat using SQL (#12541 ) ### What problem does this PR solve? 1) Create dataset using table parser for infinity 2) Answer questions in chat using SQL ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-19 19:35:14 +08:00
Hetavi Shah	46305ef35e	Add User API Token Management to Admin API and CLI (#12595 ) ## Summary This PR extends the RAGFlow Admin API and CLI with comprehensive user API token management capabilities. Administrators can now generate, list, and delete API tokens for users through both the REST API and the Admin CLI interface. ## Changes ### Backend API (`admin/server/`) #### New Endpoints - POST `/api/v1/admin/users/<username>/new_token` - Generate a new API token for a user - GET `/api/v1/admin/users/<username>/token_list` - List all API tokens for a user - DELETE `/api/v1/admin/users/<username>/token/<token>` - Delete a specific API token for a user #### Service Layer Updates (`services.py`) - Added `get_user_api_key(username)` - Retrieves all API tokens for a user - Added `save_api_token(api_token)` - Saves a new API token to the database - Added `delete_api_token(username, token)` - Deletes an API token for a user ### Admin CLI (`admin/client/`) #### New Commands - `GENERATE TOKEN FOR USER <username>;` - Generate a new API token for the specified user - `LIST TOKENS OF <username>;` - List all API tokens associated with a user - `DROP TOKEN <token> OF <username>;` - Delete a specific API token for a user ### Testing Added comprehensive test suite in `test/testcases/test_admin_api/`: - `test_generate_user_api_key.py` - Tests for API token generation - `test_get_user_api_key.py` - Tests for listing user API tokens - `test_delete_user_api_key.py` - Tests for deleting API tokens - `conftest.py` - Shared test fixtures and utilities ## Technical Details ### Token Generation - Tokens are generated using `generate_confirmation_token()` utility - Each token includes metadata: `tenant_id`, `token`, `beta`, `create_time`, `create_date` - Tokens are associated with user tenants automatically ### Security Considerations - All endpoints require admin authentication (`@check_admin_auth`) - Tokens are URL-encoded when passed in DELETE requests to handle special characters - Proper error handling for unauthorized access and missing resources ### API Response Format All endpoints follow the standard RAGFlow response format: ```json { "code": 0, "data": {...}, "message": "Success message" } ``` ## Files Changed - `admin/client/admin_client.py` - CLI token management commands - `admin/server/routes.py` - New API endpoints - `admin/server/services.py` - Token management service methods - `docs/guides/admin/admin_cli.md` - CLI documentation updates - `test/testcases/test_admin_api/conftest.py` - Test fixtures - `test/testcases/test_admin_api/test_user_api_key_management/*` - Test suites ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Alexander Strasser <alexander.strasser@ondewo.com> Co-authored-by: Hetavi Shah <your.email@example.com>	2026-01-17 15:21:00 +08:00
6ba3i	2b20d0b3bb	Fix : Web API tests by normalizing errors, validation, and uploads (#12620 ) ### What problem does this PR solve? Fixes web API behavior mismatches that caused test failures by normalizing error responses, tightening validations, correcting error messages, and closing upload file handles. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-16 11:09:22 +08:00
Vedant Madane	ac936005e6	fix: ensure deleted chunks are not returned in retrieval (#12520 ) (#12546 ) ## Summary Fixes #12520 - Deleted chunks should not appear in retrieval/reference results. ## Changes ### Core Fix - api/apps/chunk_app.py: Include \doc_id\ in delete condition to properly scope the delete operation ### Improved Error Handling - api/db/services/document_service.py: Better separation of concerns with individual try-catch blocks and proper logging for each cleanup operation ### Doc Store Updates - rag/utils/es_conn.py: Updated delete query construction to support compound conditions - rag/utils/opensearch_conn.py: Same updates for OpenSearch compatibility ### Tests - test/testcases/.../test_retrieval_chunks.py: Added \TestDeletedChunksNotRetrievable\ class with regression tests - test/unit/test_delete_query_construction.py: Unit tests for delete query construction ## Testing - Added regression tests that verify deleted chunks are not returned by retrieval API - Tests cover single chunk deletion and batch deletion scenarios	2026-01-15 14:45:55 +08:00
6ba3i	5b22f94502	Feat: Benchmark CLI additions and documentation (#12536 ) ### What problem does this PR solve? This PR adds a dedicated HTTP benchmark CLI for RAGFlow chat and retrieval endpoints so we can measure latency/QPS. ### Type of change - [x] Documentation Update - [x] Other (please describe): Adds a CLI benchmarking tool for chat/retrieval latency/QPS --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-14 13:49:16 +08:00
6ba3i	ea619dba3b	Added to the HTTP API test suite (#12556 ) ### What problem does this PR solve? This PR adds missing HTTP API test coverage for dataset graph/GraphRAG/RAPTOR tasks, metadata summary, chat completions, agent sessions/completions, and related questions. It also introduces minimal HTTP test helpers to exercise these endpoints consistently with the existing suite. ### Type of change - [x] Other (please describe): Test coverage (HTTP API tests) --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-14 10:02:30 +08:00
6ba3i	0795616b34	Align p3 HTTP/SDK tests with current backend behavior (#12563 ) ### What problem does this PR solve? Updates pre-existing HTTP API and SDK tests to align with current backend behavior (validation errors, 404s, and schema defaults). This ensures p3 regression coverage is accurate without changing production code. ### Type of change - [x] Other (please describe): align p3 HTTP/SDK tests with current backend behavior --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-13 19:22:47 +08:00
Lynn	f9d4179bf2	Feat：memory sdk (#12538 ) ### What problem does this PR solve? Move memory and message apis to /api, and add sdk support. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-09 17:45:58 +08:00
Lynn	a2211c200d	Feat: message write testcase (#12417 ) ### What problem does this PR solve? Write testcase for message web apis. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-01-04 16:52:44 +08:00
Lynn	11779697de	Test: get message content testcase (#12403 ) ### What problem does this PR solve? Testcase for get_message_content api. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-01-04 11:25:24 +08:00
Lynn	1f4a17863f	Feat: read web api testcases (#12383 ) ### What problem does this PR solve? Web API testcase for list_messages, get_recent_message. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-01-01 12:52:40 +08:00
Lynn	6e9691a419	Feat: message manage (#12196 ) ### What problem does this PR solve? Manage message and use in agent. Issue #4213 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-25 21:18:13 +08:00
Jin Hai	d1c4077a75	Fix directory name (#12195 ) ### What problem does this PR solve? as title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-25 14:24:13 +08:00
Jin Hai	5ebabf5bed	Fix test error (#12194 ) ### What problem does this PR solve? as title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-25 13:14:20 +08:00
Jin Hai	cc9546b761	Fix IDE warnings (#12010 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-18 11:27:02 +08:00
Jin Hai	30019dab9f	Change knowledge base to dataset (#11976 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 10:03:33 +08:00
Lynn	a1164b9c89	Feat/memory (#11812 ) ### What problem does this PR solve? Manage and display memory datasets. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-10 13:34:08 +08:00
buua436	af1344033d	Delete:remove unused tests (#11749 ) ### What problem does this PR solve? change: remove unused tests ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-04 18:49:32 +08:00

1 2

99 Commits