ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-22 23:17:44 +08:00

Author	SHA1	Message	Date
balibabu	be231faec0	Feat: Write the row and column numbers into the element's data attribute for easy code location. (#13368 ) ### What problem does this PR solve? Feat: Write the row and column numbers into the element's data attribute for easy code location. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 20:50:58 +08:00
Idriss Sbaaoui	b3a7332c08	playwright : add data-testids for new test (#13364 ) ### What problem does this PR solve? add data-testids for new test ### Type of change - [x] Other (please describe): add data-testids for new test	2026-03-04 19:28:36 +08:00
Yao Wei	c99b53064d	fix: remove company info from resume_summary to prevent over-retrieval (#13358 ) ### What problem does this PR solve? Problem: When searching for a specific company name such as "Daofeng Technology", the search would incorrectly return unrelated resumes whose company names contain generic terms such as "Technology". Root Cause: The `corporation_name_tks` field was included in the identity fields that are redundantly written to every chunk. This caused common words like "科技" ("technology") to match across all chunks, leading to over-retrieval of irrelevant resumes. Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS` list. Company information is still preserved in the "Work Overview" chunk where it belongs, allowing proper company-based searches while preventing false positives from generic terms. --------- Co-authored-by: Aron.Yao <yaowei@192.168.1.68> Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local> Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 19:24:49 +08:00
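A minimal sketch of the idea behind #13358, under stated assumptions: only `corporation_name_tks`, `_IDENTITY_FIELDS`, and the "Work Overview" chunk come from the message above. The other field names and the `build_chunks` helper are hypothetical and are not taken from the RAGFlow code.

```python
# Identity fields are copied into every chunk of a resume.
# "name_kwd" and "position_name_tks" are hypothetical placeholders;
# corporation_name_tks is intentionally absent, as described in the PR.
_IDENTITY_FIELDS = ["name_kwd", "position_name_tks"]


def build_chunks(resume: dict, sections: dict[str, str]) -> list[dict]:
    """Build one chunk per section, repeating only the identity fields."""
    identity = {k: resume[k] for k in _IDENTITY_FIELDS if k in resume}
    chunks = []
    for title, text in sections.items():
        chunk = {"section": title, "content": text, **identity}
        # Company tokens stay only in the chunk where they belong.
        if title == "Work Overview" and "corporation_name_tks" in resume:
            chunk["corporation_name_tks"] = resume["corporation_name_tks"]
        chunks.append(chunk)
    return chunks
```

With this layout, a query for a company name can only match the "Work Overview" chunk, so a generic token in the company name no longer hits every chunk of every resume.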
Jin Hai	70e9743ef1	RAGFlow go API server (#13240 ) # RAGFlow Go Implementation Plan 🚀 This repository tracks the progress of porting RAGFlow to Go. We'll implement core features and provide performance comparisons between Python and Go versions. ## Implementation Checklist - [x] User Management APIs - [x] Dataset Management Operations - [x] Retrieval Test - [x] Chat Management Operations - [x] Infinity Go SDK --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-03-04 19:17:16 +08:00
Idriss Sbaaoui	2508c46c8f	Playwright : add new test for configuration tab in datasets (#13365 ) ### What problem does this PR solve? This PR adds new tests for the full configuration tab in datasets. ### Type of change - [x] Other (please describe): new tests	2026-03-04 19:10:06 +08:00
Idriss Sbaaoui	88e8509159	benchmark fail in ci (#13377 ) ### What problem does this PR solve? ci fails in elastic search because of benchmark ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 19:01:41 +08:00
Stephen Hu	c7d17c84b2	Refa:improve excel parser logic (#13372 ) ### What problem does this PR solve? improve excel parser logic ### Type of change - [x] Refactoring	2026-03-04 18:00:17 +08:00
Jin Hai	6bb00e2762	Update graspologic to gitee (#13362 ) ### What problem does this PR solve? Accelerate python module downloading ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-04 17:48:47 +08:00
Good0987	8a7272f423	Test: add scenario for embedding_model update when chunk_count > 0 (#13351 ) ### What problem does this PR solve? Guard embedding_model change when dataset has existing chunks. API must return code 102 with message 'When chunk_num (N) > 0, embedding_model must remain <current_model>' to prevent silent embedding drift. ### Type of change - [x] Add Testcases Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 17:41:35 +08:00
Jin Hai	f47c47df99	Disable benchmark (#13370 ) ### What problem does this PR solve? The benchmark always fails on the new CI machine. Please re-enable it after the issue is fixed. ### Type of change - [x] Other (please describe): disable benchmark Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-04 16:36:42 +08:00
yiminghub2024	5eb602166c	Enhance local model deployment documentation support gpustack guide (#13339 ) ### Type of change - [X] Documentation Update:Enhance local model deployment documentation support gpustack guide	2026-03-04 13:54:20 +08:00
少卿	54ae5b4a27	Fix Dify external retrieval by providing metadata.document_id (#13337 ) ### What problem does this PR solve? ## Summary Dify’s external retrieval expects `records[].metadata.document_id` to be a non-empty string. RAGFlow currently only sets `metadata.doc_id`, which causes Dify validation to fail. This PR adds `metadata.document_id` (mapped from `doc_id`) in the Dify-compatible retrieval response. ## Changes - Add `meta["document_id"] = c["doc_id"]` in `api/apps/sdk/dify_retrieval.py` ## Testing - Not run (logic-only change). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 13:23:37 +08:00
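The mapping described in #13337 can be sketched as follows. Only the assignment `meta["document_id"] = c["doc_id"]` comes from the message above; the `to_dify_record` helper and the shape of the chunk dict are assumptions made for illustration.

```python
def to_dify_record(c: dict) -> dict:
    """Build a Dify-style record from a retrieved chunk (hypothetical shape)."""
    meta = {"doc_id": c["doc_id"]}
    # Dify expects a non-empty string under metadata.document_id.
    meta["document_id"] = c["doc_id"]
    return {"content": c.get("content", ""), "metadata": meta}
```

Keeping `doc_id` alongside `document_id` means existing consumers of the old key are unaffected.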
Jin Hai	b9ad014f63	Supports login cross multiple RAGFlow servers (#13322 ) ### What problem does this PR solve? 1. Use Redis to store the secret key. 2. During startup, the API server reads the secret key from Redis. If no secret key exists, it generates one and stores it in Redis atomically. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-04 13:07:45 +08:00
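One common way to get the "generate and store atomically" behavior from #13322 is a set-if-absent write followed by a read. The sketch below assumes a client exposing `get(name)` and `set(name, value, nx=True)`, which matches the shape of redis-py's `Redis.set`. The key name, the helper name, and the in-memory stand-in are hypothetical, and this is not necessarily how the PR implements it.

```python
import secrets

SECRET_KEY_NAME = "ragflow:secret_key"  # hypothetical key name


class InMemoryStore:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        return self._data.get(name)

    def set(self, name, value, nx=False):
        # With nx=True the write only succeeds if the key is absent.
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True


def load_or_create_secret(store) -> str:
    """Return the shared secret, creating it if no server has done so yet."""
    candidate = secrets.token_hex(32)
    # Only the first server to start wins this write; later ones are no-ops.
    store.set(SECRET_KEY_NAME, candidate, nx=True)
    value = store.get(SECRET_KEY_NAME)
    return value.decode() if isinstance(value, bytes) else value
```

Because every server reads the value back after the conditional write, all of them end up with the same key regardless of start order.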
balibabu	5f8966608d	Fix: The dropdown menu for large models does not automatically focus on the search box. #13313 (#13360 ) ### What problem does this PR solve? Fix: The dropdown menu for large models does not automatically focus on the search box. #13313 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 12:48:35 +08:00
Magicbook1108	93d621a666	Fix: Correct PDF chunking parameter name in naive (#13357 ) ### What problem does this PR solve? Fix: Correct PDF chunking parameter name in naive #13325 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 11:51:10 +08:00
balibabu	733a64f0d6	Fix: Change the background color of the message notification button. (#13344 ) ### What problem does this PR solve? Fix: Change the background color of the message notification button. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 11:10:05 +08:00
statxc	839b603768	feat: Add PDF parser selection to Agent Begin and Await Response comp… (#13325 ) ### Issue: #12756 ### What problem does this PR solve? When users upload files through Agent's Begin or Await Response components, the parsing is hardcoded to "Plain Text", ignoring all other available parsers (DeepDOC, TCADP, Docling, MinerU, PaddleOCR). This PR adds a PDF parser dropdown to these components so users can select the appropriate parser for their file inputs. ### Changes Backend - `agent/component/fillup.py` - Added `layout_recognize` param to `UserFillUpParam`, forwarded to `FileService.get_files()` - `agent/component/begin.py` - Same forwarding in `Begin._invoke()` - `agent/canvas.py` - Extract Begin's `layout_recognize` for `sys.files` parsing, added param to `get_files_async()` / `get_files()` - `api/db/services/file_service.py` - Added `layout_recognize` param to `parse()` and `get_files()`, replacing hardcoded `"Plain Text"` - `rag/app/naive.py` - Added `"plain text"` and `"tcadp parser"` aliases to PARSERS dict to match dropdown values after `.lower()` Frontend - `web/src/pages/agent/form/begin-form/index.tsx` - Show `LayoutRecognizeFormField` dropdown when file inputs exist - `web/src/pages/agent/form/begin-form/schema.ts` - Added `layout_recognize` to Zod schema - `web/src/pages/agent/form/user-fill-up-form/index.tsx` - Same dropdown for Await Response component ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-04 11:09:33 +08:00
Liu An	7715bad04e	refactor: reorganize unit test files into appropriate directories (#13343 ) ### What problem does this PR solve? Move test files from utils/ to their corresponding functional directories: - api/db/ for database related tests - api/utils/ for API utility tests - rag/utils/ for RAG utility tests ### Type of change - [x] Refactoring	2026-03-04 11:02:56 +08:00
Copilot	33ba955b02	Translate Chinese text to English in agent/sandbox (#13356 ) Chinese text remained in generated code comments, log messages, field descriptions, and documentation files under `agent/sandbox/`. ### Changes - `tests/MIGRATION_GUIDE.md` — Full EN translation (migration guide from OpenSandbox → Code Interpreter) - `tests/QUICKSTART.md` — Full EN translation (quick test guide for Aliyun sandbox provider) - `providers/aliyun_codeinterpreter.py` — Removed `(主账号ID)` from docstring, error log, and config field description - `sandbox_spec.md` — Removed `(主账号ID)` from `account_id` field description - `tests/test_aliyun_codeinterpreter_integration.py` — Removed `(主账号ID)` from inline comment ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuzhichang <153784+yuzhichang@users.noreply.github.com>	2026-03-04 10:49:38 +08:00
wyou	0a4c0c38c7	Feat: expose admin service in helm configuration (#13345 ) ### What problem does this PR solve? Helm deployments also need the ability to enable the Admin Service for administrative operations, so this PR exposes a Helm configuration option to enable or disable it. When it is enabled (the default), admin access and operations such as user management become available, which is important for RAGFlow users and owners who need to control their clients. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-04 10:26:10 +08:00
Idriss Sbaaoui	2f4ca38adf	Fix : make playwright tests idempotent (#13332 ) ### What problem does this PR solve? Playwright tests previously depended on cross-file execution order (`auth -> provider -> dataset -> chat`). This change makes setup explicit and idempotent via fixtures so tests can run independently. - Added/standardized prerequisite fixtures in `test/playwright/conftest.py`: - `ensure_auth_context`, `ensure_model_provider_configured`, `ensure_dataset_ready`, `ensure_chat_ready` - Made provisioning reusable/idempotent with `RUN_ID`-based resource naming. - Synced auth envs (`E2E_ADMIN_EMAIL`, `E2E_ADMIN_PASSWORD`) into seeded creds. - Fixed provider cache freshness (`auth_header`/`page` refresh on cache hit). Also included minimal stability fixes: - dataset create stale-element click handling, - search wait logic for results/empty-state, - agent create-menu handling, - agent run-step retry when run UI doesn’t open first click. ### Type of change - [x] Test fix - [x] Refactoring --------- Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 10:07:14 +08:00
writinwaters	1c87f97dde	Docs: Minor document structure tweak. (#13346 ) ### What problem does this PR solve? Refactored the document architecture. ### Type of change - [x] Documentation Update	2026-03-03 20:09:34 +08:00
writinwaters	f7c808383f	Docs: Refactored documentation (#13340 ) ### What problem does this PR solve? Refactored documentation. ### Type of change - [x] Documentation Update	2026-03-03 17:48:48 +08:00
Yao Wei	48755a3352	Fix: (resume) Cross-verify project experience and work experience, and remove duplicate text (#13323 ) Cross-verify project experience and work experience, and remove duplicate text --------- Co-authored-by: Aron.Yao <yaowei@192.168.1.68> Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>	2026-03-03 14:53:46 +08:00
balibabu	eca60208e3	Fix: The document generation node cannot generate the output content of a large model to a file. #13321 (#13326 ) ### What problem does this PR solve? Fix: The document generation node cannot generate the output content of a large model to a file. #13321 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-03 11:05:24 +08:00
Magicbook1108	4f09b3e2a4	Fix: pipeline canvas category (#13319 ) ### What problem does this PR solve? Fix: pipeline canvas category ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 20:27:36 +08:00
Yongteng Lei	707de2461a	Fix: use async_chat with sync wrapper in resume parser (#13320 ) ### What problem does this PR solve? Fix AttributeError when calling llm.chat() in resume parser. LLMBundle only has async_chat method, not chat method. Use `_run_coroutine_sync` wrapper to call async_chat synchronously. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 19:51:06 +08:00
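The pattern named in #13320, calling an async-only method from synchronous code, can be sketched as below. `run_coroutine_sync` and `FakeLLM` are hypothetical illustrations; the actual `_run_coroutine_sync` helper and `LLMBundle.async_chat` signature in RAGFlow may differ.

```python
import asyncio
import threading


def run_coroutine_sync(coro):
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running here; run the coroutine in a helper thread.
    result = {}

    def worker():
        try:
            result["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the caller below
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in result:
        raise result["error"]
    return result["value"]


class FakeLLM:
    """Stand-in for an object that only exposes an async chat method."""

    async def async_chat(self, system, history):
        await asyncio.sleep(0)
        return f"echo: {history[-1]['content']}"
```

A synchronous caller would then write `run_coroutine_sync(llm.async_chat(system, history))` instead of the missing `llm.chat(...)`.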
chanx	ef264b52c7	Fix: Fixed some errors in the console (#13317 ) ### What problem does this PR solve? Fix: Fixed some errors in the console ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 19:19:15 +08:00
Yingfeng	a806f7b707	Potential fix for code scanning alert no. 71: Incomplete URL substring sanitization (#13318 ) Potential fix for [https://github.com/infiniflow/ragflow/security/code-scanning/71](https://github.com/infiniflow/ragflow/security/code-scanning/71) In general, instead of using `String.prototype.includes` on the entire URL string, parse the URL and make decisions based on its `host` (or `hostname`) field. This avoids cases where the trusted domain appears in the path, query, or as part of a different hostname. Here, `payload.source_fid` is set to `'siliconflow_intl'` if `postBody.base_url` “contains” `api.siliconflow.com`. To keep behavior for correct inputs but close the hole, we should: 1. Safely parse `postBody.base_url` using the standard `URL` class. 2. Extract the hostname (`url.hostname`). 3. Compare it appropriately: - If we only want the exact host `api.siliconflow.com`, use strict equality. - If international endpoints may include subdomains like `foo.api.siliconflow.com`, allow those via suffix check on the hostname. 4. Fall back to `LLMFactory.SILICONFLOW` if parsing fails or the host does not match. Concretely, in `web/src/pages/user-setting/setting-model/hooks.tsx`, in the `onApiKeySavingOk` callback where `payload.source_fid` is set, replace the `toLowerCase().includes('api.siliconflow.com')` logic with a small block that: - Initializes a local `let sourceFid = LLMFactory.SILICONFLOW;` - If `postBody.base_url` is present, attempts `new URL(postBody.base_url)` inside a `try/catch`, lowercases `url.hostname`, and checks whether it equals `api.siliconflow.com` or ends with `.api.siliconflow.com`. - Assigns `payload.source_fid = sourceFid`. No new external dependencies are required; `URL` is available in modern browsers and Node, and TypeScript understands it. _Suggested fixes powered by Copilot Autofix. 
Review carefully before merging._ Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2026-03-02 19:11:52 +08:00
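The hostname-based check described in #13318 targets TypeScript; the same idea is shown here as a Python analogue using the standard library. The function name is hypothetical, and the choice to accept subdomains follows the "suffix check" option mentioned in the message.

```python
from urllib.parse import urlsplit

TRUSTED_HOST = "api.siliconflow.com"


def is_siliconflow_intl(base_url: str) -> bool:
    """Return True only if the URL's hostname is the trusted host or a subdomain."""
    try:
        host = (urlsplit(base_url).hostname or "").lower()
    except ValueError:
        # Malformed URLs fall back to "not trusted".
        return False
    return host == TRUSTED_HOST or host.endswith("." + TRUSTED_HOST)
```

Unlike a substring test on the whole URL, this rejects inputs where the trusted domain only appears in the path or as a prefix of another hostname.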
Idriss Sbaaoui	b0ace2c5d0	feat: enable Arabic in production UI and add complete Arabic documentation (#13315 ) ### What problem does this PR solve? This PR adds end-to-end Arabic support in production. It also adds a full Arabic README ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-03-02 19:10:11 +08:00
Yao Wei	f8c91e8854	Refa: Resume parsing module (architectural optimizations based on SmartResume Pipeline) (#13255 ) Core optimizations (refer to arXiv:2510.09722): 1. PDF text fusion: Metadata + OCR dual-path extraction and fusion 2. Page-aware reconstruction: YOLOv10 page segmentation + hierarchical sorting + line number indexing 3. Parallel task decomposition: Basic information/work experience/educational background three-way parallel LLM extraction 4. Index pointer mechanism: LLM returns a range of line numbers instead of generating the full text, reducing hallucination in the extracted text. --------- Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local> Co-authored-by: Aron.Yao <yaowei@192.168.1.68> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 19:05:50 +08:00
balibabu	7d6f20585f	Feat: Modify the style of the classification operator and fix some console errors. (#13314 ) ### What problem does this PR solve? Feat: Modify the style of the classification operator and fix some console errors. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 16:53:24 +08:00
Magicbook1108	5fc3bd38b0	Feat: Support siliconflow.com (#13308 ) ### What problem does this PR solve? Feat: Support siliconflow.com ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 15:37:42 +08:00
Magicbook1108	1db221f19e	Feat: add more models for siliconflow and tongyi-qwen (#13311 ) ### What problem does this PR solve? Feat: add more models for siliconflow and tongyi-qwen ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 15:37:08 +08:00
liuxiaoyusky	8ba66dd62a	Fix: respect user-configured chunk_token_num for MinerU/docling/paddleocr parsers (#13234 ) ## Summary When using MinerU, docling, TCADP, or paddleocr as the PDF parser with the General (naive) chunk method, the user-configured `chunk_token_num` is unconditionally overwritten to 0 at [rag/app/naive.py#L858-L859](https://github.com/infiniflow/ragflow/blob/main/rag/app/naive.py#L858-L859), effectively disabling chunk merging regardless of what the user sets in the UI. ### Problem A user sets `chunk_token_num = 2048` in the dataset configuration UI, expecting small parser blocks to be merged into larger chunks. However, this line: ```python if name in ["tcadp", "docling", "mineru", "paddleocr"]: parser_config["chunk_token_num"] = 0 ``` silently overrides the user's setting. As a result, every MinerU output block becomes its own chunk. For short documents (e.g. a 3-page PDF fund factsheet parsed by MinerU), this produces 47 tiny chunks — some as small as 11 characters (`"July 2025"`) or 15 characters (`"CIES Eligible"`). This severely degrades retrieval quality: vector embeddings of such short fragments have minimal semantic value, and keyword search produces excessive noise. ### Fix Only apply the `chunk_token_num = 0` override when the user has not explicitly configured a positive value: ```python if name in ["tcadp", "docling", "mineru", "paddleocr"]: if int(parser_config.get("chunk_token_num", 0)) <= 0: parser_config["chunk_token_num"] = 0 ``` This preserves the original default behavior (no merging) while respecting the user's explicit configuration. 
### Before / After (MinerU, 3-page PDF, chunk_token_num=2048) \| \| Before \| After \| \|---\|---\|---\| \| Chunks produced \| 47 \| ~8 (merged by token limit) \| \| Smallest chunk \| 11 chars \| ~500 chars \| \| User setting respected \| No \| Yes \| ## Test plan - [ ] Parse a PDF with MinerU and `chunk_token_num = 2048` → verify chunks are merged up to token limit - [ ] Parse a PDF with MinerU and `chunk_token_num = 0` (or default) → verify original behavior (no merging) - [ ] Parse a PDF with DeepDOC parser → verify no change in behavior (not affected by this code path) - [ ] Repeat with docling/paddleocr if available	2026-03-02 15:31:40 +08:00
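The conditional override from #13234 can be wrapped in a small function to make its three cases easy to check. The parser list and the condition are taken from the message above; the function name and the decision to return a copy of the config are assumptions for illustration.

```python
EXTERNAL_PARSERS = ["tcadp", "docling", "mineru", "paddleocr"]


def resolve_chunk_token_num(name: str, parser_config: dict) -> dict:
    """Return a config copy with the override applied only when unset or <= 0."""
    config = dict(parser_config)
    if name in EXTERNAL_PARSERS:
        # Respect an explicit positive value; otherwise keep the old default of 0.
        if int(config.get("chunk_token_num", 0)) <= 0:
            config["chunk_token_num"] = 0
    return config
```

A user-supplied `chunk_token_num` of 2048 now survives for MinerU, while an unset or non-positive value still yields 0, and other parsers are left untouched.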
少卿	d430446e69	fix:absolute page index mix-up in DeepDoc PDF parser (#12848 ) ### What problem does this PR solve? Summary: This PR addresses critical indexing issues in deepdoc/parser/pdf_parser.py that occur when parsing long PDFs with chunk-based pagination: Normalize rotated table page numbering: Rotated-table re-OCR now writes page_number in chunk-local 1-based form, eliminating double-addition of page_from offset that caused misalignment between table positions and document boxes. Convert absolute positions to chunk-local coordinates: When inserting tables/figures extracted via _extract_table_figure, positions are now converted from absolute (0-based) to chunk-local indices before distance matching and box insertion. This prevents IndexError and out-of-range accesses during paged parsing of long documents. Root Cause: The parser mixed absolute (0-based, document-global) and relative (1-based, chunk-local) page numbering systems. Table/figure positions from layout extraction carried absolute page numbers, but insertion logic expected chunk-local coordinates aligned with self.boxes and page_cum_height. Testing (done by the author): Manual verification: Parse a 200+ page PDF with from_page > 0 and table rotation enabled. Confirm that: Tables and figures appear on correct pages No IndexError or position mismatches occur Page numbers in output match expected chunk-local offsets Automated testing: not done ## Separate Discussion: Memory Optimization Strategy (from codex-5.2-max and claude 4.5 opus and me) ### Context The current implementation loads entire page ranges into memory (`__images__`, `page_chars`, intermediates), which can cause RAM exhaustion on large documents. While the page numbering fix resolves correctness issues, scalability remains a concern. ### Proposed Architecture Pipeline-Driven Chunking with Explicit Resource Management: 1. Authoritative chunk planning: Accept page-range specifications from upstream pipeline as the single source of truth. 
The parser should be a stateless worker that processes assigned chunks without making independent pagination decisions. 2. Granular memory lifecycle: ```python for chunk_spec in chunk_plan: # Load only chunk_spec.pages into __images__ page_images = load_page_range(chunk_spec.start, chunk_spec.end) # Process with offset tracking results = process_chunk(page_images, offset=chunk_spec.start) # Explicit cleanup before next iteration del page_images, page_chars, layout_intermediates gc.collect() # Force collection of large objects ``` 3. Persistent lightweight state: Keep model instances (layout detector, OCR engine), document metadata (outlines, PDF structure), and configuration across chunks to avoid reinitialization overhead (~2-5s per chunk for model loading). 4. Adaptive fallback: Provide `max_pages_per_chunk` (default: 50) only when pipeline doesn't supply a plan. Never exceed pipeline-specified ranges to maintain predictable memory bounds. 5. Optional: Dynamic budgeting: Expose a memory budget parameter that adjusts chunk size based on observed image dimensions and format (e.g., reduce chunk size for high-DPI scanned documents). ### Benefits - Predictable memory footprint: RAM usage bounded by `chunk_size × avg_page_size` rather than total document size - Horizontal scalability: Enables parallel chunk processing across workers - Failure isolation: Page extraction errors affect only current chunk, not entire document - Cloud-friendly: Works within container memory limits (e.g., 2-4GB per worker) ### Trade-offs - Increased I/O: Re-opening PDF for each chunk vs. keeping file handle (mitigated by page-range seeks) - Complexity: Requires careful offset tracking and stateful coordination between pipeline and parser - Warmup cost: Model initialization overhead amortized across chunks (acceptable for documents >100 pages) ### Implementation Priority This optimization should be deferred to a separate PR after the current correctness fix is merged, as: 1. 
It requires broader architectural changes across the pipeline 2. Current fix is critical for correctness and can be backported 3. Memory optimization needs comprehensive benchmarking on representative document corpus ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 14:58:37 +08:00
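The page-numbering mix-up described in #12848 comes down to converting between two conventions. The sketch below assumes that `page_from` is the 0-based absolute index of the first page in the current chunk; the function name is hypothetical and the real parser code may organize this differently.

```python
def to_chunk_local_page(abs_page: int, page_from: int) -> int:
    """Convert an absolute 0-based page index to a chunk-local 1-based page number."""
    local = abs_page - page_from + 1
    if local < 1:
        # The page lies before the current chunk; indexing with it would be a bug.
        raise ValueError(f"page {abs_page} is outside the chunk starting at {page_from}")
    return local
```

For a chunk starting at absolute page 100, absolute page 120 maps to local page 21. Adding `page_from` a second time to such a value is the double-addition the fix removes.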
Ahmad Intisar	184388879d	feat: Add `disable_password_login` configuration to support SSO-only authentication (#13151 ) ### What problem does this PR solve? Enterprise deployments that use an external Identity Provider (e.g., Microsoft Entra ID, Okta, Keycloak) need the ability to enforce SSO-only authentication by hiding the email/password login form. Currently, the login page always shows the password form alongside OAuth buttons, with no way to disable it. This PR adds a `disable_password_login` configuration option under the existing `authentication` section in `service_conf.yaml`. When set to `true`, the login page only displays configured OAuth/SSO buttons and hides the email/password form, "Remember me" checkbox, and "Sign up" link. The flag can be set via: - `service_conf.yaml` (`authentication.disable_password_login: true`) - Environment variable (`DISABLE_PASSWORD_LOGIN=true`) Default behavior is unchanged (`false`). ### Behavior \| `disable_password_login` \| OAuth configured \| Result \| \|---\|---\|---\| \| `false` (default) \| No \| Standard email/password form \| \| `false` \| Yes \| Email/password form + SSO buttons below \| \| `true` \| Yes \| SSO buttons only (no form, no sign up link) \| \| `true` \| No \| Empty card (admin should configure OAuth first) \| ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Files changed (5) 1. `docker/service_conf.yaml.template` — added `disable_password_login: false` under authentication 2. `common/settings.py` — added `DISABLE_PASSWORD_LOGIN` global variable and loader in `init_settings()` 3. `common/config_utils.py` — fixed `TypeError` in `show_configs()` when authentication section contains non-dict values (e.g., booleans) 4. `api/apps/system_app.py` — exposed `disablePasswordLogin` flag in `/config` endpoint 5. 
`web/src/pages/login/index.tsx` — conditionally render password form based on config flag; OAuth buttons always render when channels exist --------- Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-03-02 14:06:03 +08:00
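The behavior table in #13151 can be expressed as two small functions. The config key and environment variable names come from the message above; the function names, the accepted truthy strings, and the choice to let the environment variable take precedence over the YAML value are assumptions.

```python
import os


def password_login_disabled(service_conf: dict, env=os.environ) -> bool:
    """Resolve the flag from the environment first, then from the config dict."""
    raw = env.get("DISABLE_PASSWORD_LOGIN")
    if raw is not None:
        return raw.strip().lower() in ("1", "true", "yes")
    auth = service_conf.get("authentication") or {}
    return bool(auth.get("disable_password_login", False))


def login_page_elements(disabled: bool, oauth_channels: list[str]) -> list[str]:
    """Return which blocks the login page shows for a given configuration."""
    elements = []
    if not disabled:
        elements.append("password_form")
    if oauth_channels:
        elements.append("sso_buttons")
    return elements
```

The empty result for `disabled=True` with no OAuth channels corresponds to the "empty card" row, which is why the flag should only be set after OAuth is configured.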
Magicbook1108	daec36e935	Fix: add soft limit for graph rag size (#13252 ) ### What problem does this PR solve? Fix: add soft limit for graph rag size #13258 Q2 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 14:02:36 +08:00
huber	8a6b5ced6b	fix: add missing chunk_data column to OceanBase schema migration (#13306 ) ### What problem does this PR solve? When using OceanBase as the document storage engine, parsing and inserting chunks with chunk_data (e.g., table parser row data) fails with the following error: `[ERROR][Exception]: Insert chunk error: ['Unconsumed column names: chunk_data']` This happens because the chunk_data column was recently introduced but was omitted from the EXTRA_COLUMNS list in `rag/utils/ob_conn.py`. As a result, the automatic schema migration for existing OceanBase tables does not append the missing chunk_data column, causing the underlying pyobvector or SQLAlchemy to raise an unconsumed column names error during data insertion. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### What is the solution? Added column_chunk_data to the EXTRA_COLUMNS list in `rag/utils/ob_conn.py`. This ensures that the OceanBase connection wrapper can correctly detect the missing column and automatically alter existing chunk tables to include the chunk_data field during initialization.	2026-03-02 13:25:11 +08:00
Magicbook1108	f0dd12289c	Feat: add preprocess parameters for ingestion pipeline (#13300 ) ### What problem does this PR solve? Feat: add preprocess parameters for ingestion pipeline ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 13:18:57 +08:00
Yihang Wang	7fc97da610	security: Adopt Jinja2 SandboxedEnvironment for template rendering. (#13305 )	2026-03-02 13:17:29 +08:00
Idriss Sbaaoui	860c4bd0bb	Feat: UI testing automation with playwright (#12749 ) ### What problem does this PR solve? This PR helps automate the testing of the ui interface using pytest Playwright ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Other (please describe): test automation infrastructure --------- Co-authored-by: Liu An <asiro@qq.com>	2026-03-02 13:04:08 +08:00
Attili-sys	21bc1ab7ec	Feature rtl support (#13118 ) ### What problem does this PR solve? This PR adds comprehensive Right-to-Left (RTL) language support, primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu, etc.). Previously, RTL content had multiple rendering issues: - Incorrect sentence splitting for Arabic punctuation in citation logic - Misaligned text in chat messages and markdown components - Improper positioning of blockquotes and “think” sections - Incorrect table alignment - Citation placement ambiguity in RTL prompts - UI layout inconsistencies when mixing LTR and RTL text This PR introduces backend and frontend improvements to properly detect, render, and style RTL content while preserving existing LTR behavior. #### Backend - Updated sentence boundary regex in `rag/nlp/search.py` to include Arabic punctuation: - `،` (comma) - `؛` (semicolon) - `؟` (question mark) - `۔` (Arabic full stop) - Ensures citation insertion works correctly in RTL sentences. - Updated citation prompt instructions to clarify citation placement rules for RTL languages. #### Frontend - Introduced a new utility: `text-direction.ts` - Detects text direction based on Unicode ranges. - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts. - Provides `getDirAttribute()` for automatic `dir` assignment. - Applied dynamic `dir` attributes across: - Markdown rendering - Chat messages - Search results - Tables - Hover cards and reference popovers - Added proper RTL styling in LESS: - Text alignment adjustments - Blockquote border flipping - Section indentation correction - Table direction switching - Use of `<bdi>` for figure labels to prevent bidirectional conflicts #### DevOps / Environment - Added Windows backend launch script with retry handling. - Updated dependency metadata. - Adjusted development-only React debugging behavior. 
--- ### Type of change - [x] Bug Fix (non-breaking change which fixes RTL rendering and citation issues) - [x] New Feature (non-breaking change which adds RTL detection and dynamic direction handling) --------- Co-authored-by: 6ba3i <isbaaoui09@gmail.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-02 13:03:44 +08:00
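Two pieces of #13118 lend themselves to a short illustration: direction detection and sentence splitting on Arabic punctuation. The frontend utility in the PR is TypeScript; this is a Python analogue with hypothetical names. The first-strong-character heuristic and the exact regex are assumptions and are not copied from `text-direction.ts` or `rag/nlp/search.py`.

```python
import re
import unicodedata

# Latin sentence enders plus the Arabic marks listed in the PR: ، ؛ ؟ ۔
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?;،؛؟۔])\s+")


def get_dir(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-like and Arabic-like letters
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # default when no strong character is present


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows a Latin or Arabic sentence mark."""
    return [s for s in SENTENCE_BOUNDARY.split(text) if s]
```

Digits and punctuation are directionally weak or neutral, so a string that starts with a number and continues in Arabic is still classified as right-to-left.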
balibabu	a897aedea9	Feat: Modify the form styles for retrieval and conditional operators. (#13299 ) ### What problem does this PR solve? Feat: Modify the form styles for retrieval and conditional operators. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 12:05:27 +08:00
chanx	0cdddea59a	feat: pipeline add preprocess (#13302 ) ### What problem does this PR solve? feat: pipeline add preprocess ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 11:50:48 +08:00
balibabu	cf3d3c7c89	Feat: When exporting the agent DSL, the tailkey, password, and history fields need to be cleared. #13281 (#13282 ) ### What problem does this PR solve? Feat: When exporting the agent DSL, the tailkey, password, and history fields need to be cleared. #13281 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 11:41:38 +08:00
dependabot[bot]	b956ad180c	Build(deps): Bump pypdf from 6.7.3 to 6.7.4 (#13298 ) Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.7.3 to 6.7.4. ### Release notes Sourced from [pypdf's releases](https://github.com/py-pdf/pypdf/releases). Version 6.7.4, 2026-02-27. #### Security (SEC) - Allow limiting output length for RunLengthDecode filter ([#3664](https://redirect.github.com/py-pdf/pypdf/issues/3664)) by @stefan6419846 #### Robustness (ROB) - Deal with invalid annotations in extract_links ([#3659](https://redirect.github.com/py-pdf/pypdf/issues/3659)) by @stefan6419846 ### Commits - `1650bc3` REL: 6.7.4 - `f309c60` SEC: Allow limiting output length for RunLengthDecode filter ([#3664](https://redirect.github.com/py-pdf/pypdf/issues/3664)) - `993f052` DEV: Bump actions/upload-artifact from 6 to 7 ([#3662](https://redirect.github.com/py-pdf/pypdf/issues/3662)) - `a3c996b` DEV: Bump actions/download-artifact from 7 to 8 ([#3663](https://redirect.github.com/py-pdf/pypdf/issues/3663)) - `37de320` ROB: Deal with invalid annotations in extract_links ([#3659](https://redirect.github.com/py-pdf/pypdf/issues/3659)) - See full diff in [compare view](https://github.com/py-pdf/pypdf/compare/6.7.3...6.7.4) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-02 11:32:12 +08:00
Idriss Sbaaoui	9d78d3ddb1	Tests: fix failing http in CI (#13301 ) ### What problem does this PR solve? test_doc_sdk_routes_unit had two flaky/incorrect branch assumptions: 1. parse/stop_parsing production logic gates on doc.run, but tests used progress, causing branch mismatch and unintended fallthrough into mutation/DB paths. 2. stop_parsing invalid-state test asserted an outdated message fragment, making the contract brittle. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 10:44:33 +08:00
Jimmy Ben Klieve	7e0dd906f2	refactor: update admin ui (#13280 ) ### What problem does this PR solve? Update for Admin UI: - Update file picker input in Registration whitelist > Import from Excel modal - Modify DOM structure of Sandbox Settings and move several hardcoded texts into translation files ### Type of change - [x] Refactoring	2026-02-28 19:21:51 +08:00
Idriss Sbaaoui	e62552d482	Added some React IDs for playwright e2e tests (#13265 ) ### What problem does this PR solve? Necessary ids for implementing the new testing suite with playwright for UI ### Type of change - [x] Other (please describe): Testing IDs Co-authored-by: Liu An <asiro@qq.com>	2026-02-28 15:13:47 +08:00

5402 Commits