### What problem does this PR solve?
Feat: Clear the tailkey, password, and history fields when exporting the agent DSL. #13281
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
`test_doc_sdk_routes_unit` had two flaky/incorrect branch assumptions:
1. The `parse`/`stop_parsing` production logic gates on `doc.run`, but the tests used `progress`, causing a branch mismatch and unintended fallthrough into mutation/DB paths.
2. The `stop_parsing` invalid-state test asserted an outdated message fragment, making the contract brittle.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Update for Admin UI:
- Update file picker input in **Registration whitelist** > **Import from
Excel** modal
- Modify DOM structure of **Sandbox Settings** and move several
hardcoded texts into translation files
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Adds the IDs necessary for implementing the new Playwright testing suite for the UI.
### Type of change
- [x] Other (please describe): Testing IDs
Co-authored-by: Liu An <asiro@qq.com>
### What problem does this PR solve?
Properly close detached PIL image on JPEG save failure in encode_image.
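The fix follows a standard ensure-close-on-failure pattern. A minimal sketch with a stand-in image object (the real code operates on a detached `PIL.Image`; `FakeImage` and the simplified `encode_image` below are illustrative):

```python
import io


class FakeImage:
    """Stand-in for a detached PIL image; only save/close semantics matter."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.closed = False

    def save(self, buf, fmt):
        if self.fail:
            # Mimics PIL failing to encode, e.g. RGBA data as JPEG.
            raise OSError("cannot write mode RGBA as JPEG")

    def close(self):
        self.closed = True


def encode_image(img):
    buf = io.BytesIO()
    try:
        img.save(buf, "JPEG")
        return buf.getvalue()
    except OSError:
        return None
    finally:
        # Always release the detached image, even when the save fails;
        # before the fix, a failed save left the image unclosed.
        img.close()
```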
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
When the original code terminates a parsing task halfway, the progress may be neither 0 nor 1, which makes it impossible to call the interface to parse the document again.
- Change the document parsing progress check to a task status check, using `TaskStatus.RUNNING.value` for the comparison.
- Update the condition for stopping document parsing to check whether the task is running instead.
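The idea can be sketched as follows (the enum values and the `can_start_parsing` helper are illustrative; the real `TaskStatus` lives in RAGFlow's task module):

```python
from enum import Enum


class TaskStatus(Enum):
    # Illustrative values only; RAGFlow defines the real enum.
    RUNNING = "1"
    CANCEL = "2"
    DONE = "3"


def can_start_parsing(task_status: str) -> bool:
    """Allow re-parsing unless the task is currently running.

    The old progress-based check only allowed re-parsing when progress
    was exactly 0 or 1, so a task cancelled halfway (progress 0.37, say)
    could never be parsed again. Gating on task status avoids that.
    """
    return task_status != TaskStatus.RUNNING.value
```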
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This pull request refactors the chat session creation and deletion logic
in both the parser and client code to use unique session IDs instead of
session names. It also updates the corresponding command syntax and
payloads, ensuring more robust and unambiguous session management.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Create / Drop / List chat sessions
2. Chat with LLM and datasets
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
This pull request makes a small but important fix to how streaming
requests are handled in the `completion` endpoint of
`conversation_app.py`. The main change ensures that the `stream`
argument is not passed twice, which could cause errors.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
**Summary**
This PR tackles a significant memory bottleneck when processing
image-heavy Word documents. Previously, our pipeline eagerly decoded
DOCX images into `PIL.Image` objects, which caused high peak memory
usage. To solve this, I've introduced a **lazy-loading approach**:
images are now stored as raw blobs and only decoded exactly when and
where they are consumed.
This successfully reduces the memory footprint while keeping the parsing
output completely identical to before.
**What's Changed**
Instead of a dry file-by-file list, here is the logical breakdown of the
updates:
* **The Core Abstraction (`lazy_image.py`)**: Introduced `LazyDocxImage`
along with helper APIs to handle lazy decoding, image-type checks, and
NumPy compatibility. It also supports `.close()` and detached PIL access
to ensure safe lifecycle management and prevent memory leaks.
* **Pipeline Integration (`naive.py`, `figure_parser.py`, etc.)**:
Updated the general DOCX picture extraction to return these new lazy
images. Downstream consumers (like the figure/VLM flow and base64
encoding paths) now decode images right at the use site using detached
PIL instances, avoiding shared-instance side effects.
* **Compatibility Hooks (`operators.py`, `book.py`, etc.)**: Added
necessary compatibility conversions so these lazy images flow smoothly
through existing merging, filtering, and presentation steps without
breaking.
**Scope & What is Intentionally Left Out**
To keep this PR focused, I have restricted these changes strictly to the
**general Word pipeline** and its downstream consumers.
The `QA` and `manual` Word parsing pipelines are explicitly **not
modified** in this PR. They can be safely migrated to this new lazy-load
model in a subsequent, standalone PR.
**Design Considerations**
I briefly considered adding image compression during processing, but
decided against it to avoid any potential quality degradation in the
derived outputs. I also held off on a massive pipeline re-architecture
to avoid overly invasive changes right now.
**Validation & Testing**
I've tested this to ensure no regressions:
* Compared identical DOCX inputs before and after this branch: chunk
counts, extracted text, table HTML, and image descriptions match
perfectly.
* **Confirmed a noticeable drop in peak memory usage when processing
image-dense documents.** For a 30MB Word document containing 243 1080p
screenshots, memory consumption is reduced by approximately 1.5GB.
**Breaking Changes**
None.
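The core abstraction can be sketched roughly as below (a simplified `LazyDocxImage`; the decoder callback, the counter, and the `close` behavior shown here are illustrative, while the real class also handles image-type checks and NumPy compatibility):

```python
class LazyDocxImage:
    """Hold raw image bytes; decode only when a consumer asks for pixels."""

    def __init__(self, blob: bytes, decoder):
        self._blob = blob        # cheap: just bytes, no decoded pixels
        self._decoder = decoder  # e.g. PIL.Image.open over a BytesIO

    @property
    def blob(self) -> bytes:
        return self._blob

    def to_pil(self):
        # Decode a fresh, detached instance at the use site, so consumers
        # never share (and never mutate) a cached decoded image.
        return self._decoder(self._blob)

    def close(self):
        self._blob = b""  # drop the payload so memory can be reclaimed


decode_calls = 0


def counting_decoder(blob):
    global decode_calls
    decode_calls += 1
    return f"<decoded {len(blob)} bytes>"


img = LazyDocxImage(b"\x89PNG...", counting_decoder)
assert decode_calls == 0  # nothing decoded at extraction time
img.to_pil()
assert decode_calls == 1  # decoded exactly at the use site
```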
### What problem does this PR solve?
Added the option to delete models individually from providers.
For additional context, see
[issue-13184](https://github.com/infiniflow/ragflow/issues/13184)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Note: when deleting a selected model, the full model name remains as plain text, as seen here:
<img width="676" height="90" alt="image"
src="https://github.com/user-attachments/assets/c11c7c1b-3f2a-4119-b20c-bb8148a8ad16"
/>
If attempting to use RAGFlow with that deleted model, RAGFlow will throw an unauthorized model error as expected.
I left it like that on purpose, so it's easier for users to understand what they deleted and that it needs to be replaced with another model.
Co-authored-by: Shahar Flumin <shahar@Shahars-MacBook-Air.local>
### What problem does this PR solve?
The SeaFile connector currently synchronises the entire account — every
library
visible to the authenticated user. This is impractical for users who
only need
a subset of their data indexed, especially on large SeaFile instances
with many
shared libraries.
This PR introduces granular sync scope support, allowing users to choose
between
syncing their entire account, a single library, or a specific directory
within a
library. It also adds support for SeaFile library-scoped API tokens
(`/api/v2.1/via-repo-token/` endpoints), enabling tighter access control
without
exposing account-level credentials.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Test
```python
from seafile_connector import SeaFileConnector
import logging
import os

logging.basicConfig(level=logging.DEBUG)

URL = os.environ.get("SEAFILE_URL", "https://seafile.example.com")
TOKEN = os.environ.get("SEAFILE_TOKEN", "")
REPO_ID = os.environ.get("SEAFILE_REPO_ID", "")
SYNC_PATH = os.environ.get("SEAFILE_SYNC_PATH", "/Documents")
REPO_TOKEN = os.environ.get("SEAFILE_REPO_TOKEN", "")


def _test_scope(scope, repo_id=None, sync_path=None):
    print(f"\n{'='*50}")
    print(f"Testing scope: {scope}")
    print(f"{'='*50}")
    creds = {"seafile_token": TOKEN} if TOKEN else {}
    if REPO_TOKEN and scope in ("library", "directory"):
        creds["repo_token"] = REPO_TOKEN
    connector = SeaFileConnector(
        seafile_url=URL,
        batch_size=5,
        sync_scope=scope,
        include_shared=False,
        repo_id=repo_id,
        sync_path=sync_path,
    )
    connector.load_credentials(creds)
    connector.validate_connector_settings()
    count = 0
    for batch in connector.load_from_state():
        for doc in batch:
            count += 1
            print(f"  [{count}] {doc.semantic_identifier} "
                  f"({doc.size_bytes} bytes, {doc.extension})")
    print(f"\n-> {scope} scope: {count} document(s) found.\n")


# 1. Account scope
if TOKEN:
    _test_scope("account")
else:
    print("\nSkipping account scope (set SEAFILE_TOKEN)")

# 2. Library scope
if REPO_ID and (TOKEN or REPO_TOKEN):
    _test_scope("library", repo_id=REPO_ID)
else:
    print("\nSkipping library scope (set SEAFILE_REPO_ID + token)")

# 3. Directory scope
if REPO_ID and SYNC_PATH and (TOKEN or REPO_TOKEN):
    _test_scope("directory", repo_id=REPO_ID, sync_path=SYNC_PATH)
else:
    print("\nSkipping directory scope (set SEAFILE_REPO_ID + SEAFILE_SYNC_PATH + token)")
```
### What problem does this PR solve?
Update **Chat** UI:
- Align to the design.
- Update `<AudioButton>` visualizer logic.
- Fix keyboard navigation issue.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
The `_transfer_to_sections` method was raising a type hint violation because it occasionally returns 3-item tuples instead of 2-item ones. Adjusted the annotation to `list[tuple[str, ...]]` to prevent runtime crashes.
Error:
```
20:53:21 Page(1~10): [ERROR]Internal server error while chunking: Method
deepdoc.parser.docling_parser.DoclingParser._transfer_to_sections()
return [('1. JIRA Nasıl Kullanılır?', 'text', '@@1\t70.8\t194.9\t70.9\t85.5##'),
('1.1. Proje O...##')] violates type hint list[tuple[str, str]], as list
index 15 item tuple ('Gelen ekran üzerinden alanları isterlerine göre
doldurduğunuz taktirde Create düğmesi i...##') length 3 != 2.
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Enes Delibalta <enes.delibalta@pentanom.com>
### What problem does this PR solve?
Refer to issue: #13236
The base URL for the GPUStack chat model requires a `/v1` suffix. For other model types like `Embedding` or `Rerank`, the `/v1` suffix is not required and is appended in code.
This change applies the same logic to the chat model as to the other model types.
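The shared normalization can be sketched as below (the function name is illustrative; the real logic lives in the GPUStack model wrappers):

```python
def normalize_gpustack_base_url(base_url: str) -> str:
    """Append the /v1 suffix if absent, so users can enter the bare server URL."""
    base_url = base_url.rstrip("/")
    if not base_url.endswith("/v1"):
        base_url += "/v1"
    return base_url
```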
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR fixes 2 bugs related to RAGFlow's init superuser functionality.
#### Bug 1
When the RAGFlow server was started with the `--init-superuser` option
it would always create a new admin user even if it already exists
resulting in duplicate users.
To fix this, I added an additional check before creating the superuser and added a *unique* constraint to the email column of the database to mitigate potential TOCTOU race conditions. Since existing databases could contain duplicate emails, I added email de-duplication to the database migration.
#### Bug 2
When the RAGFlow server was started with the `--init-superuser` option but without configured default LLM and embedding models, it would fail to start because the `init_superuser` function would always make a test request to the models even if they were not set.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: The output of the multi-model comparison disappears. #13227
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Make the embedded page of chat compatible with mobile devices.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR adds [Avian](https://avian.io) as a new LLM provider to RAGFlow.
Avian provides an OpenAI-compatible API with competitive pricing,
offering access to models like DeepSeek V3.2, Kimi K2.5, GLM-5, and
MiniMax M2.5.
**Provider details:**
- API Base URL: `https://api.avian.io/v1`
- Auth: Bearer token via API key
- OpenAI-compatible (chat completions, streaming, function calling)
- Models:
- `deepseek/deepseek-v3.2` — 164K context, $0.26/$0.38 per 1M tokens
- `moonshotai/kimi-k2.5` — 131K context, $0.45/$2.20 per 1M tokens
- `z-ai/glm-5` — 131K context, $0.30/$2.55 per 1M tokens
- `minimax/minimax-m2.5` — 1M context, $0.30/$1.10 per 1M tokens
**Changes:**
- `rag/llm/chat_model.py` — Add `AvianChat` class extending `Base`
- `rag/llm/__init__.py` — Register in `SupportedLiteLLMProvider`,
`FACTORY_DEFAULT_BASE_URL`, `LITELLM_PROVIDER_PREFIX`
- `conf/llm_factories.json` — Add Avian factory with model definitions
- `web/src/constants/llm.ts` — Add to `LLMFactory` enum, `IconMap`,
`APIMapUrl`
- `web/src/components/svg-icon.tsx` — Register SVG icon
- `web/src/assets/svg/llm/avian.svg` — Provider icon
- `docs/references/supported_models.mdx` — Add to supported models table
This follows the same pattern as other OpenAI-compatible providers
(e.g., n1n #12680, TokenPony).
cc @KevinHuSh @JinHai-CN
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210)
Remove the hard limit in `_search_metadata` and use pagination instead.
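The pattern is roughly the following (the `fetch_page` callback stands in for the actual metadata query; names are illustrative):

```python
def search_all_metadata(fetch_page, page_size=1024):
    """Collect every match by paging instead of one size-capped query.

    fetch_page(offset, limit) returns up to `limit` rows starting at
    `offset`; a short page signals that the result set is exhausted.
    """
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            break
        offset += page_size
    return results
```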
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This pull request makes a minor update to the English locale strings for
the Table of Contents toggle buttons, changing the labels from "Show
TOC"/"Hide TOC" to "Show content"/"Hide content" for improved clarity.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.
### Type of change
- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
### What problem does this PR solve?
Removed failure mode checklist per your request. @JinHai-CN
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Fix: When the agent is embedded in a webpage, interrupting its operation redirects to the login page. #12697
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes the initial enabled/disabled state of chat variable checkboxes by correcting a helper function that contained an unconditional early return.
## Problem
The helper had two return statements. Because of the early one, the function always returned the same value, so all chat variable checkboxes were initially disabled regardless of the field. This also made the helper inconsistent with the checkbox map initialization, which enables all fields by default except one.
## Fix
Update the helper to use the same condition as the checkbox map initialization. This ensures:
- All chat variable checkboxes are enabled by default
- The one intentionally excluded field remains the only field disabled by default
- Behavior is consistent between the helper and the checkbox map initialization

No API or backend changes are involved; this is a small, isolated frontend bugfix.
### What problem does this PR solve?
Feat: optimize ingestion pipeline with preprocess
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR adds a new guide: **"RAG failure modes checklist"**.
RAG systems often fail in ways that are not immediately visible from a
single metric like accuracy or latency. In practice, debugging
production RAG applications requires identifying recurring failure
patterns across retrieval, routing, evaluation, and deployment stages.
This guide introduces a structured, pattern-based checklist (P01–P12) to
help users interpret traces, evaluation results, and dataset behavior
within RAGFlow. The goal is to provide a practical way to classify
incidents (e.g., retrieval hallucination, chunking issues, index
staleness, routing misalignment) and reason about minimal structural
fixes rather than ad-hoc prompt changes.
The change is documentation-only and does not modify any code or
configuration.
Refs #13138
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.
### Type of change
- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
### What problem does this PR solve?
Fix: Note component text area does not resize with component #13065
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
User experience enhancement for variable picker in prompt editor:
- Add case-insensitive string search for variables.
- Add basic keyboard navigation in variable picker:
- Hit <kbd>UpArrow</kbd> and <kbd>DownArrow</kbd> for navigating.
- Hit <kbd>Tab</kbd> or <kbd>Enter</kbd> for selecting focused item into
editor.
- Fix unexpectedly inserting invalid variable into editor by hitting
<kbd>Tab</kbd>.
_Note: you still need to pick variables inside secondary menu (agent
structured output, etc.) by using your pointing device. May finish these
later._
### Type of change
- [x] Refactoring
Actual behavior:
When using OceanBase as storage, the `list_chunk` sorting is abnormal. The following is the SQL statement:
```sql
SELECT id, content_with_weight, important_kwd, question_kwd, img_id,
       available_int, position_int, doc_type_kwd, create_timestamp_flt,
       create_time, array_to_string(page_num_int, ',') AS page_num_int_sort,
       array_to_string(top_int, ',') AS top_int_sort
FROM rag_store_284250730805059584
WHERE doc_id = '' AND kb_id IN ('')
ORDER BY page_num_int_sort ASC, top_int_sort ASC, create_timestamp_flt DESC
LIMIT 0, 20
```
<img width="1610" height="740" alt="image"
src="https://github.com/user-attachments/assets/84e14c30-a97f-4e8f-8c8c-6ccac915d97d"
/>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
### What problem does this PR solve?
When users start RAGFlow with `docker compose -p <alias>`, Docker
creates volumes prefixed with the alias (e.g., `myproject_mysql_data`).
The migration script (`docker/migration.sh`) previously hardcoded the
`docker_` prefix in volume names, causing backup/restore to silently
skip all volumes for any non-default project name.
This PR adds a `-p <project_name>` option so the script correctly
targets volumes regardless of the Docker Compose project name used.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Add `-p <project_name>` flag (default: `docker`) for specifying Docker
Compose project name
- Build volume names dynamically: `${project_name}_${base_name}`
- Update help text with new option documentation and examples
- Show project-aware `docker compose` commands in error messages
- Fix deprecated `docker-compose` to `docker compose` in hints
- Use dynamic step count instead of hardcoded `4`
- Fully backward compatible — existing usage without `-p` works
unchanged
### Usage
```bash
# Existing usage (unchanged)
./migration.sh backup
./migration.sh restore my_backup
# New: custom project name
./migration.sh -p myproject backup
./migration.sh -p myproject restore my_backup
```
### What problem does this PR solve?
Fix authorization bypass (IDOR) in `/v1/document/web_crawl` allows
Cross-Tenant Dataset Modification.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The RDBMS (MySQL/PostgreSQL) connector generates document filenames using the first 100 characters of the content column (`semantic_identifier`). When the content contains newline characters (`\n`), the resulting filename includes those newlines, for example:
`Category: غير صحيح كليًا\nTitle: تفنيد حقائق....txt`
RAGFlow's `filename_type()` function uses `re.match(r".*\.txt$", filename)` to detect file types, but `.*` does not match newline characters by default in Python regex. This causes the regex to fail, returning `FileType.OTHER`, which triggers:
```python
raise RuntimeError("This type of file has not been supported yet!")
```
As a result, all documents synced via the MySQL/PostgreSQL connector are silently discarded. The sync logs report success (e.g., "399 docs synchronized"), but zero documents actually appear in the dataset. This is the root cause of issue #13001.
Root cause trace:
- `rdbms_connector.py` → `_row_to_document()` sets `semantic_identifier` from raw content (may contain `\n`)
- `connector_service.py` → `duplicate_and_parse()` uses `semantic_identifier` as the filename
- `file_service.py` → `upload_document()` calls `filename_type(filename)`
- `file_utils.py` → `filename_type()` regex `.*\.txt$` fails on newlines → returns `FileType.OTHER`
- `upload_document()` raises "This type of file has not been supported yet!"
Fix: Sanitize the `semantic_identifier` in `_row_to_document()` by replacing newlines and carriage returns with spaces before truncating to 100 characters.
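A sketch of the sanitization, plus a demonstration of the regex behavior it works around (the helper name is illustrative, not the actual function in the connector):

```python
import re


def sanitize_semantic_identifier(text: str, max_len: int = 100) -> str:
    """Replace newlines/carriage returns with spaces, then truncate."""
    return re.sub(r"[\r\n]+", " ", text)[:max_len]


# Why it matters: without re.DOTALL, `.` does not match `\n`, so a
# filename containing a newline never matches the .txt pattern.
assert re.match(r".*\.txt$", "Category: x\nTitle: y.txt") is None
# After sanitization, the same name is detected as .txt again.
assert re.match(r".*\.txt$", sanitize_semantic_identifier("Category: x\nTitle: y.txt"))
```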
Relates to: #13001, #12817
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>