ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-08 08:07:21 +08:00

Author	SHA1	Message	Date
box4wangjing	292b0b8bce	chore: fix some comments to improve readability (#14756 ) ### What problem does this PR solve? fix some comments to improve readability ### Type of change - [x] Documentation Update --------- Signed-off-by: box4wangjing <box4wangjing@outlook.com>	2026-05-11 16:48:48 +08:00
Ricardo-M-L	5ef7f50eef	fix: use context manager for ThreadPoolExecutor in file_service.py (#14144 ) ## Summary - Wrap 2 `ThreadPoolExecutor` instances in `file_service.py` with `with` statement - Ensures threads are properly shut down after all futures complete ## Problem `parse_docs()` (line 532) and the file processing method (line 694) create `ThreadPoolExecutor` instances that are never shut down. In a long-running server process, this leaks thread resources on every invocation — threads remain alive consuming memory even after all submitted work is complete. ## Fix Replace bare `ThreadPoolExecutor()` with `with ThreadPoolExecutor() as exe:` context manager, which calls `executor.shutdown(wait=True)` on exit. ## Test plan - [x] Verified both call sites use `with` statement after fix - [x] No remaining bare `ThreadPoolExecutor` in `file_service.py` - [x] `document_service.py:1066` is a module-level executor (different pattern, not changed in this PR) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 14:02:45 +08:00
web-dev0521	cc207b5b05	Refactor: tidy up ThreadPoolExecutor lifecycle in file_service and task executor (#14668 ) ## Summary - Wrap the `ThreadPoolExecutor` instances in `FileService.parse_docs` and `FileService.get_files` with `with ... as exe:` blocks for deterministic cleanup - Replace the `concurrent.futures.ThreadPoolExecutor` in `do_handle_task` with `asyncio.create_task(asyncio.to_thread(build_TOC, ...))`, preserving the existing parallelism with chunk insertion while leveraging the surrounding async context - Drop the now-unused `import concurrent` and the `executor.shutdown(wait=False)` call in the `finally` block Closes #14622. No behavioral change, no public API change. Net diff: ~19 insertions / 25 deletions across two files. ## Test plan - [ ] `uv run ruff check api/db/services/file_service.py rag/svr/task_executor.py` passes - [ ] Upload a multi-file batch through the chat/file endpoint and confirm `FileService.parse_docs` still returns combined parsed text - [ ] Trigger `FileService.get_files` via the chat reference flow with a mix of image and non-image files; verify both `raw=True` and `raw=False` paths return correctly - [ ] Run a `naive`-parser document task with `toc_extraction: true` and confirm the TOC chunk is generated and inserted exactly as before - [ ] Run a `naive`-parser document task with `toc_extraction: false` and confirm the path with `toc_thread = None` is unaffected - [ ] Cancel a running task to exercise the `finally` block and confirm cleanup still works without the executor shutdown call --------- Co-authored-by: web-dev0521 <jasonpette1783@gmail.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-05-11 12:59:00 +08:00
Hunnyboy1217	782084780e	feat(connectors): ETag-based bypass for incremental S3 ingestion (#14628 ) (#14677 ) ### What problem does this PR solve? S3-family connector syncs currently re-download every in-window object just so we can compute `xxhash128(blob)` and compare against `Document.content_hash`. Anything that bumps `LastModified` without changing bytes (`aws s3 cp` touches, bucket re-encryption, etc.) pays full bandwidth and re-parses files that didn't actually change. #14628 covers the broader incremental-ingestion redesign; this PR is the first slice. The fix is a pre-listing short-circuit. `BlobStorageConnector` (S3 / R2 / GCS / OCI / S3-compat) now implements a new `FingerprintConnector` interface: `list_keys()` paginates `list_objects_v2` and yields `KeyRecord(key, fingerprint)` where `fingerprint = xxhash128(ETag)`. The orchestrator joins those against the connector's existing `{doc_id: content_hash}` map and only calls `get_value(key)` when the fingerprint differs. Unchanged keys are skipped entirely — no `GetObject`, no re-parse. No DDL. xxhash128(ETag) is 32 hex chars and reuses the existing `Document.content_hash` column per @yingfeng's suggestion; the connector decides at listing time whether to populate it. Local uploads and connectors that don't opt in fall through to the existing post-download `xxhash128(blob)` path with no behavior change. This is PR-1 of a 4-PR series — full design lives on #14628. Subsequent PRs extend tier 1 to local FS / WebDAV / Dropbox / Seafile / RDBMS (PR-2), wire up tier 2 cursor connectors with `SyncLogs.next_checkpoint` (PR-3), and unify deletion via `KeyRecord(deleted=True)` reconciliation (PR-4). Holding those back keeps this PR additive and reviewable on its own. #### Files touched - `common/data_source/models.py` — new `KeyRecord`; optional `fingerprint` on `Document` - `common/data_source/interfaces.py` — `IncrementalCapability` enum, `FingerprintConnector` ABC - `common/data_source/blob_connector.py` — `BlobStorageConnector` implements `FingerprintConnector`; per-object download factored into `_build_document_from_obj()` so `_yield_blob_objects`, `list_keys`, `get_value` all share it - `rag/svr/sync_data_source.py` — `_BlobLikeBase._fingerprint_filtered_generator` does the bypass loop; `_run_task_logic` plumbs `doc.fingerprint` into the upload dict - `api/db/services/document_service.py` — `list_id_content_hash_map_by_kb_and_source_type()` helper - `api/db/services/connector_service.py` + `file_service.py` — fingerprint flows through `duplicate_and_parse → upload_document` and lands in `content_hash` - `test/unit_test/common/test_blob_connector_fingerprint.py` — 14 tests covering ETag normalization (single-part, multipart, quoted, empty), `list_keys()` not calling `GetObject`, `get_value()` materializing with fingerprint, deterministic/stable fingerprints, and the bypass loop asserting `GetObject` is not called on a match #### Worth flagging for review Old `_BlobLikeBase._generate` called `poll_source(start, now)` with a `LastModified` window when `poll_range_start` was set. New code uses `_fingerprint_filtered_generator` (full bucket listing + fingerprint compare) outside of explicit `reindex=1`. Strictly better for unchanged-bucket cases since it skips `GetObject`, but it does mean every sync now does a full `list_objects_v2` paginate. Should still be cheap for most buckets — flagging in case anyone has a very large bucket where the time-window filter was meaningful. On migration: existing rows have `content_hash = xxhash128(blob)` from the old code. The first sync after this lands sees ETag-derived fingerprints that don't match, re-fetches every object once, and writes the new fingerprint. From the second sync onward the bypass works as expected. "Slow day one, fast every day after." A `fingerprint_backfill: trust` opt-out is sketched in the design doc but not in this PR. #### Test plan - [x] `uv run ruff check` — clean on all 8 touched files - [x] `uv run pytest test/unit_test/common/test_blob_connector_fingerprint.py -v` — 14 passed - [x] Broader unit-test suite — no regressions in anything I touched - [ ] Manual smoke against a real S3 bucket — configure a connector, run sync twice, expect the second sync to log `bypassed=N, fetched=0` and no `GetObject` calls in CloudTrail / bucket access logs - [ ] Manual smoke with `reindex=1` — confirm the full re-download path still works ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-05-09 20:03:56 +08:00
Yingfeng	4ee0702aed	Feat: add skills space to context engine (#13908 ) ### What problem does this PR solve? issue #13714 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-30 12:36:03 +08:00
euvre	2846a93998	Fix: Remove hardcoded page limits causing parsing failures on large PDFs (>300 pages) (#14382 ) ### What problem does this PR solve? Fixes #14196 ## Problem When using DeepDOC to parse large PDFs (over 1000 pages), the parser silently truncated processing at 300 pages due to a hardcoded default `page_to=299` in `RAGFlowPdfParser.__images__()`. This caused: - Errors on pages beyond the limit - Poor image quality as the parser attempted to compensate with missing page data - Inconsistent chunk splitting between full PDF imports and partial imports Additionally, the codebase scattered magic numbers (`299`, `600`, `10000`, `100000`, `100000000`, `10000000000`, `10*9`) across 22 files as sentinel values for "parse all pages", making future maintenance error-prone. ## Root Cause ```python # deepdoc/parser/pdf_parser.py (before) def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None): # Only the first 300 pages were rendered; everything beyond was silently dropped ``` While most callers in `rag/app/.py` correctly passed `to_page=100000`, the base class `RAGFlowPdfParser.__call__()` and `parse_into_bboxes()` invoked `__images__` without forwarding `page_from`/`page_to`, falling back to the restrictive default of 299. ## Solution ### 1. Define constants in `common/constants.py` ```python MAXIMUM_PAGE_NUMBER = 100000 # Used by the parsing layer MAXIMUM_TASK_PAGE_NUMBER = MAXIMUM_PAGE_NUMBER * 1000 # Used by the task/DB layer ``` ### 2. Replace all hardcoded sentinel values \| Layer \| Files Changed \| Old Values \| New Value \| \|---\|---\|---\|---\| \| Deepdoc parsers \| `pdf_parser.py`, `mineru_parser.py`, `docling_parser.py`, `opendataloader_parser.py`, `paddleocr_parser.py`, `docx_parser.py` \| `299`, `600`, `109`, `100000000` \| `MAXIMUM_PAGE_NUMBER` \| \| Chunk parsers \| `naive.py`, `book.py`, `qa.py`, `one.py`, `manual.py`, `paper.py`, `presentation.py`, `laws.py`, `resume.py`, `email.py`, `table.py` \| `100000`, `10000`, `10000000000` \| `MAXIMUM_PAGE_NUMBER` \| \| Task/DB layer** \| `db_models.py`, `task_service.py`, `document_service.py`, `file_service.py` \| `100000000` \| `MAXIMUM_TASK_PAGE_NUMBER` \| ### 3. Fix `parse_into_bboxes()` missing parameters Added `from_page`/`to_page` parameters to `parse_into_bboxes()` so that the `rag/flow/parser/parser.py` DeepDOC path no longer falls back to the restrictive default. ## Files Changed (22) - `common/constants.py` - `deepdoc/parser/pdf_parser.py` - `deepdoc/parser/mineru_parser.py` - `deepdoc/parser/docling_parser.py` - `deepdoc/parser/opendataloader_parser.py` - `deepdoc/parser/paddleocr_parser.py` - `deepdoc/parser/docx_parser.py` - `rag/app/naive.py` - `rag/app/book.py` - `rag/app/qa.py` - `rag/app/one.py` - `rag/app/manual.py` - `rag/app/paper.py` - `rag/app/presentation.py` - `rag/app/laws.py` - `rag/app/resume.py` - `rag/app/email.py` - `rag/app/table.py` - `api/db/db_models.py` - `api/db/services/task_service.py` - `api/db/services/document_service.py` - `api/db/services/file_service.py` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 14:57:20 +08:00
Xing Hong	fb95136f39	Fix: validate URL scheme and resolved IP before crawling to prevent SSRF (#14090 ) ### What problem does this PR solve? The POST /upload_info?url=<url> endpoint accepted a user-supplied URL and passed it directly to AsyncWebCrawler without any validation. There were no restrictions on URL scheme, destination hostname, or resolved IP address. This allowed any authenticated user to instruct the server to make outbound HTTP requests to internal infrastructure — including RFC 1918 private networks, loopback addresses, and cloud metadata services such as http://169.254.169.254 — effectively using the server as a proxy for internal network reconnaissance or credential theft. This PR adds an SSRF guard (_validate_url_for_crawl) that runs before any crawl is initiated. It enforces an allowlist of safe schemes (http/https), resolves the hostname at validation time, and rejects any URL whose resolved IP falls within a private or reserved network range. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-25 14:30:15 +08:00
NeedmeFordev	c3f79dbcb0	fix(jira): prevent missed incremental updates after issue edits (#13674 ) ### What problem does this PR solve? Fixes [#13505](https://github.com/infiniflow/ragflow/issues/13505): Jira incremental sync could miss updated issues after initial sync, especially near time boundaries. Root cause: - Jira JQL uses minute-level precision for `updated` filters. - Incremental windows had no overlap buffer, so boundary updates could be skipped. - Sync log cursor tracking used a backward-facing update for `poll_range_start`. - Existing-doc updates in `upload_document` lacked a KB ownership guard for doc-id collisions. What changed: - Added Jira incremental overlap buffer (`time_buffer_seconds`, defaulting to `JIRA_SYNC_TIME_BUFFER_SECONDS`) when building JQL lower-bound time. - Preserved second-level post-filtering to avoid duplicate reprocessing while still catching boundary updates. - Improved Jira sync logging to include start/end window and overlap configuration. - Updated sync cursor tracking in `increase_docs` to keep `poll_range_start` moving forward with max update time. - Added KB ID safety check before updating existing document records in `upload_document`. Verification performed: - Python syntax compile checks passed for modified files. - Manual verification flow: 1. Run full Jira sync. 2. Edit an already-indexed Jira issue. 3. Run next incremental sync. 4. Confirm updated content is re-ingested into KB. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 23:31:05 +08:00
BitToby	383986dc5f	fix: re-chunk documents when data source content is updated (#12918 ) Closes: #12889 ### What problem does this PR solve? When syncing external data sources (e.g., Jira, Confluence, Google Drive), updated documents were not being re-chunked. The raw content was correctly updated in blob storage, but the vector database retained stale chunks, causing search results to return outdated information. Root cause: The task digest used for chunk reuse optimization was calculated only from parser configuration fields (`parser_id`, `parser_config`, `kb_id`, etc.), without any content-dependent fields. When a document's content changed but the parser configuration remained the same, the system incorrectly reused old chunks instead of regenerating new ones. Example scenario: 1. User syncs a Jira issue: "Meeting scheduled for Monday" 2. User updates the Jira issue to: "Meeting rescheduled to Friday" 3. User triggers sync again 4. Raw content panel shows updated text ✓ 5. Chunk panel still shows old text "Monday" ✗ Solution: 1. Include `update_time` and `size` in the chunking config, so the task digest changes when document content is updated 2. Track updated documents separately in `upload_document()` and return them for processing 3. Process updated documents through the re-parsing pipeline to regenerate chunks [1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 12:48:47 +08:00
statxc	839b603768	feat: Add PDF parser selection to Agent Begin and Await Response comp… (#13325 ) ### Issue: #12756 ### What problem does this PR solve? When users upload files through Agent's Begin or Await Response components, the parsing is hardcoded to "Plain Text", ignoring all other available parsers (DeepDOC, TCADP, Docling, MinerU, PaddleOCR). This PR adds a PDF parser dropdown to these components so users can select the appropriate parser for their file inputs. ### Changes Backend - `agent/component/fillup.py` - Added `layout_recognize` param to `UserFillUpParam`, forwarded to `FileService.get_files()` - `agent/component/begin.py` - Same forwarding in `Begin._invoke()` - `agent/canvas.py` - Extract Begin's `layout_recognize` for `sys.files` parsing, added param to `get_files_async()` / `get_files()` - `api/db/services/file_service.py` - Added `layout_recognize` param to `parse()` and `get_files()`, replacing hardcoded `"Plain Text"` - `rag/app/naive.py` - Added `"plain text"` and `"tcadp parser"` aliases to PARSERS dict to match dropdown values after `.lower()` Frontend - `web/src/pages/agent/form/begin-form/index.tsx` - Show `LayoutRecognizeFormField` dropdown when file inputs exist - `web/src/pages/agent/form/begin-form/schema.ts` - Added `layout_recognize` to Zod schema - `web/src/pages/agent/form/user-fill-up-form/index.tsx` - Same dropdown for Await Response component ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-04 11:09:33 +08:00
Magicbook1108	109441628b	Fix: upload image files (#13071 ) ### What problem does this PR solve? Fix: upload image files ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-11 09:47:33 +08:00
Magicbook1108	b40a7b2e7d	Feat: Hash doc id to avoid duplicate name. (#12573 ) ### What problem does this PR solve? Feat: Hash doc id to avoid duplicate name. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-15 14:02:15 +08:00
wenjuhao	55c0468ac9	Include document_id in knowledgebase info retrieval (#12041 ) ### What problem does this PR solve? After a file in the file list is associated with a knowledge base, the knowledge base document ID is returned ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-19 19:32:24 +08:00
Jin Hai	30019dab9f	Change knowledge base to dataset (#11976 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-12-17 10:03:33 +08:00
Yongteng Lei	b6c4722687	Refa: make RAGFlow more asynchronous (#11601 ) ### What problem does this PR solve? Try to make this more asynchronous. Verified in chat and agent scenarios, reducing blocking behavior. #11551, #11579. However, the impact of these changes still requires further investigation to ensure everything works as expected. ### Type of change - [x] Refactoring	2025-12-01 14:24:06 +08:00
Kevin Hu	14616cf845	Feat: add child parent chunking method in backend. (#11598 ) ### What problem does this PR solve? #7996 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:25:32 +08:00
Kevin Hu	1c201c4d54	Fix: circle imports issue. (#11374 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-19 16:13:21 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
YngvarHuang	bd5dda6b10	Feature/doc upload api add parent path 20251112 (#11231 ) ### What problem does this PR solve? Add the specified parent_path to the document upload api interface (#11230) ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: virgilwong <hyhvirgil@gmail.com>	2025-11-13 09:59:39 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Kevin Hu	3e5a39482e	Feat: Support multiple data sources synchronizations (#10954 ) ### What problem does this PR solve? #10953 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 19:59:18 +08:00
Jin Hai	f52e56c2d6	Remove 'get_lan_ip' and add common misc_utils.py (#10880 ) ### What problem does this PR solve? Add get_uuid, download_img and hash_str2int into misc_utils.py ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-31 16:42:01 +08:00
Yongteng Lei	86b254d214	Improve file management (#10577 ) ### What problem does this PR solve? Improve file management. #10287. Passed tests: 1. Create folder `A` and `B`. 2. Upload a file inside `A`, called `file`. 3. Create a KB, called `K`. 3. Link `file` to `K`. 4. Parse `file` inside of `K`. (OK) 5. Move `file` from `A` to `B`. 6. Parse `file` inside of `K`. (OK) 7. Move `file` from `B` to `A`. 8. Parse `file` inside of `K`. (OK) 9. Move entire folder `A` into `B`. (B -> A -> file) 10. Parse `file` inside of `K`. (OK) 11. Delete folder `B`. 12. All clear. (There is no document inside of `K`) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-16 09:38:25 +08:00
Kevin Hu	cbf04ee470	Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423 ) ### What problem does this PR solve? #9869 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: jinhai <haijin.chn@gmail.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: chanx <1243304602@qq.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com> Co-authored-by: Lynn <lynn_inf@hotmail.com> Co-authored-by: 纷繁下的无奈 <zhileihuang@126.com> Co-authored-by: huangzl <huangzl@shinemo.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Wilmer <33392318@qq.com> Co-authored-by: Adrian Weidig <adrianweidig@gmx.net> Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: Liu An <asiro@qq.com> Co-authored-by: buua436 <66937541+buua436@users.noreply.github.com> Co-authored-by: BadwomanCraZY <511528396@qq.com> Co-authored-by: cucusenok <31804608+cucusenok@users.noreply.github.com> Co-authored-by: Russell Valentine <russ@coldstonelabs.org> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Billy Bao <newyorkupperbay@gmail.com> Co-authored-by: Zhedong Cen <cenzhedong2@126.com> Co-authored-by: TensorNull <129579691+TensorNull@users.noreply.github.com> Co-authored-by: TensorNull <tensor.null@gmail.com> Co-authored-by: TeslaZY <TeslaZY@outlook.com> Co-authored-by: Ajay <160579663+aybanda@users.noreply.github.com> Co-authored-by: AB <aj@Ajays-MacBook-Air.local> Co-authored-by: 天海蒼灆 <huangaoqin@tecpie.com> Co-authored-by: He Wang <wanghechn@qq.com> Co-authored-by: Atsushi Hatakeyama <atu729@icloud.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Mohamed Mathari <155896313+melmathari@users.noreply.github.com> Co-authored-by: Mohamed Mathari <nocodeventure@Mac-mini-van-Mohamed.fritz.box> Co-authored-by: Stephen Hu <stephenhu@seismic.com> Co-authored-by: Shaun Zhang <zhangwfjh@users.noreply.github.com> Co-authored-by: zhimeng123 <60221886+zhimeng123@users.noreply.github.com> Co-authored-by: mxc <mxc@example.com> Co-authored-by: Dominik Novotný <50611433+SgtMarmite@users.noreply.github.com> Co-authored-by: EVGENY M <168018528+rjohny55@users.noreply.github.com> Co-authored-by: mcoder6425 <mcoder64@gmail.com> Co-authored-by: lemsn <lemsn@msn.com> Co-authored-by: lemsn <lemsn@126.com> Co-authored-by: Adrian Gora <47756404+adagora@users.noreply.github.com> Co-authored-by: Womsxd <45663319+Womsxd@users.noreply.github.com> Co-authored-by: FatMii <39074672+FatMii@users.noreply.github.com>	2025-10-09 12:36:19 +08:00
Lynn	2d5d10ecbf	Feat/admin drop user (#10342 ) ### What problem does this PR solve? - Admin client support drop user. Issue: #10241 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-09-29 10:16:13 +08:00
Yongteng Lei	2b3318cd3d	Fix: KB folder may not there while creating virtual file (#9431 ) ### What problem does this PR solve? KB folder may not there while creating virtual file. #9423 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-08-14 09:40:30 +08:00
Kevin Hu	d9fe279dde	Feat: Redesign and refactor agent module (#9113 ) ### What problem does this PR solve? #9082 #6365 <u> WARNING: it's not compatible with the older version of `Agent` module, which means that `Agent` from older versions can not work anymore.</u> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-30 19:41:09 +08:00
Yongteng Lei	c1f6e6f00e	Feat: add advanced document filter (#8723 ) ### What problem does this PR solve? Add advanced document filter ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-07-09 09:33:11 +08:00
Liu An	0a13d79b94	Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318 ) ### What problem does this PR solve? - Replace hardcoded 255-byte file name length checks with FILE_NAME_LEN_LIMIT constant - Update error messages to show the actual limit value - #8290 ### Type of change - [x] Refactoring Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-06-17 18:01:30 +08:00
Jin Hai	31f4d44c73	Update upload filename length limit from 128 to 256, which is aligned with os (#7971 ) ### What problem does this PR solve? Change filename length limit from 128 to 256 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-05-30 14:25:59 +08:00
Yongteng Lei	0ebf05440e	Feat: repair corrupted PDF files on upload automatically (#7693 ) ### What problem does this PR solve? Try the best to repair corrupted PDF files on upload automatically. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-19 14:54:06 +08:00
Stephen Hu	2fa8e3309f	Fix: file name length limit mismtach (#7630 ) ### What problem does this PR solve? Close #7597 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-05-14 10:13:03 +08:00
Xinghan Pan	47926f7d21	Improve API Documentation, Standardize Error Handling, and Enhance Comments (#5990 ) ### What problem does this PR solve? - The API documentation lacks detailed error code explanations. Added error code tables to `python_api_reference.md` and `http_api_reference.md` to clarify possible error codes and their meanings. - Error handling in the codebase is inconsistent. Standardized error handling logic in `sdk/python/ragflow_sdk/modules/chunk.py`. - Improved API comments by adding standardized docstrings to enhance code readability and maintainability. ### Type of change - [x] Documentation Update - [x] Refactoring	2025-03-13 19:06:50 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426 ) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Kevin Hu	cb45431412	Fix Voyage re-rank model. Limit file name length. (#4171 ) ### What problem does this PR solve? #4152 #4154 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-12-23 10:03:50 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
Zhichang Yu	30f6421760	Use consistent log file names, introduced initLogger (#3403 ) ### What problem does this PR solve? Use consistent log file names, introduced initLogger ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [x] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2024-11-14 17:13:48 +08:00
Kevin Hu	4caf932808	fix bug about fetching knowledge graph (#3394 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-11-14 12:29:15 +08:00
Zhichang Yu	a2a5631da4	Rework logging (#3358 ) Unified all log files into one. ### What problem does this PR solve? Unified all log files into one. ### Type of change - [x] Refactoring	2024-11-12 17:35:13 +08:00
chongchuanbing	2a86472b88	fix chat and thumbnail bug (#2803 ) ### What problem does this PR solve? 1. fix white screen issue when chat response 2. thumbnail bug when document not support ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): --------- Co-authored-by: chongchuanbing <chongchuanbing@gmail.com>	2024-10-11 16:10:27 +08:00
chongchuanbing	485bfd6c08	fix: Large document thumbnail display failed (#2763 ) ### What problem does this PR solve? In MySQL, when the thumbnail base64 of a document is relatively large, the display of the document's thumbnail fails. Now, I put the document thumbnail into MiniIO storage. ### Type of change - [✓] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: chongchuanbing <chongchuanbing@gmail.com>	2024-10-10 09:09:29 +08:00
Kevin Hu	2324b88579	fix parser for pptx of which files are from filemanager (#2482 ) ### What problem does this PR solve? #2467 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-18 19:13:37 +08:00
Kevin Hu	e7dd487779	fix ppt file from filemanager error (#2470 ) ### What problem does this PR solve? #2467 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-09-18 09:22:14 +08:00
Fachuan Bai	8dd3adc443	Storage: Support the s3, azure blob as the object storage of ragflow. (#2278 ) ### What problem does this PR solve? issue: https://github.com/infiniflow/ragflow/issues/2277 _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-09-09 09:41:14 +08:00
Kevin Hu	4810cb2dc9	refine upload & parse (#1969 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-08-15 19:30:43 +08:00
Kevin Hu	8badf3f423	fix api argument error (#1920 ) ### What problem does this PR solve? #1918 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-08-13 10:15:55 +08:00
Kevin Hu	4fecc2fae6	fix no modual error (#1892 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2024-08-09 17:02:18 +08:00
Kevin Hu	e3cf14a3c9	add function: upload and parse (#1889 ) ### What problem does this PR solve? #1880 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-08-09 16:20:02 +08:00

1 2

58 Commits