ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-04-29 14:57:48 +08:00

Author	SHA1	Message	Date
yuch85	3ad3241ae0	feat: persist RAPTOR layer metadata on summary chunks (#13286 ) ## Summary RAPTOR's recursive clustering builds a `layers` list tracking `(start_idx, end_idx)` boundaries per level, but currently discards this information — only the flat `chunks` list is returned. This makes it impossible to distinguish leaf-level summaries from top-level ones. This PR: - Returns `(chunks, layers)` tuple from `raptor.py`'s `__call__` - Annotates each RAPTOR summary chunk with `raptor_layer_int` (1 = first summary level, 2 = summary-of-summaries, etc.) - Adds `raptor_layer_int` to `infinity_mapping.json` (Elasticsearch handles it via existing `_int` dynamic template) ### Why this matters Downstream features need to know which RAPTOR layer a summary belongs to: - Retrieving the top-level document summary for entity extraction, search snippets, or document comparison - Filtering by abstraction level — users may want only high-level summaries or only leaf-level cluster summaries - RAPTOR recall quality — #10951 reports summaries not being recalled for definition queries; layer metadata enables targeted retrieval ### Changes \| File \| Change \| LOC \| \|------\|--------\|-----\| \| `rag/raptor.py` \| Return `(chunks, layers)` tuple \| ~3 \| \| `rag/svr/task_executor.py` \| Build `chunk_layer` mapping, set `raptor_layer_int` \| ~12 \| \| `conf/infinity_mapping.json` \| Add `raptor_layer_int` integer field \| ~1 \| ### Backward compatibility - Additive only — no existing fields or behavior changed - Existing RAPTOR chunks continue to work (they'll have `raptor_layer_int = 0` by default) - New RAPTOR chunks get layer metadata automatically ## Test plan - [ ] Parse a document with RAPTOR enabled, verify `raptor_layer_int` is set on indexed chunks - [ ] Verify `raptor_layer_int` values increase with abstraction level (layer 1 < layer 2 < ...) - [ ] Verify existing RAPTOR deletion (`delete by raptor_kwd`) still works - [ ] Verify Infinity backend accepts the new field Fixes #7488 Related: #4104, #11191, #10951 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: yuch85 <yuch85.1@gmail.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-04-27 10:20:46 +08:00
Magicbook1108	69264b3a70	Feat: Refact pipeline (#13826 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 19:26:45 +08:00
Liu An	2240fc778c	Fix: add missing "mom" field to infinity_mapping.json for parent-child chunker (#13821 ) ### What problem does this PR solve? When using Infinity as DOC_ENGINE with parent-child chunker enabled, vector insertion fails because the "mom" field is missing from the index mapping. This fix adds the required field to resolve the issue. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-27 13:06:18 +08:00
akie	51210a1762	Add secondary index to infinity (#12825 ) Add secondary index: 1. kb_id 2. available_int --------- Signed-off-by: zpf121 <1219290549@qq.com> Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-02-02 13:22:29 +08:00
6ba3i	4f036a881d	Fix: Infinity keyword round-trip, highlight fallback, and KB update guards (#12660 ) ### What problem does this PR solve? Fixes Infinity-specific API regressions: preserves ```important_kwd``` round‑trip for ```[""]```, restores required highlight key in retrieval responses, and enforces Infinity guards for unsupported ```parser_id=tag``` and pagerank in ```/v1/kb/update```. Also removes a slow/buggy pandas row-wise apply that was throwing ```ValueError``` and causing flakiness. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-16 20:03:52 +08:00
Kevin Hu	bd76b8ff1a	Fix: Tika server upgrades. (#12073 ) ### What problem does this PR solve? #12037 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-23 09:35:52 +08:00
buua436	dd046be976	Fix: parent-child chunking method (#11810 ) ### What problem does this PR solve? change: parent-child chunking method ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-09 09:34:01 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444 ) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Liu An	a191933f81	Fix(config): Add raptor_kwd field to infinity mapping (#11146 ) ### What problem does this PR solve? fix infinity "INSERT: Column raptor_kwd not found in table" error ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-10 19:02:25 +08:00
Liu An	8af769de41	Fix: add toc_kwd field and update page_num_int type (#10596 ) ### What problem does this PR solve? - Added new field 'toc_kwd' to infinity_mapping.json for table of contents keyword support - Changed page_num_int from integer to array type in task_executor.py to handle multiple page numbers ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-16 12:47:24 +08:00
Zhichang Yu	342a04ec8a	Added infinity rank_feature support (#9044 ) ### What problem does this PR solve? Added infinity rank_feature support ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-07-29 09:14:23 +08:00
Kevin Hu	321a280031	Feat: add image preview to retrieval test. (#7610 ) ### What problem does this PR solve? #7608 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-05-13 14:30:36 +08:00
Zhichang Yu	65a8cd1772	Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 (#6651 ) ### What problem does this PR solve? Fix knowledge_graph_kwd on infinity. Close #6476 and #6624 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-03-28 22:05:40 +08:00
Zhichang Yu	6bf26e2a81	Optimize graphrag again (#6513 ) ### What problem does this PR solve? Removed set_entity and set_relation to avoid accessing doc engine during graph computation. Introduced GraphChange to avoid writing unchanged chunks. ### Type of change - [x] Performance Improvement	2025-03-26 15:34:42 +08:00
汪威	76e8285904	use to_df replace to_pl when get infinity Result (#5604 ) ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [x] Performance Improvement --------- Co-authored-by: wangwei <dwxiayi@163.com>	2025-03-05 09:35:40 +08:00
Kevin Hu	50055c47ec	Infinity mapping refine. (#4665 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2025-01-27 18:53:49 +08:00
Kevin Hu	6f30397bb5	Infinity adapt to graphrag. (#4663 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-27 18:35:18 +08:00
Kevin Hu	71c132f76d	Make infinity adapt (#4635 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-01-24 17:45:04 +08:00
Kevin Hu	dd0ebbea35	Light GraphRAG (#4585 ) ### What problem does this PR solve? #4543 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-22 19:43:14 +08:00
Kevin Hu	c5da3cdd97	Tagging (#4426 ) ### What problem does this PR solve? #4367 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-01-09 17:07:21 +08:00
Kevin Hu	ce1e855328	Upgrades Document Layout Analysis model. (#4054 ) ### What problem does this PR solve? #4052 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-17 11:27:19 +08:00
Zhichang Yu	0bca46ac3a	Migrate infinity at startup (#3858 ) ### What problem does this PR solve? Migrate infinity at startup #3809 https://github.com/infiniflow/infinity/issues/2321 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-13 13:43:56 +08:00
Zhichang Yu	03f00c9e6f	Rename page_num_list, top_list, position_list (#3940 ) ### What problem does this PR solve? Rename page_num_list, top_list, position_list to page_num_int, top_int, position_int ### Type of change - [x] Refactoring	2024-12-10 16:32:58 +08:00
Kevin Hu	56f473b680	Feat: Add question parameter to edit chunk modal (#3875 ) ### What problem does this PR solve? Close #3873 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-05 14:51:19 +08:00
Kevin Hu	74b28ef1b0	Add pagerank to KB. (#3809 ) ### What problem does this PR solve? #3794 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-12-03 14:30:35 +08:00
Zhichang Yu	f4c52371ab	Integration with Infinity (#2894 ) ### What problem does this PR solve? Integration with Infinity - Replaced ELASTICSEARCH with dataStoreConn - Renamed deleteByQuery with delete - Renamed bulk to upsertBulk - getHighlight, getAggregation - Fix KGSearch.search - Moved Dealer.sql_retrieval to es_conn.py ### Type of change - [x] Refactoring	2024-11-12 14:59:41 +08:00

26 Commits
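
As a rough illustration of the layer annotation described in commit 3ad3241ae0, the sketch below turns a `layers` list of `(start_idx, end_idx)` boundaries into a `chunk_layer` mapping and sets `raptor_layer_int` on summary chunks. This is not the code from `rag/svr/task_executor.py`. It assumes that `layers[0]` spans the original leaf chunks, that each range is half-open, and that summary chunks are plain dicts; the helper names `build_chunk_layer` and `annotate_summaries` and the `text` key are hypothetical.

```python
from typing import Dict, List, Tuple


def build_chunk_layer(layers: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each summary chunk index to its layer number (1 = first summary level)."""
    chunk_layer: Dict[int, int] = {}
    for layer_no, (start, end) in enumerate(layers):
        if layer_no == 0:
            # Assumption: layers[0] covers the original (non-summary) chunks.
            continue
        # Assumption: ranges are half-open, i.e. start inclusive, end exclusive.
        for idx in range(start, end):
            chunk_layer[idx] = layer_no
    return chunk_layer


def annotate_summaries(chunks: List[str], layers: List[Tuple[int, int]]) -> List[dict]:
    """Build one document per summary chunk, tagged with its RAPTOR layer."""
    chunk_layer = build_chunk_layer(layers)
    docs = []
    for idx, text in enumerate(chunks):
        layer = chunk_layer.get(idx, 0)
        if layer == 0:
            continue  # original chunk, not a RAPTOR summary
        docs.append({"text": text, "raptor_layer_int": layer})
    return docs


# Four original chunks, two first-level summaries, one top-level summary.
chunks = ["c0", "c1", "c2", "c3", "s0", "s1", "top"]
layers = [(0, 4), (4, 6), (6, 7)]
docs = annotate_summaries(chunks, layers)
```

With this layout, the document's top-level summary is the entry with the largest `raptor_layer_int`, which is the kind of targeted lookup the commit message says the field is meant to enable.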