Commit Graph

4 Commits

Author SHA1 Message Date
d5c306de30 Fix: remove unit test checkpoint resume (#14216)
### What problem does this PR solve?

remove unit test checkpoint resume

### Type of change

- [x] Performance Improvement
2026-04-20 11:27:40 +08:00
f930389311 Refact: improve task resume mechanism for graphrag (#14096)
### What problem does this PR solve?

Addresses review feedback on #14074 (Checkpoint mechanism for
long-running workflow jobs, issue #12494).

**Changes based on @yuzhichang's review:**

1. **Renamed `checkpoint_service.py` → `task_checkpoint.py`** as
suggested.
2. **Replaced Redis with direct docEngine queries** as suggested — the
subgraph already gets persisted to the doc store by
`generate_subgraph()`, so we just query for it instead of maintaining a
separate checkpoint in Redis. This is simpler, has no extra dependency,
and uses a single source of truth.

**Changes based on CodeRabbit review:**

3. **Fixed `source_id` query format mismatch** — subgraphs are stored
with `source_id: [doc_id]` (list), but the original query used
`source_id: doc_id` (string). Now follows the same pattern as
`does_graph_contains()` in `rag/graphrag/utils.py`: filter by
`knowledge_graph_kwd` only, then match `source_id` in Python. This
avoids ambiguity across Elasticsearch / Infinity / OceanBase backends.

### Changes

| File | Change |
|---|---|
| `api/db/services/task_checkpoint.py` (new) |
`load_subgraph_from_store()` and `has_raptor_chunks()` — docEngine-based
checkpoint queries |
| `rag/graphrag/general/index.py` | `build_one()` calls
`load_subgraph_from_store()` before running LLM extraction |
| `rag/svr/task_executor.py` | RAPTOR per-doc loop calls
`has_raptor_chunks()` before processing |
| `test/unit_test/rag/graphrag/test_checkpoint_resume.py` (new) | 10
unit tests covering subgraph loading, source_id filtering, edge cases |

### How it works

- **GraphRAG:** Before running expensive LLM entity/relation extraction
for a doc, checks the doc store for an existing subgraph (saved by a
previous interrupted run). If found, loads it directly and skips LLM
calls.
- **RAPTOR:** Before processing a doc, checks if RAPTOR chunks
(`raptor_kwd="raptor"`) already exist for it. If yes, skips.

### Testing

- 10 new unit tests — all passing
- Full existing suite: 617 passed

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-04-15 17:37:28 +08:00
b33d2fdea5 Refa: GraphRAG to use async chat methods instead of thread pool execution (#14002)
### What problem does this PR solve?

GraphRAG _async_chat.

### Type of change

- [x] Refactoring
- [x] Performance Improvement


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Unified chat calls to an async invocation across extractors, improving
timeout handling and ensuring task IDs propagate reliably.
* **Tests**
* Added and expanded unit tests and mocks to cover extractor behavior,
timeout scenarios, and safe test-package imports, reducing regression
risk.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 19:57:35 +08:00
e1f1184b01 test: add unit tests for graphrag/utils.py (87 test cases) (#13328)
Add comprehensive unit tests for `graphrag/utils.py`, covering 15
functions/classes with 87 test cases.

Tested functions:
- clean_str, dict_has_keys_with_types, perform_variable_replacements
- get_from_to, compute_args_hash, is_float_regex
- GraphChange dataclass
- handle_single_entity_extraction, handle_single_relationship_extraction
- graph_merge, tidy_graph
- split_string_by_multi_markers, pack_user_ass_to_openai_messages
- is_continuous_subsequence, merge_tuples, flat_uniq_list

All 327 existing + new tests pass with no regressions.
2026-03-05 15:30:43 +08:00