From eacec8650051ec227bb251f597bfc4f2a3d095c7 Mon Sep 17 00:00:00 2001
From: Prateek Jain
Date: Tue, 19 May 2026 15:10:03 +0530
Subject: [PATCH] fix(infinity): declare `extra` field + serialize dict on write to unbreak RAPTOR (#14998)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What problem does this PR solve?

Fixes #14997. RAPTOR builds on the Infinity backend have been broken since v0.25.2 introduced the `extra` field in code (`rag/svr/task_executor.py:1011`) without declaring it in `conf/infinity_mapping.json`. Every RAPTOR job fails with:

```
infinity.common.InfinityException: (3013, 'Fail to bind the expression: extra@src/planner/expression_binder_impl.cpp:99')
```

The auto-migration in `common/doc_store/infinity_conn_base.py:_migrate_db()` adds any columns it finds in the mapping JSON to existing tables, so the only thing standing between users and a working RAPTOR build is that one missing declaration.

OceanBase, ES, and OpenSearch were unaffected because they store `extra` as a native JSON type; only Infinity (which has a strict `varchar`/`integer`/`float` schema) needed the addition.

### The fix

Two-part change:

1. **`conf/infinity_mapping.json`**: declare `"extra": {"type": "varchar", "default": ""}`. On next startup, `_migrate_db()` adds the column to all existing chunk tables, so upgrading installations need no manual DDL.
2. **`rag/utils/infinity_conn.py` `insert()`**: serialize the `extra` dict to a JSON string at write time, since Infinity's `varchar` can't store a Python dict directly. Modelled on the existing `chunk_data` handling a few lines above.

The read path (`rag/utils/raptor_utils.py:_as_extra_dict`) already normalises both dict and JSON-string inputs, so no read-side change is needed.

Other backends are untouched: `task_executor.py` still writes the dict, and the OceanBase/ES/OpenSearch insert paths handle dicts natively.

### Verification

Tested on a v0.25.4 deployment with the Infinity backend by applying the same two changes via mounted-volume override:

- Confirmed `_migrate_db()` adds the `extra` column to all pre-existing chunk tables on startup (column visible via Infinity's `show_columns()`).
- Triggered RAPTOR builds on four datasets (~21k chunks total) via `POST /api/v1/datasets//index?type=raptor`.
- All four progressed past the previously-failing `get_raptor_chunk_methods()` call into actual entity-extraction and clustering work without the (3013) error.
- GraphRAG builds (which can trigger the same path indirectly via `task_executor.py:857`) also progressed cleanly.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
---
 conf/infinity_mapping.json |  3 ++-
 rag/utils/infinity_conn.py | 10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/conf/infinity_mapping.json b/conf/infinity_mapping.json
index 5f7ed80f2..893e18632 100644
--- a/conf/infinity_mapping.json
+++ b/conf/infinity_mapping.json
@@ -39,5 +39,6 @@
   "doc_type_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
   "toc_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
   "raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
-  "raptor_layer_int": {"type": "integer", "default": 0}
+  "raptor_layer_int": {"type": "integer", "default": 0},
+  "extra": {"type": "varchar", "default": ""}
 }
diff --git a/rag/utils/infinity_conn.py b/rag/utils/infinity_conn.py
index 7ffd9f13d..9407ab11a 100644
--- a/rag/utils/infinity_conn.py
+++ b/rag/utils/infinity_conn.py
@@ -438,6 +438,16 @@ class InfinityConnection(InfinityConnectionBase):
                         d[k] = json.dumps(v)
                     else:
                         d[k] = v
+                elif k == "extra":
+                    # RAPTOR writes {"raptor_method": ...} as a dict; Infinity's
+                    # `extra` column is varchar so we serialize on the write path.
+                    # The read path (raptor_utils._as_extra_dict) already accepts
+                    # both dict and JSON-string. Other backends (OceanBase JSON
+                    # column, ES/OpenSearch) keep dict shape — this is Infinity-only.
+                    if isinstance(v, dict):
+                        d[k] = json.dumps(v)
+                    else:
+                        d[k] = v if v else ""
                 elif k == "kb_id":
                     if isinstance(d[k], list):
                         d[k] = d[k][0]  # since d[k] is a list, but we need a str