fix(infinity): declare extra field + serialize dict on write to unbreak RAPTOR (#14998)

### What problem does this PR solve?

Fixes #14997.

RAPTOR builds on the Infinity backend have been broken since v0.25.2. That release started writing the `extra` field in code (`rag/svr/task_executor.py:1011`) but never declared it in `conf/infinity_mapping.json`. Every RAPTOR job fails with:

```
infinity.common.InfinityException: (3013, 'Fail to bind the expression: extra@src/planner/expression_binder_impl.cpp:99')
```

The auto-migration in `common/doc_store/infinity_conn_base.py:_migrate_db()` adds any column declared in the mapping JSON to existing tables. The only thing standing between users and a working RAPTOR build is therefore that one missing declaration.

OceanBase, ES, and OpenSearch were unaffected because they store `extra` as a native JSON type. Only Infinity, which has a strict `varchar`/`integer`/`float` schema, needed the addition.

### The fix

Two-part change:

1. **`conf/infinity_mapping.json`**: declare `"extra": {"type": "varchar", "default": ""}`. On the next startup, `_migrate_db()` adds the column to all existing chunk tables, so upgrading installations need no manual DDL.
2. **`rag/utils/infinity_conn.py` `insert()`**: serialize the `extra` dict to a JSON string at write time, because Infinity's `varchar` column cannot store a Python dict directly. This is modelled on the existing `chunk_data` handling a few lines above.

The read path (`rag/utils/raptor_utils.py:_as_extra_dict`) already normalises both dict and JSON-string inputs, so no read-side change is needed. Other backends are untouched: `task_executor.py` still writes the dict, and the OceanBase/ES/OpenSearch insert paths handle dicts natively.

### Verification

Tested on a v0.25.4 deployment with the Infinity backend by applying the same two changes via a mounted-volume override:

- Confirmed that `_migrate_db()` adds the `extra` column to all pre-existing chunk tables on startup (the column is visible via Infinity's `show_columns()`).
- Triggered RAPTOR builds on four datasets (~21k chunks total) via `POST /api/v1/datasets/<id>/index?type=raptor`.
- All four progressed past the previously failing `get_raptor_chunk_methods()` call into actual entity-extraction and clustering work, without the (3013) error.
- GraphRAG builds, which can reach the same path indirectly via `task_executor.py:857`, also progressed cleanly.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
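The mapping-only fix works because of the startup migration. A minimal sketch of the idea, assuming `_migrate_db()` compares the columns declared in the mapping JSON with a table's existing columns and adds whatever is missing. `columns_to_add` is a hypothetical helper written for illustration, not code from `infinity_conn_base.py`, and it makes no Infinity SDK calls.

```python
import json

# Excerpt of conf/infinity_mapping.json after this change (last three entries only).
MAPPING_JSON = """
{
  "raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
  "raptor_layer_int": {"type": "integer", "default": 0},
  "extra": {"type": "varchar", "default": ""}
}
"""


def columns_to_add(mapping: dict, existing_columns: set) -> dict:
    """Return the type and default of every mapped column the table lacks.

    Hypothetical helper: it illustrates the diff-the-schema idea behind
    _migrate_db(), not that function's actual implementation.
    """
    return {
        name: {"type": spec["type"], "default": spec.get("default")}
        for name, spec in mapping.items()
        if name not in existing_columns
    }


mapping = json.loads(MAPPING_JSON)

# A chunk table created before this change has no `extra` column.
pre_fix_table = {"raptor_kwd", "raptor_layer_int"}
print(columns_to_add(mapping, pre_fix_table))
# {'extra': {'type': 'varchar', 'default': ''}}

# Once the column exists, re-running the comparison adds nothing.
print(columns_to_add(mapping, set(mapping)))
# {}
```

Because the comparison is driven by the mapping file, it is idempotent: restarting an already-migrated installation finds no missing columns.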
2026-05-28 11:43:06 +08:00 · 2026-05-19 15:10:03 +05:30
parent f6537ae4ce
commit eacec86500
2 changed files with 12 additions and 1 deletion
--- a/conf/infinity_mapping.json
+++ b/conf/infinity_mapping.json
@@ -39,5 +39,6 @@
 	"doc_type_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
 	"toc_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
 	"raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
-	"raptor_layer_int": {"type": "integer", "default": 0}
+	"raptor_layer_int": {"type": "integer", "default": 0},
+	"extra": {"type": "varchar", "default": ""}
 }
--- a/rag/utils/infinity_conn.py
+++ b/rag/utils/infinity_conn.py
@@ -438,6 +438,16 @@ class InfinityConnection(InfinityConnectionBase):
                            d[k] = json.dumps(v)
                        else:
                            d[k] = v
+                    elif k == "extra":
+                        # RAPTOR writes {"raptor_method": ...} as a dict; Infinity's
+                        # `extra` column is varchar so we serialize on the write path.
+                        # The read path (raptor_utils._as_extra_dict) already accepts
+                        # both dict and JSON-string. Other backends (OceanBase JSON
+                        # column, ES/OpenSearch) keep dict shape — this is Infinity-only.
+                        if isinstance(v, dict):
+                            d[k] = json.dumps(v)
+                        else:
+                            d[k] = v if v else ""
                    elif k == "kb_id":
                        if isinstance(d[k], list):
                            d[k] = d[k][0]  # since d[k] is a list, but we need a str
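The new `extra` branch in `insert()` and the existing read-path normalisation can be checked together as a round trip. This is a minimal sketch: `serialize_extra` copies the branch added in the diff, while `as_extra_dict` is a hypothetical stand-in for `raptor_utils._as_extra_dict`. It is written only from the PR's statement that the helper accepts both dicts and JSON strings, so the real helper may differ in details such as error handling. The `"example_method"` value is a placeholder, not a real RAPTOR method name.

```python
import json
from typing import Any


def serialize_extra(v: Any) -> Any:
    """Write-path logic from the diff: dicts become JSON text for the varchar column."""
    if isinstance(v, dict):
        return json.dumps(v)
    # Falsy values (None, "", {}) are stored as the column default "".
    # Note: an empty dict is caught by the isinstance branch above and becomes "{}".
    return v if v else ""


def as_extra_dict(v: Any) -> dict:
    """Hypothetical stand-in for raptor_utils._as_extra_dict (assumed behaviour)."""
    if isinstance(v, dict):
        return v  # ES/OpenSearch/OceanBase hand back a dict.
    if isinstance(v, str) and v:
        try:
            parsed = json.loads(v)  # Infinity hands back JSON text.
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}


# Placeholder payload shaped like RAPTOR's {"raptor_method": ...}.
payload = {"raptor_method": "example_method"}

stored = serialize_extra(payload)
print(stored)                 # {"raptor_method": "example_method"}
print(as_extra_dict(stored))  # {'raptor_method': 'example_method'}
print(repr(serialize_extra(None)))  # ''
```

A truthy non-dict value, such as a string that is already JSON, passes through unchanged. The write path is therefore safe to apply to data that was serialized earlier.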