From eacec8650051ec227bb251f597bfc4f2a3d095c7 Mon Sep 17 00:00:00 2001
From: Prateek Jain <mrprateekjain@gmail.com>
Date: Tue, 19 May 2026 15:10:03 +0530
Subject: [PATCH] fix(infinity): declare `extra` field + serialize dict on
 write to unbreak RAPTOR (#14998)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What problem does this PR solve?

Fixes #14997.

RAPTOR builds on the Infinity backend have been broken since v0.25.2
introduced the `extra` field in code (`rag/svr/task_executor.py:1011`)
without declaring it in `conf/infinity_mapping.json`. Every RAPTOR job
fails with:

```
infinity.common.InfinityException: (3013, 'Fail to bind the expression: extra@src/planner/expression_binder_impl.cpp:99')
```

The auto-migration in
`common/doc_store/infinity_conn_base.py:_migrate_db()` adds any columns
it finds in the mapping JSON to existing tables, so the missing
declaration is the root cause of the bind error: once `extra` is
declared, the column is created automatically. OceanBase, ES, and
OpenSearch were unaffected because they store `extra` as a native JSON
type; only Infinity (which has a strict `varchar`/`integer`/`float`
schema) needed the addition.
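
To illustrate why the declaration is what gets the column created, the
migration can be thought of as a diff between the mapping and each
table's existing columns. This is only a sketch of that idea, not the
actual `_migrate_db()` code: `missing_columns` and the `existing` set
are hypothetical names introduced here.

```python
import json


def missing_columns(mapping: dict, existing: set) -> dict:
    """Return the mapping entries that an existing table does not have yet."""
    # Hypothetical helper: the real _migrate_db() may work differently.
    return {name: spec for name, spec in mapping.items() if name not in existing}


# Excerpt of conf/infinity_mapping.json as it looks after this patch.
mapping = json.loads("""
{
  "raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
  "raptor_layer_int": {"type": "integer", "default": 0},
  "extra": {"type": "varchar", "default": ""}
}
""")

# Assumed column set of a chunk table created before `extra` was declared.
existing = {"raptor_kwd", "raptor_layer_int"}

to_add = missing_columns(mapping, existing)
```

Under these assumptions only `extra` is left to add, which matches the
"no manual DDL" behaviour described in this PR.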

### The fix

Two-part change:

1. **`conf/infinity_mapping.json`**: declare
`"extra": {"type": "varchar", "default": ""}`. On next startup,
`_migrate_db()` adds the column to all existing chunk tables, so
upgrading installations need no manual DDL.
2. **`rag/utils/infinity_conn.py` `insert()`**: serialize the `extra`
dict to a JSON string at write time, since Infinity's `varchar` can't
store a Python dict directly. Modelled on the existing `chunk_data`
handling a few lines above.
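
The write-side rule from step 2 can be isolated as a small function.
This mirrors the branch added to `insert()` in the diff below;
`serialize_extra` is a hypothetical name used only for illustration,
and the `"raptor_method"` value is a placeholder.

```python
import json


def serialize_extra(v):
    """Mirror the `extra` branch added to insert(): dict -> JSON string."""
    if isinstance(v, dict):
        # Infinity's `extra` column is varchar, so the dict is stored as JSON text.
        return json.dumps(v)
    # Non-dict values pass through; falsy ones collapse to the column default "".
    return v if v else ""


as_text = serialize_extra({"raptor_method": "placeholder"})
as_default = serialize_extra(None)
```

An already-serialized string passes through unchanged, so calling the
function twice on the same value does not double-encode it.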

The read path (`rag/utils/raptor_utils.py:_as_extra_dict`) already
normalises both dict and JSON-string inputs, so no read-side change is
needed. Other backends are untouched — `task_executor.py` still writes
the dict, and the OceanBase/ES/OpenSearch insert paths handle dicts
natively.
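
For reference, a read-side normaliser that accepts both shapes could
look like the sketch below. It is not the real `_as_extra_dict`;
`as_extra_dict_sketch` is a hypothetical stand-in, and returning an
empty dict for empty or malformed input is an assumption.

```python
import json


def as_extra_dict_sketch(value) -> dict:
    """Accept a dict or a JSON string and always return a dict."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str) and value:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Assumption: unparseable text is treated as "no extra data".
            return {}
        return parsed if isinstance(parsed, dict) else {}
    # Covers None and the "" column default.
    return {}


# Round trip: what Infinity stores (JSON text) comes back as the original dict.
stored = json.dumps({"raptor_method": "placeholder"})
restored = as_extra_dict_sketch(stored)
```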

### Verification

Tested on a v0.25.4 deployment with the Infinity backend by applying the
same two changes via mounted-volume override:

- Confirmed `_migrate_db()` adds the `extra` column to all pre-existing
chunk tables on startup (column visible via Infinity's
`show_columns()`).
- Triggered RAPTOR builds on four datasets (~21k chunks total) via
`POST /api/v1/datasets/<id>/index?type=raptor`.
- All four progressed past the previously-failing
`get_raptor_chunk_methods()` call into actual entity-extraction and
clustering work without the (3013) error.
- GraphRAG builds (which can trigger the same path indirectly via
`task_executor.py:857`) also progressed cleanly.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
---
 conf/infinity_mapping.json |  3 ++-
 rag/utils/infinity_conn.py | 10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/conf/infinity_mapping.json b/conf/infinity_mapping.json
index 5f7ed80f2..893e18632 100644
--- a/conf/infinity_mapping.json
+++ b/conf/infinity_mapping.json
@@ -39,5 +39,6 @@
 	"doc_type_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
 	"toc_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
 	"raptor_kwd": {"type": "varchar", "default": "", "analyzer": "whitespace-#"},
-	"raptor_layer_int": {"type": "integer", "default": 0}
+	"raptor_layer_int": {"type": "integer", "default": 0},
+	"extra": {"type": "varchar", "default": ""}
 }
diff --git a/rag/utils/infinity_conn.py b/rag/utils/infinity_conn.py
index 7ffd9f13d..9407ab11a 100644
--- a/rag/utils/infinity_conn.py
+++ b/rag/utils/infinity_conn.py
@@ -438,6 +438,16 @@ class InfinityConnection(InfinityConnectionBase):
                             d[k] = json.dumps(v)
                         else:
                             d[k] = v
+                    elif k == "extra":
+                        # RAPTOR writes {"raptor_method": ...} as a dict; Infinity's
+                        # `extra` column is varchar so we serialize on the write path.
+                        # The read path (raptor_utils._as_extra_dict) already accepts
+                        # both dict and JSON-string. Other backends (OceanBase JSON
+                        # column, ES/OpenSearch) keep dict shape — this is Infinity-only.
+                        if isinstance(v, dict):
+                            d[k] = json.dumps(v)
+                        else:
+                            d[k] = v if v else ""
                     elif k == "kb_id":
                         if isinstance(d[k], list):
                             d[k] = d[k][0]  # since d[k] is a list, but we need a str