mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-21 00:36:43 +08:00
### What problem does this PR solve?

## Problem

During the REST API refactoring (#13690), the `/api/v2/kb/check_embedding` endpoint was removed and never migrated to the new RESTful structure. The frontend was pointed to the `/api/v1/datasets/{id}/embedding` endpoint (which is `run_embedding`, a completely different function). Additionally, a hard guard was introduced that rejects any `embd_id` change when `chunk_num > 0`, making it impossible to switch embedding models on datasets with existing chunks.

## Root Cause

1. **Missing endpoint**: The old `check_embedding` logic (sample random chunks, re-embed with the new model, compare cosine similarity) was not carried over to the new REST API service layer.
2. **Wrong frontend URL**: `checkEmbedding` in `api.ts` pointed to `/datasets/{id}/embedding` (`run_embedding`) instead of a dedicated check endpoint.
3. **Overly restrictive guard**: `dataset_api_service.py` line 310 blocked all `embd_id` updates when `chunk_num > 0`. This check did not exist in the pre-refactor code; it was incorrectly introduced during the refactor.
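
For illustration, the compatibility check described in root cause 1 can be sketched roughly as below. This is a minimal sketch, not the code in `dataset_api_service.py`: the names `check_embedding_sketch` and `cosine_similarity`, the `embed` callable, the `(text, stored_vector)` chunk shape, the default sample size, and the handling of empty datasets and dimension mismatches are all assumptions. Only the sample, re-embed, compare flow and the 0.9 average threshold come from this PR description.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors.

    Assumption: mismatched dimensions or a zero vector count as 0.0
    similarity rather than raising an error.
    """
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_embedding_sketch(
    chunks: Sequence[Tuple[str, Vector]],
    embed: Callable[[str], Vector],
    sample_size: int = 5,        # hypothetical default, not from the PR
    threshold: float = 0.9,      # "avg >= 0.9 = compatible" per the PR text
    rng: Optional[random.Random] = None,
) -> dict:
    """Sample stored chunks, re-embed them with the candidate model,
    and compare the new vectors against the stored ones."""
    rng = rng or random.Random()
    if not chunks:
        # Assumption: with nothing to compare, the switch is allowed.
        return {"avg_similarity": None, "compatible": True, "sampled": 0}

    k = max(1, min(sample_size, len(chunks)))
    sampled = rng.sample(list(chunks), k)

    # `embed` stands in for the candidate embedding model (hypothetical).
    sims = [cosine_similarity(stored, embed(text)) for text, stored in sampled]
    avg = sum(sims) / len(sims)
    return {"avg_similarity": avg, "compatible": avg >= threshold, "sampled": len(sims)}
```

A caller would pass the dataset's stored `(text, vector)` pairs and a function wrapping the candidate model, then allow or warn about the `embd_id` change based on the `compatible` flag in the result.
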
## Changes

### Backend

- **`api/apps/services/dataset_api_service.py`**
  - Remove the `chunk_num > 0` hard guard on `embd_id` updates
  - Add `check_embedding()` service function: samples random chunks, re-embeds them with the candidate model, computes cosine similarity, returns compatibility result (avg ≥ 0.9 = compatible)
  - Add `import re` for the `_clean()` helper
- **`api/apps/restful_apis/dataset_api.py`**
  - Add `POST /datasets/<dataset_id>/embedding/check` endpoint following the new REST API conventions
  - Clean up unused top-level imports (`random`, `re`, `numpy`)

### Frontend

- **`web/src/utils/api.ts`**
  - Fix `checkEmbedding` URL from `/datasets/${datasetId}/embedding` → `/datasets/${datasetId}/embedding/check`

### Tests

- **`test/testcases/test_http_api/test_dataset_management/test_update_dataset.py`**
  - Update `test_embedding_model_with_existing_chunks` to assert success (`code == 0`) instead of expecting the old `102` error
- **`test/testcases/test_web_api/test_dataset_management/test_dataset_sdk_routes_unit.py`**
  - Update `test_update_route_branch_matrix_unit` to assert `RetCode.SUCCESS` when updating `embd_id` on a chunked dataset, replacing the old `chunk_num` error assertion

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>