fix: remove company info from resume_summary to prevent over-retrieval (#13358)

### What problem does this PR solve?

Problem: When searching for a specific company name such as "Daofeng Technology", the search incorrectly returned unrelated resumes whose company names merely contain a generic term such as "Technology".

Root Cause: The `corporation_name_tks` field was included in the identity fields that are redundantly written to every chunk. As a result, common words like "科技" ("technology") matched in every chunk of every resume that had such a word in a company name, leading to over-retrieval of irrelevant resumes.

Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS` list. Company information is still preserved in the "Work Overview" chunk where it belongs, so company-based searches keep working while generic terms no longer produce false positives across all chunks.

---------

Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Liu An <asiro@qq.com>
2026-04-28 22:37:50 +08:00 · 2026-03-04 19:24:49 +08:00
parent 70e9743ef1
commit c99b53064d
1 changed file with 1 addition and 1 deletion
--- a/rag/app/resume.py
+++ b/rag/app/resume.py
@@ -2125,7 +2125,7 @@ def _build_chunk_document(filename: str, resume: dict,
    # Extract key identity fields, redundantly written to each chunk
    # These fields are small in size but high in information density; once retrieved, the candidate can be immediately identified
    _IDENTITY_FIELDS = ("name_kwd", "phone_kwd", "email_tks", "gender_kwd",
-                        "highest_degree_kwd", "work_exp_flt", "corporation_name_tks")
+                        "highest_degree_kwd", "work_exp_flt")
    identity_meta = {}
    for ik in _IDENTITY_FIELDS:
        iv = resume.get(ik)
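
To illustrate the root cause described in the commit message, here is a minimal sketch of how copying a tokenized company name into every chunk inflates matches for a generic token. It is not the code in `rag/app/resume.py`: the helpers `build_chunks` and `search`, the sample resumes, and the naive any-token matching are all assumptions made for illustration, and the real retrieval pipeline is more involved.

```python
# Hypothetical, simplified model of the identity-field duplication.
# Field names mirror the diff; everything else is assumed for illustration.
IDENTITY_BEFORE = ("name_kwd", "phone_kwd", "email_tks", "gender_kwd",
                   "highest_degree_kwd", "work_exp_flt", "corporation_name_tks")
IDENTITY_AFTER = tuple(f for f in IDENTITY_BEFORE if f != "corporation_name_tks")

# Made-up sample data: two resumes, three sections each.
RESUMES = [
    {"name_kwd": "Alice", "corporation_name_tks": "Daofeng Technology",
     "sections": {"Work Overview": "Daofeng Technology backend engineer",
                  "Education": "BSc computer science",
                  "Skills": "Python Go"}},
    {"name_kwd": "Bob", "corporation_name_tks": "Acme Technology",
     "sections": {"Work Overview": "Acme Technology data analyst",
                  "Education": "MSc statistics",
                  "Skills": "SQL R"}},
]


def build_chunks(resume, identity_fields):
    """Create one chunk per section and copy the identity fields into each."""
    identity_meta = {k: resume[k] for k in identity_fields if resume.get(k)}
    chunks = []
    for section, text in resume["sections"].items():
        chunk = {"section": section, "content": text}
        chunk.update(identity_meta)
        chunks.append(chunk)
    return chunks


def search(chunks, query_tokens):
    """Naive any-token match over the chunk text and the company field."""
    wanted = set(query_tokens)
    hits = []
    for chunk in chunks:
        tokens = set(chunk["content"].split())
        tokens.update(chunk.get("corporation_name_tks", "").split())
        if tokens & wanted:
            hits.append((chunk["name_kwd"], chunk["section"]))
    return hits


query = ["Daofeng", "Technology"]
before_chunks = [c for r in RESUMES for c in build_chunks(r, IDENTITY_BEFORE)]
after_chunks = [c for r in RESUMES for c in build_chunks(r, IDENTITY_AFTER)]
before_hits = search(before_chunks, query)
after_hits = search(after_chunks, query)
print(len(before_hits), len(after_hits))
```

In this toy setup, every chunk of both resumes matches before the change, because each one carries a company name containing "Technology". After the change only the two "Work Overview" chunks match. The unrelated resume can still match on the generic token in that single chunk, so the sketch shows a reduction in matching chunks rather than a complete fix; separating the two remaining candidates is left to relevance ranking, which is not modeled here.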