fix: remove company info from resume_summary to prevent over-retrieval (#13358)

### What problem does this PR solve?

Problem: When searching for a specific company name such as "Daofeng
Technology", the search incorrectly returned unrelated resumes whose
company names merely contain a generic term such as "Technology".

Root Cause: The `corporation_name_tks` field was included in the
identity fields that are redundantly written to every chunk. As a
result, a common word such as "科技" ("technology") matched every chunk
of every resume whose company name contains it, leading to
over-retrieval of irrelevant resumes.
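
The effect can be reproduced with a toy model. This is only a sketch under stated assumptions: the helper names, the made-up resumes, and the simplified "any shared token" matching rule are illustrative and are not taken from the project's retrieval code.

```python
# Toy reproduction of the over-retrieval. Everything except the field
# names "name_kwd" and "corporation_name_tks" is hypothetical.

def build_chunks(resume, sections, identity_fields):
    """Copy the selected identity fields into every chunk of one resume."""
    identity = {k: resume[k] for k in identity_fields if resume.get(k)}
    return [{"section": s, **identity} for s in sections]

def any_token_hits(chunks, query):
    """Assumed matching rule: a chunk hits if it shares any token with the query."""
    q = set(query.split())
    return [c for c in chunks
            if q & set(c.get("corporation_name_tks", "").split())]

SECTIONS = ("Work Overview", "Education", "Skills")
OLD_FIELDS = ("name_kwd", "corporation_name_tks")  # company copied everywhere

resumes = [
    {"name_kwd": "Candidate A", "corporation_name_tks": "daofeng technology"},
    {"name_kwd": "Candidate B", "corporation_name_tks": "acme technology"},
]

chunks = [c for r in resumes for c in build_chunks(r, SECTIONS, OLD_FIELDS)]
hits = any_token_hits(chunks, "daofeng technology")
unrelated = [c for c in hits if c["name_kwd"] == "Candidate B"]

# Every chunk of the unrelated resume matches on the generic token alone.
print(len(hits), len(unrelated))  # 6 3
```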

Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS`
list. Company information is still preserved in the "Work Overview"
chunk where it belongs, allowing proper company-based searches while
preventing false positives from generic terms.
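
A minimal sketch of chunk construction with the trimmed list follows. Only `_IDENTITY_FIELDS` and the field names come from the diff; the function `build_chunk_documents`, the section titles other than "Work Overview", the sample resume, and the way the company tokens are attached to the "Work Overview" chunk are assumptions made for illustration.

```python
# Hypothetical sketch: not the project's _build_chunk_document.
_IDENTITY_FIELDS = ("name_kwd", "phone_kwd", "email_tks", "gender_kwd",
                    "highest_degree_kwd", "work_exp_flt")

def build_chunk_documents(resume: dict, sections: dict) -> list:
    # Collect the identity fields that are present, once per resume.
    identity_meta = {}
    for ik in _IDENTITY_FIELDS:
        iv = resume.get(ik)
        if iv is not None:
            identity_meta[ik] = iv

    docs = []
    for title, text in sections.items():
        doc = {"section": title, "content": text, **identity_meta}
        # Assumption: only the Work Overview chunk carries company tokens.
        if title == "Work Overview" and resume.get("corporation_name_tks"):
            doc["corporation_name_tks"] = resume["corporation_name_tks"]
        docs.append(doc)
    return docs

resume = {"name_kwd": "Candidate A", "work_exp_flt": 5.0,
          "corporation_name_tks": "daofeng technology"}
sections = {"Work Overview": "...", "Education": "...", "Skills": "..."}

docs = build_chunk_documents(resume, sections)
with_company = [d["section"] for d in docs if "corporation_name_tks" in d]
print(with_company)  # ['Work Overview']
```

Under these assumptions every chunk still identifies the candidate through the remaining identity fields, while a company query can only match the single chunk that actually describes the work history.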

---------

Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Liu An <asiro@qq.com>
Yao Wei
2026-03-04 19:24:49 +08:00
committed by GitHub
parent 70e9743ef1
commit c99b53064d

@@ -2125,7 +2125,7 @@ def _build_chunk_document(filename: str, resume: dict,
     # Extract key identity fields, redundantly written to each chunk
     # These fields are small in size but high in information density; once retrieved, the candidate can be immediately identified
     _IDENTITY_FIELDS = ("name_kwd", "phone_kwd", "email_tks", "gender_kwd",
-                        "highest_degree_kwd", "work_exp_flt", "corporation_name_tks")
+                        "highest_degree_kwd", "work_exp_flt")
     identity_meta = {}
     for ik in _IDENTITY_FIELDS:
         iv = resume.get(ik)