mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-04-28 22:37:50 +08:00
fix: remove company info from resume_summary to prevent over-retrieval (#13358)
### What problem does this PR solve?

Problem: When searching for a specific company name such as "Daofeng Technology", the search incorrectly returned unrelated resumes whose company names contain generic terms such as "Technology".

Root Cause: The `corporation_name_tks` field was included in the identity fields that are redundantly written to every chunk. This caused common words like "科技" ("Technology") to match across all chunks, leading to over-retrieval of irrelevant resumes.

Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS` list. Company information is still preserved in the "Work Overview" chunk where it belongs, allowing proper company-based searches while preventing false positives from generic terms.

---------

Co-authored-by: Aron.Yao <yaowei@192.168.1.68>
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
Co-authored-by: Liu An <asiro@qq.com>
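The over-retrieval effect can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and simplified: the whitespace tokenizer, the `build_chunks` and `search_company` helpers, the sample resumes, and the "any token matches" rule are assumptions made for illustration, not RAGFlow's actual chunking or retrieval code.

```python
# Toy model of identity fields copied onto every resume chunk.
# All names and data below are hypothetical; this is not RAGFlow's code.

def tokenize(text: str) -> set:
    """Very rough stand-in for a real tokenizer: lowercase and split on spaces."""
    return set(text.lower().split())


def build_chunks(resume: dict, identity_fields: tuple) -> list:
    """Create one chunk per section and copy the identity fields onto each chunk."""
    identity_meta = {k: resume[k] for k in identity_fields if resume.get(k)}
    chunks = []
    for section, text in resume["sections"].items():
        chunk = {"section": section, "text": text, **identity_meta}
        if section == "Work Overview":
            # Assumption: company information always stays in this chunk.
            chunk["corporation_name_tks"] = resume["corporation_name_tks"]
        chunks.append(chunk)
    return chunks


def search_company(chunks: list, query: str) -> list:
    """Return chunks whose company field shares at least one token with the query."""
    query_tokens = tokenize(query)
    return [c for c in chunks
            if query_tokens & tokenize(c.get("corporation_name_tks", ""))]


RESUMES = [
    {"name_kwd": "Alice", "corporation_name_tks": "daofeng technology",
     "sections": {"Work Overview": "w1", "Education": "e1", "Skills": "s1"}},
    {"name_kwd": "Bob", "corporation_name_tks": "bright technology",
     "sections": {"Work Overview": "w2", "Education": "e2", "Skills": "s2"}},
]


def count_hits(identity_fields: tuple, query: str) -> int:
    """Count matching chunks across all sample resumes."""
    chunks = [c for r in RESUMES for c in build_chunks(r, identity_fields)]
    return len(search_company(chunks, query))


# Company name copied onto every chunk: every chunk matches "technology".
before = count_hits(("name_kwd", "corporation_name_tks"), "daofeng technology")
# Company name kept only in the "Work Overview" chunk.
after = count_hits(("name_kwd",), "daofeng technology")
print(before, after)  # prints "6 2"
```

Under this toy matching rule the unrelated resume still matches once, through its own "Work Overview" chunk, but it no longer matches through every chunk. How strongly that residual match ranks in practice depends on the real tokenizer and scoring, which this sketch does not model.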
@@ -2125,7 +2125,7 @@ def _build_chunk_document(filename: str, resume: dict,
     # Extract key identity fields, redundantly written to each chunk
     # These fields are small in size but high in information density; once retrieved, the candidate can be immediately identified
     _IDENTITY_FIELDS = ("name_kwd", "phone_kwd", "email_tks", "gender_kwd",
-                        "highest_degree_kwd", "work_exp_flt", "corporation_name_tks")
+                        "highest_degree_kwd", "work_exp_flt")
     identity_meta = {}
     for ik in _IDENTITY_FIELDS:
         iv = resume.get(ik)