mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-04 01:07:48 +08:00
Core optimizations (refer to arXiv:2510.09722): 1. PDF text fusion: Metadata + OCR dual-path extraction and fusion 2. Page-aware reconstruction: YOLOv10 page segmentation + hierarchical sorting + line number indexing 3. Parallel task decomposition: Basic information/work experience/educational background three-way parallel LLM extraction 4. Index pointer mechanism: LLM returns a range of line numbers instead of generating the full text, reducing the illusion of full text. --------- Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local> Co-authored-by: Aron.Yao <yaowei@192.168.1.68> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
1.5 KiB
1.5 KiB
Please extract work experience from the following line-indexed resume text.
{indexed_text}
Extract into JSON, each work experience entry contains: {{ "workExperience": [ {{ "company": "", "position": "", "internship": 0, "start_date": "", "end_date": "", "desc_lines": [start_index, end_index] }} ] }}
Field descriptions:
- company: Full company name (including region info in brackets), e.g. "Google Inc."
- position: Job title, follow original text, do not fabricate or guess
- internship: Whether this is an internship, 1 for yes, 0 for no
- start_date: Start date, format %Y.%m or %Y, e.g. "2024.1"
- end_date: End date, use "Present" if still employed, "" if not available
- desc_lines: [start_line, end_line], line number range for job description (integer array)
- Refers to the original text reference range for job description, including achievements, responsibilities, tech stack, etc.
- Include as much as possible until the next work experience entry or other section heading
- STOP before these section headings (do not include them in desc_lines): Self-evaluation, Personal Summary, Skills, Technical Skills, Education, Project Experience, Certificates, Languages, Hobbies, Career Objective
- Use [] if not available
Example: [22]: Google Inc. 2021.11-2022.11 Senior Engineer [23]: Job description: Responsible for backend development [24]: Achieved 99.9% uptime for core services Then desc_lines should be [23, 24]
Return JSON only. /no_think