Fix: duplicate content in chunk (#12655)

### What problem does this PR solve?

Fixes duplicate content appearing in a chunk (#12336).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
This commit is contained in:
Magicbook1108
2026-01-16 15:32:04 +08:00
committed by GitHub
parent 2b20d0b3bb
commit 045314a1aa

@@ -554,6 +554,7 @@ class RAGFlowPdfParser:
merged_boxes.extend(bxs)
# self.boxes = sorted(merged_boxes, key=lambda x: (x["page_number"], x.get("col_id", 0), x["top"]))
self.boxes = merged_boxes
def _final_reading_order_merge(self, zoomin=3):
if not self.boxes: