dify/api/models
Frederick2313072 626e71cb3b feat: implement content-based deduplication for document segments
- Add database index on (dataset_id, index_node_hash) for efficient deduplication queries
- Add deduplication check in SegmentService.create_segment and multi_create_segment methods
- Add deduplication check in DatasetDocumentStore.add_documents method to prevent duplicate embedding processing
- Skip creating a segment when another segment in the same dataset already has the same content hash

When uploaded documents contain repeated content, the duplicates are no longer re-processed or re-embedded, which avoids unnecessary compute cost.
2025-09-20 06:28:14 +08:00