dify/models at 626e71cb3b82d43b1c40d64b5341124b0c1e92e6 - dify

mirror of https://github.com/langgenius/dify.git synced 2026-05-29 21:27:54 +08:00

Frederick2313072 626e71cb3b feat: implement content-based deduplication for document segments

- Add database index on (dataset_id, index_node_hash) for efficient deduplication queries
- Add deduplication check in SegmentService.create_segment and multi_create_segment methods
- Add deduplication check in DatasetDocumentStore.add_documents method to prevent duplicate embedding processing
- Skip creating segments with identical content hashes across the entire dataset

This prevents duplicate content from being re-processed and re-embedded when uploading documents with repeated content, improving efficiency and reducing unnecessary compute costs.
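
The deduplication flow described in this commit can be illustrated with a minimal, self-contained sketch. It is not the Dify implementation: it uses an in-memory SQLite table in place of the real segment model. The function names `content_hash`, `open_store`, and `add_segments`, the table layout, and the choice of SHA-256 are assumptions made for illustration. Only the column pair `(dataset_id, index_node_hash)` comes from the commit message.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """Hash segment text. SHA-256 is an assumption; the commit does not name the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_store() -> sqlite3.Connection:
    """Create a hypothetical segments table with the composite index from the commit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        "id INTEGER PRIMARY KEY, "
        "dataset_id TEXT NOT NULL, "
        "index_node_hash TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    # The index on (dataset_id, index_node_hash) keeps the lookup below cheap.
    conn.execute(
        "CREATE INDEX segment_dataset_hash_idx "
        "ON segments (dataset_id, index_node_hash)"
    )
    return conn


def add_segments(conn: sqlite3.Connection, dataset_id: str, texts: list[str]) -> list[str]:
    """Insert only texts whose hash is not already present in the dataset.

    Returns the texts that were actually created, in input order.
    """
    created = []
    for text in texts:
        digest = content_hash(text)
        exists = conn.execute(
            "SELECT 1 FROM segments "
            "WHERE dataset_id = ? AND index_node_hash = ? LIMIT 1",
            (dataset_id, digest),
        ).fetchone()
        if exists is not None:
            # Duplicate content in this dataset: skip it, so it is never re-embedded.
            continue
        conn.execute(
            "INSERT INTO segments (dataset_id, index_node_hash, content) "
            "VALUES (?, ?, ?)",
            (dataset_id, digest, text),
        )
        created.append(text)
    conn.commit()
    return created
```

In this sketch the check is scoped to a single dataset, so the same text may still be stored once in each of two different datasets. A duplicate inside one batch is also skipped, because each insert is visible to the lookups that follow it on the same connection.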

2025-09-20 06:28:14 +08:00

__init__.py

feat: knowledge pipeline (#25360)

2025-09-18 12:49:10 +08:00

_workflow_exc.py

feat: Persist Variables for Enhanced Debugging Workflow (#20699)

2025-06-24 09:05:29 +08:00

account.py

chore: add ast-grep rule to convert Optional[T] to T | None (#25560)

2025-09-15 13:06:33 +08:00

api_based_extension.py

replace db with sa to get typing support (#23240)