diff --git a/README.md b/README.md index ca09adec08..ec399e49ee 100644 --- a/README.md +++ b/README.md @@ -226,6 +226,11 @@ Deploy Dify to AWS with [CDK](https://aws.amazon.com/cdk/) - [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Using Alibaba Cloud Computing Nest + +Quickly deploy Dify to Alibaba Cloud with [Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contributing For those who'd like to contribute code, see our [Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). diff --git a/README_AR.md b/README_AR.md index df288fd33c..5214da4894 100644 --- a/README_AR.md +++ b/README_AR.md @@ -209,6 +209,9 @@ docker compose up -d - [AWS CDK بواسطة @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### استخدام Alibaba Cloud للنشر + انشر Dify بسرعة على سحابة علي بابا باستخدام [عش الحوسبة السحابية من علي بابا](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + ## المساهمة لأولئك الذين يرغبون في المساهمة، انظر إلى [دليل المساهمة](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) لدينا. diff --git a/README_BN.md b/README_BN.md index 4a5b5f3928..1911f186d7 100644 --- a/README_BN.md +++ b/README_BN.md @@ -225,6 +225,11 @@ GitHub-এ ডিফাইকে স্টার দিয়ে রাখুন - [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud ব্যবহার করে ডিপ্লয় + + [Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contributing যারা কোড অবদান রাখতে চান, তাদের জন্য আমাদের [অবদান নির্দেশিকা] দেখুন (https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)। diff --git a/README_CN.md b/README_CN.md index ba7ee0006d..a194b01937 100644 --- a/README_CN.md +++ b/README_CN.md @@ -221,6 +221,11 @@ docker compose up -d ##### AWS - [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### 使用阿里云计算巢部署 + +使用 [阿里云计算巢](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) 将 Dify 一键部署到阿里云 + + ## Star History [![Star History Chart](https://api.star-history.com/svg?repos=langgenius/dify&type=Date)](https://star-history.com/#langgenius/dify&Date) diff --git a/README_DE.md b/README_DE.md index f6023a3935..fd550a5b96 100644 --- a/README_DE.md +++ b/README_DE.md @@ -221,6 +221,11 @@ Bereitstellung von Dify auf AWS mit [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contributing Falls Sie Code beitragen möchten, lesen Sie bitte unseren [Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). Gleichzeitig bitten wir Sie, Dify zu unterstützen, indem Sie es in den sozialen Medien teilen und auf Veranstaltungen und Konferenzen präsentieren.
diff --git a/README_ES.md b/README_ES.md index 12f2ce8c11..38dea09be1 100644 --- a/README_ES.md +++ b/README_ES.md @@ -221,6 +221,10 @@ Despliegue Dify en AWS usando [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK por @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + ## Contribuir Para aquellos que deseen contribuir con código, consulten nuestra [Guía de contribución](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). diff --git a/README_FR.md b/README_FR.md index b106615b31..925918e47e 100644 --- a/README_FR.md +++ b/README_FR.md @@ -219,6 +219,11 @@ Déployez Dify sur AWS en utilisant [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK par @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contribuer Pour ceux qui souhaitent contribuer du code, consultez notre [Guide de contribution](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). diff --git a/README_JA.md b/README_JA.md index 26703f3958..3f8a5b859d 100644 --- a/README_JA.md +++ b/README_JA.md @@ -220,6 +220,10 @@ docker compose up -d ##### AWS - [@KevinZhaoによるAWS CDK](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## 貢献 コードに貢献したい方は、[Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)を参照してください。 diff --git a/README_KL.md b/README_KL.md index ea91baa5aa..9e562a4d73 100644 --- a/README_KL.md +++ b/README_KL.md @@ -219,6 +219,11 @@ wa'logh nIqHom neH ghun deployment toy'wI' [CDK](https://aws.amazon.com/cdk/) lo ##### AWS - [AWS CDK qachlot @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contributing For those who'd like to contribute code, see our [Contribution Guide](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). diff --git a/README_KR.md b/README_KR.md index 89301e8b2c..683b3a86f4 100644 --- a/README_KR.md +++ b/README_KR.md @@ -213,6 +213,11 @@ Dify를 Kubernetes에 배포하고 프리미엄 스케일링 설정을 구성했 ##### AWS - [KevinZhao의 AWS CDK](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## 기여 코드에 기여하고 싶은 분들은 [기여 가이드](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)를 참조하세요. 
diff --git a/README_PT.md b/README_PT.md index 157772d528..b81127b70b 100644 --- a/README_PT.md +++ b/README_PT.md @@ -218,6 +218,11 @@ Implante o Dify na AWS usando [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK por @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Contribuindo Para aqueles que desejam contribuir com código, veja nosso [Guia de Contribuição](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md). diff --git a/README_SI.md b/README_SI.md index 14de1ea792..7034233233 100644 --- a/README_SI.md +++ b/README_SI.md @@ -219,6 +219,11 @@ Uvedite Dify v AWS z uporabo [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK by @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Prispevam Za tiste, ki bi radi prispevali kodo, si oglejte naš vodnik za prispevke . Hkrati vas prosimo, da podprete Dify tako, da ga delite na družbenih medijih ter na dogodkih in konferencah. diff --git a/README_TR.md b/README_TR.md index 563a05af3c..51156933d4 100644 --- a/README_TR.md +++ b/README_TR.md @@ -212,6 +212,11 @@ Dify'ı bulut platformuna tek tıklamayla dağıtın [terraform](https://www.ter ##### AWS - [AWS CDK tarafından @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Katkıda Bulunma Kod katkısında bulunmak isteyenler için [Katkı Kılavuzumuza](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) bakabilirsiniz. diff --git a/README_TW.md b/README_TW.md index f4a76ac109..291da28825 100644 --- a/README_TW.md +++ b/README_TW.md @@ -224,6 +224,11 @@ Dify 的所有功能都提供相應的 API,因此您可以輕鬆地將 Dify - [由 @KevinZhao 提供的 AWS CDK](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) +#### 使用阿里雲計算巢進行部署 + +使用 [阿里雲計算巢](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) 將 Dify 一鍵部署到阿里雲 + + ## 貢獻 對於想要貢獻程式碼的開發者,請參閱我們的[貢獻指南](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md)。 diff --git a/README_VI.md b/README_VI.md index 4e1e05cbf3..51a2e9e9e6 100644 --- a/README_VI.md +++ b/README_VI.md @@ -214,6 +214,12 @@ Triển khai Dify trên AWS bằng [CDK](https://aws.amazon.com/cdk/) ##### AWS - [AWS CDK bởi @KevinZhao](https://github.com/aws-samples/solution-for-deploying-dify-on-aws) + +#### Alibaba Cloud + +[Alibaba Cloud Computing Nest](https://computenest.console.aliyun.com/service/instance/create/default?type=user&ServiceName=Dify%E7%A4%BE%E5%8C%BA%E7%89%88) + + ## Đóng góp Đối với những người muốn đóng góp mã, xem [Hướng dẫn Đóng góp](https://github.com/langgenius/dify/blob/main/CONTRIBUTING.md) của chúng tôi.
diff --git a/api/controllers/console/admin.py b/api/controllers/console/admin.py index 8cb7ad9f5b..f5257fae79 100644 --- a/api/controllers/console/admin.py +++ b/api/controllers/console/admin.py @@ -56,8 +56,7 @@ class InsertExploreAppListApi(Resource): parser.add_argument("position", type=int, required=True, nullable=False, location="json") args = parser.parse_args() - with Session(db.engine) as session: - app = session.execute(select(App).filter(App.id == args["app_id"])).scalar_one_or_none() + app = db.session.execute(select(App).filter(App.id == args["app_id"])).scalar_one_or_none() if not app: raise NotFound(f"App '{args['app_id']}' is not found") @@ -78,38 +77,38 @@ class InsertExploreAppListApi(Resource): select(RecommendedApp).filter(RecommendedApp.app_id == args["app_id"]) ).scalar_one_or_none() - if not recommended_app: - recommended_app = RecommendedApp( - app_id=app.id, - description=desc, - copyright=copy_right, - privacy_policy=privacy_policy, - custom_disclaimer=custom_disclaimer, - language=args["language"], - category=args["category"], - position=args["position"], - ) + if not recommended_app: + recommended_app = RecommendedApp( + app_id=app.id, + description=desc, + copyright=copy_right, + privacy_policy=privacy_policy, + custom_disclaimer=custom_disclaimer, + language=args["language"], + category=args["category"], + position=args["position"], + ) - db.session.add(recommended_app) + db.session.add(recommended_app) - app.is_public = True - db.session.commit() + app.is_public = True + db.session.commit() - return {"result": "success"}, 201 - else: - recommended_app.description = desc - recommended_app.copyright = copy_right - recommended_app.privacy_policy = privacy_policy - recommended_app.custom_disclaimer = custom_disclaimer - recommended_app.language = args["language"] - recommended_app.category = args["category"] - recommended_app.position = args["position"] + return {"result": "success"}, 201 + else: + recommended_app.description = desc + recommended_app.copyright = copy_right + recommended_app.privacy_policy = privacy_policy + recommended_app.custom_disclaimer = custom_disclaimer + recommended_app.language = args["language"] + recommended_app.category = args["category"] + recommended_app.position = args["position"] - app.is_public = True + app.is_public = True - db.session.commit() + db.session.commit() - return {"result": "success"}, 200 + return {"result": "success"}, 200 class InsertExploreAppApi(Resource): diff --git a/api/controllers/console/datasets/datasets_document.py b/api/controllers/console/datasets/datasets_document.py index f7c04102a9..7ac60a0dc2 100644 --- a/api/controllers/console/datasets/datasets_document.py +++ b/api/controllers/console/datasets/datasets_document.py @@ -43,7 +43,6 @@ from core.model_runtime.errors.invoke import InvokeAuthorizationError from core.plugin.impl.exc import PluginDaemonClientSideError from core.rag.extractor.entity.extract_setting import ExtractSetting from extensions.ext_database import db -from extensions.ext_redis import redis_client from fields.document_fields import ( dataset_and_document_fields, document_fields, @@ -54,8 +53,6 @@ from libs.login import login_required from models import Dataset, DatasetProcessRule, Document, DocumentSegment, UploadFile from services.dataset_service import DatasetService, DocumentService from services.entities.knowledge_entities.knowledge_entities import KnowledgeConfig -from tasks.add_document_to_index_task import add_document_to_index_task -from tasks.remove_document_from_index_task 
import remove_document_from_index_task class DocumentResource(Resource): @@ -862,77 +859,16 @@ class DocumentStatusApi(DocumentResource): DatasetService.check_dataset_permission(dataset, current_user) document_ids = request.args.getlist("document_id") - for document_id in document_ids: - document = self.get_document(dataset_id, document_id) - indexing_cache_key = "document_{}_indexing".format(document.id) - cache_result = redis_client.get(indexing_cache_key) - if cache_result is not None: - raise InvalidActionError(f"Document:{document.name} is being indexed, please try again later") + try: + DocumentService.batch_update_document_status(dataset, document_ids, action, current_user) + except services.errors.document.DocumentIndexingError as e: + raise InvalidActionError(str(e)) + except ValueError as e: + raise InvalidActionError(str(e)) + except NotFound as e: + raise NotFound(str(e)) - if action == "enable": - if document.enabled: - continue - document.enabled = True - document.disabled_at = None - document.disabled_by = None - document.updated_at = datetime.now(UTC).replace(tzinfo=None) - db.session.commit() - - # Set cache to prevent indexing the same document multiple times - redis_client.setex(indexing_cache_key, 600, 1) - - add_document_to_index_task.delay(document_id) - - elif action == "disable": - if not document.completed_at or document.indexing_status != "completed": - raise InvalidActionError(f"Document: {document.name} is not completed.") - if not document.enabled: - continue - - document.enabled = False - document.disabled_at = datetime.now(UTC).replace(tzinfo=None) - document.disabled_by = current_user.id - document.updated_at = datetime.now(UTC).replace(tzinfo=None) - db.session.commit() - - # Set cache to prevent indexing the same document multiple times - redis_client.setex(indexing_cache_key, 600, 1) - - remove_document_from_index_task.delay(document_id) - - elif action == "archive": - if document.archived: - continue - - document.archived = True - document.archived_at = datetime.now(UTC).replace(tzinfo=None) - document.archived_by = current_user.id - document.updated_at = datetime.now(UTC).replace(tzinfo=None) - db.session.commit() - - if document.enabled: - # Set cache to prevent indexing the same document multiple times - redis_client.setex(indexing_cache_key, 600, 1) - - remove_document_from_index_task.delay(document_id) - - elif action == "un_archive": - if not document.archived: - continue - document.archived = False - document.archived_at = None - document.archived_by = None - document.updated_at = datetime.now(UTC).replace(tzinfo=None) - db.session.commit() - - # Set cache to prevent indexing the same document multiple times - redis_client.setex(indexing_cache_key, 600, 1) - - add_document_to_index_task.delay(document_id) - - else: - raise InvalidActionError() return {"result": "success"}, 200 diff --git a/api/controllers/service_api/dataset/dataset.py b/api/controllers/service_api/dataset/dataset.py index 1467dfb6b3..839afdb9fd 100644 --- a/api/controllers/service_api/dataset/dataset.py +++ b/api/controllers/service_api/dataset/dataset.py @@ -4,7 +4,7 @@ from werkzeug.exceptions import Forbidden, NotFound import services.dataset_service from controllers.service_api import api -from controllers.service_api.dataset.error import DatasetInUseError, DatasetNameDuplicateError +from controllers.service_api.dataset.error import DatasetInUseError, DatasetNameDuplicateError, InvalidActionError from controllers.service_api.wraps import ( DatasetApiResource, 
cloud_edition_billing_rate_limit_check, @@ -17,7 +17,7 @@ from fields.dataset_fields import dataset_detail_fields from fields.tag_fields import tag_fields from libs.login import current_user from models.dataset import Dataset, DatasetPermissionEnum -from services.dataset_service import DatasetPermissionService, DatasetService +from services.dataset_service import DatasetPermissionService, DatasetService, DocumentService from services.entities.knowledge_entities.knowledge_entities import RetrievalModel from services.tag_service import TagService @@ -329,6 +329,56 @@ class DatasetApi(DatasetApiResource): raise DatasetInUseError() +class DocumentStatusApi(DatasetApiResource): + """Resource for batch document status operations.""" + + def patch(self, tenant_id, dataset_id, action): + """ + Batch update document status. + + Args: + tenant_id: tenant id + dataset_id: dataset id + action: action to perform (enable, disable, archive, un_archive) + + Returns: + dict: A dictionary with a key 'result' and a value 'success' + int: HTTP status code 200 indicating that the operation was successful. + + Raises: + NotFound: If the dataset with the given ID does not exist. + Forbidden: If the user does not have permission. + InvalidActionError: If the action is invalid or cannot be performed. + """ + dataset_id_str = str(dataset_id) + dataset = DatasetService.get_dataset(dataset_id_str) + + if dataset is None: + raise NotFound("Dataset not found.") + + # Check user's permission + try: + DatasetService.check_dataset_permission(dataset, current_user) + except services.errors.account.NoPermissionError as e: + raise Forbidden(str(e)) + + # Check dataset model setting + DatasetService.check_dataset_model_setting(dataset) + + # Get document IDs from request body + data = request.get_json() + document_ids = data.get("document_ids", []) + + try: + DocumentService.batch_update_document_status(dataset, document_ids, action, current_user) + except services.errors.document.DocumentIndexingError as e: + raise InvalidActionError(str(e)) + except ValueError as e: + raise InvalidActionError(str(e)) + + return {"result": "success"}, 200 + + class DatasetTagsApi(DatasetApiResource): @validate_dataset_token @marshal_with(tag_fields) @@ -457,6 +507,7 @@ class DatasetTagsBindingStatusApi(DatasetApiResource): api.add_resource(DatasetListApi, "/datasets") api.add_resource(DatasetApi, "/datasets/<uuid:dataset_id>") +api.add_resource(DocumentStatusApi, "/datasets/<uuid:dataset_id>/documents/status/<string:action>") api.add_resource(DatasetTagsApi, "/datasets/tags") api.add_resource(DatasetTagBindingApi, "/datasets/tags/binding") api.add_resource(DatasetTagUnbindingApi, "/datasets/tags/unbinding") diff --git a/api/core/rag/extractor/markdown_extractor.py b/api/core/rag/extractor/markdown_extractor.py index 849852ac23..c97765b1dc 100644 --- a/api/core/rag/extractor/markdown_extractor.py +++ b/api/core/rag/extractor/markdown_extractor.py @@ -68,22 +68,17 @@ class MarkdownExtractor(BaseExtractor): continue header_match = re.match(r"^#+\s", line) if header_match: - if current_header is not None: - markdown_tups.append((current_header, current_text)) - + markdown_tups.append((current_header, current_text)) current_header = line current_text = "" else: current_text += line + "\n" markdown_tups.append((current_header, current_text)) - if current_header is not None: - # pass linting, assert keys are defined - markdown_tups = [ - (re.sub(r"#", "", cast(str, key)).strip(), re.sub(r"<.*?>", "", value)) for key, value in markdown_tups - ] - else: - markdown_tups = [(key, re.sub("\n", "",
value)) for key, value in markdown_tups] + markdown_tups = [ + (re.sub(r"#", "", cast(str, key)).strip() if key else None, re.sub(r"<.*?>", "", value)) + for key, value in markdown_tups + ] return markdown_tups diff --git a/api/core/workflow/nodes/variable_assigner/v2/constants.py b/api/core/workflow/nodes/variable_assigner/v2/constants.py index 7d7922abd4..3797bfa77a 100644 --- a/api/core/workflow/nodes/variable_assigner/v2/constants.py +++ b/api/core/workflow/nodes/variable_assigner/v2/constants.py @@ -8,5 +8,4 @@ EMPTY_VALUE_MAPPING = { SegmentType.ARRAY_STRING: [], SegmentType.ARRAY_NUMBER: [], SegmentType.ARRAY_OBJECT: [], - SegmentType.ARRAY_FILE: [], } diff --git a/api/core/workflow/nodes/variable_assigner/v2/helpers.py b/api/core/workflow/nodes/variable_assigner/v2/helpers.py index f33f406145..8fb2a27388 100644 --- a/api/core/workflow/nodes/variable_assigner/v2/helpers.py +++ b/api/core/workflow/nodes/variable_assigner/v2/helpers.py @@ -1,6 +1,5 @@ from typing import Any -from core.file import File from core.variables import SegmentType from .enums import Operation @@ -86,8 +85,6 @@ def is_input_value_valid(*, variable_type: SegmentType, operation: Operation, va return isinstance(value, int | float) case SegmentType.ARRAY_OBJECT if operation == Operation.APPEND: return isinstance(value, dict) - case SegmentType.ARRAY_FILE if operation == Operation.APPEND: - return isinstance(value, File) # Array & Extend / Overwrite case SegmentType.ARRAY_ANY if operation in {Operation.EXTEND, Operation.OVER_WRITE}: @@ -98,8 +95,6 @@ def is_input_value_valid(*, variable_type: SegmentType, operation: Operation, va return isinstance(value, list) and all(isinstance(item, int | float) for item in value) case SegmentType.ARRAY_OBJECT if operation in {Operation.EXTEND, Operation.OVER_WRITE}: return isinstance(value, list) and all(isinstance(item, dict) for item in value) - case SegmentType.ARRAY_FILE if operation in {Operation.EXTEND, Operation.OVER_WRITE}: - return isinstance(value, list) and all(isinstance(item, File) for item in value) case _: return False diff --git a/api/factories/variable_factory.py b/api/factories/variable_factory.py index 2badb7f502..a41ef4ae4e 100644 --- a/api/factories/variable_factory.py +++ b/api/factories/variable_factory.py @@ -101,8 +101,6 @@ def _build_variable_from_mapping(*, mapping: Mapping[str, Any], selector: Sequen result = ArrayNumberVariable.model_validate(mapping) case SegmentType.ARRAY_OBJECT if isinstance(value, list): result = ArrayObjectVariable.model_validate(mapping) - case SegmentType.ARRAY_FILE if isinstance(value, list): - result = ArrayFileVariable.model_validate(mapping) case _: raise VariableError(f"not supported value type {value_type}") if result.size > dify_config.MAX_VARIABLE_SIZE: diff --git a/api/services/dataset_service.py b/api/services/dataset_service.py index 747c3d6fe6..49ca98624a 100644 --- a/api/services/dataset_service.py +++ b/api/services/dataset_service.py @@ -59,6 +59,7 @@ from services.external_knowledge_service import ExternalDatasetService from services.feature_service import FeatureModel, FeatureService from services.tag_service import TagService from services.vector_service import VectorService +from tasks.add_document_to_index_task import add_document_to_index_task from tasks.batch_clean_document_task import batch_clean_document_task from tasks.clean_notion_document_task import clean_notion_document_task from tasks.deal_dataset_vector_index_task import deal_dataset_vector_index_task @@ -70,6 +71,7 @@ from 
tasks.document_indexing_update_task import document_indexing_update_task from tasks.duplicate_document_indexing_task import duplicate_document_indexing_task from tasks.enable_segments_to_index_task import enable_segments_to_index_task from tasks.recover_document_indexing_task import recover_document_indexing_task +from tasks.remove_document_from_index_task import remove_document_from_index_task from tasks.retry_document_indexing_task import retry_document_indexing_task from tasks.sync_website_document_indexing_task import sync_website_document_indexing_task @@ -434,7 +436,7 @@ class DatasetService: raise ValueError(ex.description) filtered_data["updated_by"] = user.id - filtered_data["updated_at"] = datetime.datetime.now() + filtered_data["updated_at"] = datetime.datetime.now(datetime.UTC).replace(tzinfo=None) # update Retrieval model filtered_data["retrieval_model"] = data["retrieval_model"] @@ -976,12 +978,17 @@ class DocumentService: process_rule = knowledge_config.process_rule if process_rule: if process_rule.mode in ("custom", "hierarchical"): - dataset_process_rule = DatasetProcessRule( - dataset_id=dataset.id, - mode=process_rule.mode, - rules=process_rule.rules.model_dump_json() if process_rule.rules else None, - created_by=account.id, - ) + if process_rule.rules: + dataset_process_rule = DatasetProcessRule( + dataset_id=dataset.id, + mode=process_rule.mode, + rules=process_rule.rules.model_dump_json() if process_rule.rules else None, + created_by=account.id, + ) + else: + dataset_process_rule = dataset.latest_process_rule + if not dataset_process_rule: + raise ValueError("No process rule found.") elif process_rule.mode == "automatic": dataset_process_rule = DatasetProcessRule( dataset_id=dataset.id, @@ -1603,6 +1610,191 @@ class DocumentService: if not isinstance(args["process_rule"]["rules"]["segmentation"]["max_tokens"], int): raise ValueError("Process rule segmentation max_tokens is invalid") + @staticmethod + def batch_update_document_status(dataset: Dataset, document_ids: list[str], action: str, user): + """ + Batch update document status. + + Args: + dataset (Dataset): The dataset object + document_ids (list[str]): List of document IDs to update + action (str): Action to perform (enable, disable, archive, un_archive) + user: Current user performing the action + + Raises: + DocumentIndexingError: If document is being indexed or not in correct state + ValueError: If action is invalid + """ + if not document_ids: + return + + # Early validation of action parameter + valid_actions = ["enable", "disable", "archive", "un_archive"] + if action not in valid_actions: + raise ValueError(f"Invalid action: {action}. 
Must be one of {valid_actions}") + + documents_to_update = [] + + # First pass: validate all documents and prepare updates + for document_id in document_ids: + document = DocumentService.get_document(dataset.id, document_id) + if not document: + continue + + # Check if document is being indexed + indexing_cache_key = f"document_{document.id}_indexing" + cache_result = redis_client.get(indexing_cache_key) + if cache_result is not None: + raise DocumentIndexingError(f"Document:{document.name} is being indexed, please try again later") + + # Prepare update based on action + update_info = DocumentService._prepare_document_status_update(document, action, user) + if update_info: + documents_to_update.append(update_info) + + # Second pass: apply all updates in a single transaction + if documents_to_update: + try: + for update_info in documents_to_update: + document = update_info["document"] + updates = update_info["updates"] + + # Apply updates to the document + for field, value in updates.items(): + setattr(document, field, value) + + db.session.add(document) + + # Batch commit all changes + db.session.commit() + except Exception as e: + # Rollback on any error + db.session.rollback() + raise e + # Execute async tasks and set Redis cache after successful commit + # propagation_error is used to capture any errors for submitting async task execution + propagation_error = None + for update_info in documents_to_update: + try: + # Execute async tasks after successful commit + if update_info["async_task"]: + task_info = update_info["async_task"] + task_func = task_info["function"] + task_args = task_info["args"] + task_func.delay(*task_args) + except Exception as e: + # Log the error but do not rollback the transaction + logging.exception(f"Error executing async task for document {update_info['document'].id}") + # don't raise the error immediately, but capture it for later + propagation_error = e + try: + # Set Redis cache if needed after successful commit + if update_info["set_cache"]: + document = update_info["document"] + indexing_cache_key = f"document_{document.id}_indexing" + redis_client.setex(indexing_cache_key, 600, 1) + except Exception as e: + # Log the error but do not rollback the transaction + logging.exception(f"Error setting cache for document {update_info['document'].id}") + # Raise any propagation error after all updates + if propagation_error: + raise propagation_error + + @staticmethod + def _prepare_document_status_update(document, action: str, user): + """ + Prepare document status update information. 
+ + Args: + document: Document object to update + action: Action to perform + user: Current user + + Returns: + dict: Update information or None if no update needed + """ + now = datetime.datetime.now(datetime.UTC).replace(tzinfo=None) + + if action == "enable": + return DocumentService._prepare_enable_update(document, now) + elif action == "disable": + return DocumentService._prepare_disable_update(document, user, now) + elif action == "archive": + return DocumentService._prepare_archive_update(document, user, now) + elif action == "un_archive": + return DocumentService._prepare_unarchive_update(document, now) + + return None + + @staticmethod + def _prepare_enable_update(document, now): + """Prepare updates for enabling a document.""" + if document.enabled: + return None + + return { + "document": document, + "updates": {"enabled": True, "disabled_at": None, "disabled_by": None, "updated_at": now}, + "async_task": {"function": add_document_to_index_task, "args": [document.id]}, + "set_cache": True, + } + + @staticmethod + def _prepare_disable_update(document, user, now): + """Prepare updates for disabling a document.""" + if not document.completed_at or document.indexing_status != "completed": + raise DocumentIndexingError(f"Document: {document.name} is not completed.") + + if not document.enabled: + return None + + return { + "document": document, + "updates": {"enabled": False, "disabled_at": now, "disabled_by": user.id, "updated_at": now}, + "async_task": {"function": remove_document_from_index_task, "args": [document.id]}, + "set_cache": True, + } + + @staticmethod + def _prepare_archive_update(document, user, now): + """Prepare updates for archiving a document.""" + if document.archived: + return None + + update_info = { + "document": document, + "updates": {"archived": True, "archived_at": now, "archived_by": user.id, "updated_at": now}, + "async_task": None, + "set_cache": False, + } + + # Only set async task and cache if document is currently enabled + if document.enabled: + update_info["async_task"] = {"function": remove_document_from_index_task, "args": [document.id]} + update_info["set_cache"] = True + + return update_info + + @staticmethod + def _prepare_unarchive_update(document, now): + """Prepare updates for unarchiving a document.""" + if not document.archived: + return None + + update_info = { + "document": document, + "updates": {"archived": False, "archived_at": None, "archived_by": None, "updated_at": now}, + "async_task": None, + "set_cache": False, + } + + # Only re-index if the document is currently enabled + if document.enabled: + update_info["async_task"] = {"function": add_document_to_index_task, "args": [document.id]} + update_info["set_cache"] = True + + return update_info + class SegmentService: @classmethod diff --git a/api/services/plugin/data_migration.py b/api/services/plugin/data_migration.py index 02de5a79d7..5324036414 100644 --- a/api/services/plugin/data_migration.py +++ b/api/services/plugin/data_migration.py @@ -22,7 +22,7 @@ class PluginDataMigration: cls.migrate_datasets() cls.migrate_db_records("embeddings", "provider_name", ModelProviderID) # large table cls.migrate_db_records("dataset_collection_bindings", "provider_name", ModelProviderID) - cls.migrate_db_records("tool_builtin_providers", "provider_name", ToolProviderID) + cls.migrate_db_records("tool_builtin_providers", "provider", ToolProviderID) @classmethod def migrate_datasets(cls) -> None: diff --git a/api/tests/unit_tests/conftest.py b/api/tests/unit_tests/conftest.py index 
e09acc4c39..077ffe3408 100644 --- a/api/tests/unit_tests/conftest.py +++ b/api/tests/unit_tests/conftest.py @@ -1,4 +1,5 @@ import os +from unittest.mock import MagicMock, patch import pytest from flask import Flask @@ -11,6 +12,24 @@ PROJECT_DIR = os.path.abspath(os.path.join(ABS_PATH, os.pardir, os.pardir)) CACHED_APP = Flask(__name__) +# set global mock for Redis client +redis_mock = MagicMock() +redis_mock.get = MagicMock(return_value=None) +redis_mock.setex = MagicMock() +redis_mock.setnx = MagicMock() +redis_mock.delete = MagicMock() +redis_mock.lock = MagicMock() +redis_mock.exists = MagicMock(return_value=False) +redis_mock.set = MagicMock() +redis_mock.expire = MagicMock() +redis_mock.hgetall = MagicMock(return_value={}) +redis_mock.hdel = MagicMock() +redis_mock.incr = MagicMock(return_value=1) + +# apply the mock to the Redis client in the Flask app +redis_patcher = patch("extensions.ext_redis.redis_client", redis_mock) +redis_patcher.start() + @pytest.fixture def app() -> Flask: @@ -21,3 +40,19 @@ def _provide_app_context(app: Flask): with app.app_context(): yield + + +@pytest.fixture(autouse=True) +def reset_redis_mock(): + """reset the Redis mock before each test""" + redis_mock.reset_mock() + redis_mock.get.return_value = None + redis_mock.setex.return_value = None + redis_mock.setnx.return_value = None + redis_mock.delete.return_value = None + redis_mock.exists.return_value = False + redis_mock.set.return_value = None + redis_mock.expire.return_value = None + redis_mock.hgetall.return_value = {} + redis_mock.hdel.return_value = None + redis_mock.incr.return_value = 1 diff --git a/api/tests/unit_tests/core/rag/extractor/test_markdown_extractor.py b/api/tests/unit_tests/core/rag/extractor/test_markdown_extractor.py new file mode 100644 index 0000000000..d4cf534c56 --- /dev/null +++ b/api/tests/unit_tests/core/rag/extractor/test_markdown_extractor.py @@ -0,0 +1,22 @@ +from core.rag.extractor.markdown_extractor import MarkdownExtractor + + +def test_markdown_to_tups(): + markdown = """ +this is some text without header + +# title 1 +this is balabala text + +## title 2 +this is more specific text. + """ + extractor = MarkdownExtractor(file_path="dummy_path") + updated_output = extractor.markdown_to_tups(markdown) + assert len(updated_output) == 3 + key, header_value = updated_output[0] + assert key is None + assert header_value.strip() == "this is some text without header" + title_1, value = updated_output[1] + assert title_1.strip() == "title 1" + assert value.strip() == "this is balabala text" diff --git a/api/tests/unit_tests/services/test_dataset_service.py b/api/tests/unit_tests/services/test_dataset_service.py new file mode 100644 index 0000000000..f22500cfe4 --- /dev/null +++ b/api/tests/unit_tests/services/test_dataset_service.py @@ -0,0 +1,1238 @@ +import datetime +import unittest + +# Mock redis_client before importing dataset_service +from unittest.mock import Mock, call, patch + +import pytest + +from models.dataset import Dataset, Document +from services.dataset_service import DocumentService +from services.errors.document import DocumentIndexingError +from tests.unit_tests.conftest import redis_mock + + +class TestDatasetServiceBatchUpdateDocumentStatus(unittest.TestCase): + """ + Comprehensive unit tests for DocumentService.batch_update_document_status method.
+ + This test suite covers all supported actions (enable, disable, archive, un_archive), + error conditions, edge cases, and validates proper interaction with Redis cache, + database operations, and async task triggers. + """ + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_enable_documents_success(self, mock_datetime, mock_get_doc, mock_add_task, mock_db): + """ + Test successful enabling of disabled documents. + + Verifies that: + 1. Only disabled documents are processed (already enabled documents are skipped) + 2. Document attributes are updated correctly (enabled=True, metadata cleared) + 3. Database changes are committed for each document + 4. Redis cache keys are set to prevent concurrent indexing + 5. Async indexing task is triggered for each enabled document + 6. Timestamp fields are properly updated + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock disabled document + mock_disabled_doc_1 = Mock(spec=Document) + mock_disabled_doc_1.id = "doc-1" + mock_disabled_doc_1.name = "disabled_document.pdf" + mock_disabled_doc_1.enabled = False + mock_disabled_doc_1.archived = False + mock_disabled_doc_1.indexing_status = "completed" + mock_disabled_doc_1.completed_at = datetime.datetime.now() + + mock_disabled_doc_2 = Mock(spec=Document) + mock_disabled_doc_2.id = "doc-2" + mock_disabled_doc_2.name = "disabled_document.pdf" + mock_disabled_doc_2.enabled = False + mock_disabled_doc_2.archived = False + mock_disabled_doc_2.indexing_status = "completed" + mock_disabled_doc_2.completed_at = datetime.datetime.now() + + # Set up mock return values + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + # Mock document retrieval to return disabled documents + mock_get_doc.side_effect = [mock_disabled_doc_1, mock_disabled_doc_2] + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Call the method to enable documents + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1", "doc-2"], action="enable", user=mock_user + ) + + # Verify document attributes were updated correctly + for mock_doc in [mock_disabled_doc_1, mock_disabled_doc_2]: + # Check that document was enabled + assert mock_doc.enabled == True + # Check that disable metadata was cleared + assert mock_doc.disabled_at is None + assert mock_doc.disabled_by is None + # Check that update timestamp was set + assert mock_doc.updated_at == current_time.replace(tzinfo=None) + + # Verify Redis cache operations + expected_cache_calls = [call("document_doc-1_indexing"), call("document_doc-2_indexing")] + redis_mock.get.assert_has_calls(expected_cache_calls) + + # Verify Redis cache was set to prevent concurrent indexing (600 seconds) + expected_setex_calls = [call("document_doc-1_indexing", 600, 1), call("document_doc-2_indexing", 600, 1)] + redis_mock.setex.assert_has_calls(expected_setex_calls) + + # Verify async tasks were triggered for indexing + expected_task_calls = [call("doc-1"), call("doc-2")] + mock_add_task.delay.assert_has_calls(expected_task_calls) + + # Verify database add counts 
(one add for one document) + assert mock_db.add.call_count == 2 + # Verify database commits (one commit for the batch operation) + assert mock_db.commit.call_count == 1 + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.remove_document_from_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_disable_documents_success(self, mock_datetime, mock_get_doc, mock_remove_task, mock_db): + """ + Test successful disabling of enabled and completed documents. + + Verifies that: + 1. Only completed and enabled documents can be disabled + 2. Document attributes are updated correctly (enabled=False, disable metadata set) + 3. User ID is recorded in disabled_by field + 4. Database changes are committed for each document + 5. Redis cache keys are set to prevent concurrent indexing + 6. Async task is triggered to remove documents from index + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock enabled document + mock_enabled_doc_1 = Mock(spec=Document) + mock_enabled_doc_1.id = "doc-1" + mock_enabled_doc_1.name = "enabled_document.pdf" + mock_enabled_doc_1.enabled = True + mock_enabled_doc_1.archived = False + mock_enabled_doc_1.indexing_status = "completed" + mock_enabled_doc_1.completed_at = datetime.datetime.now() + + mock_enabled_doc_2 = Mock(spec=Document) + mock_enabled_doc_2.id = "doc-2" + mock_enabled_doc_2.name = "enabled_document.pdf" + mock_enabled_doc_2.enabled = True + mock_enabled_doc_2.archived = False + mock_enabled_doc_2.indexing_status = "completed" + mock_enabled_doc_2.completed_at = datetime.datetime.now() + + # Set up mock return values + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + # Mock document retrieval to return enabled, completed documents + mock_get_doc.side_effect = [mock_enabled_doc_1, mock_enabled_doc_2] + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Call the method to disable documents + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1", "doc-2"], action="disable", user=mock_user + ) + + # Verify document attributes were updated correctly + for mock_doc in [mock_enabled_doc_1, mock_enabled_doc_2]: + # Check that document was disabled + assert mock_doc.enabled == False + # Check that disable metadata was set correctly + assert mock_doc.disabled_at == current_time.replace(tzinfo=None) + assert mock_doc.disabled_by == mock_user.id + # Check that update timestamp was set + assert mock_doc.updated_at == current_time.replace(tzinfo=None) + + # Verify Redis cache operations for indexing prevention + expected_setex_calls = [call("document_doc-1_indexing", 600, 1), call("document_doc-2_indexing", 600, 1)] + redis_mock.setex.assert_has_calls(expected_setex_calls) + + # Verify async tasks were triggered to remove from index + expected_task_calls = [call("doc-1"), call("doc-2")] + mock_remove_task.delay.assert_has_calls(expected_task_calls) + + # Verify database add counts (one add for one document) + assert mock_db.add.call_count == 2 + # Verify database commits (totally 1 for any batch operation) + assert mock_db.commit.call_count == 1 + + 
@patch("extensions.ext_database.db.session") + @patch("services.dataset_service.remove_document_from_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_archive_documents_success(self, mock_datetime, mock_get_doc, mock_remove_task, mock_db): + """ + Test successful archiving of unarchived documents. + + Verifies that: + 1. Only unarchived documents are processed (already archived are skipped) + 2. Document attributes are updated correctly (archived=True, archive metadata set) + 3. User ID is recorded in archived_by field + 4. If documents are enabled, they are removed from the index + 5. Redis cache keys are set only for enabled documents being archived + 6. Database changes are committed for each document + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create unarchived enabled document + unarchived_doc = Mock(spec=Document) + # Manually set attributes to ensure they can be modified + unarchived_doc.id = "doc-1" + unarchived_doc.name = "unarchived_document.pdf" + unarchived_doc.enabled = True + unarchived_doc.archived = False + + # Set up mock return values + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + mock_get_doc.return_value = unarchived_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Call the method to archive documents + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="archive", user=mock_user + ) + + # Verify document attributes were updated correctly + assert unarchived_doc.archived == True + assert unarchived_doc.archived_at == current_time.replace(tzinfo=None) + assert unarchived_doc.archived_by == mock_user.id + assert unarchived_doc.updated_at == current_time.replace(tzinfo=None) + + # Verify Redis cache was set (because document was enabled) + redis_mock.setex.assert_called_once_with("document_doc-1_indexing", 600, 1) + + # Verify async task was triggered to remove from index (because enabled) + mock_remove_task.delay.assert_called_once_with("doc-1") + + # Verify database add + mock_db.add.assert_called_once() + # Verify database commit + mock_db.commit.assert_called_once() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_unarchive_documents_success(self, mock_datetime, mock_get_doc, mock_add_task, mock_db): + """ + Test successful unarchiving of archived documents. + + Verifies that: + 1. Only archived documents are processed (already unarchived are skipped) + 2. Document attributes are updated correctly (archived=False, archive metadata cleared) + 3. If documents are enabled, they are added back to the index + 4. Redis cache keys are set only for enabled documents being unarchived + 5. 
Database changes are committed for each document + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock archived document + mock_archived_doc = Mock(spec=Document) + mock_archived_doc.id = "doc-3" + mock_archived_doc.name = "archived_document.pdf" + mock_archived_doc.enabled = True + mock_archived_doc.archived = True + mock_archived_doc.indexing_status = "completed" + mock_archived_doc.completed_at = datetime.datetime.now() + + # Set up mock return values + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + mock_get_doc.return_value = mock_archived_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Call the method to unarchive documents + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-3"], action="un_archive", user=mock_user + ) + + # Verify document attributes were updated correctly + assert mock_archived_doc.archived == False + assert mock_archived_doc.archived_at is None + assert mock_archived_doc.archived_by is None + assert mock_archived_doc.updated_at == current_time.replace(tzinfo=None) + + # Verify Redis cache was set (because document is enabled) + redis_mock.setex.assert_called_once_with("document_doc-3_indexing", 600, 1) + + # Verify async task was triggered to add back to index (because enabled) + mock_add_task.delay.assert_called_once_with("doc-3") + + # Verify database add + mock_db.add.assert_called_once() + # Verify database commit + mock_db.commit.assert_called_once() + + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_document_indexing_error_redis_cache_hit(self, mock_get_doc): + """ + Test that DocumentIndexingError is raised when documents are currently being indexed. + + Verifies that: + 1. The method checks Redis cache for active indexing operations + 2. DocumentIndexingError is raised if any document is being indexed + 3. Error message includes the document name for user feedback + 4. 
No further processing occurs when indexing is detected + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock enabled document + mock_enabled_doc = Mock(spec=Document) + mock_enabled_doc.id = "doc-1" + mock_enabled_doc.name = "enabled_document.pdf" + mock_enabled_doc.enabled = True + mock_enabled_doc.archived = False + mock_enabled_doc.indexing_status = "completed" + mock_enabled_doc.completed_at = datetime.datetime.now() + + # Set up mock to indicate document is being indexed + mock_get_doc.return_value = mock_enabled_doc + + # Reset module-level Redis mock, set to indexing status + redis_mock.reset_mock() + redis_mock.get.return_value = "indexing" + + # Verify that DocumentIndexingError is raised + with pytest.raises(DocumentIndexingError) as exc_info: + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="enable", user=mock_user + ) + + # Verify error message contains document name + assert "enabled_document.pdf" in str(exc_info.value) + assert "is being indexed" in str(exc_info.value) + + # Verify Redis cache was checked + redis_mock.get.assert_called_once_with("document_doc-1_indexing") + + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_disable_non_completed_document_error(self, mock_get_doc): + """ + Test that DocumentIndexingError is raised when trying to disable non-completed documents. + + Verifies that: + 1. Only completed documents can be disabled + 2. DocumentIndexingError is raised for non-completed documents + 3. Error message indicates the document is not completed + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create a document that's not completed + non_completed_doc = Mock(spec=Document) + # Manually set attributes to ensure they can be modified + non_completed_doc.id = "doc-1" + non_completed_doc.name = "indexing_document.pdf" + non_completed_doc.enabled = True + non_completed_doc.indexing_status = "indexing" # Not completed + non_completed_doc.completed_at = None # Not completed + + mock_get_doc.return_value = non_completed_doc + + # Verify that DocumentIndexingError is raised + with pytest.raises(DocumentIndexingError) as exc_info: + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="disable", user=mock_user + ) + + # Verify error message indicates document is not completed + assert "is not completed" in str(exc_info.value) + + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_empty_document_list(self, mock_get_doc): + """ + Test batch operations with an empty document ID list. + + Verifies that: + 1. The method handles empty input gracefully + 2. No document operations are performed with empty input + 3. No errors are raised with empty input + 4. 
Method returns early without processing + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Call method with empty document list + result = DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=[], action="enable", user=mock_user + ) + + # Verify no document lookups were performed + mock_get_doc.assert_not_called() + + # Verify method returns None (early return) + assert result is None + + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_document_not_found_skipped(self, mock_get_doc): + """ + Test behavior when some documents don't exist in the database. + + Verifies that: + 1. Non-existent documents are gracefully skipped + 2. Processing continues for existing documents + 3. No errors are raised for missing document IDs + 4. Method completes successfully despite missing documents + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Mock document service to return None (document not found) + mock_get_doc.return_value = None + + # Call method with non-existent document ID + # This should not raise an error, just skip the missing document + try: + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["non-existent-doc"], action="enable", user=mock_user + ) + except Exception as e: + pytest.fail(f"Method should not raise exception for missing documents: {e}") + + # Verify document lookup was attempted + mock_get_doc.assert_called_once_with(mock_dataset.id, "non-existent-doc") + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_enable_already_enabled_document_skipped(self, mock_get_doc, mock_db): + """ + Test enabling documents that are already enabled. + + Verifies that: + 1. Already enabled documents are skipped (no unnecessary operations) + 2. No database commits occur for already enabled documents + 3. No Redis cache operations occur for skipped documents + 4. No async tasks are triggered for skipped documents + 5. 
Method completes successfully + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock enabled document + mock_enabled_doc = Mock(spec=Document) + mock_enabled_doc.id = "doc-1" + mock_enabled_doc.name = "enabled_document.pdf" + mock_enabled_doc.enabled = True + mock_enabled_doc.archived = False + mock_enabled_doc.indexing_status = "completed" + mock_enabled_doc.completed_at = datetime.datetime.now() + + # Mock document that is already enabled + mock_get_doc.return_value = mock_enabled_doc # Already enabled + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Attempt to enable already enabled document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="enable", user=mock_user + ) + + # Verify no database operations occurred (document was skipped) + mock_db.commit.assert_not_called() + + # Verify no Redis setex operations occurred (document was skipped) + redis_mock.setex.assert_not_called() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_archive_already_archived_document_skipped(self, mock_get_doc, mock_db): + """ + Test archiving documents that are already archived. + + Verifies that: + 1. Already archived documents are skipped (no unnecessary operations) + 2. No database commits occur for already archived documents + 3. No Redis cache operations occur for skipped documents + 4. No async tasks are triggered for skipped documents + 5. Method completes successfully + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock archived document + mock_archived_doc = Mock(spec=Document) + mock_archived_doc.id = "doc-3" + mock_archived_doc.name = "archived_document.pdf" + mock_archived_doc.enabled = True + mock_archived_doc.archived = True + mock_archived_doc.indexing_status = "completed" + mock_archived_doc.completed_at = datetime.datetime.now() + + # Mock document that is already archived + mock_get_doc.return_value = mock_archived_doc # Already archived + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Attempt to archive already archived document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-3"], action="archive", user=mock_user + ) + + # Verify no database operations occurred (document was skipped) + mock_db.commit.assert_not_called() + + # Verify no Redis setex operations occurred (document was skipped) + redis_mock.setex.assert_not_called() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.remove_document_from_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_mixed_document_states_and_actions( + self, mock_datetime, mock_get_doc, mock_remove_task, mock_add_task, mock_db + ): + """ + Test batch operations on documents with mixed states and various scenarios. + + Verifies that: + 1. Each document is processed according to its current state + 2. 
Some documents may be skipped while others are processed + 3. Different async tasks are triggered based on document states + 4. Method handles mixed scenarios gracefully + 5. Database commits occur only for documents that were actually modified + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock documents with different states + mock_disabled_doc = Mock(spec=Document) + mock_disabled_doc.id = "doc-1" + mock_disabled_doc.name = "disabled_document.pdf" + mock_disabled_doc.enabled = False + mock_disabled_doc.archived = False + mock_disabled_doc.indexing_status = "completed" + mock_disabled_doc.completed_at = datetime.datetime.now() + + mock_enabled_doc = Mock(spec=Document) + mock_enabled_doc.id = "doc-2" + mock_enabled_doc.name = "enabled_document.pdf" + mock_enabled_doc.enabled = True + mock_enabled_doc.archived = False + mock_enabled_doc.indexing_status = "completed" + mock_enabled_doc.completed_at = datetime.datetime.now() + + mock_archived_doc = Mock(spec=Document) + mock_archived_doc.id = "doc-3" + mock_archived_doc.name = "archived_document.pdf" + mock_archived_doc.enabled = True + mock_archived_doc.archived = True + mock_archived_doc.indexing_status = "completed" + mock_archived_doc.completed_at = datetime.datetime.now() + + # Set up mixed document states + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + # Mix of different document states + documents = [ + mock_disabled_doc, # Will be enabled + mock_enabled_doc, # Already enabled, will be skipped + mock_archived_doc, # Archived but enabled, will be skipped for enable action + ] + + mock_get_doc.side_effect = documents + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Perform enable operation on mixed state documents + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1", "doc-2", "doc-3"], action="enable", user=mock_user + ) + + # Verify only the disabled document was processed + # (enabled and archived documents should be skipped for enable action) + + # Only one add should occur (for the disabled document that was enabled) + mock_db.add.assert_called_once() + # Only one commit should occur + mock_db.commit.assert_called_once() + + # Only one Redis setex should occur (for the document that was enabled) + redis_mock.setex.assert_called_once_with("document_doc-1_indexing", 600, 1) + + # Only one async task should be triggered (for the document that was enabled) + mock_add_task.delay.assert_called_once_with("doc-1") + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.remove_document_from_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_archive_disabled_document_no_index_removal( + self, mock_datetime, mock_get_doc, mock_remove_task, mock_db + ): + """ + Test archiving disabled documents (should not trigger index removal). + + Verifies that: + 1. Disabled documents can be archived + 2. Archive metadata is set correctly + 3. No index removal task is triggered (because document is disabled) + 4. No Redis cache key is set (because document is disabled) + 5. 
Database commit still occurs + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Set up disabled, unarchived document + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + disabled_unarchived_doc = Mock(spec=Document) + # Manually set attributes to ensure they can be modified + disabled_unarchived_doc.id = "doc-1" + disabled_unarchived_doc.name = "disabled_document.pdf" + disabled_unarchived_doc.enabled = False # Disabled + disabled_unarchived_doc.archived = False # Not archived + + mock_get_doc.return_value = disabled_unarchived_doc + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Archive the disabled document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="archive", user=mock_user + ) + + # Verify document was archived + assert disabled_unarchived_doc.archived == True + assert disabled_unarchived_doc.archived_at == current_time.replace(tzinfo=None) + assert disabled_unarchived_doc.archived_by == mock_user.id + + # Verify no Redis cache was set (document is disabled) + redis_mock.setex.assert_not_called() + + # Verify no index removal task was triggered (document is disabled) + mock_remove_task.delay.assert_not_called() + + # Verify database add still occurred + mock_db.add.assert_called_once() + # Verify database commit still occurred + mock_db.commit.assert_called_once() + + @patch("services.dataset_service.DocumentService.get_document") + def test_batch_update_invalid_action_error(self, mock_get_doc): + """ + Test that ValueError is raised when an invalid action is provided. + + Verifies that: + 1. Invalid actions are rejected with ValueError + 2. Error message includes the invalid action name + 3. No document processing occurs with invalid actions + 4. Method fails fast on invalid input + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock document + mock_doc = Mock(spec=Document) + mock_doc.id = "doc-1" + mock_doc.name = "test_document.pdf" + mock_doc.enabled = True + mock_doc.archived = False + + mock_get_doc.return_value = mock_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Test with invalid action + invalid_action = "invalid_action" + with pytest.raises(ValueError) as exc_info: + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action=invalid_action, user=mock_user + ) + + # Verify error message contains the invalid action + assert invalid_action in str(exc_info.value) + assert "Invalid action" in str(exc_info.value) + + # Verify no Redis operations occurred + redis_mock.setex.assert_not_called() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_disable_already_disabled_document_skipped( + self, mock_datetime, mock_get_doc, mock_add_task, mock_db + ): + """ + Test disabling documents that are already disabled. + + Verifies that: + 1. 
Already disabled documents are skipped (no unnecessary operations) + 2. No database commits occur for already disabled documents + 3. No Redis cache operations occur for skipped documents + 4. No async tasks are triggered for skipped documents + 5. Method completes successfully + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock disabled document + mock_disabled_doc = Mock(spec=Document) + mock_disabled_doc.id = "doc-1" + mock_disabled_doc.name = "disabled_document.pdf" + mock_disabled_doc.enabled = False # Already disabled + mock_disabled_doc.archived = False + mock_disabled_doc.indexing_status = "completed" + mock_disabled_doc.completed_at = datetime.datetime.now() + + # Mock document that is already disabled + mock_get_doc.return_value = mock_disabled_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Attempt to disable already disabled document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="disable", user=mock_user + ) + + # Verify no database operations occurred (document was skipped) + mock_db.commit.assert_not_called() + + # Verify no Redis setex operations occurred (document was skipped) + redis_mock.setex.assert_not_called() + + # Verify no async tasks were triggered (document was skipped) + mock_add_task.delay.assert_not_called() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_unarchive_already_unarchived_document_skipped( + self, mock_datetime, mock_get_doc, mock_add_task, mock_db + ): + """ + Test unarchiving documents that are already unarchived. + + Verifies that: + 1. Already unarchived documents are skipped (no unnecessary operations) + 2. No database commits occur for already unarchived documents + 3. No Redis cache operations occur for skipped documents + 4. No async tasks are triggered for skipped documents + 5. 
Method completes successfully + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock unarchived document + mock_unarchived_doc = Mock(spec=Document) + mock_unarchived_doc.id = "doc-1" + mock_unarchived_doc.name = "unarchived_document.pdf" + mock_unarchived_doc.enabled = True + mock_unarchived_doc.archived = False # Already unarchived + mock_unarchived_doc.indexing_status = "completed" + mock_unarchived_doc.completed_at = datetime.datetime.now() + + # Mock document that is already unarchived + mock_get_doc.return_value = mock_unarchived_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Attempt to unarchive already unarchived document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="un_archive", user=mock_user + ) + + # Verify no database operations occurred (document was skipped) + mock_db.commit.assert_not_called() + + # Verify no Redis setex operations occurred (document was skipped) + redis_mock.setex.assert_not_called() + + # Verify no async tasks were triggered (document was skipped) + mock_add_task.delay.assert_not_called() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_unarchive_disabled_document_no_index_addition( + self, mock_datetime, mock_get_doc, mock_add_task, mock_db + ): + """ + Test unarchiving disabled documents (should not trigger index addition). + + Verifies that: + 1. Disabled documents can be unarchived + 2. Unarchive metadata is cleared correctly + 3. No index addition task is triggered (because document is disabled) + 4. No Redis cache key is set (because document is disabled) + 5. 
Database commit still occurs + """ + # Create mock dataset + mock_dataset = Mock(spec=Dataset) + mock_dataset.id = "dataset-123" + mock_dataset.tenant_id = "tenant-456" + + # Create mock user + mock_user = Mock() + mock_user.id = "user-789" + + # Create mock archived but disabled document + mock_archived_disabled_doc = Mock(spec=Document) + mock_archived_disabled_doc.id = "doc-1" + mock_archived_disabled_doc.name = "archived_disabled_document.pdf" + mock_archived_disabled_doc.enabled = False # Disabled + mock_archived_disabled_doc.archived = True # Archived + mock_archived_disabled_doc.indexing_status = "completed" + mock_archived_disabled_doc.completed_at = datetime.datetime.now() + + # Set up mock return values + current_time = datetime.datetime(2023, 1, 1, 12, 0, 0) + mock_datetime.datetime.now.return_value = current_time + mock_datetime.UTC = datetime.UTC + + mock_get_doc.return_value = mock_archived_disabled_doc + + # Reset module-level Redis mock + redis_mock.reset_mock() + redis_mock.get.return_value = None + + # Unarchive the disabled document + DocumentService.batch_update_document_status( + dataset=mock_dataset, document_ids=["doc-1"], action="un_archive", user=mock_user + ) + + # Verify document was unarchived + assert mock_archived_disabled_doc.archived == False + assert mock_archived_disabled_doc.archived_at is None + assert mock_archived_disabled_doc.archived_by is None + assert mock_archived_disabled_doc.updated_at == current_time.replace(tzinfo=None) + + # Verify no Redis cache was set (document is disabled) + redis_mock.setex.assert_not_called() + + # Verify no index addition task was triggered (document is disabled) + mock_add_task.delay.assert_not_called() + + # Verify database add still occurred + mock_db.add.assert_called_once() + # Verify database commit still occurred + mock_db.commit.assert_called_once() + + @patch("extensions.ext_database.db.session") + @patch("services.dataset_service.add_document_to_index_task") + @patch("services.dataset_service.DocumentService.get_document") + @patch("services.dataset_service.datetime") + def test_batch_update_async_task_error_handling(self, mock_datetime, mock_get_doc, mock_add_task, mock_db): + """ + Test handling of async task errors during batch operations. + + Verifies that: + 1. Async task errors are properly handled + 2. Database operations complete successfully + 3. Redis cache operations complete successfully + 4. 
The async task error propagates to the caller once database and cache updates complete
+ """
+ # Create mock dataset
+ mock_dataset = Mock(spec=Dataset)
+ mock_dataset.id = "dataset-123"
+ mock_dataset.tenant_id = "tenant-456"
+
+ # Create mock user
+ mock_user = Mock()
+ mock_user.id = "user-789"
+
+ # Create mock disabled document
+ mock_disabled_doc = Mock(spec=Document)
+ mock_disabled_doc.id = "doc-1"
+ mock_disabled_doc.name = "disabled_document.pdf"
+ mock_disabled_doc.enabled = False
+ mock_disabled_doc.archived = False
+ mock_disabled_doc.indexing_status = "completed"
+ mock_disabled_doc.completed_at = datetime.datetime.now()
+
+ # Set up mock return values
+ current_time = datetime.datetime(2023, 1, 1, 12, 0, 0)
+ mock_datetime.datetime.now.return_value = current_time
+ mock_datetime.UTC = datetime.UTC
+
+ mock_get_doc.return_value = mock_disabled_doc
+
+ # Mock async task to raise an exception
+ mock_add_task.delay.side_effect = Exception("Celery task error")
+
+ # Reset module-level Redis mock
+ redis_mock.reset_mock()
+ redis_mock.get.return_value = None
+
+ # Verify that the async task error is propagated
+ with pytest.raises(Exception) as exc_info:
+ DocumentService.batch_update_document_status(
+ dataset=mock_dataset, document_ids=["doc-1"], action="enable", user=mock_user
+ )
+
+ # Verify error message
+ assert "Celery task error" in str(exc_info.value)
+
+ # Verify database operations completed successfully
+ mock_db.add.assert_called_once()
+ mock_db.commit.assert_called_once()
+
+ # Verify Redis cache was set successfully
+ redis_mock.setex.assert_called_once_with("document_doc-1_indexing", 600, 1)
+
+ # Verify document was updated
+ assert mock_disabled_doc.enabled == True
+ assert mock_disabled_doc.disabled_at is None
+ assert mock_disabled_doc.disabled_by is None
+
+ @patch("extensions.ext_database.db.session")
+ @patch("services.dataset_service.add_document_to_index_task")
+ @patch("services.dataset_service.DocumentService.get_document")
+ @patch("services.dataset_service.datetime")
+ def test_batch_update_large_document_list_performance(self, mock_datetime, mock_get_doc, mock_add_task, mock_db):
+ """
+ Test batch operations with a large number of documents.
+
+ Verifies that:
+ 1. Method can handle large document lists efficiently
+ 2. All documents are processed correctly
+ 3. Each document is added to the session, with a single commit for the whole batch
+ 4. Redis cache operations occur for each document
+ 5. Async tasks are triggered for each document
+ 6. Performance remains consistent with large inputs
+ """
+ # Create mock dataset
+ mock_dataset = Mock(spec=Dataset)
+ mock_dataset.id = "dataset-123"
+ mock_dataset.tenant_id = "tenant-456"
+
+ # Create mock user
+ mock_user = Mock()
+ mock_user.id = "user-789"
+
+ # Create large list of document IDs
+ document_ids = [f"doc-{i}" for i in range(1, 101)]  # 100 documents
+
+ # Create mock documents
+ mock_documents = []
+ for i in range(1, 101):
+ mock_doc = Mock(spec=Document)
+ mock_doc.id = f"doc-{i}"
+ mock_doc.name = f"document_{i}.pdf"
+ mock_doc.enabled = False  # All disabled, will be enabled
+ mock_doc.archived = False
+ mock_doc.indexing_status = "completed"
+ mock_doc.completed_at = datetime.datetime.now()
+ mock_documents.append(mock_doc)
+
+ # Set up mock return values
+ current_time = datetime.datetime(2023, 1, 1, 12, 0, 0)
+ mock_datetime.datetime.now.return_value = current_time
+ mock_datetime.UTC = datetime.UTC
+
+ mock_get_doc.side_effect = mock_documents
+
+ # Reset module-level Redis mock
+ redis_mock.reset_mock()
+ redis_mock.get.return_value = None
+
+ # Perform batch enable operation
+ DocumentService.batch_update_document_status(
+ dataset=mock_dataset, document_ids=document_ids, action="enable", user=mock_user
+ )
+
+ # Verify all documents were processed
+ assert mock_get_doc.call_count == 100
+
+ # Verify all documents were updated
+ for mock_doc in mock_documents:
+ assert mock_doc.enabled == True
+ assert mock_doc.disabled_at is None
+ assert mock_doc.disabled_by is None
+ assert mock_doc.updated_at == current_time.replace(tzinfo=None)
+
+ # Verify one session add per document
+ assert mock_db.add.call_count == 100
+ # Verify a single commit covers the whole batch operation
+ assert mock_db.commit.call_count == 1
+
+ # Verify Redis cache operations occurred for each document
+ assert redis_mock.setex.call_count == 100
+
+ # Verify async tasks were triggered for each document
+ assert mock_add_task.delay.call_count == 100
+
+ # Verify correct Redis cache keys were set
+ expected_redis_calls = [call(f"document_doc-{i}_indexing", 600, 1) for i in range(1, 101)]
+ redis_mock.setex.assert_has_calls(expected_redis_calls)
+
+ # Verify correct async task calls
+ expected_task_calls = [call(f"doc-{i}") for i in range(1, 101)]
+ mock_add_task.delay.assert_has_calls(expected_task_calls)
+
+ @patch("extensions.ext_database.db.session")
+ @patch("services.dataset_service.add_document_to_index_task")
+ @patch("services.dataset_service.DocumentService.get_document")
+ @patch("services.dataset_service.datetime")
+ def test_batch_update_mixed_document_states_complex_scenario(
+ self, mock_datetime, mock_get_doc, mock_add_task, mock_db
+ ):
+ """
+ Test complex batch operations with documents in various states.
+
+ Verifies that:
+ 1. Each document is processed according to its current state
+ 2. Some documents are skipped while others are processed
+ 3. Async tasks are triggered only for documents the action actually modifies
+ 4. Database commits occur only for modified documents
+ 5. Redis cache operations occur only for relevant documents
+ 6. Method handles complex mixed scenarios correctly
+ """
+ # Create mock dataset
+ mock_dataset = Mock(spec=Dataset)
+ mock_dataset.id = "dataset-123"
+ mock_dataset.tenant_id = "tenant-456"
+
+ # Create mock user
+ mock_user = Mock()
+ mock_user.id = "user-789"
+
+ # Create documents in various states
+ current_time = datetime.datetime(2023, 1, 1, 12, 0, 0)
+ mock_datetime.datetime.now.return_value = current_time
+ mock_datetime.UTC = datetime.UTC
+
+ # Document 1: Disabled, will be enabled
+ doc1 = Mock(spec=Document)
+ doc1.id = "doc-1"
+ doc1.name = "disabled_doc.pdf"
+ doc1.enabled = False
+ doc1.archived = False
+ doc1.indexing_status = "completed"
+ doc1.completed_at = datetime.datetime.now()
+
+ # Document 2: Already enabled, will be skipped
+ doc2 = Mock(spec=Document)
+ doc2.id = "doc-2"
+ doc2.name = "enabled_doc.pdf"
+ doc2.enabled = True
+ doc2.archived = False
+ doc2.indexing_status = "completed"
+ doc2.completed_at = datetime.datetime.now()
+
+ # Document 3: Already enabled, will be skipped by the enable action
+ doc3 = Mock(spec=Document)
+ doc3.id = "doc-3"
+ doc3.name = "enabled_completed_doc.pdf"
+ doc3.enabled = True
+ doc3.archived = False
+ doc3.indexing_status = "completed"
+ doc3.completed_at = datetime.datetime.now()
+
+ # Document 4: Already enabled and unarchived, will be skipped by the enable action
+ doc4 = Mock(spec=Document)
+ doc4.id = "doc-4"
+ doc4.name = "unarchived_doc.pdf"
+ doc4.enabled = True
+ doc4.archived = False
+ doc4.indexing_status = "completed"
+ doc4.completed_at = datetime.datetime.now()
+
+ # Document 5: Archived but already enabled, will be skipped by the enable action
+ doc5 = Mock(spec=Document)
+ doc5.id = "doc-5"
+ doc5.name = "archived_doc.pdf"
+ doc5.enabled = True
+ doc5.archived = True
+ doc5.indexing_status = "completed"
+ doc5.completed_at = datetime.datetime.now()
+
+ # Document 6: Non-existent, will be skipped
+ doc6 = None
+
+ mock_get_doc.side_effect = [doc1, doc2, doc3, doc4, doc5, doc6]
+
+ # Reset module-level Redis mock
+ redis_mock.reset_mock()
+ redis_mock.get.return_value = None
+
+ # Perform the enable action across documents in mixed states
+ DocumentService.batch_update_document_status(
+ dataset=mock_dataset,
+ document_ids=["doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6"],
+ action="enable",  # Only doc1 is in a state the enable action modifies
+ user=mock_user,
+ )
+
+ # Verify document 1 was enabled
+ assert doc1.enabled == True
+ assert doc1.disabled_at is None
+ assert doc1.disabled_by is None
+
+ # Verify document 2 was skipped (already enabled)
+ assert doc2.enabled == True  # No change
+
+ # Verify document 3 was skipped (already enabled)
+ assert doc3.enabled == True
+
+ # Verify document 4 was skipped (not affected by enable action)
+ assert doc4.enabled == True  # No change
+
+ # Verify document 5 was skipped (not affected by enable action)
+ assert doc5.enabled == True  # No change
+
+ # Verify database commits occurred for processed documents
+ # Only doc1 should be added (doc2, doc3, doc4, doc5 were skipped, doc6 doesn't exist)
+ assert mock_db.add.call_count == 1
+ assert mock_db.commit.call_count == 1
+
+ # Verify Redis cache operations occurred for processed documents
+ # Only doc1 should have Redis operations
+ assert redis_mock.setex.call_count == 1
+
+ # Verify async tasks were triggered for processed documents
+ # Only doc1 should trigger tasks
+ assert mock_add_task.delay.call_count == 1
+
+ # Verify correct Redis cache keys were set
+ expected_redis_calls = [call("document_doc-1_indexing", 600, 1)]
+ redis_mock.setex.assert_has_calls(expected_redis_calls)
+
+ # Verify correct async task calls
+ expected_task_calls = [call("doc-1")]
+ mock_add_task.delay.assert_has_calls(expected_task_calls)
diff --git a/web/.env.example b/web/.env.example
index 78b4f33e8c..c30064ffed 100644
--- a/web/.env.example
+++ b/web/.env.example
@@ -56,3 +56,5 @@ NEXT_PUBLIC_ENABLE_WEBSITE_JINAREADER=true
NEXT_PUBLIC_ENABLE_WEBSITE_FIRECRAWL=true
NEXT_PUBLIC_ENABLE_WEBSITE_WATERCRAWL=true
+# The maximum tree depth for workflow nodes
+NEXT_PUBLIC_MAX_TREE_DEPTH=50
diff --git a/web/app/(commonLayout)/datasets/template/template.en.mdx b/web/app/(commonLayout)/datasets/template/template.en.mdx
index e1ff827c96..91293768b7 100644
--- a/web/app/(commonLayout)/datasets/template/template.en.mdx
+++ b/web/app/(commonLayout)/datasets/template/template.en.mdx
@@ -1124,6 +1124,63 @@ import { Row, Col, Properties, Property, Heading, SubProperty, PropertyInstructi
+ + + + ### Path + + + Knowledge ID + + + - `enable` - Enable document + - `disable` - Disable document + - `archive` - Archive document + - `un_archive` - Unarchive document + + + + ### Request Body + + + List of document IDs + + + + + + ```bash {{ title: 'cURL' }} + curl --location --request PATCH '${props.apiBaseUrl}/datasets/{dataset_id}/documents/status/{action}' \ + --header 'Authorization: Bearer {api_key}' \ + --header 'Content-Type: application/json' \ + --data-raw '{ + "document_ids": ["doc-id-1", "doc-id-2"] + }' + ``` + + + + ```json {{ title: 'Response' }} + { + "result": "success" + } + ``` + + + + +
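Outside the shell, the same request can be scripted; below is a minimal Python sketch that mirrors the cURL example above. The base URL, API key, dataset ID, and document IDs are placeholders, not values from this patch.

```python
import requests

API_BASE = "https://api.example.com/v1"  # placeholder: your Dify API base URL
API_KEY = "your-knowledge-api-key"       # placeholder: a knowledge (dataset) API key

def batch_update_document_status(dataset_id: str, action: str, document_ids: list[str]) -> dict:
    """Call the batch status endpoint; action is one of: enable, disable, archive, un_archive."""
    resp = requests.patch(
        f"{API_BASE}/datasets/{dataset_id}/documents/status/{action}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"document_ids": document_ids},
    )
    resp.raise_for_status()
    return resp.json()  # expected: {"result": "success"}
```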
+ + + + + ### パス + + + ナレッジ ID + + + - `enable` - ドキュメントを有効化 + - `disable` - ドキュメントを無効化 + - `archive` - ドキュメントをアーカイブ + - `un_archive` - ドキュメントのアーカイブを解除 + + + + ### リクエストボディ + + + ドキュメントIDのリスト + + + + + + ```bash {{ title: 'cURL' }} + curl --location --request PATCH '${props.apiBaseUrl}/datasets/{dataset_id}/documents/status/{action}' \ + --header 'Authorization: Bearer {api_key}' \ + --header 'Content-Type: application/json' \ + --data-raw '{ + "document_ids": ["doc-id-1", "doc-id-2"] + }' + ``` + + + + ```json {{ title: 'Response' }} + { + "result": "success" + } + ``` + + + + +
+
+ diff --git a/web/app/(commonLayout)/datasets/template/template.zh.mdx b/web/app/(commonLayout)/datasets/template/template.zh.mdx index 3994356b51..d407fad3ca 100644 --- a/web/app/(commonLayout)/datasets/template/template.zh.mdx +++ b/web/app/(commonLayout)/datasets/template/template.zh.mdx @@ -1131,6 +1131,63 @@ import { Row, Col, Properties, Property, Heading, SubProperty, PropertyInstructi
+ + + + ### Path + + + 知识库 ID + + + - `enable` - 启用文档 + - `disable` - 禁用文档 + - `archive` - 归档文档 + - `un_archive` - 取消归档文档 + + + + ### Request Body + + + 文档ID列表 + + + + + + ```bash {{ title: 'cURL' }} + curl --location --request PATCH '${props.apiBaseUrl}/datasets/{dataset_id}/documents/status/{action}' \ + --header 'Authorization: Bearer {api_key}' \ + --header 'Content-Type: application/json' \ + --data-raw '{ + "document_ids": ["doc-id-1", "doc-id-2"] + }' + ``` + + + + ```json {{ title: 'Response' }} + { + "result": "success" + } + ``` + + + + +
+ = ({
- break + return case 'echarts': { // Loading state: show loading indicator if (chartState === 'loading') { @@ -428,7 +426,7 @@ const CodeBlock: any = memo(({ inline, className, children = '', ...props }: any
{languageShowName}
- {(['mermaid', 'svg']).includes(language!) && } + {language === 'svg' && } diff --git a/web/app/components/base/mermaid/index.tsx b/web/app/components/base/mermaid/index.tsx index 31eaffb813..a953ef15a8 100644 --- a/web/app/components/base/mermaid/index.tsx +++ b/web/app/components/base/mermaid/index.tsx @@ -1,5 +1,5 @@ import React, { useCallback, useEffect, useMemo, useRef, useState } from 'react' -import mermaid from 'mermaid' +import mermaid, { type MermaidConfig } from 'mermaid' import { useTranslation } from 'react-i18next' import { ExclamationTriangleIcon } from '@heroicons/react/24/outline' import { MoonIcon, SunIcon } from '@heroicons/react/24/solid' @@ -68,14 +68,13 @@ const THEMES = { const initMermaid = () => { if (typeof window !== 'undefined' && !isMermaidInitialized) { try { - mermaid.initialize({ + const config: MermaidConfig = { startOnLoad: false, fontFamily: 'sans-serif', securityLevel: 'loose', flowchart: { htmlLabels: true, useMaxWidth: true, - diagramPadding: 10, curve: 'basis', nodeSpacing: 50, rankSpacing: 70, @@ -94,10 +93,10 @@ const initMermaid = () => { mindmap: { useMaxWidth: true, padding: 10, - diagramPadding: 20, }, maxTextSize: 50000, - }) + } + mermaid.initialize(config) isMermaidInitialized = true } catch (error) { @@ -113,7 +112,7 @@ const Flowchart = React.forwardRef((props: { theme?: 'light' | 'dark' }, ref) => { const { t } = useTranslation() - const [svgCode, setSvgCode] = useState(null) + const [svgString, setSvgString] = useState(null) const [look, setLook] = useState<'classic' | 'handDrawn'>('classic') const [isInitialized, setIsInitialized] = useState(false) const [currentTheme, setCurrentTheme] = useState<'light' | 'dark'>(props.theme || 'light') @@ -125,6 +124,7 @@ const Flowchart = React.forwardRef((props: { const [imagePreviewUrl, setImagePreviewUrl] = useState('') const [isCodeComplete, setIsCodeComplete] = useState(false) const codeCompletionCheckRef = useRef() + const prevCodeRef = useRef() // Create cache key from code, style and theme const cacheKey = useMemo(() => { @@ -169,50 +169,18 @@ const Flowchart = React.forwardRef((props: { */ const handleRenderError = (error: any) => { console.error('Mermaid rendering error:', error) - const errorMsg = (error as Error).message - if (errorMsg.includes('getAttribute')) { - diagramCache.clear() - mermaid.initialize({ - startOnLoad: false, - securityLevel: 'loose', - }) + // On any render error, assume the mermaid state is corrupted and force a re-initialization. + try { + diagramCache.clear() // Clear cache to prevent using potentially corrupted SVGs + isMermaidInitialized = false // <-- THE FIX: Force re-initialization + initMermaid() // Re-initialize with the default safe configuration } - else { - setErrMsg(`Rendering chart failed, please refresh and try again ${look === 'handDrawn' ? 'Or try using classic mode' : ''}`) - } - - if (look === 'handDrawn') { - try { - // Clear possible cache issues - diagramCache.delete(`${props.PrimitiveCode}-handDrawn-${currentTheme}`) - - // Reset mermaid configuration - mermaid.initialize({ - startOnLoad: false, - securityLevel: 'loose', - theme: 'default', - maxTextSize: 50000, - }) - - // Try rendering with standard mode - setLook('classic') - setErrMsg('Hand-drawn mode is not supported for this diagram. 
Switched to classic mode.') - - // Delay error clearing - setTimeout(() => { - if (containerRef.current) { - // Try rendering again with standard mode, but can't call renderFlowchart directly due to circular dependency - // Instead set state to trigger re-render - setIsCodeComplete(true) // This will trigger useEffect re-render - } - }, 500) - } - catch (e) { - console.error('Reset after handDrawn error failed:', e) - } + catch (reinitError) { + console.error('Failed to re-initialize Mermaid after error:', reinitError) } + setErrMsg(`Rendering failed: ${(error as Error).message || 'Unknown error. Please check the console.'}`) setIsLoading(false) } @@ -223,51 +191,23 @@ const Flowchart = React.forwardRef((props: { setIsInitialized(true) }, []) - // Update theme when prop changes + // Update theme when prop changes, but allow internal override. + const prevThemeRef = useRef() useEffect(() => { - if (props.theme) + // Only react if the theme prop from the outside has actually changed. + if (props.theme && props.theme !== prevThemeRef.current) { + // When the global theme prop changes, it should act as the source of truth, + // overriding any local theme selection. + diagramCache.clear() + setSvgString(null) setCurrentTheme(props.theme) + // Reset look to classic for a consistent state after a global change. + setLook('classic') + } + // Update the ref to the current prop value for the next render. + prevThemeRef.current = props.theme }, [props.theme]) - // Validate mermaid code and check for completeness - useEffect(() => { - if (codeCompletionCheckRef.current) - clearTimeout(codeCompletionCheckRef.current) - - // Reset code complete status when code changes - setIsCodeComplete(false) - - // If no code or code is extremely short, don't proceed - if (!props.PrimitiveCode || props.PrimitiveCode.length < 10) - return - - // Check if code already in cache - if so we know it's valid - if (diagramCache.has(cacheKey)) { - setIsCodeComplete(true) - return - } - - // Initial check using the extracted isMermaidCodeComplete function - const isComplete = isMermaidCodeComplete(props.PrimitiveCode) - if (isComplete) { - setIsCodeComplete(true) - return - } - - // Set a delay to check again in case code is still being generated - codeCompletionCheckRef.current = setTimeout(() => { - setIsCodeComplete(isMermaidCodeComplete(props.PrimitiveCode)) - }, 300) - - return () => { - if (codeCompletionCheckRef.current) - clearTimeout(codeCompletionCheckRef.current) - } - }, [props.PrimitiveCode, cacheKey]) - - /** - * Renders flowchart based on provided code - */ const renderFlowchart = useCallback(async (primitiveCode: string) => { if (!isInitialized || !containerRef.current) { setIsLoading(false) @@ -275,15 +215,11 @@ const Flowchart = React.forwardRef((props: { return } - // Don't render if code is not complete yet - if (!isCodeComplete) { - setIsLoading(true) - return - } - // Return cached result if available + const cacheKey = `${primitiveCode}-${look}-${currentTheme}` if (diagramCache.has(cacheKey)) { - setSvgCode(diagramCache.get(cacheKey) || null) + setErrMsg('') + setSvgString(diagramCache.get(cacheKey) || null) setIsLoading(false) return } @@ -294,17 +230,45 @@ const Flowchart = React.forwardRef((props: { try { let finalCode: string - // Check if it's a gantt chart or mindmap - const isGanttChart = primitiveCode.trim().startsWith('gantt') - const isMindMap = primitiveCode.trim().startsWith('mindmap') + const trimmedCode = primitiveCode.trim() + const isGantt = trimmedCode.startsWith('gantt') + const 
isMindMap = trimmedCode.startsWith('mindmap') + const isSequence = trimmedCode.startsWith('sequenceDiagram') - if (isGanttChart || isMindMap) { - // For gantt charts and mindmaps, ensure each task is on its own line - // and preserve exact whitespace/format - finalCode = primitiveCode.trim() + if (isGantt || isMindMap || isSequence) { + if (isGantt) { + finalCode = trimmedCode + .split('\n') + .map((line) => { + // Gantt charts have specific syntax needs. + const taskMatch = line.match(/^\s*([^:]+?)\s*:\s*(.*)/) + if (!taskMatch) + return line // Not a task line, return as is. + + const taskName = taskMatch[1].trim() + let paramsStr = taskMatch[2].trim() + + // Rule 1: Correct multiple "after" dependencies ONLY if they exist. + // This is a common mistake, e.g., "..., after task1, after task2, ..." + const afterCount = (paramsStr.match(/after /g) || []).length + if (afterCount > 1) + paramsStr = paramsStr.replace(/,\s*after\s+/g, ' ') + + // Rule 2: Normalize spacing between parameters for consistency. + const finalParams = paramsStr.replace(/\s*,\s*/g, ', ').trim() + return `${taskName} :${finalParams}` + }) + .join('\n') + } + else { + // For mindmap and sequence charts, which are sensitive to syntax, + // pass the code through directly. + finalCode = trimmedCode + } } else { // Step 1: Clean and prepare Mermaid code using the extracted prepareMermaidCode function + // This function handles flowcharts appropriately. finalCode = prepareMermaidCode(primitiveCode, look) } @@ -319,13 +283,12 @@ const Flowchart = React.forwardRef((props: { THEMES, ) - // Step 4: Clean SVG code and convert to base64 using the extracted functions + // Step 4: Clean up SVG code const cleanedSvg = cleanUpSvgCode(processedSvg) - const base64Svg = await svgToBase64(cleanedSvg) - if (base64Svg && typeof base64Svg === 'string') { - diagramCache.set(cacheKey, base64Svg) - setSvgCode(base64Svg) + if (cleanedSvg && typeof cleanedSvg === 'string') { + diagramCache.set(cacheKey, cleanedSvg) + setSvgString(cleanedSvg) } setIsLoading(false) @@ -334,12 +297,9 @@ const Flowchart = React.forwardRef((props: { // Error handling handleRenderError(error) } - }, [chartId, isInitialized, cacheKey, isCodeComplete, look, currentTheme, t]) + }, [chartId, isInitialized, look, currentTheme, t]) - /** - * Configure mermaid based on selected style and theme - */ - const configureMermaid = useCallback(() => { + const configureMermaid = useCallback((primitiveCode: string) => { if (typeof window !== 'undefined' && isInitialized) { const themeVars = THEMES[currentTheme] const config: any = { @@ -361,23 +321,37 @@ const Flowchart = React.forwardRef((props: { mindmap: { useMaxWidth: true, padding: 10, - diagramPadding: 20, }, } + const isFlowchart = primitiveCode.trim().startsWith('graph') || primitiveCode.trim().startsWith('flowchart') + if (look === 'classic') { config.theme = currentTheme === 'dark' ? 
'dark' : 'neutral' - config.flowchart = { - htmlLabels: true, - useMaxWidth: true, - diagramPadding: 12, - nodeSpacing: 60, - rankSpacing: 80, - curve: 'linear', - ranker: 'tight-tree', + + if (isFlowchart) { + config.flowchart = { + htmlLabels: true, + useMaxWidth: true, + nodeSpacing: 60, + rankSpacing: 80, + curve: 'linear', + ranker: 'tight-tree', + } + } + + if (currentTheme === 'dark') { + config.themeVariables = { + background: themeVars.background, + primaryColor: themeVars.primaryColor, + primaryBorderColor: themeVars.primaryBorderColor, + primaryTextColor: themeVars.primaryTextColor, + secondaryColor: themeVars.secondaryColor, + tertiaryColor: themeVars.tertiaryColor, + } } } - else { + else { // look === 'handDrawn' config.theme = 'default' config.themeCSS = ` .node rect { fill-opacity: 0.85; } @@ -389,27 +363,17 @@ const Flowchart = React.forwardRef((props: { config.themeVariables = { fontSize: '14px', fontFamily: 'sans-serif', + primaryBorderColor: currentTheme === 'dark' ? THEMES.dark.connectionColor : THEMES.light.connectionColor, } - config.flowchart = { - htmlLabels: true, - useMaxWidth: true, - diagramPadding: 10, - nodeSpacing: 40, - rankSpacing: 60, - curve: 'basis', - } - config.themeVariables.primaryBorderColor = currentTheme === 'dark' ? THEMES.dark.connectionColor : THEMES.light.connectionColor - } - if (currentTheme === 'dark' && !config.themeVariables) { - config.themeVariables = { - background: themeVars.background, - primaryColor: themeVars.primaryColor, - primaryBorderColor: themeVars.primaryBorderColor, - primaryTextColor: themeVars.primaryTextColor, - secondaryColor: themeVars.secondaryColor, - tertiaryColor: themeVars.tertiaryColor, - fontFamily: 'sans-serif', + if (isFlowchart) { + config.flowchart = { + htmlLabels: true, + useMaxWidth: true, + nodeSpacing: 40, + rankSpacing: 60, + curve: 'basis', + } } } @@ -425,44 +389,50 @@ const Flowchart = React.forwardRef((props: { return false }, [currentTheme, isInitialized, look]) - // Effect for theme and style configuration + // This is the main rendering effect. + // It triggers whenever the code, theme, or style changes. 
useEffect(() => { - if (diagramCache.has(cacheKey)) { - setSvgCode(diagramCache.get(cacheKey) || null) - setIsLoading(false) - return - } - - if (configureMermaid() && containerRef.current && isCodeComplete) - renderFlowchart(props.PrimitiveCode) - }, [look, props.PrimitiveCode, renderFlowchart, isInitialized, cacheKey, currentTheme, isCodeComplete, configureMermaid]) - - // Effect for rendering with debounce - useEffect(() => { - if (diagramCache.has(cacheKey)) { - setSvgCode(diagramCache.get(cacheKey) || null) + if (!isInitialized) + return + + // Don't render if code is too short + if (!props.PrimitiveCode || props.PrimitiveCode.length < 10) { setIsLoading(false) + setSvgString(null) return } + // Use a timeout to handle streaming code and debounce rendering if (renderTimeoutRef.current) clearTimeout(renderTimeoutRef.current) - if (isCodeComplete) { - renderTimeoutRef.current = setTimeout(() => { - if (isInitialized) - renderFlowchart(props.PrimitiveCode) - }, 300) - } - else { - setIsLoading(true) - } + setIsLoading(true) + + renderTimeoutRef.current = setTimeout(() => { + // Final validation before rendering + if (!isMermaidCodeComplete(props.PrimitiveCode)) { + setIsLoading(false) + setErrMsg('Diagram code is not complete or invalid.') + return + } + + const cacheKey = `${props.PrimitiveCode}-${look}-${currentTheme}` + if (diagramCache.has(cacheKey)) { + setErrMsg('') + setSvgString(diagramCache.get(cacheKey) || null) + setIsLoading(false) + return + } + + if (configureMermaid(props.PrimitiveCode)) + renderFlowchart(props.PrimitiveCode) + }, 300) // 300ms debounce return () => { if (renderTimeoutRef.current) clearTimeout(renderTimeoutRef.current) } - }, [props.PrimitiveCode, renderFlowchart, isInitialized, cacheKey, isCodeComplete]) + }, [props.PrimitiveCode, look, currentTheme, isInitialized, configureMermaid, renderFlowchart]) // Cleanup on unmount useEffect(() => { @@ -471,14 +441,22 @@ const Flowchart = React.forwardRef((props: { containerRef.current.innerHTML = '' if (renderTimeoutRef.current) clearTimeout(renderTimeoutRef.current) - if (codeCompletionCheckRef.current) - clearTimeout(codeCompletionCheckRef.current) } }, []) + const handlePreviewClick = async () => { + if (svgString) { + const base64 = await svgToBase64(svgString) + setImagePreviewUrl(base64) + } + } + const toggleTheme = () => { - setCurrentTheme(prevTheme => prevTheme === 'light' ? Theme.dark : Theme.light) + const newTheme = currentTheme === 'light' ? 'dark' : 'light' + // Ensure a full, clean re-render cycle, consistent with global theme change. diagramCache.clear() + setSvgString(null) + setCurrentTheme(newTheme) } // Style classes for theme-dependent elements @@ -527,14 +505,26 @@ const Flowchart = React.forwardRef((props: {
setLook('classic')} + onClick={() => { + if (look !== 'classic') { + diagramCache.clear() + setSvgString(null) + setLook('classic') + } + }} >
{t('app.mermaid.classic')}
setLook('handDrawn')} + onClick={() => { + if (look !== 'handDrawn') { + diagramCache.clear() + setSvgString(null) + setLook('handDrawn') + } + }} >
{t('app.mermaid.handDrawn')}
@@ -544,7 +534,7 @@ const Flowchart = React.forwardRef((props: {
- {isLoading && !svgCode && ( + {isLoading && !svgString && (
{!isCodeComplete && ( @@ -555,8 +545,8 @@ const Flowchart = React.forwardRef((props: {
)} - {svgCode && ( -
setImagePreviewUrl(svgCode)}> + {svgString && ( +
- mermaid_chart { setErrMsg('Chart rendering failed, please refresh and retry') }} + dangerouslySetInnerHTML={{ __html: svgString }} />
)}
diff --git a/web/app/components/base/mermaid/utils.ts b/web/app/components/base/mermaid/utils.ts
index 9936a9fc59..9d56494227 100644
--- a/web/app/components/base/mermaid/utils.ts
+++ b/web/app/components/base/mermaid/utils.ts
@@ -3,52 +3,31 @@ export function cleanUpSvgCode(svgCode: string): string {
}

/**
- * Preprocesses mermaid code to fix common syntax issues
+ * Prepares mermaid code for rendering by sanitizing common syntax issues.
+ * @param {string} mermaidCode - The mermaid code to prepare
+ * @param {'classic' | 'handDrawn'} style - The rendering style
+ * @returns {string} - The prepared mermaid code
 */
-export function preprocessMermaidCode(code: string): string {
- if (!code || typeof code !== 'string')
+export const prepareMermaidCode = (mermaidCode: string, style: 'classic' | 'handDrawn'): string => {
+ if (!mermaidCode || typeof mermaidCode !== 'string')
 return ''

- // First check if this is a gantt chart
- if (code.trim().startsWith('gantt')) {
- // For gantt charts, we need to ensure each task is on its own line
- // Split the code into lines and process each line separately
- const lines = code.split('\n').map(line => line.trim())
- return lines.join('\n')
- }
+ let code = mermaidCode.trim()

- return code
- // Replace English colons with Chinese colons in section nodes to avoid parsing issues
- .replace(/section\s+([^:]+):/g, (match, sectionName) => `section ${sectionName}:`)
- // Fix common syntax issues
- .replace(/fifopacket/g, 'rect')
- // Ensure graph has direction
- .replace(/^graph\s+((?:TB|BT|RL|LR)*)/, (match, direction) => {
- return direction ? match : 'graph TD'
- })
- // Clean up empty lines and extra spaces
- .trim()
-}
+ // Security: Sanitize against javascript: protocol in click events (XSS vector)
+ code = code.replace(/(\bclick\s+\w+\s+")javascript:[^"]*(")/g, '$1#$2')

-/**
- * Prepares mermaid code based on selected style
- */
-export function prepareMermaidCode(code: string, style: 'classic' | 'handDrawn'): string {
- let finalCode = preprocessMermaidCode(code)
+ // Convenience: Replace <br> tags with newlines. This is a common and safe operation.
+ code = code.replace(/<br\s*\/?>/g, '\n')

- // Special handling for gantt charts and mindmaps
- if (finalCode.trim().startsWith('gantt') || finalCode.trim().startsWith('mindmap')) {
- // For gantt charts and mindmaps, preserve the structure exactly as is
- return finalCode
- }
+ let finalCode = code

+ // Hand-drawn style requires some specific clean-up.
if (style === 'handDrawn') { finalCode = finalCode - // Remove style definitions that interfere with hand-drawn style .replace(/style\s+[^\n]+/g, '') .replace(/linkStyle\s+[^\n]+/g, '') .replace(/^flowchart/, 'graph') - // Remove any styles that might interfere with hand-drawn style .replace(/class="[^"]*"/g, '') .replace(/fill="[^"]*"/g, '') .replace(/stroke="[^"]*"/g, '') @@ -82,7 +61,6 @@ export function svgToBase64(svgGraph: string): Promise { }) } catch (error) { - console.error('Error converting SVG to base64:', error) return Promise.resolve('') } } @@ -115,13 +93,11 @@ export function processSvgForTheme( } else { let i = 0 - themes.dark.nodeColors.forEach(() => { - const regex = /fill="#[a-fA-F0-9]{6}"[^>]*class="node-[^"]*"/g - processedSvg = processedSvg.replace(regex, (match: string) => { - const colorIndex = i % themes.dark.nodeColors.length - i++ - return match.replace(/fill="#[a-fA-F0-9]{6}"/, `fill="${themes.dark.nodeColors[colorIndex].bg}"`) - }) + const nodeColorRegex = /fill="#[a-fA-F0-9]{6}"[^>]*class="node-[^"]*"/g + processedSvg = processedSvg.replace(nodeColorRegex, (match: string) => { + const colorIndex = i % themes.dark.nodeColors.length + i++ + return match.replace(/fill="#[a-fA-F0-9]{6}"/, `fill="${themes.dark.nodeColors[colorIndex].bg}"`) }) processedSvg = processedSvg @@ -139,14 +115,12 @@ export function processSvgForTheme( .replace(/stroke-width="1"/g, 'stroke-width="1.5"') } else { - themes.light.nodeColors.forEach(() => { - const regex = /fill="#[a-fA-F0-9]{6}"[^>]*class="node-[^"]*"/g - let i = 0 - processedSvg = processedSvg.replace(regex, (match: string) => { - const colorIndex = i % themes.light.nodeColors.length - i++ - return match.replace(/fill="#[a-fA-F0-9]{6}"/, `fill="${themes.light.nodeColors[colorIndex].bg}"`) - }) + let i = 0 + const nodeColorRegex = /fill="#[a-fA-F0-9]{6}"[^>]*class="node-[^"]*"/g + processedSvg = processedSvg.replace(nodeColorRegex, (match: string) => { + const colorIndex = i % themes.light.nodeColors.length + i++ + return match.replace(/fill="#[a-fA-F0-9]{6}"/, `fill="${themes.light.nodeColors[colorIndex].bg}"`) }) processedSvg = processedSvg @@ -187,24 +161,10 @@ export function isMermaidCodeComplete(code: string): boolean { // Check for basic syntax structure const hasValidStart = /^(graph|flowchart|sequenceDiagram|classDiagram|classDef|class|stateDiagram|gantt|pie|er|journey|requirementDiagram|mindmap)/.test(trimmedCode) - // Check for balanced brackets and parentheses - const isBalanced = (() => { - const stack = [] - const pairs = { '{': '}', '[': ']', '(': ')' } - - for (const char of trimmedCode) { - if (char in pairs) { - stack.push(char) - } - else if (Object.values(pairs).includes(char)) { - const last = stack.pop() - if (pairs[last as keyof typeof pairs] !== char) - return false - } - } - - return stack.length === 0 - })() + // The balanced bracket check was too strict and produced false negatives for valid + // mermaid syntax like the asymmetric shape `A>B]`. Relying on Mermaid's own + // parser is more robust. 
+ const isBalanced = true // Check for common syntax errors const hasNoSyntaxErrors = !trimmedCode.includes('undefined') @@ -215,7 +175,7 @@ export function isMermaidCodeComplete(code: string): boolean { return hasValidStart && isBalanced && hasNoSyntaxErrors } catch (error) { - console.debug('Mermaid code validation error:', error) + console.error('Mermaid code validation error:', error) return false } } diff --git a/web/app/components/datasets/create/step-two/index.tsx b/web/app/components/datasets/create/step-two/index.tsx index c931addd1a..acc891407f 100644 --- a/web/app/components/datasets/create/step-two/index.tsx +++ b/web/app/components/datasets/create/step-two/index.tsx @@ -161,7 +161,9 @@ const StepTwo = ({ const isInCreatePage = !datasetId || (datasetId && !currentDataset?.data_source_type) const dataSourceType = isInCreatePage ? inCreatePageDataSourceType : currentDataset?.data_source_type - const [segmentationType, setSegmentationType] = useState(ProcessMode.general) + const [segmentationType, setSegmentationType] = useState( + currentDataset?.doc_form === ChunkingMode.parentChild ? ProcessMode.parentChild : ProcessMode.general, + ) const [segmentIdentifier, doSetSegmentIdentifier] = useState(DEFAULT_SEGMENT_IDENTIFIER) const setSegmentIdentifier = useCallback((value: string, canEmpty?: boolean) => { doSetSegmentIdentifier(value ? escape(value) : (canEmpty ? '' : DEFAULT_SEGMENT_IDENTIFIER)) @@ -207,7 +209,14 @@ const StepTwo = ({ } if (value === ChunkingMode.parentChild && indexType === IndexingType.ECONOMICAL) setIndexType(IndexingType.QUALIFIED) + setDocForm(value) + + if (value === ChunkingMode.parentChild) + setSegmentationType(ProcessMode.parentChild) + else + setSegmentationType(ProcessMode.general) + // eslint-disable-next-line ts/no-use-before-define currentEstimateMutation.reset() } @@ -503,6 +512,20 @@ const StepTwo = ({ setOverlap(overlap!) setRules(rules.pre_processing_rules) setDefaultConfig(rules) + + if (documentDetail.dataset_process_rule.mode === 'hierarchical') { + setParentChildConfig({ + chunkForContext: rules.parent_mode || 'paragraph', + parent: { + delimiter: escape(rules.segmentation.separator), + maxLength: rules.segmentation.max_tokens, + }, + child: { + delimiter: escape(rules.subchunk_segmentation.separator), + maxLength: rules.subchunk_segmentation.max_tokens, + }, + }) + } } } @@ -965,8 +988,8 @@ const StepTwo = ({
{t('datasetSettings.form.retrievalSetting.title')}
{t('datasetSettings.form.retrievalSetting.learnMore')} + href={docLink('/guides/knowledge-base/create-knowledge-and-upload-documents')} + className='text-text-accent'>{t('datasetSettings.form.retrievalSetting.learnMore')} {t('datasetSettings.form.retrievalSetting.longDescription')}
@@ -1130,7 +1153,7 @@ const StepTwo = ({ const indexForLabel = index + 1 return ( { const currentWorkspace = useSelector(s => s.currentWorkspace) const getIconUrl = useCallback((fileName: string) => { - return `${apiPrefix}/workspaces/current/plugin/icon?tenant_id=${currentWorkspace.id}&filename=${fileName}` + return `${API_PREFIX}/workspaces/current/plugin/icon?tenant_id=${currentWorkspace.id}&filename=${fileName}` }, [currentWorkspace.id]) return { diff --git a/web/app/components/plugins/plugin-detail-panel/operation-dropdown.tsx b/web/app/components/plugins/plugin-detail-panel/operation-dropdown.tsx index f8b4f63924..9cc5af589b 100644 --- a/web/app/components/plugins/plugin-detail-panel/operation-dropdown.tsx +++ b/web/app/components/plugins/plugin-detail-panel/operation-dropdown.tsx @@ -12,6 +12,7 @@ import { PortalToFollowElemTrigger, } from '@/app/components/base/portal-to-follow-elem' import cn from '@/utils/classnames' +import { useGlobalPublicStore } from '@/context/global-public-context' type Props = { source: PluginSource @@ -40,6 +41,8 @@ const OperationDropdown: FC = ({ setOpen(!openRef.current) }, [setOpen]) + const { enable_marketplace } = useGlobalPublicStore(s => s.systemFeatures) + return ( = ({ className='system-md-regular cursor-pointer rounded-lg px-3 py-1.5 text-text-secondary hover:bg-state-base-hover' >{t('plugin.detailPanel.operation.checkUpdate')}
)} - {(source === PluginSource.marketplace || source === PluginSource.github) && ( + {(source === PluginSource.marketplace || source === PluginSource.github) && enable_marketplace && ( {t('plugin.detailPanel.operation.viewDetail')} )} - {(source === PluginSource.marketplace || source === PluginSource.github) && ( + {(source === PluginSource.marketplace || source === PluginSource.github) && enable_marketplace && (
)}
= ({ const getValueFromI18nObject = useRenderI18nObject() const title = getValueFromI18nObject(label) const descriptionText = getValueFromI18nObject(description) + const { enable_marketplace } = useGlobalPublicStore(s => s.systemFeatures) return (
= ({ } - {source === PluginSource.marketplace + {source === PluginSource.marketplace && enable_marketplace && <>
{t('plugin.from')} marketplace
diff --git a/web/app/components/plugins/plugin-page/index.tsx b/web/app/components/plugins/plugin-page/index.tsx index 1531e17896..bf2d327d31 100644 --- a/web/app/components/plugins/plugin-page/index.tsx +++ b/web/app/components/plugins/plugin-page/index.tsx @@ -35,7 +35,7 @@ import type { PluginDeclaration, PluginManifestInMarket } from '../types' import { sleep } from '@/utils' import { getDocsUrl } from '@/app/components/plugins/utils' import { fetchBundleInfoFromMarketPlace, fetchManifestFromMarketPlace } from '@/service/plugins' -import { marketplaceApiPrefix } from '@/config' +import { MARKETPLACE_API_PREFIX } from '@/config' import { SUPPORT_INSTALL_LOCAL_FILE_EXTENSIONS } from '@/config' import I18n from '@/context/i18n' import { noop } from 'lodash-es' @@ -106,7 +106,7 @@ const PluginPage = ({ setManifest({ ...plugin, version: version.version, - icon: `${marketplaceApiPrefix}/plugins/${plugin.org}/${plugin.name}/icon`, + icon: `${MARKETPLACE_API_PREFIX}/plugins/${plugin.org}/${plugin.name}/icon`, }) showInstallFromMarketplace() return diff --git a/web/app/components/workflow/block-selector/market-place-plugin/list.tsx b/web/app/components/workflow/block-selector/market-place-plugin/list.tsx index 27cd2ae3ea..019b32ae25 100644 --- a/web/app/components/workflow/block-selector/market-place-plugin/list.tsx +++ b/web/app/components/workflow/block-selector/market-place-plugin/list.tsx @@ -6,7 +6,7 @@ import Item from './item' import type { Plugin } from '@/app/components/plugins/types.ts' import cn from '@/utils/classnames' import Link from 'next/link' -import { marketplaceUrlPrefix } from '@/config' +import { MARKETPLACE_URL_PREFIX } from '@/config' import { RiArrowRightUpLine, RiSearchLine } from '@remixicon/react' import { noop } from 'lodash-es' @@ -32,7 +32,7 @@ const List = forwardRef(({ const { t } = useTranslation() const hasFilter = !searchText const hasRes = list.length > 0 - const urlWithSearchText = `${marketplaceUrlPrefix}/?q=${searchText}&tags=${tags.join(',')}` + const urlWithSearchText = `${MARKETPLACE_URL_PREFIX}/?q=${searchText}&tags=${tags.join(',')}` const nextToStickyELemRef = useRef(null) const { handleScroll, scrollPosition } = useStickyScroll({ @@ -71,7 +71,7 @@ const List = forwardRef(({ return ( {t('plugin.findMoreInMarketplace')} diff --git a/web/app/components/workflow/block-selector/tool/tool.tsx b/web/app/components/workflow/block-selector/tool/tool.tsx index f135b5bf4e..d48d0bfc90 100644 --- a/web/app/components/workflow/block-selector/tool/tool.tsx +++ b/web/app/components/workflow/block-selector/tool/tool.tsx @@ -10,7 +10,7 @@ import type { ToolWithProvider } from '../../types' import { BlockEnum } from '../../types' import type { ToolDefaultValue, ToolValue } from '../types' import { ViewType } from '../view-type-select' -import ActonItem from './action-item' +import ActionItem from './action-item' import BlockIcon from '../../block-icon' import { useTranslation } from 'react-i18next' @@ -118,7 +118,7 @@ const Tool: FC = ({ {hasAction && !isFold && ( actions.map(action => ( - { const { t } = useTranslation() diff --git a/web/app/components/workflow/nodes/_base/components/switch-plugin-version.tsx b/web/app/components/workflow/nodes/_base/components/switch-plugin-version.tsx index ceb38fbcfd..cddd529aea 100644 --- a/web/app/components/workflow/nodes/_base/components/switch-plugin-version.tsx +++ b/web/app/components/workflow/nodes/_base/components/switch-plugin-version.tsx @@ -15,7 +15,7 @@ import { pluginManifestToCardPluginProps } from 
'@/app/components/plugins/instal import { Badge as Badge2, BadgeState } from '@/app/components/base/badge/index' import Link from 'next/link' import { useTranslation } from 'react-i18next' -import { marketplaceUrlPrefix } from '@/config' +import { MARKETPLACE_URL_PREFIX } from '@/config' export type SwitchPluginVersionProps = { uniqueIdentifier: string @@ -82,7 +82,7 @@ export const SwitchPluginVersion: FC = (props) => { modalBottomLeft={ diff --git a/web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx b/web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx index 96f18cd717..d8da0e69a3 100644 --- a/web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx +++ b/web/app/components/workflow/panel/chat-variable-panel/components/variable-modal.tsx @@ -37,7 +37,6 @@ const typeList = [ ChatVarType.ArrayString, ChatVarType.ArrayNumber, ChatVarType.ArrayObject, - ChatVarType.ArrayFile, ] const objectPlaceholder = `# example @@ -128,7 +127,6 @@ const ChatVariableModal = ({ case ChatVarType.ArrayString: case ChatVarType.ArrayNumber: case ChatVarType.ArrayObject: - case ChatVarType.ArrayFile: return value?.filter(Boolean) || [] } } @@ -296,86 +294,84 @@ const ChatVariableModal = ({
{/* default value */} - {type !== ChatVarType.ArrayFile && ( -
-
-
{t('workflow.chatVariable.modal.value')}
- {(type === ChatVarType.ArrayString || type === ChatVarType.ArrayNumber) && ( - - )} - {type === ChatVarType.Object && ( - - )} -
-
- {type === ChatVarType.String && ( - // Input will remove \n\r, so use Textarea just like description area -