Move routing table, emit(), and is_enterprise_telemetry_enabled() from
enterprise/telemetry/gateway.py into core/telemetry/gateway.py so both
CE and EE share one code path. The ce_eligible flag in CASE_ROUTING
controls which events flow in CE — flipping it is the only change needed
to enable an event in community edition.
- Delete enterprise/telemetry/gateway.py (class-based singleton)
- Create core/telemetry/gateway.py (stateless functions, no shared state)
- Simplify core/telemetry/__init__.py to thin facade over gateway
- Remove TelemetryGateway class and get_gateway() from ext_enterprise_telemetry
- Single-source is_enterprise_telemetry_enabled in core.telemetry.gateway
- Fix pre-existing test bugs (missing dify.event.id in metric handler tests)
- Update all imports and mock paths across 7 test files
Add comprehensive model tracking across all OTEL metrics and logs:
- Node execution metrics now include model_name for LLM operations
- Suggested question metrics include model_provider and model_name
- Dataset retrieval captures both embedding and rerank model info
- Updated DATA_DICTIONARY.md with complete metric label documentation
This enables granular cost tracking, performance analysis, and usage monitoring per model across all operation types.
Separate background/configuration instructions from the data dictionary:
- README.md: Overview, configuration, correlation model, content gating
- DATA_DICTIONARY.md: Pure reference format with signals and attributes
The data dictionary is now concise (465 lines vs 911) and focuses on
attribute types and relationships without verbose explanations.
- Added a check to prevent double acquisition of the DB migration lock, raising an error if an attempt is made to acquire it while already held.
- Implemented logic to reuse the lock object if it has already been created, improving efficiency and clarity in lock management.
- Reset the lock object to None upon release to ensure proper state management.
(cherry picked from commit d4b102d3c8a473c4fd6409dba7c198289bb5f921)
- Introduced constants for minimum and maximum join timeout values, enhancing clarity and maintainability.
- Updated the renewal interval calculation to use defined constants for better readability.
- Improved logging messages to include context information, making it easier to trace issues during lock operations.
(cherry picked from commit 1471b77bf5156a95417bde148753702d44221929)
- Updated the database migration locking mechanism to use DbMigrationAutoRenewLock for improved clarity and functionality.
- Removed the AutoRenewRedisLock implementation and its associated tests.
- Adjusted integration and unit tests to reflect the new locking class and its usage in the upgrade_db command.
(cherry picked from commit c812ad9ff26bed3eb59862bd7a5179b7ee83f11f)
- Removed the manual heartbeat function for renewing the Redis lock during database migrations.
- Integrated AutoRenewRedisLock to handle lock renewal automatically, simplifying the upgrade_db command.
- Updated unit tests to reflect changes in lock handling and error management during migrations.
(cherry picked from commit 8814256eb5fa20b29e554264f3b659b027bc4c9a)
- Introduced HEARTBEAT_WAIT_TIMEOUT_SECONDS constant to improve readability and maintainability of test code.
- Updated test assertions to use the new constant instead of a hardcoded value.
(cherry picked from commit 0d53743d83b03ae0e68fad143711ffa5f6354093)
- Added a migration_succeeded flag to track the success of database migrations.
- Enhanced logging messages to indicate the status of the migration when releasing the lock, providing clearer context for potential issues.
(cherry picked from commit e74be0392995d16d288eed2175c51148c9e5b9c0)
- Added a heartbeat function to renew the Redis lock during database migrations, preventing long blockages from crashed processes.
- Updated the upgrade_db command to utilize the new locking mechanism with a configurable TTL.
- Removed the deprecated MIGRATION_LOCK_TTL from DeploymentConfig and related files.
- Enhanced unit tests to cover the new lock renewal behavior and error handling during migrations.
(cherry picked from commit a3331c622435f9f215b95f6b0261f43ae56a9d9c)
- Introduced a configurable Redis lock TTL for database migrations in DeploymentConfig.
- Updated the upgrade_db command to handle lock release errors gracefully.
- Added documentation for the new MIGRATION_LOCK_TTL environment variable in the .env.example file and docker-compose.yaml.
(cherry picked from commit 4a05fb120622908bc109a3715686706aab3d3b59)
- Add dify.credential.id to node execution events
- Add dify.event.id to all telemetry events (APP_CREATED, APP_UPDATED, APP_DELETED, FEEDBACK_CREATED)
This ensures all .name fields have corresponding .id fields for reliable aggregation and deduplication.
- Add APP_CREATED, APP_UPDATED, APP_DELETED counters to EnterpriseTelemetryCounter
- Create EnterpriseTelemetryEvent StrEnum for type-safe event names
- Update metric_handler to use new app-specific counters with labels (tenant_id, app_id, mode)
- Convert all event_name strings to EnterpriseTelemetryEvent enum values
- Update exporter to create OTEL meters for new app counters (dify.app.created.total, etc.)
- Update tests to verify new counter behavior and enum usage
When setting a new default credential in enterprise mode, the code was
only clearing is_default for credentials matching the current user_id.
This caused issues when:
1. Enterprise credential A (synced with system user_id) was default
2. User sets local credential B as default
3. A still had is_default=true (different user_id)
4. Both A and B were considered defaults
The fix removes user_id from the filter only for enterprise deployments,
since enterprise credentials may have different user_id than local ones.
Non-enterprise behavior is unchanged to avoid breaking existing setups.
Fixes EE-1511