# Database Scripts

This directory contains database-related utility scripts for RAGFlow.

- **mysql_migration.py**: Data migration between tables with stage-based execution
- **db_schema_sync.py**: Database schema synchronization using peewee-migrate

---

# mysql_migration.py

A flexible MySQL data migration tool for migrating data between tables with stage-based execution.

## Overview

This script provides stage-based data migration between MySQL tables. It currently supports the following stages:

- `tenant_model_provider`
- `tenant_model_instance`
- `tenant_model`

### Migration Stages

| Stage | Source Table | Target Table | Description |
|-------|-------------|--------------|-------------|
| `tenant_model_provider` | `tenant_llm` | `tenant_model_provider` | Extracts distinct `(tenant_id, llm_factory)` pairs |
| `tenant_model_instance` | `tenant_llm` + `tenant_model_provider` | `tenant_model_instance` | Creates instances with distinct `(tenant_id, llm_factory, api_key)` |
| `tenant_model` | `tenant_llm` + `tenant_model_provider` + `tenant_model_instance` | `tenant_model` | Migrates model configurations (only `status='0'` records) |

### Stage Dependencies

```
tenant_model_provider (no dependencies)
    ↓
tenant_model_instance (depends on tenant_model_provider)
    ↓
tenant_model (depends on tenant_model_provider and tenant_model_instance)
```

### Field Mapping Rules

#### tenant_model_provider

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 |
| `provider_name` | `tenant_llm.llm_factory` | Direct mapping |
| `tenant_id` | `tenant_llm.tenant_id` | Direct mapping |

- **Deduplication**: Groups by `(tenant_id, llm_factory)` and takes distinct pairs

#### tenant_model_instance

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 |
| `instance_name` | `tenant_llm.llm_factory` | Direct mapping |
| `provider_id` | `tenant_model_provider.id` | JOIN on `tenant_id` and `provider_name=llm_factory` |
| `api_key` | `tenant_llm.api_key` | Direct mapping |
| `status` | `tenant_llm.status` | Direct mapping |

- **Deduplication**: Groups by `(tenant_id, llm_factory, api_key)` and takes distinct records

#### tenant_model

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 |
| `model_name` | `tenant_llm.llm_name` | Direct mapping |
| `provider_id` | `tenant_model_provider.id` | JOIN on `tenant_id` and `provider_name=llm_factory` |
| `instance_id` | `tenant_model_instance.id` | JOIN on `provider_id` and `api_key` |
| `model_type` | `tenant_llm.model_type` | Direct mapping |
| `status` | `tenant_llm.status` | Direct mapping |

- **Filter**: Only migrates records where `tenant_llm.status='0'`

## Usage

### Command Line Arguments

```
python mysql_migration.py [OPTIONS]
```

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | - | MySQL host | `localhost` |
| `--port` | - | MySQL port | `3306` |
| `--user` | - | MySQL user | `root` |
| `--password` | - | MySQL password | (empty) |
| `--database` | - | MySQL database name | `rag_flow` |
| `--config` | `-c` | Path to YAML config file | - |
| `--stages` | `-s` | Comma-separated list of stages to run | - |
| `--list-stages` | `-l` | List available stages and exit | - |
| `--execute` | `-e` | Execute full migration (create tables and migrate data) | `False` |
| `--create-table-only` | - | Only create target tables, skip data migration | `False` |

> **Note**: The MySQL connection can be configured via command line arguments (`--host`, `--port`, `--user`, `--password`, `--database`) or via a YAML config file (`--config`). Command line arguments take precedence over config file values.

### Execution Modes

The script has three mutually exclusive modes:

1. **Dry-Run Mode** (default): Check only, no database writes
   ```bash
   # Using config file
   python mysql_migration.py --stages tenant_model_provider --config config.yaml

   # Using command line MySQL connection
   python mysql_migration.py --stages tenant_model_provider --host localhost --port 3306 --user root
   ```

2. **Create Table Only Mode**: Create target tables without migrating data
   ```bash
   python mysql_migration.py --stages tenant_model_provider --config config.yaml --create-table-only
   ```

3. **Execute Mode**: Create tables and migrate data
   ```bash
   python mysql_migration.py --stages tenant_model_provider --config config.yaml --execute
   ```

### Configuration File

Create a YAML configuration file with MySQL connection settings:

```yaml
database:
  host: localhost
  port: 3306
  user: root
  password: your_password
  name: rag_flow
```

Alternative keys are also supported:

```yaml
mysql:
  host: localhost
  port: 3306
  user: root
  password: your_password
  database: rag_flow
```

### Examples

```bash
# List all available stages
python mysql_migration.py --list-stages

# Dry run single stage using command line MySQL connection
python mysql_migration.py --stages tenant_model_provider --host localhost --port 3306 --user root --password secret

# Dry run single stage using config file
python mysql_migration.py --stages tenant_model_provider --config /path/to/config.yaml

# Create tables only for multiple stages
python mysql_migration.py --stages tenant_model_provider,tenant_model_instance --config /path/to/config.yaml --create-table-only

# Execute full migration for all stages (in dependency order)
python mysql_migration.py --stages tenant_model_provider,tenant_model_instance,tenant_model --config /path/to/config.yaml --execute

# Use config file with command line password override
python mysql_migration.py --stages tenant_model_provider --config /path/to/config.yaml --password mypassword --execute
```

## Output Interpretation

### Stage Execution Log

Each stage displays a header showing progress:

```
============================================================
Stage [1/3]: tenant_model_provider
============================================================
```

The stage then performs:

1. Check phase: Verifies source/target tables exist and counts records to migrate
2. Execute phase: Creates tables (if needed) and migrates data in batches

### Dry-Run Output

In dry-run mode, the script outputs what it would do without writing:

```
[DRY RUN] Would insert 150 records
  instance_name=OpenAI, provider_id=abc123, api_key=***
  ... and 145 more records
```

### Migration Summary

After all stages complete, a summary is printed:

```
============================================================
Migration Summary
============================================================
Total Duration: 2.45s
Total Rows Processed: 350
Tables Operated: tenant_model_provider, tenant_model_instance
------------------------------------------------------------
Stage Details:
  [tenant_model_provider] Tables: tenant_model_provider, Rows: 50, Duration: 0.82s
  [tenant_model_instance] Tables: tenant_model_instance, Rows: 300, Duration: 1.63s
============================================================
```

### Common Messages

| Message | Meaning |
|---------|---------|
| `No new data to migrate` | All records already exist in target table |
| `[DRY RUN] Target table does not exist` | Target table missing; use `--execute` or `--create-table-only` to create it |
| `Dependency table does not exist` | Required table from previous stage missing |
| `Inserted batch X: Y records` | Successfully inserted batch of records |

---

# db_schema_sync.py

A database schema synchronization tool that uses peewee-migrate to detect and manage schema changes.

## Overview

This script:

1. Reads model definitions from `api/db/db_models.py`
2. Compares them with the tables in the existing database specified via command line
3. Generates migration files in `tools/migrate/{version}/`

### Detected Change Types

| Change Type | Description | Auto-included? |
|-------------|-------------|----------------|
| New table | Model class with no corresponding DB table | Yes |
| New field | Model field not present in DB table | Yes |
| Field type change | Model field type differs from DB column type | Yes |
| Removed field | DB column not present in model definition | No (requires `--drop`) |

> **Warning**: Removed fields are **not** included in migrations by default. You must explicitly use `--drop` to generate `DROP COLUMN` statements, as this operation permanently deletes data.

## Prerequisites

Install peewee-migrate:

```bash
pip install peewee-migrate
```

## Usage

### Command Line Arguments

```
python db_schema_sync.py [OPTIONS]
```

| Option | Short | Description |
|--------|-------|-------------|
| `--host` | - | MySQL host (required) |
| `--port` | - | MySQL port (default: 3306) |
| `--user` | - | MySQL user (required) |
| `--password` | - | MySQL password (required) |
| `--database` | - | MySQL database name (required) |
| `--version` | `-v` | Version number in format `vxx.xx.xx` (required) |
| `--list` | `-l` | List all migrations |
| `--create` | - | Create a new migration (auto-detect changes) |
| `--migrate` | `-m` | Run pending migrations |
| `--diff` | `-d` | Show schema differences |
| `--name` | `-n` | Migration name (default: auto) |
| `--drop` | - | Include `DROP COLUMN` for fields removed from models (destructive - permanently deletes data!) |

### Version Format

Version must be in format `vxx.xx.xx`, where each `xx` is one or more digits:

- Valid: `v0.25.3`, `v1.0.0`, `v10.20.30`
- Invalid: `0.25.3`, `v0.25`, `v0.25.3.1`

### Migration File Location

Migration files are stored in:

```
tools/migrate/{version_dir}/
```

Where `{version_dir}` is the version with `.` replaced by `_`.
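
The version check and the directory mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `version_to_dir`, the regular expression, and the `base` parameter are hypothetical and are not taken from `db_schema_sync.py` itself.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: a leading "v" followed by exactly three dot-separated
# groups of ASCII digits (matches the valid/invalid examples in this README).
VERSION_RE = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def version_to_dir(version: str, base: str = "tools/migrate") -> PurePosixPath:
    """Validate a version string and map it to its migration directory.

    Hypothetical helper for illustration; the real script may differ.
    """
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"invalid version {version!r}, expected vxx.xx.xx")
    # Replace every "." with "_" to build the directory name.
    return PurePosixPath(base) / version.replace(".", "_")


if __name__ == "__main__":
    print(version_to_dir("v0.25.3"))
```

Using `fullmatch` rather than `match` is what rejects inputs with trailing components such as `v0.25.3.1`.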

Example: Version `v0.25.3` → Directory `tools/migrate/v0_25_3/`

### Examples

```bash
# List all migrations
python db_schema_sync.py --list \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3

# Create a new auto-detected migration (new tables, new fields, type changes only)
python db_schema_sync.py --create \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3

# Create a migration including dropped fields (destructive!)
python db_schema_sync.py --create --drop \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3

# Create a named migration
python db_schema_sync.py --create --name add_user_table \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3

# Run all pending migrations
python db_schema_sync.py --migrate \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3

# Show schema differences (including removed fields)
python db_schema_sync.py --diff \
    --host localhost --port 3306 --user root --password xxx --database rag_flow \
    --version v0.25.3
```

## How It Works

1. **Load Models**: Imports all model classes from `api/db/db_models.py`
2. **Connect Database**: Creates MySQL connection from command line arguments
3. **Detect Changes**: Compares model definitions with actual database schema:
   - New tables → `create_model`
   - New fields → `ALTER TABLE ADD COLUMN`
   - Field type changes → `ALTER TABLE MODIFY COLUMN`
   - Removed fields → `ALTER TABLE DROP COLUMN` (only with `--drop`)
4. **Generate Migration**: Creates Python migration file with `migrate()` and `rollback()` functions

### Rollback Behavior

| Forward Operation | Rollback Operation |
|-------------------|--------------------|
| `CREATE TABLE` | `remove_model` |
| `ADD COLUMN` | `DROP COLUMN` |
| `MODIFY COLUMN` | `MODIFY COLUMN` (restore original type) |
| `DROP COLUMN` | `ADD COLUMN` (restore column definition; **data is lost**) |

> **Note**: Rolling back a `DROP COLUMN` will re-add the column structure, but the data that was in it cannot be recovered.
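
To make the forward/rollback pairing concrete, here is a minimal sketch that derives a rollback plan from a list of forward operations. The names `ROLLBACK_OF` and `rollback_plan` are hypothetical, and undoing operations in reverse order is an assumption of this sketch rather than documented behavior of `db_schema_sync.py`.

```python
# Forward operation -> rollback operation, copied from the table above.
ROLLBACK_OF = {
    "CREATE TABLE": "remove_model",
    "ADD COLUMN": "DROP COLUMN",
    "MODIFY COLUMN": "MODIFY COLUMN",  # restores the original type
    "DROP COLUMN": "ADD COLUMN",       # restores the definition only
}


def rollback_plan(forward_ops):
    """Return (rollback_op, target, data_lost) tuples for the given forward ops.

    Hypothetical helper: operations are undone in reverse order (an
    assumption), and data_lost is True only when a DROP COLUMN is undone,
    because the column's contents cannot be restored.
    """
    plan = []
    for op, target in reversed(forward_ops):
        plan.append((ROLLBACK_OF[op], target, op == "DROP COLUMN"))
    return plan


if __name__ == "__main__":
    forward = [
        ("CREATE TABLE", "user"),
        ("ADD COLUMN", "user.email"),
        ("DROP COLUMN", "user.legacy"),
    ]
    for step in rollback_plan(forward):
        print(step)
```

The `data_lost` flag mirrors the note above: rolling back a dropped column recreates its structure but not its contents, so a backup is the only way to recover the values.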