# Database Scripts

This directory contains database-related utility scripts for RAGFlow.

- **mysql_migration.py**: Data migration between tables with stage-based execution
- **db_schema_sync.py**: Database schema synchronization using peewee-migrate

---

# mysql_migration.py

A flexible MySQL data migration tool for migrating data between tables with stage-based execution.

## Overview

This script provides stage-based data migration between MySQL tables. The following stages are currently supported:
- `tenant_model_provider`
- `tenant_model_instance`
- `tenant_model`

### Migration Stages

| Stage | Source Table | Target Table | Description |
|-------|-------------|--------------|-------------|
| `tenant_model_provider` | `tenant_llm` | `tenant_model_provider` | Extracts distinct `(tenant_id, llm_factory)` pairs |
| `tenant_model_instance` | `tenant_llm` + `tenant_model_provider` | `tenant_model_instance` | Creates instances with distinct `(tenant_id, llm_factory, api_key)` |
| `tenant_model` | `tenant_llm` + `tenant_model_provider` + `tenant_model_instance` | `tenant_model` | Migrates model configurations (only `status='0'` records) |

### Stage Dependencies

```
tenant_model_provider (no dependencies)
    ↓
tenant_model_instance (depends on tenant_model_provider)
    ↓
tenant_model (depends on tenant_model_provider and tenant_model_instance)
```

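The dependency chain above implies an execution order. As a minimal sketch (not the script's actual implementation), the ordering could be derived as follows; `STAGE_DEPENDENCIES` and `order_stages` are hypothetical names introduced only for illustration, and the sketch assumes that a dependency which was not requested has already been migrated.

```python
# Hypothetical dependency map mirroring the diagram above; the real script
# may represent its stages differently.
STAGE_DEPENDENCIES = {
    "tenant_model_provider": [],
    "tenant_model_instance": ["tenant_model_provider"],
    "tenant_model": ["tenant_model_provider", "tenant_model_instance"],
}


def order_stages(requested):
    """Return the requested stages sorted so that dependencies run first."""
    unknown = [s for s in requested if s not in STAGE_DEPENDENCIES]
    if unknown:
        raise ValueError(f"Unknown stages: {', '.join(unknown)}")

    ordered, seen = [], set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        for dep in STAGE_DEPENDENCIES[stage]:
            # Only reorder stages the caller asked for; a dependency that was
            # not requested is assumed to exist in the database already.
            if dep in requested:
                visit(dep)
        ordered.append(stage)

    for stage in requested:
        visit(stage)
    return ordered
```

For example, `order_stages(["tenant_model", "tenant_model_provider"])` places `tenant_model_provider` first regardless of the order given on the command line.
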
### Field Mapping Rules

#### tenant_model_provider

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 hex string |
| `provider_name` | `tenant_llm.llm_factory` | Direct mapping |
| `tenant_id` | `tenant_llm.tenant_id` | Direct mapping |

- **Deduplication**: Groups by `(tenant_id, llm_factory)` and takes distinct pairs

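The mapping and deduplication rule for this stage can be illustrated in plain Python. This is a sketch under assumptions, not the script's code: `build_provider_rows` is a hypothetical helper, source rows are modelled as dictionaries, and the ID is assumed to come from `uuid.uuid1().hex`, which yields 32 hexadecimal characters.

```python
import uuid


def build_provider_rows(tenant_llm_rows):
    """Build one tenant_model_provider row per distinct (tenant_id, llm_factory)."""
    seen = set()
    providers = []
    for row in tenant_llm_rows:
        key = (row["tenant_id"], row["llm_factory"])
        if key in seen:
            continue  # this pair already produced a provider row
        seen.add(key)
        providers.append({
            "id": uuid.uuid1().hex,  # assumption: UUID1 rendered as 32 hex chars
            "provider_name": row["llm_factory"],
            "tenant_id": row["tenant_id"],
        })
    return providers
```

Two `tenant_llm` rows for the same tenant and factory (for example, two OpenAI models) therefore collapse into a single provider row.
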
#### tenant_model_instance

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 hex string |
| `instance_name` | `tenant_llm.llm_factory` | Direct mapping |
| `provider_id` | `tenant_model_provider.id` | JOIN on `tenant_id` and `provider_name=llm_factory` |
| `api_key` | `tenant_llm.api_key` | Direct mapping |
| `status` | `tenant_llm.status` | Direct mapping |

- **Deduplication**: Groups by `(tenant_id, llm_factory, api_key)` and takes distinct records

#### tenant_model

| Target Field | Source | Rule |
|--------------|--------|------|
| `id` | - | Generated 32-character UUID1 hex string |
| `model_name` | `tenant_llm.llm_name` | Direct mapping |
| `provider_id` | `tenant_model_provider.id` | JOIN on `tenant_id` and `provider_name=llm_factory` |
| `instance_id` | `tenant_model_instance.id` | JOIN on `provider_id` and `api_key` |
| `model_type` | `tenant_llm.model_type` | Direct mapping |
| `status` | `tenant_llm.status` | Direct mapping |

- **Filter**: Only migrates records where `tenant_llm.status='0'`

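The two joins and the status filter for `tenant_model` can be sketched with dictionary lookups. `build_model_rows` is a hypothetical helper, not part of the script, and skipping rows whose joins find no match is an assumption (the behaviour of an inner join); the real script performs this work in SQL.

```python
import uuid


def build_model_rows(tenant_llm_rows, providers, instances):
    """Map tenant_llm rows with status '0' to tenant_model rows."""
    # JOIN keys taken from the mapping table above.
    provider_ids = {(p["tenant_id"], p["provider_name"]): p["id"] for p in providers}
    instance_ids = {(i["provider_id"], i["api_key"]): i["id"] for i in instances}

    models = []
    for row in tenant_llm_rows:
        if row["status"] != "0":
            continue  # filter: only status='0' records are migrated
        provider_id = provider_ids.get((row["tenant_id"], row["llm_factory"]))
        instance_id = instance_ids.get((provider_id, row["api_key"]))
        if provider_id is None or instance_id is None:
            continue  # assumption: unmatched rows are skipped, as in an inner join
        models.append({
            "id": uuid.uuid1().hex,
            "model_name": row["llm_name"],
            "provider_id": provider_id,
            "instance_id": instance_id,
            "model_type": row["model_type"],
            "status": row["status"],
        })
    return models
```
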
## Usage

### Command Line Arguments

```
python mysql_migration.py [OPTIONS]
```

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | - | MySQL host | `localhost` |
| `--port` | - | MySQL port | `3306` |
| `--user` | - | MySQL user | `root` |
| `--password` | - | MySQL password | (empty) |
| `--database` | - | MySQL database name | `rag_flow` |
| `--config` | `-c` | Path to YAML config file | - |
| `--stages` | `-s` | Comma-separated list of stages to run | - |
| `--list-stages` | `-l` | List available stages and exit | - |
| `--execute` | `-e` | Execute full migration (create tables and migrate data) | `False` |
| `--create-table-only` | - | Only create target tables, skip data migration | `False` |

> **Note**: MySQL connection can be configured via command line arguments (`--host`, `--port`, `--user`, `--password`, `--database`) or via a YAML config file (`--config`). Command line arguments take precedence over config file values.

### Execution Modes

The script has three mutually exclusive modes:

1. **Dry-Run Mode** (default): Check only, no database writes
   ```bash
   # Using config file
   python mysql_migration.py --stages tenant_model_provider --config config.yaml

   # Using command line MySQL connection
   python mysql_migration.py --stages tenant_model_provider --host localhost --port 3306 --user root
   ```

2. **Create Table Only Mode**: Create target tables without migrating data
   ```bash
   python mysql_migration.py --stages tenant_model_provider --config config.yaml --create-table-only
   ```

3. **Execute Mode**: Create tables and migrate data
   ```bash
   python mysql_migration.py --stages tenant_model_provider --config config.yaml --execute
   ```

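One way to picture how the two flags select among the three modes is an `argparse` mutually exclusive group. This is only a sketch: whether the script actually uses such a group is an assumption, and `build_parser` and `resolve_mode` are hypothetical names.

```python
import argparse


def build_parser():
    """Parser covering only the flags relevant to mode selection."""
    parser = argparse.ArgumentParser(prog="mysql_migration.py")
    parser.add_argument("--stages", "-s", default="")
    # Assumption: the two mode flags cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--execute", "-e", action="store_true")
    mode.add_argument("--create-table-only", action="store_true")
    return parser


def resolve_mode(args):
    """Return the execution mode; dry-run applies when neither flag is given."""
    if args.execute:
        return "execute"
    if args.create_table_only:
        return "create-table-only"
    return "dry-run"
```

With this arrangement, omitting both flags never writes to the database, which matches the dry-run default described above.
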
### Configuration File

Create a YAML configuration file with MySQL connection settings:

```yaml
database:
  host: localhost
  port: 3306
  user: root
  password: your_password
  name: rag_flow
```

Alternative keys are also supported:

```yaml
mysql:
  host: localhost
  port: 3306
  user: root
  password: your_password
  database: rag_flow
```

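The precedence rule (command line over config file over built-in default) and the two accepted config layouts can be sketched as follows. `resolve_connection` and `DEFAULTS` are hypothetical names; the sketch assumes the YAML file has already been parsed into a dictionary and that an option omitted on the command line is represented as `None`.

```python
# Defaults taken from the command line arguments table.
DEFAULTS = {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "",
    "database": "rag_flow",
}


def resolve_connection(cli_args, config):
    """Merge CLI values, config file values, and defaults, in that priority."""
    # Accept either the `database:` or the `mysql:` top-level section.
    section = config.get("database") or config.get("mysql") or {}
    from_file = {
        "host": section.get("host"),
        "port": section.get("port"),
        "user": section.get("user"),
        "password": section.get("password"),
        # The database name is stored under `name` or `database`.
        "database": section.get("name", section.get("database")),
    }
    resolved = {}
    for key, default in DEFAULTS.items():
        if cli_args.get(key) is not None:
            resolved[key] = cli_args[key]   # CLI wins
        elif from_file[key] is not None:
            resolved[key] = from_file[key]  # then the config file
        else:
            resolved[key] = default         # finally the default
    return resolved
```

For instance, passing `--password mypassword` together with `--config` keeps the host and database name from the file but replaces the password.
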
### Examples

```bash
# List all available stages
python mysql_migration.py --list-stages

# Dry run single stage using command line MySQL connection
python mysql_migration.py --stages tenant_model_provider --host localhost --port 3306 --user root --password secret

# Dry run single stage using config file
python mysql_migration.py --stages tenant_model_provider --config /path/to/config.yaml

# Create tables only for multiple stages
python mysql_migration.py --stages tenant_model_provider,tenant_model_instance --config /path/to/config.yaml --create-table-only

# Execute full migration for all stages (in dependency order)
python mysql_migration.py --stages tenant_model_provider,tenant_model_instance,tenant_model --config /path/to/config.yaml --execute

# Use config file with command line password override
python mysql_migration.py --stages tenant_model_provider --config /path/to/config.yaml --password mypassword --execute
```

## Output Interpretation

### Stage Execution Log

Each stage displays a header showing progress:

```
============================================================
Stage [1/3]: tenant_model_provider
============================================================
```

The stage then performs:
1. Check phase: Verifies source/target tables exist and counts records to migrate
2. Execute phase: Creates tables (if needed) and migrates data in batches

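Batched insertion in the execute phase might look like the sketch below. The helper names and the batch size of 1000 are assumptions, and `insert_batch` stands in for whatever callable performs the actual `INSERT`; the log line follows the `Inserted batch X: Y records` format that the script reports.

```python
def iter_batches(records, batch_size=1000):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def insert_in_batches(records, insert_batch, batch_size=1000):
    """Insert records batch by batch and return the number of rows handled."""
    total = 0
    for number, batch in enumerate(iter_batches(records, batch_size), start=1):
        insert_batch(batch)  # hypothetical callable that writes one batch
        total += len(batch)
        print(f"Inserted batch {number}: {len(batch)} records")
    return total
```

Batching keeps each statement and transaction small, so a failure part-way through affects only the current batch rather than the whole stage.
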
### Dry-Run Output

In dry-run mode, the script outputs what it would do without writing:

```
[DRY RUN] Would insert 150 records
  instance_name=OpenAI, provider_id=abc123, api_key=***
  ... and 145 more records
```

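A preview in this style can be produced with a small formatter. This is a sketch, not the script's code: `format_dry_run` is a hypothetical name, and the preview limit of 5 is inferred from the sample (150 records, 145 remaining) rather than confirmed.

```python
def format_dry_run(records, preview_limit=5):
    """Render a dry-run preview with API keys masked."""
    lines = [f"[DRY RUN] Would insert {len(records)} records"]
    for record in records[:preview_limit]:
        # Never print secrets: replace the api_key value with a fixed mask.
        shown = {k: ("***" if k == "api_key" else v) for k, v in record.items()}
        lines.append("  " + ", ".join(f"{k}={v}" for k, v in shown.items()))
    remaining = len(records) - preview_limit
    if remaining > 0:
        lines.append(f"  ... and {remaining} more records")
    return "\n".join(lines)
```
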
### Migration Summary

After all stages complete, a summary is printed:

```
============================================================
Migration Summary
============================================================
Total Duration: 2.45s
Total Rows Processed: 350
Tables Operated: tenant_model_provider, tenant_model_instance
------------------------------------------------------------
Stage Details:
  [tenant_model_provider] Tables: tenant_model_provider, Rows: 50, Duration: 0.82s
  [tenant_model_instance] Tables: tenant_model_instance, Rows: 300, Duration: 1.63s
============================================================
```

### Common Messages

| Message | Meaning |
|---------|---------|
| `No new data to migrate` | All records already exist in target table |
| `[DRY RUN] Target table does not exist` | Target table is missing; use `--execute` or `--create-table-only` to create it |
| `Dependency table does not exist` | Required table from previous stage missing |
| `Inserted batch X: Y records` | Successfully inserted batch of records |

---

# db_schema_sync.py

A database schema synchronization tool that uses peewee-migrate to detect and manage schema changes.

## Overview

This script:
1. Reads model definitions from `api/db/db_models.py`
2. Compares with existing database tables specified via command line
3. Generates migration files in `tools/migrate/{version}/`

### Detected Change Types

| Change Type | Description | Auto-included? |
|-------------|-------------|----------------|
| New table | Model class with no corresponding DB table | Yes |
| New field | Model field not present in DB table | Yes |
| Field type change | Model field type differs from DB column type | Yes |
| Removed field | DB column not present in model definition | No (requires `--drop`) |

> **Warning**: Removed fields are **not** included in migrations by default. You must explicitly use `--drop` to generate `DROP COLUMN` statements, as this operation permanently deletes data.

## Prerequisites

Install peewee-migrate:
```bash
pip install peewee-migrate
```

## Usage

### Command Line Arguments

```
python db_schema_sync.py [OPTIONS]
```

| Option | Short | Description |
|--------|-------|-------------|
| `--host` | - | MySQL host (required) |
| `--port` | - | MySQL port (default: 3306) |
| `--user` | - | MySQL user (required) |
| `--password` | - | MySQL password (required) |
| `--database` | - | MySQL database name (required) |
| `--version` | `-v` | Version number in format `vxx.xx.xx` (required) |
| `--list` | `-l` | List all migrations |
| `--create` | - | Create a new migration (auto-detect changes) |
| `--migrate` | `-m` | Run pending migrations |
| `--diff` | `-d` | Show schema differences |
| `--name` | `-n` | Migration name (default: auto) |
| `--drop` | - | Include `DROP COLUMN` for fields removed from models (destructive - permanently deletes data!) |

### Version Format

Version must be in format `vxx.xx.xx`, where each `xx` is one or more digits:
- Valid: `v0.25.3`, `v1.0.0`, `v10.20.30`
- Invalid: `0.25.3`, `v0.25`, `v0.25.3.1`

### Migration File Location

Migration files are stored in:
```
tools/migrate/{version_dir}/
```

Where `{version_dir}` is the version with `.` replaced by `_`.

Example: Version `v0.25.3` → Directory `tools/migrate/v0_25_3/`

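The version check and the directory naming rule can be expressed compactly. This is a sketch based on the valid and invalid examples above; `VERSION_PATTERN` and `migration_dir` are hypothetical names, and the script's own validation may differ in detail.

```python
import re
from pathlib import PurePosixPath

# "v" followed by exactly three dot-separated groups of digits.
VERSION_PATTERN = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def migration_dir(version, root="tools/migrate"):
    """Validate the version string and return its migration directory."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"Invalid version {version!r}; expected format vxx.xx.xx")
    return str(PurePosixPath(root) / version.replace(".", "_"))
```

Here `migration_dir("v0.25.3")` yields `tools/migrate/v0_25_3`, while `0.25.3`, `v0.25`, and `v0.25.3.1` all raise `ValueError`.
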
### Examples

```bash
# List all migrations
python db_schema_sync.py --list \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3

# Create a new auto-detected migration (new tables, new fields, type changes only)
python db_schema_sync.py --create \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3

# Create a migration including dropped fields (destructive!)
python db_schema_sync.py --create --drop \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3

# Create a named migration
python db_schema_sync.py --create --name add_user_table \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3

# Run all pending migrations
python db_schema_sync.py --migrate \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3

# Show schema differences (including removed fields)
python db_schema_sync.py --diff \
  --host localhost --port 3306 --user root --password xxx --database rag_flow \
  --version v0.25.3
```

## How It Works

1. **Load Models**: Imports all model classes from `api/db/db_models.py`
2. **Connect Database**: Creates MySQL connection from command line arguments
3. **Detect Changes**: Compares model definitions with actual database schema:
   - New tables → `create_model`
   - New fields → `ALTER TABLE ADD COLUMN`
   - Field type changes → `ALTER TABLE MODIFY COLUMN`
   - Removed fields → `ALTER TABLE DROP COLUMN` (only with `--drop`)
4. **Generate Migration**: Creates Python migration file with `migrate()` and `rollback()` functions

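The change-detection step for a single existing table can be sketched as a comparison of two name-to-type mappings. `diff_table` is a hypothetical helper that emits SQL strings for readability; the real script works with peewee model and column metadata, so treat this only as an illustration of the rules listed above.

```python
def diff_table(table, model_fields, db_columns, include_drops=False):
    """Compare model fields with DB columns and list the required changes.

    Both mappings go from column name to a type string, e.g. "VARCHAR(32)".
    """
    operations = []
    for name, field_type in model_fields.items():
        if name not in db_columns:
            operations.append(f"ALTER TABLE {table} ADD COLUMN {name} {field_type}")
        elif db_columns[name] != field_type:
            operations.append(f"ALTER TABLE {table} MODIFY COLUMN {name} {field_type}")
    if include_drops:
        # Destructive: only emitted when explicitly requested (cf. --drop).
        for name in db_columns:
            if name not in model_fields:
                operations.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return operations
```

Keeping the drop branch behind an explicit flag means a stale model definition cannot silently delete a column and its data.
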
### Rollback Behavior

| Forward Operation | Rollback Operation |
|-------------------|--------------------|
| `CREATE TABLE` | `remove_model` |
| `ADD COLUMN` | `DROP COLUMN` |
| `MODIFY COLUMN` | `MODIFY COLUMN` (restore original type) |
| `DROP COLUMN` | `ADD COLUMN` (restore column definition; **data is lost**) |

> **Note**: Rolling back a `DROP COLUMN` will re-add the column structure, but the data that was in it cannot be recovered.