mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-28 11:43:06 +08:00
### What problem does this PR solve?
This PR implement TTS for FishAudio and MiniMax provider and ASR for
FishAudio
**The following functionalities are now supported:**
**FishAudio:**
- [x] Text To Speech
- [x] Stream Text To Speech
- [x] Audio To Text
**OpenRouter:**
- [x] Text To Speech
**Verified examples from the CLI:**
```plaintext
**FishAudio**
RAGFlow(user)> tts with 's1@test@fishaudio' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"reference_id": "90e65eaaf50e4470b8e6d43ee6afd7d5", "temperature": 0.7, "top_p": 0.7, "prosody": {"speed": 1, "volume": 0, "normalize_loudness": true}, "chunk_length": 300, "normalize": true, "sample_rate": 44100, "mp3_bitrate": 128, "latency": "normal", "max_new_tokens": 1024, "repetition_penalty": 1.2, "min_chunk_length": 50, "condition_on_previous_chunks": true, "early_stop_threshold": 1}'
Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/s1_output.wav
SUCCESS
RAGFlow(user)> stream tts with 's1@test@fishaudio' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"reference_id": "90e65eaaf50e4470b8e6d43ee6afd7d5", "temperature": 0.7, "top_p": 0.7, "prosody": {"speed": 1, "volume": 0, "normalize_loudness": true}, "chunk_length": 300, "normalize": true, "sample_rate": 44100, "mp3_bitrate": 128, "latency": "normal", "max_new_tokens": 1024, "repetition_penalty": 1.2, "min_chunk_length": 50, "condition_on_previous_chunks": true, "early_stop_threshold": 1}'
Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/s1_output.wav
SUCCESS
RAGFlow(user)> asr with 'transcribe-1@test@fishaudio' audio './internal/test.wav' param '{"language": "en", "ignore_timestamps": true}'
+----------------------------------------------------------------------------------------------------------------------+
| text |
+----------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. |
+----------------------------------------------------------------------------------------------------------------------+
```
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
36 lines
567 B
JSON
36 lines
567 B
JSON
{
|
|
"name": "FishAudio",
|
|
"url": {
|
|
"default": "https://api.fish.audio"
|
|
},
|
|
"url_suffix": {
|
|
"models": "model",
|
|
"balance": "self/package",
|
|
"tts": "v1/tts",
|
|
"asr": "v1/asr"
|
|
},
|
|
"class": "fishaudio",
|
|
"models": [
|
|
{
|
|
"name": "s2-pro",
|
|
"max_tokens": 8192,
|
|
"model_types": [
|
|
"tts"
|
|
]
|
|
},
|
|
{
|
|
"name": "s1",
|
|
"max_tokens": 8192,
|
|
"model_types": [
|
|
"tts"
|
|
]
|
|
},
|
|
{
|
|
"name": "transcribe-1",
|
|
"max_tokens": 8192,
|
|
"model_types": [
|
|
"asr"
|
|
]
|
|
}
|
|
]
|
|
} |