# OpenAI Compatible API Server For StackFlow
## Overview
This server exposes an OpenAI-compatible API for multiple AI model backends, including LLMs, vision models, speech synthesis (TTS), and speech recognition (ASR).
## Quick Start
1. Install dependencies:
``` bash
pip install -r requirements.txt
```
2. Start the server:
``` bash
python3 api_server.py
```
## Supported Endpoints
### Chat Completions
- **Endpoint**: `POST /v1/chat/completions`
- **Request Format**: OpenAI-compatible chat completion request
- **Streaming**: Supported
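
A minimal Python sketch of building this request with only the standard library. The helper name `build_chat_request`, the base URL, and the API key are illustrative assumptions; the model name is the one used in the example requests in this README.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Assumed values: adjust host, port, key, and model to your setup.
req = build_chat_request(
    "http://localhost:8000",
    "YOUR_KEY",
    "qwen2.5-0.5B-p256-ax630c",
    [{"role": "user", "content": "Hello!"}],
)
# To send it, pass `req` to urllib.request.urlopen() and json.load() the response.
```

Keeping request construction separate from sending makes it easy to inspect the payload before a server is running.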
### Text Completions
- **Endpoint**: `POST /v1/completions`
- **Request Format**: OpenAI-compatible completion request
- **Streaming**: Supported
### Speech Synthesis (TTS)
- **Endpoint**: `POST /v1/audio/speech`
- **Parameters**:
  - `model`: TTS model name
  - `input`: Text to synthesize
  - `voice`: Voice type
  - `response_format`: Audio format (mp3, wav, etc.)
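
The parameters above can be sent from Python roughly as follows. This is a sketch, not the server's own client: `synthesize_to_file` and its injectable `opener` argument are hypothetical names, and it assumes the response body is the raw audio, as the `--output` flag in the curl example suggests.

```python
import json
import urllib.request


def synthesize_to_file(base_url, api_key, model, text, voice, out_path,
                       response_format="mp3", opener=urllib.request.urlopen):
    """POST to /v1/audio/speech and write the returned audio bytes to out_path."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    req = urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    # `opener` is injectable so the function can be exercised without a server.
    with opener(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

A call such as `synthesize_to_file("http://localhost:8001", "YOUR_KEY", "melotts_zh-cn", "Hello world!", "alloy", "output.mp3")` mirrors the curl example in this README.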
### Speech Recognition (ASR)
- **Transcription**: `POST /v1/audio/transcriptions`
  - Converts speech to text in the same language
- **Translation**: `POST /v1/audio/translations`
  - Converts speech to English text
- **Parameters**:
  - `file`: Audio file
  - `model`: ASR model name
  - `language` (transcription only): Source language
  - `prompt`: Optional prompt
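
Because `file` is an upload, these endpoints are typically called with a `multipart/form-data` body, as in the OpenAI API; whether this server accepts other encodings is not stated here, so treat that as an assumption. The sketch below encodes such a body with the standard library. `encode_multipart`, the model name `YOUR_ASR_MODEL`, and the sample bytes are placeholders.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field (model, language, prompt, ...).
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The file part carries a filename and its own content type.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_content_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder audio bytes; in practice read them from your audio file.
body, content_type = encode_multipart(
    {"model": "YOUR_ASR_MODEL", "language": "en"},
    "file", "sample.wav", b"RIFF-placeholder",
)
```

Send `body` as the POST data to `/v1/audio/transcriptions` with `content_type` as the `Content-Type` header; omit `language` when calling `/v1/audio/translations`.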
### List Models
- **Endpoint**: `GET /v1/models`
- **Returns**: List of available models
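
A short sketch of reading the model list in Python. It assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which this README does not spell out; `model_ids` and the sample body are illustrative.

```python
import json


def model_ids(models_response_text):
    """Extract model IDs from a /v1/models JSON response body."""
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", [])]


# Hypothetical response body in the OpenAI list shape.
sample = json.dumps({
    "object": "list",
    "data": [
        {"id": "qwen2.5-0.5B-p256-ax630c", "object": "model"},
        {"id": "melotts_zh-cn", "object": "model"},
    ],
})
ids = model_ids(sample)
```

Checking a model name against this list before sending a request is a quick way to catch a misspelled name.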
## FAQ
### Q: Why am I getting "Unsupported model" errors?
A: The model name must exactly match one of the configured models in your config file.
### Q: How do I enable streaming responses?
A: Set `"stream": true` in your request body for chat/completion endpoints.
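
A sketch of consuming a streamed chat response in Python. It assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which is not confirmed in this README; `iter_stream_text` and the sample lines are illustrative, and only the chat `delta.content` format is handled.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical stream as it might arrive over the wire.
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}\n',
    b'data: {"choices": [{"delta": {"content": "lo!"}}]}\n',
    b"data: [DONE]\n",
]
reply = "".join(iter_stream_text(sample_lines))
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to `iter_stream_text` directly.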
### Q: What audio formats are supported for ASR?
A: The supported formats depend on your ASR backend implementation.
### Q: How do I manage model memory usage?
A: The server keeps a pool of LLM instances. Adjust `pool_size` in the config to control how many instances run concurrently.
## Troubleshooting
- **Logs**: Check server logs for detailed error messages
- **Model Initialization**: Verify all required backend services are running
- **Configuration**: Double-check model names and parameters in `config.yaml`
## Example Requests
### Chat Completion
``` bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_KEY" \
-d '{
"model": "qwen2.5-0.5B-p256-ax630c",
"messages": [{"role": "user", "content": "Hello!"}],
"temperature": 0.7
}'
```
### Speech Synthesis
``` bash
curl -X POST "http://localhost:8001/v1/audio/speech" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_KEY" \
-d '{
"model": "melotts_zh-cn",
"input": "Hello world!",
"voice": "alloy"
}' \
--output output.mp3
```
## Required Libraries
- [StackFlow](https://github.com/m5stack/StackFlow)
## License
- [M5Module-LLM_OpenAI_API - MIT](LICENSE)