AIGC Telegram Bot
AI-powered voice cover generation bot for Telegram. Upload a song or paste a YouTube link, pick a voice model, and get an AI-generated cover delivered directly in your chat.
Based on AICoverGen by SociallyIneptWeeb.
Features
- YouTube & Audio Upload: Paste a YouTube URL or upload MP3/WAV/OGG directly
- RVC Voice Models: Swap vocals using any RVC-trained voice model
- MDX Vocal Separation: Isolate vocals from instrumentals using MDX-Net
- Audio Effects: Reverb, compression, noise reduction, pitch shifting
- Customizable Settings: Pitch, index rate, F0 method, reverb, output format, and more
- 3 Inference Modes: Full pipeline, RVC only, or MDX separation only
- Admin Controls: User management, stats, and cleanup commands
- Docker Ready: Full Docker + docker-compose deployment
Project Structure
aigc-telegram-bot/
├── src/
│   ├── __init__.py
│   ├── bot.py                 # Telegram bot: commands, handlers, conversation flow
│   ├── config.py              # Configuration management (.env + dataclasses)
│   ├── state.py               # Thread-safe user session / state manager
│   ├── pipeline.py            # Async pipeline runner (ThreadPoolExecutor wrapper)
│   └── core/                  # AI Cover Generation pipeline
│       ├── __init__.py
│       ├── cover_pipeline.py  # Main pipeline: download → separate → convert → mix
│       ├── mdx.py             # MDX-Net vocal separation (ONNX Runtime)
│       ├── rvc_voice.py       # RVC voice conversion (SawitProject/rvc)
│       └── my_utils.py        # Audio utility functions (ffmpeg wrapper)
├── assets/
│   └── mdxnet_models/         # Bundled MDX-Net ONNX models (~234 MB)
│       ├── UVR-MDX-NET-Voc_FT.onnx
│       ├── UVR_MDXNET_KARA_2.onnx
│       ├── Reverb_HQ_By_FoxJoy.onnx
│       ├── UVR-MDX-NET-Inst_HQ_4.onnx
│       └── model_data.json
├── data/
│   ├── models/                # Place RVC voice models here
│   ├── temp/                  # Temporary audio files (auto-cleaned)
│   └── output/                # Generated covers
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example               # Configuration template
└── README.md
How It Works
The bot runs a multi-stage AI audio pipeline:
Input (YouTube URL / Audio File)
            │
            ▼
┌─────────────────────────┐
│ 1. Download / Load      │  yt-dlp for YouTube, pydub for local files
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 2. MDX Vocal            │  UVR-MDX-NET-Voc_FT  → vocals + instrumentals
│    Separation           │  UVR_MDXNET_KARA_2   → main + backup vocals
│                         │  Reverb_HQ_By_FoxJoy → de-reverb
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 3. RVC Voice            │  SawitProject/rvc → voice cloning
│    Conversion           │  Configurable pitch, index rate, F0 method
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 4. Audio Effects        │  Reverb, compression, HPF (pedalboard)
│    & Mixing             │  Mix AI vocals + backup + instrumentals
└───────────┬─────────────┘
            │
            ▼
Output (MP3/WAV) → Sent to Telegram chat
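Stage 4 combines the converted vocals, the backup vocals and the instrumental. To illustrate what that mix does, here is a sketch on plain lists of float samples; it assumes all stems share one sample rate, and the per-stem gains in mix_stems are hypothetical, not the bot's settings. The bot itself works on audio files with the libraries listed under Tech Stack.

```python
from typing import List, Sequence, Tuple


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_stems(
    ai_vocals: Sequence[float],
    backup_vocals: Sequence[float],
    instrumental: Sequence[float],
    gains_db: Tuple[float, float, float] = (0.0, -3.0, 0.0),  # hypothetical per-stem gains
) -> List[float]:
    """Apply a gain to each stem, sum them sample by sample, and hard-clip to [-1, 1]."""
    stems = (ai_vocals, backup_vocals, instrumental)
    gains = [db_to_gain(db) for db in gains_db]
    length = max(len(s) for s in stems)
    mixed = []
    for i in range(length):
        # A stem that has already ended contributes silence.
        total = sum(g * s[i] for g, s in zip(gains, stems) if i < len(s))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


if __name__ == "__main__":
    print(mix_stems([0.5, 0.5], [0.2], [0.6, -0.1, 0.3]))
```

Hard clipping is only a crude safety net; a production mix would normally leave headroom or normalize instead.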
Inference Modes
| Mode | Description |
|---|---|
| Full | MDX separation → RVC conversion → effects → mix |
| RVC Only | Voice conversion only (skip vocal separation) |
| MDX Only | Vocal separation only (skip voice conversion) |
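One way to read the table is as a stage plan per mode. A minimal sketch; the Stage names are hypothetical, and it assumes the RVC-only and MDX-only modes also skip effects and mixing, which the table does not state explicitly.

```python
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    SEPARATE = "MDX vocal separation"
    CONVERT = "RVC voice conversion"
    EFFECTS = "audio effects"
    MIX = "mix"


# Keys match the DEFAULT_INFERENCE_MODE values; the stage lists are an assumed reading of the table.
MODE_STAGES: Dict[str, List[Stage]] = {
    "full": [Stage.SEPARATE, Stage.CONVERT, Stage.EFFECTS, Stage.MIX],
    "rvc": [Stage.CONVERT],
    "mdx": [Stage.SEPARATE],
}


def plan_stages(mode: str) -> List[Stage]:
    """Return the stages to run for an inference mode (case-insensitive)."""
    try:
        return list(MODE_STAGES[mode.lower()])
    except KeyError:
        raise ValueError(
            f"unknown inference mode {mode!r}; expected one of {sorted(MODE_STAGES)}"
        ) from None


if __name__ == "__main__":
    for mode in MODE_STAGES:
        print(mode, "->", [stage.value for stage in plan_stages(mode)])
```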
Quick Start
1. Prerequisites
- Python 3.10 or 3.11
- FFmpeg (sudo apt install ffmpeg)
- SoX (sudo apt install sox)
- A Telegram Bot Token from @BotFather
- At least one RVC voice model (.pth file + optional .index file)
2. Install
git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot
pip install -r requirements.txt
# Install RVC voice conversion engine
pip install git+https://github.com/SawitProject/rvc.git
3. Configure
cp .env.example .env
Edit .env and set at minimum:
BOT_TOKEN=your_telegram_bot_token_here
ADMIN_IDS=your_telegram_user_id
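config.py is described as .env plus dataclasses. A minimal sketch of that idea, assuming the .env values have already been loaded into the process environment (for example by a dotenv loader); the BotConfig class and its field names are hypothetical, and only the variable names and defaults come from the Configuration Reference.

```python
import os
from dataclasses import dataclass, field
from typing import FrozenSet, Mapping


def _parse_ids(raw: str) -> FrozenSet[int]:
    """Parse a comma-separated list of Telegram user IDs, ignoring blank entries."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())


@dataclass(frozen=True)
class BotConfig:
    bot_token: str
    admin_ids: FrozenSet[int]
    allowed_user_ids: FrozenSet[int] = field(default_factory=frozenset)  # empty = public
    max_concurrent_jobs: int = 2
    max_file_size_mb: int = 20

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "BotConfig":
        token = env.get("BOT_TOKEN", "").strip()
        if not token:
            raise RuntimeError("BOT_TOKEN is required; set it in .env")
        admins = _parse_ids(env.get("ADMIN_IDS", ""))
        if not admins:
            raise RuntimeError("ADMIN_IDS is required (comma-separated Telegram user IDs)")
        return cls(
            bot_token=token,
            admin_ids=admins,
            allowed_user_ids=_parse_ids(env.get("ALLOWED_USER_IDS", "")),
            max_concurrent_jobs=int(env.get("MAX_CONCURRENT_JOBS", "2")),
            max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "20")),
        )
```

Failing fast on a missing BOT_TOKEN or ADMIN_IDS surfaces configuration mistakes at startup instead of on the first message.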
4. Add Voice Models
Place each RVC voice model in its own folder under data/models/:
data/models/
├── my_voice_model/
│   ├── model.pth      # Required: RVC model weights
│   └── model.index    # Optional: FAISS feature index
├── another_voice/
│   ├── model.pth
│   └── model.index
└── ...
5. Run
python -m src.bot
The bot will start polling for updates. Open Telegram and send /start to your bot.
Bot Commands
| Command | Description |
|---|---|
| /start | Welcome message and quick actions |
| /cover | Start a new AI cover generation |
| /models | List available voice models |
| /settings | Adjust generation parameters |
| /status | Check bot status and active jobs |
| /cancel | Cancel the current operation |
| /help | Detailed usage guide |
| /admin | Admin commands (stats, cleanup) |
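The /admin command covers cleanup, and data/temp is described as auto-cleaned. A sketch of age-based cleanup; the six-hour cutoff and the cleanup_old_files helper are hypothetical, since the bot's actual retention rule isn't documented here.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(folder: Path, max_age_hours: float = 6.0,
                      now: Optional[float] = None) -> List[Path]:
    """Delete regular files under `folder` last modified more than `max_age_hours` ago."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed = []
    for path in list(folder.rglob("*")):  # snapshot first so deleting doesn't disturb iteration
        if path.is_file() and os.path.getmtime(path) < cutoff:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo: one stale file (backdated 7 hours) and one fresh file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stale, fresh = root / "stale.wav", root / "fresh.wav"
        stale.write_bytes(b"\0")
        fresh.write_bytes(b"\0")
        seven_hours_ago = time.time() - 7 * 3600
        os.utime(stale, (seven_hours_ago, seven_hours_ago))
        print([p.name for p in cleanup_old_files(root)])  # ['stale.wav']
```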
Cover Generation Flow
/cover
→ Choose input: [YouTube URL] or [Upload File]
→ Paste URL or send audio file
→ Select voice model from list
→ Review settings & confirm
→ Wait for progress updates
→ Receive generated cover as audio file
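The /cover steps above form a linear conversation, and state.py is described as a thread-safe session manager. In python-telegram-bot such flows are commonly built with a ConversationHandler; the sketch below models only the step order and a lock-protected per-user session map in plain Python. All class and step names are hypothetical, as is the choice that a new /cover replaces an unfinished session.

```python
import threading
from enum import Enum, auto
from typing import Any, Dict, Optional


class Step(Enum):
    CHOOSE_INPUT = auto()   # YouTube URL or file upload?
    AWAIT_SOURCE = auto()   # waiting for the URL or audio file
    SELECT_MODEL = auto()   # pick a voice model
    CONFIRM = auto()        # review settings
    PROCESSING = auto()     # pipeline running, progress updates
    DONE = auto()           # cover delivered


# Each step has a single successor, following the /cover flow.
NEXT_STEP = {
    Step.CHOOSE_INPUT: Step.AWAIT_SOURCE,
    Step.AWAIT_SOURCE: Step.SELECT_MODEL,
    Step.SELECT_MODEL: Step.CONFIRM,
    Step.CONFIRM: Step.PROCESSING,
    Step.PROCESSING: Step.DONE,
}


class CoverSession:
    """One user's progress through /cover plus the answers collected so far."""

    def __init__(self) -> None:
        self.step = Step.CHOOSE_INPUT
        self.data: Dict[str, Any] = {}

    def advance(self, **answers: Any) -> Step:
        if self.step is Step.DONE:
            raise RuntimeError("session finished; send /cover to start a new one")
        self.data.update(answers)
        self.step = NEXT_STEP[self.step]
        return self.step


class SessionStore:
    """Thread-safe map from Telegram user ID to that user's CoverSession."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[int, CoverSession] = {}

    def start(self, user_id: int) -> CoverSession:
        with self._lock:  # a new /cover replaces any unfinished session
            session = CoverSession()
            self._sessions[user_id] = session
            return session

    def get(self, user_id: int) -> Optional[CoverSession]:
        with self._lock:
            return self._sessions.get(user_id)

    def drop(self, user_id: int) -> None:
        with self._lock:  # e.g. on /cancel
            self._sessions.pop(user_id, None)
```

The lock guards the user-ID map only; each session is assumed to be touched by one user's updates at a time.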
Configuration Reference
All settings are managed through .env. Copy .env.example for the full list.
Bot Settings
| Variable | Default | Description |
|---|---|---|
| BOT_TOKEN | (required) | Telegram bot token from @BotFather |
| ADMIN_IDS | (required) | Comma-separated admin Telegram user IDs |
| ALLOWED_USER_IDS | (empty = public) | Restrict bot to specific users |
| BOT_NAME | AIGC Cover Bot | Display name in bot messages |
| MAX_CONCURRENT_JOBS | 2 | Maximum simultaneous cover generations |
| MAX_FILE_SIZE_MB | 20 | Maximum uploaded audio file size (MB) |
| LOG_LEVEL | INFO | Logging verbosity (DEBUG/INFO/WARNING/ERROR) |
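A sketch of how ALLOWED_USER_IDS, ADMIN_IDS and MAX_FILE_SIZE_MB could gate incoming requests. Two assumptions: admins are always allowed even when a whitelist is set, and a megabyte means 1024 × 1024 bytes. The 20 MB default coincides with the Telegram Bot API's 20 MB limit on files a bot can download with getFile.

```python
from typing import AbstractSet

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" means mebibytes


def is_authorized(user_id: int, admin_ids: AbstractSet[int],
                  allowed_user_ids: AbstractSet[int]) -> bool:
    """Empty ALLOWED_USER_IDS means public; admins always pass (assumed)."""
    if not allowed_user_ids:
        return True
    return user_id in allowed_user_ids or user_id in admin_ids


def check_upload_size(size_bytes: int, max_file_size_mb: int = 20) -> None:
    """Raise ValueError for uploads larger than MAX_FILE_SIZE_MB."""
    if size_bytes > max_file_size_mb * BYTES_PER_MB:
        raise ValueError(
            f"file is {size_bytes / BYTES_PER_MB:.1f} MB; the limit is {max_file_size_mb} MB"
        )
```

Checking the size Telegram reports with the upload avoids fetching files the bot would reject anyway.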
Pipeline Defaults
| Variable | Default | Description |
|---|---|---|
| DEFAULT_OUTPUT_FORMAT | mp3 | Output format (mp3, wav, flac) |
| DEFAULT_PITCH_CHANGE | 0 | Pitch shift in octaves (-12 to +12) |
| DEFAULT_INDEX_RATE | 0.5 | RVC index rate (0.0–1.0) |
| DEFAULT_F0_METHOD | rmvpe | Pitch detection algorithm |
| DEFAULT_FILTER_RADIUS | 3 | Median filter for pitch (0–7) |
| DEFAULT_PROTECT | 0.33 | Voiceless consonant protection (0–0.5) |
| DEFAULT_REVERB_SIZE | 0.15 | Reverb room size (0–1) |
| DEFAULT_REVERB_WET | 0.2 | Reverb wet level (0–1) |
| DEFAULT_INFERENCE_MODE | full | Pipeline mode: full / mdx / rvc |
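The ranges in the table can be enforced whenever a user changes settings. A sketch using a dataclass whose defaults mirror the table; CoverSettings is a hypothetical name, and the F0 method is not validated because the list of supported methods isn't given here.

```python
from dataclasses import dataclass


@dataclass
class CoverSettings:
    output_format: str = "mp3"
    pitch_change: int = 0
    index_rate: float = 0.5
    f0_method: str = "rmvpe"
    filter_radius: int = 3
    protect: float = 0.33
    reverb_size: float = 0.15
    reverb_wet: float = 0.2
    inference_mode: str = "full"

    def __post_init__(self) -> None:
        # (name, value, low, high): ranges copied from the Pipeline Defaults table.
        checks = [
            ("pitch_change", self.pitch_change, -12, 12),
            ("index_rate", self.index_rate, 0.0, 1.0),
            ("filter_radius", self.filter_radius, 0, 7),
            ("protect", self.protect, 0.0, 0.5),
            ("reverb_size", self.reverb_size, 0.0, 1.0),
            ("reverb_wet", self.reverb_wet, 0.0, 1.0),
        ]
        for name, value, low, high in checks:
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        if self.output_format not in {"mp3", "wav", "flac"}:
            raise ValueError(f"unsupported output_format {self.output_format!r}")
        if self.inference_mode not in {"full", "mdx", "rvc"}:
            raise ValueError(f"unsupported inference_mode {self.inference_mode!r}")
```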
Docker Deployment
Using Docker Compose (Recommended)
# Clone and configure
git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot
cp .env.example .env
# Edit .env with your BOT_TOKEN and ADMIN_IDS
# Add voice models
cp -r /path/to/your/models/* data/models/
# Start the bot
docker compose up -d
# View logs
docker compose logs -f
Using Docker Directly
docker build -t aigc-bot .
docker run -d \
  --name aigc-bot \
  --env BOT_TOKEN=your_token \
  --env ADMIN_IDS=your_id \
  -v "$(pwd)/data/models:/app/data/models" \
  -v "$(pwd)/data/output:/app/data/output" \
  aigc-bot
GPU Support
Uncomment the GPU section in docker-compose.yml if you have NVIDIA GPUs:
deploy:
  resources:
    reservations:
      devices:
        - capabilities: [gpu]
RVC Voice Models
Where to Get Models
- AI Hub Discord: Community voice models
- RVC Models Collection: HuggingFace
- Train your own using RVC WebUI
Model Format
Each model is a folder containing:
your_model/
├── model.pth      # Required: trained RVC model weights
└── model.index    # Optional: FAISS index for timbre retrieval
Place them in data/models/<model_name>/.
Pitch Guidelines
| Source → Target | Pitch Change |
|---|---|
| Male → Female | +1 (or +2 for deeper voices) |
| Female → Male | -1 (or -2 for higher voices) |
| Same gender, similar range | 0 |
| Octave up | +1 (= +12 semitones) |
| Octave down | -1 (= -12 semitones) |
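The DEFAULT_PITCH_CHANGE description gives the unit as octaves, whereas RVC's transpose value is conventionally counted in semitones. A sketch assuming the setting is multiplied by 12 before it reaches RVC; both helper names are hypothetical. The second helper shows what a shift means acoustically: each semitone multiplies frequency by 2^(1/12), so 12 semitones double it.

```python
def octaves_to_semitones(octaves: int) -> int:
    """Convert the octave-based pitch setting into a semitone transpose for RVC."""
    return octaves * 12


def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` in equal temperament."""
    return 2 ** (semitones / 12)


if __name__ == "__main__":
    for label, octaves in [("Male → Female", 1), ("Female → Male", -1), ("Same range", 0)]:
        semis = octaves_to_semitones(octaves)
        print(f"{label}: {octaves:+d} octave(s) = {semis:+d} semitones, "
              f"frequency x{pitch_ratio(semis):.2f}")
```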
Adding Custom Voice Models
To add a model, place its .pth file (and optionally its .index file) in data/models/<name>/ on the server. The bot rescans that folder whenever /models is called, so new models appear without a restart.
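Detection on /models can be as simple as rescanning the folder on every call. A sketch based on the layout from Model Format; discover_models is hypothetical, and picking the first .pth and .index file (alphabetically) in each folder is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Dict, NamedTuple, Optional


class VoiceModel(NamedTuple):
    name: str
    weights: Path          # required .pth file
    index: Optional[Path]  # optional FAISS .index file


def discover_models(models_dir: Path) -> Dict[str, VoiceModel]:
    """Rescan models_dir; every subfolder containing a .pth file becomes one model."""
    models: Dict[str, VoiceModel] = {}
    if not models_dir.is_dir():
        return models
    for folder in sorted(p for p in models_dir.iterdir() if p.is_dir()):
        weights = sorted(folder.glob("*.pth"))
        if not weights:
            continue  # no weights: not a usable model, so it is not listed
        indexes = sorted(folder.glob("*.index"))
        models[folder.name] = VoiceModel(folder.name, weights[0], indexes[0] if indexes else None)
    return models


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "my_voice_model").mkdir()
        (root / "my_voice_model" / "model.pth").touch()
        (root / "my_voice_model" / "model.index").touch()
        (root / "another_voice").mkdir()
        (root / "another_voice" / "model.pth").touch()
        for name, model in discover_models(root).items():
            print(name, model.weights.name, model.index.name if model.index else "(no index)")
```

Skipping folders without a .pth file is what keeps a half-copied model from showing up in the list.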
Troubleshooting
Bot doesn't start
- Verify BOT_TOKEN is set correctly in .env
- Ensure python-telegram-bot is installed: pip install python-telegram-bot
- Check logs for import errors
"No voice models available"
- Ensure RVC model .pth files are in data/models/<model_name>/
- Check file permissions
RVC inference fails
- Install RVC: pip install git+https://github.com/SawitProject/rvc.git
- If FP16 errors occur, set RVC_HALF_PRECISION=0 in your environment
- Ensure you have enough RAM/VRAM (minimum 4 GB, recommended 8 GB+)
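A sketch of choosing the inference device and honoring RVC_HALF_PRECISION. It assumes half precision defaults to on and is only used on CUDA (FP16 is generally not worthwhile on CPU); the function name is hypothetical. PyTorch is imported lazily so the sketch also runs where it isn't installed.

```python
import os
from typing import Mapping, Tuple


def pick_inference_device(env: Mapping[str, str] = os.environ) -> Tuple[str, bool]:
    """Return (device, use_half): CUDA when PyTorch sees a GPU, otherwise CPU."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    device = "cuda:0" if has_cuda else "cpu"
    # RVC_HALF_PRECISION=0 turns FP16 off; treating any other value as "on" is an assumption.
    half_requested = env.get("RVC_HALF_PRECISION", "1") != "0"
    # FP16 is only used on the GPU path.
    return device, half_requested and has_cuda


if __name__ == "__main__":
    print(pick_inference_device())
```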
YouTube download fails
- Ensure yt-dlp is up to date: pip install -U yt-dlp
- Some videos may be region-restricted or age-gated
- For YouTube Music, cookies may be needed (place them in assets/config.txt)
OOM (Out of Memory)
- Reduce MAX_CONCURRENT_JOBS to 1
- Use shorter songs or lower quality settings
- Run inference on the CPU: without CUDA installed, the bot runs in CPU mode, which is slower but uses system RAM instead of GPU memory
Tech Stack
| Component | Technology |
|---|---|
| Bot Framework | python-telegram-bot 21.x |
| Vocal Separation | MDX-Net (ONNX Runtime) |
| Voice Conversion | SawitProject/rvc |
| Audio Effects | pedalboard (Spotify) |
| Audio Processing | librosa, soundfile, pydub, sox |
| YouTube Download | yt-dlp |
| Noise Reduction | noisereduce |
| ML Framework | PyTorch |
License
This project is licensed under the MIT License; see LICENSE for details.
The underlying AI models (MDX-Net, RVC) have their own licenses. Please refer to their respective repositories for more information.
Credits
- SociallyIneptWeeb/AICoverGen: Original AI Cover Generation pipeline
- SawitProject/rvc: RVC voice conversion engine
- openvpi/MDX-Net: Vocal separation models
- Spotify/pedalboard: Audio effects