AIGC Telegram Bot

AI-powered voice cover generation bot for Telegram. Upload a song or paste a YouTube link, pick a voice model, and get an AI-generated cover delivered directly in your chat.

Based on AICoverGen by SociallyIneptWeeb.

Features

YouTube & Audio Upload — Paste a YouTube URL or upload MP3/WAV/OGG directly
RVC Voice Models — Swap vocals using any RVC-trained voice model
MDX Vocal Separation — Isolate vocals from instrumentals using MDX-Net
Audio Effects — Reverb, compression, noise reduction, pitch shifting
Customizable Settings — Pitch, index rate, F0 method, reverb, output format, and more
3 Inference Modes — Full pipeline, RVC only, or MDX separation only
Admin Controls — User management, stats, and cleanup commands
Docker Ready — Full Docker + docker-compose deployment

Project Structure

aigc-telegram-bot/
├── src/
│   ├── __init__.py
│   ├── bot.py                # Telegram bot — commands, handlers, conversation flow
│   ├── config.py             # Configuration management (.env + dataclasses)
│   ├── state.py              # Thread-safe user session / state manager
│   ├── pipeline.py           # Async pipeline runner (ThreadPoolExecutor wrapper)
│   └── core/                 # AI Cover Generation pipeline
│       ├── __init__.py
│       ├── cover_pipeline.py # Main pipeline: download → separate → convert → mix
│       ├── mdx.py            # MDX-Net vocal separation (ONNX Runtime)
│       ├── rvc_voice.py      # RVC voice conversion (SawitProject/rvc)
│       └── my_utils.py       # Audio utility functions (ffmpeg wrapper)
├── assets/
│   └── mdxnet_models/        # Bundled MDX-Net ONNX models (~234 MB)
│       ├── UVR-MDX-NET-Voc_FT.onnx
│       ├── UVR_MDXNET_KARA_2.onnx
│       ├── Reverb_HQ_By_FoxJoy.onnx
│       ├── UVR-MDX-NET-Inst_HQ_4.onnx
│       └── model_data.json
├── data/
│   ├── models/               # Place RVC voice models here
│   ├── temp/                 # Temporary audio files (auto-cleaned)
│   └── output/               # Generated covers
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example              # Configuration template
└── README.md
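
The tree describes src/pipeline.py as an async runner wrapped around a ThreadPoolExecutor. The sketch below shows one way that pattern is commonly written. It is not the project's actual code: generate_cover, run_pipeline, and the sleep-based workload are hypothetical stand-ins, and the worker count simply mirrors the documented MAX_CONCURRENT_JOBS default.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 2  # mirrors the documented default

# One worker per allowed job, so extra requests wait in the executor's queue.
_executor = ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS)


def generate_cover(song: str, voice: str) -> str:
    """Hypothetical stand-in for the blocking cover pipeline."""
    time.sleep(0.05)  # pretend to do heavy audio work
    return f"{song} sung by {voice}"


async def run_pipeline(song: str, voice: str) -> str:
    """Run the blocking job in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, generate_cover, song, voice)


async def main() -> list[str]:
    # Three requests with two workers: the third waits for a free thread.
    jobs = [run_pipeline(f"song{i}", "my_voice_model") for i in range(3)]
    return await asyncio.gather(*jobs)


results = asyncio.run(main())
```

Keeping the heavy work off the event loop is what lets a bot keep answering commands such as /status while a cover is rendering.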

How It Works

The bot runs a multi-stage AI audio pipeline:

Input (YouTube URL / Audio File)
        │
        ▼
┌─────────────────────────┐
│  1. Download / Load     │  yt-dlp for YouTube, pydub for local files
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  2. MDX Vocal           │  UVR-MDX-NET-Voc_FT → vocals + instrumentals
│     Separation          │  UVR_MDXNET_KARA_2 → main + backup vocals
│                         │  Reverb_HQ_By_FoxJoy → de-reverb
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  3. RVC Voice           │  SawitProject/rvc — voice cloning
│     Conversion          │  Configurable pitch, index rate, F0 method
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  4. Audio Effects       │  Reverb, compression, HPF (pedalboard)
│     & Mixing            │  Mix AI vocals + backup + instrumentals
└───────────┬─────────────┘
            │
            ▼
    Output (MP3/WAV) → Sent to Telegram chat
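
Step 1 of the diagram has to decide whether the input is a YouTube link (handled by yt-dlp) or a local audio file. A minimal sketch of that decision follows, using only the standard library. The function name, the host list, and the returned labels are assumptions made for illustration; the bot's real check may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed host list; the real bot may accept more or fewer hosts.
YOUTUBE_HOSTS = {
    "youtube.com", "www.youtube.com", "m.youtube.com",
    "music.youtube.com", "youtu.be",
}
# Upload formats named in the feature list.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg"}


def classify_input(source: str) -> str:
    """Return 'youtube', 'audio_file', or 'unsupported' for a user input."""
    source = source.strip()
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "unsupported"
    # Not a web URL: treat it as a file name and look at the extension.
    suffix = PurePosixPath(source).suffix.lower()
    return "audio_file" if suffix in AUDIO_EXTENSIONS else "unsupported"
```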

Inference Modes

Mode	Description
Full	MDX separation → RVC conversion → effects → mix
RVC Only	Voice conversion only (skip vocal separation)
MDX Only	Vocal separation only (skip voice conversion)
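
The table above can be read as a mapping from mode to the stages that run. The sketch below expresses that mapping directly. The mode keys (full, rvc, mdx) come from the DEFAULT_INFERENCE_MODE setting; the stage names and the plan_stages helper are hypothetical labels, not identifiers from the codebase.

```python
# Stage names are illustrative labels for the steps in the table above.
STAGES_BY_MODE = {
    "full": ["separate", "convert", "effects", "mix"],
    "rvc": ["convert"],   # skip vocal separation
    "mdx": ["separate"],  # skip voice conversion
}


def plan_stages(mode: str) -> list[str]:
    """Return the ordered stages for a mode, rejecting unknown modes."""
    try:
        return list(STAGES_BY_MODE[mode])
    except KeyError:
        raise ValueError(f"unknown inference mode: {mode!r}") from None
```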

Quick Start

1. Prerequisites

Python 3.10 or 3.11
FFmpeg (sudo apt install ffmpeg)
SoX (sudo apt install sox)
A Telegram Bot Token from @BotFather
At least one RVC voice model (.pth file + optional .index file)
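
Before installing, it can help to confirm the interpreter version and that FFmpeg and SoX are on PATH. The following is a small standalone check, not part of the repository; the helper names are made up for this example.

```python
import shutil
import sys


def missing_tools(tools=("ffmpeg", "sox")) -> list[str]:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_supported(version=sys.version_info) -> bool:
    """True for Python 3.10 or 3.11, the versions listed above."""
    return (version[0], version[1]) in {(3, 10), (3, 11)}


if __name__ == "__main__":
    print("Python OK:", python_supported())
    print("Missing tools:", missing_tools() or "none")
```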

2. Install

git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot

pip install -r requirements.txt

# Install RVC voice conversion engine
pip install git+https://github.com/SawitProject/rvc.git

3. Configure

cp .env.example .env

Edit .env and set at minimum:

BOT_TOKEN=your_telegram_bot_token_here
ADMIN_IDS=your_telegram_user_id

4. Add Voice Models

Place each RVC voice model in its own folder under data/models/:

data/models/
├── my_voice_model/
│   ├── model.pth          # Required: RVC model weights
│   └── model.index        # Optional: FAISS feature index
├── another_voice/
│   ├── model.pth
│   └── model.index
└── ...
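
Given this layout, model discovery amounts to scanning data/models/ for folders that contain a .pth file. The sketch below shows one plausible way to do that. discover_models and the returned dictionary shape are assumptions for illustration, not the bot's actual implementation; the demo at the end builds a throwaway folder instead of touching real models.

```python
import tempfile
from pathlib import Path


def discover_models(models_dir) -> dict[str, dict]:
    """Map each model folder name to its .pth path and optional .index path."""
    models = {}
    root = Path(models_dir)
    if not root.is_dir():
        return models
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        weights = sorted(folder.glob("*.pth"))
        if not weights:
            continue  # no weights file, so not a usable model
        index = sorted(folder.glob("*.index"))
        models[folder.name] = {
            "pth": weights[0],
            "index": index[0] if index else None,
        }
    return models


# Demo on a temporary directory: one valid model, one empty folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_voice_model").mkdir()
    (Path(tmp) / "my_voice_model" / "model.pth").touch()
    (Path(tmp) / "empty_folder").mkdir()
    found = discover_models(tmp)
    names = list(found)
```

Folders without a .pth file are skipped, which matches the requirement that the weights are mandatory and the index is optional.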

5. Run

python -m src.bot

The bot will start polling for updates. Open Telegram and send /start to your bot.

Bot Commands

Command	Description
`/start`	Welcome message and quick actions
`/cover`	Start a new AI cover generation
`/models`	List available voice models
`/settings`	Adjust generation parameters
`/status`	Check bot status and active jobs
`/cancel`	Cancel the current operation
`/help`	Detailed usage guide
`/admin`	Admin commands (stats, cleanup)

Cover Generation Flow

/cover
  → Choose input: [YouTube URL] or [Upload File]
  → Paste URL or send audio file
  → Select voice model from list
  → Review settings & confirm
  → Wait for progress updates
  → Receive generated cover as audio file
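
The flow above is a linear sequence of steps tracked per user. As a rough sketch of how such a session could be stored safely when handlers run on several threads, the example below uses an enum for the steps and a lock around a dictionary. Step, SessionStore, and their methods are hypothetical names; the project's state.py may be organized differently.

```python
import threading
from enum import Enum, auto


class Step(Enum):
    """Steps of the /cover conversation, in the order shown above."""
    CHOOSE_INPUT = auto()
    AWAIT_SOURCE = auto()
    CHOOSE_MODEL = auto()
    CONFIRM = auto()
    PROCESSING = auto()
    DONE = auto()


ORDER = list(Step)  # enum members iterate in definition order


class SessionStore:
    """Per-user step tracking guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._steps: dict[int, Step] = {}

    def start(self, user_id: int) -> None:
        with self._lock:
            self._steps[user_id] = Step.CHOOSE_INPUT

    def advance(self, user_id: int) -> Step:
        """Move the user to the next step; DONE is terminal."""
        with self._lock:
            current = self._steps[user_id]
            if current is not Step.DONE:
                current = ORDER[ORDER.index(current) + 1]
                self._steps[user_id] = current
            return current

    def cancel(self, user_id: int) -> None:
        """Drop the session, as /cancel would."""
        with self._lock:
            self._steps.pop(user_id, None)

    def get(self, user_id: int):
        with self._lock:
            return self._steps.get(user_id)
```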

Configuration Reference

All settings are managed through .env. See .env.example for the full list.

Bot Settings

Variable	Default	Description
`BOT_TOKEN`	(required)	Telegram bot token from @BotFather
`ADMIN_IDS`	(required)	Comma-separated admin Telegram user IDs
`ALLOWED_USER_IDS`	(empty = public)	Restrict bot to specific users
`BOT_NAME`	`AIGC Cover Bot`	Display name in bot messages
`MAX_CONCURRENT_JOBS`	`2`	Maximum simultaneous cover generations
`MAX_FILE_SIZE_MB`	`20`	Maximum uploaded audio file size
`LOG_LEVEL`	`INFO`	Logging verbosity (DEBUG/INFO/WARNING/ERROR)
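
To make the table concrete, here is a minimal sketch of reading these variables into a dataclass with the documented defaults. It reads from a mapping such as os.environ and assumes the .env file has already been loaded into the environment (for example by a dotenv loader). BotSettings, load_settings, and parse_ids are illustrative names, and letting admins bypass ALLOWED_USER_IDS is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass


def parse_ids(raw: str) -> frozenset[int]:
    """Parse a comma-separated list of Telegram user IDs."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())


@dataclass(frozen=True)
class BotSettings:
    bot_token: str
    admin_ids: frozenset
    allowed_user_ids: frozenset
    bot_name: str = "AIGC Cover Bot"
    max_concurrent_jobs: int = 2
    max_file_size_mb: int = 20
    log_level: str = "INFO"

    def is_allowed(self, user_id: int) -> bool:
        """Empty allow-list means the bot is public."""
        if not self.allowed_user_ids:
            return True
        # Assumption: admins are always allowed, even if not listed.
        return user_id in self.allowed_user_ids or user_id in self.admin_ids


def load_settings(env=os.environ) -> BotSettings:
    token = env.get("BOT_TOKEN", "").strip()
    admin_ids = parse_ids(env.get("ADMIN_IDS", ""))
    if not token or not admin_ids:
        raise ValueError("BOT_TOKEN and ADMIN_IDS are required")
    return BotSettings(
        bot_token=token,
        admin_ids=admin_ids,
        allowed_user_ids=parse_ids(env.get("ALLOWED_USER_IDS", "")),
        bot_name=env.get("BOT_NAME", "AIGC Cover Bot"),
        max_concurrent_jobs=int(env.get("MAX_CONCURRENT_JOBS", "2")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "20")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```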

Pipeline Defaults

Variable	Default	Description
`DEFAULT_OUTPUT_FORMAT`	`mp3`	Output format (mp3, wav, flac)
`DEFAULT_PITCH_CHANGE`	`0`	Pitch shift in octaves (-12 to +12)
`DEFAULT_INDEX_RATE`	`0.5`	RVC index rate (0.0–1.0)
`DEFAULT_F0_METHOD`	`rmvpe`	Pitch detection algorithm
`DEFAULT_FILTER_RADIUS`	`3`	Median filter for pitch (0–7)
`DEFAULT_PROTECT`	`0.33`	Voiceless consonant protection (0–0.5)
`DEFAULT_REVERB_SIZE`	`0.15`	Reverb room size (0–1)
`DEFAULT_REVERB_WET`	`0.2`	Reverb wet level (0–1)
`DEFAULT_INFERENCE_MODE`	`full`	Pipeline mode: full / mdx / rvc
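
Several of these defaults have documented ranges, so a typo in .env can be caught early. The sketch below checks values against the ranges listed in the table. RANGES and out_of_range are hypothetical helpers written for this example; whether the bot itself validates or clamps these values is not stated here.

```python
# Ranges copied from the table above.
RANGES = {
    "DEFAULT_PITCH_CHANGE": (-12, 12),
    "DEFAULT_INDEX_RATE": (0.0, 1.0),
    "DEFAULT_FILTER_RADIUS": (0, 7),
    "DEFAULT_PROTECT": (0.0, 0.5),
    "DEFAULT_REVERB_SIZE": (0.0, 1.0),
    "DEFAULT_REVERB_WET": (0.0, 1.0),
}


def out_of_range(values: dict[str, float]) -> list[str]:
    """Return one message per setting that falls outside its range."""
    problems = []
    for name, value in values.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low}..{high}")
    return problems
```

Settings without a documented range, such as DEFAULT_F0_METHOD, are ignored by this check.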

Docker Deployment

Using Docker Compose (Recommended)

# Clone and configure
git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot
cp .env.example .env
# Edit .env with your BOT_TOKEN and ADMIN_IDS

# Add voice models
cp -r /path/to/your/models/* data/models/

# Start the bot
docker compose up -d

# View logs
docker compose logs -f

Using Docker Directly

docker build -t aigc-bot .
docker run -d \
  --name aigc-bot \
  --env BOT_TOKEN=your_token \
  --env ADMIN_IDS=your_id \
  -v "$(pwd)/data/models:/app/data/models" \
  -v "$(pwd)/data/output:/app/data/output" \
  aigc-bot

GPU Support

Uncomment the GPU section in docker-compose.yml if you have NVIDIA GPUs:

deploy:
  resources:
    reservations:
      devices:
        - capabilities: [gpu]

RVC Voice Models

Where to Get Models

AI Hub Discord — Community voice models
RVC Models Collection — HuggingFace
Train your own using RVC WebUI

Model Format

Each model is a folder containing:

your_model/
├── model.pth            # Required — trained RVC model weights
└── model.index          # Optional — FAISS index for timbre retrieval

Place them in data/models/<model_name>/.

Pitch Guidelines

Source → Target	Pitch Change
Male → Female	+1 (or +2 for deeper voices)
Female → Male	-1 (or -2 for higher voices)
Same gender, similar range	0
Octave up	+12
Octave down	-12
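
For intuition about the ±12 rows: in equal temperament, a shift of n semitones multiplies frequency by 2^(n/12), so 12 semitones is exactly one octave. The snippet below computes that ratio. It assumes the value is expressed in semitones, as the "Octave up" and "Octave down" rows suggest; pitch_ratio is an illustrative helper, not a function from the bot.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


# +12 semitones doubles the frequency, -12 halves it.
octave_up = pitch_ratio(12)
octave_down = pitch_ratio(-12)
```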

Adding Custom Voice Models

To add a new model, place its .pth file (and optionally its .index file) in data/models/<name>/ on the server. The bot detects new models the next time /models is called.

Troubleshooting

Bot doesn't start

Verify BOT_TOKEN is set correctly in .env
Ensure python-telegram-bot is installed: pip install python-telegram-bot
Check logs for import errors
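
Import errors can also be spotted before starting the bot by asking Python which modules it can locate. The sketch below does that with the standard library. The module list is an assumption derived from the Tech Stack section (import names such as telegram for python-telegram-bot and yt_dlp for yt-dlp); requirements.txt remains the authoritative list.

```python
import importlib.util

# Import names assumed from the Tech Stack section.
REQUIRED_MODULES = [
    "telegram", "yt_dlp", "onnxruntime", "torch", "pedalboard",
    "librosa", "soundfile", "pydub", "noisereduce",
]


def missing_modules(names) -> list[str]:
    """Return the top-level modules that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```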

"No voice models available"

Ensure RVC model .pth files are in data/models/<model_name>/
Check file permissions

RVC inference fails

Install RVC: pip install git+https://github.com/SawitProject/rvc.git
If FP16 errors occur, set RVC_HALF_PRECISION=0 in your environment
Ensure you have enough RAM/VRAM (minimum 4 GB, recommended 8 GB+)

YouTube download fails

Ensure yt-dlp is up to date: pip install -U yt-dlp
Some videos may be region-restricted or age-gated
YouTube Music may require cookies; place them in assets/config.txt

OOM (Out of Memory)

Reduce MAX_CONCURRENT_JOBS to 1
Use shorter songs or lower quality settings
Force CPU mode by running in an environment without CUDA installed

Tech Stack

Component	Technology
Bot Framework	python-telegram-bot 21.x
Vocal Separation	MDX-Net (ONNX Runtime)
Voice Conversion	SawitProject/rvc
Audio Effects	pedalboard (Spotify)
Audio Processing	librosa, soundfile, pydub, sox
YouTube Download	yt-dlp
Noise Reduction	noisereduce
ML Framework	PyTorch

License

This project is licensed under the MIT License — see LICENSE for details.

The underlying AI models (MDX-Net, RVC) have their own licenses. Please refer to their respective repositories for more information.

Credits

SociallyIneptWeeb/AICoverGen — Original AI Cover Generation pipeline
SawitProject/rvc — RVC voice conversion engine
openvpi/MDX-Net — Vocal separation models
Spotify/pedalboard — Audio effects
