AIGC Telegram Bot

AI-powered voice cover generation bot for Telegram. Upload a song or paste a YouTube link, pick a voice model, and get an AI-generated cover delivered directly in your chat.

Based on AICoverGen by SociallyIneptWeeb.


Features

  • YouTube & Audio Upload: Paste a YouTube URL or upload MP3/WAV/OGG directly
  • RVC Voice Models: Swap vocals using any RVC-trained voice model
  • MDX Vocal Separation: Isolate vocals from instrumentals using MDX-Net
  • Audio Effects: Reverb, compression, noise reduction, pitch shifting
  • Customizable Settings: Pitch, index rate, F0 method, reverb, output format, and more
  • 3 Inference Modes: Full pipeline, RVC only, or MDX separation only
  • Admin Controls: User management, stats, and cleanup commands
  • Docker Ready: Full Docker + docker-compose deployment

Project Structure

aigc-telegram-bot/
├── src/
│   ├── __init__.py
│   ├── bot.py                # Telegram bot: commands, handlers, conversation flow
│   ├── config.py             # Configuration management (.env + dataclasses)
│   ├── state.py              # Thread-safe user session / state manager
│   ├── pipeline.py           # Async pipeline runner (ThreadPoolExecutor wrapper)
│   └── core/                 # AI Cover Generation pipeline
│       ├── __init__.py
│       ├── cover_pipeline.py # Main pipeline: download → separate → convert → mix
│       ├── mdx.py            # MDX-Net vocal separation (ONNX Runtime)
│       ├── rvc_voice.py      # RVC voice conversion (SawitProject/rvc)
│       └── my_utils.py       # Audio utility functions (ffmpeg wrapper)
├── assets/
│   └── mdxnet_models/        # Bundled MDX-Net ONNX models (~234 MB)
│       ├── UVR-MDX-NET-Voc_FT.onnx
│       ├── UVR_MDXNET_KARA_2.onnx
│       ├── Reverb_HQ_By_FoxJoy.onnx
│       ├── UVR-MDX-NET-Inst_HQ_4.onnx
│       └── model_data.json
├── data/
│   ├── models/               # Place RVC voice models here
│   ├── temp/                 # Temporary audio files (auto-cleaned)
│   └── output/               # Generated covers
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example              # Configuration template
└── README.md
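
The role described for src/pipeline.py (an async wrapper around a ThreadPoolExecutor) can be illustrated with a minimal sketch. The names PipelineRunner, slow_cover_job and demo are hypothetical and not taken from the project; the worker count is assumed to mirror MAX_CONCURRENT_JOBS.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class PipelineRunner:
    """Hypothetical wrapper that runs blocking jobs off the event loop."""

    def __init__(self, max_concurrent_jobs: int = 2) -> None:
        # One worker thread per allowed concurrent job (assumption).
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent_jobs)

    async def run(self, func, *args):
        # run_in_executor hands the blocking call to a worker thread, so the
        # event loop stays free to answer other chats in the meantime.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, func, *args)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def slow_cover_job(song: str) -> str:
    # Stand-in for the real, long-running cover pipeline.
    time.sleep(0.05)
    return f"{song}.cover.mp3"


async def demo() -> list:
    runner = PipelineRunner(max_concurrent_jobs=2)
    try:
        # gather returns results in the order the jobs were submitted.
        return await asyncio.gather(
            runner.run(slow_cover_job, "song_a"),
            runner.run(slow_cover_job, "song_b"),
        )
    finally:
        runner.shutdown()
```

Calling asyncio.run(demo()) runs both placeholder jobs in parallel threads and returns their results in submission order.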

How It Works

The bot runs a multi-stage AI audio pipeline:

Input (YouTube URL / Audio File)
        │
        ▼
┌─────────────────────────┐
│  1. Download / Load     │  yt-dlp for YouTube, pydub for local files
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  2. MDX Vocal           │  UVR-MDX-NET-Voc_FT → vocals + instrumentals
│     Separation          │  UVR_MDXNET_KARA_2 → main + backup vocals
│                         │  Reverb_HQ_By_FoxJoy → de-reverb
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  3. RVC Voice           │  SawitProject/rvc: voice cloning
│     Conversion          │  Configurable pitch, index rate, F0 method
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  4. Audio Effects       │  Reverb, compression, HPF (pedalboard)
│     & Mixing            │  Mix AI vocals + backup + instrumentals
└───────────┬─────────────┘
            │
            ▼
    Output (MP3/WAV) → Sent to Telegram chat
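
The four stages above run strictly in order, each consuming the previous stage's output. A minimal sketch of such a staged runner with a progress callback follows. All function names are hypothetical, and the stage bodies are placeholders that only record file names rather than processing audio.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Tuple[str, Callable[[Context], Context]]


def run_stages(
    stages: List[Stage],
    context: Context,
    on_progress: Callable[[int, int, str], None],
) -> Context:
    """Run stages in order, reporting progress before each one starts."""
    total = len(stages)
    for index, (name, func) in enumerate(stages, start=1):
        on_progress(index, total, name)
        context = func(context)
    return context


# Placeholder stages: each only records what the real step would produce.
def download(ctx: Context) -> Context:
    return {**ctx, "audio": "song.wav"}


def separate(ctx: Context) -> Context:
    return {**ctx, "vocals": "vocals.wav", "instrumental": "inst.wav"}


def convert(ctx: Context) -> Context:
    return {**ctx, "ai_vocals": "ai_vocals.wav"}


def mix(ctx: Context) -> Context:
    return {**ctx, "output": "cover.mp3"}


STAGES: List[Stage] = [
    ("Download / Load", download),
    ("MDX Vocal Separation", separate),
    ("RVC Voice Conversion", convert),
    ("Audio Effects & Mixing", mix),
]
```

In a bot, the on_progress callback would be the natural place to edit a status message in the chat; here it can be any function taking the stage index, the stage count, and the stage name.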

Inference Modes

Mode      Description
Full      MDX separation → RVC conversion → effects → mix
RVC Only  Voice conversion only (skip vocal separation)
MDX Only  Vocal separation only (skip voice conversion)
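
One way to picture the three modes is as a lookup from mode name to the list of steps that run. The sketch below uses the mode strings full, rvc and mdx from the configuration reference; the step names and the stages_for helper are hypothetical, and each mode lists only the steps named in the table.

```python
from enum import Enum
from typing import List


class InferenceMode(str, Enum):
    FULL = "full"
    RVC = "rvc"
    MDX = "mdx"


# Step names are illustrative labels, not identifiers from the project.
STAGES_BY_MODE = {
    InferenceMode.FULL: ["separate", "convert", "effects", "mix"],
    InferenceMode.RVC: ["convert"],
    InferenceMode.MDX: ["separate"],
}


def stages_for(mode: str) -> List[str]:
    """Return the steps for a mode string, ignoring case and whitespace."""
    try:
        key = InferenceMode(mode.strip().lower())
    except ValueError:
        raise ValueError(f"Unknown inference mode: {mode!r}") from None
    # Return a copy so callers cannot mutate the shared table.
    return list(STAGES_BY_MODE[key])
```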

Quick Start

1. Prerequisites

  • Python 3.10 or 3.11
  • FFmpeg (sudo apt install ffmpeg)
  • SoX (sudo apt install sox)
  • A Telegram Bot Token from @BotFather
  • At least one RVC voice model (.pth file + optional .index file)

2. Install

git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot

pip install -r requirements.txt

# Install RVC voice conversion engine
pip install git+https://github.com/SawitProject/rvc.git

3. Configure

cp .env.example .env

Edit .env and set at minimum:

BOT_TOKEN=your_telegram_bot_token_here
ADMIN_IDS=your_telegram_user_id
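
For illustration, the two required values could be read into a dataclass along the following lines. This is a sketch, not the project's config.py: BotConfig, parse_id_list and load_config are hypothetical names, and the code assumes the .env values have already been exported into the process environment (for example by python-dotenv's load_dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class BotConfig:
    bot_token: str
    admin_ids: Tuple[int, ...]


def parse_id_list(raw: str) -> Tuple[int, ...]:
    """Turn "123, 456" into (123, 456), ignoring blank entries."""
    return tuple(int(part) for part in raw.split(",") if part.strip())


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the config from a mapping, failing fast on missing values."""
    token = env.get("BOT_TOKEN", "").strip()
    if not token:
        raise ValueError("BOT_TOKEN is required")
    admin_ids = parse_id_list(env.get("ADMIN_IDS", ""))
    if not admin_ids:
        raise ValueError("ADMIN_IDS is required")
    return BotConfig(bot_token=token, admin_ids=admin_ids)
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.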

4. Add Voice Models

Place each RVC voice model in its own folder under data/models/:

data/models/
├── my_voice_model/
│   ├── model.pth          # Required: RVC model weights
│   └── model.index        # Optional: FAISS feature index
├── another_voice/
│   ├── model.pth
│   └── model.index
└── ...
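
A directory scan that matches this layout might look like the sketch below. The function discover_models is hypothetical; it assumes a folder counts as a model when it holds at least one .pth file, and it picks the first .pth and .index file alphabetically.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

ModelFiles = Tuple[Path, Optional[Path]]


def discover_models(models_dir: Path) -> Dict[str, ModelFiles]:
    """Map each model folder name to its (weights, optional index) files."""
    models: Dict[str, ModelFiles] = {}
    if not models_dir.is_dir():
        return models
    for folder in sorted(models_dir.iterdir()):
        if not folder.is_dir():
            continue
        weights = sorted(folder.glob("*.pth"))
        if not weights:
            continue  # no weights file, so not a usable model
        indexes = sorted(folder.glob("*.index"))
        models[folder.name] = (weights[0], indexes[0] if indexes else None)
    return models
```

Calling discover_models(Path("data/models")) returns an empty dict when the directory is missing or contains no .pth files.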

5. Run

python -m src.bot

The bot will start polling for updates. Open Telegram and send /start to your bot.


Bot Commands

Command    Description
/start     Welcome message and quick actions
/cover     Start a new AI cover generation
/models    List available voice models
/settings  Adjust generation parameters
/status    Check bot status and active jobs
/cancel    Cancel the current operation
/help      Detailed usage guide
/admin     Admin commands (stats, cleanup)

Cover Generation Flow

/cover
  → Choose input: [YouTube URL] or [Upload File]
  → Paste URL or send audio file
  → Select voice model from list
  → Review settings & confirm
  → Wait for progress updates
  → Receive generated cover as audio file
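
The flow above is a linear sequence of steps, which can be modelled as a small state machine. The sketch below is illustrative only: the Step names and the advance and cancel helpers are hypothetical, and the project may instead rely on python-telegram-bot's ConversationHandler.

```python
from enum import Enum, auto


class Step(Enum):
    CHOOSE_INPUT = auto()   # YouTube URL or uploaded file
    AWAIT_SOURCE = auto()   # waiting for the URL or the audio file
    CHOOSE_MODEL = auto()   # pick a voice model
    CONFIRM = auto()        # review settings and confirm
    PROCESSING = auto()     # pipeline running, progress updates sent
    DONE = auto()           # cover delivered


NEXT_STEP = {
    Step.CHOOSE_INPUT: Step.AWAIT_SOURCE,
    Step.AWAIT_SOURCE: Step.CHOOSE_MODEL,
    Step.CHOOSE_MODEL: Step.CONFIRM,
    Step.CONFIRM: Step.PROCESSING,
    Step.PROCESSING: Step.DONE,
}


def advance(step: Step) -> Step:
    """Move to the next step; DONE has no successor."""
    if step not in NEXT_STEP:
        raise ValueError(f"{step.name} is a terminal step")
    return NEXT_STEP[step]


def cancel(step: Step) -> Step:
    """Model /cancel as a reset to the start of the flow (assumption)."""
    return Step.CHOOSE_INPUT
```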

Configuration Reference

All settings are managed through .env; see .env.example for the full list of variables.

Bot Settings

Variable             Default           Description
BOT_TOKEN            (required)        Telegram bot token from @BotFather
ADMIN_IDS            (required)        Comma-separated admin Telegram user IDs
ALLOWED_USER_IDS     (empty = public)  Restrict bot to specific users
BOT_NAME             AIGC Cover Bot    Display name in bot messages
MAX_CONCURRENT_JOBS  2                 Maximum simultaneous cover generations
MAX_FILE_SIZE_MB     20                Maximum uploaded audio file size
LOG_LEVEL            INFO              Logging verbosity (DEBUG/INFO/WARNING/ERROR)
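
The access and upload limits in this table reduce to two small checks, sketched below. Both helpers are hypothetical. The sketch assumes that admins always pass the allowlist and that one megabyte means 1024 × 1024 bytes; neither detail is confirmed by the settings table.

```python
from typing import Iterable


def is_allowed(
    user_id: int,
    allowed_ids: Iterable[int],
    admin_ids: Iterable[int],
) -> bool:
    """Apply ALLOWED_USER_IDS: an empty allowlist means the bot is public."""
    allowed = set(allowed_ids)
    if not allowed:
        return True
    # Assumption: admins are never locked out by the allowlist.
    return user_id in allowed or user_id in set(admin_ids)


def within_size_limit(size_bytes: int, max_file_size_mb: int = 20) -> bool:
    """Apply MAX_FILE_SIZE_MB, counting 1 MB as 1024 * 1024 bytes."""
    return size_bytes <= max_file_size_mb * 1024 * 1024
```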

Pipeline Defaults

Variable                Default  Description
DEFAULT_OUTPUT_FORMAT   mp3      Output format (mp3, wav, flac)
DEFAULT_PITCH_CHANGE    0        Pitch shift in octaves (-12 to +12)
DEFAULT_INDEX_RATE      0.5      RVC index rate (0.0–1.0)
DEFAULT_F0_METHOD       rmvpe    Pitch detection algorithm
DEFAULT_FILTER_RADIUS   3        Median filter for pitch (0–7)
DEFAULT_PROTECT         0.33     Voiceless consonant protection (0–0.5)
DEFAULT_REVERB_SIZE     0.15     Reverb room size (0–1)
DEFAULT_REVERB_WET      0.2      Reverb wet level (0–1)
DEFAULT_INFERENCE_MODE  full     Pipeline mode: full / mdx / rvc
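
The numeric ranges in this table lend themselves to a simple sanity check before a job starts. The sketch below copies the ranges from the table; the RANGES dict and the out_of_range helper are hypothetical and not part of the project.

```python
from typing import Dict, List, Mapping, Tuple, Union

Number = Union[int, float, str]

# Inclusive (low, high) bounds copied from the table above.
RANGES: Dict[str, Tuple[float, float]] = {
    "DEFAULT_PITCH_CHANGE": (-12, 12),
    "DEFAULT_INDEX_RATE": (0.0, 1.0),
    "DEFAULT_FILTER_RADIUS": (0, 7),
    "DEFAULT_PROTECT": (0.0, 0.5),
    "DEFAULT_REVERB_SIZE": (0.0, 1.0),
    "DEFAULT_REVERB_WET": (0.0, 1.0),
}


def out_of_range(settings: Mapping[str, Number]) -> List[str]:
    """Return one message per setting that falls outside its range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in settings:
            continue  # unset values fall back to their defaults
        value = float(settings[name])  # .env values arrive as strings
        if not low <= value <= high:
            problems.append(f"{name}={value} not in [{low}, {high}]")
    return problems
```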

Docker Deployment

Using Docker Compose (Recommended)

# Clone and configure
git clone https://huggingface.co/R-Kentaren/AIGC-Telegram-Bot
cd AIGC-Telegram-Bot
cp .env.example .env
# Edit .env with your BOT_TOKEN and ADMIN_IDS

# Add voice models
cp -r /path/to/your/models/* data/models/

# Start the bot
docker compose up -d

# View logs
docker compose logs -f

Using Docker Directly

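docker build -t aigc-bot .
# Use absolute host paths for the volumes; older Docker versions reject relative ones
docker run -d \
  --name aigc-bot \
  --env BOT_TOKEN=your_token \
  --env ADMIN_IDS=your_id \
  -v "$(pwd)/data/models:/app/data/models" \
  -v "$(pwd)/data/output:/app/data/output" \
  aigc-bot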

GPU Support

Uncomment the GPU section in docker-compose.yml if you have NVIDIA GPUs:

deploy:
  resources:
    reservations:
      devices:
        - capabilities: [gpu]

RVC Voice Models

Where to Get Models

Model Format

Each model is a folder containing:

your_model/
├── model.pth            # Required: trained RVC model weights
└── model.index          # Optional: FAISS index for timbre retrieval

Place them in data/models/<model_name>/.

Pitch Guidelines

Source → Target             Pitch Change
Male → Female               +1 (or +2 for deeper voices)
Female → Male               -1 (or -2 for higher voices)
Same gender, similar range  0
Octave up                   +12
Octave down                 -12
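
The octave rows (+12 and -12) suggest a value counted in semitones, twelve per octave. Under that assumption, a shift of n semitones multiplies frequency by 2^(n/12). The helper below is a hypothetical illustration of that standard relation, not code from the project.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift given in semitones."""
    # Twelve equal-tempered semitones double the frequency (one octave).
    return 2.0 ** (semitones / 12.0)
```

For example, semitones_to_ratio(12) doubles the frequency and semitones_to_ratio(-12) halves it, while 0 leaves the pitch unchanged.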

Adding Custom Voice Models via Telegram

To add a new model, place its .pth file (and optionally its .index file) in data/models/<name>/ on the server. The bot detects new models automatically the next time /models is called.


Troubleshooting

Bot doesn't start

  • Verify BOT_TOKEN is set correctly in .env
  • Ensure python-telegram-bot is installed: pip install python-telegram-bot
  • Check logs for import errors

"No voice models available"

  • Ensure RVC model .pth files are in data/models/<model_name>/
  • Check file permissions

RVC inference fails

  • Install RVC: pip install git+https://github.com/SawitProject/rvc.git
  • If FP16 errors occur, set RVC_HALF_PRECISION=0 in your environment
  • Ensure you have enough RAM/VRAM (minimum 4 GB, recommended 8 GB+)

YouTube download fails

  • Ensure yt-dlp is up to date: pip install -U yt-dlp
  • Some videos may be region-restricted or age-gated
  • For YouTube Music, cookies may be needed (place in assets/config.txt)

OOM (Out of Memory)

  • Reduce MAX_CONCURRENT_JOBS to 1
  • Use shorter songs or lower quality settings
  • Run in CPU mode instead (used when CUDA is not installed)

Tech Stack

Component Technology
Bot Framework     python-telegram-bot 21.x
Vocal Separation  MDX-Net (ONNX Runtime)
Voice Conversion  SawitProject/rvc
Audio Effects     pedalboard (Spotify)
Audio Processing  librosa, soundfile, pydub, sox
YouTube Download  yt-dlp
Noise Reduction   noisereduce
ML Framework      PyTorch

License

This project is licensed under the MIT License; see LICENSE for details.

The underlying AI models (MDX-Net, RVC) have their own licenses. Please refer to their respective repositories for more information.


Credits
