TTS Dataset Generator

A web application for collecting high-quality voice datasets with support for CSV upload and multi-line text input, multiple projects, RTL language support, and export to Amazon S3 and Hugging Face.

Quick Start

# Clone and setup
git clone https://github.com/Oddadmix/Voice-Dataset-Collection
cd Voice-Dataset-Collection

# Backend (SQLite for quick start)
cd backend
pip install -r requirements.txt
uvicorn main:app --reload

# Frontend (in new terminal)
cd frontend
npm install
npm run dev

Visit http://localhost:5173 to start creating voice datasets!

Features

📁 Multi-Project Support: Upload multiple CSV files, each as a separate project
🎤 Audio Recording: Record audio for each prompt with keyboard controls
🗂️ Project Management: Create, delete, and manage projects independently
📊 Progress Tracking: Track recording progress and resume from last position
🎵 Audio Playback: Play previous recordings within projects
☁️ Export Options: Export datasets to Amazon S3 or Hugging Face
⚙️ Settings Management: Configure storage paths and API credentials
🗄️ Database Management: Clear entire database when needed
🌐 RTL Language Support: Full support for Right-to-Left languages (Arabic, Persian)
📝 Flexible Input Methods: CSV upload or multi-line text input
🎯 Smart UI: RTL text display with English interface

Tech Stack

Frontend: React + TypeScript + Vite + Tailwind CSS
Backend: FastAPI + Python + SQLAlchemy
Database: MySQL (with SQLite fallback for development)
Storage: Local filesystem + Amazon S3 + Hugging Face Datasets

Prerequisites

Python 3.8+
Node.js 16+
MySQL 8.0+ (optional - SQLite fallback available)

Installation

1. Clone the Repository

git clone https://github.com/Oddadmix/Voice-Dataset-Collection
cd Voice-Dataset-Collection

2. Backend Setup

Install Python Dependencies

cd backend
pip install -r requirements.txt

Database Setup

The application supports both MySQL and SQLite:

Option A: MySQL (Recommended for Production)

# 1. Install MySQL (if not already installed)
# macOS: brew install mysql
# Ubuntu/Debian: sudo apt install mysql-server
# Windows: Download from https://dev.mysql.com/downloads/mysql/

# 2. Configure database settings
cp env.example .env
# Edit .env with your MySQL credentials

# 3. Start MySQL and setup database
python start_mysql.py

# 4. Start the application
uvicorn main:app --reload

Option B: SQLite (Development/Testing)

# The application will automatically fall back to SQLite if MySQL is not available
# No additional setup required
uvicorn main:app --reload

Environment Variables (Optional)

Create a .env file in the backend directory:

MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password_here
MYSQL_DATABASE=tts_dataset_generator
STORAGE_PATH=recordings
HF_EXPORT_TIMEOUT=300
S3_EXPORT_TIMEOUT=300

3. Frontend Setup

cd frontend
npm install

Running the Application

1. Start the Backend Server

cd backend
uvicorn main:app --reload

The server will start at http://localhost:8000

2. Start the Frontend Development Server

cd frontend
npm run dev

The application will be available at http://localhost:5173 (or the next available port)

Usage

Creating a Project

Click "New Project" on the main page
Enter a project name
Choose input method:
- CSV Upload: Select a CSV file with prompts (one prompt per row)
- Multi-line Text: Type or paste prompts directly (one per line)
Optional: Check "Right-to-Left (RTL) Language" for Arabic, Persian, etc.
Click "Create Project"

RTL Language Support

When creating projects for RTL languages:

Check the "Right-to-Left (RTL) Language" checkbox
The text input area will display in RTL format
Prompts will be properly formatted in the recording interface
UI labels remain in English for consistency

Recording Audio

Navigate to a project
Use keyboard controls:
- Enter: Start/Stop recording
- Left Arrow: Skip to next prompt
- Right Arrow: Go to previous prompt
- Space: Play/Stop current recording

RTL Text Display

For RTL projects, prompts are automatically displayed with proper RTL formatting:

Text flows from right to left
Proper text alignment for Arabic, Persian, etc.
Maintains readability in the recording interface

Exporting Datasets

Hugging Face Export:
- Configure your Hugging Face token in Settings
- Set your repository name
- Click "Export to Hugging Face"
Amazon S3 Export:
- Configure your AWS credentials in Settings
- Set your S3 bucket name
- Click "Export to S3"

Database Schema

Tables

settings: Application configuration
projects: Project information, prompts, and RTL settings
prompts: Individual prompts with order and project association
recordings: Audio recordings metadata with prompt association
interactions: User interaction logs

Key Features

Project Isolation: Each project has its own recordings
Progress Tracking: Resume recording from last position
Metadata Storage: Recording timestamps and file information
Audit Trail: Log all user interactions
RTL Support: Projects can be marked as RTL for proper text display
Prompt Management: Prompts are stored separately with order preservation

Configuration

Storage Path

Configure where audio files are stored:

Default: recordings/ directory
Can be changed in Settings

Export Settings

Hugging Face: Token and repository configuration
Amazon S3: Bucket name and credentials
Timeouts: Configurable export timeouts

Troubleshooting

Database Issues

MySQL Connection Problems:

# Check if MySQL is running
# macOS
brew services list | grep mysql

# Linux
sudo systemctl status mysql

# Test MySQL connection
python start_mysql.py

Automatic Fallback to SQLite:

If MySQL is not available, the application automatically falls back to SQLite
This is perfect for development and testing
You'll see a message: "⚠️ MySQL connection failed, falling back to SQLite for development..."

Migration from SQLite to MySQL:

# If you have existing data in SQLite and want to migrate to MySQL
python migrate_sqlite_to_mysql.py

Port Conflicts

If ports are already in use:

# Kill processes on specific ports
lsof -ti:8000 | xargs kill -9  # Backend
lsof -ti:5173 | xargs kill -9  # Frontend (Vite default)
lsof -ti:5174 | xargs kill -9  # Frontend (Vite fallback)

Note: Vite automatically finds the next available port if 5173 is in use.

Permission Issues

Ensure proper file permissions:

chmod +x backend/setup_database.py
chmod +x backend/start_mysql.py
chmod +x backend/migrate_sqlite_to_mysql.py
mkdir -p recordings
chmod 755 recordings

RTL Implementation

The application includes comprehensive RTL language support:

Database: Projects have an is_rtl field to mark RTL languages
Frontend: Text inputs display in RTL format when RTL is selected
Recording Interface: Prompts are displayed with proper RTL styling
UI Consistency: Interface labels remain in English for consistency

Input Methods

Two flexible input methods are supported:

CSV Upload: Traditional CSV file upload with one prompt per row
Multi-line Text: Direct text input with one prompt per line
- Supports RTL text input when RTL checkbox is selected
- Real-time prompt counting
- Automatic empty line filtering

Adding New Features

Backend: Add new endpoints in main.py
Frontend: Create new components in src/components/
Database: Update models and run migrations

License

MIT

Contributing

Feel free to contirbute and open a PR

Voice-Dataset-Collection Voice-Dataset-Collection copied to clipboard

Metadata

TTS Dataset Generator

Quick Start

Features

Tech Stack

Prerequisites

Installation

1. Clone the Repository

2. Backend Setup

Install Python Dependencies

Database Setup

Environment Variables (Optional)

3. Frontend Setup

Running the Application

1. Start the Backend Server

2. Start the Frontend Development Server

Usage

Creating a Project

RTL Language Support

Recording Audio

RTL Text Display

Exporting Datasets

Database Schema

Tables

Key Features

Configuration

Storage Path

Export Settings

Troubleshooting

Database Issues

Port Conflicts

Permission Issues

RTL Implementation

Input Methods

Adding New Features

License

Contributing

← Metadata

Owner

Metadata

Voice-Dataset-Collection
Voice-Dataset-Collection copied to clipboard