Enhance Project Structure + Privacy Tools

Open elonmasai7 opened this issue 8 months ago • 0 comments

Summary: This PR introduces a new modular toolkit that improves the maintainability, privacy, and developer experience of the WeClone project. It includes a CLI tool for project initialization and chat anonymization, along with structured logging for better observability.


✅ Features Added

  • 🧱 Modular Project Structure Initialization

    • Automatically sets up directories like data_pipeline/, llm_training/, utils/, etc.
  • 🔐 Chat Data Anonymizer

    • Redacts personal info from chat logs (names, phone numbers, emails) before model training.
    • Helps align with privacy best practices.
  • 🧰 CLI Tool (weclone_cli.py)

    • --init: Sets up project structure
    • --anonymize: Cleans a given chat file and saves an anonymized version
  • 📋 Structured Logging

    • Uses Python’s logging module to provide detailed logs for all actions.
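The redaction step described above could be approached with regular expressions. The following is only a minimal sketch, not the code from this PR: the function name `anonymize_text`, the placeholder tokens, and both patterns are assumptions made for illustration, and names are matched against a caller-supplied list rather than detected automatically.

```python
import re

# Hypothetical patterns for illustration; they are not taken from the WeClone code base.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then digit groups of 1-3, 3, 3-4 and 4 digits, each optionally
# separated by one space or hyphen. The lookbehind keeps ISO-style dates from matching.
PHONE_RE = re.compile(r"(?<![\d-])\+?\d{1,3}[\s-]?\d{3}[\s-]?\d{3,4}[\s-]?\d{4}(?!\d)")


def anonymize_text(text, names=()):
    """Replace emails, phone numbers and the given names with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    for name in names:
        # Word boundaries suit space-delimited names; scripts without spaces
        # between words (e.g. Chinese) would need a different matching strategy.
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    return text


sample = "Call Alice at +1 555-123-4567 or alice@example.com"
print(anonymize_text(sample, names=["Alice"]))
```

Regex-based redaction is easy to audit but will miss unusual formats, so a real pipeline would likely need review of the output before training.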

📂 Usage

Set up modular folders

python weclone_cli.py --init
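For readers who want a picture of what such a CLI might look like internally, here is a rough sketch built on `argparse`, `pathlib` and `logging`. It is an assumption-laden illustration rather than the actual `weclone_cli.py`: the helper names `init_project`, `build_parser` and `main` are hypothetical, `--anonymize` is assumed to take a file path, and only the three directories named in the summary are created (the "etc." is left unexpanded).

```python
import argparse
import logging
from pathlib import Path

logger = logging.getLogger("weclone_cli")

# Only the directory names mentioned in the description; others are unknown.
DEFAULT_DIRS = ("data_pipeline", "llm_training", "utils")


def init_project(root="."):
    """Create the modular folders under root and return their paths."""
    created = []
    for name in DEFAULT_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        logger.info("Ensured directory %s", path)
        created.append(path)
    return created


def build_parser():
    parser = argparse.ArgumentParser(prog="weclone_cli.py")
    parser.add_argument("--init", action="store_true",
                        help="Set up project structure")
    # Assumption: the flag takes the path of the chat file to clean.
    parser.add_argument("--anonymize", metavar="CHAT_FILE",
                        help="Clean a chat file and save an anonymized version")
    return parser


def main(argv=None):
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    args = build_parser().parse_args(argv)
    if args.init:
        init_project()
    if args.anonymize:
        # The redaction step is omitted in this sketch.
        logger.info("Would anonymize %s", args.anonymize)
    return 0
```

Calling `main(["--init"])` would create the folders under the current working directory and log one line per directory; wiring `main()` to a script entry point is left out here.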

🔍 Why This is Needed

  • Better organization enables faster onboarding and development.
  • Privacy-aware preprocessing is essential for responsible AI.
  • A single CLI interface lays the foundation for automating training, evaluation, and deployment.

🛠️ Next Steps (Optional Improvements)

  • Convert to a Python package with setup.py
  • Add unit tests and CI
  • Extend CLI to support LLM fine-tuning
  • Add support for voice cloning and emotion tagging

elonmasai7, May 14 '25 00:05