WeClone
Enhance Project Structure + Privacy Tools
Summary: This PR introduces a new modular toolkit that improves the maintainability, privacy, and developer experience of the WeClone project. It includes a CLI tool for project initialization and chat anonymization, along with structured logging for better observability.
✅ Features Added
- 🧱 Modular Project Structure Initialization
  - Automatically sets up directories like `data_pipeline/`, `llm_training/`, `utils/`, etc.
- 🔐 Chat Data Anonymizer
  - Redacts personal info from chat logs (names, phone numbers, emails) before model training.
  - Helps align with privacy best practices.
- 🧰 CLI Tool (`weclone_cli.py`)
  - `--init`: Sets up project structure
  - `--anonymize`: Cleans a given chat file and saves an anonymized version
- 📋 Structured Logging
  - Uses Python’s `logging` module to provide detailed logs for all actions.
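
To make the anonymizer's redaction step concrete, here is a minimal sketch of how names, phone numbers, and emails could be masked with regular expressions. It is not the code from this PR: the function name `anonymize_text`, the placeholder tokens, and both patterns are assumptions chosen for illustration. The phone pattern is deliberately loose and can over-redact digit runs such as dates.

```python
import re

# Hypothetical patterns; the PR description does not show the actual rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then 7 to 15 digits, each optionally preceded by a space or dash.
PHONE_RE = re.compile(r"(?<![\w+])\+?\d(?:[ -]?\d){6,14}(?!\d)")


def anonymize_text(text, names=()):
    """Return text with emails, phone numbers, and the given names masked.

    `names` is a caller-supplied list of names to redact (an assumption;
    the PR does not say how names are detected).
    """
    # Emails go first so a name inside an address does not break the email match.
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    for name in names:
        # Whole-word, case-insensitive match on each supplied name.
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text
```

For example, `anonymize_text("Alice: call me at +86 138-0013-8000 or mail alice@example.com", names=["Alice"])` returns `"[NAME]: call me at [PHONE] or mail [EMAIL]"`.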
📂 Usage
Set up modular folders
python weclone_cli.py --init
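
As a rough sketch of what `--init` could do internally, the snippet below wires the two flags with `argparse`, creates the directories with `pathlib`, and logs each step. It is an illustration under assumptions rather than the PR's implementation: `init_project`, `build_parser`, `main`, and the log format are hypothetical, only the three directories named in this description are created, and `--anonymize` is assumed to take a chat file path.

```python
import argparse
import logging
from pathlib import Path

# Only the directories named in the PR description; the full list is not shown there.
DEFAULT_DIRS = ("data_pipeline", "llm_training", "utils")

logger = logging.getLogger("weclone_cli")


def init_project(root=".", dirs=DEFAULT_DIRS):
    """Create each directory under root, leaving existing ones untouched."""
    created = []
    for name in dirs:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        logger.info("Ensured directory %s", path)
        created.append(path)
    return created


def build_parser():
    parser = argparse.ArgumentParser(prog="weclone_cli.py")
    parser.add_argument("--init", action="store_true",
                        help="Set up project structure")
    # Assumption: --anonymize takes the path of the chat file to clean.
    parser.add_argument("--anonymize", metavar="CHAT_FILE",
                        help="Clean a chat file and save an anonymized version")
    return parser


def main(argv=None):
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    args = build_parser().parse_args(argv)
    if args.init:
        init_project()
    # Handling of args.anonymize is omitted from this sketch.
    return args
```

Using `exist_ok=True` makes the command safe to rerun on a project that is already partly set up.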
🔍 Why This is Needed
- Better organization enables faster onboarding and development.
- Privacy-aware preprocessing is essential for responsible AI.
- A single CLI interface lays the foundation for automating training, evaluation, and deployment.
🛠️ Next Steps (Optional Improvements)
- Convert to a Python package with `setup.py`
- Add unit tests and CI
- Extend CLI to support LLM fine-tuning
- Add support for voice cloning and emotion tagging