layra
LAYRA—an enterprise-ready, out-of-the-box solution—unlocks next-generation intelligent systems powered by visual RAG and limitless visual multi-step agent workflow orchestration.
📢 WeChat Groups
- 🚀 User Discussion Group 1
- 💡 Official WeChat Account
🚀 New Jina-Embeddings-v4 API support eliminates local GPU requirements
LAYRA is the world’s first “visual-native” AI automation engine. It sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. From vision-driven Retrieval-Augmented Generation (RAG) to multi-step agent workflow orchestration, LAYRA empowers you to build next-generation intelligent systems—no limits, no compromises.
Built for Enterprise-Grade deployment, LAYRA features:
- 🧑💻 Modern Frontend: Built with Next.js 15 (TypeScript) & TailwindCSS 4.0 for a snappy, developer-friendly UI.
- ⚡ High-Performance Backend: FastAPI-powered with async integration for Redis, MySQL, MongoDB, Kafka & MinIO – engineered for high concurrency.
- 🔩 Decoupled Service Architecture: Independent services deployed in dedicated containers, enabling scaling on demand and fault isolation.
- 🎯 Visual-Native Multimodal Document Understanding: Leverages ColQwen 2.5/Jina-Embeddings-v4 to transform documents into semantic vectors stored in Milvus.
- 🚀 Powerful Workflow Engine: Construct complex, loop-nested, and debuggable workflows with full Python execution and human-in-the-loop capabilities.
📚 Table of Contents
- 🖼️ Screenshots
- 🚀 Quick Start
- 📖 Tutorial Guide
- ❓ Why LAYRA?
- ⚡️ Core Superpowers
- 🚀 Latest Updates
- 🧠 System Architecture
- 🧰 Tech Stack
- ⚙️ Deployment
- 📦 Roadmap
- 🤝 Contributing
- 📫 Contact
- 🌟 Star History
- 📄 License
🖼️ Screenshots
LAYRA's web design consistently adheres to a minimalist philosophy, making it more accessible to new users. Explore LAYRA's powerful interface and capabilities through these views:
- Homepage - Your Gateway to LAYRA
- Knowledge Base - Centralized Document Hub
- Interactive Dialogue - Layout-Preserving Answers
- Workflow Builder - Drag-and-Drop Agent Creation
- Workflow Builder - MCP Example
🚀 Quick Start
📋 Prerequisites
Before starting, ensure your system meets these requirements:
- Docker and Docker Compose installed
- NVIDIA Container Toolkit configured (Ignore if not deploying ColQwen locally)
⚙️ Installation Steps
1. Configure Environment Variables
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra
# Edit configuration file (modify server IP/parameters as needed)
vim .env
# Key configuration options include:
# - SERVER_IP (server IP)
# - MODEL_BASE_URL (model download source)
For Jina (cloud API) Embeddings v4 users:
vim .env
EMBEDDING_IMAGE_DPI=100 # DPI for document-to-image conversion. Recommended: 100 - 200 (12.5k - 50k tokens/img)
EMBEDDING_MODEL=jina_embeddings_v4
JINA_API_KEY=your_jina_api_key
JINA_EMBEDDINGS_V4_URL=https://api.jina.ai/v1/embeddings
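Optional: you can sanity-check the key against the Jina endpoint before starting the stack. The Python snippet below is a minimal sketch based on Jina's public embeddings API; it is only a connectivity test, not the exact request LAYRA sends internally.
# check_jina_key.py - minimal sketch; assumes the standard Jina embeddings API request schema
import os
import requests
url = os.environ.get("JINA_EMBEDDINGS_V4_URL", "https://api.jina.ai/v1/embeddings")
headers = {
    "Authorization": f"Bearer {os.environ['JINA_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "jina-embeddings-v4",
    "input": [{"text": "hello from LAYRA"}],  # v4 also accepts {"image": ...} items
}
resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print("embedding dimensions:", len(resp.json()["data"][0]["embedding"]))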
2. Build and Start Service
Option A: Local ColQwen deployment (recommended for GPUs with >16GB VRAM)
# Initial startup will download ~15GB model weights (be patient)
docker compose up -d --build
# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>
Option B: Jina-embeddings-v4 API service (for limited/no GPU resources)
# Initial startup will not download any model weights (fast!)
docker compose -f docker-compose-no-local-embedding.yml up -d --build
# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>
Note: If you encounter issues with docker compose, try using docker-compose (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with docker compose version or docker-compose version.
🎉 Enjoy LAYRA!
Your deployment is complete! Start creating with Layra now. 🚀✨
For detailed options, see the Deployment section.
📘 Essential Learning: We strongly recommend spending just 60 minutes with the tutorial before starting with LAYRA - this small investment will help you master its full potential and unlock advanced capabilities.
📖 Tutorial Guide
For step-by-step instructions and visual guides, visit our tutorial on GitHub Pages:
Tutorial Guide
❓ Why LAYRA?
🚀 Beyond RAG: The Power of Visual-First Workflows
While LAYRA's Visual RAG Engine revolutionizes document understanding, its true power lies in the Agent Workflow Engine - a visual-native platform for building complex AI agents that see, reason, and act. Unlike traditional RAG/Workflow systems limited to retrieval, LAYRA enables full-stack automation through:
⚙️ Advanced Workflow Capabilities
- 🔄 Cyclic & Nested Structures: Build recursive workflows with loop nesting, conditional branching, and custom Python logic - no structural limitations.
- 🐞 Node-Level Debugging: Inspect variables, pause/resume execution, and modify state mid-workflow with visual breakpoint debugging.
- 👤 Human-in-the-Loop Integration: Inject user approvals at critical nodes for collaborative AI-human decision making.
- 🧠 Chat Memory & MCP Integration: Maintain context across nodes with chat memory and access live information via the Model Context Protocol (MCP).
- 🐍 Full Python Execution: Run arbitrary Python code with pip installs, HTTP requests, and custom libraries in sandboxed environments (see the sketch after this list).
- 🎭 Multimodal I/O Orchestration: Process and generate hybrid text/image outputs across workflow stages.
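To make this concrete, here is a purely illustrative sketch of the kind of Python a code node could run - a loop, a condition, and an HTTP call through a pip-installed dependency. The variable names and the result-passing convention are hypothetical, not LAYRA's actual node API.
# Hypothetical code-node body (illustrative only, not LAYRA's actual node API)
import requests  # assumes the sandbox has already run: pip install requests
urls = ["https://example.com/a.json", "https://example.com/b.json"]
healthy = []
for url in urls:                        # loops and branching are plain Python
    response = requests.get(url, timeout=10)
    if response.status_code == 200:     # conditional logic decides what flows downstream
        healthy.append(url)
result = {"healthy_count": len(healthy), "healthy_urls": healthy}  # hypothetical output variable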
🔍 Visual RAG: The Seeing Engine
Traditional RAG systems fail because they:
- ❌ Lose layout fidelity (columns, tables, hierarchy collapse)
- ❌ Struggle with non-text visuals (charts, diagrams, figures)
- ❌ Break semantic continuity due to poor OCR segmentation
LAYRA changes this with pure visual embeddings:
🔍 It sees each page as a whole - just like a human reader - preserving:
- ✅ Layout structure (headers, lists, sections)
- ✅ Tabular integrity (rows, columns, merged cells)
- ✅ Embedded visuals (plots, graphs, stamps, handwriting)
- ✅ Multi-modal consistency between layout and content
Together, these engines form the first complete visual-native agent platform - where AI doesn't just retrieve information, but executes complex vision-driven workflows end-to-end.
⚡️ Core Superpowers
🔥 The Agent Workflow Engine: Infinite Execution Intelligence
Code Without Limits, Build Without Boundaries. Our Agent Workflow Engine thinks in LLMs, sees in visuals, and builds your logic in Python — no limits, just intelligence.
- 🔄 Unlimited Workflow Creation: Design complex custom workflows without structural constraints. Handle unique business logic, branching, loops, and conditions through an intuitive interface.
- ⚡ Real-Time Streaming Execution (SSE): Observe execution results streamed live – eliminate waiting times entirely.
- 👥 Human-in-the-Loop Integration: Integrate user input at critical decision points to review, adjust, or direct model reasoning. Enables collaborative AI workflows with dynamic human oversight.
- 👁️ Visual-First Multimodal RAG: Features LAYRA's proprietary pure visual embedding system, delivering lossless document understanding across 100+ formats (PDF, DOCX, XLSX, PPTX, etc.). The AI actively "sees" your content.
- 🧠 Chat Memory & MCP Integration:
  - MCP Integration: Access and interact with live, evolving information beyond native context windows – enhancing adaptability for long-term tasks.
  - ChatFlow Memory: Maintain contextual continuity through chat memory, enabling personalized interactions and intelligent workflow evolution.
- 🐍 Full-Stack Python Control:
  - Drive logic with arbitrary Python expressions – conditions, loops, and more
  - Execute unrestricted Python code in nodes (HTTP, AI calls, math, etc.)
  - Sandboxed environments with secure pip installs and persistent runtime snapshots
- 🎨 Flexible Multimodal I/O: Process and generate text, images, or hybrid outputs – ideal for cross-modal applications.
- 🔧 Advanced Development Suite:
  - Breakpoint Debugging: Inspect workflow states mid-execution
  - Reusable Components: Import/export workflows and save custom nodes
  - Nested Logic: Construct deeply dynamic task chains with loops and conditionals
- 🧩 Intelligent Data Utilities: Extract variables from LLM outputs, parse JSON dynamically, and render templates - essential tools for advanced AI reasoning and automation (see the sketch after this list).
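As an illustration, the plain-Python sketch below shows what these utilities boil down to - pulling a JSON object out of a free-text LLM reply, extracting a field, and rendering it into a prompt template. The parsing approach is a generic example, not LAYRA's built-in implementation.
# Illustrative only: extract a variable, parse JSON, render a template
import json
import re
llm_reply = 'Sure! Here is the data: {"customer": "ACME", "sentiment": "positive"}'
match = re.search(r"\{.*\}", llm_reply, re.DOTALL)    # grab the JSON object from free text
data = json.loads(match.group(0)) if match else {}
template = "Draft a follow-up email to {customer}; the last call felt {sentiment}."
print(template.format(**data))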
👁️ Visual RAG Engine: Beyond Text, Beyond OCR
Forget tokenization. Forget layout loss.
With pure visual embeddings, LAYRA understands documents like a human — page by page, structure and all.
LAYRA uses next-generation Retrieval-Augmented Generation (RAG) technology powered by pure visual embeddings. It treats documents not as sequences of tokens but as visually structured artifacts — preserving layout, semantics, and graphical elements like tables, figures, and charts.
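In practice, "pure visual embeddings" means each page is rasterized and embedded as a whole image instead of being OCR'd into tokens. The sketch below illustrates the idea; embed_page is a hypothetical stand-in for the actual ColQwen2.5 / jina-embeddings-v4 call.
# Conceptual sketch: embed whole pages as images (embed_page is a hypothetical placeholder)
from pdf2image import convert_from_path  # pip install pdf2image (requires poppler)
def embed_page(page_image):
    raise NotImplementedError("plug in ColQwen2.5 or jina-embeddings-v4 here")
pages = convert_from_path("report.pdf", dpi=100)   # one image per page, layout and figures intact
vectors = [embed_page(page) for page in pages]     # no tokenization, no OCR segmentation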
🚀 Latest Updates
(2025.8.4) ✨ Expanded Embedding Model Support:
- More embedding model options:
  - colqwen (local GPU - high performance)
  - jina-embeddings-v4 (cloud API - zero GPU requirements)
- New Chinese language support
(2025.6.2) Workflow Engine Now Available:
- Breakpoint Debugging: Debug workflows interactively with pause/resume functionality.
- Unrestricted Python Customization: Execute arbitrary Python code, including external pip dependency installation, HTTP requests via requests, and advanced logic.
- Nested Loops & Python-Powered Conditions: Build complex workflows with loop nesting and Python-based conditional logic.
- LLM Integration:
  - Automatic JSON output parsing for structured responses.
  - Persistent conversation memory across nodes.
  - File uploads and knowledge-base retrieval with multi-modal RAG supporting 100+ formats (PDF, DOCX, XLSX, PPTX, etc.).
(2025.4.6) First Trial Version Now Available:
The first testable version of LAYRA has been released! Users can now upload PDF documents, ask questions, and receive layout-aware answers. We’re excited to see how this feature can help with real-world document understanding.
- Current Features:
- PDF batch upload and parsing functionality
- Visual-first retrieval-augmented generation (RAG) for querying document content
- Backend fully optimized for scalable data flow with FastAPI, Milvus, Redis, MongoDB, and MinIO
Stay tuned for future updates and feature releases!
🧠 System Architecture
LAYRA’s pipeline is designed for async-first, visual-native, and scalable document retrieval and generation.
🔍 Query Flow
The query goes through embedding → vector retrieval → answer generation:
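Conceptually, that flow looks like the sketch below; every name here is a hypothetical placeholder rather than LAYRA's internal API.
# Conceptual query flow (all names are hypothetical placeholders)
def answer(question, embed_query, vector_store, vision_llm, top_k=5):
    query_vector = embed_query(question)                   # 1. embed the question
    pages = vector_store.search(query_vector, k=top_k)     # 2. retrieve the closest page images from Milvus
    return vision_llm.generate(question, images=pages)     # 3. a vision LLM answers from the retrieved pages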

📤 Upload & Indexing Flow
PDFs are parsed into images and embedded visually via ColQwen2.5/Jina-Embeddings-v4, with metadata and files stored in appropriate databases:
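A rough sketch of that pipeline is shown below; the client objects and field names are hypothetical placeholders used only to show where each piece of data lands.
# Conceptual indexing flow (clients and field names are hypothetical placeholders)
def index_pdf(pdf_path, to_images, embed_page, milvus, minio, mongo):
    file_id = minio.upload(pdf_path)                        # original file goes to MinIO
    for page_no, image in enumerate(to_images(pdf_path), start=1):
        milvus.insert(vector=embed_page(image),             # one visual embedding per page
                      metadata={"file_id": file_id, "page": page_no})
    mongo.save({"file_id": file_id, "source": pdf_path})    # document metadata goes to MongoDB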

📤 Execute Workflow (Chatflow)
The workflow execution follows an event-driven, stateful debugging pattern with granular control:
🔄 Execution Flow
1. Trigger & Debug Control
  - Web UI submits the workflow with configurable breakpoints for real-time inspection
  - Backend validates the workflow DAG before executing code
2. Asynchronous Orchestration
  - Kafka checks predefined breakpoints and triggers pause notifications
  - Scanner performs AST-based code analysis with vulnerability detection
3. Secure Execution
  - Sandbox spins up ephemeral containers with file-system isolation
  - Runtime state snapshots are persisted to Redis/MongoDB for recovery
4. Observability
  - Execution metrics are streamed via Server-Sent Events (SSE)
  - Users inject test inputs and resume execution through debug consoles
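As an example, a client can follow those Server-Sent Events with a plain streaming HTTP request; the endpoint path below is a hypothetical placeholder, not LAYRA's documented API.
# Minimal SSE consumer sketch (endpoint path is a hypothetical placeholder)
import requests
with requests.get("http://<SERVER_IP>/api/workflow/<run_id>/events", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):     # SSE frames carry execution metrics as data lines
            print(line.removeprefix("data:").strip())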

🧰 Tech Stack
Frontend:
Next.js, TypeScript, TailwindCSS, Zustand, xyflow
Backend & Infrastructure:
FastAPI, Kafka, Redis, MySQL, MongoDB, MinIO, Milvus, Docker
Models & RAG:
- Embedding: colqwen2.5-v0.2, jina-embeddings-v4
- LLM Serving: Qwen2.5-VL series (or any OpenAI-compatible model)
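Since any OpenAI-compatible endpoint works, pointing a standard client at a locally served Qwen2.5-VL looks roughly like the sketch below; the base URL, key, and model name are examples, not LAYRA configuration values.
# Sketch: calling an OpenAI-compatible Qwen2.5-VL endpoint (values are examples only)
from openai import OpenAI  # pip install openai
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # e.g. a local vLLM server
reply = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[{"role": "user", "content": "Summarize this page."}],
)
print(reply.choices[0].message.content)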
⚙️ Deployment
📋 Prerequisites
Before starting, ensure your system meets these requirements:
- Docker and Docker Compose installed
- NVIDIA Container Toolkit configured (Ignore if not deploying ColQwen locally)
⚙️ Installation Steps
1. Configure Environment Variables
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra
# Edit configuration file (modify server IP/parameters as needed)
vim .env
# Key configuration options include:
# - SERVER_IP (public server IP)
# - MODEL_BASE_URL (model download source)
For Jina (cloud API) Embeddings v4 users:
vim .env
EMBEDDING_IMAGE_DPI=100 # DPI for document-to-image conversion. Recommended: 100 - 200 (12.5k - 50k tokens/img)
EMBEDDING_MODEL=jina_embeddings_v4
JINA_API_KEY=your_jina_api_key
JINA_EMBEDDINGS_V4_URL=https://api.jina.ai/v1/embeddings
2. Build and Start Service
Option A: Local ColQwen deployment (recommended for GPUs with >16GB VRAM)
# Initial startup will download ~15GB model weights (be patient)
docker compose up -d --build
# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>
Option B: Jina-embeddings-v4 API service (for limited/no GPU resources)
# Initial startup will not download any model weights (fast!)
docker compose -f docker-compose-no-local-embedding.yml up -d --build
# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>
Note: If you encounter issues with docker compose, try using docker-compose (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with docker compose version or docker-compose version.
🔧 Troubleshooting Tips
If services fail to start:
# Check container logs:
docker compose logs <container_name>
Common fixes:
nvidia-smi # Verify GPU detection
docker compose down && docker compose up --build # Rebuild while preserving data
docker compose down -v && docker compose up --build # ⚠️ Caution: deletes all data for a full rebuild
🛠️ Service Management Commands
Choose the operation you need:
| Scenario | Command | Effect |
|---|---|---|
| Stop services (preserve data) | docker compose stop | Stops containers but keeps them intact |
| Restart after stop | docker compose start | Restarts stopped containers |
| Rebuild after code changes | docker compose up -d --build | Rebuilds images and recreates containers |
| Recreate containers (preserve data) | docker compose down && docker compose up -d | Destroys then recreates containers |
| Full cleanup (delete all data) | docker compose down -v | ⚠️ Destroys containers and deletes volumes |
⚠️ Important Notes
- Initial model download may take significant time (~15GB). Monitor progress:
  docker compose logs -f model-weights-init
- After modifying .env or code, always rebuild:
  docker compose up -d --build
- Verify NVIDIA toolkit installation:
  nvidia-container-toolkit --version
- For network issues:
  - Manually download the model weights
  - Copy them to the Docker volume (typically at /var/lib/docker/volumes/layra_model_weights/_data/)
  - Create an empty complete.layra file in both the colqwen2.5-base and colqwen2.5-v0.2 folders
  - 🚨 Critical: Verify the integrity of the downloaded weights!
🔑 Key Details
- docker compose down -v permanently deletes databases and model weights
- After code/config changes, always use the --build flag
- GPU requirements:
  - Latest NVIDIA drivers
  - Working nvidia-container-toolkit
- Monitoring tools:
  docker compose ps -a # Container status
  docker stats # Resource usage
🧪 Technical Note: All components run exclusively via Docker containers.
🎉 Enjoy Your Deployment!
Now that everything is running smoothly, happy building with Layra! 🚀✨
▶️ Future Deployment Options
In the future, we will support additional deployment methods, including Kubernetes (K8s) and other environments. More details will be provided when these options become available.
📦 Roadmap
Short-term:
- Add API Support (coming soon)
Long-term:
- Our evolving roadmap adapts to user needs and AI breakthroughs. New technologies and features will be deployed continuously.
🤝 Contributing
Contributions are welcome! Feel free to open an issue or pull request if you’d like to contribute.
We are in the process of creating a CONTRIBUTING.md file, which will provide guidelines for code contributions, issue reporting, and best practices. Stay tuned!
📫 Contact
liweiphys
📧 [email protected]
🐙 github.com/liweiphys/layra
📺 bilibili: Biggestbiaoge
🔍 WeChat Official Account: LAYRA 项目
💡 WeChat group: see the WeChat Groups section at the top of this page
💼 Exploring Impactful Opportunities - Feel Free To Contact Me!
🌟 Star History
📄 License
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.
Endlessly Customizable Agent Workflow Engine - Code Without Limits, Build Without Boundaries.