
The Deep Research Assistant is meticulously crafted on Mastra's modular, scalable architecture, designed for intelligent orchestration and seamless human-AI interaction. It's built to tackle complex r...

Deep Research Assistant + Graph RAG with Mastra 🚀🧠

**Beyond Basic Search: Intelligent, Autonomous Research & Reporting**
A cutting-edge, human-in-the-loop AI system for advanced deep research, built on Mastra's orchestration, agent, and network capabilities. Inspired by lotus wisdom (purity of intent, enlightenment through insight, and adaptive growth), this project fosters ethical AI development, harmonious collaboration, and continuous evolution toward enlightened knowledge discovery.


Table of Contents 🧭
  1. Features
  2. Architecture
  3. AI Agents
  4. Workflows
  5. Tools
  6. Agent Networks
  7. RAG Pipeline
  8. Memory Processors
  9. Tracing & Observability
  10. MCP Integration
  11. Installation
  12. Quick Start
  13. Usage Examples
  14. Configuration
  15. API Reference
  16. Contributing
  17. License

Features ✨

🤖 Advanced AI Agents

  • Research Agent: Multi-phase research with web search, evaluation, and learning extraction
  • Report Agent: Comprehensive report generation from research data
  • Evaluation Agent: Content relevance and quality assessment
  • Learning Extraction Agent: Key insights and follow-up questions generation
  • Web Summarization Agent: Content condensation and synthesis
  • RAG Agent: Vector search and retrieval-augmented generation
  • GitHub Agent: Complete GitHub repository and project management with advanced Copilot integration
  • Monitor Agent: System health monitoring, performance tracking, and observability specialist
  • Planning Agent: Strategic planning and project management with resource allocation and risk assessment
  • Quality Assurance Agent: Software quality management, testing coordination, and defect prevention

🎨 Modern React Frontend

  • React 19.1+: Latest React with concurrent features and automatic batching
  • Vite 7.1+: Lightning-fast build tool with HMR and optimized production builds
  • TypeScript 5.9+: Full type safety with advanced language features
  • Tailwind CSS v4.1: CSS-first configuration with OKLCH colors, text shadows, and modern utilities
  • shadcn/ui Components: 47 pre-built, accessible UI components with Radix UI primitives
  • React Router v7.8+: Modern routing with nested routes and data loading
  • Responsive Design: Mobile-first approach with dark mode support
  • Modern CSS Features: Container queries, safe area utilities, and browser compatibility

🔄 Intelligent Workflows

  • Comprehensive Research Workflow: End-to-end research process with human-in-the-loop approval
  • Generate Report Workflow: Automated report creation from research data
  • Research Workflow: Basic research orchestration
  • GitHub Planning Workflow: Multi-agent GitHub project management with planning, monitoring, and task generation
  • GitHub Quality Workflow: Quality-focused workflow composition with planning and QA analysis

🛠️ Specialized Tools

  • Web Search Tool: Intelligent web scraping and content extraction
  • Web Scraper Tool: Enhanced web scraping with marked.js integration for markdown output and file saving
  • Vector Query Tool: Semantic search over embedded content
  • Chunker Tool: Document segmentation for RAG
  • Rerank Tool: Result relevance optimization
  • Evaluation Tool: Content quality assessment
  • Learning Extraction Tool: Insight mining from research data
  • Data File Manager: 8 comprehensive data management tools (copy, move, search, archive, backup, etc.)

🌐 Agent Networks

  • Complex Research Network: Multi-agent collaboration for advanced research tasks
  • Dynamic agent communication and coordination

🔌 MCP Integration

  • Model Context Protocol: Standardized AI tool integration
  • Extensible tool ecosystem
  • Real-time communication capabilities

📊 Advanced Capabilities

  • Advanced RAG Pipeline: Multi-stage retrieval with vector search, chunking, and reranking
  • Graph RAG: Knowledge graph-based retrieval and reasoning
  • Vector Search: Semantic similarity search with multiple indexes
  • Memory Processors: 11 specialized processors for context optimization
  • Memory Management: Persistent conversation context with hierarchical memory
  • Human-in-the-Loop: Interactive approval and refinement
  • OpenTelemetry Tracing: Comprehensive observability with child spans
  • Zod Validation: Type-safe data processing
  • Google AI Gemini 2.5: Advanced features with search grounding and caching
  • LibSQL Storage: Dual-database architecture for research and vector data

Architecture 🏗️

┌─────────────────────────────────────────────────────────────────────┐
│                     Deep Research System                            │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │   Agents    │  │  Workflows  │  │   Tools     │  │   Memory    │ │
│  │             │  │             │  │             │  │ Processors  │ │
│  │ • Research  │  │ • Research  │  │ • Web       │  │ • Token     │ │
│  │ • Report    │  │ • Report    │  │ • Vector    │  │ • Personal  │ │
│  │ • Evaluation│  │ • Generate  │  │ • Chunking  │  │ • Error     │ │
│  │ • Learning  │  │ • Complex   │  │ • Rerank    │  │ • Circuit   │ │
│  │ • RAG       │  │ • GitHub    │  │ • MCP       │  │ • Bayesian  │ │
│  │ • GitHub    │  │             │  │ • GitHub    │  │             │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────┐ │
│  │   RAG Pipeline  │  │   Storage   │  │   Networks  │  │ Tracing │ │
│  │                 │  │             │  │             │  │         │ │
│  │ • Vector Search │  │ • LibSQL    │  │ • Complex   │  │ • Open  │ │
│  │ • Chunking      │  │ • Vector DB │  │ • Research  │  │ • Tel   │ │
│  │ • Reranking     │  │ • Dual DB   │  │ • Multi-    │  │ • Child │ │
│  │ • Graph RAG     │  │ • Graph DB  │  │ • Agent     │  │ • Spans │ │
│  └─────────────────┘  └─────────────┘  └─────────────┘  └─────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │   AI/ML     │  │ Integration │  │   APIs      │  │   Caching   │ │
│  │             │  │             │  │             │  │             │ │
│  │ • Google AI │  │ • MCP       │  │ • REST      │  │ • Explicit  │ │
│  │ • Gemini 2.5│  │ • WebSocket │  │ • GraphQL   │  │ • Implicit  │ │
│  │ • Embeddings│  │ • OpenTel   │  │ • GitHub    │  │ • Search    │ │
│  │ • Copilot   │  │ • GitHub    │  │ • Streaming │  │             │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

Core Components

🤖 AI Agents

Specialized AI assistants with specific roles and capabilities:

  • Research Agent: Handles web research, evaluation, and learning extraction
  • Report Agent: Generates comprehensive reports from research data
  • Evaluation Agent: Assesses content relevance and quality
  • Learning Extraction Agent: Extracts key insights and generates follow-up questions
  • Web Summarization Agent: Condenses and synthesizes web content
  • RAG Agent: Manages vector search and retrieval-augmented generation
  • GitHub Agent: Complete GitHub repository and project management with Copilot integration
  • Monitor Agent: System health monitoring, performance tracking, and observability specialist
  • Planning Agent: Strategic planning and project management with resource allocation and risk assessment
  • Quality Assurance Agent: Software quality management, testing coordination, and defect prevention

🎨 Frontend Architecture

Modern React application with seamless Mastra backend integration:

Core Frontend Components:

  • App Router: React Router v7 with nested routes and data loading
  • Layout System: Responsive navigation with dark mode support
  • Page Components: Home, Research, Agents, and Workflows pages
  • UI Components: 47 shadcn/ui components with Radix UI primitives
  • Styling: Tailwind CSS v4.1 with custom design system
  • State Management: React hooks with TypeScript integration
  • API Integration: Mastra client for backend communication

Frontend-Backend Integration:

  • Mastra Client: TypeScript client for API communication
  • Environment Configuration: Secure API key management
  • Real-time Updates: Live workflow status and agent responses
  • Error Handling: Comprehensive error boundaries and user feedback
  • Loading States: Skeleton components and progressive loading

🔄 Workflows

Orchestrated sequences of operations:

  • Comprehensive Research Workflow: Complete research pipeline with human approval
  • Generate Report Workflow: Automated report creation
  • Research Workflow: Basic research orchestration
  • GitHub Planning Workflow: Multi-agent GitHub project management with planning, monitoring, and task generation
  • GitHub Quality Workflow: Quality-focused workflow composition with planning and QA analysis

🛠️ Tools

Specialized utilities for specific tasks:

  • Web Search Tool: Intelligent web content discovery
  • Web Scraper Tool: Enhanced web scraping with marked.js integration for markdown output and file saving
  • Vector Query Tool: Semantic search capabilities
  • Chunker Tool: Document segmentation for RAG
  • Rerank Tool: Result optimization
  • Evaluation Tool: Content assessment
  • Learning Extraction Tool: Insight mining
  • Data File Manager: 8 comprehensive data management tools (copy, move, search, archive, backup, etc.)
  • GitHub Tools: 14 specialized tools for complete GitHub API integration

🌐 Networks

Multi-agent collaboration systems:

  • Complex Research Network: Advanced multi-agent research coordination

🧠 Memory Processors

Advanced context optimization with 11 specialized processors:

  • TokenLimiterProcessor: Filters messages exceeding token limits
  • PersonalizationProcessor: Boosts user-relevant messages
  • ErrorCorrectionProcessor: Deduplicates content via checksum
  • HierarchicalMemoryProcessor: Filters episodic vs semantic content
  • CitationExtractorProcessor: Prioritizes messages with citations
  • MultiPerspectiveProcessor: Scores messages from multiple viewpoints
  • TemporalReasoningProcessor: Handles time-based relationships
  • UncertaintyQuantificationProcessor: Assigns confidence scores
  • KnowledgeGraphProcessor: Builds knowledge graphs
  • BayesianBeliefProcessor: Implements Bayesian belief updating
  • CircuitBreakerProcessor: Provides fault tolerance
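As a rough illustration of checksum-based deduplication (the idea behind the ErrorCorrectionProcessor), the sketch below hashes each message's normalized text with 32-bit FNV-1a and keeps only the first copy. The `ChatMessage` type and the `fnv1a` and `dedupeByChecksum` functions are hypothetical and do not reflect the project's or Mastra's processor interface; the choice of FNV-1a as the checksum is also an assumption.

```typescript
// Hypothetical message shape, for illustration only.
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// 32-bit FNV-1a over UTF-16 code units: a fast, non-cryptographic checksum.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime (16777619) modulo 2^32.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Keep the first occurrence of each (role, normalized content) pair.
function dedupeByChecksum(messages: ChatMessage[]): ChatMessage[] {
  const seen = new Map<number, string[]>();
  const result: ChatMessage[] = [];
  for (const msg of messages) {
    // Collapse whitespace so trivially reformatted copies are treated as duplicates.
    const key = `${msg.role}:${msg.content.trim().replace(/\s+/g, ' ')}`;
    const checksum = fnv1a(key);
    const bucket = seen.get(checksum) ?? [];
    // Compare full keys within a bucket so a hash collision never drops a distinct message.
    if (bucket.includes(key)) continue;
    bucket.push(key);
    seen.set(checksum, bucket);
    result.push(msg);
  }
  return result;
}
```

Comparing the full key inside each checksum bucket costs a little memory but keeps the filter correct: a 32-bit hash can collide, and a collision should never silently remove a distinct message.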

🔍 RAG Pipeline

Multi-stage retrieval and generation system:

  • Vector Search: Semantic similarity search with multiple indexes
  • Chunking: Intelligent document segmentation with overlap
  • Reranking: Result relevance optimization
  • Graph RAG: Knowledge graph-based reasoning
  • Hybrid Search: Combines vector and keyword search
  • Context Augmentation: Dynamic context retrieval and ranking

πŸ“Š Tracing & Observability

Comprehensive monitoring with OpenTelemetry:

  • Child Spans: Detailed operation tracking
  • Performance Metrics: Processing time and resource usage
  • Error Tracking: Comprehensive error monitoring
  • Health Checks: System health monitoring
  • Cost Tracking: API usage and cost optimization

Agents 🤖

Research Agent

Conducts thorough multi-phase research with web search, content evaluation, and learning extraction.

Capabilities:

  • Multi-phase research process (initial + follow-up)
  • Web search integration
  • Content relevance evaluation
  • Learning extraction and synthesis
  • Follow-up question generation

Report Agent

Generates comprehensive, well-structured reports from research data.

Capabilities:

  • Research data synthesis
  • Structured report generation
  • Markdown formatting
  • Executive summaries
  • Detailed analysis sections

Evaluation Agent

Assesses content quality, relevance, and usefulness for research purposes.

Capabilities:

  • Relevance scoring
  • Content quality assessment
  • Source credibility evaluation
  • Recency analysis
  • Comprehensive evaluation metrics

Learning Extraction Agent

Extracts key insights, patterns, and follow-up questions from research content.

Capabilities:

  • Key learning identification
  • Pattern recognition
  • Follow-up question generation
  • Content categorization
  • Insight synthesis

Web Summarization Agent

Condenses lengthy web content into actionable summaries while preserving critical information.

Capabilities:

  • Content analysis and categorization
  • Intelligent summarization (80-95% reduction)
  • Key insight preservation
  • Source context maintenance
  • Token optimization

RAG Agent

Manages vector search and retrieval-augmented generation for enhanced context and responses.

Capabilities:

  • Vector embedding generation
  • Semantic similarity search
  • Context retrieval and ranking
  • Knowledge augmentation
  • Memory integration

GitHub Agent

Advanced AI-powered GitHub Assistant for complete repository and project management with Copilot integration.

Capabilities:

  • Repository Management: Create, list, update, and delete repositories
  • Issue Management: Create, update, list, and manage GitHub issues with comments
  • Pull Request Management: Handle PR creation, updates, merging, and reviews
  • Branch Management: Create, list, and delete repository branches
  • User & Organization Management: Search users, manage organization members
  • Advanced Copilot Integration: Delegate complex coding tasks to GitHub Copilot
  • GitHub API Integration: Full access to GitHub REST API via Octokit

Copilot Integration Features:

  • Task Delegation: Assign coding tasks to @github-copilot for automated implementation
  • Code Analysis: Request Copilot analysis and suggestions on pull requests
  • Automated PR Creation: Copilot generates code and creates pull requests automatically

Prerequisites:

  • GitHub Copilot Enterprise subscription for advanced features
  • Valid GITHUB_API_KEY environment variable
  • Appropriate repository permissions

Workflows 🔄

Comprehensive Research Workflow

End-to-end research pipeline with human-in-the-loop approval.

Process:

  1. Query Input: Interactive user query collection
  2. Web Research: Multi-phase research execution
  3. Content Evaluation: Relevance and quality assessment
  4. Learning Extraction: Key insights and follow-up questions
  5. Data Consolidation: Research data synthesis
  6. RAG Processing: Vector search and retrieval
  7. Content Synthesis: Final content generation
  8. Report Creation: Comprehensive report generation
  9. Human Approval: Interactive approval workflow

Generate Report Workflow

Automated report generation from research data.

Features:

  • Research data processing
  • Structured report formatting
  • Content organization
  • Executive summaries
  • Detailed analysis sections

Research Workflow

Basic research orchestration for simpler use cases.

Capabilities:

  • Query processing
  • Web search execution
  • Basic result evaluation
  • Simple report generation

Tools 🛠️

Web Search Tool

Intelligent web content discovery and extraction.

Features:

  • Advanced web scraping
  • Content relevance filtering
  • Source credibility assessment
  • Structured data extraction
  • Rate limiting and error handling
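Error handling for web requests usually includes retrying transient failures. A minimal sketch of retry with exponential backoff follows; `withRetry` and `RetryOptions` are hypothetical names, and nothing here claims to be how the project's Web Search Tool handles errors.

```typescript
// Hypothetical helper, not part of the project's Web Search Tool.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts - 1) break; // no delay after the final attempt
      // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

In production, adding random jitter to each delay helps keep many clients from retrying in lockstep against the same rate-limited endpoint.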

Vector Query Tool

Semantic search over embedded content.

Capabilities:

  • Vector similarity search
  • Content ranking and scoring
  • Multi-index support
  • Query optimization
  • Result filtering and sorting
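A brute-force version of this kind of similarity search scores every stored embedding against the query with cosine similarity, drops results under a threshold, and returns the top K. The sketch assumes small in-memory data; `VectorRecord` and `queryTopK` are hypothetical names, and a production vector store (such as the LibSQL vector storage this project uses) would rely on its own index and query API instead.

```typescript
// Hypothetical in-memory record shape.
interface VectorRecord {
  id: string;
  embedding: number[];
  metadata?: Record<string, unknown>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every record, apply the similarity threshold, sort descending, keep topK.
function queryTopK(
  records: VectorRecord[],
  queryEmbedding: number[],
  topK: number,
  minScore = -1,
): Array<{ id: string; score: number }> {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(queryEmbedding, r.embedding) }))
    .filter((r) => r.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

This scan is O(n · d) per query; approximate nearest-neighbor indexes exist precisely to avoid it on large collections.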

Chunker Tool

Document segmentation for RAG applications.

Features:

  • Intelligent text segmentation
  • Overlap configuration
  • Metadata preservation
  • Format support (text, markdown, HTML)
  • Chunk size optimization

Rerank Tool

Result relevance optimization and ranking.

Capabilities:

  • Multi-criteria ranking
  • Semantic relevance scoring
  • Position-based weighting
  • Custom ranking algorithms
  • Performance optimization

Evaluation Tool

Content quality and relevance assessment.

Features:

  • Multi-dimensional evaluation
  • Scoring algorithms
  • Quality metrics
  • Relevance assessment
  • Automated evaluation pipelines

Learning Extraction Tool

Insight mining from research data.

Capabilities:

  • Pattern recognition
  • Key learning identification
  • Follow-up question generation
  • Content categorization
  • Insight synthesis and organization

GitHub Tools

Comprehensive suite of 14 specialized tools for complete GitHub API integration.

Repository Management Tools:

  • createRepository: Create new repositories with custom settings
  • getRepository: Retrieve detailed repository information
  • updateRepository: Modify repository settings and metadata
  • deleteRepository: Remove repositories (with confirmation)
  • listRepositories: List user repositories with filtering options

Branch Management Tools:

  • listBranches: List all branches in a repository
  • getBranch: Get detailed branch information
  • createBranch: Create new branches from existing commits
  • deleteBranch: Remove branches safely

Issue Management Tools:

  • createIssue: Create new issues with titles and descriptions
  • getIssue: Retrieve issue details and metadata
  • updateIssue: Modify issue status, title, and body
  • listIssues: List repository issues with filtering (open/closed/all)

Pull Request Management Tools:

  • createPullRequest: Create pull requests between branches
  • getPullRequest: Get detailed PR information
  • updatePullRequest: Modify PR title, body, and state
  • mergePullRequest: Merge PRs with various merge methods
  • listPullRequests: List PRs with status filtering

Additional Tools:

  • search: Advanced GitHub search across repositories
  • getUser: Retrieve user profile information
  • listOrganizations: List user organizations
  • createComment: Add comments to issues and PRs

All tools support:

  • Full GitHub API integration via Octokit
  • Comprehensive error handling and logging
  • Type-safe input validation with Zod schemas
  • Rate limiting and API quota management
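GitHub's REST API reports quota state in response headers (`x-ratelimit-remaining`, `x-ratelimit-reset` as UTC epoch seconds, and `retry-after` for secondary limits). A minimal sketch of turning those headers into a wait time follows; `msUntilNextRequest` is a hypothetical helper, not part of Octokit or this project (Octokit also offers a throttling plugin that handles this automatically).

```typescript
// Header names follow GitHub's documented REST rate-limit headers.
// msUntilNextRequest itself is a hypothetical helper.
type HeaderBag = Record<string, string | undefined>;

function msUntilNextRequest(headers: HeaderBag, nowMs: number = Date.now()): number {
  // Normalize header names to lowercase for case-insensitive lookup.
  const h: HeaderBag = {};
  for (const [k, v] of Object.entries(headers)) h[k.toLowerCase()] = v;

  // Secondary rate limits may send Retry-After, expressed in seconds.
  const retryAfter = Number(h['retry-after']);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;

  // Primary limit: once the quota is exhausted, wait until the reset time.
  const remaining = Number(h['x-ratelimit-remaining']);
  const resetSec = Number(h['x-ratelimit-reset']);
  if (remaining === 0 && Number.isFinite(resetSec)) {
    return Math.max(0, resetSec * 1000 - nowMs);
  }
  return 0; // quota left: no need to wait
}
```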

Networks 🌐

Complex Research Network

Advanced multi-agent collaboration system for complex research tasks.

Features:

  • Multi-agent coordination
  • Dynamic task distribution
  • Inter-agent communication
  • Collaborative problem-solving
  • Network optimization
  • Scalable architecture

MCP Integration 🔌

Model Context Protocol

Standardized integration with AI tools and services.

Capabilities:

  • Tool discovery and registration
  • Real-time communication
  • Standardized interfaces
  • Extensible architecture
  • Cross-platform compatibility

MCP Server

Custom MCP server implementation for enhanced tool integration.

Features:

  • Tool registration and management
  • Real-time communication channels
  • Error handling and recovery
  • Performance monitoring
  • Security and authentication

GitHub Tools Integration:

  • All 14 GitHub tools available through MCP
  • Standardized tool interface for GitHub operations
  • Type-safe API interactions
  • Rate limiting and error handling
  • Full GitHub API coverage via Octokit

RAG Pipeline 🔍

Advanced Retrieval-Augmented Generation System

The system implements a sophisticated multi-stage RAG pipeline with the following components:

Vector Search & Indexing

  • Multiple Vector Indexes: Separate indexes for research documents, web content, learnings, and reports
  • Semantic Similarity: Advanced embedding-based search with configurable similarity thresholds
  • Hybrid Search: Combines vector similarity with keyword-based search for optimal results
  • Index Management: Automatic index creation, optimization, and health monitoring
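One common way to implement hybrid search is Reciprocal Rank Fusion (RRF), which merges a vector-ranked list and a keyword-ranked list by summing 1 / (k + rank) for every document. The README does not say which fusion method the project uses, so treat RRF here as a swapped-in technique; `reciprocalRankFusion` is a hypothetical name, and `k = 60` is the constant commonly used with RRF, not a project setting.

```typescript
// Reciprocal Rank Fusion: merges several ranked lists of document IDs.
// Assumes each individual ranking contains unique IDs.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top result contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage: fuse a vector-search ranking with a keyword-search ranking.
// reciprocalRankFusion([['a', 'b', 'c'], ['c', 'a', 'd']]) ranks 'a' first,
// because it appears near the top of both lists.
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword scores live on different scales.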

Document Processing Pipeline

  • Intelligent Chunking: Overlapping text segmentation with metadata preservation
  • Content Extraction: Advanced parsing for various document formats (PDF, HTML, markdown)
  • Metadata Enrichment: Automatic extraction of titles, authors, dates, and content categories
  • Quality Filtering: Content validation and relevance scoring
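Overlapping chunking can be sketched with fixed-size character windows: each chunk starts `size - overlap` characters after the previous one, so text near a boundary appears in both neighbors, and the document's metadata is copied into every chunk. `Chunk` and `chunkText` are hypothetical names; the project's Chunker Tool may split on sentences or markdown structure instead of raw characters.

```typescript
// Hypothetical chunk shape.
interface Chunk {
  text: string;
  index: number; // position of the chunk within the document
  start: number; // character offset where the chunk begins
  metadata: Record<string, unknown>;
}

// Fixed-size character chunking with overlap.
function chunkText(
  text: string,
  size: number,
  overlap: number,
  metadata: Record<string, unknown> = {},
): Chunk[] {
  if (size <= 0) throw new Error('size must be positive');
  if (overlap < 0 || overlap >= size) throw new Error('overlap must be in [0, size)');
  const step = size - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      text: text.slice(start, start + size),
      index: chunks.length,
      start,
      metadata: { ...metadata }, // preserve document metadata on each chunk
    });
    if (start + size >= text.length) break; // this chunk already reaches the end
  }
  return chunks;
}
```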

Retrieval & Ranking

  • Multi-stage Retrieval: Initial broad search followed by focused reranking
  • Context Augmentation: Dynamic context window expansion based on query complexity
  • Relevance Scoring: Multi-criteria ranking combining semantic similarity, recency, and authority
  • Deduplication: Intelligent removal of redundant information
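A multi-criteria relevance score can be written as a weighted sum of semantic similarity, a recency term, and an authority term. The sketch below uses exponential decay for recency (the score halves every `halfLifeDays`); the `Candidate` shape, the weights, and the 180-day half-life are illustrative assumptions, not values taken from this project.

```typescript
// Hypothetical candidate shape and weights, for illustration only.
interface Candidate {
  id: string;
  similarity: number; // semantic similarity in [0, 1]
  publishedAt: Date;
  authority: number; // source authority in [0, 1], e.g. a credibility score
}

interface RankingWeights {
  similarity: number;
  recency: number;
  authority: number;
  halfLifeDays: number; // age at which the recency score halves
}

const DEFAULT_WEIGHTS: RankingWeights = { similarity: 0.6, recency: 0.2, authority: 0.2, halfLifeDays: 180 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function rankCandidates(candidates: Candidate[], now: Date, w: RankingWeights = DEFAULT_WEIGHTS) {
  return candidates
    .map((c) => {
      const ageDays = Math.max(0, (now.getTime() - c.publishedAt.getTime()) / MS_PER_DAY);
      // Exponential decay: 1 for brand-new content, 0.5 after one half-life.
      const recency = Math.pow(0.5, ageDays / w.halfLifeDays);
      const score = w.similarity * c.similarity + w.recency * recency + w.authority * c.authority;
      return { id: c.id, score };
    })
    .sort((a, b) => b.score - a.score);
}
```

A half-life keeps older but highly similar sources competitive instead of cutting them off at a hard date.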

Graph RAG Integration

  • Knowledge Graph Construction: Automatic relationship extraction and graph building
  • Graph-based Reasoning: Path finding and relationship traversal for complex queries
  • Entity Resolution: Named entity recognition and disambiguation
  • Temporal Reasoning: Time-based relationship analysis and event sequencing
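Path finding over a knowledge graph can be illustrated with breadth-first search over (subject, relation, object) triples, which returns the shortest chain of relations connecting two entities. The triple format, the `inverse:` prefix for reverse edges, and `findRelationPath` are hypothetical; the README does not describe the project's graph representation.

```typescript
// Hypothetical triple format for illustration.
interface Triple {
  subject: string;
  relation: string;
  object: string;
}

interface Hop {
  from: string;
  relation: string;
  to: string;
}

// BFS over the graph (traversable in both directions); returns the shortest
// relation chain from start to goal, or null if none exists within maxHops.
function findRelationPath(triples: Triple[], start: string, goal: string, maxHops = 4): Hop[] | null {
  const adjacency = new Map<string, Hop[]>();
  const addEdge = (hop: Hop) => {
    const list = adjacency.get(hop.from) ?? [];
    list.push(hop);
    adjacency.set(hop.from, list);
  };
  for (const t of triples) {
    addEdge({ from: t.subject, relation: t.relation, to: t.object });
    // Reverse edge so a relationship can be followed from either end.
    addEdge({ from: t.object, relation: `inverse:${t.relation}`, to: t.subject });
  }

  if (start === goal) return [];
  const queue: Array<{ node: string; path: Hop[] }> = [{ node: start, path: [] }];
  const visited = new Set<string>([start]);
  while (queue.length > 0) {
    const { node, path } = queue.shift()!;
    if (path.length >= maxHops) continue; // do not expand beyond the hop limit
    for (const hop of adjacency.get(node) ?? []) {
      if (visited.has(hop.to)) continue;
      const nextPath = [...path, hop];
      if (hop.to === goal) return nextPath;
      visited.add(hop.to);
      queue.push({ node: hop.to, path: nextPath });
    }
  }
  return null;
}
```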

Memory Processors 🧠

Advanced Context Optimization System

The system features 11 specialized memory processors designed to optimize context management and information retrieval:

Core Processing Categories

Content Optimization:

  • TokenLimiterProcessor: Prevents context overflow by filtering messages exceeding token limits
  • ErrorCorrectionProcessor: Eliminates duplicate content using checksum-based deduplication
  • CircuitBreakerProcessor: Provides fault tolerance and prevents cascading failures
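Fault tolerance of this kind is usually built on the circuit breaker pattern: after a number of consecutive failures the breaker "opens" and rejects calls immediately, then lets a trial call through once a timeout has passed. The sketch below is a minimal, single-threaded version of that pattern; the `CircuitBreaker` class and its options are hypothetical and are not the project's CircuitBreakerProcessor.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeoutMs = resetTimeoutMs; // how long to reject calls before a trial call
    this.now = now; // injectable clock, useful in tests
  }

  get currentState(): BreakerState {
    return this.state;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open: call rejected without running the operation');
      }
      this.state = 'half-open'; // timeout elapsed: let a trial call through
    }
    try {
      const result = await operation();
      // Success closes the circuit and resets the consecutive-failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (error) {
      this.failures++;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

Rejecting calls while open is what stops a failing dependency from dragging every caller into slow timeouts, which is the "cascading failure" the bullet refers to.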

Semantic Enhancement:

  • PersonalizationProcessor: Boosts relevance of user-specific content and preferences
  • HierarchicalMemoryProcessor: Distinguishes between episodic and semantic memory content
  • CitationExtractorProcessor: Prioritizes messages containing citations and references

Advanced Reasoning:

  • MultiPerspectiveProcessor: Evaluates content from multiple analytical viewpoints
  • TemporalReasoningProcessor: Handles time-based relationships and chronological ordering
  • UncertaintyQuantificationProcessor: Assigns confidence scores to information reliability

Knowledge Integration:

  • KnowledgeGraphProcessor: Constructs and maintains knowledge graphs from conversation data
  • BayesianBeliefProcessor: Implements probabilistic reasoning for belief updating
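Bayesian belief updating follows Bayes' rule: the posterior probability of a claim is proportional to the prior times the likelihood of the observed evidence. The sketch below applies that rule to a sequence of evidence, assuming the pieces are conditionally independent; the `Evidence` shape and both functions are hypothetical and only illustrate the math the BayesianBeliefProcessor is described as using.

```typescript
// One piece of evidence, described by how likely it is under each hypothesis.
// Where these likelihoods come from is outside the scope of this sketch.
interface Evidence {
  pGivenTrue: number; // P(E | claim is true)
  pGivenFalse: number; // P(E | claim is false)
}

// Bayes' rule: P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | not H) (1 - P(H)))
function updateBelief(prior: number, e: Evidence): number {
  const numerator = e.pGivenTrue * prior;
  const denominator = numerator + e.pGivenFalse * (1 - prior);
  if (denominator === 0) return prior; // evidence impossible under both hypotheses: no update
  return numerator / denominator;
}

// Sequential updating, assuming conditionally independent evidence.
function updateBeliefSequence(prior: number, evidence: Evidence[]): number {
  return evidence.reduce(updateBelief, prior);
}
```

For example, starting from a prior of 0.5, evidence that is four times as likely when the claim is true (0.8 vs. 0.2) raises the belief to 0.8.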

Performance Optimizations

  • WeakMap Caching: Efficient memory usage with garbage collection-friendly caching
  • Pre-compiled Regex: Optimized pattern matching for content analysis
  • SIMD-like Processing: Batch processing optimizations for token estimation
  • Lazy Evaluation: Deferred computation with memoization for expensive operations
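The WeakMap caching and token-estimation ideas above can be combined in a small token limiter: estimates are memoized per message object (and released when the message is garbage-collected), and the newest messages are kept until a budget is reached. The 4-characters-per-token ratio is a rough heuristic, and `Msg`, `estimateTokens`, and `limitToTokenBudget` are hypothetical names; dropping the oldest messages is one possible policy, not necessarily the TokenLimiterProcessor's.

```typescript
// Hypothetical message shape.
interface Msg {
  role: string;
  content: string;
}

// Keyed by message object, so cached entries disappear with their messages.
const tokenCache = new WeakMap<Msg, number>();

// Rough heuristic: about 4 characters per token for English text.
// A real implementation would use the model's tokenizer.
function estimateTokens(msg: Msg): number {
  const cached = tokenCache.get(msg);
  if (cached !== undefined) return cached;
  const estimate = Math.ceil(msg.content.length / 4);
  tokenCache.set(msg, estimate);
  return estimate;
}

// Keep the newest messages whose combined estimate fits within maxTokens.
function limitToTokenBudget(messages: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (total + cost > maxTokens) break;
    total += cost;
    kept.push(messages[i]);
  }
  return kept.reverse(); // restore chronological order
}
```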

Tracing & Observability 📊

Comprehensive Monitoring System

The system implements advanced observability using OpenTelemetry for complete system monitoring:

Tracing Infrastructure

  • OpenTelemetry Integration: Industry-standard tracing with vendor-neutral APIs
  • Child Spans: Detailed operation tracking with hierarchical span relationships
  • Distributed Tracing: End-to-end request tracking across all system components
  • Custom Span Attributes: Rich metadata for debugging and performance analysis

Performance Monitoring

  • Processing Time Tracking: Detailed timing for all operations and workflows
  • Resource Usage Metrics: Memory, CPU, and I/O utilization monitoring
  • API Call Tracking: Request/response monitoring with error classification
  • Cache Hit Analysis: Explicit and implicit caching performance metrics

Error Tracking & Diagnostics

  • Comprehensive Error Monitoring: Structured error logging with context
  • Exception Propagation: Error tracking through the entire call stack
  • Failure Pattern Analysis: Automated detection of recurring issues
  • Health Check Integration: System health monitoring with automated recovery

Cost Optimization

  • API Usage Tracking: Detailed cost analysis for external API calls
  • Cache Efficiency Metrics: Cost savings calculations for cached content
  • Resource Optimization: Identification of performance bottlenecks
  • Usage Analytics: Historical usage patterns and trend analysis

Installation 📦

Prerequisites

  • Node.js 20.9+
  • npm or pnpm
  • Google AI API key
  • Database URL (LibSQL)
  • GitHub Personal Access Token (for GitHub features)

Setup

# Clone the repository
git clone https://github.com/ssdeanx/deep-research.git
cd deep-research

# Install all dependencies (backend + frontend)
npm install

# Copy environment file
cp .env.example .env

# Configure environment variables
# Edit .env with your API keys and configuration

Frontend Setup

The project includes a modern React frontend built with Vite, TypeScript, and Tailwind CSS:

# shadcn/ui components are already listed in package.json, so no separate install step is needed
# All 47 UI components are pre-configured and ready to use

# Start the frontend development server
npm run dev:frontend

# Or start both backend and frontend together
npm run dev:full

# Build for production
npm run build:frontend

Backend Setup

The Mastra backend provides the AI orchestration layer:

# Start the Mastra development server
npm run dev

# Build for production
npm run build

# Start production server
npm run start

Environment Configuration

# Google AI Configuration
GOOGLE_GENERATIVE_AI_API_KEY=your_google_ai_api_key

# Database Configuration
DATABASE_URL=file:./mastra.db

# GitHub API Configuration
GITHUB_API_KEY=your_github_personal_access_token

# Optional: Additional API keys
EXA_API_KEY=your_exa_api_key

# Logging and Monitoring
LOG_LEVEL=info
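Failing fast at startup gives a clearer error than a rejected API call deep inside a workflow. A minimal sketch of validating the variables listed above follows; `loadConfig` and `AppConfig` are hypothetical, and treating only the Google key and database URL as required is an assumption based on the prerequisites (the GitHub token is only needed for GitHub features).

```typescript
// Hypothetical config loader; the project's actual config handling may differ.
interface AppConfig {
  googleApiKey: string;
  databaseUrl: string;
  githubApiKey?: string; // only needed for GitHub features
  exaApiKey?: string; // optional
  logLevel: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const required = ['GOOGLE_GENERATIVE_AI_API_KEY', 'DATABASE_URL'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    googleApiKey: env['GOOGLE_GENERATIVE_AI_API_KEY'] as string,
    databaseUrl: env['DATABASE_URL'] as string,
    githubApiKey: env['GITHUB_API_KEY'] || undefined,
    exaApiKey: env['EXA_API_KEY'] || undefined,
    logLevel: env['LOG_LEVEL'] ?? 'info', // matches the default shown above
  };
}

// Usage in Node.js: const config = loadConfig(process.env);
```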

GitHub API Setup

To use GitHub features, you need to configure a GitHub Personal Access Token:

  1. Create a Personal Access Token:

    • Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
    • Generate a new token with the following permissions:
      • repo (Full control of private repositories)
      • public_repo (Access public repositories)
      • read:org (Read org and team membership)
      • read:user (Read ALL user profile data)
      • read:project (Read project boards)
  2. Add to Environment:

    GITHUB_API_KEY=ghp_your_token_here
    
  3. Copilot Enterprise (Optional):

    • For advanced Copilot integration features
    • Requires GitHub Copilot Enterprise subscription
    • Enables automated task delegation and code generation

Quick Start 🚀

Backend API Usage

import { mastra } from './src/mastra';

// Start comprehensive research workflow
const workflow = mastra.getWorkflow('comprehensive-research-workflow');
const run = await workflow.createRunAsync();

const result = await run.start({
  inputData: {}
});

// Handle workflow suspension for user input
if (result.status === 'suspended') {
  // Resume with user query
  const resumeResult = await run.resume({
    step: 'get-user-query',
    resumeData: { query: 'quantum computing advancements' }
  });
}

Frontend Usage

# Start the full-stack application
npm run dev:full

# Frontend will be available at http://localhost:4000
# Backend API will be available at http://localhost:4111

Frontend Features:

  • Interactive Research Interface: Start research workflows through the web UI
  • Real-time Workflow Monitoring: Track workflow progress and status updates
  • Agent Management Dashboard: View and interact with all available agents
  • Responsive Design: Works seamlessly on desktop and mobile devices
  • Dark Mode Support: Automatic theme switching based on system preferences

Using Individual Agents

// Research agent example
const researchAgent = mastra.getAgent('researchAgent');
const researchResult = await researchAgent.generate([
  { role: 'user', content: 'Research AI advancements in healthcare' }
]);

// Report generation
const reportAgent = mastra.getAgent('reportAgent');
const report = await reportAgent.generate([
  { role: 'user', content: `Generate report from: ${researchResult.text}` }
]);

MCP Integration

# Start MCP server
npm run mcp:server

# The server will be available for MCP-compatible clients

Usage Examples 📚

Complete Research Pipeline

import { mastra } from './src/mastra';

async function conductResearch(query: string) {
  // Initialize workflow
  const workflow = mastra.getWorkflow('comprehensive-research-workflow');
  const run = await workflow.createRunAsync();

  // Start research process
  let result = await run.start({ inputData: {} });

  // Handle user query input
  if (result.status === 'suspended') {
    result = await run.resume({
      step: 'get-user-query',
      resumeData: { query }
    });
  }

  // Process through research phases
  // (additional resume calls for other suspension points)

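  // Get final result
  if (result.status === 'success') {
    return result.output;
  }

  // Any other status (failed, or still suspended) means the run did not complete
  throw new Error(`Research workflow ended with status: ${result.status}`);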
}
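The comment about additional resume calls leaves the loop implicit. A minimal sketch of driving a run through several suspension points follows. It assumes a simplified result shape: `RunResult`, its `suspendedStep` field, `ResumableRun`, and `driveRun` are hypothetical stand-ins rather than Mastra's actual types (Mastra reports suspended steps in its own format), so adapt the step lookup to the real result object before using it with a workflow run.

```typescript
// Simplified, hypothetical result shape (not Mastra's actual types).
type RunResult =
  | { status: 'suspended'; suspendedStep: string }
  | { status: 'success'; output: unknown }
  | { status: 'failed'; error: unknown };

interface ResumableRun {
  start(args: { inputData: Record<string, unknown> }): Promise<RunResult>;
  resume(args: { step: string; resumeData: unknown }): Promise<RunResult>;
}

// Answer each suspension point via `answer` until the run finishes.
async function driveRun(
  run: ResumableRun,
  answer: (step: string) => Promise<unknown>,
  maxResumes = 10,
): Promise<unknown> {
  let result = await run.start({ inputData: {} });
  for (let i = 0; result.status === 'suspended'; i++) {
    if (i >= maxResumes) throw new Error('Too many suspensions; aborting');
    const step = result.suspendedStep;
    const resumeData = await answer(step); // e.g. { query } for 'get-user-query'
    result = await run.resume({ step, resumeData });
  }
  if (result.status === 'failed') throw new Error(`Workflow failed: ${String(result.error)}`);
  return result.output;
}
```

The `maxResumes` guard keeps a misbehaving workflow that suspends repeatedly from looping forever.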

Custom Agent Usage

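import { mastra } from './src/mastra';

// Custom research with specific parameters (placeholder values; substitute your own)
const topic = 'large language model evaluation';
const focusAreas = ['benchmarks', 'safety'];
const depth = 'comprehensive';
const sources = ['academic papers', 'industry reports'];

const researchAgent = mastra.getAgent('researchAgent');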

const result = await researchAgent.generate([
  {
    role: 'user',
    content: `Conduct deep research on: ${topic}
    Focus areas: ${focusAreas.join(', ')}
    Depth level: ${depth}
    Include sources from: ${sources.join(', ')}`
  }
], {
  // Custom generation options
  temperature: 0.7,
  maxTokens: 4000
});

Tool Integration

// Direct tool usage
const webSearchTool = mastra.getTool('webSearch');
const searchResults = await webSearchTool.execute({
  context: {
    query: 'artificial intelligence trends 2024',
    limit: 10
  }
});

// Vector search
const vectorTool = mastra.getTool('vectorQuery');
const vectorResults = await vectorTool.execute({
  context: {
    query: 'machine learning algorithms',
    topK: 5,
    indexName: 'research-data'
  }
});

GitHub Agent Usage

// Using the GitHub Agent for repository management
const githubAgent = mastra.getAgent('githubAgent');

// Create a new repository
const createResult = await githubAgent.generate([
  {
    role: 'user',
    content: 'Create a new repository called "my-awesome-project" with description "An awesome project" and make it private'
  }
]);

// List user repositories
const listResult = await githubAgent.generate([
  {
    role: 'user',
    content: 'List all my repositories'
  }
]);

// Create and manage issues
const issueResult = await githubAgent.generate([
  {
    role: 'user',
    content: 'Create a new issue in repository "my-org/my-repo" with title "Bug: Login not working" and description "Users cannot log in after recent update"'
  }
]);

GitHub Tools Direct Usage

// Direct GitHub tool usage
const createRepoTool = mastra.getTool('createRepository');
const repoResult = await createRepoTool.execute({
  context: {
    name: 'my-new-repo',
    description: 'A new repository created via API',
    private: true
  }
});

// List pull requests
const listPRsTool = mastra.getTool('listPullRequests');
const prsResult = await listPRsTool.execute({
  context: {
    owner: 'my-org',
    repo: 'my-repo',
    state: 'open'
  }
});

// Create a pull request
const createPRTool = mastra.getTool('createPullRequest');
const prResult = await createPRTool.execute({
  context: {
    owner: 'my-org',
    repo: 'my-repo',
    title: 'Add new feature',
    head: 'feature-branch',
    base: 'main',
    body: 'This PR adds a new feature...'
  }
});
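The agent prompts above name repositories as a single `"owner/repo"` string, while tools such as `listPullRequests` and `createPullRequest` take `owner` and `repo` as separate fields. The sketch below shows a hypothetical `parseRepoRef` helper that bridges the two. Its regular expressions are a simplified approximation of GitHub's naming rules, which also impose length limits and other constraints.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Simplified checks; GitHub's actual naming rules are stricter.
const OWNER_RE = /^[A-Za-z0-9-]+$/;
const REPO_RE = /^[A-Za-z0-9._-]+$/;

// Split "owner/repo" into the separate fields the GitHub tools expect.
function parseRepoRef(fullName: string): RepoRef {
  const parts = fullName.trim().split('/');
  if (parts.length !== 2) {
    throw new Error(`Expected "owner/repo", got: ${fullName}`);
  }
  const [owner, repo] = parts;
  if (!OWNER_RE.test(owner) || !REPO_RE.test(repo)) {
    throw new Error(`Invalid owner or repository name: ${fullName}`);
  }
  return { owner, repo };
}

const ref = parseRepoRef('my-org/my-repo');
// e.g. await listPRsTool.execute({ context: { ...ref, state: 'open' } });
```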

Copilot Integration Examples

// Delegate a coding task to GitHub Copilot
const copilotResult = await githubAgent.generate([
  {
    role: 'user',
    content: 'Have Copilot implement a user authentication system for repository "my-org/my-app". The system should include login, logout, and password reset functionality.'
  }
]);

// Request Copilot analysis of a pull request
const analysisResult = await githubAgent.generate([
  {
    role: 'user',
    content: 'Ask Copilot to analyze PR #123 in repository "my-org/my-repo" and provide suggestions for improvement'
  }
]);

Configuration ⚙️

Agent Configuration

import { Agent } from '@mastra/core/agent';

// Custom agent configuration
const customResearchAgent = new Agent({
  name: 'Custom Research Agent',
  instructions: 'Custom research instructions...',
  model: createGemini25Provider('gemini-2.5-flash-lite-preview-06-17', {
    responseModalities: ["TEXT"],
    thinkingConfig: {
      thinkingBudget: -1,
      includeThoughts: false,
    },
    useSearchGrounding: true,
    dynamicRetrieval: true,
    safetyLevel: 'OFF',
    structuredOutputs: true,
  }),
  // Mastra agents take tools as an object keyed by tool name, not an array
  tools: { webSearchTool, evaluateResultTool },
  memory: createResearchMemory(),
});

Workflow Configuration

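import { createWorkflow } from '@mastra/core/workflows';
import { z } from 'zod';

// Custom workflow configuration (searchStep and analyzeStep are defined elsewhere)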
const customWorkflow = createWorkflow({
  id: 'custom-research-workflow',
  inputSchema: z.object({
    query: z.string(),
    depth: z.enum(['shallow', 'medium', 'deep'])
  }),
  outputSchema: z.object({
    results: z.array(z.any()),
    summary: z.string()
  })
})
.then(searchStep)
.then(analyzeStep)
.commit();
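In the workflow above, `.then()` runs `searchStep` and then `analyzeStep`, feeding each step the previous step's output, and `.commit()` finalizes the definition. The framework-free toy below illustrates only that data flow. It is not Mastra's implementation. `Step`, `Pipeline`, and the two toy steps are hypothetical and exist only for this illustration.

```typescript
// Toy illustration of sequential step chaining; not Mastra's implementation.
interface Step<I, O> {
  id: string;
  execute(input: I): Promise<O>;
}

class Pipeline<I, O> {
  private constructor(private readonly run: (input: I) => Promise<O>) {}

  static start<T>(): Pipeline<T, T> {
    return new Pipeline<T, T>(async (input) => input);
  }

  // Append a step whose input is the previous step's output.
  then<N>(step: Step<O, N>): Pipeline<I, N> {
    const prev = this.run;
    return new Pipeline<I, N>(async (input) => step.execute(await prev(input)));
  }

  // Finalize the chain into a single runnable function.
  commit(): (input: I) => Promise<O> {
    return this.run;
  }
}

type Depth = 'shallow' | 'medium' | 'deep';
type SearchInput = { query: string; depth: Depth };
type SearchOutput = { query: string; results: string[] };
type AnalyzeOutput = { results: string[]; summary: string };

const toySearchStep: Step<SearchInput, SearchOutput> = {
  id: 'search',
  async execute({ query, depth }) {
    const n = depth === 'deep' ? 3 : 1;
    return { query, results: Array.from({ length: n }, (_, i) => `${query} result ${i + 1}`) };
  },
};

const toyAnalyzeStep: Step<SearchOutput, AnalyzeOutput> = {
  id: 'analyze',
  async execute({ query, results }) {
    return { results, summary: `${results.length} results for "${query}"` };
  },
};

const runToyResearch = Pipeline.start<SearchInput>()
  .then(toySearchStep)
  .then(toyAnalyzeStep)
  .commit();
```

The generic parameters make a mismatched chain, such as a step whose input type differs from the previous step's output, a compile-time error. That mirrors the role of the Zod input and output schemas in the real workflow.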

Storage Configuration

// Advanced LibSQL storage with dual databases
import { createLibSQLStore } from './src/mastra/config/libsql-storage';

const storage = createLibSQLStore({
  tracingContext: {
    context: {},
    runtimeContext: {
      userId: 'user123',
      sessionId: 'session456'
    }
  }
});

// Vector search configuration
const vectorSearch = {
  researchDocuments: {
    indexName: 'research_documents',
    dimensions: 1536,
    metric: 'cosine'
  },
  webContent: {
    indexName: 'web_content',
    dimensions: 1536,
    metric: 'cosine'
  },
  learnings: {
    indexName: 'learnings',
    dimensions: 1536,
    metric: 'cosine'
  }
};
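Each index above declares `dimensions: 1536` and `metric: 'cosine'`. The vector store performs this scoring itself. The sketch below only shows what the cosine metric computes and why a query embedding must have exactly the index's dimension count. `IndexConfig` and `checkDimensions` are hypothetical names.

```typescript
interface IndexConfig {
  indexName: string;
  dimensions: number;
  metric: 'cosine';
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Reject an embedding whose length does not match the target index.
function checkDimensions(config: IndexConfig, embedding: number[]): void {
  if (embedding.length !== config.dimensions) {
    throw new Error(
      `${config.indexName} expects ${config.dimensions} dimensions, got ${embedding.length}`,
    );
  }
}
```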

Memory Processor Configuration

// Advanced memory configuration with specialized processors
import { createMemoryWithProcessors } from './src/mastra/config/memory-processors';

const memory = createMemoryWithProcessors({
  storage,
  processors: [
    new TokenLimiterProcessor({ maxTokens: 4000 }),
    new PersonalizationProcessor({ userId: 'user123' }),
    new ErrorCorrectionProcessor({ enableChecksum: true }),
    new CircuitBreakerProcessor({ failureThreshold: 5 }),
    new BayesianBeliefProcessor({ confidenceThreshold: 0.8 }),
    new KnowledgeGraphProcessor({ enableRelationships: true })
  ],
  options: {
    lastMessages: 100,
    semanticRecall: {
      topK: 10,
      threshold: 0.7
    }
  }
});
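`TokenLimiterProcessor({ maxTokens: 4000 })` presumably trims conversation history to a token budget before it reaches the model. Its internals are not shown here. The sketch below illustrates one common approach: keep the newest messages that fit within the budget. The `Msg` shape is an assumption, and token counts come from a rough 4-characters-per-token heuristic. A real limiter would use the model's tokenizer.

```typescript
interface Msg {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough heuristic (~4 characters per token); a real limiter would use a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest message, keeping messages while they fit,
// then restore chronological order.
function limitByTokens(messages: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (total + cost > maxTokens) break;
    kept.push(messages[i]);
    total += cost;
  }
  return kept.reverse();
}
```

Stopping at the first message that does not fit, rather than skipping it and continuing, keeps the retained history contiguous, so the model never sees a conversation with gaps in the middle.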

Google AI Configuration

// Advanced Google AI configuration with Gemini 2.5 features
import { createMastraGoogleProvider } from './src/mastra/config/googleProvider';

const googleModel = createMastraGoogleProvider('gemini-2.5-flash-lite-preview-06-17', {
  useSearchGrounding: true,
  dynamicRetrieval: true,
  cachedContent: 'cached-research-context',
  safetyLevel: 'MODERATE',
  structuredOutputs: true
});

// Explicit caching setup
const cacheManager = createCacheManager();
const cachedModel = await createCachedGoogleModel('gemini-2.5-flash-preview-05-20', {
  cacheManager,
  cacheContents: [{ role: 'user', parts: [{ text: 'Research context...' }] }],
  cacheTtlSeconds: 300
});
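`cacheTtlSeconds: 300` asks for the cached research context to expire after five minutes. Gemini's explicit caching is managed server-side, so the class below is only a local, hypothetical analogue of the TTL semantics. It could, for example, help decide when a cached-content handle must be recreated. `TtlCache` is not part of the project or of the Google SDK. The clock is injectable so expiry can be tested deterministically.

```typescript
// Minimal in-memory TTL cache; a local analogue, not the Gemini caching API.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlSeconds: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlSeconds * 1000 });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it so the caller recreates the cache
      return undefined;
    }
    return entry.value;
  }
}
```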

API Reference 📖

Core Classes

Mastra

Main application class managing agents, workflows, and tools.

Methods:

  • getAgent(name: string): Agent
  • getWorkflow(name: string): Workflow
  • getTool(name: string): Tool
  • getStorage(): Storage

Agent

AI agent with specific capabilities and tools.

Methods:

  • generate(messages: Message[], options?: GenerateOptions): Promise<GenerateResult>
  • stream(messages: Message[], options?: StreamOptions): Promise<StreamResult>

Workflow

Orchestrated sequence of operations.

Methods:

  • createRunAsync(): Promise<WorkflowRun>

WorkflowRun

A single execution of a workflow, returned by createRunAsync().

Methods:

  • start({ inputData }): Promise<WorkflowResult>
  • resume({ step, resumeData }): Promise<WorkflowResult>

Tool Interface

Standardized tool interface for extensibility.

Methods:

  • execute({ context }: { context: ToolContext }): Promise<ToolResult>
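The usage examples call every tool as `execute({ context: { ... } })`. The self-contained sketch below shows an object with that shape. `ToolLike` and `wordCountTool` are hypothetical. Real Mastra tools are built with `createTool` from `@mastra/core/tools` and validated with Zod schemas. Validation here is done by hand to keep the example dependency-free.

```typescript
// Hypothetical tool shape mirroring the execute({ context }) calls in this README.
interface ToolLike<In, Out> {
  id: string;
  description: string;
  execute(args: { context: In }): Promise<Out>;
}

interface WordCountInput {
  text: string;
  minLength?: number;
}

interface WordCountOutput {
  words: number;
}

const wordCountTool: ToolLike<WordCountInput, WordCountOutput> = {
  id: 'word-count',
  description: 'Counts words that are at least minLength characters long',
  async execute({ context }) {
    if (typeof context.text !== 'string') {
      throw new Error('context.text must be a string');
    }
    // Never count the empty strings produced by leading/trailing whitespace.
    const minLength = Math.max(1, context.minLength ?? 1);
    const words = context.text.split(/\s+/).filter((w) => w.length >= minLength).length;
    return { words };
  },
};
```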

Contributing 🤝

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Fork and clone
git clone https://github.com/your-username/deep-research.git
cd deep-research

# Install dependencies
npm install

# Set up development environment
cp .env.example .env
# Configure your environment variables

# Run tests
npm test

# Start development server
npm run dev

Code Standards

  • TypeScript for type safety
  • ESLint for code quality
  • Prettier for code formatting
  • Comprehensive test coverage
  • Clear documentation

License 📄

This project is licensed under the ISC License - see the LICENSE file for details.


Acknowledgments 🙏

  • Mastra Framework: For the powerful AI orchestration capabilities
  • Google AI: For the Gemini models and AI services
  • LibSQL: For the high-performance database solution
  • OpenTelemetry: For observability and tracing
  • Zod: For runtime type validation

Built with ❤️ using Mastra, Google AI, and modern web technologies. Inspired by the pursuit of enlightened knowledge discovery through ethical AI development.
