Deep Research Assistant + Graph RAG with Mastra 🔍🧠
**Beyond Basic Search: Intelligent, Autonomous Research & Reporting**
A cutting-edge, human-in-the-loop AI system for advanced deep research, leveraging Mastra's powerful orchestration, agent, and network capabilities. Inspired by lotus wisdom (purity of intent, enlightenment through insight, and adaptive growth), this project fosters ethical AI development, harmonious collaboration, and continuous evolution toward enlightened knowledge discovery.
Table of Contents 🧭
- Features
- Architecture
- AI Agents
- Workflows
- Tools
- Agent Networks
- RAG Pipeline
- Memory Processors
- Tracing & Observability
- MCP Integration
- Installation
- Quick Start
- Usage Examples
- Configuration
- API Reference
- Contributing
- License
Features ✨
🤖 Advanced AI Agents
- Research Agent: Multi-phase research with web search, evaluation, and learning extraction
- Report Agent: Comprehensive report generation from research data
- Evaluation Agent: Content relevance and quality assessment
- Learning Extraction Agent: Key insights and follow-up questions generation
- Web Summarization Agent: Content condensation and synthesis
- RAG Agent: Vector search and retrieval-augmented generation
- GitHub Agent: Complete GitHub repository and project management with advanced Copilot integration
- Monitor Agent: System health monitoring, performance tracking, and observability specialist
- Planning Agent: Strategic planning and project management with resource allocation and risk assessment
- Quality Assurance Agent: Software quality management, testing coordination, and defect prevention
🎨 Modern React Frontend
- React 19.1+: Latest React with concurrent features and automatic batching
- Vite 7.1+: Lightning-fast build tool with HMR and optimized production builds
- TypeScript 5.9+: Full type safety with advanced language features
- Tailwind CSS v4.1: CSS-first configuration with OKLCH colors, text shadows, and modern utilities
- shadcn/ui Components: 47 pre-built, accessible UI components with Radix UI primitives
- React Router v7.8+: Modern routing with nested routes and data loading
- Responsive Design: Mobile-first approach with dark mode support
- Modern CSS Features: Container queries, safe area utilities, and browser compatibility
🔄 Intelligent Workflows
- Comprehensive Research Workflow: End-to-end research process with human-in-the-loop approval
- Generate Report Workflow: Automated report creation from research data
- Research Workflow: Basic research orchestration
- GitHub Planning Workflow: Multi-agent GitHub project management with planning, monitoring, and task generation
- GitHub Quality Workflow: Quality-focused workflow composition with planning and QA analysis
🛠️ Specialized Tools
- Web Search Tool: Intelligent web scraping and content extraction
- Web Scraper Tool: Enhanced web scraping with marked.js integration for markdown output and file saving
- Vector Query Tool: Semantic search over embedded content
- Chunker Tool: Document segmentation for RAG
- Rerank Tool: Result relevance optimization
- Evaluation Tool: Content quality assessment
- Learning Extraction Tool: Insight mining from research data
- Data File Manager: 8 comprehensive data management tools (copy, move, search, archive, backup, etc.)
🌐 Agent Networks
- Complex Research Network: Multi-agent collaboration for advanced research tasks
- Dynamic agent communication and coordination
🔌 MCP Integration
- Model Context Protocol: Standardized AI tool integration
- Extensible tool ecosystem
- Real-time communication capabilities
🚀 Advanced Capabilities
- Advanced RAG Pipeline: Multi-stage retrieval with vector search, chunking, and reranking
- Graph RAG: Knowledge graph-based retrieval and reasoning
- Vector Search: Semantic similarity search with multiple indexes
- Memory Processors: 11 specialized processors for context optimization
- Memory Management: Persistent conversation context with hierarchical memory
- Human-in-the-Loop: Interactive approval and refinement
- OpenTelemetry Tracing: Comprehensive observability with child spans
- Zod Validation: Type-safe data processing
- Google AI Gemini 2.5: Advanced features with search grounding and caching
- LibSQL Storage: Dual-database architecture for research and vector data
Architecture 🏗️
Deep Research System: core components at a glance
- Agents: Research, Report, Evaluation, Learning, RAG, GitHub
- Workflows: Research, Report, Generate, Complex, GitHub
- Tools: Web, Vector, Chunking, Rerank, MCP, GitHub
- Memory Processors: Token, Personalization, Error, Circuit, Bayesian
- RAG Pipeline: Vector Search, Chunking, Reranking, Graph RAG
- Storage: LibSQL, Vector DB, Dual DB, Graph DB
- Networks: Complex Research, Multi-Agent
- Tracing: OpenTelemetry, Child Spans
- AI/ML: Google AI, Gemini 2.5, Embeddings, Copilot
- Integration: MCP, WebSocket, OpenTelemetry, GitHub
- APIs: REST, GraphQL, GitHub, Streaming
- Caching: Explicit, Implicit, Search
Core Components
🤖 AI Agents
Specialized AI assistants with specific roles and capabilities:
- Research Agent: Handles web research, evaluation, and learning extraction
- Report Agent: Generates comprehensive reports from research data
- Evaluation Agent: Assesses content relevance and quality
- Learning Extraction Agent: Extracts key insights and generates follow-up questions
- Web Summarization Agent: Condenses and synthesizes web content
- RAG Agent: Manages vector search and retrieval-augmented generation
- GitHub Agent: Complete GitHub repository and project management with Copilot integration
- Monitor Agent: System health monitoring, performance tracking, and observability specialist
- Planning Agent: Strategic planning and project management with resource allocation and risk assessment
- Quality Assurance Agent: Software quality management, testing coordination, and defect prevention
🎨 Frontend Architecture
Modern React application with seamless Mastra backend integration:
Core Frontend Components:
- App Router: React Router v7 with nested routes and data loading
- Layout System: Responsive navigation with dark mode support
- Page Components: Home, Research, Agents, and Workflows pages
- UI Components: 47 shadcn/ui components with Radix UI primitives
- Styling: Tailwind CSS v4.1 with custom design system
- State Management: React hooks with TypeScript integration
- API Integration: Mastra client for backend communication
Frontend-Backend Integration:
- Mastra Client: TypeScript client for API communication (see the usage sketch after this list)
- Environment Configuration: Secure API key management
- Real-time Updates: Live workflow status and agent responses
- Error Handling: Comprehensive error boundaries and user feedback
- Loading States: Skeleton components and progressive loading
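The sketch below shows one way the frontend can call the backend through the Mastra client. It assumes the @mastra/client-js MastraClient API (constructor with baseUrl, getAgent, generate) and a VITE_MASTRA_API_URL environment variable; exact method names and options may differ between client versions, so treat this as illustrative rather than the repo's actual wiring.

import { MastraClient } from '@mastra/client-js';

// Point the client at the Mastra dev server (port 4111 per the Quick Start section).
// VITE_MASTRA_API_URL is an assumed Vite env variable name.
const client = new MastraClient({
  baseUrl: import.meta.env.VITE_MASTRA_API_URL ?? 'http://localhost:4111',
});

// Call the research agent from a React event handler or route loader.
export async function askResearchAgent(question: string) {
  const agent = client.getAgent('researchAgent');
  return agent.generate({
    messages: [{ role: 'user', content: question }],
  });
}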
🔄 Workflows
Orchestrated sequences of operations:
- Comprehensive Research Workflow: Complete research pipeline with human approval
- Generate Report Workflow: Automated report creation
- Research Workflow: Basic research orchestration
- GitHub Planning Workflow: Multi-agent GitHub project management with planning, monitoring, and task generation
- GitHub Quality Workflow: Quality-focused workflow composition with planning and QA analysis
🛠️ Tools
Specialized utilities for specific tasks:
- Web Search Tool: Intelligent web content discovery
- Web Scraper Tool: Enhanced web scraping with marked.js integration for markdown output and file saving
- Vector Query Tool: Semantic search capabilities
- Chunker Tool: Document segmentation for RAG
- Rerank Tool: Result optimization
- Evaluation Tool: Content assessment
- Learning Extraction Tool: Insight mining
- Data File Manager: 8 comprehensive data management tools (copy, move, search, archive, backup, etc.)
- GitHub Tools: 14 specialized tools for complete GitHub API integration
🌐 Networks
Multi-agent collaboration systems:
- Complex Research Network: Advanced multi-agent research coordination
🧠 Memory Processors
Advanced context optimization with 11 specialized processors:
- TokenLimiterProcessor: Filters messages exceeding token limits
- PersonalizationProcessor: Boosts user-relevant messages
- ErrorCorrectionProcessor: Deduplicates content via checksum
- HierarchicalMemoryProcessor: Filters episodic vs semantic content
- CitationExtractorProcessor: Prioritizes messages with citations
- MultiPerspectiveProcessor: Scores messages from multiple viewpoints
- TemporalReasoningProcessor: Handles time-based relationships
- UncertaintyQuantificationProcessor: Assigns confidence scores
- KnowledgeGraphProcessor: Builds knowledge graphs
- BayesianBeliefProcessor: Implements Bayesian belief updating
- CircuitBreakerProcessor: Provides fault tolerance
🔍 RAG Pipeline
Multi-stage retrieval and generation system:
- Vector Search: Semantic similarity search with multiple indexes
- Chunking: Intelligent document segmentation with overlap
- Reranking: Result relevance optimization
- Graph RAG: Knowledge graph-based reasoning
- Hybrid Search: Combines vector and keyword search
- Context Augmentation: Dynamic context retrieval and ranking
📊 Tracing & Observability
Comprehensive monitoring with OpenTelemetry:
- Child Spans: Detailed operation tracking
- Performance Metrics: Processing time and resource usage
- Error Tracking: Comprehensive error monitoring
- Health Checks: System health monitoring
- Cost Tracking: API usage and cost optimization
Agents 🤖
Research Agent
Conducts thorough multi-phase research with web search, content evaluation, and learning extraction.
Capabilities:
- Multi-phase research process (initial + follow-up)
- Web search integration
- Content relevance evaluation
- Learning extraction and synthesis
- Follow-up question generation
Report Agent
Generates comprehensive, well-structured reports from research data.
Capabilities:
- Research data synthesis
- Structured report generation
- Markdown formatting
- Executive summaries
- Detailed analysis sections
Evaluation Agent
Assesses content quality, relevance, and usefulness for research purposes.
Capabilities:
- Relevance scoring
- Content quality assessment
- Source credibility evaluation
- Recency analysis
- Comprehensive evaluation metrics
Learning Extraction Agent
Extracts key insights, patterns, and follow-up questions from research content.
Capabilities:
- Key learning identification
- Pattern recognition
- Follow-up question generation
- Content categorization
- Insight synthesis
Web Summarization Agent
Condenses lengthy web content into actionable summaries while preserving critical information.
Capabilities:
- Content analysis and categorization
- Intelligent summarization (80-95% reduction)
- Key insight preservation
- Source context maintenance
- Token optimization
RAG Agent
Manages vector search and retrieval-augmented generation for enhanced context and responses.
Capabilities:
- Vector embedding generation
- Semantic similarity search
- Context retrieval and ranking
- Knowledge augmentation
- Memory integration
GitHub Agent
Advanced AI-powered GitHub Assistant for complete repository and project management with Copilot integration.
Capabilities:
- Repository Management: Create, list, update, and delete repositories
- Issue Management: Create, update, list, and manage GitHub issues with comments
- Pull Request Management: Handle PR creation, updates, merging, and reviews
- Branch Management: Create, list, and delete repository branches
- User & Organization Management: Search users, manage organization members
- Advanced Copilot Integration: Delegate complex coding tasks to GitHub Copilot
- GitHub API Integration: Full access to GitHub REST API via Octokit
Copilot Integration Features:
- Task Delegation: Assign coding tasks to @github-copilot for automated implementation
- Code Analysis: Request Copilot analysis and suggestions on pull requests
- Automated PR Creation: Copilot generates code and creates pull requests automatically
Prerequisites:
- GitHub Copilot Enterprise subscription for advanced features
- Valid GITHUB_API_KEY environment variable
- Appropriate repository permissions
Workflows 🔄
Comprehensive Research Workflow
End-to-end research pipeline with human-in-the-loop approval.
Process:
- Query Input: Interactive user query collection
- Web Research: Multi-phase research execution
- Content Evaluation: Relevance and quality assessment
- Learning Extraction: Key insights and follow-up questions
- Data Consolidation: Research data synthesis
- RAG Processing: Vector search and retrieval
- Content Synthesis: Final content generation
- Report Creation: Comprehensive report generation
- Human Approval: Interactive approval workflow
Generate Report Workflow
Automated report generation from research data.
Features:
- Research data processing
- Structured report formatting
- Content organization
- Executive summaries
- Detailed analysis sections
Research Workflow
Basic research orchestration for simpler use cases.
Capabilities:
- Query processing
- Web search execution
- Basic result evaluation
- Simple report generation
Tools 🛠️
Web Search Tool
Intelligent web content discovery and extraction.
Features:
- Advanced web scraping
- Content relevance filtering
- Source credibility assessment
- Structured data extraction
- Rate limiting and error handling
Vector Query Tool
Semantic search over embedded content.
Capabilities:
- Vector similarity search
- Content ranking and scoring
- Multi-index support
- Query optimization
- Result filtering and sorting
Chunker Tool
Document segmentation for RAG applications.
Features:
- Intelligent text segmentation
- Overlap configuration
- Metadata preservation
- Format support (text, markdown, HTML)
- Chunk size optimization
Rerank Tool
Result relevance optimization and ranking.
Capabilities:
- Multi-criteria ranking
- Semantic relevance scoring
- Position-based weighting
- Custom ranking algorithms
- Performance optimization
Evaluation Tool
Content quality and relevance assessment.
Features:
- Multi-dimensional evaluation
- Scoring algorithms
- Quality metrics
- Relevance assessment
- Automated evaluation pipelines
Learning Extraction Tool
Insight mining from research data.
Capabilities:
- Pattern recognition
- Key learning identification
- Follow-up question generation
- Content categorization
- Insight synthesis and organization
GitHub Tools
Comprehensive suite of 14 specialized tools for complete GitHub API integration.
Repository Management Tools:
- createRepository: Create new repositories with custom settings
- getRepository: Retrieve detailed repository information
- updateRepository: Modify repository settings and metadata
- deleteRepository: Remove repositories (with confirmation)
- listRepositories: List user repositories with filtering options
Branch Management Tools:
- listBranches: List all branches in a repository
- getBranch: Get detailed branch information
- createBranch: Create new branches from existing commits
- deleteBranch: Remove branches safely
Issue Management Tools:
- createIssue: Create new issues with titles and descriptions
- getIssue: Retrieve issue details and metadata
- updateIssue: Modify issue status, title, and body
- listIssues: List repository issues with filtering (open/closed/all)
Pull Request Management Tools:
- createPullRequest: Create pull requests between branches
- getPullRequest: Get detailed PR information
- updatePullRequest: Modify PR title, body, and state
- mergePullRequest: Merge PRs with various merge methods
- listPullRequests: List PRs with status filtering
Additional Tools:
- search: Advanced GitHub search across repositories
- getUser: Retrieve user profile information
- listOrganizations: List user organizations
- createComment: Add comments to issues and PRs
All tools support:
- Full GitHub API integration via Octokit
- Comprehensive error handling and logging
- Type-safe input validation with Zod schemas
- Rate limiting and API quota management
Networks 🌐
Complex Research Network
Advanced multi-agent collaboration system for complex research tasks.
Features:
- Multi-agent coordination
- Dynamic task distribution
- Inter-agent communication
- Collaborative problem-solving
- Network optimization
- Scalable architecture
MCP Integration 🔌
Model Context Protocol
Standardized integration with AI tools and services.
Capabilities:
- Tool discovery and registration
- Real-time communication
- Standardized interfaces
- Extensible architecture
- Cross-platform compatibility
MCP Server
Custom MCP server implementation for enhanced tool integration; a client connection sketch follows at the end of this section.
Features:
- Tool registration and management
- Real-time communication channels
- Error handling and recovery
- Performance monitoring
- Security and authentication
GitHub Tools Integration:
- All 14 GitHub tools available through MCP
- Standardized tool interface for GitHub operations
- Type-safe API interactions
- Rate limiting and error handling
- Full GitHub API coverage via Octokit
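As a rough illustration (not the repo's own client code), the sketch below connects an MCP-compatible client to this server over stdio using the reference @modelcontextprotocol/sdk. The launch command assumes the npm run mcp:server script shown later in the Quick Start.

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the MCP server as a child process and communicate over stdio.
const transport = new StdioClientTransport({
  command: 'npm',
  args: ['run', 'mcp:server'],
});

const client = new Client({ name: 'deep-research-client', version: '1.0.0' });
await client.connect(transport);

// Discover the registered tools (including the GitHub tools listed above).
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));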
RAG Pipeline 🔍
Advanced Retrieval-Augmented Generation System
The system implements a sophisticated multi-stage RAG pipeline with the following components:
Vector Search & Indexing
- Multiple Vector Indexes: Separate indexes for research documents, web content, learnings, and reports
- Semantic Similarity: Advanced embedding-based search with configurable similarity thresholds
- Hybrid Search: Combines vector similarity with keyword-based search for optimal results
- Index Management: Automatic index creation, optimization, and health monitoring
Document Processing Pipeline
- Intelligent Chunking: Overlapping text segmentation with metadata preservation
- Content Extraction: Advanced parsing for various document formats (PDF, HTML, markdown)
- Metadata Enrichment: Automatic extraction of titles, authors, dates, and content categories
- Quality Filtering: Content validation and relevance scoring
Retrieval & Ranking
- Multi-stage Retrieval: Initial broad search followed by focused reranking (see the sketch after this section)
- Context Augmentation: Dynamic context window expansion based on query complexity
- Relevance Scoring: Multi-criteria ranking combining semantic similarity, recency, and authority
- Deduplication: Intelligent removal of redundant information
Graph RAG Integration
- Knowledge Graph Construction: Automatic relationship extraction and graph building
- Graph-based Reasoning: Path finding and relationship traversal for complex queries
- Entity Resolution: Named entity recognition and disambiguation
- Temporal Reasoning: Time-based relationship analysis and event sequencing
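To make the retrieve-then-rerank flow concrete, here is a simplified sketch that reuses the vectorQuery and rerank tools via the execute({ context }) pattern shown in the Usage Examples. The rerank tool id and its parameter names are assumptions; check the tool definitions in this repo for the exact schema.

import { mastra } from './src/mastra';

async function retrieveWithRerank(query: string) {
  // Stage 1: broad semantic search over the research index.
  const vectorTool = mastra.getTool('vectorQuery');
  const candidates = await vectorTool.execute({
    context: { query, topK: 20, indexName: 'research_documents' },
  });

  // Stage 2: rerank the candidates for relevance (parameter names illustrative).
  const rerankTool = mastra.getTool('rerank');
  return rerankTool.execute({
    context: { query, documents: candidates, topK: 5 },
  });
}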
Memory Processors 🧠
Advanced Context Optimization System
The system features 11 specialized memory processors designed to optimize context management and information retrieval (a simplified processor sketch appears at the end of this section):
Core Processing Categories
Content Optimization:
- TokenLimiterProcessor: Prevents context overflow by filtering messages exceeding token limits
- ErrorCorrectionProcessor: Eliminates duplicate content using checksum-based deduplication
- CircuitBreakerProcessor: Provides fault tolerance and prevents cascading failures
Semantic Enhancement:
- PersonalizationProcessor: Boosts relevance of user-specific content and preferences
- HierarchicalMemoryProcessor: Distinguishes between episodic and semantic memory content
- CitationExtractorProcessor: Prioritizes messages containing citations and references
Advanced Reasoning:
- MultiPerspectiveProcessor: Evaluates content from multiple analytical viewpoints
- TemporalReasoningProcessor: Handles time-based relationships and chronological ordering
- UncertaintyQuantificationProcessor: Assigns confidence scores to information reliability
Knowledge Integration:
- KnowledgeGraphProcessor: Constructs and maintains knowledge graphs from conversation data
- BayesianBeliefProcessor: Implements probabilistic reasoning for belief updating
Performance Optimizations
- WeakMap Caching: Efficient memory usage with garbage collection-friendly caching
- Pre-compiled Regex: Optimized pattern matching for content analysis
- SIMD-like Processing: Batch processing optimizations for token estimation
- Lazy Evaluation: Deferred computation with memoization for expensive operations
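To make the idea concrete, here is a stripped-down sketch of the filtering logic behind a token-limiting processor, including the WeakMap caching mentioned above. The real processors in this repo plug into Mastra's memory pipeline, so the standalone function signatures below are illustrative only.

type StoredMessage = { role: 'user' | 'assistant' | 'system'; content: string };

// Cache token estimates per message object; WeakMap keeps the cache GC-friendly.
const tokenCache = new WeakMap<StoredMessage, number>();

function estimateTokens(message: StoredMessage): number {
  const cached = tokenCache.get(message);
  if (cached !== undefined) return cached;
  // Rough heuristic: roughly 4 characters per token.
  const estimate = Math.ceil(message.content.length / 4);
  tokenCache.set(message, estimate);
  return estimate;
}

// Keep the most recent messages that fit inside the token budget.
function limitTokens(messages: StoredMessage[], maxTokens: number): StoredMessage[] {
  const kept: StoredMessage[] = [];
  let total = 0;
  for (const message of [...messages].reverse()) {
    const tokens = estimateTokens(message);
    if (total + tokens > maxTokens) break;
    kept.unshift(message);
    total += tokens;
  }
  return kept;
}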
Tracing & Observability 📊
Comprehensive Monitoring System
The system implements advanced observability using OpenTelemetry for complete system monitoring:
Tracing Infrastructure
- OpenTelemetry Integration: Industry-standard tracing with vendor-neutral APIs
- Child Spans: Detailed operation tracking with hierarchical span relationships (illustrated in the sketch at the end of this section)
- Distributed Tracing: End-to-end request tracking across all system components
- Custom Span Attributes: Rich metadata for debugging and performance analysis
Performance Monitoring
- Processing Time Tracking: Detailed timing for all operations and workflows
- Resource Usage Metrics: Memory, CPU, and I/O utilization monitoring
- API Call Tracking: Request/response monitoring with error classification
- Cache Hit Analysis: Explicit and implicit caching performance metrics
Error Tracking & Diagnostics
- Comprehensive Error Monitoring: Structured error logging with context
- Exception Propagation: Error tracking through the entire call stack
- Failure Pattern Analysis: Automated detection of recurring issues
- Health Check Integration: System health monitoring with automated recovery
Cost Optimization
- API Usage Tracking: Detailed cost analysis for external API calls
- Cache Efficiency Metrics: Cost savings calculations for cached content
- Resource Optimization: Identification of performance bottlenecks
- Usage Analytics: Historical usage patterns and trend analysis
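For illustration, this is roughly what parent/child span creation looks like with the vendor-neutral @opentelemetry/api package. The tracer and span names are placeholders; in this repo, tracing is wired through Mastra's telemetry configuration rather than set up by hand like this.

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('deep-research');

export async function tracedResearchStep(query: string) {
  return tracer.startActiveSpan('research.step', async (parent) => {
    parent.setAttribute('research.query', query);
    try {
      // Child span: the nested operation appears under the parent in the trace tree.
      await tracer.startActiveSpan('research.webSearch', async (child) => {
        // ... perform the web search here ...
        child.end();
      });
      return 'done';
    } finally {
      parent.end();
    }
  });
}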
Installation 📦
Prerequisites
- Node.js 20.9+
- npm or pnpm
- Google AI API key
- Database URL (LibSQL)
- GitHub Personal Access Token (for GitHub features)
Setup
# Clone the repository
git clone https://github.com/ssdeanx/deep-research.git
cd deep-research
# Install all dependencies (backend + frontend)
npm install
# Copy environment file
cp .env.example .env
# Configure environment variables
# Edit .env with your API keys and configuration
Frontend Setup
The project includes a modern React frontend built with Vite, TypeScript, and Tailwind CSS:
# Install shadcn/ui components (already included in package.json)
# All 47 UI components are pre-configured and ready to use
# Start the frontend development server
npm run dev:frontend
# Or start both backend and frontend together
npm run dev:full
# Build for production
npm run build:frontend
Backend Setup
The Mastra backend provides the AI orchestration layer:
# Start the Mastra development server
npm run dev
# Build for production
npm run build
# Start production server
npm run start
Environment Configuration
# Google AI Configuration
GOOGLE_GENERATIVE_AI_API_KEY=your_google_ai_api_key
# Database Configuration
DATABASE_URL=file:./mastra.db
# GitHub API Configuration
GITHUB_API_KEY=your_github_personal_access_token
# Optional: Additional API keys
EXA_API_KEY=your_exa_api_key
# Logging and Monitoring
LOG_LEVEL=info
GitHub API Setup
To use GitHub features, you need to configure a GitHub Personal Access Token:
- Create a Personal Access Token:
  - Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  - Generate a new token with the following permissions:
    - repo (Full control of private repositories)
    - public_repo (Access public repositories)
    - read:org (Read org and team membership)
    - read:user (Read ALL user profile data)
    - read:project (Read project boards)
- Add to Environment:
  GITHUB_API_KEY=ghp_your_token_here
- Copilot Enterprise (Optional):
  - For advanced Copilot integration features
  - Requires GitHub Copilot Enterprise subscription
  - Enables automated task delegation and code generation
Quick Start 🚀
Backend API Usage
import { mastra } from './src/mastra';
// Start comprehensive research workflow
const workflow = mastra.getWorkflow('comprehensive-research-workflow');
const run = await workflow.createRunAsync();
const result = await run.start({
inputData: {}
});
// Handle workflow suspension for user input
if (result.status === 'suspended') {
// Resume with user query
const resumeResult = await run.resume({
step: 'get-user-query',
resumeData: { query: 'quantum computing advancements' }
});
}
Frontend Usage
# Start the full-stack application
npm run dev:full
# Frontend will be available at http://localhost:4000
# Backend API will be available at http://localhost:4111
Frontend Features:
- Interactive Research Interface: Start research workflows through the web UI
- Real-time Workflow Monitoring: Track workflow progress and status updates
- Agent Management Dashboard: View and interact with all available agents
- Responsive Design: Works seamlessly on desktop and mobile devices
- Dark Mode Support: Automatic theme switching based on system preferences
Using Individual Agents
// Research agent example
const researchAgent = mastra.getAgent('researchAgent');
const researchResult = await researchAgent.generate([
{ role: 'user', content: 'Research AI advancements in healthcare' }
]);
// Report generation
const reportAgent = mastra.getAgent('reportAgent');
const report = await reportAgent.generate([
{ role: 'user', content: `Generate report from: ${researchResult.text}` }
]);
MCP Integration
# Start MCP server
npm run mcp:server
# The server will be available for MCP-compatible clients
Usage Examples 📚
Complete Research Pipeline
import { mastra } from './src/mastra';
async function conductResearch(query: string) {
// Initialize workflow
const workflow = mastra.getWorkflow('comprehensive-research-workflow');
const run = await workflow.createRunAsync();
// Start research process
let result = await run.start({ inputData: {} });
// Handle user query input
if (result.status === 'suspended') {
result = await run.resume({
step: 'get-user-query',
resumeData: { query }
});
}
// Process through research phases
// (additional resume calls for other suspension points)
// Get final result
if (result.status === 'success') {
return result.output;
}
}
Custom Agent Usage
// Custom research with specific parameters
const researchAgent = mastra.getAgent('researchAgent');
const result = await researchAgent.generate([
{
role: 'user',
content: `Conduct deep research on: ${topic}
Focus areas: ${focusAreas.join(', ')}
Depth level: ${depth}
Include sources from: ${sources.join(', ')}`
}
], {
// Custom generation options
temperature: 0.7,
maxTokens: 4000
});
Tool Integration
// Direct tool usage
const webSearchTool = mastra.getTool('webSearch');
const searchResults = await webSearchTool.execute({
context: {
query: 'artificial intelligence trends 2024',
limit: 10
}
});
// Vector search
const vectorTool = mastra.getTool('vectorQuery');
const vectorResults = await vectorTool.execute({
context: {
query: 'machine learning algorithms',
topK: 5,
indexName: 'research-data'
}
});
GitHub Agent Usage
// Using the GitHub Agent for repository management
const githubAgent = mastra.getAgent('githubAgent');
// Create a new repository
const createResult = await githubAgent.generate([
{
role: 'user',
content: 'Create a new repository called "my-awesome-project" with description "An awesome project" and make it private'
}
]);
// List user repositories
const listResult = await githubAgent.generate([
{
role: 'user',
content: 'List all my repositories'
}
]);
// Create and manage issues
const issueResult = await githubAgent.generate([
{
role: 'user',
content: 'Create a new issue in repository "my-org/my-repo" with title "Bug: Login not working" and description "Users cannot log in after recent update"'
}
]);
GitHub Tools Direct Usage
// Direct GitHub tool usage
const createRepoTool = mastra.getTool('createRepository');
const repoResult = await createRepoTool.execute({
context: {
name: 'my-new-repo',
description: 'A new repository created via API',
private: true
}
});
// List pull requests
const listPRsTool = mastra.getTool('listPullRequests');
const prsResult = await listPRsTool.execute({
context: {
owner: 'my-org',
repo: 'my-repo',
state: 'open'
}
});
// Create a pull request
const createPRTool = mastra.getTool('createPullRequest');
const prResult = await createPRTool.execute({
context: {
owner: 'my-org',
repo: 'my-repo',
title: 'Add new feature',
head: 'feature-branch',
base: 'main',
body: 'This PR adds a new feature...'
}
});
Copilot Integration Examples
// Delegate a coding task to GitHub Copilot
const copilotResult = await githubAgent.generate([
{
role: 'user',
content: 'Have Copilot implement a user authentication system for repository "my-org/my-app". The system should include login, logout, and password reset functionality.'
}
]);
// Request Copilot analysis of a pull request
const analysisResult = await githubAgent.generate([
{
role: 'user',
content: 'Ask Copilot to analyze PR #123 in repository "my-org/my-repo" and provide suggestions for improvement'
}
]);
Configuration ⚙️
Agent Configuration
// Custom agent configuration
const customResearchAgent = new Agent({
name: 'Custom Research Agent',
instructions: 'Custom research instructions...',
model: createGemini25Provider('gemini-2.5-flash-lite-preview-06-17', {
responseModalities: ["TEXT"],
thinkingConfig: {
thinkingBudget: -1,
includeThoughts: false,
},
useSearchGrounding: true,
dynamicRetrieval: true,
safetyLevel: 'OFF',
structuredOutputs: true,
}),
tools: [webSearchTool, evaluateResultTool],
memory: createResearchMemory(),
});
Workflow Configuration
// Custom workflow configuration
const customWorkflow = createWorkflow({
id: 'custom-research-workflow',
inputSchema: z.object({
query: z.string(),
depth: z.enum(['shallow', 'medium', 'deep'])
}),
outputSchema: z.object({
results: z.array(z.any()),
summary: z.string()
})
})
.then(searchStep)
.then(analyzeStep)
.commit();
Storage Configuration
// Advanced LibSQL storage with dual databases
import { createLibSQLStore } from './src/mastra/config/libsql-storage';
const storage = createLibSQLStore({
tracingContext: {
context: {},
runtimeContext: {
userId: 'user123',
sessionId: 'session456'
}
}
});
// Vector search configuration
const vectorSearch = {
researchDocuments: {
indexName: 'research_documents',
dimensions: 1536,
metric: 'cosine'
},
webContent: {
indexName: 'web_content',
dimensions: 1536,
metric: 'cosine'
},
learnings: {
indexName: 'learnings',
dimensions: 1536,
metric: 'cosine'
}
};
Memory Processor Configuration
// Advanced memory configuration with specialized processors
import { createMemoryWithProcessors } from './src/mastra/config/memory-processors';
const memory = createMemoryWithProcessors({
storage,
processors: [
new TokenLimiterProcessor({ maxTokens: 4000 }),
new PersonalizationProcessor({ userId: 'user123' }),
new ErrorCorrectionProcessor({ enableChecksum: true }),
new CircuitBreakerProcessor({ failureThreshold: 5 }),
new BayesianBeliefProcessor({ confidenceThreshold: 0.8 }),
new KnowledgeGraphProcessor({ enableRelationships: true })
],
options: {
lastMessages: 100,
semanticRecall: {
topK: 10,
threshold: 0.7
}
}
});
Google AI Configuration
// Advanced Google AI configuration with Gemini 2.5 features
import { createMastraGoogleProvider } from './src/mastra/config/googleProvider';
const googleModel = createMastraGoogleProvider('gemini-2.5-flash-lite-preview-06-17', {
useSearchGrounding: true,
dynamicRetrieval: true,
cachedContent: 'cached-research-context',
safetyLevel: 'MODERATE',
structuredOutputs: true
});
// Explicit caching setup
const cacheManager = createCacheManager();
const cachedModel = await createCachedGoogleModel('gemini-2.5-flash-preview-05-20', {
cacheManager,
cacheContents: [{ role: 'user', parts: [{ text: 'Research context...' }] }],
cacheTtlSeconds: 300
});
API Reference 📖
Core Classes
Mastra
Main application class managing agents, workflows, and tools.
Methods:
- getAgent(name: string): Agent
- getWorkflow(name: string): Workflow
- getTool(name: string): Tool
- getStorage(): Storage
Agent
AI agent with specific capabilities and tools.
Methods:
- generate(messages: Message[], options?: GenerateOptions): Promise<GenerateResult>
- stream(messages: Message[], options?: StreamOptions): Promise<StreamResult>
Workflow
Orchestrated sequence of operations.
Methods:
- createRunAsync(): Promise<WorkflowRun>
- start(inputData: any): Promise<WorkflowResult>
- resume(step: string, resumeData: any): Promise<WorkflowResult>
Tool Interface
Standardized tool interface for extensibility.
Methods:
- execute(context: ToolContext): Promise<ToolResult>
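A minimal sketch of a custom tool built against this interface, assuming Mastra's createTool helper and Zod schemas. The id, schemas, and execute({ context }) shape follow the pattern used throughout this README; check the repo's existing tools for the exact conventions.

import { createTool } from '@mastra/core/tools';
import { z } from 'zod';

export const wordCountTool = createTool({
  id: 'wordCount',
  description: 'Counts the words in a piece of text',
  inputSchema: z.object({
    text: z.string().describe('Text to analyze'),
  }),
  outputSchema: z.object({
    words: z.number(),
  }),
  // The validated input arrives on context; return data matching outputSchema.
  execute: async ({ context }) => {
    return { words: context.text.trim().split(/\s+/).filter(Boolean).length };
  },
});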
Contributing 🤝
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Fork and clone
git clone https://github.com/your-username/deep-research.git
cd deep-research
# Install dependencies
npm install
# Set up development environment
cp .env.example .env
# Configure your environment variables
# Run tests
npm test
# Start development server
npm run dev
Code Standards
- TypeScript for type safety
- ESLint for code quality
- Prettier for code formatting
- Comprehensive test coverage
- Clear documentation
License 📄
This project is licensed under the ISC License - see the LICENSE file for details.
Acknowledgments 🙏
- Mastra Framework: For the powerful AI orchestration capabilities
- Google AI: For the Gemini models and AI services
- LibSQL: For the high-performance database solution
- OpenTelemetry: For observability and tracing
- Zod: For runtime type validation
Built with ❤️ using Mastra, Google AI, and modern web technologies. Inspired by the pursuit of enlightened knowledge discovery through ethical AI development.