
Reimplementation of Orchestration, Scheduling, and Credential Management in Golang

Open • yasithdev opened this issue 6 months ago • 0 comments

🎯 Conceptual Model

This PR is more of an RFC for thinking through the fundamental problem that Airavata should tackle. The POC is built around six core domain interfaces that represent the fundamental operations of distributed task execution, and it uses a gRPC-based worker system to execute tasks:

┌─────────────────────────────────────────────────────────────────┐
│                    Core Domain Interfaces                       │
├─────────────────────────────────────────────────────────────────┤
│  ResourceRegistry    │  CredentialVault     │  ExperimentOrch   │
│  • Register compute  │  • Secure storage    │  • Create exper   │
│  • Register storage  │  • Unix permissions  │  • Generate tasks │
│  • Validate access   │  • Encrypt/decrypt   │  • Submit for exec│
├─────────────────────────────────────────────────────────────────┤
│  TaskScheduler       │  DataMover           │  WorkerLifecycle  │
│  • Cost optimization │  • 3-hop staging     │  • Spawn workers  │
│  • Worker distrib    │  • Persistent cache  │  • gRPC workers   │
│  • Atomic assignment │  • Lineage tracking  │  • Task execution │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    gRPC Worker System                           │
├─────────────────────────────────────────────────────────────────┤
│  Worker Binary       │  Script Generation   │  Task Execution   │
│  • Standalone exec   │  • SLURM scripts     │  • Poll for tasks │
│  • gRPC client       │  • K8s manifests     │  • Execute tasks  │
│  • Auto-deployment   │  • Bare metal scripts│  • Report results │
└─────────────────────────────────────────────────────────────────┘
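The poll / execute / report cycle of the worker system, together with the scheduler's "Atomic assignment" property, can be sketched in a few lines of Go. This is a minimal, in-process sketch under stated assumptions: the names `Task`, `Scheduler`, `Poll`, `Report`, `RunWorkers` and `makeTasks` are hypothetical rather than taken from the POC, and the gRPC transport is replaced by direct method calls so the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical types for illustration; they are not the POC's actual API.

// Task is a unit of work handed to a worker.
type Task struct {
	ID      string
	Command string
}

// Result is what a worker reports after executing a Task.
type Result struct {
	TaskID   string
	WorkerID string
	Output   string
}

// Scheduler queues tasks, assigns them to polling workers and collects results.
// In the POC this role sits behind gRPC; here workers call it directly.
type Scheduler struct {
	mu       sync.Mutex
	queue    []Task
	assigned map[string]string // task ID -> worker ID
	results  []Result
}

func NewScheduler(tasks []Task) *Scheduler {
	return &Scheduler{queue: tasks, assigned: make(map[string]string)}
}

// Poll pops the next task and records its owner under one lock, so two
// concurrent workers can never receive the same task.
func (s *Scheduler) Poll(workerID string) (Task, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.queue) == 0 {
		return Task{}, false
	}
	t := s.queue[0]
	s.queue = s.queue[1:]
	s.assigned[t.ID] = workerID
	return t, true
}

// Report stores the result of a finished task.
func (s *Scheduler) Report(r Result) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.results = append(s.results, r)
}

// runWorker is one worker's poll -> execute -> report loop.
func runWorker(id string, s *Scheduler, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		t, ok := s.Poll(id)
		if !ok {
			return // queue empty; a long-running worker would wait and poll again
		}
		// Placeholder for real execution of t.Command.
		s.Report(Result{TaskID: t.ID, WorkerID: id, Output: "executed: " + t.Command})
	}
}

// RunWorkers starts n concurrent workers and waits until the queue is drained.
func RunWorkers(s *Scheduler, n int) {
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go runWorker(fmt.Sprintf("worker-%d", w), s, &wg)
	}
	wg.Wait()
}

// makeTasks builds n placeholder tasks with distinct IDs.
func makeTasks(n int) []Task {
	tasks := make([]Task, n)
	for i := range tasks {
		tasks[i] = Task{ID: fmt.Sprintf("task-%d", i), Command: "echo hello"}
	}
	return tasks
}

func main() {
	s := NewScheduler(makeTasks(10))
	RunWorkers(s, 3)
	fmt.Println("results:", len(s.results), "assigned:", len(s.assigned))
}
```

Running it prints `results: 10 assigned: 10`. A mutex is enough for atomic assignment inside one scheduler process; if several scheduler replicas shared one queue, the same guarantee would have to come from the shared store instead (for example, a row lock in PostgreSQL).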

πŸ” Credential Management Architecture

The Airavata Scheduler implements a three-layer credential architecture that keeps domain data (PostgreSQL), authorization (SpiceDB), and secret material (OpenBao) in separate systems, so that authorization logic is decoupled from secret storage:

┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                        │
│   (Experiments, Resources, Users, Groups)                   │
└────────────┬────────────────────────────────────────────────┘
             │
             ├──────────────────┬──────────────────┐
             │                  │                  │
┌────────────▼─────┐  ┌─────────▼───────┐  ┌───────▼────────┐
│   PostgreSQL     │  │    SpiceDB      │  │   OpenBao      │
│                  │  │                 │  │                │
│  Domain Data     │  │  Authorization  │  │  Secrets       │
│  - Users         │  │  - Permissions  │  │  - SSH Keys    │
│  - Groups        │  │  - Ownership    │  │  - Passwords   │
│  - Experiments   │  │  - Sharing      │  │  - Tokens      │
│  - Resources     │  │  - Hierarchies  │  │  (Encrypted)   │
└──────────────────┘  └─────────────────┘  └────────────────┘
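The split between SpiceDB and OpenBao implies a fixed order for every credential read: ask the authorizer first, and only then fetch the secret. Below is a minimal Go sketch of that flow, assuming two hypothetical interfaces (`Authorizer`, `SecretStore`) that stand in for SpiceDB and OpenBao; the real client libraries are not used, and the `user:`/`credential:` naming and the `credentials/` path prefix are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical interfaces; the real SpiceDB and OpenBao client APIs are not used.

// Authorizer answers "may subject perform permission on resource?" (SpiceDB's role).
type Authorizer interface {
	Check(subject, permission, resource string) bool
}

// SecretStore returns raw secret material by path (OpenBao's role).
type SecretStore interface {
	Get(path string) (string, error)
}

// ErrForbidden is returned when the authorizer rejects a request.
var ErrForbidden = errors.New("permission denied")

// CredentialService only reaches the secret store after the authorizer
// has approved the request.
type CredentialService struct {
	authz   Authorizer
	secrets SecretStore
}

func (c *CredentialService) Read(user, credentialID string) (string, error) {
	if !c.authz.Check("user:"+user, "read", "credential:"+credentialID) {
		return "", ErrForbidden
	}
	return c.secrets.Get("credentials/" + credentialID)
}

// In-memory stand-ins so the sketch runs without SpiceDB or OpenBao.

// memAuthz maps "subject|permission|resource" to an allow decision.
type memAuthz map[string]bool

func (m memAuthz) Check(subject, permission, resource string) bool {
	return m[subject+"|"+permission+"|"+resource]
}

// memSecrets counts lookups to show that denied requests never touch it.
type memSecrets struct {
	data  map[string]string
	reads int
}

func (m *memSecrets) Get(path string) (string, error) {
	m.reads++
	v, ok := m.data[path]
	if !ok {
		return "", fmt.Errorf("secret %q not found", path)
	}
	return v, nil
}

func main() {
	store := &memSecrets{data: map[string]string{"credentials/cluster-ssh": "dummy-private-key"}}
	svc := &CredentialService{
		authz:   memAuthz{"user:alice|read|credential:cluster-ssh": true},
		secrets: store,
	}

	if _, err := svc.Read("alice", "cluster-ssh"); err == nil {
		fmt.Println("alice: allowed")
	}
	if _, err := svc.Read("bob", "cluster-ssh"); errors.Is(err, ErrForbidden) {
		fmt.Println("bob: denied")
	}
	fmt.Println("secret store reads:", store.reads)
}
```

Running it prints `alice: allowed`, `bob: denied` and `secret store reads: 1`: Bob's request is rejected before the secret store is consulted at all.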

Key Benefits

  • 🔒 Separation of Concerns: authorization (SpiceDB) is kept separate from secret storage (OpenBao)
  • 🛡️ Fine-grained Permissions: read/write/delete permissions with hierarchical group inheritance
  • 📋 Complete Audit Trail: all operations are logged across all three systems
  • 🔄 Credential Rotation: support for automatic key rotation with zero downtime
  • 👥 Group Management: groups can contain other groups, with transitive permission inheritance
  • 🔗 Resource Binding: credentials are bound to specific compute/storage resources
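The "Fine-grained Permissions" and "Group Management" bullets imply that a permission granted to a group also applies to every member of any group nested inside it. In this design that evaluation would be SpiceDB's job; the Go sketch below only illustrates the intended semantics with a breadth-first walk up the membership graph. The `ACL` type, its methods and the sample user, group and credential names are hypothetical.

```go
package main

import "fmt"

// Permission is one of the fine-grained actions listed above.
type Permission string

const (
	Read   Permission = "read"
	Write  Permission = "write"
	Delete Permission = "delete"
)

// ACL is a hypothetical in-memory model of group membership and grants.
type ACL struct {
	memberOf map[string][]string            // subject -> groups it directly belongs to
	grants   map[string]map[Permission]bool // "subject|credential" -> granted permissions
}

func NewACL() *ACL {
	return &ACL{memberOf: map[string][]string{}, grants: map[string]map[Permission]bool{}}
}

// AddMember records that subject (a user or a group) belongs to group.
func (a *ACL) AddMember(group, subject string) {
	a.memberOf[subject] = append(a.memberOf[subject], group)
}

// Grant gives subject permission p on credential.
func (a *ACL) Grant(subject, credential string, p Permission) {
	key := subject + "|" + credential
	if a.grants[key] == nil {
		a.grants[key] = map[Permission]bool{}
	}
	a.grants[key][p] = true
}

// Can reports whether user holds p on credential, directly or through any
// group reachable via nested membership. The visited set stops membership cycles.
func (a *ACL) Can(user, credential string, p Permission) bool {
	visited := map[string]bool{user: true}
	queue := []string{user}
	for len(queue) > 0 {
		s := queue[0]
		queue = queue[1:]
		if a.grants[s+"|"+credential][p] { // indexing a nil inner map yields false
			return true
		}
		for _, g := range a.memberOf[s] {
			if !visited[g] {
				visited[g] = true
				queue = append(queue, g)
			}
		}
	}
	return false
}

func main() {
	acl := NewACL()
	acl.AddMember("lab", "alice")      // alice is in lab
	acl.AddMember("department", "lab") // lab is nested in department
	acl.Grant("department", "cluster-ssh", Read)

	fmt.Println(acl.Can("alice", "cluster-ssh", Read))  // true: alice -> lab -> department
	fmt.Println(acl.Can("alice", "cluster-ssh", Write)) // false: only read was granted
	fmt.Println(acl.Can("bob", "cluster-ssh", Read))    // false: bob belongs to no group
}
```

Permissions are kept independent here (a write grant does not imply read), since the text does not say how they relate.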

yasithdev • Oct 29 '25 19:10