airavata
Reimplementation of Orchestration, Scheduling, and Credential Management in Golang
Conceptual Model
This PR is more of an RFC for thinking through the fundamental problem that Airavata should tackle. The POC is built around six core domain interfaces that represent the fundamental operations of distributed task execution, and it uses a gRPC-based worker system to execute tasks:
┌─────────────────────────────────────────────────────────────────┐
│                     Core Domain Interfaces                      │
├─────────────────────────────────────────────────────────────────┤
│ ResourceRegistry    │ CredentialVault     │ ExperimentOrch      │
│ • Register compute  │ • Secure storage    │ • Create exper      │
│ • Register storage  │ • Unix permissions  │ • Generate tasks    │
│ • Validate access   │ • Encrypt/decrypt   │ • Submit for exec   │
├─────────────────────────────────────────────────────────────────┤
│ TaskScheduler       │ DataMover           │ WorkerLifecycle     │
│ • Cost optimization │ • 3-hop staging     │ • Spawn workers     │
│ • Worker distrib    │ • Persistent cache  │ • gRPC workers      │
│ • Atomic assignment │ • Lineage tracking  │ • Task execution    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       gRPC Worker System                        │
├─────────────────────────────────────────────────────────────────┤
│ Worker Binary       │ Script Generation   │ Task Execution      │
│ • Standalone exec   │ • SLURM scripts     │ • Poll for tasks    │
│ • gRPC client       │ • K8s manifests     │ • Execute tasks     │
│ • Auto-deployment   │ • Bare metal scripts│ • Report results    │
└─────────────────────────────────────────────────────────────────┘
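To make the "atomic assignment" and "poll for tasks" boxes concrete, here is a minimal Go sketch of how a scheduler interface and polling workers could fit together. The names `TaskScheduler`, `Submit`, `Assign`, `memScheduler`, and `drain` are assumptions made for illustration, not identifiers taken from the PR. The in-memory queue and goroutines stand in for whatever persistent store and gRPC transport the POC actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work generated from an experiment.
type Task struct {
	ID       string
	WorkerID string // empty until the task is assigned
}

// TaskScheduler is an assumed shape for the scheduler interface in the
// diagram; the method names are illustrative only.
type TaskScheduler interface {
	Submit(t Task)
	// Assign hands the next pending task to exactly one worker.
	Assign(workerID string) (Task, error)
}

// ErrNoTask is returned when nothing is waiting to be assigned.
var ErrNoTask = errors.New("no pending task")

// memScheduler is an in-memory stand-in; the mutex makes assignment atomic.
type memScheduler struct {
	mu      sync.Mutex
	pending []Task
}

func (s *memScheduler) Submit(t Task) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending = append(s.pending, t)
}

func (s *memScheduler) Assign(workerID string) (Task, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.pending) == 0 {
		return Task{}, ErrNoTask
	}
	t := s.pending[0]
	s.pending = s.pending[1:]
	t.WorkerID = workerID
	return t, nil
}

// drain simulates workers polling concurrently until the queue is empty and
// returns how many times each task ID was handed out.
func drain(s TaskScheduler, workers int) map[string]int {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		counts = make(map[string]int)
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			for {
				t, err := s.Assign(id)
				if err != nil {
					return // queue empty; a real worker would sleep and poll again
				}
				mu.Lock()
				counts[t.ID]++
				mu.Unlock()
			}
		}(fmt.Sprintf("worker-%d", w))
	}
	wg.Wait()
	return counts
}

func main() {
	s := &memScheduler{}
	for i := 0; i < 5; i++ {
		s.Submit(Task{ID: fmt.Sprintf("task-%d", i)})
	}
	// Every task should appear with a count of 1, regardless of worker count.
	fmt.Println(drain(s, 3))
}
```

A database-backed scheduler would need the same guarantee from its store, for example through a transactional "claim" update, rather than from a process-local mutex.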
Credential Management Architecture
The Airavata Scheduler implements a three-layer credential architecture that keeps domain data (PostgreSQL), authorization logic (SpiceDB), and secret storage (OpenBao) in separate systems, for security and scalability:
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│           (Experiments, Resources, Users, Groups)           │
└────────────┬────────────────────────────────────────────────┘
             │
             ├──────────────────┬──────────────────┐
             │                  │                  │
┌────────────▼─────┐   ┌────────▼────────┐   ┌─────▼──────────┐
│   PostgreSQL     │   │    SpiceDB      │   │    OpenBao     │
│                  │   │                 │   │                │
│  Domain Data     │   │  Authorization  │   │  Secrets       │
│  - Users         │   │  - Permissions  │   │  - SSH Keys    │
│  - Groups        │   │  - Ownership    │   │  - Passwords   │
│  - Experiments   │   │  - Sharing      │   │  - Tokens      │
│  - Resources     │   │  - Hierarchies  │   │  (Encrypted)   │
└──────────────────┘   └─────────────────┘   └────────────────┘
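The separation in the diagram implies a simple read path: ask the authorization layer first, and only then touch the secret store. The Go sketch below illustrates that ordering with in-memory stand-ins. `Authorizer`, `SecretStore`, `CredentialService`, and their methods are hypothetical names chosen for this example; they are not the SpiceDB or OpenBao client APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// Authorizer stands in for the authorization layer (SpiceDB in the diagram).
type Authorizer interface {
	Check(userID, permission, credentialID string) bool
}

// SecretStore stands in for the secrets layer (OpenBao in the diagram).
type SecretStore interface {
	Get(credentialID string) ([]byte, error)
}

var (
	ErrDenied   = errors.New("permission denied")
	ErrNotFound = errors.New("credential not found")
)

// memAuthz is an in-memory set of allowed (user, permission, credential) tuples.
type memAuthz map[string]bool

func tupleKey(userID, permission, credentialID string) string {
	return userID + "|" + permission + "|" + credentialID
}

func (a memAuthz) Check(userID, permission, credentialID string) bool {
	return a[tupleKey(userID, permission, credentialID)]
}

// memSecrets is an in-memory map from credential ID to secret bytes.
type memSecrets map[string][]byte

func (s memSecrets) Get(credentialID string) ([]byte, error) {
	v, ok := s[credentialID]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

// CredentialService composes the two layers; it never reads a secret
// before the authorization check has passed.
type CredentialService struct {
	Authz   Authorizer
	Secrets SecretStore
}

func (c CredentialService) Read(userID, credentialID string) ([]byte, error) {
	if !c.Authz.Check(userID, "read", credentialID) {
		return nil, ErrDenied
	}
	return c.Secrets.Get(credentialID)
}

func main() {
	svc := CredentialService{
		Authz:   memAuthz{tupleKey("alice", "read", "ssh-key-1"): true},
		Secrets: memSecrets{"ssh-key-1": []byte("example-secret")},
	}
	_, err := svc.Read("bob", "ssh-key-1")
	fmt.Println(err) // permission denied
}
```

Because the service depends only on the two interfaces, either backend can be swapped or mocked in tests without touching the other.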
Key Benefits
- Separation of Concerns: Authorization (SpiceDB) separate from secret storage (OpenBao)
- Fine-grained Permissions: Read/write/delete permissions with hierarchical group inheritance
- Complete Audit Trail: All operations logged across all three systems
- Credential Rotation: Support for automatic key rotation with zero downtime
- Group Management: Groups can contain groups with transitive permission inheritance
- Resource Binding: Credentials bound to specific compute/storage resources
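Transitive group inheritance, as described in the list above, amounts to a graph walk: a subject holds a permission if it, or any group reachable through membership, was granted it. The Go sketch below models that idea in memory. In the architecture described here the resolution would be delegated to SpiceDB, so the `Graph` type and its methods are purely illustrative assumptions.

```go
package main

import "fmt"

// Graph records direct memberships and direct grants.
type Graph struct {
	memberOf map[string][]string        // subject -> groups that directly contain it
	grants   map[string]map[string]bool // subject -> set of "permission:credential"
}

func NewGraph() *Graph {
	return &Graph{
		memberOf: make(map[string][]string),
		grants:   make(map[string]map[string]bool),
	}
}

// AddMember places member (a user or another group) directly inside group.
func (g *Graph) AddMember(group, member string) {
	g.memberOf[member] = append(g.memberOf[member], group)
}

// Grant gives subject a permission on a credential.
func (g *Graph) Grant(subject, permission, credentialID string) {
	if g.grants[subject] == nil {
		g.grants[subject] = make(map[string]bool)
	}
	g.grants[subject][permission+":"+credentialID] = true
}

// HasPermission walks membership edges breadth-first; the seen set keeps
// the walk finite even if groups contain each other in a cycle.
func (g *Graph) HasPermission(subject, permission, credentialID string) bool {
	want := permission + ":" + credentialID
	seen := map[string]bool{subject: true}
	queue := []string{subject}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if g.grants[cur][want] {
			return true
		}
		for _, parent := range g.memberOf[cur] {
			if !seen[parent] {
				seen[parent] = true
				queue = append(queue, parent)
			}
		}
	}
	return false
}

func main() {
	g := NewGraph()
	g.AddMember("team", "alice") // alice is in team
	g.AddMember("org", "team")   // team is in org
	g.Grant("org", "read", "ssh-key-1")

	fmt.Println(g.HasPermission("alice", "read", "ssh-key-1"))  // true
	fmt.Println(g.HasPermission("alice", "write", "ssh-key-1")) // false
}
```

Here `alice` inherits `read` through two hops (`team`, then `org`) without any direct grant.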