ADR: Key Management Improvements
Background
OpenTDF's current key management implementation requires administrators to follow different procedures for different types of keys. KAS keys must be defined in two separate locations within the opentdf config, and other keys have their own management requirements. Some of this fragmentation stems from the separation between OpenTDF and vendor-specific capabilities, but it still adds unnecessary complexity and increases the risk of configuration errors.
The platform also lacks a standardized configuration structure for accessing key material from modern key management solutions such as:
- Cloud provider KMS services (AWS, GCP)
- Vault/OpenBAO
- Hardware Security Modules (HSM)
Goals
- Establish a unified interface for key management across all OpenTDF components
- Simplify key configuration and reduce potential for misconfiguration
- Enable seamless integration with external key management systems
- Support future scalability through standardized key handling
- Maintain backward compatibility during transition
Proposal Overview
This document proposes a consistent key management and retrieval mechanism that can be used across both OpenTDF and any vendor-specific offerings. The solution will eliminate the need for multiple configuration points while providing flexible integration with various key storage solutions.
Current State
Types of keys
KAS Keys
KAS (Key Access Server) keys can be:
- RSA or EC format
- Raw key material used locally via Go's standard crypto libraries
- References to keys backed by KMS or HSM
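As an illustration of the "raw key material used locally" case, the following is a minimal sketch of loading a PEM-encoded private key with Go's standard library and telling RSA from EC. It assumes the key is PKCS#8-encoded; the function names `parsePrivateKey` and `newECKeyPEM` are hypothetical and are not part of the platform.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parsePrivateKey decodes one PEM block holding a PKCS#8 private key and
// reports whether it is an RSA or EC key. (Hypothetical helper.)
func parsePrivateKey(pemBytes []byte) (string, any, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return "", nil, errors.New("no PEM block found")
	}
	key, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return "", nil, fmt.Errorf("parse PKCS#8 key: %w", err)
	}
	switch k := key.(type) {
	case *rsa.PrivateKey:
		return "rsa", k, nil
	case *ecdsa.PrivateKey:
		return "ec", k, nil
	default:
		return "", nil, fmt.Errorf("unsupported key type %T", key)
	}
}

// newECKeyPEM generates a throwaway P-256 key so the example needs no files.
func newECKeyPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

func main() {
	pemBytes, err := newECKeyPEM()
	if err != nil {
		panic(err)
	}
	kind, _, err := parsePrivateKey(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Println("key family:", kind)
}
```

Keys referenced through a KMS or HSM would not pass through a parser like this, since the private material never leaves the remote system.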
Encrypted Search Keys
- Symmetric keys used for secure search functionality
- Currently managed through environment variables
Bundle Signing Keys
- RSA 2048 keys
- Used to cryptographically sign deployment bundles
- Currently managed through Cosign
Policy Signing Keys
- RSA 2048 keys
- Used to sign policy artifacts
- Enables trusted import/export of platform configuration
Future Key Requirements
The platform needs to accommodate additional key types:
- EPOP (Entity Proof of Possession) keys
- Platform policy root keys
- Identity Provider (IDP) and PKI root certificates
Current Management Approaches
Each key type currently requires its own management approach:
- KAS Keys:
  - Stored as PEM-encoded files in the filesystem
  - Referenced in opentdf config.yaml
  - Requires configuration in multiple locations
- Vendor-specific keys:
  - Imported and managed via environment variables
  - No standardized key rotation mechanism
- Signing Keys (Bundle and Policy):
  - Managed through Cosign tooling
  - Requires separate key generation and management workflows
These disparate approaches create several challenges:
- Increased operational complexity
- No unified key rotation strategy
- Limited integration with modern key management systems
- Difficult to maintain consistent security policies
Proposed Solution
Overview
We propose eliminating key pre-provisioning in config.yaml in favor of a unified key management system that provides:
- CLI-based key management for automated processes
- Web-based administration through the upcoming admin UI
- Standardized interface for all key operations
- Flexible integration with external key management systems
Key Storage and Identification
The solution consists of two main components:
1. Key Storage
   - Primary storage in Platform's data store (Postgres by default)
   - Alternative storage in signed policy artifacts
   - Support for external key management systems:
     - Hardware Security Modules (HSM)
     - Cloud KMS services
     - Vault/OpenBAO
2. Key Identification
   - Internal key IDs generated and managed by the platform
     - Controlled format and size
     - Optimized for nanoTDF compatibility
   - External key references maintained separately
     - Maps to provider-specific identifiers (e.g., Vault-generated UUIDs)
     - Preserves isolation between internal and external systems
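The split between internal key IDs and external references could be modeled roughly as below. This is a sketch under assumptions: the `KeyRef` and `KeyIndex` types, the 32-byte length limit, and the example external ID are all hypothetical and only illustrate that the platform-controlled ID is what callers see, while the provider-specific identifier stays behind the lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// maxInternalIDLen is an assumed limit, chosen only for illustration.
const maxInternalIDLen = 32

// KeyRef links a platform-generated key ID to an optional provider-specific ID.
type KeyRef struct {
	InternalID string // short, platform-controlled ID
	Provider   string // e.g. "vault"; empty when the key is stored locally
	ExternalID string // provider-specific identifier, kept out of TDFs
}

// KeyIndex maps internal IDs to their references.
type KeyIndex struct {
	refs map[string]KeyRef
}

func NewKeyIndex() *KeyIndex {
	return &KeyIndex{refs: make(map[string]KeyRef)}
}

// Register validates the internal ID format and stores the mapping.
func (i *KeyIndex) Register(ref KeyRef) error {
	if ref.InternalID == "" || len(ref.InternalID) > maxInternalIDLen {
		return fmt.Errorf("internal ID must be 1 to %d bytes", maxInternalIDLen)
	}
	if (ref.Provider == "") != (ref.ExternalID == "") {
		return errors.New("provider and external ID must be set together")
	}
	if _, exists := i.refs[ref.InternalID]; exists {
		return fmt.Errorf("internal ID %q already registered", ref.InternalID)
	}
	i.refs[ref.InternalID] = ref
	return nil
}

// Resolve returns the reference for an internal ID.
func (i *KeyIndex) Resolve(internalID string) (KeyRef, error) {
	ref, ok := i.refs[internalID]
	if !ok {
		return KeyRef{}, fmt.Errorf("unknown internal ID %q", internalID)
	}
	return ref, nil
}

func main() {
	idx := NewKeyIndex()
	err := idx.Register(KeyRef{InternalID: "k1", Provider: "vault", ExternalID: "example-external-id"})
	if err != nil {
		panic(err)
	}
	ref, err := idx.Resolve("k1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s -> %s:%s\n", ref.InternalID, ref.Provider, ref.ExternalID)
}
```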
Key Retrieval Process
Example: Key retrieval for rewrap operation
- KAS receives rewrap request
- Validates request and permissions
- Leverages AccessPDP for access decision
- Extracts internal key ID (k1) from key access object
- Retrieves key configuration via keyProvider.getKeyConfiguration(k1)
- Uses configuration to fetch key material from appropriate backend
- Performs rewrap operation
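The steps above can be sketched as follows. Everything here is illustrative: the `KeyProvider` interface is modeled on `keyProvider.getKeyConfiguration(k1)`, while `KeyConfiguration`, `Backend`, `RewrapRequest`, the in-memory provider, the boolean standing in for the AccessPDP decision, and the choice of RSA-OAEP with SHA-256 are assumptions made so the example runs on its own.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
)

// KeyConfiguration says which backend holds the material for a key ID.
type KeyConfiguration struct {
	KeyID   string
	Backend string // hypothetical backend name, e.g. "local"
}

// KeyProvider mirrors keyProvider.getKeyConfiguration from the steps above.
type KeyProvider interface {
	GetKeyConfiguration(keyID string) (KeyConfiguration, error)
}

// Backend fetches private key material for a configuration.
type Backend func(cfg KeyConfiguration) (*rsa.PrivateKey, error)

// memoryProvider is an in-memory KeyProvider used only for this sketch.
type memoryProvider map[string]KeyConfiguration

func (m memoryProvider) GetKeyConfiguration(id string) (KeyConfiguration, error) {
	cfg, ok := m[id]
	if !ok {
		return KeyConfiguration{}, fmt.Errorf("unknown key ID %q", id)
	}
	return cfg, nil
}

// RewrapRequest carries only the fields this sketch needs.
type RewrapRequest struct {
	KeyID      string         // internal key ID from the key access object
	WrappedKey []byte         // DEK wrapped with the KAS public key
	ClientKey  *rsa.PublicKey // public key the DEK is rewrapped for
}

// Rewrap follows the listed steps. allowed stands in for the AccessPDP decision.
func Rewrap(req RewrapRequest, allowed bool, kp KeyProvider, backends map[string]Backend) ([]byte, error) {
	if !allowed {
		return nil, errors.New("access denied")
	}
	cfg, err := kp.GetKeyConfiguration(req.KeyID)
	if err != nil {
		return nil, err
	}
	fetch, ok := backends[cfg.Backend]
	if !ok {
		return nil, fmt.Errorf("no backend registered for %q", cfg.Backend)
	}
	priv, err := fetch(cfg)
	if err != nil {
		return nil, err
	}
	dek, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, req.WrappedKey, nil)
	if err != nil {
		return nil, fmt.Errorf("unwrap DEK: %w", err)
	}
	return rsa.EncryptOAEP(sha256.New(), rand.Reader, req.ClientKey, dek, nil)
}

// demoRewrap wraps a stand-in DEK for KAS, rewraps it for a client, and
// returns the original DEK and the DEK the client recovers.
func demoRewrap(allowed bool) (sent, received []byte, err error) {
	kasKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	clientKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	sent = []byte("0123456789abcdef0123456789abcdef")
	wrapped, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, &kasKey.PublicKey, sent, nil)
	if err != nil {
		return nil, nil, err
	}
	kp := memoryProvider{"k1": {KeyID: "k1", Backend: "local"}}
	backends := map[string]Backend{
		"local": func(KeyConfiguration) (*rsa.PrivateKey, error) { return kasKey, nil },
	}
	req := RewrapRequest{KeyID: "k1", WrappedKey: wrapped, ClientKey: &clientKey.PublicKey}
	rewrapped, err := Rewrap(req, allowed, kp, backends)
	if err != nil {
		return nil, nil, err
	}
	received, err = rsa.DecryptOAEP(sha256.New(), rand.Reader, clientKey, rewrapped, nil)
	return sent, received, err
}

func main() {
	sent, received, err := demoRewrap(true)
	if err != nil {
		panic(err)
	}
	fmt.Println("client recovered DEK:", string(sent) == string(received))
}
```

For a key in a remote KMS or HSM, the backend would presumably perform the unwrap itself rather than hand back a private key, so a real interface may need an operation-level method instead of a key getter.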
Performance Considerations
To ensure optimal performance:
- KeyProvider implements aggressive caching of frequently used keys
- Preloading mechanism for most recent n keys
- Configurable cache size and retention policies
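One way the caching points above might look is a small LRU cache with a per-entry time-to-live and a preload step. This is a sketch, not the planned design: `KeyCache`, `Preload`, and `defaultTTL` are hypothetical names, and LRU eviction is an assumed policy.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
	"time"
)

const defaultTTL = 5 * time.Minute // hypothetical default retention

type entry struct {
	keyID   string
	value   []byte
	expires time.Time
}

// KeyCache is an LRU cache with a fixed capacity and per-entry expiry.
type KeyCache struct {
	mu       sync.Mutex
	capacity int
	ttl      time.Duration
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key ID -> element in order
}

func NewKeyCache(capacity int, ttl time.Duration) *KeyCache {
	return &KeyCache{capacity: capacity, ttl: ttl, order: list.New(), items: make(map[string]*list.Element)}
}

// Put stores key material and evicts the least recently used entries if full.
func (c *KeyCache) Put(keyID string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[keyID]; ok {
		c.order.Remove(el)
	}
	c.items[keyID] = c.order.PushFront(&entry{keyID, value, time.Now().Add(c.ttl)})
	for c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).keyID)
	}
}

// Get returns cached material unless it is missing or expired.
func (c *KeyCache) Get(keyID string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[keyID]
	if !ok {
		return nil, false
	}
	e := el.Value.(*entry)
	if time.Now().After(e.expires) {
		c.order.Remove(el)
		delete(c.items, keyID)
		return nil, false
	}
	c.order.MoveToFront(el)
	return e.value, true
}

// Preload warms the cache with the first n IDs of recent (ordered newest first).
func (c *KeyCache) Preload(recent []string, n int, fetch func(string) ([]byte, error)) error {
	if n > len(recent) {
		n = len(recent)
	}
	for _, id := range recent[:n] {
		v, err := fetch(id)
		if err != nil {
			return fmt.Errorf("preload %s: %w", id, err)
		}
		c.Put(id, v)
	}
	return nil
}

func main() {
	cache := NewKeyCache(2, defaultTTL)
	recent := []string{"k3", "k2", "k1"} // newest first
	fetch := func(id string) ([]byte, error) { return []byte("material-" + id), nil }
	if err := cache.Preload(recent, 2, fetch); err != nil {
		panic(err)
	}
	_, k3 := cache.Get("k3")
	_, k1 := cache.Get("k1")
	fmt.Println("k3 cached:", k3, "k1 cached:", k1)
}
```

Caching raw private key material in process memory has its own security trade-offs, so the retention policy would likely need to be stricter for some key modes than others.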
Implementation Plan
Phase 1: Core Infrastructure
- Create new keys table schema (Reference)
- Implement CLI for key management operations
- Standardize key configuration in config.yaml
Phase 2: Provider Integration
- Update platform code to use key ID references exclusively
- Develop crypto provider framework:
  - Build on @dmihalcik-virtru's existing provider work
  - Implement provider interface, for example: standardCrypto.encrypt(clearText, keyReference)
  - Create providers for:
    - Standard crypto operations
    - HSM integration (e.g., Thales)
    - Cloud KMS (e.g., GCP KMS)
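A provider interface in the spirit of `standardCrypto.encrypt(clearText, keyReference)` might look like the sketch below. The `CryptoProvider` interface, the `Decrypt` method, and the in-memory `StandardCrypto` type are assumptions; AES-GCM with a symmetric key is used only to keep the example short, and the existing provider work referenced above may be shaped differently.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// CryptoProvider is a hypothetical interface: callers pass a key reference,
// never key material, so HSM and KMS providers can implement it too.
type CryptoProvider interface {
	Encrypt(clearText []byte, keyReference string) ([]byte, error)
	Decrypt(cipherText []byte, keyReference string) ([]byte, error)
}

// StandardCrypto keeps raw AES keys in memory, indexed by key reference.
type StandardCrypto struct {
	keys map[string][]byte
}

// aead builds an AES-GCM cipher for the referenced key.
func (s *StandardCrypto) aead(keyReference string) (cipher.AEAD, error) {
	key, ok := s.keys[keyReference]
	if !ok {
		return nil, fmt.Errorf("unknown key reference %q", keyReference)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func (s *StandardCrypto) Encrypt(clearText []byte, keyReference string) ([]byte, error) {
	gcm, err := s.aead(keyReference)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so Decrypt can recover it.
	return gcm.Seal(nonce, nonce, clearText, nil), nil
}

func (s *StandardCrypto) Decrypt(cipherText []byte, keyReference string) ([]byte, error) {
	gcm, err := s.aead(keyReference)
	if err != nil {
		return nil, err
	}
	if len(cipherText) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, body := cipherText[:gcm.NonceSize()], cipherText[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	key := make([]byte, 32) // AES-256 key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	var provider CryptoProvider = &StandardCrypto{keys: map[string][]byte{"k1": key}}
	sealed, err := provider.Encrypt([]byte("hello"), "k1")
	if err != nil {
		panic(err)
	}
	opened, err := provider.Decrypt(sealed, "k1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(opened))
}
```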
Risks and Mitigations
Operational Risks
1. Migration Complexity
Risk: Existing deployments may face disruption during migration to the new key management system.
Mitigation:
- Provide backward compatibility during transition period
- Create automated migration tools for existing key configurations
- Document step-by-step migration procedures for different deployment types
- Enable gradual migration by supporting both old and new systems simultaneously
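Supporting both systems at once could be as simple as a lookup that prefers the new key store and falls back to the legacy configuration. The sketch below assumes hypothetical `KeyStore`, `FallbackStore`, and `ErrNotFound` names and uses in-memory maps in place of the database and config.yaml.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound signals that a store does not know the key ID.
var ErrNotFound = errors.New("key not found")

// KeyStore looks up PEM-encoded key material by key ID.
type KeyStore interface {
	Lookup(keyID string) (string, error)
}

// mapStore is an in-memory stand-in for either store.
type mapStore map[string]string

func (m mapStore) Lookup(keyID string) (string, error) {
	if v, ok := m[keyID]; ok {
		return v, nil
	}
	return "", ErrNotFound
}

// FallbackStore prefers the new key store and falls back to legacy config.
type FallbackStore struct {
	Primary KeyStore
	Legacy  KeyStore
}

func (f FallbackStore) Lookup(keyID string) (string, error) {
	v, err := f.Primary.Lookup(keyID)
	if err == nil {
		return v, nil
	}
	if !errors.Is(err, ErrNotFound) {
		// Do not mask real failures of the new store with legacy data.
		return "", fmt.Errorf("primary store: %w", err)
	}
	return f.Legacy.Lookup(keyID)
}

func main() {
	store := FallbackStore{
		Primary: mapStore{"k2": "pem-from-database"},
		Legacy:  mapStore{"k1": "pem-from-config", "k2": "stale-pem-from-config"},
	}
	for _, id := range []string{"k1", "k2"} {
		v, err := store.Lookup(id)
		if err != nil {
			panic(err)
		}
		fmt.Println(id, "->", v)
	}
}
```

Once every key resolves from the primary store, the legacy path can be removed without changing callers.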
2. Performance Impact
Risk: Additional abstraction layers and external key fetching could impact system performance.
Mitigation:
- Implement aggressive caching strategy
- Allow configuration of cache sizes and retention policies
3. Configuration Errors
Risk: While simpler, the new system still requires proper configuration of external key management systems.
Mitigation:
- Implement validation checks for key provider configurations
- Provide clear error messages for misconfiguration
- Include configuration examples for common scenarios
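A validation check of the kind listed above might look like this. The provider type names and required fields are invented for illustration and do not reflect a defined configuration schema; the point is that every missing field is reported at once with a message naming the provider and the field.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ProviderConfig is a simplified, hypothetical provider configuration.
type ProviderConfig struct {
	ProviderType string
	Config       map[string]string
}

// requiredFields lists assumed mandatory settings per provider type.
var requiredFields = map[string][]string{
	"vault":   {"address", "mount_path"},
	"aws_kms": {"region"},
	"gcp_kms": {"project", "location", "key_ring"},
}

// Validate reports an unknown provider type or every missing required field.
func Validate(cfg ProviderConfig) error {
	required, ok := requiredFields[cfg.ProviderType]
	if !ok {
		known := make([]string, 0, len(requiredFields))
		for t := range requiredFields {
			known = append(known, t)
		}
		sort.Strings(known)
		return fmt.Errorf("unknown provider_type %q (expected one of %v)", cfg.ProviderType, known)
	}
	var errs []error
	for _, field := range required {
		if cfg.Config[field] == "" {
			errs = append(errs, fmt.Errorf("provider %q: missing required field %q", cfg.ProviderType, field))
		}
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	bad := ProviderConfig{ProviderType: "vault", Config: map[string]string{"address": "https://vault.example.com"}}
	fmt.Println(Validate(bad))
}
```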
Technical Risks
1. Integration Complexity
Risk: Different key management systems have varying APIs and capabilities.
Mitigation:
- Design flexible provider interface
- Implement comprehensive provider testing
- Document provider-specific limitations
- Maintain test suite for provider implementations
I'm a fan. Key IDs can be any length, right?
cc: @biscoe916 @jrschumacher The following updates are a high-level overview of what we think is needed to support this ADR. Nothing is set in stone, and the details below could change over time as work begins. For example, this ADR will impact the work done on https://github.com/opentdf/platform/issues/1485 and will require a slight reorganization of resources.
Updated ERD to support key management ADR.
erDiagram
    key_access_server {
        uuid id
        varchar uri
        varchar name
        varchar source_type
    }
    key_access_server_keys {
        uuid id
        uuid key_access_server_id
    }
    asym_keys {
        uuid id
        varchar key_id
        varchar algorithm
        varchar key_status
        varchar key_mode
        jsonb public_key_ctx
        jsonb private_key_ctx
        uuid provider_config_id
        jsonb metadata
    }
    sym_keys {
        uuid id
        varchar key
        varchar key_id
        varchar key_status
        varchar key_mode
        uuid provider_config_id
    }
    provider_configuration {
        uuid id
        varchar provider_type
        jsonb config_json
        jsonb metadata
    }
    namespace_public_key_mappings {
        uuid namespace_id
        uuid kas_key_id
    }
    definition_public_key_mappings {
        uuid definition_id
        uuid kas_key_id
    }
    value_public_key_mappings {
        uuid value_id
        uuid kas_key_id
    }
    key_access_server ||--o{ key_access_server_keys : has
    key_access_server_keys ||--|| asym_keys : inherits
    asym_keys }o--|| provider_configuration : uses
    sym_keys }o--|| provider_configuration : uses
    asym_keys ||--o{ namespace_public_key_mappings : maps_to
    asym_keys ||--o{ definition_public_key_mappings : maps_to
    asym_keys ||--o{ value_public_key_mappings : maps_to
Proto Changes introduce a new Key Management Service
// Supported key algorithms.
enum Algorithm {
  ALGORITHM_UNSPECIFIED = 0;
  ALGORITHM_RSA_2048 = 1;
  ALGORITHM_RSA_4096 = 2;
  ALGORITHM_EC_P256 = 3;
  ALGORITHM_EC_P384 = 4;
  ALGORITHM_EC_P521 = 5;
}
// The status of the key.
enum KeyStatus {
  KEY_STATUS_UNSPECIFIED = 0;
  KEY_STATUS_ACTIVE = 1;
  KEY_STATUS_INACTIVE = 2;
  KEY_STATUS_COMPROMISED = 3;
  KEY_STATUS_EXPIRED = 4;
}
// Describes how the KAS private key is managed.
// LOCAL means the KAS private key is stored in the database,
// either encrypted or unencrypted.
// REMOTE means the KAS private key is stored in a remote key system such as a
// KMS or HSM, and all operations are performed by that remote system.
enum KeyMode {
  KEY_MODE_UNSPECIFIED = 0;
  KEY_MODE_LOCAL = 1;
  KEY_MODE_REMOTE = 2;
}
// Describes whether this KAS is managed by the organization or was imported
// from an external party. Both modes are needed in order to encrypt a TDF DEK
// with an external party's KAS public key.
enum SourceType {
  SOURCE_TYPE_UNSPECIFIED = 0;
  // The KAS is managed by the organization.
  SOURCE_TYPE_INTERNAL = 1;
  // The KAS is managed by an external party.
  SOURCE_TYPE_EXTERNAL = 2;
}
message KeyAccessServer {
  string id = 1;
  string name = 2;
  string uri = 3;
  SourceType source_type = 4;
  // Common metadata
  common.Metadata metadata = 100;
}
message AsymKey {
  string id = 1;
  string key_id = 2;
  Algorithm algorithm = 3;
  KeyStatus status = 4;
  KeyMode mode = 5;
  string public_key_ctx = 6;
  string private_key_ctx = 7;
  ProviderConfig provider_config = 8;
  // Common metadata
  common.Metadata metadata = 100;
}
message SymKey {
  string id = 1;
  string key_id = 2;
  KeyStatus status = 3;
  KeyMode mode = 4;
  ProviderConfig provider_config = 6;
  // Common metadata
  common.Metadata metadata = 100;
}
service KeyManagementService {
  // Key Access Server Management
  rpc CreateKeyAccessServer(CreateKeyAccessServerRequest) returns (CreateKeyAccessServerResponse) {}
  rpc GetKeyAccessServer(GetKeyAccessServerRequest) returns (GetKeyAccessServerResponse) {}
  rpc ListKeyAccessServers(ListKeyAccessServersRequest) returns (ListKeyAccessServersResponse) {}
  rpc UpdateKeyAccessServer(UpdateKeyAccessServerRequest) returns (UpdateKeyAccessServerResponse) {}
  // Key Management
  rpc CreateKey(CreateKeyRequest) returns (CreateKeyResponse) {}
  rpc GetKey(GetKeyRequest) returns (GetKeyResponse) {}
  rpc ListKeys(ListKeysRequest) returns (ListKeysResponse) {}
  rpc UpdateKey(UpdateKeyRequest) returns (UpdateKeyResponse) {}
  rpc RotateKey(RotateKeyRequest) returns (RotateKeyResponse) {}
}
// Additions to UnsafeService
service UnsafeService {
  rpc UnsafeDeleteKeyAccessServer(UnsafeDeleteKeyAccessServerRequest) returns (UnsafeDeleteKeyAccessServerResponse) {}
  rpc UnsafeDeleteKey(UnsafeDeleteKeyRequest) returns (UnsafeDeleteKeyResponse) {}
}
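The proto above does not spell out what RotateKey does. One plausible reading, sketched below purely as an assumption, is that rotation registers a new active key and moves the previous one to inactive, keeping it available for decrypting existing TDFs. The `KeyRegistry` type and its methods are hypothetical; only the status names come from the KeyStatus enum.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyStatus mirrors a subset of the KeyStatus enum above.
type KeyStatus int

const (
	KeyStatusUnspecified KeyStatus = iota
	KeyStatusActive
	KeyStatusInactive
)

// Key is a simplified stand-in for AsymKey or SymKey.
type Key struct {
	KeyID  string
	Status KeyStatus
}

// KeyRegistry is an in-memory stand-in for the keys table.
type KeyRegistry struct {
	keys map[string]*Key
}

func NewKeyRegistry() *KeyRegistry {
	return &KeyRegistry{keys: make(map[string]*Key)}
}

// Create registers a new active key.
func (r *KeyRegistry) Create(keyID string) (*Key, error) {
	if _, exists := r.keys[keyID]; exists {
		return nil, fmt.Errorf("key %q already exists", keyID)
	}
	k := &Key{KeyID: keyID, Status: KeyStatusActive}
	r.keys[keyID] = k
	return k, nil
}

// Rotate registers newKeyID as active and marks currentID inactive.
// The old key is kept so data wrapped with it can still be unwrapped
// (an assumed behavior, not defined by the proto).
func (r *KeyRegistry) Rotate(currentID, newKeyID string) (*Key, error) {
	current, ok := r.keys[currentID]
	if !ok {
		return nil, fmt.Errorf("unknown key %q", currentID)
	}
	if current.Status != KeyStatusActive {
		return nil, errors.New("only active keys can be rotated")
	}
	newKey, err := r.Create(newKeyID)
	if err != nil {
		return nil, err
	}
	current.Status = KeyStatusInactive
	return newKey, nil
}

func main() {
	reg := NewKeyRegistry()
	if _, err := reg.Create("k1"); err != nil {
		panic(err)
	}
	k2, err := reg.Rotate("k1", "k2")
	if err != nil {
		panic(err)
	}
	fmt.Println("new active key:", k2.KeyID, "old key inactive:", reg.keys["k1"].Status == KeyStatusInactive)
}
```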
Work will also need to be done to expose this crypto provider interface in order to accommodate key providers such as OpenBAO, AWS KMS, and GCP KMS.