zos Support cloud-init config in workload data

Since we're using cloud-init, we can give grid users the ability to pass cloud config in the workload data. This is already proposed as a TODO for user config, but I think we should support accepting a full user-data file with the workload data.

This would give users the ability to define customizations to base cloud images that are applied at first boot, rather than needing to upload a new flist. It would also avoid bloat in the flist hub, the Zos cache, and the network links between them, since currently any variation requires duplicating the base image along with the user's changes.

An interface can then support any subset of cloud config features, like configuring user accounts, and generate user data files on behalf of the user, as well as allowing the user to pass their own.
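As a rough illustration of generating user data on the user's behalf, here is a minimal Python sketch. The function names `build_user_data` and `user_data_for` are hypothetical and not part of any existing client. The `users`, `name`, `ssh_authorized_keys`, `shell`, and `sudo` keys are standard cloud-config fields. The body is emitted as JSON on the assumption that the YAML parser used by cloud-init accepts JSON input, which is a common trick to avoid a YAML dependency.

```python
import json


def build_user_data(username, ssh_keys, sudo=True):
    """Render a cloud-config document that creates one user account."""
    user = {
        "name": username,
        "ssh_authorized_keys": list(ssh_keys),
        "shell": "/bin/bash",
    }
    if sudo:
        # Passwordless sudo; an interface could expose this as a checkbox.
        user["sudo"] = "ALL=(ALL) NOPASSWD:ALL"
    config = {"users": [user]}
    # The first line must be the "#cloud-config" header; the body is
    # written as JSON (assumed to be accepted as YAML by cloud-init).
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


def user_data_for(custom=None, username="user", ssh_keys=()):
    """Prefer a user-supplied file; otherwise generate one (hypothetical helper)."""
    if custom:
        return custom
    return build_user_data(username, ssh_keys)
```

An interface would call `user_data_for(custom=uploaded_text)` when the user supplies their own file, and fall back to the generated document otherwise.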

Sep 23 '22 23:09 scottyeager

Agree, already have a few use cases for this.

Sep 26 '22 07:09 coesensbert

It shouldn't be hard to implement. The only reason it wasn't done is to avoid supporting different config types for VM and container workloads.

Sep 27 '22 12:09 muhamadazmy

Unifying around cloud-init configs is one option. Looks like some work was already done to support this in cloud-container. Specifying environment variables and entry point for containers isn't so clean in this case though, since cloud-init doesn't natively support these fields.

I guess the simplest approach is accepting the user-data file contents as an environment variable, then checking whether it exists while setting up a full VM and, if so, using it instead of generating a new file. If no user-data is provided, we can proceed with the existing process, maintaining backwards compatibility. Creating a new workload data type requires slightly more work in updating clients, but may be preferred for some reason.

Sep 28 '22 23:09 scottyeager

@scottyeager I need to clarify a few things first:

Full cloud-init support is provided by the VM image used. ZOS only passes some config to the VM, and it's up to the cloud-init instance pre-installed in the VM image to actually apply that config. Hence only the official VM images are guaranteed to work with the config that ZOS sets for the VM.
cloud-container, on the other hand, is a thin wrapper that supports "containers". Containers are not full virtual machines; they are like docker images, so they only contain the app binaries. Hence cloud-container only includes a "minimal" set of cloud-init config files (note that cloud-init is a big beast that is also mainly Python). cloud-container implements scripts that understand ONLY the subset of config that ZOS passes to create the "VM" (it uses a custom kernel and initramfs image plus the scripts to process cloud-init config).

This means the following:

Not all cloud-init config makes sense in both the VM and the container context, so it was decided to restrict what the user can set to what is compatible with both types.
If a user passes some "user-data" that is not supported by the cloud-container thin layer, it will not get applied, so it will look as if the system is not behaving correctly.
Again, cloud-container still needs to "feel" like a container (think docker), hence env variables to pass to your app make sense in that context, but user-data doesn't.

Oct 17 '22 07:10 muhamadazmy

@muhamadazmy, we already have a similar situation with divergence between the configuration of full vms and micro vms. That is, environment variables and entry point are already not supported by full vms. The solution is simply to hide these options from the user in the frontend, or clearly document the behavior for users of the libraries and Terraform.

Allowing users to construct their own user-data file for a micro vm is indeed probably not a good idea. So I'd propose something like this as the simplest, backwards compatible approach:

User passes in their user-data as an environment variable (USER_DATA)
VM manager checks while running VM: if full VM and USER_DATA is not empty, then write USER_DATA contents to user-data file in cloud-init volume
Else, proceed with construction of user-data per existing process

So, if someone sets USER_DATA for a micro vm, it will just be passed in as an environment variable with no effect, so there should be no confusion around supporting a subset of features.
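The three steps above can be sketched roughly as follows. This is illustrative Python, not the actual VM manager code. `select_user_data` and `default_user_data` are hypothetical names, and `default_user_data` is only a stand-in for the existing generation process, assuming the root ssh key arrives in an environment variable named `SSH_KEY`.

```python
def default_user_data(env):
    """Stand-in for the existing process: a config carrying only the ssh key."""
    key = env.get("SSH_KEY", "")  # assumed variable name for the ssh key
    return "#cloud-config\nssh_authorized_keys:\n  - " + key + "\n"


def select_user_data(is_full_vm, env, generate_default=default_user_data):
    """Return the contents to write to the user-data file in the cloud-init volume."""
    custom = env.get("USER_DATA", "")
    if is_full_vm and custom:
        # Full VM with USER_DATA set: use the user's file as is.
        return custom
    # Micro VM, or no USER_DATA given: keep the existing behavior.
    # On a micro VM, USER_DATA remains an ordinary environment variable.
    return generate_default(env)
```

Because the fallback branch is the existing process, deployments that never set USER_DATA behave exactly as before.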

The same could be achieved with properties on the workload data, one for ssh key and one for user-data with the same logic. Unless we plan to support anything beyond setting the root ssh key for micro vms, this should cover all use cases in a clean way.

Oct 19 '22 01:10 scottyeager