Feature request: option for container-based builds for specific resources
Describe your idea/feature/enhancement
Building resources - LayerVersions and Functions - can currently occur either natively on the host or within a Docker container via sam build --use-container, but this is an all-or-nothing option: either everything builds in a container, or nothing does.
This causes an issue when a resource consists of or depends on native code, which necessitates building it in a container (matching the runtime environment), and it is used together with other resources that need to be built on the host or otherwise do not work with container builds (such as go1.x on provided.al2/provided.al2023). In real terms, this problem occurs with a Go Function depending on a native-code LayerVersion - the Function must be built on the host, but the LayerVersion can only be built correctly in a container matching the Amazon Linux runtime. Similarly, the same issue occurs when intermixing multiple Functions in a configuration, such as Node.js, where some functions rely on native modules (requiring container-based builds) and others do not.
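To make this concrete, here is a minimal, hypothetical template sketch (resource names invented) of the Go-plus-native-layer case; no single sam build invocation can currently build both resources correctly:

```yaml
Resources:
  # Native code: only builds correctly inside a container matching Amazon Linux
  NativeLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      ContentUri: layers/native/
      CompatibleRuntimes:
        - provided.al2023
    Metadata:
      BuildMethod: makefile

  # Go: must be built on the host, as container builds don't work for it
  GoFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: provided.al2023
      Handler: bootstrap
      CodeUri: functions/go/
      Layers:
        - !Ref NativeLayer
    Metadata:
      BuildMethod: go
```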
If SAM supported the ability to explicitly set how a given resource should be built, this would resolve the issue and allow a single call to sam build to build all resources correctly.
Proposal
I propose adding another Metadata resource attribute to the SAM specification, like so:
```yaml
Metadata:
  BuildMethod: makefile
  BuildArchitecture: arm64
  BuildType: [host|container]
```
Alternatively, a CLI flag could be implemented, with support added to samconfig.toml similar to build_image, where individual resources can be set to explicitly use or not use container builds.
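For the samconfig.toml route, a minimal sketch of what that could look like, assuming a hypothetical per-resource form of use_container (today only a global use_container boolean and the per-resource build_image list exist):

```toml
[default.build.parameters]
# Hypothetical: per-resource container selection, mirroring build_image's
# "ResourceId=value" list form; this syntax does not exist in SAM today
use_container = ["NativeLayer=true", "GoFunction=false"]
build_image = ["NativeLayer=public.ecr.aws/sam/build-provided.al2023"]
```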
Additional Details
It is possible to work around this issue by manually building each Layer and Function individually with the correct settings (e.g. running sam build --use-container ResourceName for the parts that need it, and then without --use-container for the other resources), but this relies on the user mentally constructing the dependency chains and effectively managing the caching performed by SAM. If one forgets to fully build all the layers required by a function with the correct settings, or if SAM considers one cache element out of date, then the build will produce incompatible resources and fail to work.
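As a sketch of that manual sequence (resource names hypothetical), which has to be repeated in the correct order every time anything changes:

```sh
# Container build for the native-code layer first, so the function's build
# can consume an artefact matching the Lambda runtime
sam build --use-container NativeLayer

# Host build for the Go function (container builds don't work for it)
sam build GoFunction
```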
Additionally, there are challenges when mixing container-based and host-based builds, since the build processes differ between build types. For example, with nodejs20.x runtimes, a container-based build stores cached artefacts in .aws-sam/cache/{uuid}, but a host-based build looks in .aws-sam/deps/{uuid} for dependencies when assessing whether a build's cache is valid. In order to mix build types for this runtime and get a successful SAM build I can deploy, I've had to manually copy .aws-sam/build/{LogicalFunctionID}/node_modules over to .aws-sam/cache/{uuid}/node_modules after running sam build --use-container LogicalFunctionID. After this, sam build considers the cache valid and runs an incremental Node build, copying the deps folder into the build directory (equating to an expensive no-op, since the node_modules dir was already there from the container-based build).
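A sketch of that copy step, assuming the UUID has been looked up by hand in .aws-sam/build.toml:

```sh
# Container build of just this function
sam build --use-container LogicalFunctionID

# The build definition's UUID, found manually in .aws-sam/build.toml
BUILD_UUID=00000000-0000-0000-0000-000000000000  # placeholder value

# Copy the container-built modules into the cache location so that a
# subsequent sam build treats the cache as valid
cp -r .aws-sam/build/LogicalFunctionID/node_modules .aws-sam/cache/$BUILD_UUID/
```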
Edit: there appears to be a change in caching in either SAM 1.91.0 or 1.92.0 - my caching workaround no longer functions. In other words, even if you build a layer with --use-container, a subsequent sam build without --use-container no longer considers it cached.
Edit 2: the caching workaround requires creating entries in .aws-sam/build.toml as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801. Once a build definition is present, builds will update the definition in build.toml correctly.
Hey @davidjb, thanks for submitting the feature request and for explaining why the current workaround is not satisfactory. I agree with you. Both approaches in your proposal sound reasonable to me. I'll raise this with the team and see how we prioritize it.
Hey @hawflau, any update on this feature request?
I made an edit to my original post above, but thought it worth commenting to highlight the scope and severity of this issue further. Previously, I'd just been working with LayerVersions containing native code, but I've since shifted to running Functions with Node.js runtimes that require native, platform-specific code.
The above workaround that relies on caching is fairly simple with LayerVersions - in order to get SAM to cache a LayerVersion, you need to create a mock entry in .aws-sam/build.toml (as per my comment at https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801) and then run sam build --cached --use-container LogicalID, which caches the build results and updates the source_hash, allowing a subsequent sam build --cached to recognise the cache correctly.
However, things get more complicated when a Function is involved, especially one using a Node.js (nodejs20.x) runtime. In Node's case, there are both source_hash and manifest_hash to be matched, and a non-container build also requires .aws-sam/deps/{uuid} to be present for SAM to consider a cache valid. Building in a container will update source_hash but not manifest_hash in build.toml, and also won't create .aws-sam/deps/{uuid}, as it doesn't run CopyDependencies. So, end-to-end, to successfully mix container and non-container builds:
1. Add fake entries for all resources to `build.toml` as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801 (a sketch of the entry shape is shown after this list).

2. Run `sam build --cached --use-container LogicalID` for each resource that needs a container. `source_hash` in `build.toml` will be updated and the cache created.
   - If any dependencies cannot be built in a container, then build them separately and order your `sam build` commands correctly.

3. For resources with a runtime that requires `deps`, manually copy dependencies like so:

   ```sh
   RESOURCE_ID=LogicalID
   # Or manually get the UUID out of build.toml
   # (double quotes around the expression let $RESOURCE_ID expand)
   BUILD_UUID=$(yq ".function_build_definitions[] | select(.functions[0] == \"$RESOURCE_ID\") | key" .aws-sam/build.toml)
   mkdir -p .aws-sam/deps/$BUILD_UUID
   # Change the following based on runtime; this is for Node.js
   cp -r .aws-sam/build/$RESOURCE_ID/node_modules .aws-sam/deps/$BUILD_UUID/
   ```

4. Manually calculate `manifest_hash` (an MD5 hash) for each resource, such as for Node.js:

   ```sh
   md5sum .aws-sam/build/$RESOURCE_ID/package.json
   ```

   and enter this into `.aws-sam/build.toml`.

   For this step, you can also cheat and have SAM update the `manifest_hash`es for you all at once by running `sam build --cached`, taking care that none of the container-built artefacts get affected. In my case, this command fails mid-build, as certain `Makefile`-based resources aren't designed for a host-based build. That said, the `manifest_hash` values are still updated, even on failure.

5. Run `sam build --cached` to build all resources. Observe that your container-based resources are now considered cached and won't be re-built, since `source_hash` and `manifest_hash` are correct and `.aws-sam/deps/{uuid}` is present.
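For reference, the `build.toml` entries being faked in step 1 and patched in step 4 have roughly the following shape. This is a sketch from memory of SAM's format, not copied from a real build; all values are illustrative:

```toml
# Sketch only - field values are illustrative placeholders
[function_build_definitions.00000000-0000-0000-0000-000000000000]
codeuri = "functions/node/"
runtime = "nodejs20.x"
architecture = "x86_64"
packagetype = "Zip"
functions = ["LogicalID"]
# Updated by the container build in step 2
source_hash = "d41d8cd98f00b204e9800998ecf8427e"
# The md5sum of package.json from step 4
manifest_hash = "d41d8cd98f00b204e9800998ecf8427e"
```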
At the conclusion of the workaround, you're left with a mixed set of container-built and host-built resources that works, but the process is quite complex. Could you, @hawflau, look at prioritising a solution or re-triaging this as a bug, please?