aws-lambda-powershell-runtime

Compress runtime/modules to reduce deployment package unzipped size

Open sean-r-williams opened this issue 7 months ago • 9 comments

AWS Lambda limits ZIP deployment packages to 250 MB unzipped (on disk). The Linux PowerShell release builds are ~170 MB in the current release, which takes up ~70% of that cap. (Community members' attempts at producing "slim" PowerShell containers only reduce sizes by ~30 MB, which isn't much in the broader context.)

Pure PowerShell script modules are generally quite small, but compiled CLR module assemblies or autogenerated modules (e.g. Microsoft.Graph.*, Az.*, AWS.Tools.*) can get very large.

The current architecture only allows for ~80 MB of module dependencies: AWS.Tools.S3 and Microsoft.Graph.Users (with their respective dependencies) together breach the 250 MB cap.

Compressing the PowerShell runtime itself would:

  • Free up >100MB of deployment package size (172MB vs 66 MB)
  • Unblock build artifact verification (PowerShell publishes SHAs of release artifacts)

Supporting compressed module dependencies (e.g. /opt/modules-packed.zip or /opt/modules-packed/*.nupkg gets unpacked to /tmp/modules-unpacked and added to PSModulePath) could help the situation as well.
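
For illustration, a minimal sketch of the packed-module idea described above, assuming a single zip bundle whose top-level folders are modules. The function name Add-PackedModulePath is hypothetical and is not part of the runtime; the paths in the comments are the ones suggested above.

```powershell
# Ensures the ZipFile type is available (also covers Windows PowerShell 5.1).
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: unpack a module bundle and expose it via PSModulePath.
function Add-PackedModulePath {
    param(
        [Parameter(Mandatory)] [string] $ArchivePath,     # e.g. /opt/modules-packed.zip
        [Parameter(Mandatory)] [string] $DestinationPath  # e.g. /tmp/modules-unpacked
    )

    # Nothing to do if no bundle was shipped.
    if (-not [System.IO.File]::Exists($ArchivePath)) { return $false }

    # Extract only once; later calls in the same environment reuse the directory.
    if (-not [System.IO.Directory]::Exists($DestinationPath)) {
        [System.IO.Compression.ZipFile]::ExtractToDirectory($ArchivePath, $DestinationPath)
    }

    # Prepend the unpacked directory unless it is already on the path.
    $separator = [string][System.IO.Path]::PathSeparator
    $paths = @($env:PSModulePath -split [regex]::Escape($separator) | Where-Object { $_ })
    if ($paths -notcontains $DestinationPath) {
        $env:PSModulePath = (@($DestinationPath) + $paths) -join $separator
    }
    return $true
}
```

In a bootstrap script this might be called as Add-PackedModulePath -ArchivePath '/opt/modules-packed.zip' -DestinationPath '/tmp/modules-unpacked' before the handler is loaded. Extraction time and /tmp capacity are not accounted for here.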

Jun 20 '25 18:06 sean-r-williams

FWIW, packed modules would ostensibly be easier to support, as there are no runtime changes required (plus PSResourceGet can handle installing a directory full of nupkgs on its own), but slimming down the runtime proper would be a big win here.

Jun 20 '25 18:06 sean-r-williams

#38 (currently a draft) handles packed module support, but support for compressing the runtime will need more expertise.

It seems like provided.al2023 has limited support for tar (though libarchive is mentioned, which should have BSD tar?). Not currently clear if there's a better way to consume tar.gz files with what's in-box on a Lambda runner container.

Jun 24 '25 03:06 sean-r-williams

@sean-r-williams - before your PR moves further, thought I'd provide my high level thoughts on the design here, and detail what I'm thinking. Might be quicker to iterate here to ensure alignment.

Use $env:PSModulePath for the folders to search

The reasoning here is that existing build scripts can still be used: they only need a module compression step added, ensuring the .zip files end up in the root of the module path being used.

This will also allow more than one layer to provide compressed modules. For example, I created two layers with zip files at the root containing modules and listed the zip files.

# Runtime
$files = $env:PSModulePath.Split(':') |
    Where-Object { [System.IO.Directory]::Exists($_) } |
    ForEach-Object -Parallel {
        $resultList = [System.Collections.Generic.List[string]]::new()
        $null = $resultList.AddRange([System.IO.Directory]::GetFiles($_, "*.nupkg"))
        $null = $resultList.AddRange([System.IO.Directory]::GetFiles($_, "*.zip"))
        $resultList
    } -ThrottleLimit 10
ConvertTo-Json -InputObject $files -Compress

# Output
[
  "/opt/modules/AWS.Tools.Common.zip",
  "/opt/modules/AWS.Tools.S3.zip"
]

To support this, the directories used in Set-PSModulePath could be set to a module scoped variable for use. Eg:

$script:MODULE_PATH_PWSH = '/opt/powershell/modules' # Modules supplied with pwsh
$script:MODULE_PATH_LAMBDA_LAYERS = '/opt/modules' # User supplied modules as part of Lambda Layers
$script:MODULE_PATH_LAMBDA_FUNCTION = [System.IO.Path]::Combine($env:LAMBDA_TASK_ROOT, 'modules') # User supplied modules as part of function package
$script:MODULE_PATH_UNPACKED_MODULES = '/tmp/_unpacked_modules/' # Modules that have been unpacked during bootstrap

Zip file format

Customers may compress modules into zip files with the .psd1 file in various locations. Before a module is moved to the appropriate folder, its .psd1 may need to be located and processed to obtain the correct module name and module version.

Eg:

# Top level
module_name.psd1

# Module level
module_name/module_name.psd1

# Version level
module_version/module_name.psd1

# Module and version
module_name/module_version/module_name.psd1
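
One way to handle all four layouts is to pick the .psd1 closest to the archive root and read its data. The sketch below assumes that heuristic and is illustrative only; Get-PackedModuleManifest and its output properties are hypothetical names. It also assumes the manifest is plain data that Import-PowerShellDataFile can parse and that archive entries use forward slashes.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: find the shallowest .psd1 in an archive and read name and version.
function Get-PackedModuleManifest {
    param([Parameter(Mandatory)] [string] $ArchivePath)

    $zip = [System.IO.Compression.ZipFile]::OpenRead($ArchivePath)
    try {
        # Prefer the manifest with the fewest path segments (closest to the root).
        $entry = $zip.Entries |
            Where-Object { $_.FullName -like '*.psd1' } |
            Sort-Object { ($_.FullName -split '/').Count } |
            Select-Object -First 1
        if (-not $entry) { return $null }

        # Import-PowerShellDataFile reads from disk, so copy the entry out temporarily.
        $tempFile = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString() + '.psd1')
        try {
            [System.IO.Compression.ZipFileExtensions]::ExtractToFile($entry, $tempFile)
            $data = Import-PowerShellDataFile -Path $tempFile
        }
        finally {
            Remove-Item -LiteralPath $tempFile -ErrorAction SilentlyContinue
        }

        [pscustomobject]@{
            Name        = [System.IO.Path]::GetFileNameWithoutExtension($entry.Name)
            Version     = $data.ModuleVersion
            # Folder inside the archive that holds the manifest ('' for the top level).
            ArchiveRoot = $entry.FullName.Substring(0, $entry.FullName.Length - $entry.Name.Length).TrimEnd('/')
        }
    }
    finally {
        $zip.Dispose()
    }
}
```

With Name, Version and ArchiveRoot known, the contents of ArchiveRoot could then be moved to <unpacked path>/<Name>/<Version> regardless of which layout the customer used.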

Improved testing by moving all logic into a single wrapped module function

In the bootstrap file, the logic could have a single call that returns a bool if something was processed. Currently the tests don't have unit coverage of the bootstrap file; this is only handled when the integration tests run. Moving all logic into the module can simplify test coverage.

# Process any packed modules. Will have a max of three folders evaluated (non-recursively) for `.zip` and `.nupkg` files.
$packedModulesFound = Initialize-PackedModules

# Define whether they should be included
Set-PSModulePath -SupportPackedModules $packedModulesFound

Allow environment variable override to disable

This adds minimal overhead (using the test script above, executions with that in the actual Lambda handler took ~30-90ms), but if customers really didn't want this, they could prevent the evaluation entirely by setting an environment variable. Perhaps setting $env:DISABLE_PACKED_MODULE_PROCESSING to any value.

Eg:

# in bootstrap
if (-not $env:DISABLE_PACKED_MODULE_PROCESSING) {
  # Write the verbose log statement indicating processing of packed modules
  $packedModulesFound = Initialize-PackedModules
}

Jul 21 '25 15:07 austoonz

@austoonz I haven't had an opportunity to read this in full, but I largely agree with being able to disable this functionality via envvar.

For the file formats, I think there's been a miscommunication. .nupkg and .zip were meant to fulfill two separate needs, as:

  • PSResourceGet added a -AsNuPkg param to Save-PSResource, so grabbing nuget packages from a feed is even easier.
  • Using nuget packages (versus directing people to zip up individual modules themselves) means we can defer version/directory normalization to PSResourceGet. All of the psd1 processing you're referring to is handled automatically by PSRG.
  • Multiple per-module archives (i.e. nupkgs) will invariably have lower efficiency than a single zip bundle.

There's an inherent trade-off between layer support/PSRG integration (for nupkg) and flexibility/space efficiency (for zip), which is why both are supported.

More explicitly:

  • nupkg contains an individual (module, version) artifact and is installed as an individual module version.
  • zip contains a collection of artifacts (just like how a folder in $env:PSModulePath does) and is "installed" by dumping that folder into a place we put on PSModulePath.
    • It's not intended for individual modules, though you could use it as such (just like how folders in PSModulePath work with just one module included)

The example provided (an archive per-module) would be expected to use .nupkgs. I'm open to changing where we store these (e.g. /opt/modules/AWS.Tools.Common.4.1.618.nupkg instead of /opt/module-nupkgs/AWS.Tools.Common.4.1.618.nupkg) or where modules.zip is pulled from, but the difference in behavior between the two is because they have different intended use cases.

Jul 22 '25 20:07 sean-r-williams

Agree on the two different use cases, but I also think there's a use case where multiple layers could each provide .zip files containing modules. Both use cases should support multiple layers providing packed modules where:

  • One or more .nupkg files can exist, whether in the function root, or across n number of Lambda layers. Each .nupkg file contains only one module.
  • One or more .zip files can exist, whether in the function root, or across n number of Lambda layers. Each .zip file can contain one or more modules.

If not the same paths as $env:PSModulePath, then perhaps just a well named path that can be documented. Perhaps the task root folder and /opt/packed-modules are the only supported paths for packed modules.

For example, that could have:

$env:LAMBDA_TASK_ROOT/modules/MyCustomModule_1.0.nupkg
$env:LAMBDA_TASK_ROOT/modules/modules_from_function.zip
/opt/packed-modules/AWS.Tools.Common.4.1.618.nupkg
/opt/packed-modules/modules_from_layer_1.zip # with one or more modules
/opt/packed-modules/modules_from_layer_2.zip # with one or more modules

Jul 22 '25 20:07 austoonz

Okay, I can add support for multiple .zip archives. That does slightly complicate conflict resolution between modules across different ZIPs (unpacking them over the top of each other could go sideways), but it should be doable nonetheless.

How do you feel about packed-modules as a subdir for both layer- and task-root sourced archives? A single named subdirectory for both of the change interfaces customers control (layers and the function root) seems like it'd be easiest for people to understand.

This would mean the following would all work:

$env:LAMBDA_TASK_ROOT/packed-modules/MyCustomModule_1.0.nupkg
$env:LAMBDA_TASK_ROOT/packed-modules/modules_from_function.zip
/opt/packed-modules/AWS.Tools.Common.4.1.618.nupkg
/opt/packed-modules/modules_from_layer_1.zip # with one or more modules
/opt/packed-modules/modules_from_layer_2.zip # with one or more modules
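
A minimal sketch of how those locations could be enumerated, assuming the packed-modules subdirectory name proposed above and a non-recursive search. Get-PackedModuleArchive, its parameters and the Kind values are hypothetical names used only for illustration.

```powershell
# Hypothetical helper: list archives directly inside each root's packed-modules folder.
function Get-PackedModuleArchive {
    param(
        # Assumed defaults: the function package root and the layer root.
        [string[]] $SearchRoot = @($env:LAMBDA_TASK_ROOT, '/opt'),
        [string] $SubdirectoryName = 'packed-modules'
    )

    foreach ($root in $SearchRoot) {
        if ([string]::IsNullOrEmpty($root)) { continue }
        $directory = [System.IO.Path]::Combine($root, $SubdirectoryName)
        if (-not [System.IO.Directory]::Exists($directory)) { continue }

        # Non-recursive: only files directly inside the subdirectory are considered.
        $files = [System.IO.Directory]::GetFiles($directory) | Sort-Object
        foreach ($file in $files) {
            $extension = [System.IO.Path]::GetExtension($file).ToLowerInvariant()
            if ($extension -notin '.nupkg', '.zip') { continue }
            [pscustomobject]@{
                Path = $file
                # nupkg: one (module, version) artifact; zip: a bundle of one or more modules.
                Kind = if ($extension -eq '.nupkg') { 'SingleModule' } else { 'ModuleBundle' }
            }
        }
    }
}
```

Grouping the result by Kind would let the bootstrap hand .nupkg files to PSResourceGet and unpack each .zip into its own directory, which is one way to avoid archives overwriting each other.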

Jul 22 '25 23:07 sean-r-williams

That "packed-modules" works for me.

Jul 22 '25 23:07 austoonz

Any chance this can get released?

Sep 30 '25 02:09 normelton

FWIW, my workaround was to deploy the compressed .nupkg file in my layer. When the function starts, it uncompresses (Expand-Archive) the module into /tmp/modules since that is the ephemeral Lambda storage. It then adds that path to the PSModulePath. Dirty, but it works!
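
A rough sketch of that kind of workaround, not the exact script used. It assumes the file is named <Name>.<Version>.nupkg and uses [System.IO.Compression.ZipFile] in place of Expand-Archive, since Expand-Archive may refuse archives whose extension is not .zip. Expand-ModuleNupkg is a hypothetical name.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: unpack <Name>.<Version>.nupkg into <ModuleRoot>/<Name>/<Version>.
function Expand-ModuleNupkg {
    param(
        [Parameter(Mandatory)] [string] $NupkgPath,
        [string] $ModuleRoot = '/tmp/modules'
    )

    # Assumes the file name follows the <Name>.<Version>.nupkg convention.
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($NupkgPath)
    $pattern = '^(?<name>.+?)\.(?<version>\d+(\.\d+){1,3})(-[0-9A-Za-z.]+)?$'
    if (-not ($baseName -match $pattern)) {
        throw "Cannot derive module name and version from '$baseName'."
    }

    $destination = [System.IO.Path]::Combine($ModuleRoot, $Matches['name'], $Matches['version'])
    if (-not [System.IO.Directory]::Exists($destination)) {
        # A .nupkg is a zip archive, so it can be extracted directly.
        [System.IO.Compression.ZipFile]::ExtractToDirectory($NupkgPath, $destination)
    }

    # Make the unpacked module discoverable by Import-Module and auto-loading.
    $separator = [string][System.IO.Path]::PathSeparator
    $paths = $env:PSModulePath -split [regex]::Escape($separator)
    if ($paths -notcontains $ModuleRoot) {
        $env:PSModulePath = $ModuleRoot + $separator + $env:PSModulePath
    }
    return $destination
}
```

The version folder has to match the ModuleVersion in the manifest for PowerShell to load the module, so a package whose file name carries a different version string than its manifest would need extra handling.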

Sep 30 '25 03:09 normelton