ITI: Javascript Implementation of AMP Cache Transformer
Summary
The current Google AMP Cache requirements effectively mandate a specific implementation of the AMP transforms (the AMP Packager, currently written in Go). We’d like to create a standalone, JavaScript version of the transforms that could be used with any available SXG packaging logic.
Motivation
By introducing a standalone package, this would decouple the logic of the Google AMP cache transforms from that of the signed exchanges. This would introduce a new level of flexibility, and allow for wider adoption of the technology.
Additionally, a javascript version of the AMP cache transforms would integrate much more seamlessly with existing web pipelines, without requiring a standalone external service. Javascript would allow us to implement these transforms as a final step in our web server, which renders AMP for users.
There currently exists a subset of these transforms already written in JavaScript, in the AMP Toolbox → Optimizer repo. These are not fully complete, however,
Design
The main requirement is to ensure a way of keeping the JavaScript and Go versions in sync. The actual implementation will dictate a lot of the difficulty - we would need to set up some form of shared testing infrastructure that ensures any changes made to one version gets reflected in the other.
We see three main paths forward in terms of where this implementation will live:
- Within AMP Packager
- Within AMP Toolbox → Optimizer
- As a standalone repo
| Pros | Cons | |
|---|---|---|
| AMP Packager | Single source of truth for code and logicEasier to create a shared testing infrastructure | Polluting the same repo with two languages and two implementations |
| AMP Toolbox | Add on to the existing AMP transformsLeverage existing work and framework for adding new transformsAlready implemented and used by industry | Expands scope of existing toolsetHarder to keep implementation in sync with AMP Packager |
| Standalone Repo | Might be the right level of abstraction for a new codebaseDoes not affect technical direction of the AMP toolbox library | Harder to keep implementation in sync with AMP Packager |
| Standalone Repo + Go Cache transforms | Single source of truth of transformsBest of both worlds between a standalone repo and having the code live in AMP packager | Requires splitting up the AMP packager repo |
We believe that having the code live in the AMP Packager repo is the best way forward, although we are open to a discussion. Sharing the code in the same repo will have the highest likelihood of making sure the versions are in sync, and is the easiest way to set the tests up in such a way that they can be shared.
We could also split the AMP Packager, such that the Go and Javascript versions of the transforms live in their own “AMP Transforms” repo, and the AMP Signed Exchange live in separate repos.
We are open to options depending on which direction the AMP packager maintainers prefer.
@nainar @twifkak @jeffjose
related: https://github.com/envoyproxy/envoy/issues/9763
if we built out the proposal in this ITI, an envoy filter for signed exchange becomes pretty simple: it's basically just wrapping libsxg
AMP Optimizer maintainer here 👋. Thanks a lot for the write-up!
AMP Optimizer is the "official" JS based AMP transformer implementation. Hence I'd prefer to include any changes in there instead of creating a new JS based transformer implementation. Otherwise we'd need to maintain to JS based transformer implementations which would mean an additional maintenance burden I'd really like to avoid.
The important thing to note is that in the near future, transformers won't need to produce byte compatible AMPHTML to the AMPPackager. This will greatly simplify things and means that transformers only need to produce valid AMP. So you will be able to use AMP Optimizer out of the box. However, for performance reasons, asset URL rewriting to cdn.ampproject.org is critical in the SXG case and should be added to AMP Optimizer. This could be enabled via a flag as an "AMPPacker compatibility mode".
There's also the bigger problem of keeping all AMP transformers in sync. We currently support three different versions:
- Go-based which is part of AMPPackager.
- Node-based AMP Optimizer
- PHP-based AMP Optimizer used by the AMP Wordpress Plugin
The Node and PHP Optimizer already share the same test suite, but this is something we should extract and formalize with a standalone (versioned) spec and test suite.
Thanks for the context! This is great news.
I filed https://github.com/ampproject/amp-toolbox/issues/862 in amp-toolbox. If asset rewriting is all that's required we can probably get this out the door pretty quickly! 😄
@ampproject/wg-caching for general feedback. Overall, I'm pretty excited about this. I agree that maintenance is the problem to address here.
If AMP Optimizer is the place where this code lives, then @sebastianbenz is probably the best authority on how to keep up with AMP transformer changes. But if there's movement on a common test suite, I'm interested to follow.
I'll add to the list of AMP transformer variants: the AMP Cache transforms. Currently, the Google AMP Cache runs a closed-source set of C++ transforms. I'm not sure what the Bing AMP Cache does, but I suspect similar, since it was launched before the Go transforms were released. These internal transforms are also slightly different:
- some transforms are not exported to the packager (e.g. stripping some extra tags to protect the cache origin)
- some transforms have to take special care to work on packaged docs (dunno any examples offhand, but likewise packager has to run atop optimized AMP; e.g. don't re-add boilerplate and don't double-rewrite URLs)
- some transforms don't exist internally (e.g. stripjs which only exists as a defense-in-depth since the packager might run on invalid AMP docs; this won't happen internally)
None of that is to block anything; just something to keep in mind if working towards a full solution. I hope ampproject/amp-toolbox#862 is sufficient in the meantime!
Also, I'll add a couple of things that have to keep up with AMP validator changes. These are only tangentially related, but may serve as inspiration:
- The AMP WordPress plugin includes a PHP-based sanitizer, and seems to update roughly monthly from the validator protoascii.
- The Java AMP validator will also need to follow validator protoascii, though it looks like there's no regular process for that yet. Note that it was moved to its own repo because issues keeping it up-to-date were causing Travis failures that blocked others.