Allow content to expire
The purpose of this PR is to make cached content on the nightly-build Flink website expire.
Currently, flink-web does not explicitly set ExpiresActive in .htaccess, so content expiration is disabled by default. As a result, CDNs or users' web browsers may keep serving outdated content even after that content has been removed.
For example, this Flink ML web link still serves a page that has been deleted from the Flink ML repo, even though the URL contains "flink-ml-docs-master".
This PR fixes this problem by making the following changes:
- Enable expiration by setting `ExpiresActive on`.
- Set `text/html` typed content to expire/refresh once every hour.
- Set all other content (e.g. `image/jpg`) to expire/refresh once every day.
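A minimal sketch of how these three changes might be expressed with Apache's mod_expires directives. The `<IfModule>` guard and the use of `access` (rather than `modification`, see [3]) as the base time are assumptions; the actual file in this PR may differ.

```apache
# Sketch only: assumes mod_expires is available and that .htaccess
# overrides for these directives are permitted (AllowOverride Indexes).
<IfModule mod_expires.c>
    # Enable generation of Expires and Cache-Control: max-age headers.
    ExpiresActive On
    # HTML pages: considered fresh for one hour after the client fetches them.
    ExpiresByType text/html "access plus 1 hour"
    # All other content types (e.g. images): fresh for one day.
    ExpiresDefault "access plus 1 day"
</IfModule>
```

With `access` as the base, a page fetched at 10:00 is treated as stale from 11:00 onward, so a deleted page should stop being served from caches within about an hour instead of indefinitely.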
See [1] for a discussion of similar issues and the Apache infra team's suggestion. See [2] for documentation of the HTTP directives added in this PR. See [3] for a detailed explanation of the modification directive.
[1] https://issues.apache.org/jira/browse/INFRA-18519
[2] https://github.com/apache/echarts-website/blob/asf-site/.htaccess
[3] https://stackoverflow.com/questions/562802/cache-expire-control-with-last-modification
@MartijnVisser Do you have time to review this PR?
Sure. I do think there's a different issue, though. This .htaccess file is only used for the https://flink.apache.org project website, not for the documentation that's built on https://nightlies.apache.org/flink.
I don't immediately see a workflow for building the flink-ml docs; where is that done?
Edit: I see tools/ci/docs.sh, but I don't see any workflow that triggers it.
@MartijnVisser Thanks for the comments.
The flink-ml docs are built by this script: https://github.com/apache/infrastructure-bb2/blob/master/flink-ml.py. The script is executed daily by a build bot whose status can be found by searching for "flink ml" at https://ci2.apache.org/#/builders.
If you are also not sure where to find or update the .htaccess for https://nightlies.apache.org/flink, do you know who might? If none of us knows, maybe I should create a JIRA for the Apache infra team.
@lindong28 I'm wondering whether something is wrong in the rsync step that causes the file serving https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/quick-start/ not to be removed. I think it's best to file a Jira for it. For the Flink repo, we've moved from buildbot to a GitHub Actions workflow: https://github.com/apache/flink/blob/master/.github/workflows/docs.yml