flink-web

Allow content to expire

Open lindong28 opened this issue 2 years ago • 4 comments

The purpose of this PR is to allow outdated content on the nightly-build Flink website to expire.

Currently, flink-web does not explicitly specify ExpiresActive in .htaccess, so content expiration is disabled by default. A CDN or a user's web browser might therefore keep serving outdated content even after that content has been removed.

For example, this Flink ML web link still serves a page that has been deleted from the Flink ML repo even though the URL says "flink-ml-docs-master".

This PR fixes this problem by making the following changes:

  • Enable expiration by setting ExpiresActive on
  • Set text/html typed content to expire/refresh once every hour.
  • Set all other content (e.g. image/jpg) to expire/refresh once every day.
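For illustration, the three changes listed above could be expressed in .htaccess roughly as follows. This is a minimal sketch rather than the exact diff of this PR: it assumes mod_expires is loaded on the server, and it uses the "access" time base, whereas the PR may use "modification" instead (see [3]).

```apache
# Sketch only: assumes mod_expires is available on the server.
<IfModule mod_expires.c>
    # Enable generation of Expires and Cache-Control: max-age headers.
    ExpiresActive On

    # Default for all content not matched below (e.g. image/jpg): one day.
    ExpiresDefault "access plus 1 day"

    # HTML pages: one hour. The "access" base is an assumption here.
    ExpiresByType text/html "access plus 1 hour"
</IfModule>
```

With the "access" base, the expiry time is counted from the moment a client fetches the resource. With "modification", it is counted from the file's last-modified time on disk.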

See [1] for a discussion of similar issues and the suggestion by Apache infra team. See [2] for documentation of the HTTP directives added in this PR. See [3] for a detailed explanation of the modification directive.

[1] https://issues.apache.org/jira/browse/INFRA-18519
[2] https://github.com/apache/echarts-website/blob/asf-site/.htaccess
[3] https://stackoverflow.com/questions/562802/cache-expire-control-with-last-modification

lindong28, Jul 19 '23 01:07

@MartijnVisser Do you have time to review this PR?

lindong28, Jul 19 '23 02:07

> Do you have time to review this PR?

Sure. I do think that there's a different issue, though. This .htaccess file is only used for the https://flink.apache.org project website, not for the documentation that's built on https://nightlies.apache.org/flink.

I don't immediately see a workflow for building the flink-ml docs: where is that done? Edit: I see tools/ci/docs.sh but I don't see any workflow triggering that?

MartijnVisser, Jul 19 '23 15:07

@MartijnVisser Thanks for the comments.

The flink-ml docs are built by this script: https://github.com/apache/infrastructure-bb2/blob/master/flink-ml.py. The script is executed every day by a build bot whose status can be found by searching "flink ml" at https://ci2.apache.org/#/builders.

If you are also not sure where to find/update .htaccess for https://nightlies.apache.org/flink, do you know who might know the answer? If none of us know, maybe I should create a JIRA for the Apache infra team.

lindong28, Jul 19 '23 16:07

> flink-ml docs is built by this script https://github.com/apache/infrastructure-bb2/blob/master/flink-ml.py. This script is executed every day by a build bot whose status can be found by searching "flink ml" at https://ci2.apache.org/#/builders.

@lindong28 I'm wondering if there's something wrong in the rsync step that causes the file served at https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/quick-start/ not to be removed. I think it's best to file a Jira for it. For the Flink repo, we've moved away from buildbot to the GitHub Actions workflow at https://github.com/apache/flink/blob/master/.github/workflows/docs.yml

MartijnVisser, Jul 20 '23 07:07