Suggestion - Avoid 404s using meta tag
Smashing Magazine wanted to solve the same problem as this repository and I think their solution is fairly elegant as well. It involves a small 404 page.
<script>
sessionStorage.redirect = location.href;
</script>
<meta http-equiv="refresh" content="0;URL='/REPO_NAME_HERE'">
Because of the "refresh" meta tag, the browser immediately navigates to the target URL, ignoring the 404 status code from the server; the effect is similar to a 301 redirect, although no actual 301 response is sent. To use this, the SPA needs a bit of JavaScript at the very beginning to load the route from session storage and use replaceState to update the URL.
<script>
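(function(){
  // Read the URL saved by the 404 page, then clear it so it is only used once.
  var redirect = sessionStorage.redirect;
  delete sessionStorage.redirect;
  // Restore the originally requested URL without reloading the page.
  if (redirect && redirect !== location.href) {
    history.replaceState(null, null, redirect);
  }
})();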
</script>
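For anyone who wants to test the restore step outside a browser, here is a minimal sketch of the same logic split into a pure function plus the browser wiring. The name `resolveRedirect` is hypothetical (it is not from the article or the sghpa repo), and the same-origin check is my own addition, on the assumption that you'd rather skip a bad stored value than have `history.replaceState` throw on a cross-origin URL.

```javascript
// Hypothetical helper (not from the article): decide which path, if any,
// the SPA should restore after the 404 page sent the browser to the app root.
function resolveRedirect(stored, currentHref) {
  // Nothing saved, or we are already on the saved URL: nothing to restore.
  if (!stored || stored === currentHref) return null;
  let target;
  let current;
  try {
    target = new URL(stored);
    current = new URL(currentHref);
  } catch (err) {
    return null; // malformed value in storage
  }
  // history.replaceState throws a SecurityError for cross-origin URLs,
  // so ignore anything that does not belong to the current origin.
  if (target.origin !== current.origin) return null;
  return target.pathname + target.search + target.hash;
}

// Browser wiring, guarded so the file also loads outside a browser.
if (typeof window !== 'undefined') {
  // Same storage key that the 404 page writes via sessionStorage.redirect.
  const stored = sessionStorage.getItem('redirect');
  sessionStorage.removeItem('redirect');
  const path = resolveRedirect(stored, location.href);
  if (path) history.replaceState(null, '', path);
}
```

With this split, the decision logic can be exercised in Node with plain strings, while the guarded block at the bottom behaves like the original inline script when loaded in a page.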
According to my quick searching, Google doesn't penalize 301 redirects or meta refresh redirects, so this might be a viable alternative that doesn't require putting the path after a hash.
- https://www.smashingmagazine.com/2016/08/sghpa-single-page-app-hack-github-pages/ - article
- https://github.com/csuwildcat/sghpa - repo with code and link to GitHub Pages site featuring the redirection
Tried it and it seems to work the same way, but with a lot less code and possibly the Google benefit mentioned above. So it seems like a better solution.
Edit: I moved to S3 hosting with CloudFront, and what seemed to work in the end was this
It still redirects with #!, but Google Search eventually marks these pages as valid. It takes weeks for Google to catch up on the pages, though, and I think it still flags some of them as redirect errors. But some are indexed.
Also worth referencing https://developers.google.com/search/docs/crawling-indexing/301-redirects#metarefresh
If server-side redirects aren't possible to implement on your platform, meta refresh redirects may be a viable alternative. Google differentiates between two kinds of meta refresh redirects:
- Instant meta refresh redirect: Triggers as soon as the page is loaded in a browser. Google Search interprets instant meta refresh redirects as permanent redirects.
- Delayed meta refresh redirect: Triggers only after an arbitrary number of seconds set by the site owner. Google Search interprets delayed meta refresh redirects as temporary redirects.
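To make the distinction concrete, here is a small sketch that reads the delay out of a meta refresh `content` value and labels it the way the quoted documentation describes. The function name `classifyMetaRefresh` is hypothetical, and I'm assuming "instant" means a delay of exactly 0 seconds; this only illustrates the rule and says nothing about how Google actually parses the tag.

```javascript
// Hypothetical helper: parse the delay from a meta refresh `content` value
// such as "0;URL='/REPO_NAME_HERE'" and label it per the quoted documentation.
// Assumption: "instant" means a 0-second delay.
function classifyMetaRefresh(content) {
  // The delay is the leading integer, followed by ';' or ',' or end of string.
  const match = /^\s*(\d+)\s*(?:[;,]|$)/.exec(content);
  if (!match) return null; // no leading integer delay found
  const seconds = Number(match[1]);
  return {
    seconds,
    kind: seconds === 0 ? 'instant' : 'delayed',
    treatedAs: seconds === 0 ? 'permanent' : 'temporary',
  };
}
```

By this reading, the 404 page from the first post (`content="0;URL='/REPO_NAME_HERE'"`) falls into the instant category, so it should be interpreted as a permanent redirect.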
I'm trying this approach on https://njt1982.github.io/minecraft-item-browser/copper_block as Google didn't like it when I simply served the page using 404.html (basically copied index.html to 404.html during gh-pages deploy).
Hoping this redirect approach makes it happier... Although I'm concerned that it will see all sub-pages as a 301 to the root of the repo. I'd really like each sub-page to actually be its own URL. Might need to combine this approach with https://github.com/rafgraph/spa-github-pages... 🤔
EDIT: Although it looks like Google might respect the History API after all... https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics#use-history-api