spa-github-pages

Suggestion - Avoid 404's using meta tag

Open fidian opened this issue 2 years ago • 2 comments

Smashing Magazine wanted to solve the same problem as this repository and I think their solution is fairly elegant as well. It involves a small 404 page.

<script>
  sessionStorage.redirect = location.href;
</script>
<meta http-equiv="refresh" content="0;URL='/REPO_NAME_HERE'">

Because of the meta refresh tag, the browser navigates to the target immediately, regardless of the 404 status code from the server, and Google treats an instant meta refresh like a permanent (301) redirect. To use this, the SPA needs a bit of JavaScript at the very beginning to load the route from session storage and use replaceState to update the URL.

<script>
  (function(){
    var redirect = sessionStorage.redirect;
    delete sessionStorage.redirect;
    if (redirect && redirect != location.href) {
      history.replaceState(null, null, redirect);
    }
  })();
</script>
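
For anyone who wants to unit-test that restore step, here is a minimal sketch of the same logic with the browser objects passed in as parameters. The function name `restoreRedirect` is hypothetical, and the same-origin check is an added assumption that is not part of the snippet above.

```javascript
// Hypothetical helper: same logic as the inline snippet, but with the browser
// objects injected so it can be exercised outside a browser.
function restoreRedirect(storage, loc, hist) {
  var redirect = storage.getItem('redirect');
  // Always clear the stored value so a later page load does not reuse it.
  storage.removeItem('redirect');
  if (!redirect || redirect === loc.href) {
    return false;
  }
  // Assumption (not in the original snippet): ignore values that point at a
  // different origin, since replaceState rejects cross-origin URLs anyway.
  var target;
  try {
    target = new URL(redirect, loc.href);
  } catch (e) {
    return false;
  }
  if (target.origin !== loc.origin) {
    return false;
  }
  // Swap the address bar back to the originally requested route.
  hist.replaceState(null, '', redirect);
  return true;
}
```

In the page itself this would be called once, before the router starts, as `restoreRedirect(sessionStorage, location, history);`.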

According to my quick searching, Google penalizes neither 301 redirects nor meta refresh redirects, so this might be a viable alternative that avoids putting the path after a hash.

  • https://www.smashingmagazine.com/2016/08/sghpa-single-page-app-hack-github-pages/ - article
  • https://github.com/csuwildcat/sghpa - repo with code and link to GitHub Pages site featuring the redirection

fidian, Dec 27 '23 16:12

I tried it and it seems to work the same way, but with a lot less code and possibly the Google benefit mentioned above, so it seems like a better solution.

Edit: I moved to S3 hosting with CloudFront, and what seemed to work in the end was this

It still redirects with #!, but Google Search eventually marks these pages as valid. It takes weeks for Google to catch up, though, and I think it still flags some pages with a redirect error, while others do get indexed.

tonisives, Jan 11 '24 07:01

Also worth referencing https://developers.google.com/search/docs/crawling-indexing/301-redirects#metarefresh

If server-side redirects aren't possible to implement on your platform, meta refresh redirects may be a viable alternative. Google differentiates between two kinds of meta refresh redirects:

  • Instant meta refresh redirect: Triggers as soon as the page is loaded in a browser. Google Search interprets instant meta refresh redirects as permanent redirects.
  • Delayed meta refresh redirect: Triggers only after an arbitrary number of seconds set by the site owner. Google Search interprets delayed meta refresh redirects as temporary redirects.
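
To make that distinction concrete, here is a rough sketch that classifies a meta refresh `content` value along the lines of the quoted documentation. The helper name `classifyMetaRefresh` is hypothetical, and treating a delay of exactly 0 seconds as "instant" is an assumption; the quote does not spell out a threshold.

```javascript
// Hypothetical helper: classify a meta refresh "content" value as the quoted
// documentation describes (instant = permanent, delayed = temporary).
function classifyMetaRefresh(content) {
  // Expected shape: "<seconds>; URL=<target>", with the target optionally quoted.
  var match = /^\s*(\d+)\s*[;,]\s*url\s*=\s*(['"]?)(.*?)\2\s*$/i.exec(content);
  if (!match) {
    return null; // no redirect target, or a format this sketch does not handle
  }
  var delay = parseInt(match[1], 10);
  return {
    delaySeconds: delay,
    target: match[3],
    // Assumption: only a 0-second delay counts as an "instant" refresh.
    kind: delay === 0 ? 'permanent' : 'temporary'
  };
}
```

With the 404 page from the first comment, `classifyMetaRefresh("0;URL='/REPO_NAME_HERE'")` reports a `permanent` redirect to `/REPO_NAME_HERE`.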

I'm trying this approach on https://njt1982.github.io/minecraft-item-browser/copper_block as Google didn't like it when I simply served the page using 404.html (basically copied index.html to 404.html during gh-pages deploy).

Hoping this redirect approach makes it happier... Although I'm concerned that Google will see every sub-page as a 301 to the root of the repo. I'd really like each sub-page to actually be its own URL. Might need to combine this approach with https://github.com/rafgraph/spa-github-pages... 🤔

EDIT: Although it looks like Google might respect the History API after all... https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics#use-history-api

njt1982, Jun 30 '24 09:06