almanac.httparchive.org

SEO 2022 Queries

Open csliva opened this issue 3 years ago • 3 comments

Progress on #2888

Contents of PR are duplicated from the Google doc outline

Robots.txt

  • [ ] Mobile/desktop response codes. More or fewer 200s than last year & 2020?
  • [ ] How many sites completely block crawling compared to last year & 2020? % of websites which completely block Googlebot / Bingbot.
  • [x] File size: any over the 500 KiB limit?
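A minimal sketch of the two per-file robots.txt checks above, assuming the file body has already been fetched. The helper names `exceeds_size_limit` and `blocks_root` are hypothetical, and testing only the root path `/` is an approximation of "completely blocks crawling", not the query used for the chapter.

```python
from urllib.robotparser import RobotFileParser

# 500 KiB, the limit named in the checklist item above.
ROBOTS_SIZE_LIMIT = 500 * 1024


def exceeds_size_limit(body: bytes, limit: int = ROBOTS_SIZE_LIMIT) -> bool:
    """Return True when the raw robots.txt body is larger than the limit."""
    return len(body) > limit


def blocks_root(robots_txt: str, user_agent: str) -> bool:
    """Return True when the given user agent may not fetch the root path.

    Assumption: a site that disallows "/" is treated as blocking all crawling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")
```

Running `blocks_root` once per crawler name (for example `"Googlebot"` and `"Bingbot"`) gives the per-bot split the checklist asks for.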

MetaRobots

  • [x] Meta robots vs x-robots usage. Desktop/mobile differences. Change since last year. +image
  • [x] What values are returned? Noindex, nofollow, nosnippet, max-image-preview, none, all, etc. Are values changed when rendered, especially index/noindex?
  • [x] New indexifembedded meta tag usage – is it being used, any trend in where it’s being used or how?
  • [x] Invalid elements in <head> - Google has released new documentation on this. Assess % of sites that are using invalid elements in <head>.
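To compare the values returned by meta robots and X-Robots-Tag, the raw `content` string first needs to be split into directives. The sketch below is one way to do that; `parse_robots_directives` is a hypothetical helper, and it assumes a plain comma-separated value without a user-agent prefix (a prefixed header such as `googlebot: noindex` would need extra handling).

```python
def parse_robots_directives(content: str) -> dict:
    """Split a robots directive string into {directive: value or None}.

    Directive names are lowercased; valued directives such as
    "max-image-preview:large" keep their value after the colon.
    """
    directives = {}
    for part in content.split(","):
        part = part.strip().lower()
        if not part:
            continue  # skip empty segments such as trailing commas
        name, sep, value = part.partition(":")
        directives[name.strip()] = value.strip() if sep else None
    return directives
```

For example, comparing `"noindex" in parse_robots_directives(raw)` for the raw and rendered HTML would flag pages whose index/noindex state changes on render.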

Canonicals

  • [x] HTML Head vs HTTP header implementation, any in the body tags unrendered?
  • [x] Has canonical, canonicals to self or other page +image
  • [x] Usage of relative vs absolute vs schemaless URL?
  • [x] Any changes that happen to canonicals when rendering? Change in page or added or removed? Any moved to the body tag when rendered?
  • [x] How many pages have more than one canonical tag specified?
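The relative vs absolute vs schemaless split above could be computed per canonical `href` along these lines. This is a sketch only: `classify_canonical_href` is a hypothetical name, and "schemaless" is taken to mean a protocol-relative URL starting with `//`.

```python
from urllib.parse import urlsplit


def classify_canonical_href(href: str) -> str:
    """Classify a canonical href as 'absolute', 'schemaless' or 'relative'."""
    parts = urlsplit(href.strip())
    if parts.scheme:
        return "absolute"    # e.g. https://example.com/page
    if parts.netloc:
        return "schemaless"  # e.g. //example.com/page
    return "relative"        # e.g. /page or page.html
```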

Page Experience

  • [x] Core web vitals scores - pass/fail broken down by FCP/CLS/TBT/LCP/INP/TTFB – any trends? % that passed / failed mobile friendly test.
  • [x] Mobile vs desktop (as desktop is now a factor)
  • [ ] CDN - number of websites using CDN / particular types of CDN
  • [x] HTTPS desktop vs. mobile comparison YoY
  • [x] Viewport meta tag usage, desktop/mobile, yearly changes
  • [x] CSS media queries, desktop/mobile, yearly changes
  • [x] Vary user-agent usage, desktop/mobile +image
  • [x] Legible font sizes - lighthouse, most common text sizes
  • [x] Unused CSS, JS + Image

Images

  • [x] elements
  • [x] Alt attributes

CMS Platforms

  • [ ] most popular CMS types (which can lead into individual chapter)

On Page

  • [x] Page titles - Number of missing page titles. Page titles too long / too short.
  • [x] Meta descriptions: number of sites which have inputted meta descriptions. Too long/too short split.
  • [x] Keywords - % of websites using meta keywords.
  • [x] Word count - avg. word count on pages vs. last year & 2020.
  • [x] Average # of heading tags, breakdown per type (h2/h3/etc)

Server

  • [ ] most popular servers used. Types of server (dedicated/shared/cloud).
  • [ ] Server-side rendering (SSR) vs. client-side rendering (CSR) - % of websites using SSR vs. CSR
  • [ ] % of server-side rendering that’s dynamic

Links

  • [x] Dofollow/nofollow split (note for author: tie in something around digital PR and the increase in nofollow links, as digital PR has trended in the last year, if the data matches this theory)
  • [x] rel=ugc & rel=sponsored - % of links using these attributes
  • [ ] anchor text usage
  • [x] number of external links (copying from last year)
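As an illustration of the nofollow, `rel=ugc` and `rel=sponsored` percentages above, the sketch below tallies those tokens over a list of `rel` attribute values. `count_rel_tokens` is a hypothetical helper, and the input is assumed to be one `rel` string (or `None` when the attribute is absent) per link.

```python
from collections import Counter

# rel tokens tracked by the checklist items above.
TRACKED_TOKENS = ("nofollow", "ugc", "sponsored")


def count_rel_tokens(rel_values) -> Counter:
    """Count links carrying each tracked rel token, plus the link total."""
    counts = Counter()
    for rel in rel_values:
        # rel is a space-separated token list; compare case-insensitively.
        tokens = set((rel or "").lower().split())
        counts["total"] += 1
        for token in TRACKED_TOKENS:
            if token in tokens:
                counts[token] += 1
    return counts
```

Dividing each token count by `counts["total"]` gives the share of links using that attribute.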

Structured data

  • [ ] Is the page using structured data? Is there a correlation in most used type of structured data by industry?
  • [ ] What % of sites implement structured data with more than one method e.g. JSON-LD & microdata
  • [ ] Can we split this by CMS? To see which CMS is adding what markup by default (potentially) e.g. on Wix if you have a physical address on the homepage, you’ll automatically have local business structured data added. % split between manual & automatic implementation.

Mobile vs. Desktop

  • [ ] Word count
  • [x] Links
  • [ ] Structured data
  • [ ] Hreflang +image
  • [ ] Meta robots
  • [ ] Alt attributes on images +image
  • [ ] AMP
  • [ ] AMP. Usage. AMP-only. Vs 2021 & 2020. Desktop/Mobile.

Rendered vs non-rendered differences?

  • [ ] Word count
  • [ ] Things missing and then not, or things changed? Titles, meta descriptions, robots meta tags, structured data, links, etc.

Internationalisation

  • [x] Content-language usage. Headers vs meta tags for content language also.
  • [x] Most common values of the hreflang attribute

csliva, Jun 20 '22 20:06

@derekperkins @csliva @jroakes How are the queries coming along?

foxdavidj, Jul 22 '22 18:07

Hey @foxdavidj, I'm going to get cracking on this and hopefully finish the work by the end-of-month deadline. I'll also ping @jroakes and @derekperkins to see what their bandwidth looks like.

csliva, Jul 26 '22 14:07

Hey @csliva how are we getting on with this?

SophieBrannon, Aug 09 '22 13:08

@csliva Are the checkboxes in the PR description up to date?

foxdavidj, Aug 12 '22 17:08