almanac.httparchive.org
SEO 2022 Queries
Progress on #2888
Contents of PR are duplicated from the Google doc outline
Robots.txt
- [ ] Mobile/desktop response codes. More or fewer 200s than last year & 2020?
- [ ] How many sites completely block crawling compared to last year & 2020? % of websites which completely block Googlebot / Bingbot.
- [x] File size: any over the 500 KiB limit?
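A minimal Python sketch of the two robots.txt checks above, for sanity-checking query results on a handful of files. The function names are hypothetical and not part of this PR. "Completely blocks" is approximated here as "the site root is disallowed for that user agent", which is an assumption about how the metric is defined.

```python
from urllib.robotparser import RobotFileParser

# 500 KiB, the size limit referenced in the checklist item above.
ROBOTS_TXT_LIMIT_BYTES = 500 * 1024


def exceeds_robots_limit(size_bytes: int) -> bool:
    """Return True if a robots.txt file is larger than 500 KiB."""
    return size_bytes > ROBOTS_TXT_LIMIT_BYTES


def blocks_root(robots_txt: str, user_agent: str) -> bool:
    """Return True if the site root is disallowed for user_agent.

    Assumption: a disallowed root is used as a proxy for
    "completely blocks crawling".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "https://example.com/")


def percent_over_limit(sizes: list[int]) -> float:
    """Percentage of robots.txt files whose size exceeds the limit."""
    if not sizes:
        return 0.0
    over = sum(1 for size in sizes if exceeds_robots_limit(size))
    return 100.0 * over / len(sizes)
```

For example, `blocks_root("User-agent: *\nDisallow: /\n", "Googlebot")` evaluates the wildcard group, since no Googlebot-specific group exists.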
MetaRobots
- [x] Meta robots vs x-robots usage. Desktop/mobile differences. Change since last year. +image
- [x] What values are returned? Noindex, nofollow, nosnippet, max-image-preview, none, all, etc. Are values changed when rendered, especially index/noindex?
- [x] New indexifembedded meta tag usage – is it being used, any trend in where it’s being used or how?
- [x] Invalid elements in `<head>` - Google’s released new documentation on this. Assess % of sites that are using invalid elements in `<head>`
Canonicals
- [x] HTML Head vs HTTP header implementation, any in the body tags unrendered?
- [x] Has canonical, canonicals to self or other page +image
- [x] Usage of relative vs absolute vs schemaless URL?
- [x] Any changes that happen to canonicals when rendering? Change in page or added or removed? Any moved to the body tag when rendered?
- [x] How many pages have more than one canonical tag specified?
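The relative vs absolute vs schemaless split and the self vs other check can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic used by the queries in this PR: the function names are hypothetical, "schemaless" is taken to mean a protocol-relative URL starting with `//`, and the self-canonical comparison ignores normalisation such as trailing slashes or letter case.

```python
from urllib.parse import urljoin, urlsplit


def classify_canonical(href: str) -> str:
    """Classify a canonical href as 'absolute', 'schemaless', or 'relative'."""
    href = href.strip()
    if href.startswith("//"):
        # Protocol-relative URL, e.g. //example.com/page
        return "schemaless"
    parts = urlsplit(href)
    if parts.scheme and parts.netloc:
        return "absolute"
    return "relative"


def is_self_canonical(page_url: str, href: str) -> bool:
    """Return True if the canonical resolves to the page's own URL."""
    resolved = urljoin(page_url, href.strip())
    return resolved == page_url
```

Resolving the href against the page URL first means that relative and schemaless canonicals are compared on the same footing as absolute ones.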
Page Experience
- [x] Core Web Vitals scores - pass/fail broken down by FCP/CLS/TBT/LCP/INP/TTFB – any trends? % that passed / failed mobile friendly test.
- [x] Mobile vs desktop (as desktop is now a factor)
- [ ] CDN - number of websites using CDN / particular types of CDN
- [x] HTTPS desktop vs. mobile comparison YoY
- [x] Viewport meta tag usage, desktop/mobile, yearly changes
- [x] CSS media queries, desktop/mobile, yearly changes
- [x] Vary user-agent usage, desktop/mobile +image
- [x] Legible font sizes - Lighthouse, most common text sizes
- [x] Unused CSS, JS + Image
Images
- [x] `<img>` elements
- [x] Alt attributes
CMS Platforms
- [ ] most popular CMS types (which can lead into individual chapter)
On Page
- [x] Page titles - Number of missing page titles. Page titles too long / too short.
- [x] Meta descriptions: number of sites which have inputted meta descriptions. Too long/too short split.
- [x] Keywords - % of websites using meta keywords.
- [x] Word count - avg. word count on pages vs. last year & 2020.
- [x] Average # of heading tags, breakdown per type (h2/h3/etc)
Server
- [ ] most popular servers used. Types of server (dedicated/shared/cloud).
- [ ] Server-side rendering (SSR) vs. client-side rendering (CSR) - % of websites using SSR vs. CSR
- [ ] % of server-side rendering that’s dynamic
Links
- [x] Dofollow/nofollow split (note for author: if the data supports it, tie in the rise of digital PR over the last year and a corresponding increase in nofollow links)
- [x] rel=ugc & rel=sponsored - % of links using these attributes
- [ ] anchor text usage
- [x] number of external links (copying from last year)
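As a rough illustration of the follow/nofollow and `rel=ugc` / `rel=sponsored` splits above, the following Python sketch tallies `rel` values for a list of links. The function name and bucket labels are hypothetical, and treating every link without `nofollow` as "dofollow" is an assumption about how the split is defined.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize_rel(rels: Iterable[Optional[str]]) -> Counter:
    """Count links by rel token: nofollow, ugc, sponsored, and dofollow.

    A link counts as 'dofollow' when its rel attribute is missing or
    does not contain the nofollow token (an assumption for this sketch).
    """
    counts: Counter = Counter()
    for rel in rels:
        # rel is a space-separated, case-insensitive token list.
        tokens = set(rel.lower().split()) if rel else set()
        if "nofollow" in tokens:
            counts["nofollow"] += 1
        else:
            counts["dofollow"] += 1
        for token in ("ugc", "sponsored"):
            if token in tokens:
                counts[token] += 1
    return counts
```

Because `ugc` and `sponsored` are counted independently of `nofollow`, a link with `rel="nofollow ugc"` contributes to both buckets.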
Structured data
- [ ] Is the page using structured data? Is there a correlation in most used type of structured data by industry?
- [ ] What % of sites implement structured data with more than one method e.g. JSON-LD & microdata
- [ ] Can we split this by CMS? To see which CMS is adding what markup by default (potentially) e.g. on Wix if you have a physical address on the homepage, you’ll automatically have local business structured data added. % split between manual & automatic implementation.
Mobile vs. Desktop
- [ ] Word count
- [x] Links
- [ ] Structured data
- [ ] Hreflang +image
- [ ] Meta robots
- [ ] Alt attributes on images +image
- [ ] AMP
- [ ] AMP. Usage. AMP-only. Vs 2021 & 2020. Desktop/Mobile.
Rendered vs non-rendered differences?
- [ ] Word count
- [ ] Things missing and then not, or things changed? Titles, meta descriptions, robots meta tags, structured data, links, etc.
Internationalisation
- [x] Content-language usage. Headers vs meta tags for content language also.
- [x] Most common values of the hreflang attribute
@derekperkins @csliva @jroakes How are the queries coming along?
Hey @foxdavidj I'm going to get cracking on this and hopefully finish the work by the end-of-month deadline. I'll also ping @jroakes and @derekperkins to see what their bandwidth looks like.
Hey @csliva how are we getting on with this?
@csliva Are the checkboxes in the PR description up to date?