
check / parse all sitemaps

Open leonstafford opened this issue 5 years ago • 6 comments

i.e., read sitemap_index and collect the URLs of all the sitemaps it lists

Similar to CSSProcessor: parse to extract the URLs, but don't regenerate the XML from DOMDocument/XMLDocument; just do a string replace
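
As a rough illustration of the "parse for URLs, then string replace" idea, here is a minimal sketch. The function names are hypothetical and not part of the plugin. It assumes plain `<loc>` elements (no CDATA sections) and that a simple substring swap of the site URL is enough for the rewrite.

```php
<?php
// Hypothetical helper names for illustration only; not the plugin's API.

/**
 * Collect every <loc> URL from raw sitemap XML with a regex,
 * without building a DOMDocument.
 *
 * @return string[]
 */
function sketch_extract_sitemap_locs( string $xml ): array {
    preg_match_all( '#<loc>\s*(.*?)\s*</loc>#is', $xml, $matches );

    // <loc> values are XML-escaped (for example &amp;), so decode them.
    return array_map(
        static function ( string $url ): string {
            return html_entity_decode( $url, ENT_QUOTES | ENT_XML1, 'UTF-8' );
        },
        $matches[1]
    );
}

/**
 * Rewrite the site URL with a plain string replace, leaving the rest of
 * the XML (declaration, stylesheet instruction, whitespace) untouched.
 */
function sketch_rewrite_sitemap( string $xml, string $from, string $to ): string {
    return str_replace( $from, $to, $xml );
}
```

Calling `sketch_extract_sitemap_locs()` on an index gives the child sitemap URLs to crawl, while `sketch_rewrite_sitemap()` produces the exported copy. Because nothing is re-serialised, the output stays byte-for-byte identical apart from the replaced URL.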

leonstafford avatar Jun 09 '20 18:06 leonstafford

We also need to include /wp-content/plugins/wordpress-seo/css/main-sitemap.xsl if the sitemap was created by Yoast SEO

nhhau avatar Jun 18 '20 01:06 nhhau

@nhhau definitely!

I will check against Yoast, SEO Framework and one other popular plugin whose name I forget at the moment (@thegulshankumar is using it)

I will improve the functionality to crawl URLs from the sitemaps and follow them, which should ensure Yoast and other multi-page sitemaps all get included properly
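
A minimal sketch of what "crawl URLs from the sitemaps and follow them" could look like. Everything here is an assumption for illustration: the function name is hypothetical, and `$fetch` stands in for whatever HTTP layer the plugin uses (it takes a URL and returns the response body, or null on failure).

```php
<?php
/**
 * Walk a sitemap index recursively and return every sitemap URL that
 * could be fetched, starting with the entry point itself.
 *
 * $fetch is a hypothetical callable: string $url => string|null body.
 *
 * @return string[]
 */
function sketch_crawl_sitemaps( string $url, callable $fetch, array &$seen = [] ): array {
    // Guard against indexes that reference each other.
    if ( isset( $seen[ $url ] ) ) {
        return [];
    }
    $seen[ $url ] = true;

    $body = $fetch( $url );
    if ( ! is_string( $body ) ) {
        return [];
    }

    $found = [ $url ];

    // Only <sitemapindex> documents list further sitemaps;
    // a <urlset> document lists ordinary pages and is not followed here.
    if ( stripos( $body, '<sitemapindex' ) !== false ) {
        preg_match_all( '#<loc>\s*(.*?)\s*</loc>#is', $body, $matches );
        foreach ( $matches[1] as $child ) {
            $child = html_entity_decode( $child, ENT_QUOTES | ENT_XML1, 'UTF-8' );
            $found = array_merge( $found, sketch_crawl_sitemaps( $child, $fetch, $seen ) );
        }
    }

    return $found;
}
```

Because the child URLs come from the index itself rather than from guessed filenames, paginated names such as post-sitemap1.xml would be picked up the same way as unpaginated ones.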

leonstafford avatar Jun 18 '20 13:06 leonstafford

WordPress SEO Plugin – Rank Math 😊

Like Yoast, it generates a sitemap. [image]

thegulshankumar avatar Jun 18 '20 13:06 thegulshankumar

That's the one! Thanks @thegulshankumar!

leonstafford avatar Jun 18 '20 13:06 leonstafford

@leonstafford Tested in version 6.6.21:

Yoast

Normally, if the sitemaps look like this:

https://example.com/post-sitemap.xml	2020-08-07 20:39 +00:00
https://example.com/page-sitemap.xml	2017-09-25 04:12 +00:00

They are crawled, but the export is missing the stylesheet located at /wp-content/plugins/wordpress-seo/css/main-sitemap.xsl

If the sitemaps are paginated, they are not recognised:

// Test setup: force one entry per sitemap so that Yoast paginates its sitemaps.
add_filter( 'wpseo_sitemap_entries_per_page', 'max_entries_per_sitemap' );

function max_entries_per_sitemap() {
    return 1;
}

http://example.com/post-sitemap1.xml	2020-08-30 21:36 +00:00
http://example.com/post-sitemap2.xml	2020-09-02 21:31 +00:00
http://example.com/post-sitemap3.xml	2020-09-02 22:01 +00:00
http://example.com/page-sitemap1.xml	2020-08-30 21:55 +00:00
http://example.com/page-sitemap2.xml	2020-08-30 22:38 +00:00
http://example.com/page-sitemap3.xml	2020-08-30 22:38 +00:00

Rank Math

I see sitemap.xml, but nothing at sitemap_index.xml

http://example.com/post-sitemap.xml
http://example.com/page-sitemap.xml

Notes

  • Rank Math uses the same sitemap paths as Yoast.

  • I disabled the default WordPress core sitemap in both tests above:

add_filter( 'wp_sitemaps_enabled', '__return_false' );

I still see sitemap.xml. (It should return 404, and the main sitemap index should be available only at the sitemap_index.xml path.) Harmless.

thegulshankumar avatar Sep 02 '20 22:09 thegulshankumar

Update: I tried the Slim SEO plugin. It produces an XML file at /sitemap.xml

Only the first sitemap.xml file is included in the export (I exported as a .zip).

List of missing files:

/sitemap-post-type-post.xml
/sitemap-post-type-page.xml
/sitemap-taxonomy-category.xml
/wp-content/plugins/slim-seo/src/Sitemaps/style.xsl

Contents of /sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://example.com/wp-content/plugins/slim-seo/src/Sitemaps/style.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<sitemap>
		<loc>http://example.com/sitemap-post-type-post.xml</loc>
	</sitemap>
	<sitemap>
		<loc>http://example.com/sitemap-post-type-page.xml</loc>
	</sitemap>
	<sitemap>
		<loc>http://example.com/sitemap-taxonomy-category.xml</loc>
	</sitemap>
</sitemapindex>
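
The missing style.xsl is referenced from the `xml-stylesheet` processing instruction at the top of the index above, so it could be discovered from the same raw XML. A minimal sketch, assuming the instruction carries a quoted `href` pseudo-attribute as in this sample; the function name is hypothetical and not part of the plugin.

```php
<?php
/**
 * Return the href of every xml-stylesheet processing instruction found
 * in a raw XML string, so the stylesheet can be added to the crawl list.
 *
 * @return string[]
 */
function sketch_extract_xsl_hrefs( string $xml ): array {
    preg_match_all(
        '#<\?xml-stylesheet\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*\?>#i',
        $xml,
        $matches
    );

    // Group 2 holds the href value; drop duplicates and reindex.
    return array_values( array_unique( $matches[2] ) );
}
```

Passing each result through `parse_url( $href, PHP_URL_PATH )` would give the site-relative path (here /wp-content/plugins/slim-seo/src/Sitemaps/style.xsl) to add to the export.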

thegulshankumar avatar Sep 04 '20 19:09 thegulshankumar