structured-data

Add support for @itemprop="mainEntity" top-level items

Open · timvdalen opened this issue 4 years ago • 3 comments

An entity with @itemprop="mainEntity" is also a top-level entity (the primary entity described in the page), per https://schema.org/mainEntity.
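To make the distinction concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the library's implementation. It compares the HTML microdata rule, under which an item is top-level only if its element has no `itemprop` attribute, with the extension proposed here, which also promotes `itemprop="mainEntity"` items. `ItemScanner`, `spec_top_level` and `proposed_top_level` are hypothetical names, and only the plain token `mainEntity` is matched (not the full `https://schema.org/mainEntity` URL form).

```python
from html.parser import HTMLParser


class ItemScanner(HTMLParser):
    """Hypothetical helper: records (itemtype, itemprop names) for every itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []  # (itemtype, [itemprop names]) in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemscope" in a:  # boolean attribute: present with value None
            self.items.append((a.get("itemtype"), (a.get("itemprop") or "").split()))


def spec_top_level(items):
    # HTML microdata: an item is top-level iff its element has no itemprop attribute.
    return [t for t, props in items if not props]


def proposed_top_level(items):
    # Proposed extension: itemprop="mainEntity" items count as top-level too.
    # Assumption: only the short token "mainEntity" is recognised.
    return [t for t, props in items if not props or "mainEntity" in props]


# Hypothetical page in which the mainEntity item has no enclosing item.
page = """
<body>
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
    <span itemprop="name">The Catcher in the Rye</span>
    <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
      Price: $<span itemprop="price">6.99</span>
    </div>
  </div>
</body>
"""

scanner = ItemScanner()
scanner.feed(page)
scanner.close()
print(spec_top_level(scanner.items))      # []
print(proposed_top_level(scanner.items))  # ['https://schema.org/Book']
```

Under the spec rule alone, nothing on such a page is top-level, which is the gap this PR is trying to close.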

If this is not something you want to support in the library, let me know and I'll add a (private) fork to our package repo instead.

timvdalen, Mar 04 '21 18:03

Hi, thanks for your PR! I just tried the example HTML code at https://schema.org/mainEntity (example 2) with and without your modifications.

Currently:

https://schema.org/WebPage
  - https://schema.org/breadcrumb: Books > Literature & Fiction > Classics
  - https://schema.org/mainEntity: (https://schema.org/Book)

With your PR:

https://schema.org/WebPage
  - https://schema.org/breadcrumb: Books > Literature & Fiction > Classics
  - https://schema.org/mainEntity: (https://schema.org/Book)
https://schema.org/Book
  - https://schema.org/image: http://www.example.com/catcher-in-the-rye-book-cover.jpg
  - ...

My remarks:

  • Book is now on the same level as WebPage; is that what we want?
  • Book is now present both at the root level and under WebPage as mainEntity; should it be filtered out from there?
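The duplication in the second remark can be reproduced with a small sketch. It builds the nesting of items with Python's `html.parser`, treats items without `itemprop` plus `itemprop="mainEntity"` items as roots (the rule proposed in this PR), and walks every root. The page is a hypothetical, trimmed-down page shaped like the output above (a WebPage whose mainEntity is a Book), not the exact schema.org example; `Item`, `TreeBuilder` and `walk` are illustrative names, and well-formed markup is assumed.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Item:
    def __init__(self, itemtype, props):
        self.itemtype = itemtype
        self.props = props    # itemprop names under which this item is nested
        self.children = []    # items nested as property values of this item


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds the nesting of itemscope elements."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one entry per open non-void element: its Item or None
        self.items = []       # every item, in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        item = None
        if "itemscope" in a:
            item = Item(a.get("itemtype"), (a.get("itemprop") or "").split())
            # The nearest enclosing item owns this one if it is a property value.
            parent = next((i for i in reversed(self.stack) if i is not None), None)
            if parent is not None and item.props:
                parent.children.append(item)
            self.items.append(item)
        if tag not in VOID:
            self.stack.append(item)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


def walk(item):
    # Yield this item's type, then the types of everything nested under it.
    yield item.itemtype
    for child in item.children:
        yield from walk(child)


page = """
<body itemscope itemtype="https://schema.org/WebPage">
  <div itemprop="breadcrumb">Classics</div>
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
    <span itemprop="name">The Catcher in the Rye</span>
  </div>
</body>
"""

builder = TreeBuilder()
builder.feed(page)
builder.close()

# Roots under the proposed rule: no itemprop, or itemprop="mainEntity".
roots = [i for i in builder.items if not i.props or "mainEntity" in i.props]
reported = [t for root in roots for t in walk(root)]
print([r.itemtype for r in roots])  # ['https://schema.org/WebPage', 'https://schema.org/Book']
print(reported.count("https://schema.org/Book"))  # 2
```

The Book is reached once as its own root and once through WebPage's mainEntity property.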

BenMorel, Mar 21 '21 13:03

Thanks for your remarks! It's quite possible I've overlooked something, or that the page I'm testing this against isn't standards-compliant.

Before this change, the mainEntity on my page wasn't returned at all, since it isn't a child of any other top-level element.

I will take some time tomorrow to test this against both my example and the example HTML code on schema.org and get back to you on your remarks.

timvdalen, Mar 21 '21 13:03

Indeed; the difference is that in the example I'm using, the mainEntity isn't a child of any other top-level item. As far as I can see from the spec, that should be legal.

Here is the example from https://schema.org/mainEntity, edited to reflect the situation that prompted this PR:

<body>
	<div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">

		<img itemprop="image" src="catcher-in-the-rye-book-cover.jpg"
		     alt="cover art: red horse, city in background"/>
		<span itemprop="name">The Catcher in the Rye</span> -
		<link itemprop="bookFormat" href="https://schema.org/Paperback">Mass Market Paperback
		by <a itemprop="author" href="/author/jd_salinger.html">J.D. Salinger</a>

		<div itemprop="aggregateRating" itemscope itemtype="https://schema.org/AggregateRating">
			<span itemprop="ratingValue">4</span> stars -
			<span itemprop="reviewCount">3077</span> reviews
		</div>

		<div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
			Price: $<span itemprop="price">6.99</span>
			<meta itemprop="priceCurrency" content="USD" />
			<link itemprop="availability" href="https://schema.org/InStock">In Stock
		</div>

		Product details
		<span itemprop="numberOfPages">224</span> pages
		Publisher: <span itemprop="publisher">Little, Brown, and Company</span> -
		<meta itemprop="datePublished" content="1991-05-01">May 1, 1991
		Language: <span itemprop="inLanguage">English</span>
		ISBN-10: <span itemprop="isbn">0316769487</span>

		Reviews:

		<div itemprop="review" itemscope itemtype="https://schema.org/Review">
			<span itemprop="reviewRating">5</span> stars -
			<b>"<span itemprop="name">A masterpiece of literature</span>"</b>
			by <span itemprop="author">John Doe</span>,
			Written on <meta itemprop="datePublished" content="2006-05-04">May 4, 2006
			<span itemprop="reviewBody">I really enjoyed this book. It captures the essential
				challenge people face as they try make sense of their lives and grow to adulthood.</span>
		</div>

		<div itemprop="review" itemscope itemtype="https://schema.org/Review">
			<span itemprop="reviewRating">4</span> stars -
			<b>"<span itemprop="name">A good read.</span>" </b>
			by <span itemprop="author">Bob Smith</span>,
			Written on <meta itemprop="datePublished" content="2006-06-15">June 15, 2006
			<span itemprop="reviewBody">Catcher in the Rye is a fun book. It's a good book to read.</span>
		</div>

	</div>
</body>

Currently, this library doesn't detect any Things in this snippet, although I think it should detect the Book.

With that out of the way, I agree that the output for the example you've posted is also not what we want. I can see two basic approaches:

  1. Only return the mainEntity as a top-level Thing if it doesn't have a parent item
  2. Filter out Book from WebPage so we don't report it twice
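The first approach can be sketched as follows, again in Python with only `html.parser` and as an illustration rather than a patch for the library: an `itemprop="mainEntity"` item is promoted to top level only when no enclosing element carries `itemscope`. `ItemScanner`, `top_level` and the two sample pages are hypothetical; `itemref` and malformed markup are ignored.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemScanner(HTMLParser):
    """Hypothetical helper: records (itemtype, itemprop names, has_parent_item)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open non-void element: does it carry itemscope?
        self.items = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        is_item = "itemscope" in a
        if is_item:
            props = (a.get("itemprop") or "").split()
            self.items.append((a.get("itemtype"), props, any(self.stack)))
        if tag not in VOID:
            self.stack.append(is_item)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


def top_level(html):
    scanner = ItemScanner()
    scanner.feed(html)
    scanner.close()
    return [
        itemtype
        for itemtype, props, has_parent in scanner.items
        # No itemprop at all, or a mainEntity that no enclosing item owns.
        if not props or ("mainEntity" in props and not has_parent)
    ]


nested = """
<body itemscope itemtype="https://schema.org/WebPage">
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book"></div>
</body>
"""
orphan = """
<body>
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
    <link itemprop="bookFormat" href="https://schema.org/Paperback">Paperback
  </div>
</body>
"""

print(top_level(nested))  # ['https://schema.org/WebPage']
print(top_level(orphan))  # ['https://schema.org/Book']
```

For a page shaped like the schema.org example, WebPage stays the only root, so nothing is reported twice, while an orphaned mainEntity like the one in the snippet above becomes top-level.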

I am happy to update this PR to do either; what do you think is best here?

timvdalen, Mar 22 '21 13:03