Implemented Markup improvements for Google

rrlevering

Google
Hi, my name is Ryan Levering and I currently handle structured data ingestion at Google (this guy). I've been doing some fairly spontaneous spot checks (not based on any specific problem) of some of the major forum software markups on the web just to see whether the markup is being generated in an ideal way for our systems to ingest. You can stick a URL in http://validator.schema.org to get an idea of what your markup for a given URL looks like. I have a couple of high-level suggestions to take or leave as you see fit:
  1. Include more than the OP with "http://schema.org/Comment" nodes through a "http://schema.org/comment" property. We're trying to normalize the forum markup space to use DiscussionForumPosting for the OP and attach Comment typed markup for the replies in a flat list (or threaded if you are a threaded forum). Without that, it makes it harder to segment the rest of the page appropriately in our index.
  2. Co-typing WebPage and DiscussionForumPosting like you do is going to confuse our ingestion a bit. If you squint it's not that inaccurate, but it would be clearer to either have WebPage (separate node) -> mainEntity -> DiscussionForumPosting or DiscussionForumPosting -> mainEntityOfPage -> WebPage (separate node). A co-typed self-cycle often needs to be detected as a special case.
  3. Include profile URLs in your author -> Person nodes. Raw names are not nearly as useful for disambiguation.
There's a couple other smaller things, but those are the things that would improve the markup the most. Note also you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though can be harder to author/inject).
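
A minimal JSON-LD sketch of the three suggestions above, assuming a flat (non-threaded) forum. Every URL, ID, username, date, and text value here is a hypothetical placeholder, not actual XenForo output: the WebPage is a separate node that points to the DiscussionForumPosting through mainEntity, replies hang off the OP through the comment property as Comment nodes, and each author Person carries a profile URL.

```html
<!-- Illustrative only: all URLs, names, dates, and text are made-up placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "@id": "https://forum.example/threads/example-thread.123/#webpage",
  "url": "https://forum.example/threads/example-thread.123/",
  "mainEntity": {
    "@type": "DiscussionForumPosting",
    "@id": "https://forum.example/threads/example-thread.123/#post-1",
    "headline": "Example thread title",
    "text": "Text of the original post.",
    "datePublished": "2023-01-01T12:00:00+00:00",
    "author": {
      "@type": "Person",
      "name": "alice",
      "url": "https://forum.example/members/alice.1/"
    },
    "comment": [
      {
        "@type": "Comment",
        "@id": "https://forum.example/threads/example-thread.123/#post-2",
        "text": "Text of the first reply.",
        "datePublished": "2023-01-01T13:00:00+00:00",
        "author": {
          "@type": "Person",
          "name": "bob",
          "url": "https://forum.example/members/bob.2/"
        }
      },
      {
        "@type": "Comment",
        "@id": "https://forum.example/threads/example-thread.123/#post-3",
        "text": "Text of the second reply.",
        "datePublished": "2023-01-01T14:00:00+00:00",
        "author": {
          "@type": "Person",
          "name": "alice",
          "url": "https://forum.example/members/alice.1/"
        }
      }
    ]
  }
}
</script>
```

The same shape could be written in the other direction, with the DiscussionForumPosting as the top-level node pointing to a separate WebPage node through mainEntityOfPage.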
 
This suggestion has been implemented. Votes are no longer accepted.

Thank you very much for your insight and I hope it gets included in the next release! Great suggestion!
 
Not sure how I missed this thread for over a month.

Super critical. @rrlevering thank you for taking the time to report the issues, and if you have any other tips for the XF devs to help forums stay relevant, please do contribute them. We are all so heavily dependent on Google. And let's just say, it hasn't been nice to us forum runners over the years as the social apps have taken a vast audience away from us.
 
Hello Ryan!

Thank you so much for taking the time to make these suggestions.

Since the very beginning, XenForo has continuously adapted to stay as close to best practices as humanly possible and to make our output as friendly to search engines as we possibly can. There's always more to do and it's difficult to keep track of, so having an expert such as yourself guide the way has been super useful.

With that in mind, we really appreciate your efforts.

@Jeremy P has done some work on this which we've just implemented and rolled out here.

In summary, just copying his notes so I don't have to rewrite them:
  • Remove duplicate mainEntity nodes
  • Include IDs and URLs in author metadata (both JSON-LD and Microdata)
  • Move member and resource item structured data into PHP for flexibility
  • Defer escaping XFMG structured data to the template itself
  • Make a few minor adjustments to adhere to current best practices
It also introduces a new \XF\Util\Arr::filterRecursive method for recursively filtering an array and adds a new optional argument to \XF\Util\Arr::filterNull to filter recursively. This is used to filter out null items from structured data in PHP.

We've also introduced comment microdata to posts. I think I had a separate suggestion from @Stuart Wright about providing content tags in the metadata in the keywords field which has also been introduced.

All in all, I hope we've managed to cover off most of what is required and recommended. Please let us know here if you spot any oversights or anything not quite expected.

And feel free to post more suggestions in the future if you have any :)
 
We can certainly look into it further but I'm not entirely sure the Validator isn't wrong.

Our markup:

HTML:
<h4 class="message-name"><a href="/community/members/sdev.97451/" class="username " dir="auto" data-user-id="97451" itemprop="name" data-xf-init="member-tooltip">sdev</a></h4>

The error:

name   Person is not a known valid target type for the name property.
@type  Person
@id    https://xenforo.com/community/members/sdev.97451/

Pretty sure our markup here is saying that the name property should be sdev, i.e. the content of the element that it is applied to.

@Jeremy P what do you think?
 
If the element is an a, area, or link element
The value is the resulting URL string that results from parsing the value of the element's href attribute relative to the node document of the element at the time the attribute is set, or the empty string if there is no such attribute or if parsing it results in an error.

That's most likely the reason why the current markup doesn't fully work:
The URL of the user profile (from attribute href) becomes the value.
This URL is also the ID of the Person - hence the error message that Person is not a valid target for the name property of a Person.

HTML:
<h4 class="message-name"><a href="/community/members/sdev.97451/" class="username " dir="auto" data-user-id="97451" itemprop="name" content="sdev"  data-xf-init="member-tooltip">sdev</a></h4>
seems to make the validator happy, i.e. it gives the correct result. I'm not sure, though, whether this approach is fully valid.
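
One alternative that avoids the content attribute, offered as a sketch rather than a tested fix: keep the link as the Person's url (for an a element the microdata value is the parsed href) and move itemprop="name" onto a nested span, whose value is its text content. The wrapping div with itemscope is a hypothetical stand-in for wherever the Person item is actually declared in the template; the anchor's other attributes are copied from the markup above.

```html
<!-- Hypothetical wrapper: stands in for the element that declares the Person item. -->
<div itemscope itemtype="https://schema.org/Person">
  <h4 class="message-name">
    <!-- On an <a>, itemprop takes its value from href, so use it for "url". -->
    <a href="/community/members/sdev.97451/" class="username" dir="auto" data-user-id="97451" itemprop="url" data-xf-init="member-tooltip">
      <!-- On a <span>, itemprop takes its value from the text content: "sdev". -->
      <span itemprop="name">sdev</span>
    </a>
  </h4>
</div>
```

Whether the extra span is acceptable depends on any CSS or JavaScript that targets the link's direct text, which is not something this thread establishes.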
 
Thank you @Chris D and @Jeremy P for an update on this topic! It feels so good to receive good news!!
 
There's a couple other smaller things, but those are the things that would improve the markup the most. Note also you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though can be harder to author/inject).
What other things can we do?

This is all very rare direct communication from Google. With the upcoming AI changes, Google has said they want to surface more forum discussions so this is top priority to get right.
 
I noticed that in DiscussionForumPosting the keywords key is empty. Could you use the tags as keywords for that key? Running the schema through Bard, it's telling me that we're missing replyCount too.
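
For illustration, a sketch of what tag-derived keywords might look like on the DiscussionForumPosting node. The URL, headline, and tag names are hypothetical; keywords is a schema.org CreativeWork property and may be given as repeated values, as here, or as a comma-delimited string.

```html
<!-- Illustrative only: the URL, headline, and tags are made-up placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "@id": "https://forum.example/threads/example-thread.123/",
  "headline": "Example thread title",
  "keywords": ["structured data", "json-ld", "seo"]
}
</script>
```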
 
Regular threads look good but articles still have issues.
Those are from mainEntity attributes in post microdata to associate the reply with the original post (which actually does have the supposed missing attributes, just in a separate JSON-LD node). It does look as though we'll want to fetch the type from the given thread type rather than hard-coding it as DiscussionForumPosting, but that doesn't seem to make a difference here.

I'm not entirely sure if there's a better way of associating the comment nodes (in microdata) with their parent node, or if there is just an issue with the way it is parsed by the validator.
 