Implemented Markup improvements for Google

rrlevering

Google
Hi, my name is Ryan Levering and I currently handle structured data ingestion at Google (this guy). I've been doing some fairly spontaneous spot checks (not based on any specific problem) of some of the major forum software markups on the web just to see whether the markup is being generated in an ideal way for our systems to ingest. You can stick a URL in http://validator.schema.org to get an idea of what your markup for a given URL looks like. I have a couple of high-level suggestions to take or leave as you see fit:
  1. Include more than the OP with "http://schema.org/Comment" nodes through a "http://schema.org/comment" property. We're trying to normalize the forum markup space to use DiscussionForumPosting for the OP and attach Comment typed markup for the replies in a flat list (or threaded if you are a threaded forum). Without that, it makes it harder to segment the rest of the page appropriately in our index.
  2. Co-typing WebPage and DiscussionForumPosting like you do is going to confuse our ingestion a bit. If you squint it's not that inaccurate, but it would be clearer to either have WebPage (separate node) -> mainEntity -> DiscussionForumPosting or DiscussionForumPosting -> mainEntityOfPage -> WebPage (separate node). A co-typed self-cycle often needs special-case detection.
  3. Include profile URLs in your author -> Person nodes. Raw names are not nearly as useful for disambiguation.
There's a couple other smaller things, but those are the things that would improve the markup the most. Note also you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though can be harder to author/inject).
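
To make the three suggestions concrete, here is a minimal Python sketch that builds JSON-LD shaped that way. The function names, input dict keys, and forum.example URLs are all made up for illustration, and using "text" for the post body is an assumption; this is not what XenForo generates.

```python
import json

def person(user):
    # Suggestion 3: give the author a profile URL, not just a raw name.
    return {"@type": "Person", "name": user["name"], "url": user["profile_url"]}

def build_thread_markup(page_url, headline, op, replies):
    """WebPage (separate node) -> mainEntity -> DiscussionForumPosting -> comment[]."""
    posting = {
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "author": person(op["author"]),
        "datePublished": op["date"],
        "text": op["text"],
        # Suggestion 1: every reply becomes a Comment node in a flat list.
        "comment": [
            {
                "@type": "Comment",
                "author": person(reply["author"]),
                "datePublished": reply["date"],
                "text": reply["text"],
            }
            for reply in replies
        ],
    }
    # Suggestion 2: the WebPage is its own node, not co-typed with the posting.
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page_url,
        "mainEntity": posting,
    }

# Hypothetical thread data, only to show the resulting shape.
alice = {"name": "alice", "profile_url": "https://forum.example/members/alice.1/"}
bob = {"name": "bob", "profile_url": "https://forum.example/members/bob.2/"}
thread = build_thread_markup(
    "https://forum.example/threads/example.1/",
    "Example thread",
    {"author": alice, "date": "2023-08-01T12:00:00+00:00", "text": "First post."},
    [{"author": bob, "date": "2023-08-01T13:00:00+00:00", "text": "A reply."}],
)
print(json.dumps(thread, indent=2))
```

A threaded forum could nest "comment" lists inside each Comment instead of keeping them flat.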
 
This suggestion has been implemented. Votes are no longer accepted.
Not sure how I missed this thread for over a month.

Super critical. @rrlevering, thank you for taking the time to report the issues, and if you have any other tips for the XF devs to help forums stay relevant, please do contribute them. We are all so heavily dependent on Google. And let's just say it hasn't been nice to us forum runners over the years, as the social apps have taken a vast audience away from us.
Took me two months to see this post...

Thank you Ryan (@rrlevering). Many of us in here work very hard to keep our forum threads ranking well in Google. Forums tend to get forgotten with all of the blogger fluff that has flooded the search results over the years, so any recommendations on ways we can keep being found are always greatly appreciated from our side.
 
Not necessarily, but Google doesn't seem to do anything with it (it's not one of their supported properties) and I imagine it can be derived from the page contents anyway. Duplicating entire message bodies in JSON-LD feels excessive, so we'd probably just switch to using microdata if it ever made a difference to Google.
 
I totally didn't notice the replies on this thread, I'm confused why I didn't get updates. Regardless, apologies for not being around during the action.

And most importantly, thanks for making changes. It heartens me to see forum platforms moving to expose their content better. I really would like to see this content more in search results rather than low quality, over-optimized rehash articles.

Several replies to previous conversations:
Not necessarily, but Google doesn't seem to do anything with it (it's not one of their supported properties)
DiscussionForumPosting/Comment markup is not the same as Article markup. There are a couple of overlaps (like we'll use dates and such) but the planned recommendations are much more extensive than Article.
Should articleBody value in DiscussionForumPosting be truncated?
Surprisingly, articleBody is actually useful. One of the problems we have is proper post segmentation in the midst of other formatting on the page. Especially if you are trying for very high precision levels and in the presence of things like quotes/inline replies.
The URL of the user profile (from attribute href) becomes the value.
This is the single biggest mistake users of Microdata make on the web: the href of the A tag is unexpectedly used as the value instead of the link text.
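
A rough Python sketch of that href behaviour, for anyone who wants to see it. This is a simplified reader written for this post, not Google's parser or a complete Microdata implementation: it ignores itemscope nesting, handles only a few attribute-valued elements, and does not resolve relative URLs.

```python
from html.parser import HTMLParser

# Subset of elements whose property value comes from an attribute, not the text.
ATTR_VALUED = {"a": "href", "link": "href", "img": "src", "meta": "content"}

class ItempropValues(HTMLParser):
    """Collect (itemprop, value) pairs in document order (simplified, no nesting)."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._open = None  # [tag, property name, text chunks] while collecting text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag in ATTR_VALUED:
            # For <a>, the value is the href; the link text is ignored.
            self.values.append((prop, attrs.get(ATTR_VALUED[tag]) or ""))
        else:
            self._open = [tag, prop, []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.values.append((self._open[1], "".join(self._open[2]).strip()))
            self._open = None

def itemprop_values(html):
    parser = ItempropValues()
    parser.feed(html)
    return parser.values

# The same username marked up two ways (hypothetical profile path).
print(itemprop_values('<a itemprop="name" href="/members/alice.1/">alice</a>'))
print(itemprop_values('<a itemprop="url" href="/members/alice.1/"><span itemprop="name">alice</span></a>'))
```

In the first snippet the "name" ends up being the profile path. In the second, the link carries "url" and an inner span carries "name", which is usually what was intended.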

Ok, now on the meat of my reply. We are still struggling with some structure issues with these pages.
  1. Comment linkage: The generated code is trying to use an inverse comment property to do the association: Comment -mainEntity-> DiscussionForumPosting (DFP). This is likely because XenForo is generating the DFP via JSON-LD and the Comment with Microdata. That would be fine (and looks somewhat good in validator.schema.org) but a) the truth is we don't merge JSON-LD and Microdata node ID spaces currently and b) we currently don't support DFP <-> Comment linkages via mainEntity even if we did. I'm going to see whether I can unblock these issues on our side horizontally, because I don't really have a good suggestion ATM that doesn't involve XenForo moving the DFP to Microdata and adding comment links to the underlying comments. Which doesn't feel necessary, especially until I have a better schema recommendation.
  2. Main entity of page: The JSON-LD DFP generation should annotate the DFP with mainEntityOfPage -> <url> or a WebPage subtype with url: <url>. That's going to be safer. Right now, because of #1 and because we don't have logic like "DFP always trumps Comment" (which I might go play around with now), we just see a linear bunch of posts in Microdata and JSON-LD. So, strangely, without this, adding the Comments hurt some of our calculation of the main content (though segmentation was still likely improved).
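
For point 2, the two variants could look roughly like this as a Python sketch. The helper name and the forum.example URL are hypothetical; only the mainEntityOfPage shapes come from the description above.

```python
def with_main_entity_of_page(posting, page_url, as_web_page=False):
    """Return a copy of a DFP dict annotated with mainEntityOfPage."""
    annotated = dict(posting)
    if as_web_page:
        # Variant: a WebPage node (or subtype) that carries the url.
        annotated["mainEntityOfPage"] = {"@type": "WebPage", "url": page_url}
    else:
        # Variant: the plain page URL.
        annotated["mainEntityOfPage"] = page_url
    return annotated

dfp = {"@type": "DiscussionForumPosting", "headline": "Example thread"}
page = "https://forum.example/threads/example.1/"
print(with_main_entity_of_page(dfp, page))
print(with_main_entity_of_page(dfp, page, as_web_page=True))
```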
 
DiscussionForumPosting/Comment markup is not the same as Article markup. There are a couple of overlaps (like we'll use dates and such) but the planned recommendations are much more extensive than Article.
Thanks for the clarification. My understanding from Schema.org was that DiscussionForumPosting inherits the articleBody property from Article, and Google only publishes recommendations for the latter thus far. We do also have different thread types, including an article type which uses Article rather than DiscussionForumPosting.

Surprisingly, articleBody is actually useful. One of the problems we have is proper post segmentation in the midst of other formatting on the page. Especially if you are trying for very high precision levels and in the presence of things like quotes/inline replies.
We can duplicate the full text body if it is useful. There are some technical challenges with using Microdata for the DFP. Namely we don't always have the necessary contextual information for each property in the view layer. It's much simpler to extract the properties from the model layer.

The generated code is trying to use an inverse comment property to do the association: Comment -mainEntity-> DiscussionForumPosting (DFP). This is likely because XenForo is generating the DFP via JSON-LD and the Comment with Microdata. That would be fine (and looks somewhat good in validator.schema.org) but a) the truth is we don't merge JSON-LD and Microdata node ID spaces currently and b) we currently don't support DFP <-> Comment linkages via mainEntity even if we did. I'm going to see whether I can unblock these issues on our side horizontally, because I don't really have a good suggestion ATM that doesn't involve XenForo moving the DFP to Microdata and adding comment links to the underlying comments. Which doesn't feel necessary, especially until I have a better schema recommendation.
That would be nice. I knew that JSON-LD and Microdata schemas were not merged, but not that they had separate node ID spaces. Given the challenges with using Microdata for the DFP and the desire not to duplicate comment text bodies for JSON-LD, it would be very useful to have some means to link between the two.

The JSON-LD DFP generation should annotate the DFP with mainEntityOfPage -> <url> or a WebPage subtype with url: <url>.
I will go ahead and make this change for the next release.

Thank you again for the insights. Documentation is great but it's incredibly helpful to have real-world feedback.
 
I totally didn't notice the replies on this thread, I'm confused why I didn't get updates. Regardless, apologies for not being around during the action.
When I hit "watch" it defaults to "not receive email notifications."
You may need to 'unwatch' and 'rewatch' with 'receive email notifications' set.
 
And most importantly, thanks for making changes. It heartens me to see forum platforms moving to expose their content better. I really would like to see this content more in search results rather than low quality, over-optimized rehash articles.

It always bothered me to see forum content increasingly ignored by Google starting about 10 or 12 years ago. Most "articles" on WordPress sites are actually just promotional pages with affiliate links, and for that reason the accuracy and usefulness of the content is questionable and likely compromised.

When I want an honest review of a product or service or just want to learn about a particular topic I will go to a forum first because people give you genuine feedback, answer questions, and often share their own expert knowledge.

I hope that Google will once again value forum content for being among the most useful and informative available on the internet.
 
@Jeremy P & @rrlevering
XenForo JSON-LD DFP uses the user's avatar or the site logo for the image property - does that make sense at all?

Neither the users avatar nor the site logo seems to be "An image of the item. " (at least not to me).

Also the validated markup for Comment looks a bit strange to me, shouldn't there be just one @type?
 
@Jeremy P & @rrlevering
XenForo JSON-LD DFP uses the user's avatar or the site logo for the image property - does that make sense at all?

Neither the users avatar nor the site logo seems to be "An image of the item. " (at least not to me).

That is correct, the https://schema.org/image properties on DFP should only contain any images from the post. It would be against our eventual guidelines to use either user images (clearly wrong) or a site logo (noise). We can always get the site logo from other places if we need it. Feel free to attach the user images to the author object, where they should belong.
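
As a sketch of that placement in Python: the helper below is hypothetical and assumes the DFP is a plain dict with an "author" Person dict. It keeps only images from the post itself on the DFP and moves the avatar onto the author.

```python
def place_images(posting, post_image_urls, avatar_url=None):
    """Return a copy of the DFP with post images on the post and the avatar on the author."""
    fixed = dict(posting)
    # Drop whatever was there before (for example an avatar or site logo).
    fixed.pop("image", None)
    if post_image_urls:
        fixed["image"] = list(post_image_urls)
    if avatar_url and isinstance(fixed.get("author"), dict):
        fixed["author"] = {**fixed["author"], "image": avatar_url}
    return fixed

dfp = {
    "@type": "DiscussionForumPosting",
    "image": "https://forum.example/logo.png",
    "author": {"@type": "Person", "name": "alice"},
}
print(place_images(dfp, [], avatar_url="https://forum.example/avatars/alice.jpg"))
```

A post with no images of its own ends up with no image property at all, rather than falling back to the logo.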

Also the validated markup for Comment looks a bit strange to me, shouldn't there be just one @type?
This is kinda ugly and redundant but not necessarily incorrect and at least doesn't affect Google's accurate interpretation. I've considered deduplicating in our systems, but then I have to pick a winner to show and right now it comes from all those places so it might add confusion.

Usually it's more common to just type the node on the "main place it's defined" and then link to it without adding an extra type.
 
Getting the schema correct, optimized, and fully featured is very important for search engines now and will be even more important in the future. We see this here with Google, and it was also confirmed by Bing's PM Fabrice Canel earlier this year.
 
Thanks for the clarification. My understanding from Schema.org was that DiscussionForumPosting inherits the articleBody property from Article, and Google only publishes recommendations for the latter thus far. We do also have different thread types, including an article type which uses Article rather than DiscussionForumPosting.


We can duplicate the full text body if it is useful. There are some technical challenges with using Microdata for the DFP. Namely we don't always have the necessary contextual information for each property in the view layer. It's much simpler to extract the properties from the model layer.


That would be nice. I knew that JSON-LD and Microdata schemas were not merged, but not that they had separate node ID spaces. Given the challenges with using Microdata for the DFP and the desire not to duplicate comment text bodies for JSON-LD, it would be very useful to have some means to link between the two.


I will go ahead and make this change for the next release.

Thank you again for the insights. Documentation is great but it's incredibly helpful to have real-world feedback.
To follow up on this, we recently changed our infrastructure to support syntax merging, and your forums are now being parsed reasonably. This has long been requested by the semantic web community, and this was a nice catalyst. But it would still be better to switch your Comment -> mainEntity -> DiscussionForumPosting predicates to use "parentItem" instead of "mainEntity". The next release of schema.org should have a change to make that valid in schema. mainEntity is a very generic property that is more often used to refer to the actual semantic entity of the topic (like the car being talked about on a forum) rather than structural relationships, so parentItem will work better.
 
Just a follow-up note: in our upcoming launch of forum reports in Google Search Console, the Comment -> mainEntity -> DiscussionForumPosting link is going to cause error reports. It will still be parsed for a while to not penalize these forums, but only Comment -> parentItem -> DiscussionForumPosting will be recognized as a valid inverse link from Comment by our tooling. So just changing that one property from "mainEntity" to "parentItem" is recommended.
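
For illustration, here is that one-property change as a small Python sketch over JSON-LD-shaped dicts. The function name and sample data are made up. Where the Comment is emitted as Microdata, the equivalent change would be the itemprop value in the template rather than code like this.

```python
def use_parent_item(node):
    """Recursively rename mainEntity to parentItem on Comment nodes only."""
    if isinstance(node, list):
        return [use_parent_item(child) for child in node]
    if not isinstance(node, dict):
        return node
    out = {key: use_parent_item(value) for key, value in node.items()}
    types = out.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Only Comment nodes change; a WebPage keeps its mainEntity.
    if "Comment" in types and "mainEntity" in out and "parentItem" not in out:
        out["parentItem"] = out.pop("mainEntity")
    return out

# Hypothetical comment pointing back at the thread's first post by @id.
comment = {
    "@type": "Comment",
    "text": "A reply.",
    "mainEntity": {"@id": "https://forum.example/threads/example.1/#op"},
}
print(use_parent_item(comment))
```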
 
Just a follow-up note: in our upcoming launch of forum reports in Google Search Console, the Comment -> mainEntity -> DiscussionForumPosting link is going to cause error reports. It will still be parsed for a while to not penalize these forums, but only Comment -> parentItem -> DiscussionForumPosting will be recognized as a valid inverse link from Comment by our tooling. So just changing that one property from "mainEntity" to "parentItem" is recommended.
Thank you! Do you have any more information on what this new "forum" report will be reporting and when it is expected to drop? @Kier can we manually make the change needed above or do we expect 2.2.14 to be released before the new forum report?
 
It's been officially announced with specifications

Forum specifications

Also they added profile page markup

@Jeremy P can we make sure this is all in 2.2.14?
 
Here are the reported issues. We need these addressed ASAP. This is core to all of our forums' survival with Google. The missing text error seems to stem from posts with only an image. The missing url in author may be due to a deleted member?
[Attachments: issues.png, issue2.png]
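
To hunt for these two cases locally, something like the Python sketch below could walk a page's JSON-LD. It is only a guess at what the report flags, based on the two errors named here, and not Google's actual validation. The names and the choice to accept either "text" or "articleBody" are assumptions.

```python
POST_TYPES = {"DiscussionForumPosting", "Comment"}

def _types(node):
    declared = node.get("@type", [])
    return {declared} if isinstance(declared, str) else set(declared)

def find_markup_gaps(node, path="$"):
    """Return (path, problem) pairs for posts without text and Persons without a url."""
    gaps = []
    if isinstance(node, list):
        for index, child in enumerate(node):
            gaps += find_markup_gaps(child, f"{path}[{index}]")
    elif isinstance(node, dict):
        types = _types(node)
        if types & POST_TYPES and not (node.get("text") or node.get("articleBody")):
            gaps.append((path, "missing text"))
        if "Person" in types and not node.get("url"):
            gaps.append((path, "missing url"))
        for key, child in node.items():
            gaps += find_markup_gaps(child, f"{path}.{key}")
    return gaps

# Hypothetical thread: an image-only reply from a deleted member.
thread = {
    "@type": "DiscussionForumPosting",
    "text": "First post.",
    "author": {"@type": "Person", "name": "alice", "url": "https://forum.example/members/alice.1/"},
    "comment": [
        {
            "@type": "Comment",
            "image": "https://forum.example/attachments/1.png",
            "author": {"@type": "Person", "name": "Deleted member"},
        }
    ],
}
for path, problem in find_markup_gaps(thread):
    print(path, problem)
```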
 