Implemented Markup improvements for Google

rrlevering · Mar 17, 2023

Hi, my name is Ryan Levering and I currently handle structured data ingestion at Google (this guy). I've been doing some fairly spontaneous spot checks (not based on any specific problem) of some of the major forum software markups on the web just to see whether the markup is being generated in an ideal way for our systems to ingest. You can stick a URL in http://validator.schema.org to get an idea of what your markup for a given URL looks like. I have a couple of high-level suggestions to take or leave as you see fit:

Include more than the OP with "http://schema.org/Comment" nodes through a "http://schema.org/comment" property. We're trying to normalize the forum markup space to use DiscussionForumPosting for the OP and attach Comment typed markup for the replies in a flat list (or threaded if you are a threaded forum). Without that, it is harder to segment the rest of the page appropriately in our index.
Co-typing WebPage and DiscussionForumPosting like you do is going to confuse our ingestion a bit. If you squint it's not that inaccurate, but it would be clearer to either have WebPage (separate node) -> mainEntity -> DiscussionForumPosting or DiscussionForumPosting -> mainEntityOfPage -> WebPage (separate node). A co-typed self-cycle often needs to be detected as a special case.
Include profile URLs in your author -> Person nodes. Raw names are not nearly as useful for disambiguation.

There's a couple other smaller things, but those are the things that would improve the markup the most. Note also you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though can be harder to author/inject).
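
For illustration, here is a minimal Python sketch of the node layout suggested above: a separate WebPage node pointing at a DiscussionForumPosting through mainEntity, replies attached as Comment nodes through the comment property, and Person authors that carry a profile URL. Every URL, name, the "#posting" @id scheme, and the helper functions are hypothetical placeholders, and this is not XenForo's actual output.

```python
import json

# All URLs, names and text below are hypothetical placeholders.
THREAD_URL = "https://forum.example/threads/example-thread.1/"


def person(name, profile_url):
    """Author node that carries a profile URL, not just a raw name."""
    return {"@type": "Person", "name": name, "url": profile_url}


def build_thread_markup(thread_url, headline, op_author, op_text, replies):
    """Return a JSON-LD graph: WebPage -> mainEntity -> DiscussionForumPosting.

    The WebPage and the posting are separate nodes (no co-typing), and each
    reply is attached to the posting as a Comment via the `comment` property.
    `replies` is a list of (author_node, text) pairs in a flat list.
    """
    posting_id = thread_url + "#posting"  # assumed @id scheme
    posting = {
        "@type": "DiscussionForumPosting",
        "@id": posting_id,
        "headline": headline,
        "text": op_text,
        "author": op_author,
        "comment": [
            {"@type": "Comment", "text": text, "author": author}
            for author, text in replies
        ],
    }
    page = {
        "@type": "WebPage",
        "@id": thread_url,
        "url": thread_url,
        "mainEntity": {"@id": posting_id},
    }
    return {"@context": "https://schema.org", "@graph": [page, posting]}


markup = build_thread_markup(
    THREAD_URL,
    "Example thread title",
    person("alice", "https://forum.example/members/alice.1/"),
    "Text of the original post.",
    [
        (person("bob", "https://forum.example/members/bob.2/"), "First reply."),
        (person("carol", "https://forum.example/members/carol.3/"), "Second reply."),
    ],
)
print(json.dumps(markup, indent=2))
```

The printed JSON can be pasted into the code snippet tab of validator.schema.org to see how the nodes are interpreted.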

Andy.N · Mar 18, 2023

It's good to see a Google team member sign up and make suggestions for XF.

JulianD · Mar 18, 2023

Thank you very much for your insight and I hope it gets included in the next release! Great suggestion!

Stuart Wright · Mar 18, 2023

This needs to be implemented yesterday. Can't delay in giving forums every advantage in Google.

sdev · Apr 12, 2023

Just had my first Google Discover traffic, for a 10+ year old forum. Searched for how to add ImageObject to XenForo, and landed here. What a coincidence (not).

Google Discover is a major traffic driver. This must be fixed! @Chris D @Jeremy P @NixFifty

Frode789 · Apr 12, 2023

Hope this will be taken seriously. Forums need all the help they can get to drive traffic, and SEO is obviously vital.

nrep · Apr 12, 2023

A really simple fix to get this added and such an important improvement. Thanks @rrlevering!

briansol · Apr 12, 2023

Not sure how I missed this thread for over a month.

Super critical. @rrlevering thank you for taking the time to report the issues, and if you have any other tips for the XF devs to help forums stay relevant, please do contribute them. We are all so heavily dependent on Google. And let's just say, it hasn't been nice to us forum runners over the years as the social apps have taken a vast audience away from us.

Chris D · May 9, 2023

Hello Ryan!

Thank you so much for taking the time to make these suggestions.

Since the very beginning, XenForo has continuously adapted to stay as close to best practices as humanly possible and make our output as friendly to search engines as we possibly can. There's always more to do and it's difficult to keep track of, so having an expert such as yourself guide the way has been super useful.

With that in mind, we really appreciate your efforts.

@Jeremy P has done some work on this which we've just implemented and rolled out here.

In summary, just copying his notes so I don't have to rewrite them:

Remove duplicate mainEntity nodes

Include IDs and URLs in author metadata (both JSON-LD and Microdata)

Move member and resource item structured data into PHP for flexibility

Defer escaping XFMG structured data to the template itself

Make a few minor adjustments to adhere to current best practices

It also introduces a new \XF\Util\Arr::filterRecursive method for recursively filtering an array and adds a new optional argument to \XF\Util\Arr::filterNull to filter recursively. This is used to filter out null items from structured data in PHP.
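
The actual signature and behavior of \XF\Util\Arr::filterRecursive are not shown in this thread. As a rough Python sketch of the general idea only (recursively dropping null entries from nested structured data before it is serialized), with a hypothetical function name and sample data:

```python
def filter_null_recursive(value):
    """Drop None entries from nested dicts and lists, depth first.

    Hypothetical helper for illustration; empty containers left behind
    after filtering are kept as they are.
    """
    if isinstance(value, dict):
        return {
            key: filter_null_recursive(item)
            for key, item in value.items()
            if item is not None
        }
    if isinstance(value, list):
        return [filter_null_recursive(item) for item in value if item is not None]
    return value


# Hypothetical structured data where some optional fields are unset.
structured = {
    "@type": "DiscussionForumPosting",
    "headline": "Example thread title",
    "keywords": None,
    "author": {"@type": "Person", "name": "alice", "url": None},
    "comment": [None, {"@type": "Comment", "text": "First reply.", "image": None}],
}

cleaned = filter_null_recursive(structured)
print(cleaned)
```

Filtering in one pass like this means optional properties can simply be set to null when building the array, instead of guarding each one with a conditional.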

We've also introduced comment microdata to posts. I think I had a separate suggestion from @Stuart Wright about providing content tags in the metadata in the keywords field which has also been introduced.

All in all, I hope we've managed to cover off most of what is required and recommended. Please let us know here if you spot any oversights or anything not quite expected.

And feel free to post more suggestions in the future if you have any.

Kirby · May 9, 2023

@ChrisD
This is great news

Is there still ongoing work being done here?

I am asking this because the validator is giving errors:

Schema Markup Validator (validator.schema.org)

Chris D · May 9, 2023

We can certainly look into it further but I'm not entirely sure the Validator isn't wrong.

Our markup:

HTML:

<h4 class="message-name"><a href="/community/members/sdev.97451/" class="username " dir="auto" data-user-id="97451" itemprop="name" data-xf-init="member-tooltip">sdev</a></h4>

The error:

name	Person is not a known valid target type for the name property.
@type	Person
@id	https://xenforo.com/community/members/sdev.97451/

Pretty sure our markup here is saying that the name property should be sdev, i.e. the content of the element that it is applied to.

@Jeremy P what do you think?

JoyFreak · May 9, 2023

Looking forward to this update the most. Thank you for your suggestions and good job on the team implementing them swiftly.

Kirby · May 9, 2023

From the HTML Standard (html.spec.whatwg.org):

If the element is an a, area, or link element
The value is the resulting URL string that results from parsing the value of the element's href attribute relative to the node document of the element at the time the attribute is set, or the empty string if there is no such attribute or if parsing it results in an error.

That's most likely the reason why the current markup doesn't fully work:
The URL of the user profile (from attribute href) becomes the value.
This URL is also the ID of the Person - hence the error message that Person is not a valid target for the name property of a Person.

HTML:

<h4 class="message-name"><a href="/community/members/sdev.97451/" class="username " dir="auto" data-user-id="97451" itemprop="name" content="sdev"  data-xf-init="member-tooltip">sdev</a></h4>

seems to make the validator happy, i.e. it gives the correct result. I'm not sure, though, whether this approach would be fully valid.
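
To make the rule quoted from the HTML Standard concrete, here is a small Python sketch of how a microdata property value would be derived. It is heavily simplified: it covers only the a/area/link case and the text-content fallback, ignores the other element-specific rules in the spec, and does not model the content attribute workaround. The function name and the base URL are assumptions for illustration.

```python
from urllib.parse import urljoin

# Elements whose microdata property value is taken from a URL attribute.
# Simplified: the spec lists further element-specific cases not covered here.
URL_ATTRIBUTE = {"a": "href", "area": "href", "link": "href"}


def property_value(tag, attrs, text_content, base_url):
    """Return the microdata value of an element carrying an itemprop.

    For a/area/link the value is the href resolved against the document
    URL (or the empty string if there is no href); otherwise this sketch
    falls back to the element's text content.
    """
    if tag in URL_ATTRIBUTE:
        href = attrs.get(URL_ATTRIBUTE[tag])
        if href is None:
            return ""
        return urljoin(base_url, href)
    return text_content


# Hypothetical document URL; the href and text are taken from the markup above.
base = "https://xenforo.com/community/threads/example/"
attrs = {"href": "/community/members/sdev.97451/", "itemprop": "name"}

print(property_value("a", attrs, "sdev", base))
# -> https://xenforo.com/community/members/sdev.97451/
print(property_value("span", {"itemprop": "name"}, "sdev", base))
# -> sdev
```

Under this rule, putting itemprop="name" on the a element yields the profile URL rather than the username, which matches the @id shown in the validator error. Moving the itemprop to a non-link element wrapping the text would be another way to get "sdev" as the value.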

JulianD · May 9, 2023

Thank you @Chris D and @Jeremy P for an update on this topic! It feels so good to receive good news!!

Chris D · May 10, 2023

Kirby said:
@ChrisD
This is great news

Is there still ongoing work being done here?

I am asking this because the validator is giving errors:

Schema Markup Validator (validator.schema.org)

This is fixed now. Thanks Kirby, thanks @Jeremy P.

dethfire · May 16, 2023

rrlevering said:
There's a couple other smaller things, but those are the things that would improve the markup the most. Note also you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though can be harder to author/inject).

What other things can we do?

This kind of direct communication from Google is very rare. With the upcoming AI changes, Google has said they want to surface more forum discussions, so getting this right is a top priority.

briansol · May 17, 2023

Regular threads look good but articles still have issues.

https://search.google.com/test/rich-results/result/r%2Farticles?id=naj_Ju837T71zde3Jlze_Q

dethfire · May 18, 2023

I noticed that in DiscussionForumPosting the keywords key is empty. Could you use the thread's tags as the keywords for that key? Running the schema through Bard, it's telling me that we're missing replyCount too.

Chris D · May 18, 2023

We added that in 2.2.13:

Jeremy P · May 18, 2023

briansol said:
Regular threads look good but articles still have issues.

Those are from mainEntity attributes in post microdata to associate the reply with the original post (which actually does have the supposed missing attributes, just in a separate JSON-LD node). It does look as though we'll want to fetch the type from the given thread type rather than hard-coding it as DiscussionForumPosting, but that doesn't seem to make a difference here.

I'm not entirely sure if there's a better way of associating the comment nodes (in microdata) with their parent node, or if there is just an issue with the way it is parsed by the validator.
