Implemented Markup improvements for Google

rrlevering

Google
Hi, my name is Ryan Levering and I currently handle structured data ingestion at Google (this guy). I've been doing some fairly spontaneous spot checks (not based on any specific problem) of some of the major forum software markups on the web just to see whether the markup is being generated in an ideal way for our systems to ingest. You can stick a URL in http://validator.schema.org to get an idea of what your markup for a given URL looks like. I have a couple of high-level suggestions to take or leave as you see fit:
  1. Include more than the OP with "http://schema.org/Comment" nodes through a "http://schema.org/comment" property. We're trying to normalize the forum markup space to use DiscussionForumPosting for the OP and attach Comment typed markup for the replies in a flat list (or threaded if you are a threaded forum). Without that, it makes it harder to segment the rest of the page appropriately in our index.
  2. Co-typing WebPage and DiscussionForumPosting like you do is going to confuse our ingestion a bit. If you squint it's not that inaccurate, but it would be clearer to either have WebPage (separate node) -> mainEntity -> DiscussionForumPosting or DiscussionForumPosting -> mainEntityOfPage -> WebPage (separate node). A co-typed self-cycle often needs to be detected as a special case.
  3. Include profile URLs in your author -> Person nodes. Raw names are not nearly as useful for disambiguation.
There are a couple of other smaller things, but those are the changes that would improve the markup the most. Note also that you don't need to use JSON-LD if you are worried about duplicating contents/page size. Microdata is fine for text/content-heavy schema (though it can be harder to author/inject).
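To make the three suggestions above concrete, here is a minimal sketch in Python that builds JSON-LD for a thread page. It is an illustration only, not how any forum software actually generates its markup: the function names (`person`, `build_thread_markup`), the input dict keys, and the example.com URLs are all assumptions made for the example. It uses DiscussionForumPosting for the first post, a flat list of Comment nodes under `comment`, a separate WebPage node via `mainEntityOfPage` instead of co-typing, and a profile `url` on each author where one exists.

```python
import json


def person(name, profile_url=None):
    """Build an author node; the profile URL is included only if one exists."""
    node = {"@type": "Person", "name": name}
    if profile_url:
        node["url"] = profile_url
    return node


def build_thread_markup(page_url, headline, first_post, replies):
    """Return a JSON-LD dict for one thread page (hypothetical input shape)."""
    return {
        "@context": "https://schema.org",
        "@type": "DiscussionForumPosting",
        "headline": headline,
        "text": first_post["text"],
        "author": person(first_post["author"], first_post.get("profile_url")),
        # WebPage is a separate node rather than a second type on this node.
        "mainEntityOfPage": {"@type": "WebPage", "@id": page_url},
        # Replies are attached as a flat list of Comment nodes.
        "comment": [
            {
                "@type": "Comment",
                "text": reply["text"],
                "author": person(reply["author"], reply.get("profile_url")),
            }
            for reply in replies
        ],
    }


# Example data; every value here is made up for illustration.
markup = build_thread_markup(
    "https://example.com/threads/example.1/",
    "Example thread title",
    {
        "author": "alice",
        "text": "First post text.",
        "profile_url": "https://example.com/members/alice.1/",
    },
    [{"author": "Guest", "text": "A reply from a guest."}],
)
print(json.dumps(markup, indent=2))
```

A threaded forum could nest `comment` lists inside each Comment instead of keeping them flat.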
 
This suggestion has been implemented. Votes are no longer accepted.
Check the thread this is reported for - does it actually contain text in the first post or just images / videos / embeds?
I checked some URLs with the same error, and that seems to be exactly the case. They all contained a post (not specifically the first post) with only an image.
 
So I checked again and all the error pages reported are inner pages. But lack of text doesn't seem like a common issue. Here are the ten pages that probably got indexed since the installation of 2.2.14 listed as having issues in search console. Cheers.

 
I may not be reading this thread closely enough, but did a validation on my site regarding the Missing field "author" issue and it isn't validating.

However, I am seeing a data-author setting in my code. I'm on v2.2.15.

Am I looking at this properly? Wondering if Google is validating against a cached page, or if I still have a problem.
 
I was just looking at my "Missing field "URL" (in "author")" failures, and it seems they are all for deleted members, which would make sense as they have no profile.
 
Yes, the missing URL error is due to deleted members. You may also find a missing text error, which is due to a post containing only media. Maybe we can have some boilerplate text inserted with the media file names? @Jeremy P?
 
I'm currently seeing 3 points for possible improvement.

1) Need to be able to assign author @id to Guest/Deleted Member posts
2) Need to be able to assign author url to Guest/Deleted Member posts
3) I believe publisher url should be the home page url not the board url, for those who don't have their forum in the root of their site
 
I may not be reading this thread closely enough, but did a validation on my site regarding the Missing field "author" issue and it isn't validating.

However, I am seeing a data-author setting in my code. I'm on v2.2.15.

Am I looking at this properly? Wondering if Google is validating against a cached page, or if I still have a problem.
This would be due to outdated (or improperly updated) templates in a custom theme. Notably, your post markup is missing itemprop="author".
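For anyone wanting to check their own rendered templates for this, here is a rough sketch using Python's standard `html.parser`. It only reports whether any element on the page carries `itemprop="author"`; it does not verify that the property sits inside the right post node. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect every itemprop value found in a page's start tags."""

    def __init__(self):
        super().__init__()
        self.itemprops = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # itemprop may hold several space-separated property names.
            if name == "itemprop" and value:
                self.itemprops.update(value.split())


def has_author_itemprop(html):
    """Return True if any element in the HTML has itemprop="author"."""
    collector = ItempropCollector()
    collector.feed(html)
    return "author" in collector.itemprops
```

Note that a `data-author` attribute alone does not count: it is an ordinary data attribute, not microdata, so a page with only `data-author` would make this check return False.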

Yes, the missing URL error is due to deleted members. You may also find a missing text error, which is due to a post containing only media. Maybe we can have some boilerplate text inserted with the media file names? @Jeremy P?
The resulting rich results wouldn't serve search users very well. It would be better to just fill the image or video attributes instead but we don't have a robust way to do that as it stands. I've made a note of it for the future though.

1) Need to be able to assign author @id to Guest/Deleted Member posts
2) Need to be able to assign author url to Guest/Deleted Member posts
Those fields are optional and there isn't really a compelling reason to provide them for guest posts. It's important to understand that Search Console provides a lot of informational notices that aren't actually problematic or reasonable to action.

3) I believe publisher url should be the home page url not the board url, for those who don't have their forum in the root of their site
Maybe, though the publisher title, alternate name, and description are taken from the board title, board short name, and board description respectively, so that doesn't seem entirely consistent. I don't think publisher information is featured in most of the rich results anyways.
 
Maybe, though the publisher title, alternate name, and description are taken from the board title, board short name, and board description respectively, so that doesn't seem entirely consistent.
For me anyway, I already have my Site Name in "Board Title" & "Board Short Title", since there is no option to use a Site Title / Site Short Title with the forum in a sub-directory. Using site title in place of board title/board short title, for me, makes the page titles more representative of the site directory hierarchy being based off the home page, rather than the forum.

I believe those placing XenForo in a subdirectory, regardless of what they have in board title & board short title, would be better represented by the publisher url being the site home page and not the board url. Always open to others' opinions though.
 
I have 21 URLs that report Missing field "text", and when I tried to validate the fix, it said it could not process them. Here are the URLs that it can't process.
Can anyone see anything missing there that I can't see?

 
Per above, this is referring to the individual posts on that page which only contain images or embedded media.
You are right. I went through each link with the Google Rich Results Test and it points to the exact post id that has the issue.
The issues seem to be members posting a YouTube video that no longer works or is blocked, or just attaching an image.
It seems that you need to add some text to your post, not just a video/image attachment.
 
Ideally we would just set the alternate image/video attributes instead:
The resulting rich results wouldn't serve search users very well. It would be better to just fill the image or video attributes instead but we don't have a robust way to do that as it stands. I've made a note of it for the future though.
 
Per above, this is referring to the individual posts on that page which only contain images or embedded media.
Is there a way we can require a minimum amount of text in a post, excluding emoji and attachments? I imagine these errors will pop up more if we allow posts with zero text.
 
No, and I don't think decreasing UX for people that actually use the forum for the benefit of search results is a good idea anyway. These errors just mean those specific posts won't be eligible for rich results (they will still be eligible for regular results). The thread itself and other posts will still be eligible for rich results too.

The right solution is just to set the alternate attributes, and I'm sure we'll do that eventually, but given we've already made rather large improvements to the schema over the last few cycles and this has a relatively small impact to a limited subset of content, it's not a huge priority at the moment.
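As a rough idea of what "setting the alternate attributes" could look like, here is a hedged Python sketch of a Comment node builder that omits `text` for media-only posts and fills `image` and `video` instead. This is not XenForo's implementation; the function name `comment_node`, its parameters, and the choice to represent videos as VideoObject nodes with `contentUrl` are assumptions for the example.

```python
def comment_node(author, text="", image_urls=(), video_urls=()):
    """Build a Comment node, using media attributes when a post has no text."""
    node = {"@type": "Comment", "author": {"@type": "Person", "name": author}}

    body = text.strip()
    if body:
        # Only emit text when the post actually contains some.
        node["text"] = body
    if image_urls:
        node["image"] = list(image_urls)
    if video_urls:
        node["video"] = [
            {"@type": "VideoObject", "contentUrl": url} for url in video_urls
        ]
    return node


# A post that contains only an attached image (URL is made up).
media_only = comment_node(
    "alice", image_urls=["https://example.com/attachments/photo.jpg"]
)
```

The hard part mentioned above is presumably extracting reliable media URLs from arbitrary post content, which this sketch simply takes as input.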
 