Fixed Google: Spammy structured markup. Data-vocabulary dead since 2011

Alpha1

Well-known member
Affected version
1.5.22
Google is marking hundreds of thousands of threads as 'Spammy structured markup' and a violation of their policies because of the way data-vocabulary microdata is used for 'Person'. This has a negative effect on ranking, at least for sites that Google classes as YMYL ("Your Money or Your Life", i.e. critical for well-being).

As you know, for most (not all) microdata XenForo opted for Data-Vocabulary markup, which was abandoned in 2011, instead of Schema.org, which is now the standard. See: http://www.data-vocabulary.org/
There have been various bug reports on the matter since 2014. Some have been fixed by changing from Data-Vocabulary to Schema.org.
The deprecation of Data-Vocabulary is an important factor in the issue below: Schema.org has been expanded to cover forum discussions and persons, while Data-Vocabulary is deprecated, and this is where the two conflict. Now the bug itself:

Spammy structured markup
Description
Markup on some pages on this site appears to use techniques such as marking up content that is invisible to users, marking up irrelevant or misleading content, and/or other manipulative behavior that violates Google's Spammy Structured Markup guidelines.
Affects
Pages with the URL pattern:
my-forum.com/threads/

Google refers to the following violations:
  • Don't mark up content that is not visible to readers of the page. For example, if the JSON-LD markup describes a performer, the HTML body should describe that same performer.
  • Don't mark up irrelevant or misleading content, such as fake reviews or content unrelated to the focus of a page.
  • Put the structured data on the page that it describes, unless specified otherwise by the documentation.
  • Specify all required properties for your rich result type. Items that are missing required properties are not eligible for rich results.
https://developers.google.com/search/docs/guides/sd-policies

Some similar examples from the page Common Structured Data Errors:
  • A page is using event markup but there is no visible event content on the page.
  • Job markup used, but no job related content on the page
  • Job markup doesn't match the user-visible job description
  • A page using recipe markup isn't about recipes.
https://developers.google.com/search/docs/guides/prototype#common-sd-errors

By marking up only the Persons in a thread, the structured data appears irrelevant to the purpose of a thread. The Person data does not describe the thread; on its own, it is better suited to a profile page. To be relevant, each post should be scoped as a Comment, with the Person encapsulated as the post's author property, and the entire thread should be scoped as DiscussionForumPosting. This is a flaw in XenForo's usage of structured data.

Side note: the Structured Data Testing Tool only checks whether the microdata is valid, not how it is used. The 'Person' type exists, so the tool does not flag any errors.

Here is how the markup should be: https://schema.org/DiscussionForumPosting
Schema.org markup uses the 'author' property in combination with 'Person', because the Person is the author of a forum discussion post; the page is not about a Person. This is where the two conflict. See the Reddit example at the bottom of that page:
Code:
    <div itemid="http://www.reddit.com/r/webdev/comments/2gypch/is_schemaorg_still_a_thing/" itemscope itemtype="http://schema.org/DiscussionForumPosting">
      <h1 itemprop="headline">Is Schema.org still a thing?</h1>
      <p>Author:
        <span itemprop="author" itemscope itemtype="http://schema.org/Person">
          <span itemprop="name">haecceity123</span>
        </span>
      </p>
      <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter">
        <link itemprop="interactionType" href="http://schema.org/CommentAction" />
        <p>Comment count: <span itemprop="userInteractionCount">25</span></p>
      </div>
    </div>

The solution here would be to add the correct microdata for DiscussionForumPosting.
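
To make that concrete, here is a rough sketch of how a thread page could be scoped, with the first post as the DiscussionForumPosting itself and each reply as a Comment carrying its own author. This is not taken from any XenForo template: the URL, user names, dates, and class names are all placeholders, and it only assumes the standard Schema.org `comment`, `author`, `text`, and `datePublished` properties.

```html
<!-- Hypothetical thread page. URL, names, dates and class names are placeholders,
     not actual XenForo template output. -->
<div itemscope itemtype="http://schema.org/DiscussionForumPosting"
     itemid="https://my-forum.com/threads/example-thread.123/">
  <h1 itemprop="headline">Example thread title</h1>

  <!-- First post: no new itemscope here, so author/text belong to the thread item -->
  <div class="message" id="post-1">
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">ThreadStarter</span>
    </span>
    <time itemprop="datePublished" datetime="2018-11-01T09:30:00+00:00">Nov 1, 2018</time>
    <div itemprop="text">Opening post text, visible to readers.</div>
  </div>

  <!-- Each reply: a Comment item attached through the thread's "comment" property -->
  <div class="message" id="post-2"
       itemprop="comment" itemscope itemtype="http://schema.org/Comment">
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">ReplyingMember</span>
    </span>
    <time itemprop="datePublished" datetime="2018-11-01T10:05:00+00:00">Nov 1, 2018</time>
    <div itemprop="text">Reply text, visible to readers.</div>
  </div>
</div>
```

In this layout every Person sits inside an `author` property, so no Person item stands alone at the top level of the page.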

XenForo advertises its microdata benefits on its homepage:
SEO Built-in
With XenForo there is no need to pay more for your search engine optimization needs. Human-readable URLs, semantic HTML with embedded microdata, and many more SEO features are present in the very core of the system.
Because the result of this Google policy violation is severe, and because it means bad SEO in a product whose home page advertises SEO, I am hoping for a solution in XF1.
 
This is probably because the name in the Person data is a member link instead of the actual name. This could be seen as 'invisible'. For xf1 and xf2 the best way would be to add some sort of count to DiscussionForumPosting.

For example userInteractionCount, because you can't add the number of authors; otherwise you have to mark up each post as a comment with its author.

The media gallery on xf2 uses this to mark up the number of comments, and the same could be used for posts. Google shows the words "authors" and "posts" in thread snippets in search, but I have seen forums with comment markup where "authors" and "posts" are still used.

Using data-vocabulary microdata is not a problem at the moment: Google parses it just fine, and it is only used on xf1. I tried adding it as schema.org instead, but that gave some trouble.

Also, when there are multiple structured data items on a page, you could use 'mainEntityOfPage' on DiscussionForumPosting to let Google know this is the primary data (not the Persons). The canonical URL can be used directly or as @id under 'WebPage'.

Code:
"mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://google.com/article"
  },
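
For context, a fragment like the one above would sit inside a complete JSON-LD object. The following is only a sketch of what that could look like for a thread: the URL, headline, author name, and date are placeholders, and nothing here is taken from XenForo's templates.

```html
<!-- Hypothetical JSON-LD for a thread page; all values below are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "Example thread title",
  "author": {
    "@type": "Person",
    "name": "ThreadStarter"
  },
  "datePublished": "2018-11-01T09:30:00+00:00",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://my-forum.com/threads/example-thread.123/"
  }
}
</script>
```

The `@id` would normally be the canonical URL of the thread, so that it matches the page's canonical link.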
 
Btw, if you use comment markup for each post, nofollow should be removed from the #post URL. Even without comment markup this is better for forums, because it seems Google uses these URLs to show snippets for forums.

Also showing on desktop now (the green v is from avast).
[Attachment 187671]
 
Microdata (schema.org), as used on xf, is the best way to handle this. It is difficult to implement with JSON-LD because you have to fetch every comment. So it is better to do DiscussionForumPosting separately from the comment markup.

I tested this on media and it works fine. With the current xf template I can't get the author name (only the member link), but I think it would be possible with some adjustments. I can also only get the comment date, not the exact time, but Google can figure some things out by itself, like the date and the post and author counts in threads.

Test -> https://search.google.com/structured-data/testing-tool#url=https://forum.bodybuilding.nl/media/slowcut-85kg.7195/

It should be the same for posts: the author can be added to that and the Person data removed. DiscussionForumPosting should then be made the mainEntityOfPage, like I did with CreativeWork for media.

But as you can see in my post above, Google has no trouble figuring it out with the data it has at the moment. On media it is having some trouble, so I am going to leave the markup in place there.
 
Ok, I think I found the problem. On mobile, some of the Person data in threads is hidden with CSS, and this could lead to a penalty, especially now that Google is crawling your mobile site instead of the desktop version. So you have to check that all parsed data is also visible on mobile; this can depend on your style.
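
To illustrate the kind of situation I mean, here is a made-up example. The class names, breakpoint, and user data are hypothetical and not copied from any XF style; the point is only that a marked-up property ends up hidden on small screens.

```html
<!-- Hypothetical illustration; class names and breakpoint are invented -->
<style>
  /* On narrow screens the extra user info is hidden from readers... */
  @media (max-width: 480px) {
    .userExtra { display: none; }
  }
</style>

<div class="messageUserInfo" itemscope itemtype="http://data-vocabulary.org/Person">
  <a class="username" itemprop="name" href="/members/example.1/">ExampleMember</a>
  <!-- ...but this property is still in the markup, so it is parsed while invisible -->
  <em class="userExtra" itemprop="title">Well-known member</em>
</div>
```

Either keep such elements visible on mobile or take the itemprop off anything the style hides.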

I made some adjustments in September because of this, so most things are no longer hidden on mobile, and I also changed which data is used for Person in threads. You also cannot use URLs in the data that are blocked for Google, for example by robots rules (and probably nofollow).

data-vocabulary is deprecated but can still be used, so that is not a problem for xf1. And you can mark up the posters in threads with Person or author separately from DiscussionForumPosting (or comments) without a problem. Google does show the number of posters (authors) as a snippet and uses the #post URLs to mark the posts per page.

The structure of most forums is alike, so Google already understands a lot without the data. That said, for profile pages, ProfilePage markup would be great. I also added ImageGallery (media), CollectionPage (forums) and SearchResultsPage (search; it has noindex, so it is probably not needed).
 
These are indications that Google is more interested in the user.
By all means, good experiences will keep the user longer after the Medic algorithm
 
Thank you for reporting this issue. The issue is now resolved and we are aiming to include that in a future XF release (1.5.23).

Change log:
remove Person markup in message_user_info as there is some evidence that Google may not be happy with this. (XF2 takes a different approach here already.)
Any changes made as a result of this issue being resolved may not be rolled out here until later.
 
I am still seeing this in the thread page source:
Code:
<div class="messageUserInfo" itemscope="itemscope" itemtype="http://data-vocabulary.org/Person">
 
That would likely indicate an outdated template, as the template now simply has:
Code:
<div class="messageUserInfo">
 