Implemented Markup improvements for Google

rrlevering

Google
Hi, my name is Ryan Levering and I currently handle structured data ingestion at Google. I've been doing some fairly spontaneous spot checks (not based on any specific problem) of the markup generated by some of the major forum software packages, just to see whether it is being generated in an ideal way for our systems to ingest. You can put a URL into http://validator.schema.org to get an idea of what the markup for that page looks like. I have a few high-level suggestions to take or leave as you see fit:
  1. Include more than the OP by adding "http://schema.org/Comment" nodes through a "http://schema.org/comment" property. We're trying to normalize the forum markup space to use DiscussionForumPosting for the OP and attach Comment-typed markup for the replies in a flat list (or a threaded one if you are a threaded forum). Without that, it is harder to segment the rest of the page appropriately in our index.
  2. Co-typing WebPage and DiscussionForumPosting as you do now is going to confuse our ingestion a bit. If you squint it's not that inaccurate, but it would be clearer to have either WebPage (separate node) -> mainEntity -> DiscussionForumPosting or DiscussionForumPosting -> mainEntityOfPage -> WebPage (separate node). A co-typed self-cycle often has to be detected as a special case.
  3. Include profile URLs in your author -> Person nodes. Raw names are not nearly as useful for disambiguation.
There are a couple of other, smaller things, but those are the changes that would improve the markup the most. Note also that you don't need to use JSON-LD if you are worried about duplicating content or page size. Microdata is fine for text- and content-heavy schema (though it can be harder to author/inject).
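A minimal JSON-LD sketch of the structure described in the three suggestions: the DiscussionForumPosting points to a separate WebPage node through mainEntityOfPage instead of being co-typed with it, the replies are attached as Comment nodes under the comment property, and each author Person carries a profile url. All URLs, names, and dates are placeholder assumptions, not output from XenForo or any other forum software.

```json
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "Example thread title",
  "text": "Opening post text.",
  "datePublished": "2023-12-01T10:00:00+00:00",
  "url": "https://example.com/threads/example-thread.1/",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/threads/example-thread.1/"
  },
  "author": {
    "@type": "Person",
    "name": "ExampleUser",
    "url": "https://example.com/members/exampleuser.1/"
  },
  "comment": [
    {
      "@type": "Comment",
      "text": "First reply.",
      "datePublished": "2023-12-01T11:00:00+00:00",
      "author": {
        "@type": "Person",
        "name": "AnotherUser",
        "url": "https://example.com/members/anotheruser.2/"
      }
    },
    {
      "@type": "Comment",
      "text": "Second reply.",
      "datePublished": "2023-12-01T11:30:00+00:00",
      "author": {
        "@type": "Person",
        "name": "ExampleUser",
        "url": "https://example.com/members/exampleuser.1/"
      }
    }
  ]
}
```

For a flat forum the comment array is simply in post order; a threaded forum would presumably nest further Comment nodes under each reply's own comment property.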
 
This suggestion has been implemented. Votes are no longer accepted.
Should issues with the implementation be reported as a new bug?

If not:
  • FollowAction seems to be missing entirely
  • interactionStatistic LikeAction counter is wrong: it gives the reaction score the user has received, but it should be the total number of likes the user has received (which IMHO doesn't really exist in XenForo)
  • agentInteractionStatistic LikeAction is missing
  • dateModified is missing
  • image should have three variants (aspect ratios 1x1, 16x9, 4x3, min. 50K pixels)
  • sameAs is missing (if the user profile has social media identities and those can be seen by a guest)
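For reference, a sketch of a ProfilePage that includes the fields listed above. In this shape, interactionStatistic counts actions done to the person (followers, likes received), while agentInteractionStatistic counts actions the person performed (posts written, likes given). The counts, URLs, and image paths are placeholders; the three image entries assume the forum can serve 1x1, 4x3, and 16x9 avatar variants, which is a hypothetical setup, and whether XenForo tracks a true received-likes count (rather than a reaction score) is the open question raised above.

```json
{
  "@context": "https://schema.org",
  "@type": "ProfilePage",
  "dateCreated": "2020-01-15T08:00:00+00:00",
  "dateModified": "2023-12-01T09:30:00+00:00",
  "mainEntity": {
    "@type": "Person",
    "name": "ExampleUser",
    "identifier": "1",
    "url": "https://example.com/members/exampleuser.1/",
    "image": [
      "https://example.com/avatars/1x1/1.jpg",
      "https://example.com/avatars/4x3/1.jpg",
      "https://example.com/avatars/16x9/1.jpg"
    ],
    "sameAs": [
      "https://social.example/@exampleuser"
    ],
    "interactionStatistic": [
      {
        "@type": "InteractionCounter",
        "interactionType": "https://schema.org/FollowAction",
        "userInteractionCount": 12
      },
      {
        "@type": "InteractionCounter",
        "interactionType": "https://schema.org/LikeAction",
        "userInteractionCount": 340
      }
    ],
    "agentInteractionStatistic": [
      {
        "@type": "InteractionCounter",
        "interactionType": "https://schema.org/WriteAction",
        "userInteractionCount": 275
      },
      {
        "@type": "InteractionCounter",
        "interactionType": "https://schema.org/LikeAction",
        "userInteractionCount": 198
      }
    ]
  }
}
```

sameAs would only be emitted when the linked identities are visible to guests, as noted in the list.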
I'm guessing that if the issues are the same in 2.2.14, they're not related to the new 'dogfood' testing, so they're bugs that should be tracked. IMO all bugs should be tracked and have their own threads anyway. It doesn't help much in the future if there is a regression and no existing point of reference (aside from a comment in a thread).
 
As I mentioned in my last post, just change Comment -> mainEntity to Comment -> parentItem and that structural problem will go away. With that change, this exact forum page here will parse fully valid. Syntax merging should work fine currently.

The other issues (only image comments and deleted authors) don't show up as much in our web diffs, but we'll work on fixing them in the next round of updates. These are likely not causing problems in our ingestion; it's just a reporting issue.
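In JSON-LD terms, the change described above looks roughly like this: a reply that previously linked to the thread's post with "mainEntity" links to it with "parentItem" instead. Only the property names come from the post; the @id value and the other fields are placeholder assumptions.

```json
{
  "@context": "https://schema.org",
  "@type": "Comment",
  "text": "A reply in the thread.",
  "datePublished": "2023-12-01T11:00:00+00:00",
  "author": {
    "@type": "Person",
    "name": "AnotherUser",
    "url": "https://example.com/members/anotheruser.2/"
  },
  "parentItem": {
    "@type": "DiscussionForumPosting",
    "@id": "https://example.com/threads/example-thread.1/#post-1"
  }
}
```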
Does anyone know exactly how to implement this seemingly simple request? What file/template to alter, code to use and where to insert it would be helpful.
 
My 2c (I suppose it would be correct to call myself an "SEO expert," but one needn't take my word for it):

Worry about the forum member and visitor experience.

Having it all work nicely and stably is a lot more important than the schema.org structured data.

Ideally, you should have both, but rushing an update and risking stability and user experience just to "please Google" is most likely to do more harm than good - even to your search rankings.

Kudos to XenForo devs for keeping a level head on this topic (and also for pushing a timely, but not rushed update).

Relja
 
While I agree with your point that a good user experience is important, we also need our forums to be as easily readable by search engines as possible. Google will soon start dinging forums that don't have the desired markup structure, as is evident from Mr Levering's post below:

Just a follow-up note: in our upcoming launch of forum reports in Google Search Console, the Comment -> mainEntity -> DiscussionForumPosting link is going to cause error reports. It will still be parsed for a while so as not to penalize these forums, but only Comment -> parentItem -> DiscussionForumPosting will be recognized as a valid inverse link from Comment by our tooling. So just changing that one property from "mainEntity" to "parentItem" is recommended.

In my opinion, we need to stay on top of this request.
 

You think two weeks (or even a month or two) would make a huge difference?

I think they won’t. Especially not in the mid and long run.

Though my point is to avoid rushing unstable updates. I agree that this should be implemented.
 
I understand your point and do agree. I have faith that Xenforo won’t rush a patch or release. Just voicing my concerns that this needs to be a priority.
 
Yeah, that’d be ideal. :)
Done.


And another older issue:
 

@rrlevering
XenForo has implemented the mainEntity -> parentItem change.

The test tool doesn't complain, but the output doesn't look correct to me:
[Screenshots: test tool output]

So instead of one DiscussionForumPosting with 19 Comment items (under the comment property), the test tool finds one known DiscussionForumPosting and 19 unknown Comment items, each of them with 19 DiscussionForumPosting entries in parentItem.

Is this really correct, or is there still something else that needs to be done by either XenForo or Google?

The JSON-LD-only example code from https://developers.google.com/search/docs/appearance/structured-data/discussion-forum works as expected (as does the Microdata-only code).

HTML:
<html>
  <head>
    <title>I went to the concert!</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "DiscussionForumPosting",
      "mainEntityOfPage": "https://example.com/post/very-popular-thread",
      "headline": "I went to the concert!",
      "text": "Look at how cool this concert was!",
      "video": {
        "@type": "VideoObject",
        "contentUrl": "https://example.com/media/super-cool-concert.mp4",
        "name": "Video of concert",
        "uploadDate": "2023-03-01T06:34:34+02:00",
        "thumbnailUrl": "https://example.com/media/super-cool-concert-snap.jpg"
      },
      "url": "https://example.com/post/very-popular-thread",
      "author": {
        "@type": "Person",
        "name": "Katie Pope",
        "url": "https://example.com/user/katie-pope",
        "agentInteractionStatistic": {
          "@type": "InteractionCounter",
          "interactionType": "https://schema.org/WriteAction",
          "userInteractionCount": 8
        }
      },
      "datePublished": "2023-03-01T08:34:34+02:00",
      "interactionStatistic": {
        "@type": "InteractionCounter",
        "interactionType": "https://schema.org/LikeAction",
        "userInteractionCount": 27
      },
      "comment": [{
        "@type": "Comment",
        "text": "Who's the person you're with?",
        "author": {
          "@type": "Person",
          "name": "Saul Douglas",
          "url": "https://example.com/user/saul-douglas",
          "agentInteractionStatistic": {
            "@type": "InteractionCounter",
            "interactionType": "https://schema.org/WriteAction",
            "userInteractionCount": 167
          }
        },
        "datePublished": "2023-03-01T09:46:02+02:00"
      },{
        "@type": "Comment",
        "text": "That's my mom, isn't she cool?",
        "author": {
          "@type": "Person",
          "name": "Katie Pope",
          "url": "https://example.com/user/katie-pope",
          "agentInteractionStatistic": {
            "@type": "InteractionCounter",
            "interactionType": "https://schema.org/WriteAction",
            "userInteractionCount": 8
          }
        },
        "datePublished": "2023-03-01T09:50:25+02:00",
        "interactionStatistic": {
          "@type": "InteractionCounter",
          "interactionType": "https://schema.org/LikeAction",
          "userInteractionCount": 7
        }
      }]
    }
    </script>
  </head>
  <body>
  </body>
</html>

results in:

One DiscussionForumPosting with two Comment items under the comment property.
[Screenshots: test tool output]
 
I think the tool shows the data almost exactly as it is parsed from the page (i.e. our data is actually just a bunch of loose Comment nodes with parentItem set), but my understanding is that the data is processed further to build the knowledge graph, since the graph can link nodes that appear across pages, etc. (i.e. it's ultimately parsed the same as if the comments were nested). I agree the output is confusing, though. Their examples nest the comments under the parent post.
 
Yeah, but why does each Comment have 19 DiscussionForumPosting types in parentItem? That doesn't make any sense to me at all.

If it were just 19 Comment items, each with one DiscussionForumPosting in parentItem, that would make sense, though it would still be confusing since both examples (Microdata and JSON-LD) nest the comments.
 
Each successive comment is adding a type to the definition, and they're being merged since they have the same ID. If you remove the itemtype attribute it seems to go away (though the id is still duplicated, among some other fields). I'm not sure it's a big problem though:

This is kinda ugly and redundant but not necessarily incorrect and at least doesn't affect Google's accurate interpretation. I've considered deduplicating in our systems, but then I have to pick a winner to show and right now it comes from all those places so it might add confusion.

Usually it's more common to just type the node on the "main place it's defined" and then link to it without adding an extra type.
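The "type it once, then link to it" approach can be sketched in JSON-LD with an @graph: the DiscussionForumPosting carries its @type and full properties once, and each Comment refers back to it through parentItem with only an @id, so no extra type is merged onto the same node. This is an illustration with placeholder values; the discussion above concerns XenForo's Microdata output (the itemtype attribute), and the JSON-LD form is used here only to show the idea.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "DiscussionForumPosting",
      "@id": "https://example.com/threads/example-thread.1/#post-1",
      "headline": "Example thread title",
      "text": "Opening post text.",
      "datePublished": "2023-12-01T10:00:00+00:00",
      "author": {
        "@type": "Person",
        "name": "ExampleUser",
        "url": "https://example.com/members/exampleuser.1/"
      }
    },
    {
      "@type": "Comment",
      "text": "First reply.",
      "datePublished": "2023-12-01T11:00:00+00:00",
      "author": {
        "@type": "Person",
        "name": "AnotherUser",
        "url": "https://example.com/members/anotheruser.2/"
      },
      "parentItem": { "@id": "https://example.com/threads/example-thread.1/#post-1" }
    },
    {
      "@type": "Comment",
      "text": "Second reply.",
      "datePublished": "2023-12-01T11:30:00+00:00",
      "author": {
        "@type": "Person",
        "name": "ExampleUser",
        "url": "https://example.com/members/exampleuser.1/"
      },
      "parentItem": { "@id": "https://example.com/threads/example-thread.1/#post-1" }
    }
  ]
}
```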
 
A lot of work has gone into XenForo 2.2.14, 2.2.3 and 2.3. I believe this single topic will be the most important factor in increasing site traffic, especially in the short term. I'm thankful for Google's help and involvement, along with the XenForo staff and fellow admins working to iron out all the kinks quickly.
 

We can't know that for certain. Sure, ideally all the markup should be perfect, but we don't know how big of a difference imperfect markup makes.

A reasonable expectation would be that in a couple of years everyone will just be speaking to their phone and getting direct answers from the AI - and if Google's AI does not end up on top, then Google's only source of traffic (and income) would be YouTube.
(Is Google trying to make YouTube more profitable?)

Forums (and all the websites apart from online stores) will probably go where BBS and Usenet went...

Wish I were wrong on this.
 
I see here a lot of changes, so I'm a bit confused.

Can somebody please refer me to the changes (I guess in templates) that need to be done to fix these four?

1. Missing field 'author'

2. Missing field 'datePublished'

3. 'Comment' object must be nested inside a 'CreativeWork'

4. Missing field 'headline'


Thanks!
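A minimal sketch of where each of the four reported fields lives, assuming the errors come from a Comment that is not attached to its thread post: the Comment sits inside a CreativeWork (here a DiscussionForumPosting) via the comment property, the post carries a headline, and both the post and the Comment carry author and datePublished. Placeholder values throughout; which template produces the failing markup on a given XenForo version is not covered here.

```json
{
  "@context": "https://schema.org",
  "@type": "DiscussionForumPosting",
  "headline": "Example thread title",
  "text": "Opening post text.",
  "datePublished": "2023-12-01T10:00:00+00:00",
  "author": {
    "@type": "Person",
    "name": "ExampleUser",
    "url": "https://example.com/members/exampleuser.1/"
  },
  "comment": [
    {
      "@type": "Comment",
      "text": "First reply.",
      "datePublished": "2023-12-01T11:00:00+00:00",
      "author": {
        "@type": "Person",
        "name": "AnotherUser",
        "url": "https://example.com/members/anotheruser.2/"
      }
    }
  ]
}
```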
 
Nice to see this thread going forward.

Just coded this profile page markup per Google's guidelines. Had to remove some stuff, but it's a start.

Template name: member_view; replace the old schema around line 16.
Code:
<xf:page option="ldJsonHtml">
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "ProfilePage",
      "dateCreated": "{{ date($user.register_date, 'c')|raw }}",

      "mainEntity": {
        "@type": "Person",
        "name": "{$user.username|escape('json')}",
        "identifier": "{$user.user_id|escape('json')}",
        "interactionStatistic": [
          {
            "@type": "InteractionCounter",
            "interactionType": "https://schema.org/LikeAction",
            "userInteractionCount": "{$user.like_count|escape('json')}"
          }
        ],
        "agentInteractionStatistic": {
          "@type": "InteractionCounter",
          "interactionType": "https://schema.org/WriteAction",
          "userInteractionCount": "{$user.message_count|escape('json')}"
        },
        "description": "{$user.Profile.about|default('')|escape('json')}",
        "image": "{$user.getAvatarUrl('o', null, true)|escape('json')}"
      }
    }
    </script>
</xf:page>
 
Ten profiles have now been validated in Google Search Console. No problems.
Do we replace this bit with the above code?

Code:
<xf:page option="ldJsonHtml">
    <script type="application/ld+json">
        {{ $user.getLdStructuredData()|json(true)|raw }}
    </script>
</xf:page>
 