Not a bug HTML lang tags, again.

Affected version
2.2.0 - 2.2.5

ShikiSuen

Well-known member
(TEMPORARY NON-OFFICIAL WORKAROUND WRITTEN IN #4, for XF 2.2.5 ONLY.)

Thankfully, all of the new Chinese locales are correct. The exception is the three deprecated ones: zh-CN, zh-HK, and zh-TW.
In XF 2.2.5 they are output as zh-Hans-CN, zh-Hans-HK, and zh-Hans-TW, which are not actually the deprecated tags.

We suspect that Baidu currently recognizes only these three deprecated tags, though we are still investigating the issue.
A site owner should be able to decide whether or not to follow the latest IANA standards:
Language Tags - OBSOLETE (iana.org)
Sadly, I have found no way to report this to Baidu so far...

My friend will open a support ticket that mentions this thread. He really wants zh-CN applied to his forum ASAP.
Over the past week he has complained to me dozens of times that Baidu doesn't give a **** about his site.
(Meanwhile, I have chosen to follow the IANA standard on my own site.)

Note that his site runs PHP 7.4.

1623829045591.png
 

ShikiSuen

Well-known member
Found it.

Even with the rewrite in place, zh-TW and zh-HK should be rewritten to zh-Hant-TW and zh-Hant-HK, not to the Hans variants.
I believe your new region settings now follow the latest IANA standards (which is the right thing to do), but for these 3 old region settings users may want an option to enable or disable the rewrite.

I am helping my friend remove these three lines now. Please add an option to enable or disable these 3 rewrites in a future XF release.
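
To make the request concrete, here is a rough sketch of the mapping with an on/off switch. It is illustrative only: XenForo implements this in PHP inside Language.php, and both the function name map_lang_tag and the REWRITE_DEPRECATED variable are made up for this example.

```shell
# Illustrative only; this is not XenForo's code.
# REWRITE_DEPRECATED is a hypothetical switch: 1 (default) rewrites the
# deprecated tags, anything else passes them through unchanged.
map_lang_tag() {
  tag=$1
  if [ "${REWRITE_DEPRECATED:-1}" != "1" ]; then
    printf '%s\n' "$tag"
    return
  fi
  case "$tag" in
    zh-CN) printf '%s\n' 'zh-Hans-CN' ;;  # Simplified, mainland China
    zh-TW) printf '%s\n' 'zh-Hant-TW' ;;  # Traditional, Taiwan
    zh-HK) printf '%s\n' 'zh-Hant-HK' ;;  # Traditional, Hong Kong
    *)     printf '%s\n' "$tag" ;;        # everything else is left alone
  esac
}
```

With the switch off (REWRITE_DEPRECATED=0), a site owner who selects zh-CN would get zh-CN in the page output, which is what my friend wants.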

1623830377293.png
 

ShikiSuen

Well-known member
I'll try to contact magazines such as CPCW and CFan and suggest that they speak out publicly to persuade Baidu to follow the latest IANA standards.
 

ShikiSuen

Well-known member
Here is the patch, for XenForo 2.2.5 only.

CentOS / RHEL / Fedora / AlmaLinux / Debian / Ubuntu / Deepin ... (presumably any major Linux distro):
Bash:
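# Blank out every line containing "-Hans-" (original kept as Language.php_backup).
sed -i_backup 's/^.*-Hans-.*$//g' ./src/XF/Language.php
# Swap the hash recorded in hashes.json for the one that goes with the patched file.
sed -i_backup 's/eebe6e95cc70164f704817df9b548963c6441e0b614aab5f67ed77df248ec515/071f25e67db44a591d7c9af88cdbac321a9b4337b15f64c9a2f412bc00891641/g' ./src/addons/XF/hashes.json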

macOS:
Bash:
# Same two steps; BSD sed takes the backup suffix as a separate argument.
sed -i '_backup' 's/^.*-Hans-.*$//g' ./src/XF/Language.php
sed -i '_backup' 's/eebe6e95cc70164f704817df9b548963c6441e0b614aab5f67ed77df248ec515/071f25e67db44a591d7c9af88cdbac321a9b4337b15f64c9a2f412bc00891641/g' ./src/addons/XF/hashes.json

After running the above commands from the XenForo installation root directory, you can set the language to any of the [deprecated] ones to attract the attention of Baidu's spiders.
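
A quick way to sanity-check the result could look like the sketch below. The helper names sha256_of and check_patch are made up for this example, and it assumes that the replacement hash in the patch is the SHA-256 of the patched Language.php, which I have not confirmed.

```shell
# Hypothetical helpers, not part of XenForo.
# Print the SHA-256 of a file (sha256sum on Linux, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Return 1 if FILE still contains "-Hans-" lines, 2 if its hash differs
# from EXPECTED, and print "ok" otherwise.
check_patch() {
  file=$1
  expected=$2
  if grep -q -- '-Hans-' "$file"; then
    echo "still contains -Hans- lines: $file" >&2
    return 1
  fi
  actual=$(sha256_of "$file")
  if [ "$actual" != "$expected" ]; then
    echo "hash mismatch: $actual" >&2
    return 2
  fi
  echo "ok"
}
```

Usage, from the XenForo root: check_patch ./src/XF/Language.php followed by the replacement hash from the commands above. If it reports a mismatch, the file probably differs from a stock 2.2.5 copy and the backup should be restored.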
 

ShikiSuen

Well-known member
Instead of patching the file you could create an Add-on to do the same thing.
IMHO that would be much more maintainable.
I believe that XF official devs will patch this soon.

P.S.: I know nothing about PHP. Otherwise I would write an add-on myself.
 

Mike

XenForo developer
Staff member
You're right that the mapping for the TW/HK locales should be Traditional, though the specific situation you're reporting doesn't involve this (as it relates to zh-CN and that does map to zh-Hans-CN).

However, as it stands, this report is essentially asserting that Baidu will refuse to index based on the latter. Before we'd consider making any sort of change here (outside of the Hans/Hant mistake), we'd really need to see evidence that this is the case. With a quick bit of searching, I have found results via Baidu that either have no language specified or specify zh-Hant. The last example would seem to imply that this is not a general Baidu issue, but that there may be other Baidu-internal reasons that they haven't indexed the site.

(Given the site we're talking about, I do see results for it on Baidu using a site: search.)
 

ShikiSuen

Well-known member
I just patched the Language.php file for my friend's site (I guess you just answered his ticket).
Now his site shows zh-CN after changing the language region to the [deprecated] one.
It may take one or two weeks of observation to see whether this works.
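
For anyone who wants to check what their own site emits, something like this sketch should do. The function name html_lang is made up, and it only handles a double-quoted lang attribute on the first <html> tag.

```shell
# Read HTML on stdin and print the lang attribute of the first <html> tag.
# Hypothetical helper; prints nothing if no double-quoted lang is found.
html_lang() {
  grep -o -i '<html[^>]*>' | head -n 1 |
    sed -n 's/.*[[:space:]]lang="\([^"]*\)".*/\1/p'
}
```

Usage: curl -s https://example.com/ | html_lang (replace example.com with the forum's URL). On a patched site set to the [deprecated] zh-CN locale, it should print zh-CN instead of zh-Hans-CN.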

// For the record, he wants all of his XenPorta news indexed by Baidu.

This is just a workaround. The real solution is to persuade Baidu to make the necessary change on their side (though that is extremely difficult).

P.S.: Previously, some modern browsers I tried didn't fall back to Chinese fonts well with zh-CN and zh-TW. I don't know whether that has changed.
 

ShikiSuen

Well-known member
@Mike

Follow-up: the website whose ticket you answered is now being indexed by Baidu and shows an increased number of search results, though not by much. The latest posts are still not present... probably because of the DDoS attack that happened yesterday (as announced by the site owner).

I have a new idea: move the global lang attribute from <html> to <header>, <body>, and <footer>.
Baidu is so stupid that it only cares about the lang attribute on <html>.
 

Mike

XenForo developer
Staff member
We have removed the (incorrect) mapping of the older/deprecated Chinese language tags so that does give both options if it's really important, but based on findings, there doesn't seem to be a limitation with Baidu as suggested, so I don't think the core of the report is applicable.
 

ShikiSuen

Well-known member
Thanks. Removing that is enough at this stage.
My friend's website has been crawled by Baidu regularly since I patched it (XF 2.2.5).
 