Unfurl - domain exceptions (blacklist)

cwe

Well-known member
I enjoy the XF unfurl feature for the most part. External links usually look nice in posts when it works. External links look fine (raw URL) when it doesn't.

However, there is one domain that vexes me terribly - msn.com. All links to news items on msn.com "unfurl" to a simple link with "MSN" as the anchor text. Those links look horrible in the forum and are less informative than having the actual URL displayed.

I would absolutely love it if the XF ACP allowed us to define a list of domains that should be ignored by the unfurl feature so I could add msn.com to the list and never have to manually edit another post (to set unfurl="false" for msn.com links).
 
Upvote 17
testing as i've never seen this:


Yup, looks bad :D

It seems the page body is entirely JS-driven (React, Angular, Next or whatever), so there's no actual content in the HTML for the unfurl request to pull data from.

Perhaps the unfurl script could account for this somehow instead of needing a manual blacklist (although a blacklist could be a nice-to-have for other use cases).
 
As a workaround you can abuse the "BB Code Media Sites" feature. On one of my sites eBay have decided we're a bot (which of course we are, I guess!) and we get their lovely "Pardon our interruption..." error rather than the nice little preview of the auction. Not very useful.

So until it's sorted properly I've added a new "BB Code Media Site" for eBay where I'm matching the eBay URL, e.g. https://www.ebay.co.uk/itm/{$id}, with an embed template that is simply a hyperlink: <a href="https://www.ebay.co.uk/itm/{$id}">https://www.ebay.co.uk/itm/{$id}</a>.

That gets matched ahead of any unfurling, so any eBay links are just shown as hyperlinks, which is better than the obtuse error message eBay returns. You might find you can do something similar with the few sites that are causing you problems.
 
I created a BB code media with https://www.msn.com/{$id} and <a href="https://www.msn.com/{$id}">https://www.msn.com/{$id}</a>

I then tested with a typical msn link: https://www.msn.com/en-us/money/markets/suze-orman-says-she-loves-bitcoin-but-issues-a-huge-word-of-caution-for-anyone-thinking-of-buying/ar-AA1tkddg

Unfortunately, the {$id} only matches up to the first forward slash (i.e. "en-us" in this case), so it is not a viable solution.
 
That's a shame. I'd say try using a regular expression (advanced options) like #https://www\.msn\.com/(?P<id>[a-z0-9_\-\/]+)#i to match the full URL, but I am pretty sure the forward slashes will get encoded to %2F in the $id and render the URL useless. It might still be worth a quick try in case it doesn't escape them, though I suspect it will.
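
For anyone wanting to check that pattern before pasting it into the ACP, here is a rough standalone PHP sketch (nothing XenForo-specific, and the variable names are just mine). It runs the suggested expression against the sample MSN link from earlier in the thread, then shows what rawurlencode() does to the captured value, which is the encoding concern raised above. Whether XenForo actually applies that encoding to a regex-matched id is not something this snippet can tell you.

```php
<?php
// Pattern as suggested above; the named group "id" is what the media site would capture.
$pattern = '#https://www\.msn\.com/(?P<id>[a-z0-9_\-\/]+)#i';
$url = 'https://www.msn.com/en-us/money/markets/suze-orman-says-she-loves-bitcoin-but-issues-a-huge-word-of-caution-for-anyone-thinking-of-buying/ar-AA1tkddg';

$matched = preg_match($pattern, $url, $m);
$id = $matched ? $m['id'] : '';

// The whole path after the domain, slashes included.
echo $id, "\n";

// The same value if the slashes were percent-encoded (each "/" becomes "%2F").
echo rawurlencode($id), "\n";
```

So the expression itself does capture the full path; the open question is only what happens to the slashes afterwards.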

You could of course create a tonne of rules for all the MSN subject categories, I guess. The article is determined only by the final code; the bit before it (suze-orman-says ...) is just for SEO and could be replaced with anything, but you'd still need the language and category bits. So probably far too much work really.

I would imagine you'd have to use a PHP callback to process the URL if you want to try this route. That's probably the route I'd explore. There are a few existing examples that use callbacks, YouTube for instance.

I suppose one other OTT solution would be to have a small web service pretending to be the site in question (MSN in your case) and either doing a fetch and munge on the data (so unfurling worked) or just returning a canned response that worked for the unfurl. Or a small service returning oEmbed JSON, which would save faffing around with pretending to be other servers. Although it's probably just as easy to do a callback, or to investigate the code that does the unfurling to see if that can be hooked into / overloaded with an add-on.
 
So since I hate it when someone like me glibly says "write a callback" I thought I should rough something out as an example. Now PHP is not my language of choice and I don't use it very often so don't take this as the "right" way to do it!

Obviously rename things as required. So I'll create the callback first. For this example I'm going to create a file in my XF install: src/addons/Generic/MediaSites/MsnMedia.php and populate it thus:

PHP:
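<?php
namespace Generic\MediaSites;

class MsnMedia
{
    // Embed HTML callback: $mediaKey is the "id" captured by the match URL.
    public static function htmlCallback($mediaKey, array $site, $siteId)
    {
        $urlInfo = explode('/', $mediaKey);
        if (count($urlInfo) < 5) {
            // Not lang/category/section/slug/id, so fall back to a plain escaped link.
            $url = htmlspecialchars('https://www.msn.com/' . $mediaKey, ENT_QUOTES);
            return '<a href="' . $url . '">' . $url . '</a>';
        }

        // Split the key into its parts, encoding each one for use in the link.
        $params = ['siteId' => $siteId];
        $params['lang']     = rawurlencode($urlInfo[0]);
        $params['category'] = rawurlencode($urlInfo[1]);
        $params['section']  = rawurlencode($urlInfo[2]);
        $params['slug']     = rawurlencode($urlInfo[3]);
        $params['id']       = rawurlencode($urlInfo[4]);
        $params['title']    = ucfirst(str_replace('-', ' ', $params['slug']));
        $params['raw']      = $mediaKey;

        return \XF::app()->templater()->renderTemplate('public:_media_site_embed_MSN', $params);
    }
}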

So I thought it worth breaking down the 'id' ($mediaKey) the function received, but I guess you wouldn't have to - although you might want to be a bit careful about what you echo back to your site for inclusion in pages! You could obviously do a bit more robust validation here, for example making sure things matched patterns to prevent any potential abuse.
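
As a rough idea of what that stricter validation could look like, here is a standalone sketch. The function name msnKeyLooksValid() is made up for illustration, and the patterns are guesses based on the single sample link in this thread (for instance that the article id is two letters, a hyphen, then letters and digits), so treat them as assumptions to adjust rather than a description of MSN's real URL scheme.

```php
<?php
// Hypothetical helper, not part of XenForo: returns true only when the
// media key looks like lang/category/section/slug/id.
function msnKeyLooksValid(string $mediaKey): bool
{
    $parts = explode('/', $mediaKey);
    if (count($parts) !== 5) {
        return false;
    }

    // Assumed shapes, guessed from one sample link; adjust as needed.
    $patterns = [
        '/^[a-z]{2}-[a-z]{2}\z/i',   // lang, e.g. en-us
        '/^[a-z0-9-]+\z/i',          // category, e.g. money
        '/^[a-z0-9-]+\z/i',          // section, e.g. markets
        '/^[a-z0-9-]+\z/i',          // slug
        '/^[a-z]{2}-[a-z0-9]+\z/i',  // article id, e.g. ar-AA1tkddg
    ];

    foreach ($patterns as $i => $pattern) {
        if (preg_match($pattern, $parts[$i]) !== 1) {
            return false;
        }
    }
    return true;
}
```

The callback could call something like this first and return a plain link (or nothing at all) whenever it comes back false.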

Anyhow, back to the admin UI now to create a new BB code media site. I shall give it a Media site ID of MSN. Note that the ID determines what the template is named: in the last line of the PHP, return \XF::app()->templater()->renderTemplate('public:_media_site_embed_MSN', $params);, the MSN bit comes from the Media site ID.

I'll set the Match URLs to #https://www\.msn\.com/(?P<id>[a-z0-9_\-\/]+)#i and fill in the Embed Template (note that under the Embed Template title is the template name - _media_site_embed_MSN) thus:

HTML:
<a href="https://www.msn.com/{$lang}/{$category}/{$section}/{$slug}/{$id}">MSN: {$title}</a>

In the Advanced Options section I am going to enable Regular expression matching, and then for the Embed HTML callback I will set the first box to Generic\MediaSites\MsnMedia and the second box to htmlCallback.

That should be that and should render the link out as "MSN: The slug title here". You can tweak the template as desired to make it look how you want.
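
If you want to see roughly what comes out for a given link without touching a live install, something like the following standalone sketch mimics the callback and the embed template together. previewMsnLink() is a made-up name, sprintf() is standing in for XenForo's templater, and it assumes the key has the five lang/category/section/slug/id segments, so it's only a preview, not what XenForo runs.

```php
<?php
// Stand-in for the callback plus embed template, for quick testing outside XenForo.
// previewMsnLink() is a hypothetical name; XenForo itself renders the real template.
function previewMsnLink(string $mediaKey): string
{
    // Assumes exactly the five segments lang/category/section/slug/id.
    [$lang, $category, $section, $slug, $id] = array_map('rawurlencode', explode('/', $mediaKey));

    // Same title logic as the callback: hyphens to spaces, first letter upper-cased.
    $title = ucfirst(str_replace('-', ' ', $slug));

    return sprintf(
        '<a href="https://www.msn.com/%s/%s/%s/%s/%s">MSN: %s</a>',
        $lang, $category, $section, $slug, $id,
        htmlspecialchars($title, ENT_QUOTES)
    );
}

echo previewMsnLink('en-us/money/markets/suze-orman-says-she-loves-bitcoin-but-issues-a-huge-word-of-caution-for-anyone-thinking-of-buying/ar-AA1tkddg'), "\n";
```

For the sample link the anchor text starts "MSN: Suze orman says she loves bitcoin", since only the first letter of the slug gets capitalised.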

Anyhow hopefully that might be of some use to get started on this "work around".

That all said I think a simple "don't unfurl" URL list for XenForo would be an awesome addition and very welcome.
 