As designed: If a URL is not able to be unfurled

AndyB

Affected version: XF v2.1.0
I noticed that some URLs cannot be unfurled, for example this link:

https://www.amazon.com/dp/B00KJGPN2K

When this happens, the URL is still marked for unfurling, like this:

[URL unfurl="true"]https://www.amazon.com/dp/B00KJGPN2K[/URL]

When I look at the xf_unfurl_result table I see this:

[Screenshot of the xf_unfurl_result table]

I suspect this is a bug: if a URL cannot be unfurled, it should be left alone (unfurl="true" should not be added) and no data should be written to the xf_unfurl_result table.
 
The unfurl="true" attribute merely indicates that we should attempt to unfurl the URL. Long story short: the URL is on its own line, so a block representing the URL would work there.
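To make that distinction concrete, here is a minimal Python sketch of the save-time step, assuming (as described above) that only a URL sitting alone on its own line is tagged. The function name `mark_unfurl_candidates` and the regex are hypothetical and are not XenForo's actual BB code parser. The point is that tagging is purely textual and makes no HTTP request, so it says nothing about whether the URL can really be unfurled.

```python
import re

# Hypothetical rule: a line consisting only of an http(s) URL is an unfurl candidate.
_BARE_URL_LINE = re.compile(r"^(https?://\S+)$", re.MULTILINE)


def mark_unfurl_candidates(text: str) -> str:
    """Wrap URLs that sit alone on a line in [URL unfurl="true"] BB code.

    No network access happens here: the attribute only records that an
    unfurl should be attempted later, not that it will succeed.
    """
    return _BARE_URL_LINE.sub(r'[URL unfurl="true"]\1[/URL]', text)


post = "Check this out:\nhttps://www.amazon.com/dp/B00KJGPN2K\nInline https://example.com stays as-is."
print(mark_unfurl_candidates(post))
```

Only the middle line is rewritten; the inline link is left untouched because it does not occupy a line by itself.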

We do not immediately attempt to unfurl the URL. If we did, your post would be blocked from being submitted until we had made an HTTP request (for each URL!) to ascertain whether it was unfurlable. At best, that would mean a delay of several seconds. At worst, it could stop the post from being displayed back to the end user at all.

The first time we actually check whether the URL is unfurlable is when it is first displayed. This happens in a separate AJAX request, so it is "non-blocking".
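The ordering can be illustrated with a small asyncio sketch. This is an analogy under stated assumptions, not XenForo's code: `fetch_unfurl_data` is a hypothetical stand-in for the HTTP request, `unfurl_request` plays the role of the separate AJAX call, and a plain dict stands in for the xf_unfurl_result cache. The page is produced before the unfurl result exists.

```python
import asyncio
from typing import Tuple


async def fetch_unfurl_data(url: str) -> dict:
    """Hypothetical stand-in for the HTTP request that inspects the target page."""
    await asyncio.sleep(0.05)  # simulated network latency
    return {"url": url, "title": "Example title"}


async def unfurl_request(url: str, cache: dict) -> None:
    """Plays the role of the separate AJAX request: fetch, then store the result."""
    cache[url] = await fetch_unfurl_data(url)


async def display_post(url: str, cache: dict) -> Tuple[str, bool]:
    # Schedule the unfurl, but do not wait for it before producing the page.
    pending = asyncio.create_task(unfurl_request(url, cache))
    page = f'<a href="{url}">{url}</a>'  # plain link is rendered immediately
    rendered_before_unfurl = url not in cache
    await pending  # in a browser, this is the later AJAX round trip
    return page, rendered_before_unfurl


cache: dict = {}
page, early = asyncio.run(display_post("https://example.com/item", cache))
print(page, early, cache["https://example.com/item"]["title"])
```

Because the task does not run until `display_post` yields, the link is rendered while the cache is still empty; the post content itself is never rewritten by the unfurl step.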

At that point, the post has already been submitted though, so we won't go back and change the post content.

However, the whole idea of the xf_unfurl_result table is that it is a cache. Below a certain error threshold, we'll try again in the future. One day that URL might stop returning an error and unfurl as expected. If it doesn't? Well, you end up with unfurl="true" in your BB code forever. Not something we should lose sleep over.
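A rough Python sketch of that retry behaviour, assuming an error counter per cached row. The `MAX_ERRORS` value, the `UnfurlResult` fields and the `attempt_unfurl` helper are all hypothetical; the thread only says there is "a certain error threshold". Note that a failure only updates the cache row and never touches the post's BB code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ERRORS = 3  # hypothetical; the real threshold is not stated in the thread


@dataclass
class UnfurlResult:
    url: str
    title: Optional[str] = None
    pending: bool = True  # False once resolved or once retries are exhausted
    error_count: int = 0


def attempt_unfurl(row: UnfurlResult, fetch: Callable[[str], Optional[str]]) -> UnfurlResult:
    """Try to fetch a title; on failure, count the error instead of editing any post."""
    if not row.pending:
        return row  # already resolved, or given up on
    title = fetch(row.url)
    if title is not None:
        row.title, row.pending = title, False
    else:
        row.error_count += 1
        # Below the threshold the row stays pending, so a later view retries it.
        row.pending = row.error_count < MAX_ERRORS
    return row


row = UnfurlResult("https://www.amazon.com/dp/B00KJGPN2K")
for _ in range(5):
    attempt_unfurl(row, lambda url: None)  # simulate a URL that always errors
print(row.error_count, row.pending)
```

After the threshold is reached the row simply stops being retried; the unfurl="true" attribute in the post is unaffected either way.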
 
Off-topic but related: is the cached data part of the search index? I don't seem to be able to search for title/excerpt content in unfurled URLs. Thanks, and sorry!
 
Right, but there seems to be a small difference here: you are caching the title and excerpt in the database, which isn't done for most (all?) media embeds. Maybe this is something that could be considered, if it isn't hard to implement?

I went through the table data again. It does not link a URL to post content; it basically creates an individual row for each URL 😶 So I guess, with the current implementation, it would not be possible to show search results that link to the post containing the searched keyword.

Now I'm wondering: is this table cleaned up, with unused links removed (from what I can see, there is a column for when a row was last used)? Is the data updated on a regular basis? Is there a post I missed that explains how this feature is designed to work over a longer period of time?

Sorry for more off-topic content, but I learned something about how this feature works because of this thread! Cheers.
 
We're caching that data for the specific reason that you really don't want us to be performing an HTTP request every time a post contains something to be unfurled.

Honestly, even if we did include that in the search index, it wouldn't be particularly useful, and it could produce false positives.

Unfurls can theoretically appear in any piece of content, so the cached results are shared by whichever content refers to a specific URL.
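A minimal sketch of what "shared by URL" implies, using a hypothetical in-memory `UnfurlCache` class rather than the real xf_unfurl_result table. Rows are keyed only by URL and record nothing about which post referenced them, which is also why a search hit on cached title/excerpt text could not point back to a specific post.

```python
from typing import Callable, Dict


class UnfurlCache:
    """Hypothetical URL-keyed cache: one row per URL, shared by every post that links it."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._rows: Dict[str, str] = {}
        self.http_requests = 0  # counts simulated fetches, for illustration

    def get(self, url: str) -> str:
        # Only the first lookup for a URL triggers a fetch; later ones reuse the row.
        if url not in self._rows:
            self.http_requests += 1
            self._rows[url] = self._fetch(url)
        return self._rows[url]


cache = UnfurlCache(lambda url: f"Title for {url}")
posts = {
    1: "https://example.com/a",
    2: "https://example.com/a",  # a second post linking the same URL
    3: "https://example.com/b",
}
for url in posts.values():
    cache.get(url)
print(cache.http_requests)  # 2: posts 1 and 2 share a row
```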

The table isn't cleaned up, no. The vast majority of the data in there will be in use (that's why it's there) and anything that is orphaned could be used at some point in the future. The last_request_date is used to ascertain if we should recrawl the URL. We don't do this routinely. We only recrawl a URL if it is used in a new post, or if someone comes across a thread containing the unfurl and the data hasn't been fetched for over 30 days.
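The two recrawl triggers described above fit in a small predicate. This is a Python sketch, not XenForo's PHP code: the function name and parameters are hypothetical, and only the 30-day window and the "used in a new post" trigger come from the reply. There is no scheduled job; the check would only run when a URL is used or viewed.

```python
from datetime import datetime, timedelta

RECRAWL_AFTER = timedelta(days=30)  # the 30-day window mentioned in the reply


def should_recrawl(last_request_date: datetime, now: datetime, used_in_new_post: bool) -> bool:
    """Return True if a cached unfurl should be fetched again.

    Trigger 1: the URL appears in a new post.
    Trigger 2: someone views content containing it and the data is over 30 days old.
    """
    if used_in_new_post:
        return True
    return now - last_request_date > RECRAWL_AFTER


now = datetime(2019, 6, 1)
print(should_recrawl(datetime(2019, 5, 20), now, used_in_new_post=False))  # False: 12 days old
print(should_recrawl(datetime(2019, 4, 1), now, used_in_new_post=False))   # True: 61 days old
```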

If you have any more questions, it's probably best to create a new thread, to be honest :)
 