Server issue [XF2] Bug with combination of using Image Proxy with RSS Feed with enclosure URL or something else going on?

Kevin

Affected version: 2.0.2
I'm having a problem with some RSS feeds, in particular http://www.nasa.gov/rss/image_of_the_day.rss, where images do not load. In short, when the image proxy is turned on, no images show in the post; with the proxy turned off, it works fine. And, yes, the same feed worked fine in XF1. ;)

If I edit the post, the image appears in the editor as expected. When I save the post, it is gone again. If I turn off the image proxy, the image appears; turn the proxy back on and the image is gone again. If I manually create a post (as opposed to the RSS feed auto-creating the post) and embed the same image from the URL with the image proxy turned on, it works fine. Nothing shows in the XF server log.

OK, the nitty-gritty... in the feed output, the enclosure URLs for the images are http, like http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg, but when that URL is opened in a browser it is redirected to https, as in https://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg. If I turn off the image proxy and insert the image using http (http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg), it displays fine.
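To see which images in a feed are served over plain http (and are therefore candidates for the http to https redirect described above), the enclosure URLs can be pulled straight out of the RSS XML. A minimal Python sketch, assuming a standard RSS 2.0 `<enclosure url="...">` element; the one-item sample is illustrative and is not a copy of the NASA feed:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Illustrative one-item RSS document; not the actual NASA feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
  <title>Image of the Day</title>
  <enclosure url="http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg"
             length="0" type="image/jpeg"/>
</item>
</channel></rss>"""


def enclosure_urls(rss_text):
    """Return (url, scheme) pairs for every <enclosure> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    pairs = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url", "")
        pairs.append((url, urlsplit(url).scheme))
    return pairs


if __name__ == "__main__":
    for url, scheme in enclosure_urls(SAMPLE_RSS):
        # Plain-http enclosures are the ones that may be redirected to https.
        note = "plain http, may redirect" if scheme == "http" else "ok"
        print(f"{scheme:5} {note:25} {url}")
```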

I'm attaching screenshots of my proxy settings, the feed settings, and what the post looks like when saved and when it is edited.

PS: Annoyingly, I kind of remember this happening with XF1 way back (way, way, way back!) in the early days of switching to XF and I think Mike may have been involved in taking a look at it back then.
 

Attachments: Edit.webp, Feed Settings.webp, Thread View.webp
The image proxy didn't use to follow redirects for security reasons, but I believe that has changed since. What happens when you test the non-HTTPS image URL with the ACP test tool?
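For illustration, here is a minimal sketch of what following redirects under safety limits can look like. It does not reflect XenForo's actual proxy code: the `redirects` map stands in for real HTTP responses, and `max_hops` and the scheme allow-list are assumed policies. It also shows why an http URL that redirects to https costs two requests instead of one:

```python
from urllib.parse import urlsplit


class RedirectError(Exception):
    """Raised when a redirect chain is refused, loops, or is too long."""


def resolve_redirects(url, redirects, max_hops=3, allowed_schemes=("http", "https")):
    """Follow a simulated redirect chain and return (final_url, requests_made).

    `redirects` maps a URL to the Location it redirects to; any URL not in the
    map is treated as the final response. Each URL visited costs one request.
    max_hops=0 models a fetcher that refuses every redirect.
    """
    requests_made = 0
    seen = set()
    while True:
        # Never let a redirect point the fetcher at file://, ftp://, etc.
        if urlsplit(url).scheme not in allowed_schemes:
            raise RedirectError(f"refusing non-HTTP(S) target: {url}")
        if url in seen:
            raise RedirectError(f"redirect loop at {url}")
        seen.add(url)
        requests_made += 1
        if url not in redirects:
            return url, requests_made
        if requests_made > max_hops:
            raise RedirectError(f"more than {max_hops} redirect(s)")
        url = redirects[url]


if __name__ == "__main__":
    http_url = "http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg"
    https_url = "https://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg"
    print(resolve_redirects(http_url, {http_url: https_url}))
```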
 
Thanks for the thoughts, guys. :)
That's NASA for you.
Indeed! :LOL:
Good question! I didn't realize till you asked that XF2 has a test option.

Using the http link, it seems to alternate between getting an error and working; if I run the test over and over, most of the results are the error.
http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg could not be fetched or is not a valid image. The specific error message was: cURL error 7: Failed to connect to www.nasa.gov port 80: Connection timed out
 
So I've been doing a bunch of experiments and going through old posts... it turns out that XF 1.5.11 restored the ability to follow cURL redirects for the image proxy (announcement thread | thread talking about the problem). In XF2, with the image proxy turned on, I can indeed embed the http image in a post and have it show. The same image embedded via the RSS feeder, though, fails.

Can anybody tell whether the RSS feed uses the same cURL logic as embedding an image in a post? :unsure:
 
I've just checked a forum I use frequently that has RSS feeds posted into threads and the non SSL images display via the proxy correctly.
 
As far as embedding images goes, I believe the RSS importer just inserts the image BBCode into posts (and lets the image proxy take over from there, if enabled). A timeout would mean that the image proxy isn't getting a response when it tries to fetch the image. Given that you said you receive the timeout more frequently after successive tests, I would guess that NASA is rate-limiting your requests. This may explain why the problem seems to coincide with the RSS importer, since the importer would cause a short burst of requests. When the image proxy fails to fetch an image, it periodically attempts to re-fetch it when someone tries to load the image, but subsequent failures result in more and more time between retries.
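The "more and more time between retries" behaviour is commonly implemented as exponential backoff. A sketch of such a schedule, where the 10-minute base and 1-day cap are hypothetical values chosen for the example, not XenForo's actual settings:

```python
from datetime import timedelta

# Hypothetical constants; the thread does not say what delays XenForo uses.
BASE_DELAY = timedelta(minutes=10)
MAX_DELAY = timedelta(days=1)


def next_retry_delay(failure_count, base=BASE_DELAY, cap=MAX_DELAY):
    """Delay before re-fetching an image that has failed `failure_count` times.

    The delay doubles with each consecutive failure (exponential backoff)
    and never exceeds `cap`. Doubling stops once the cap is reached, so very
    large failure counts cannot overflow timedelta.
    """
    if failure_count < 1:
        return timedelta(0)
    delay = base
    for _ in range(failure_count - 1):
        if delay >= cap:
            break
        delay *= 2
    return min(delay, cap)


if __name__ == "__main__":
    for failures in range(1, 7):
        print(failures, next_retry_delay(failures))
```

A successful re-fetch would reset the failure count, so a single good response is enough to get the image cached again.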
 
The problem I seem to be having is not with non-SSL images alone but with non-SSL images that redirect to SSL URLs.
 
Still potentially related to rate-limiting, as redirects mean two subsequent requests. Unless your problem is also occurring with non-NASA HTTPS redirected images?

If you try the HTTPS URL with the image proxy test tool, does it still begin to timeout after a number of successive tests?
 
If the remote server were rate-limiting my requests, shouldn't I see the same result when manually embedding an image in a post? So far I've been unable to reproduce it, even when hammering away by posting the same URL over and over again in different posts. Know of any non-NASA redirected images I can try hammering away with?
I just tried it 10 times in a row... it alternated between error and no error, about 6 good and 4 bad. Interestingly, some of the errors report a timeout when trying to connect to port 443 (as opposed to port 80), so it seems that at least some of the time the redirect is being recognized and the proxy is trying to load the redirect target.

Thinking out loud... I'm wondering whether increasing the connection timeout would make a difference; it looks like it might be set to 150 seconds. (EDIT: But the proxy test tool returns the timeout error within a couple of seconds, not 150. Hhhmm....)
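When repeating a test like this, it can help to record both the outcome and how long each failure took, since a failure after 2-3 seconds points at a different limit than one after 150 seconds. A small harness sketch; `fetch` is any callable you supply (for example, a wrapper around your own HTTP client), and `run_trials` is a hypothetical helper that does not call XenForo:

```python
import time
from collections import Counter


def run_trials(fetch, url, attempts=10):
    """Call `fetch(url)` repeatedly and tally outcomes with elapsed times.

    `fetch` is any callable that returns on success and raises on failure.
    Returns (Counter of outcome labels, list of (label, seconds) per attempt).
    """
    tally = Counter()
    timings = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            fetch(url)
            label = "ok"
        except Exception as exc:  # record the failure type instead of stopping
            label = type(exc).__name__
        elapsed = time.monotonic() - start
        tally[label] += 1
        timings.append((label, elapsed))
    return tally, timings
```

Sorting the failing entries in `timings` by elapsed time makes it easy to see whether they cluster near one particular timeout value.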
 
The image proxy caches results locally, so posting the same image over and over will not result in any subsequent requests. Also, rate-limiting typically happens on the scale of multiple requests per some seconds. Only the test tool will allow you to perform multiple subsequent requests for the same image using the proxy system.
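The caching behaviour described above can be modelled roughly like this. `ImageProxyCache` and its `force_refresh` flag are hypothetical names; the flag stands in for the ACP test tool, which is described here as the only way to make repeated remote requests for the same image:

```python
class ImageProxyCache:
    """Hypothetical model of a local image cache keyed by URL."""

    def __init__(self, fetch):
        self._fetch = fetch       # callable(url) -> bytes; raises on failure
        self._store = {}          # url -> cached image bytes
        self.remote_requests = 0  # times the remote host was contacted, failures included

    def get(self, url, force_refresh=False):
        # Normal page views and repeat embeds are served from the local copy.
        if not force_refresh and url in self._store:
            return self._store[url]
        self.remote_requests += 1
        data = self._fetch(url)   # a failure propagates and nothing is cached
        self._store[url] = data
        return data
```

Posting the same URL five times through `get()` contacts the remote host only once, which is why repeated manual embeds are a poor way to trigger rate-limiting.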

The timeout is hard-coded to 8 seconds I think. In any case, it appears the issue is that these requests are timing out for one reason or another. These failures are cached for a period of time as well, so the proxy will only try re-fetching them periodically. If one of these retries is successful, the successful result will be cached locally. If the retries keep failing, it will result in longer and longer periods between them.
 
That's kind of the point, I'm trying to figure out the "one reason or another" part. :coffee:
I understand :) Timeouts can be a bit difficult to pin down, though, as not receiving so much as an error response means you're mostly left with guesswork and trial and error, unfortunately.

If anybody else out there doesn't mind trying, can you plug http://www.nasa.gov/sites/default/files/thumbnails/image/clyde_foster_2.jpg into the image proxy test tool (ACP => Tools => Test Image Proxy) and try it a few times to see what results you get? Thanks :)
I've just tested dozens of times without any issue, strangely enough. I suppose that rules out rate-limiting, unless it's of a more targeted nature. Maybe someone else has an idea to help narrow down potential causes. In case it's of any help, many large image hosts which support HTTPS will do HTTP redirecting (for example, Imgur).
 
Lots of experimenting (lots and lots of experimenting)... it keeps coming back to looking like some type of timeout issue when curl tries to retrieve the file. It happens more often using the http version of the image (which is doing the redirect) versus the https version (no redirect but still occasional error).

I put some quick code in place to replace http with https in the incoming feed to see if that helps. No issues the past couple of days, but since the feed only updates daily, and even then not consistently, it's too soon for me to say whether the error is gone. If/when I see it happening again in the live forums, the next step will be to see whether I can increase the timeout settings being used.
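A rewrite like that can be kept narrow by only upgrading URLs for hosts known to redirect to https. A sketch in Python (the actual change was made inside Snog's RSS add-on and is not shown in the thread); `HTTPS_HOSTS`, `upgrade_to_https`, `upgrade_item`, and the `content`/`enclosure_url` item keys are assumptions for illustration. URLs with an explicit port are deliberately left alone:

```python
import re

# Hypothetical allow-list: hosts known to redirect http:// to https://.
HTTPS_HOSTS = {"www.nasa.gov"}

# "http://" + host, where the host must be followed by a path, query,
# fragment, whitespace, quote, '<', or end of text. A host followed by
# ":port" does not match, so URLs with explicit ports stay unchanged.
_HTTP_URL = re.compile(r"""\bhttp://([A-Za-z0-9.-]+)(?=[/?#\s"'<]|$)""")


def upgrade_to_https(text, hosts=HTTPS_HOSTS):
    """Rewrite http://<host> to https://<host> only for hosts in `hosts`."""
    def swap(match):
        host = match.group(1)
        if host.lower() in hosts:
            return "https://" + host
        return match.group(0)

    return _HTTP_URL.sub(swap, text)


def upgrade_item(item, hosts=HTTPS_HOSTS, fields=("content", "enclosure_url")):
    """Return a copy of a feed item dict with the given string fields upgraded."""
    upgraded = dict(item)
    for field in fields:
        value = upgraded.get(field)
        if isinstance(value, str):
            upgraded[field] = upgrade_to_https(value, hosts)
    return upgraded
```

Restricting the rewrite to an allow-list avoids breaking images on hosts that do not serve https at all.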
 
How's things going with this @Kevin?

I've had the same feed set up locally since you started talking about this and I've not noticed any issues.
 
Chris, thanks for keeping an eye on this & following-up, it is appreciated. 🍻

Here's the current standing...
  • When trying to retrieve the non-SSL version of the images using the image proxy test tool it still gets an error after about 2-3 seconds. If I keep trying then sometimes it'll succeed but mostly it fails. I'm attaching a screen capture showing that in action. Trying the SSL version of the URL gives the same general results.
  • I'm using Snog's RSS add-on to use a single thread for the NASA Image of the Day posts. I've tried torture-testing my test install with the add-on enabled and disabled to see whether the add-on was having any effect on the results, and it isn't.
  • I hacked some code into Snog's add-on so that, for this one particular feed, the URLs in the content and enclosure_url are changed to https.
  • With the incoming feed now using https, the problem of 'missing' images has been reduced. I still see one on occasion, but if I reload the thread a few hours later, the image is present more often than not. Of course, it could be happening more often and be cleared by the time I view the thread, but I've been trying to keep an eye on it.
  • Possibly related: the server has burped a virtual memory size error from proxy.php (copy below). So far in March I've only gotten two of them, both on the same day, and I've had the 'missing' image problem since then, so I don't know whether the error is just a coincidence or a symptom of the issue.
    Code:
    Time:         Sun Mar  4 15:49:24 2018 -0500
    Account:      xxxxx
    Resource:     Virtual Memory Size
    Exceeded:     523 > 500 (MB)
    Executable:   /opt/cpanel/ea-php70/root/usr/bin/php-cgi
    Command Line: /opt/cpanel/ea-php70/root/usr/bin/php-cgi /home/xxxxx/public_html/proxy.php
    PID:          23181 (Parent PID:22013)
    Killed:       No
So I'm at the point where the evidence suggests the problem is related either to my server or to my server's connection route to the NASA servers, since nobody else seems able to reproduce it using the same data points.

As an action item, my next step is to dig into the timeout issue. As the test tool shows, the timeout happens pretty fast, after about 2-3 seconds. Is that length of time defined anywhere in the XF code, or is it picked up from the server settings (and is therefore something I can change)? :unsure:
 

Attachment: waka4.gif
At this point in time, I think I'd tend to agree that it's server related, for the same reasons you mention.

As for the timeout, we do have a connect_timeout value of 3 seconds, so that could be related. I'm reluctant to change that though. 3 seconds should be ample time to actually make a connection. After the connection is made we give it 8 seconds or so to actually retrieve the data, so 3 seconds to actually establish a connection should be enough.
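The distinction between the two limits can be illustrated with a raw TCP probe that reports which phase failed. A Python sketch using the standard `socket` module; the 3 s and 8 s defaults simply mirror the figures quoted above, and `probe` is a hypothetical helper, not part of XenForo. Under those figures, a failure at roughly 3 seconds fits a connect-phase timeout, while a stall after the connection is established would run closer to the longer retrieval limit:

```python
import socket

CONNECT_TIMEOUT = 3.0  # seconds, mirroring the connect_timeout quoted above
READ_TIMEOUT = 8.0     # seconds, the approximate retrieval limit quoted above


def probe(host, port, connect_timeout=CONNECT_TIMEOUT, read_timeout=READ_TIMEOUT,
          request=b""):
    """Connect over TCP, optionally send `request`, and wait for the first byte.

    Returns one of: "ok", "closed", "connect_timeout", "connect_error",
    "read_timeout", "read_error", identifying the phase that failed.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:          # no TCP handshake within connect_timeout
        return "connect_timeout"
    except OSError:                 # refused, unreachable, DNS failure, ...
        return "connect_error"
    with sock:
        sock.settimeout(read_timeout)  # separate budget once connected
        try:
            if request:
                sock.sendall(request)
            first = sock.recv(1)
        except socket.timeout:      # connected, but the server never answered
            return "read_timeout"
        except OSError:
            return "read_error"
    return "ok" if first else "closed"
```

For example, running `probe("www.nasa.gov", 80, request=b"HEAD / HTTP/1.0\r\nHost: www.nasa.gov\r\n\r\n")` repeatedly from the affected server would show whether failures happen before or after the TCP connection is established.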

Do let us know if the issue continues, and also let us know if you find something eventually that resolves the issue for you.
 