XF 1.4 Image Proxy crashing PHP-FPM after moving to SSL

DeltaHF

Well-known member
I just moved my site to SSL last night, and now the Image Proxy will crash PHP-FPM if it has trouble pulling images from problematic origin servers. I've had trouble with it before (and it's always been a bit temperamental for me), but the impact is now much more severe after the SSL conversion. This is a very large high-traffic forum (10.2 million posts, ~300k daily page views) which has some extremely image-heavy threads, so we push the Image Proxy hard.

PHP-FPM has crashed about 8 times in the past 12 hours. There are no errors in the PHP-FPM error log. The server runs a LEMP web stack configured by @eva2000's Centminmod. Any help or advice would be greatly appreciated.

NewRelic graphs from two of the most severe incidents:
[NewRelic screenshots attached]

My php-fpm.conf:
Code:
log_level = debug

pid = /var/run/php-fpm/php-fpm.pid
error_log = /var/log/php-fpm/www-error.log
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 10s

[www]
user = nginx
group = nginx
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
;listen.backlog = -1
;listen = /tmp/php5-fpm.sock
;listen.owner = nobody
;listen.group = nobody
;listen.mode = 0666

pm = static
pm.max_children = 8
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 8
pm.min_spare_servers = 8
pm.max_spare_servers = 8
pm.max_requests = 100

; PHP 5.3.9 setting
; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
pm.process_idle_timeout = 10s;

rlimit_files = 65536
rlimit_core = 0

; The timeout for serving a single request after which the worker process will
; be killed. This option should be used when the 'max_execution_time' ini option
; does not stop script execution for some reason. A value of '0' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_terminate_timeout = 0
;request_slowlog_timeout = 0
slowlog = /var/log/php-fpm/www-slow.log

pm.status_path = /phpstatus
ping.path = /phpping
ping.response = pong

; Limits the extensions of the main script FPM will allow to parse. This can
; prevent configuration mistakes on the web server side. You should only limit
; FPM to .php extensions to prevent malicious users to use other extensions to
; execute php code.
; Note: set an empty value to allow all extensions.
; Default Value: .php
security.limit_extensions = .php .php3 .php4 .php5

; catch_workers_output = yes
php_admin_value[error_log] = /var/log/php-fpm/www-php.error.log

PHP-FPM Status (about 10 minutes after a crash and manual restart):
Code:
pool: www
process manager: static
start time: 26/Feb/2015:19:48:23 +0000
start since: 428
accepted conn: 9846
listen queue: 0
max listen queue: 129
listen queue len: 128
idle processes: 7
active processes: 1
total processes: 8
max active processes: 8
max children reached: 0
slow requests: 0

I have also reduced the timeout in /library/XenForo/Model/ImageProxy.php from 10 seconds to 3, starting at line 426:
Code:
$response = XenForo_Helper_Http::getClient($requestUrl, array(
    'output_stream' => $streamUri,
    'timeout' => 3
))->setHeaders('Accept-encoding', 'identity')->request('GET');
 

Admittedly I haven't really gotten into the nuances of PHP-FPM configuration, but since you're setting the process manager (pm) to static, pm.max_children = 8 means you only ever have 8 worker processes, so you can only serve 8 requests concurrently. If you have 8 cores and the workers are always CPU bound, that makes sense. But here you're really just network bound, so if all 8 workers are waiting on remote connections, you could end up serving nothing while the server is under effectively no load. Normally this wouldn't happen, but if someone embeds a number of images in a post from the same bad host, the browser will try to load them simultaneously (up to a point) and that could trigger the issue. It may be worth trying the "dynamic" pm setting to see if that makes a difference.
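Purely as an illustration of what "dynamic" looks like in the pool config (the numbers here are placeholders, not a tuned recommendation for this server):
Code:
pm = dynamic
; hard ceiling on concurrent worker processes
pm.max_children = 16
; workers forked at startup
pm.start_servers = 8
; FPM keeps the idle worker count between these two values,
; forking or killing children as the load changes
pm.min_spare_servers = 4
pm.max_spare_servers = 12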

That's not a crash though, and presumably it should recover. Does it just stop serving entirely until you intervene? You may want to increase your log_level to see if that gives any more information.
 
Thanks for your input, Mike; I think you are correct.

I have doubled max_children, start_servers, min_spare_servers, and max_spare_servers to 16 and set the max_requests to 500 (100 is apparently far too low). This is running on an 8-core (Xeon E5-2680 v2 @ 2.80GHz) Linode.
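For reference, that makes the relevant pool lines read roughly as follows (reconstructed from the change just described, still on the static process manager):
Code:
pm = static
; with pm = static only max_children and max_requests take effect here;
; the spare-server settings are ignored
pm.max_children = 16
pm.start_servers = 16
pm.min_spare_servers = 16
pm.max_spare_servers = 16
pm.max_requests = 500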

So far the server has been performing well with no crashes. Let's hope it stays that way. :)

EDIT: Nope, it didn't. Just crashed again, same way. It does eventually come back after 15 minutes or so of downtime.
 
If you up the log_level to debug, does that give any more details?
Yep, it sure does:
Code:
server reached pm.max_children setting (16), consider raising it
:) I'm going to switch to dynamic and increase the max_children to 32 and see how it goes. Oddly, most of the timeouts seem to come during the late night when traffic is low. I suppose a large forum with heavy Image Proxy use just needs a lot more PHP children than most configurations call for. I've got 16GB of RAM.
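As a rough memory sanity check on that (the per-worker figure below is an assumption, not something measured on this server):
Code:
; assuming each PHP-FPM worker peaks at roughly 60 MB of RAM:
;   32 children x 60 MB ≈ 2 GB
; which leaves most of the 16 GB free for MySQL, nginx and the OS
pm.max_children = 32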

The next question: why did this problem suddenly become so severe after switching to HTTPS?

My theory: I enabled SPDY/3.1 at the same time, which supports multiplexing. With regular HTTP, most browsers would only make 2-6 connections at a time, effectively throttling their requests to proxy.php and limiting the impact of a slow origin server. With SPDY, browsers are making many more simultaneous requests to the proxy, so a sluggish origin server can quickly tie up a large number of PHP processes for a single visitor loading an image-heavy thread. Perhaps this is something to keep in mind for other large sites making the move to SSL.

Use this config:
Code:
pm = dynamic
pm.max_children = 16
pm.start_servers = 6
pm.min_spare_servers = 2
pm.max_spare_servers = 10
pm.max_requests = 500
Thank you, Roldan. Now that I have confirmation the server is running out of processes, I'm going to try those settings out with 32 max_children instead.
 
I'm trying the same thing you did (switching to dynamic, max_children to 32) after experiencing a similar issue. How has everything been working out for you?
 
It's been going well. I have since moved from Linode to a dedicated server with 32GB of RAM. Here are my current PHP-FPM settings:

Code:
pm = ondemand
pm.max_children = 50
pm.start_servers = 20
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 5000
 
These don't do anything with ondemand:

Code:
pm.start_servers = 20
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 5000

Per the stock PHP-FPM config comments:

Code:
;  ondemand - no children are created at startup. Children will be forked when
;             new requests will connect. The following parameter are used:
;             pm.max_children           - the maximum number of children that
;                                         can be alive at the same time.
;             pm.process_idle_timeout   - The number of seconds after which
;                                         an idle process will be killed.
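In other words, with ondemand the pool effectively reduces to something like this (pm.process_idle_timeout is shown at its default value; it is not in the config posted above):
Code:
pm = ondemand
; hard cap on workers alive at the same time
pm.max_children = 50
; idle workers are killed after this long (FPM default)
pm.process_idle_timeout = 10s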
 
The revival of this thread got me to look back into why I was using "ondemand" in the first place, because according to my server setup notes I had decided to go with "dynamic" early last year. o_O I think I changed it to run some real-world benchmarks and simply forgot to revert it.

Anyway, I changed it back to "dynamic" again to see what would happen. The answer appears to be nothing, according to my NewRelic PHP graphs over the past 24 hours. Having said that, page loads do feel faster, though that may just be a placebo effect. My site serves between 100k and 250k pageviews per day across WordPress and XenForo.
 
For you to see any difference, you would have to be spawning processes non-stop over a very short period of time, and even then your hardware would probably handle it with no issue at all.

Ondemand will be able to handle your traffic, and everyone else's here, just fine. I like to keep server configuration as simple as possible these days.
 