
XF 1.4 Image Proxy crashing PHP-FPM after moving to SSL

Discussion in 'Troubleshooting and Problems' started by DeltaHF, Feb 26, 2015.

  1. DeltaHF

    DeltaHF Well-Known Member

    I just moved my site to SSL last night, and now the Image Proxy will crash PHP-FPM if it has trouble pulling images from problematic origin servers. I've had trouble with it before (and it's always been a bit temperamental for me), but the impact is now much more severe after the SSL conversion. This is a very large high-traffic forum (10.2 million posts, ~300k daily page views) which has some extremely image-heavy threads, so we push the Image Proxy hard.

    PHP-FPM has crashed about 8 times in the past 12 hours. There are no errors in the PHP-FPM error log. The server runs a LEMP web stack configured by @eva2000's Centminmod. Any help or advice would be greatly appreciated.

    NewRelic graphs from two of the most severe incidents (three screenshots, taken Feb 26, 2015 between 2:33 and 2:34 PM; images not reproduced here).

    My php-fpm.conf:
    log_level = debug
    pid = /var/run/php-fpm/php-fpm.pid
    error_log = /var/log/php-fpm/www-error.log
    emergency_restart_threshold = 10
    emergency_restart_interval = 1m
    process_control_timeout = 10s
    user = nginx
    group = nginx
    listen =
    listen.allowed_clients =
    ;listen.backlog = -1
    ;listen = /tmp/php5-fpm.sock
    ;listen.owner = nobody
    ;listen.group = nobody
    ;listen.mode = 0666
    pm = static
    pm.max_children = 8
    ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
    pm.start_servers = 8
    pm.min_spare_servers = 8
    pm.max_spare_servers = 8
    pm.max_requests = 100
    ; PHP 5.3.9 setting
    ; The number of seconds after which an idle process will be killed.
    ; Note: Used only when pm is set to 'ondemand'
    ; Default Value: 10s
    pm.process_idle_timeout = 10s;
    rlimit_files = 65536
    rlimit_core = 0
    ; The timeout for serving a single request after which the worker process will
    ; be killed. This option should be used when the 'max_execution_time' ini option
    ; does not stop script execution for some reason. A value of '0' means 'off'.
    ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
    ; Default Value: 0
    ;request_terminate_timeout = 0
    ;request_slowlog_timeout = 0
    slowlog = /var/log/php-fpm/www-slow.log
    pm.status_path = /phpstatus
    ping.path = /phpping
    ping.response = pong
    ; Limits the extensions of the main script FPM will allow to parse. This can
    ; prevent configuration mistakes on the web server side. You should only limit
    ; FPM to .php extensions to prevent malicious users to use other extensions to
    ; execute php code.
    ; Note: set an empty value to allow all extensions.
    ; Default Value: .php
    security.limit_extensions = .php .php3 .php4 .php5
    ; catch_workers_output = yes
    php_admin_value[error_log] = /var/log/php-fpm/www-php.error.log

    PHP-FPM Status (about 10 minutes after a crash and manual restart):
    pool: www
    process manager: static
    start time: 26/Feb/2015:19:48:23 +0000
    start since: 428
    accepted conn: 9846
    listen queue: 0
    max listen queue: 129
    listen queue len: 128
    idle processes: 7
    active processes: 1
    total processes: 8
    max active processes: 8
    max children reached: 0
    slow requests: 0

    I have also modified the timeout in /library/XenForo/Model/ImageProxy.php from 10 to 3 seconds, from line 426:
    $response = XenForo_Helper_Http::getClient($requestUrl, array(
        'output_stream' => $streamUri,
        'timeout' => 3
    ))->setHeaders('Accept-encoding', 'identity')->request('GET');

    Last edited: Feb 26, 2015
  2. Mike

    Mike XenForo Developer Staff Member

    Admittedly I haven't really gotten into the nuances of PHP-FPM configuration, but since you're setting the process manager (pm) to static, pm.max_children means that you can only serve 8 requests concurrently. If you have 8 cores and the processes are always CPU bound, that makes sense. But we're seeing an example where you're really just network bound, so if all 8 processes are waiting on network connections, you could end up serving nothing while the server is under effectively no load. While normally this wouldn't happen, if someone embeds a number of images in a post from the same bad host, the browser will try to load them simultaneously (up to a point) and that could trigger an issue. It may be worth trying the "dynamic" pm setting to see if that makes a difference.

    That's not a crash though and presumably it should recover. Does it just stop serving full stop until you intervene? You may want to increase your log_level to see if that gives any more information.
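
    To put rough numbers on the network-bound case, here is a minimal sketch. It assumes every request to a bad origin holds one worker for the full proxy timeout (10 seconds originally, 3 after the edit to ImageProxy.php) and that the workers do nothing else. The function name is hypothetical and the model is only Little's law, not a measurement from this server.

```python
def saturating_rate(workers: int, stall_seconds: float) -> float:
    """Stalled requests per second that keep every worker busy.

    Little's law: busy_workers = arrival_rate * time_each_request_holds_a_worker,
    so the pool is full once arrival_rate reaches workers / stall_seconds.
    """
    if workers <= 0 or stall_seconds <= 0:
        raise ValueError("workers and stall_seconds must be positive")
    return workers / stall_seconds


if __name__ == "__main__":
    # (workers, seconds a stalled proxy request holds a worker) -- assumed values
    for workers, stall in [(8, 10), (8, 3), (16, 3)]:
        rate = saturating_rate(workers, stall)
        print(f"{workers} workers, {stall}s stall: full at {rate:.2f} stalled requests/s")
```

    Under these assumptions, cutting the timeout from 10 to 3 seconds raises the rate needed to fill 8 workers from 0.8 to about 2.7 stalled requests per second, and doubling the pool doubles it again. Neither change removes the ceiling.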
  3. DeltaHF

    DeltaHF Well-Known Member

    Thanks for your input, Mike, I think you are correct.

    I have doubled max_children, start_servers, min_spare_servers, and max_spare_servers to 16 and set the max_requests to 500 (100 is apparently far too low). This is running on an 8-core (Xeon E5-2680 v2 @ 2.80GHz) Linode.

    So far the server has been performing well with no crashes. Let's hope it stays that way. :)

    EDIT: Nope, it didn't. Just crashed again, same way. It does eventually come back after 15 minutes or so of downtime.
    Last edited: Feb 27, 2015
  4. RoldanLT

    RoldanLT Well-Known Member

    Use this config:
    pm = dynamic
    pm.max_children = 16
    pm.start_servers = 6
    pm.min_spare_servers = 2
    pm.max_spare_servers = 10
    pm.max_requests = 500
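
    A quick way to sanity-check values like these before restarting PHP-FPM is sketched below. The default start_servers formula comes from the comment in the php-fpm.conf posted above. The ordering rules are my understanding of how FPM validates a dynamic pool and should be treated as an assumption, and both function names are hypothetical.

```python
from typing import List


def default_start_servers(min_spare: int, max_spare: int) -> int:
    """Default from the php-fpm.conf comment:
    min_spare_servers + (max_spare_servers - min_spare_servers) / 2."""
    return min_spare + (max_spare - min_spare) // 2


def check_dynamic_pool(max_children: int, start_servers: int,
                       min_spare: int, max_spare: int) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent.

    Assumed rules: 0 < min_spare <= start_servers <= max_spare <= max_children.
    """
    problems = []
    if min_spare <= 0:
        problems.append("pm.min_spare_servers must be positive")
    if max_spare < min_spare:
        problems.append("pm.max_spare_servers is below pm.min_spare_servers")
    if max_spare > max_children:
        problems.append("pm.max_spare_servers exceeds pm.max_children")
    if not (min_spare <= start_servers <= max_spare):
        problems.append("pm.start_servers is outside the spare-server range")
    return problems


if __name__ == "__main__":
    # Values from the config suggested in this post.
    print(check_dynamic_pool(max_children=16, start_servers=6, min_spare=2, max_spare=10))
    print(default_start_servers(min_spare=2, max_spare=10))
```

    For the values above, the check finds nothing to complain about, and pm.start_servers = 6 is exactly what the default formula gives for a spare range of 2 to 10.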
  5. Mike

    Mike XenForo Developer Staff Member

    If you up the log_level to debug, does that give any more details?
  6. DeltaHF

    DeltaHF Well-Known Member

    Yep, it sure does:
    server reached pm.max_children setting (16), consider raising it
    :) I'm going to switch to dynamic and increase the max_children to 32 and see how it goes. Oddly, most of the timeouts seem to come during the late night when traffic is low. I suppose a large forum with heavy Image Proxy use just needs a lot more PHP children than most configurations call for. I've got 16GB of RAM.
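
    Whether 32 children fit comfortably in 16GB depends on how much memory each PHP worker actually uses, which this thread does not state. The sketch below shows the arithmetic only. The 64 MB per worker and the 8 GB reserved for everything else on the box are hypothetical placeholders to be replaced with measured figures, and the function name is made up for illustration.

```python
def max_children_for(total_ram_mb: int, reserved_mb: int, per_worker_mb: int) -> int:
    """Upper bound on pm.max_children given a memory budget.

    total_ram_mb:  physical RAM on the server
    reserved_mb:   RAM kept for the database, nginx, caches and the OS (assumed)
    per_worker_mb: resident memory of one PHP-FPM worker (assumed; measure it)
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    return max(0, (total_ram_mb - reserved_mb) // per_worker_mb)


if __name__ == "__main__":
    # 16 GB is from the post; the other two numbers are placeholders.
    ceiling = max_children_for(total_ram_mb=16 * 1024, reserved_mb=8 * 1024, per_worker_mb=64)
    print(f"memory allows up to {ceiling} workers; 32 requested")
```

    With those placeholder figures the memory ceiling sits well above 32, but a larger per-worker footprint or a bigger database cache would lower it quickly.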

    The next question: why did this problem suddenly become so severe after switching to HTTPS?

    My theory: I enabled SPDY/3.1 at the same time, which supports multiplexing. With regular HTTP, most browsers would only make 2-6 connections at a time, effectively throttling their requests to proxy.php and limiting the impact of a slow origin server. With SPDY, browsers are making many more simultaneous requests to the proxy, so a sluggish origin server can quickly tie up a large number of PHP processes for a single visitor loading an image-heavy thread. Perhaps this is something to keep in mind for other large sites making the move to SSL.
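
    The theory can be illustrated with a small sketch. It assumes each slow proxied image holds one PHP worker until it times out, that an HTTP/1.1 browser opens at most 6 connections per host (the upper end of the range above), and that a multiplexed SPDY connection allows 100 concurrent streams. The stream limit, the 40-image thread and the function names are hypothetical, chosen only to show the shape of the effect.

```python
import math


def workers_held_by_one_visitor(slow_images: int, max_parallel_requests: int) -> int:
    """Workers a single page view can occupy while its proxied images stall."""
    return min(slow_images, max_parallel_requests)


def visitors_to_fill_pool(workers: int, slow_images: int, max_parallel_requests: int) -> int:
    """Simultaneous visitors on such a page needed to occupy every worker."""
    held = workers_held_by_one_visitor(slow_images, max_parallel_requests)
    if held <= 0:
        raise ValueError("no stalled requests, so the pool cannot be filled this way")
    return math.ceil(workers / held)


if __name__ == "__main__":
    pool = 16          # pm.max_children at the time of the crash
    slow_images = 40   # hypothetical image-heavy thread on a sluggish origin
    for label, parallel in [("HTTP/1.1, 6 connections", 6), ("SPDY, 100 streams", 100)]:
        held = workers_held_by_one_visitor(slow_images, parallel)
        needed = visitors_to_fill_pool(pool, slow_images, parallel)
        print(f"{label}: one visitor holds {held} workers, {needed} visitor(s) fill the pool")
```

    Under these assumptions, three HTTP/1.1 visitors are needed to fill a 16-worker pool, while a single SPDY visitor can do it alone. That would also fit the timeouts showing up at night when traffic is low.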

    Thank you, Roldan. Now that I have confirmation the server is running out of processes, I'm going to try those settings out with 32 max_children instead.
    jeffwidman and RoldanLT like this.
  7. RoldanLT

    RoldanLT Well-Known Member

  8. RoldanLT

    RoldanLT Well-Known Member

    You should now be working with PHP-FPM on the current Nginx server of XenForo.com :)
