XF 1.4 Image Proxy crashing PHP-FPM after moving to SSL

Discussion in 'Troubleshooting and Problems' started by DeltaHF, Feb 26, 2015.

  1. DeltaHF

    DeltaHF Well-Known Member

    I just moved my site to SSL last night, and now the Image Proxy will crash PHP-FPM if it has trouble pulling images from problematic origin servers. I've had trouble with it before (and it's always been a bit temperamental for me), but the impact is now much more severe after the SSL conversion. This is a very large high-traffic forum (10.2 million posts, ~300k daily page views) which has some extremely image-heavy threads, so we push the Image Proxy hard.

    PHP-FPM has crashed about 8 times in the past 12 hours. There are no errors in the PHP-FPM error log. The server runs a LEMP web stack configured by @eva2000's Centminmod. Any help or advice would be greatly appreciated.

    NewRelic graphs from two of the most severe incidents:
    [Screenshots: Screen Shot 2015-02-26 at 2.33.18 PM.png, Screen Shot 2015-02-26 at 2.34.15 PM.png, Screen Shot 2015-02-26 at 2.34.36 PM.png]

    My php-fpm.conf:
    Code:
    log_level = debug
    
    pid = /var/run/php-fpm/php-fpm.pid
    error_log = /var/log/php-fpm/www-error.log
    emergency_restart_threshold = 10
    emergency_restart_interval = 1m
    process_control_timeout = 10s
    
    [www]
    user = nginx
    group = nginx
    listen = 127.0.0.1:9000
    listen.allowed_clients = 127.0.0.1
    ;listen.backlog = -1
    ;listen = /tmp/php5-fpm.sock
    ;listen.owner = nobody
    ;listen.group = nobody
    ;listen.mode = 0666
    
    pm = static
    pm.max_children = 8
    ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
    pm.start_servers = 8
    pm.min_spare_servers = 8
    pm.max_spare_servers = 8
    pm.max_requests = 100
    
    ; PHP 5.3.9 setting
    ; The number of seconds after which an idle process will be killed.
    ; Note: Used only when pm is set to 'ondemand'
    ; Default Value: 10s
    pm.process_idle_timeout = 10s;
    
    rlimit_files = 65536
    rlimit_core = 0
    
    ; The timeout for serving a single request after which the worker process will
    ; be killed. This option should be used when the 'max_execution_time' ini option
    ; does not stop script execution for some reason. A value of '0' means 'off'.
    ; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
    ; Default Value: 0
    ;request_terminate_timeout = 0
    ;request_slowlog_timeout = 0
    slowlog = /var/log/php-fpm/www-slow.log
    
    pm.status_path = /phpstatus
    ping.path = /phpping
    ping.response = pong
    
    ; Limits the extensions of the main script FPM will allow to parse. This can
    ; prevent configuration mistakes on the web server side. You should only limit
    ; FPM to .php extensions to prevent malicious users to use other extensions to
    ; execute php code.
    ; Note: set an empty value to allow all extensions.
    ; Default Value: .php
    security.limit_extensions = .php .php3 .php4 .php5
    
    ; catch_workers_output = yes
    php_admin_value[error_log] = /var/log/php-fpm/www-php.error.log
    
    PHP-FPM Status (about 10 minutes after a crash and manual restart):
    Code:
    pool: www
    process manager: static
    start time: 26/Feb/2015:19:48:23 +0000
    start since: 428
    accepted conn: 9846
    listen queue: 0
    max listen queue: 129
    listen queue len: 128
    idle processes: 7
    active processes: 1
    total processes: 8
    max active processes: 8
    max children reached: 0
    slow requests: 0
    
    I have also modified the timeout in /library/XenForo/Model/ImageProxy.php from 10 to 3 seconds, from line 426:
    Code:
    $response = XenForo_Helper_Http::getClient($requestUrl, array(
        'output_stream' => $streamUri,
        'timeout' => 3
    ))->setHeaders('Accept-encoding', 'identity')->request('GET');
    
     

    Last edited: Feb 26, 2015
  2. Mike

    Mike XenForo Developer Staff Member

    Admittedly I haven't really gotten into the nuances of PHP-FPM configuration, but since you're setting the process manager (pm) to static, pm.max_children means that you can only serve 8 requests concurrently. If you have 8 cores and the processes are always CPU bound, that makes sense. But here we're seeing an example where you're really just network bound, so if all 8 processes are waiting on network connections, you could be serving nothing at all while the server is under effectively no load. While normally this wouldn't happen, if someone embeds a number of images in a post from the same bad host, the browser will try to load them simultaneously (up to a point) and that could trigger an issue. It may be worth trying the "dynamic" pm setting to see if that makes a difference.

    That's not a crash though and presumably it should recover. Does it just stop serving full stop until you intervene? You may want to increase your log_level to see if that gives any more information.
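
    The starvation scenario described above can be put into rough numbers. This is only a minimal sketch: it assumes the slow requests arrive together and that each one holds a worker for the full fetch timeout. The function name and the example figures (24 images, a 10-second timeout) are illustrative assumptions, not measurements from this server.

```python
def seconds_pool_is_blocked(slow_requests: int, workers: int, timeout_s: int) -> int:
    """Worst-case seconds during which every worker waits on a slow origin.

    Assumes the slow requests arrive together and each one holds a worker
    for the full timeout, so they are served in waves of `workers`.
    Only full waves starve the pool; a partial wave leaves workers free.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    full_waves = slow_requests // workers
    return full_waves * timeout_s


# Hypothetical post embedding 24 images from one unresponsive host.
for workers in (8, 16, 32):
    blocked = seconds_pool_is_blocked(24, workers, timeout_s=10)
    print(f"{workers} workers: fully blocked for up to {blocked}s")
```

    Under these assumptions the pool is unavailable for other page requests during every full wave, which would look like an outage even though CPU load stays low.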
     
    jeffwidman and DeltaHF like this.
  3. DeltaHF

    DeltaHF Well-Known Member

    Thanks for your input, Mike, I think you are correct.

    I have doubled max_children, start_servers, min_spare_servers, and max_spare_servers to 16 and set the max_requests to 500 (100 is apparently far too low). This is running on an 8-core (Xeon E5-2680 v2 @ 2.80GHz) Linode.

    So far the server has been performing well with no crashes. Let's hope it stays that way. :)

    EDIT: Nope, it didn't. Just crashed again, same way. It does eventually come back after 15 minutes or so of downtime.
     
    Last edited: Feb 27, 2015
  4. RoldanLT

    RoldanLT Well-Known Member

    Use this config:
    Code:
    pm = dynamic
    pm.max_children = 16
    pm.start_servers = 6
    pm.min_spare_servers = 2
    pm.max_spare_servers = 10
    pm.max_requests = 500
     
    HWS likes this.
  5. Mike

    Mike XenForo Developer Staff Member

    If you up the log_level to debug, does that give any more details?
     
  6. DeltaHF

    DeltaHF Well-Known Member

    Yep, it sure does:
    Code:
    server reached pm.max_children setting (16), consider raising it
    
    :) I'm going to switch to dynamic and increase the max_children to 32 and see how it goes. Oddly, most of the timeouts seem to come during the late night when traffic is low. I suppose a large forum with heavy Image Proxy use just needs a lot more PHP children than most configurations call for. I've got 16GB of RAM.

    The next question: why did this problem suddenly become so severe after switching to HTTPS?

    My theory: I enabled SPDY/3.1 at the same time, which supports multiplexing. With regular HTTP, most browsers would only make 2-6 connections at a time, effectively throttling their requests to proxy.php and limiting the impact of a slow origin server. With SPDY, browsers are making many more simultaneous requests to the proxy, so a sluggish origin server can quickly tie up a large number of PHP processes for a single visitor loading an image-heavy thread. Perhaps this is something to keep in mind for other large sites making the move to SSL.
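
    The theory above can be sketched as a rough calculation. It assumes every image on the page goes through proxy.php and that the origin is slow enough for all in-flight requests to overlap. The concurrency caps used here (6 connections without multiplexing, 100 concurrent streams with it) and the page size of 40 images are illustrative assumptions, not values measured on this site.

```python
import math


def workers_held_by_one_visitor(images: int, browser_concurrency: int) -> int:
    """PHP workers a single page load can occupy while origins are slow."""
    return min(images, browser_concurrency)


def visitors_to_exhaust_pool(max_children: int, images: int, browser_concurrency: int) -> int:
    """Simultaneous visitors needed to tie up every worker in the pool."""
    held = workers_held_by_one_visitor(images, browser_concurrency)
    if held == 0:
        raise ValueError("a page with no proxied images cannot exhaust the pool")
    return math.ceil(max_children / held)


# Hypothetical thread page with 40 proxied images and a pool of 16 workers.
print(visitors_to_exhaust_pool(16, 40, browser_concurrency=6))    # per-host connection cap
print(visitors_to_exhaust_pool(16, 40, browser_concurrency=100))  # multiplexed streams
```

    With the connection cap it takes several visitors hitting the same slow thread to fill the pool; with multiplexing a single visitor can do it.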

    Thank you, Roldan. Now that I have confirmation the server is running out of processes, I'm going to try those settings out with 32 max_children instead.
     
    jeffwidman and RoldanLT like this.
  7. RoldanLT

    RoldanLT Well-Known Member

  8. RoldanLT

    RoldanLT Well-Known Member

    You should be now working with PHP-FPM, on the current Nginx server of XenForo.com :)
     
  9. kontrabass

    kontrabass Well-Known Member

    I'm trying the same thing you did (switching to dynamic, max_children to 32), after experiencing a similar issue. How has everything been working out for you?
     
  10. Brent W

    Brent W Formerly BamaStangGuy

    We use ondemand on ChristianForums with no issues.
     
  11. DeltaHF

    DeltaHF Well-Known Member

    It's been going well. I have since moved from Linode to a dedicated server with 32GB of RAM. Here are my current PHP-FPM settings:

    Code:
    pm = ondemand
    pm.max_children = 50
    pm.start_servers = 20
    pm.min_spare_servers = 5
    pm.max_spare_servers = 35
    pm.max_requests = 5000
     
    kontrabass likes this.
  12. Brent W

    Brent W Formerly BamaStangGuy

    These don't do anything with ondemand (pm.max_requests still applies, since it is not tied to a process manager mode):

    Code:
    pm.start_servers = 20
    pm.min_spare_servers = 5
    pm.max_spare_servers = 35
    
    Per PHP FPM Comments:

    Code:
    ;  ondemand - no children are created at startup. Children will be forked when
    ;             new requests will connect. The following parameter are used:
    ;             pm.max_children           - the maximum number of children that
    ;                                         can be alive at the same time.
    ;             pm.process_idle_timeout   - The number of seconds after which
    ;                                         an idle process will be killed.
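
    The mode-specific behaviour quoted above can be turned into a quick check of a pool configuration. This is a hedged sketch: the table follows the comments in the stock php-fpm.conf, and the helper name ignored_directives is hypothetical. pm.max_requests is left out of the table because the stock comments do not mark it as specific to any mode.

```python
# Directives each process manager reads, per the stock php-fpm.conf comments.
USED_BY_MODE = {
    "static": {"pm.max_children"},
    "dynamic": {"pm.max_children", "pm.start_servers",
                "pm.min_spare_servers", "pm.max_spare_servers"},
    "ondemand": {"pm.max_children", "pm.process_idle_timeout"},
}
# Every directive that is tied to at least one mode.
MODE_SPECIFIC = set().union(*USED_BY_MODE.values())


def ignored_directives(mode: str, configured: list) -> list:
    """Return configured directives that the chosen pm mode does not read."""
    used = USED_BY_MODE[mode]
    return [name for name in configured if name in MODE_SPECIFIC and name not in used]


# Directive names taken from the ondemand pool posted earlier in the thread.
pool = ["pm.max_children", "pm.start_servers", "pm.min_spare_servers",
        "pm.max_spare_servers", "pm.max_requests"]
print(ignored_directives("ondemand", pool))
```

    Running the same check with "dynamic" returns an empty list for this pool, which matches the point that the spare-server settings only matter in that mode.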
     
    Last edited: Dec 9, 2016
    DeltaHF likes this.
  13. RoldanLT

    RoldanLT Well-Known Member

    Ondemand is ideal for low-traffic sites.
    Dynamic is ideal for high-traffic sites.
     
    HWS likes this.
  14. Brent W

    Brent W Formerly BamaStangGuy

  15. RoldanLT

    RoldanLT Well-Known Member

    In that article, he mentions this:
     
  16. Brent W

    Brent W Formerly BamaStangGuy

    High Performance is not defined. We do 5,000-7,000+ posts a day and 50-60k pageviews. Ondemand is plenty fine for us and, I am willing to bet, for anyone who posts on this site.
     
  17. Tracy Perry

    Tracy Perry Well-Known Member

    It may be fine.. but it's not optimal - because of the very nature of having to spawn the PHP process(es).
     
  18. Brent W

    Brent W Formerly BamaStangGuy

    Sure. However, with the hardware that exists today, I think the only places you will ever see the difference are benchmarks or actual high-performance application setups.
     
  19. DeltaHF

    DeltaHF Well-Known Member

    The revival of this thread got me to look back into why I was using "ondemand" in the first place, because according to my server setup notes, I had decided to go with "dynamic" early last year. o_O I think I changed it just to do some real-world benchmarks and forgot to revert it.

    Anyway, I changed it back to "dynamic" again to see what would happen. The answer appears to be nothing, according to my NewRelic PHP graphs over the past 24 hours. Having said that, page loads do feel faster, though that may just be a placebo effect. My site serves between 100k-250k pageviews per day between WordPress and XenForo.
     
  20. Brent W

    Brent W Formerly BamaStangGuy

    In order for you to see any difference, you would have to be spawning an insane number of processes nonstop within a short period of time, and even then your hardware would probably handle it with no issue at all.

    Ondemand will be able to handle your traffic and everyone else's here just fine. I like to keep it as simple as possible these days with configuration of servers.
     
