Not a bug XF jobMaxRunTime is not php max_execution_time aware

Xon

Well-known member
Affected version
2.2.1
The XenForo jobMaxRunTime value does not take into account whether PHP's max_execution_time is shorter. This can result in XF's notification code running long-ish C-level tasks that block for much longer than they should (i.e. curl timeouts), resulting in failed requests with little diagnostic information that this is happening.

This becomes an issue because curl's default 10 second timeout is greater than jobMaxRunTime, and possibly greater than max_execution_time.

When XF uses curl to do push notifications, sometimes the maximum timeout is reached for whatever reason. This means the default jobMaxRunTime (~8 seconds) is blown past, and a connection can stay around for much longer than expected.
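To make the mismatch concrete, here is a minimal sketch of capping cURL timeouts to whatever is left of a job's time budget. The helper name clampTimeouts, the 3 second connect cap and the example URL are all hypothetical; this is not how XF or its push library actually sets timeouts, just an illustration of the idea.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of XenForo): derive cURL timeouts that fit
 * inside the remaining job budget instead of a fixed 10 second default.
 */
function clampTimeouts(float $budget, float $elapsed, int $defaultTimeout = 10): array
{
    $remaining = $budget - $elapsed;

    // Never exceed the default, and never go below 1 second. A caller may
    // prefer to skip the request entirely once the budget is used up.
    $total = (int) max(1, min($defaultTimeout, floor($remaining)));

    // Assumption: cap the connect phase at 3 seconds, or less if the total is smaller.
    $connect = min($total, 3);

    return ['connect' => $connect, 'total' => $total];
}

// Example: an 8 second job budget with 2.5 seconds already used.
$timeouts = clampTimeouts(8.0, 2.5);

if (function_exists('curl_init')) {
    // Placeholder endpoint; nothing is sent in this sketch.
    $ch = curl_init('https://push.example.invalid/');
    curl_setopt_array($ch, [
        CURLOPT_CONNECTTIMEOUT => $timeouts['connect'],
        CURLOPT_TIMEOUT        => $timeouts['total'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    // curl_exec($ch) would go here.
    unset($ch);
}
```

With these numbers the request would be given at most 5 seconds in total, so a stalled endpoint could not push the job far past its ~8 second budget.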
 
So I think there are a few things here and I'm not sure if changes are needed.

First, in terms of job running and max_execution_time, AbstractJob calls @set_time_limit(0), so we're essentially doing as much as we can to avoid the issue there. There are a couple of instances that may use jobMaxRunTime outside of that context, though not many. In terms of push notifications, we do trigger notifications outside of jobs (like when replying to a thread), though this is capped at 3 seconds regardless of the job max run time.

In terms of cURL timeouts and max_execution_time, PHP's documented behavior here is that when the script is waiting on external events (streams, DB queries, etc), this time doesn't count towards the time limit (except on Windows). I would have to investigate the push library in more detail, as it may be that it's using async cURL operations and since the PHP script is still spinning, that point isn't relevant.

When you've run into issues, what has max_execution_time been set to? The default is 30 seconds, which should generally be much longer than a cURL timeout, for example. We generally take the view that we should only increase/change this where it's reasonably expected that it might be needed. As such, we generally accept the value the server has been configured to use.

(On a side note, the point of the jobMaxRunTime config value is primarily for reducing the value on badly configured servers, rather than increasing it. It's also always just meant as an approximate value rather than a hard stop.)
 
My concern is that a php-fpm pool can easily get exhausted if the pushing service repeatedly hits 'stuck' outbound requests on a busy forum.

I think I tracked it down to the incorrect usage of php_admin_flag in the php-fpm configuration and the use of tinyproxy with a too-high timeout value (so the connection would unexpectedly linger).
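For anyone hitting something similar, a sketch of the relevant php-fpm pool settings follows. It assumes the mistake was setting a numeric directive through php_admin_flag, which is only meant for boolean values; the exact misconfiguration isn't stated above, and the values shown are illustrative rather than recommendations.

```ini
; php-fpm pool configuration fragment (values are examples only)

; php_admin_flag is for on/off directives
php_admin_flag[display_errors] = off

; numeric or string directives need php_admin_value instead
php_admin_value[max_execution_time] = 30

; wall-clock limit enforced by php-fpm itself: the worker is killed after
; this long, even if the script is blocked on an outbound request
request_terminate_timeout = 60s
```

request_terminate_timeout is the setting that protects the pool from exhaustion here, since it counts real elapsed time rather than only the time PHP spends executing the script.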

I'm not really sure there is any sane thing to change to support this beyond supporting a dedicated multi-process worker for the cron task behaviour and supporting disabling job.php entirely.
 