Fixed xf:run-jobs may never stop

Sim

Well-known member
Affected version
2.2b1
Under some circumstances, the current logic of the xf:run-jobs command can cause the task to never stop, or at least to run for longer than the server cron period.

PHP:
        do
        {
            $jobManager->runQueue($manualOnly, $maxRunTime);

            // keep the memory limit down on long running jobs
            $app->em()->clearEntityCache();

            // IF WE HAVE MORE JOBS IN THE QUEUE, KEEP EXECUTING
            // REGARDLESS OF HOW LONG WE'VE ALREADY BEEN RUNNING
            $more = $jobManager->queuePending($manualOnly);
            if (!$more)
            {
                break;
            }
        }
        while (true);

Given that the run-jobs command is designed to be executed once per minute by a server cron task, a large backlog of outstanding jobs could conceivably take longer than 1 minute to clear, leaving us with multiple run-jobs commands executing at the same time.

Indeed, if for some reason new jobs are being spawned faster than we can process them, this command may never actually end!

I strongly recommend that the command be allowed to run for no longer than 1 minute (or perhaps even 30 seconds in case the last job takes longer than expected).

Perhaps something like:

PHP:
        $start = microtime(true);

        do
        {
            $jobManager->runQueue($manualOnly, $maxRunTime);

            // keep the memory limit down on long running jobs
            $app->em()->clearEntityCache();

            $more = $jobManager->queuePending($manualOnly);
            if (!$more)
            {
                break;
            }
        }
        while (microtime(true) - $start < 30); // STOP AFTER 30 SECONDS AND WAIT FOR NEXT CRON TRIGGER

... or make that 30 second limit configurable.
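For illustration, here is a minimal self-contained sketch of the configurable variant. The names `runJobsWithLimit`, `$runBatch` and `$hasMore` are hypothetical stand-ins for the `$jobManager` calls above, not anything in XenForo itself:

```php
<?php
/**
 * Hypothetical helper: run batches until the queue is empty
 * or the time budget for this invocation is spent.
 *
 * @param callable $runBatch   stands in for $jobManager->runQueue(...)
 * @param callable $hasMore    stands in for $jobManager->queuePending(...)
 * @param float    $maxSeconds overall budget for this invocation
 * @return int number of batches executed
 */
function runJobsWithLimit(callable $runBatch, callable $hasMore, float $maxSeconds = 30.0): int
{
    $start = microtime(true);
    $batches = 0;

    do
    {
        $runBatch();
        $batches++;

        // nothing left to do: stop early
        if (!$hasMore())
        {
            break;
        }
    }
    while (microtime(true) - $start < $maxSeconds); // stop once the budget is used up

    return $batches;
}

// Demo: a queue that never empties still stops after the budget expires.
$batches = runJobsWithLimit(
    function () { usleep(20000); }, // pretend each batch takes 20 ms
    function () { return true; },   // more jobs are always pending
    0.1                             // 100 ms budget for the demo
);
```

Note that the limit is only checked between batches, so the last batch can still overrun the budget. That is why the limit should sit comfortably below the cron interval.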
 
Hmm ... limiting the runtime to 30 seconds could slow down execution of jobs more than necessary. E.g. if there are a lot of time-consuming jobs that are only run for 30 seconds, and afterwards no jobs are executed for another approx. 30 seconds, it would be kinda wasteful?

I therefore think it might be better to make sure that there will only ever be one run-jobs command running at any time.
So if it does take longer than the cron interval, a newly started command would terminate immediately if another command is still running.
 
It's difficult to strike a balance here: stop early and we waste time waiting for the next cron trigger, but let the command run for longer and we risk it exceeding 1 minute and ending up with multiple run-jobs commands running at the same time.

The whole system relies on jobs to be well behaved and stop themselves after a reasonable execution time to allow other jobs time to execute.

However, some jobs are not easily broken up into smaller chunks for restartable processing, which is where the real risk lies. It is quite conceivable that some jobs may take more than a minute to execute. A job that is expected to take a long time to complete would probably be better set up as a dedicated CLI command with its own system cron task rather than running as a job, to avoid clogging the job queue with slow processes.

Either way, my updated CLI Job Runner addon for XF 2.2 will have support for lock files to prevent multiple job runners executing at the same time - which then allows us to set a long maximum execution time (longer than 1 minute) to minimise wasted time.
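A rough sketch of the lock file idea, using PHP's built-in `flock()` with a non-blocking exclusive lock. The helper names and the lock path are hypothetical and this is not the addon's actual code; it also assumes a platform such as Linux where `flock()` locks belong to the open file handle:

```php
<?php
/**
 * Hypothetical helper: try to take an exclusive, non-blocking lock on a file.
 * Returns the open handle on success (keep it open while working),
 * or null if another runner already holds the lock.
 */
function acquireRunnerLock(string $lockPath)
{
    $handle = fopen($lockPath, 'c'); // create if missing, do not truncate
    if ($handle === false)
    {
        return null;
    }

    // LOCK_NB makes flock() return immediately instead of waiting
    if (!flock($handle, LOCK_EX | LOCK_NB))
    {
        fclose($handle);
        return null;
    }

    return $handle;
}

/** Hypothetical helper: release the lock and close the handle. */
function releaseRunnerLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

$lockPath = sys_get_temp_dir() . '/job-runner-example.lock'; // hypothetical path

$lock = acquireRunnerLock($lockPath);
if ($lock === null)
{
    // another runner is still active: do nothing and let it finish
    echo "Another job runner is active, exiting.\n";
}
else
{
    // ... run the job queue here, with a generous time limit ...
    releaseRunnerLock($lock);
}
```

Because the operating system drops the lock when the process exits, a crashed runner should not leave a stale lock behind the way a bare "file exists" check would.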
 
There are other ways of dealing with jobs that run longer than expected. For example, you could move the job to a tool like supervisord. With such a tool, you wouldn't have to introduce a time limit (which isn't ideal, as stated by @Kirby).
 
Kind of missing the point here. This is a bug report about the code in the core product, based on how it is intended to be used: triggered by a unix cron task.

Supervisord is well beyond the scope of what this feature of the product is intended to do.
 
Thank you for reporting this issue, it has now been resolved. We are aiming to include any changes that have been made in a future XF release (2.2.0 Beta 5).

Change log:
Add a max execution time option to the CLI job runner, defaulting to 55 seconds
There may be a delay before changes are rolled out to the XenForo Community.
 