Tech Blog :: Workaround to variables cache bug in Drupal 6


May 18 '11 11:48pm
Tags

Workaround to variables cache bug in Drupal 6

I run my Drupal crons with Drush and Jenkins, and have been running into a race condition frequently where it tells me, Attempting to re-run cron while it is already running, and fails to run.

That error is triggered when the cron_semaphore variable is found. It's set when cron starts, and is deleted when cron ends, so if it's still there, cron is presumably still running. Except it wasn't really - the logs show the previous crons ended successfully.

I dug into it a little further: drush vget cron_semaphore brought up the timestamp value of the last cron, like it was still set. But querying the `variables` table directly for cron_semaphore brought up nothing! That tipped me off to the problem - it was caching the variables array for too long.

Searching the issue brought up a bunch of posts acknowledging the issue, and suggesting that people clear their whole cache to fix it. I care about performance on the site in question, so clearing the whole cache every 15 minutes to run cron is not an option.

The underlying solution to the problem is very complex, and the subject of several ongoing Drupal.org threads:

Following Drupal core dev policy now (which is foolish IMHO), if this bug is resolved, it has to be resolved first in 8.x (which won't be released for another 2-3 years), then 7.x, then 6.x. So waiting for that to work for my D6 site in production isn't feasible.

As a stopgap, I have Jenkins clear only the 'variables' cache entry before running cron:
drush php-eval "cache_clear_all('variables', 'cache');"

That seems to fix the immediate problem of cron not running. It's not ideal, but at least it doesn't clear the entire site cache every 15 minutes.

That's strange... variable_set calls cache_clear_all('variables', 'cache'); so the variables cache should be set properly when cron clears the semaphore. Is there any chance that cron isn't finishing?

Yes, it's strange. Cron very explicitly says that it completed successfully, and the database has no semaphore value. It's only in the cache. Maybe it's a memcache bug, but the fact that the threads I linked to don't specifically mention Memcache makes me think it's not specific to that.

I'm working on this problem once and for all. It's quite a complex mess of problems. See post #58 at http://drupal.org/node/973436 for my second attempt at solving it. post #57 for my first attempt.

It looks like there are more deeper problems - cache_get() is "buggy" (it never respects 'expire' time, even if you set cache_lifetime). It doesn't help when the code is written with an if nest in mind.

the only time when expire time is used is when it is time to flush old cache based on cache_lifetime... which doesn't make sense, for this reason:
expire == "now" + cache time.
but the DELETE part of the code is based on deleting expire + cache_lifetime == "now" + cache time + cache_lifetime, where cache_lifetime is 25-100 times larger than "cache time".

No wonder caching has never been solved... it's a complete mess.

And of course... "expire" is used to check if it is matched against CACHE_PERMANENT (== 0), but we already knew that use-case.