We had the typical LAMP setup going, with Drupal as the base CMS and APC as the bytecode cache. We needed a good caching engine, so I figured, why not use APC's user cache? We tried the APC Cache Drupal module which, with minor fixes, proved to work very nicely. That is, until we actually put the whole thing into production.
The first symptom was Apache hanging and not responding to any user requests. We suspected network issues, especially since netstat -na showed that all the Apache processes were hanging on SYN_WAIT. However, since an Apache restart solved the issue, I started to suspect this was something else.
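For anyone who wants to repeat that first check, here is a minimal sketch of tallying connection states from `netstat -na`-style text. The function name `count_tcp_states` and the sample rows are made up for illustration (they are not output from our machines), and the sketch assumes the usual Linux column layout with the state in the sixth column.

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Tally the state column of TCP rows in `netstat -na`-style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local addr, foreign addr, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# Hypothetical sample input, invented for this example.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          SYN_RECV
tcp        0      0 10.0.0.5:80             10.0.0.9:51235          SYN_RECV
tcp        0      0 10.0.0.5:80             10.0.0.7:40022          ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

state_counts = count_tcp_states(SAMPLE)
```

If one state dominates the tally while the server is unresponsive, that is a hint, but as it turned out for us, not proof of a network problem.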
To make a long (very long) story short, I ran strace on our prod machines and found that Apache was either hanging on futex_lock(....FUTEX_WAIT...) or looping infinitely over the same functions.
To make an even longer story short, I got gdb installed on those machines, and the backtrace clearly indicated that the locks came from APC user-cache calls.
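The strace step can be roughly sketched as a small filter over captured output (for example from `strace -f -p <pid>`), counting which processes keep entering a futex wait. The helper `futex_waiters` and the sample lines are hypothetical. The sketch assumes strace's usual `[pid N] syscall(...)` line format, and it does not reproduce what our traces actually looked like.

```python
import re
from collections import Counter

# Syscall name at the start of a strace line, with an optional "[pid N] "
# prefix (printed when strace follows more than one process).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+(\d+)\]\s+)?(\w+)\(")

def futex_waiters(strace_output: str) -> Counter:
    """Count futex wait calls per pid ("?" when the line has no pid prefix)."""
    waits = Counter()
    for line in strace_output.splitlines():
        m = SYSCALL_RE.match(line)
        # The substring test also catches variants such as FUTEX_WAIT_PRIVATE.
        if m and m.group(2) == "futex" and "FUTEX_WAIT" in line:
            waits[m.group(1) or "?"] += 1
    return waits

# Hypothetical sample input, invented for this example.
SAMPLE_TRACE = """\
[pid  4321] futex(0x7f3a1c0010a0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid  4322] poll([{fd=12, events=POLLIN}], 1, 300000 <unfinished ...>
[pid  4323] futex(0x7f3a1c0010a0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid  4321] <... futex resumed> ) = 0
[pid  4321] futex(0x7f3a1c0010a0, FUTEX_WAIT, 2, NULL <unfinished ...>
"""

waiters = futex_waiters(SAMPLE_TRACE)
```

A tally like this only tells you where the processes are stuck. Finding out who holds the lock still takes a debugger backtrace, which is what gdb gave us.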
We decided to abandon the APC user cache and switch to memcached, which proved faster and had fewer lockups.
The funny thing is that when we talked about this over dinner the same evening, a developer from another team pointed me to this article by one of the APC leaders (or something): How to Dismantle an APC Bomb, which has been around for over a year. I am surprised and shocked that such information is hidden so well and not mentioned anywhere in the docs. Moreover, I went through the APC code again after reading this post (I went through it once when I started analysing the problem) and it seems that this is not even close to being resolved. There are no patches, no TODOs, nothing of the sort. From reading the code, the entire user cache needs a major rewrite. What gives?
(This is a post I wrote a couple of months ago and never had time to finish. Unfortunately this is still not fixed, AFAIK.)
EDIT (07/2009):
http://pecl.php.net/bugs/bug.php?id=15179 reports this to be fixed. If anyone can confirm this, please send me a note so that I can update this post.