The first problem we ran into was Apache hanging and not responding to any user requests. We suspected network issues, especially since
netstat -na showed that all the Apache processes were hanging on
SYN_WAIT. However, since restarting Apache solved the issue, I started to suspect something else.
To make a long (very long) story short, I got strace onto our production machines and found that Apache was either hanging on
futex(... FUTEX_WAIT ...) or looping infinitely over the same functions.
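For anyone repeating this kind of diagnosis, here is a minimal Python sketch that summarizes strace output. It is a hypothetical helper and not something from the original investigation. It assumes the trace was captured with strace -f, so that every line starts with a PID ("12345 futex(..." when written with -o, or "[pid 12345] futex(..." on stderr). It also assumes no timestamp options such as -t were used. The sketch counts FUTEX_WAIT calls per process and lists the processes whose most recent event is a FUTEX_WAIT that never returned.

```python
import re
from collections import Counter

# Hypothetical helper, not part of strace. Assumes `strace -f` output where each
# line is prefixed with "12345 " (with -o FILE) or "[pid 12345] " (on stderr),
# and no timestamp options (-t, -tt, -r) were used.
CALL_RE = re.compile(r"^(?:\[pid\s+)?(\d+)\]?\s+(\w+)\((.*)$")
RESUMED_RE = re.compile(r"^(?:\[pid\s+)?(\d+)\]?\s+<\.\.\. (\w+) resumed>")


def futex_wait_counts(lines):
    """Count futex() calls with a FUTEX_WAIT* operation, per PID."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m:
            pid, syscall, args = m.groups()
            # "FUTEX_WAIT" also matches FUTEX_WAIT_PRIVATE and FUTEX_WAIT_BITSET.
            if syscall == "futex" and "FUTEX_WAIT" in args:
                counts[int(pid)] += 1
    return counts


def stuck_in_futex(lines):
    """Return sorted PIDs whose last traced event is a FUTEX_WAIT that never returned."""
    last = {}  # pid -> True if its latest event is an unfinished FUTEX_WAIT
    for line in lines:
        line = line.strip()
        r = RESUMED_RE.match(line)
        if r:
            # The pending call returned, so this PID is no longer blocked in it.
            last[int(r.group(1))] = False
            continue
        m = CALL_RE.match(line)
        if m:
            pid, syscall, args = m.groups()
            last[int(pid)] = (
                syscall == "futex" and "FUTEX_WAIT" in args and "<unfinished" in args
            )
    return sorted(pid for pid, blocked in last.items() if blocked)
```

For example, you could attach with something like strace -f -o trace.log -p <pid> (trace.log is just an illustrative name) and then call stuck_in_futex(open("trace.log")). If most Apache children come back as stuck, that suggests lock contention rather than a network problem. It only suggests it, though. You still need a tool like gdb to see which lock is involved.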
To make an even longer story short, I got gdb installed on those machines, and the backtraces clearly showed that the locks came from APC user-cache calls.
We decided to abandon the APC user cache and switch to memcached, which proved faster and had fewer lockups.
The funny thing is that when we talked about this over dinner the same evening, a developer from another team pointed me to an article by one of the APC leads (or something like that), "How to Dismantle an APC Bomb", which has been around for over a year. I am surprised and shocked that such information is hidden so well and not mentioned anywhere in the docs. Moreover, after reading that post I went through the APC code again (I had gone through it once when I started analysing the problem), and it seems that this is not even close to being resolved: there are no patches, no TODOs, nothing of the sort. From reading the code, the entire user cache needs a major rewrite. What gives?
(This is a post I wrote a couple of months ago and never had time to finish.)
EDIT (07/2009): http://pecl.php.net/bugs/bug.php?id=15179 reports this as fixed. If anyone can confirm this, please send me a note so that I can update this post.