Tuesday, January 27, 2009

slow lstat, slow php, slow drupal

If you are short on time go to bottom line.

Ok, the lstat() and stat() system calls system call are not really slow per se. But take a look at an apache strace -r dump from a drupal installation I was testing:

[user@host ~]# strace -o strace.load -r -s 256 -p <PID>
...
0.000081 getcwd("/mnt/var/www/html/drupal", 4096) = 32
0.000048 lstat("/mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.000805 lstat("/mnt/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.001596 lstat("/mnt/var/www", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.000104 lstat("/mnt/var/www/html", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
0.000362 lstat("/mnt/var/www/html/drupal", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
0.000804 lstat("/mnt/var/www/html/drupal/sites", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.057575 lstat("/mnt/var/www/html/drupal/sites/all", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.000297 lstat("/mnt/var/www/html/drupal/sites/all/modules", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
0.014762 lstat("/mnt/var/www/html/drupal/sites/all/modules/Internet", {st_mode=S_IFDIR|0755, st_size=4096,
0.000214 lstat("/mnt/var/www/html/drupal/sites/all/modules/Internet/date", {st_mode=S_IFDIR|0755, st_size=4
0.000112 lstat("/mnt/var/www/html/drupal/sites/all/modules/Internet/date/date.module", {st_mode=S_IFREG|064
0.010816 open("/mnt/var/www/html/drupal/sites/all/modules/Internet/date/date.module", O_RDONLY) = 20
0.000835 fstat(20, {st_mode=S_IFREG|0644, st_size=44118, ...}) = 0
0.000069 lseek(20, 0, SEEK_CUR) = 0
0.000063 stat("././sites/all/modules/Internet/date/date.module", {st_mode=S_IFREG|0644, st_size=44118, ...}) = 0
0.000159 close(20) = 0
...

As you can see, for every file included in the system there are around 10-15 lstat calls. this means that if we have ~200 files included we have ~2500 lstat calls per page. A simple sum on the strace results gave me ~8 seconds per page of lstat time, which was over a third of my test page time. see?

[user@host ~]# cat strace.single | awk '{printf " " $1 "\n " $2}' |sed 1d | sed \$d |awk '{printf $2 "+"}' | sed s/.$/\\n/ | bc
21.204960
[user@host ~]# cat strace.single | awk '{printf " " $1 "\n " $2}' |sed 1d | sed \$d | fgrep lstat | awk '{printf $2 "+"}' | sed s/.$/\\n/ | bc
8.361994
if you really want to understand this shell script let me know. It deserves a post by itself

Now we have 3 options: a) make lstat() faster. b) make php stop doing all those lstats(). c) make drupal include less files.

Well, (a) is not an option as our infrastructure team (aka sysadmins) insist there is nothing wrong with the filesystem and nothing can be done to enhance it's performance. Since I gave up being a sysadmin when it was still call 'sysadmin' it would be hard for me to prove them wrong (especially since they might be are right).

As (c) would require us to refactor most of our code in a very very 'not drupal way' and create very large php files, I was hoping to find a better solution.

By a rare chance of good fortune i came to learn about realpath_cache_size. Once I changed the foolish default to a reasonable 1M in php.ini, I lost most of the lstat() calls and over 10 seconds of processing time.

[user@host ~]# cat strace.2.single | awk '{printf " " $1 "\n " $2}' |sed 1d | sed \$d |awk '{printf $2 "+"}' | sed s/.$/\\n/ | bc
9.752309
[user@host ~]# cat strace.2.single | awk '{printf " " $1 "\n " $2}' |sed 1d | sed \$d | fgrep lstat | awk '{printf $2 "+"}' | sed s/.$/\\n/ | bc
.040935

Here is what to do in php.ini

; ...php.ini...
; Determines the size of the realpath cache to be used by PHP. This value should
; be increased on systems where PHP opens many files to reflect the quantity of
; the file operations performed.
realpath_cache_size=1M

; Duration of time, in seconds for which to cache realpath information for a given
; file or directory. For systems with rarely changing files, consider increasing this
; value.
realpath_cache_ttl=300


For normal pages under normal load I was able to get a 10%-25% performance improvment. I think that's quite a bit for a configuration change.

7 comments:

  1. Works like a charm.

    You just put an end to a whole night of troubleshooting for me.

    ReplyDelete
  2. 1M is huge (IIRC the cache is per PHP process, it's not shared - so it is a lot!) and most probably you need less. I suggest experimenting with it a bit, maybe start with 200k or so, and see where you get.

    Anyway glad I could help ;)

    Shahar.

    ReplyDelete
  3. I just ran into the same problem. Changing realpath_cache_size to 1M had no effect though. Seems as open_basedir checks do not use the realpath_cache... after kicking open_basedir, the performance nearly skyrocked... sys cpu load dropped from 40% to 15% (with usr cpu still at 50%), lstat count dropped from 15k to 5 (five!!!) per request

    thanks for the post :)

    ReplyDelete
  4. Just found this php bugreport (tagged as wontfix): http://bugs.php.net/bug.php?id=46965

    realpath cache is disabled when open_basedir or safe_mode are in use... good to know ;)

    ReplyDelete
  5. Uwe, Good to know.
    Shahar, i'll try to lower it and see what happens. Toda.

    ReplyDelete
  6. very nice article :)

    thanks a million !

    ReplyDelete
  7. Great information shared nice article drupal

    ReplyDelete

[Due to much spam, comments are now moderated and will be posted after review]