Thursday, November 20, 2008

drupal memcache items disappearing from the cache

The scenario: you are using memcached as your Drupal cache system. You do a cache_set for some item. You do a cache_get for this item. The item is there. You do a cache_get again after a few seconds/minutes, the item is not there.

You might be experiencing this issue of the memcache module: In short it means that the memcache module flushes the entire cache cluster whenever someone does a wildcard cache_clear_all. Since it's very likely that all your bins are in a single cluster you get to a situation that whenever someone-on-the-other-side-of-the-world does something that is probably completely unrelated to you, your item will still get flushed from the cache.

Note that since flush does not activly removes the items from the cache but rather marks them as expired you will not see any memcached stats that will hint to this issue.

I have commented on the issue. Here are some highlights:
...we are currently using gaolei's patch which is fast albeit expensive. We are using it on a large production system that depends heavily on memcache and we did not see any problems yet. (thank you gaolei).

We have tried to implement a lock-add-unlock scenario such as was suggested in several comments but this will be a definite nightmare for high-traffic/high-update sites...

I would suggest letting the user choose which flush mechanism to use: the current one, or the salt one. I see no reason why the module's developers should decide for me. On a site with little updates and little memory i would prefer to have the whole memcache flush; On a site with many updates and tons of memory i'm willing to sacrifice space for the sake good performance. a simple memcache_flush_method variable would do the job just fine.

Monday, November 10, 2008

MYSQL Query Optimization: Avoiding ORs

So, you've probably heard that using ORs in your queries is heavy and should be avoided when possible. Here is a live example. We had the following query running on a medium sized table in MYSQL (around 200K rows):

SELECT count(*) FROM interactions p
(p.employer = 69 AND p.flag1 = 1)
(p.employee = 69 AND p.flag2 = 0)

Let's think of interactions as a table holding the interaction between employers and employees, and this query should tell the user the total number of interactions where user number 69 is involved.

The query looked pretty innoccent on an idle MYSQL server taking around 200ms, but when the server got busy the query execution time was reaching 5-6 seconds.

Altough we did have indexes on interactions.employer and interactions. employee doing EXPLAIN showed that MYSQL was not using them.

After some digging and fiddling and trying we ended up with:

SELECT count(*) as count FROM (
SELECT 1 FROM interactions p
WHERE (p.employer = 69 AND p.flag1 = 1)
SELECT 1 FROM privatemsg p
WHERE (p.employee = 69 AND p.flag2 = 0)
) t2

This query took on the busy server less then 250 milliseconds and less then 80ms on the idle server. Doing EXPLAIN showed that MYSQL was doing two queries, and using the correct index on each one.

After some more digging we noticed that if we EXPLAIN the original query on the idle server the optimizer occasionally converts it to a UNION, but for some reason this did not always happen and in any case took about twice the time then the UNION query.

To sum up the results here is a simple table

Query typeServer activityQuery Time
Using ORIdle200ms
Using ORBusy5500ms
Using UNIONIdle78ms
Using UNIONBusy189ms

Conclusion: always consider alternatives to OR, but make sure you check them well against real-time examples.

A Further Note: When timing queries in MYSQL alway use the optimizer hint /*! SQL_NO_CACHE */ to make sure you are not getting results from the query cache (if you have one set up).