Wednesday, September 2, 2009

What I don't like about mysql and memcache

I love mysql and memcache, don't get me wrong. But I find a lot of times they just don't match up well with how I want to build software.

1) I have some graph of objects I want to store into the database. Creating a table for every relationship and reading/writing it via SQL is tedious. Memcache on the other hand make this very easy, but only by serializing it and thus loosing any understanding of what the data means so the data cannot be operated on within memcache.

2) I have some one-off piece of data that I want to compute in a back-end process and have my website read. I could store it in memcache, but it's not persistent and I don't want to loose this data. I could write it into mysql but then I have a table with one record with a single field which feels kind of silly. I could write it into files, but I'd like to use the replication infrastructure I have with mysql to get this value to all my data centers, back up servers, dev machines, qa machines etc. I could write it into memcache, but I want it be persistent.

3) I frequently want to use message queues without the hassle of setting up message queing software like ActiveMQ. I don't want yet another server to monitor, yet another complicated configuration file to support and yet another app to understand. So I end up emulating message queues in memcache or mysql, but neither make it a natural thing to do.

4) It's annoying to keep memcache and mysql in sync with each other. If you're caching underlying database result sets in memcache, then you have problems of inconsistencies between the objects that came out of the database vs the objects that came out of memcache. If you're caching higher-level more granular data (like the result of some internal API call) then you may get more cache misses since two different parts of the system might be caching different objects even though they use the same underlying db records.

5) I've learned to hate transactions as programming model. It is impossible to develop modular software if you have to have acquire all locks in a fixed global order to avoid dead locks. Further, one rogue application that holds some critical lock too long can completely kill your web application. Transactions are also nearly impossible to replicate reliably and are impossible to reconcile in a distributed environment where data centers come up and go down periodically.

The alternative to locks and transactions is to write lock free code that potentially leaves things in a messy state if it crashes or fails to clean up things properly. I'd rather have a back end process run periodically over my data and clean it up. Obviously in the classic "bank account" kind of application this would be unacceptable, but for many applications this is fine.

6) Stored procedures are nice but always second class citizens in databases. I'd like them to be upgraded to first class citizens in terms of the performance I get, the ease of writing, being able to use modern languages etc.

7) Multi-master replication for fault-tolerance is hard. Statement and row based replication are too low-level. Instead of replicating a bunch of statements between locations, I'd like to replicate a higher level operation like "try to make a user with this username and this user id". If there was lag between the two masters and somehow the same user created two different accounts on both masters with the same username but with different user ids, I'd like to have custom reconciliation logic that that might rename one of the users.
blog comments powered by Disqus