I've spent the last week learning about MySQL locks and in particular deadlocks. Since the existing MySQL documentation isn't as clear on all of this as I would have hoped, I've written a summary of what I've learned.
First, a brief aside on how MySQL locking works. The way MySQL guarantees that when you commit a transaction that no other transaction has made conflicting changes is through record and table level locks. This prevents two transactions that try to update the same things from both being able to commit. One transaction gets the locks first, commits first and then the second one gets the locks and is able to commit and thus the two transactions have been serialized.
MySQL implements row-level locks on entries in indices and the gaps between entries in the index. Every time an insert, update or delete statement references an entry in an index, that index entry (and maybe the gap before and after it) is locked. Inserting with an auto-increment value also requires a table-wide lock on the auto-increment value. Finally, inserting a record requires a lock on any parent records that are referenced by foreign keys in the newly inserted record.
The bit about locking gaps is kind of confusing. Any time an update/delete/insert statement references a range of records ("delete from foo" or "update where age > 10" vs an exact value "update where age = 43") MySQL needs to make sure another transaction doesn't concurrently insert records into the range being operated on. MySQL does this by locking the gaps in the index between the entries that the insert/delete/update statement modifies. This prevents transaction 1 from running "insert blah into foo" while transaction 2 is running "delete from foo".
Ok, back to deadlocks. The only sure-fire way to prevent deadlocks is to have all transactions acquire locks in the same order. This prevents transaction 1 from holding lock A while waiting for lock B and transaction 2 from holding lock B and waiting for lock A. The general ordering I've come up with is:
1) Parent rows that are referenced by foreign keys in newly inserted rows (acquire locks in alpha order of parent table name if multiple parent tables)
2) Auto-increment locks
3) Locks on specific records (acquire locks in ascending primary key value)
4) Gaps between records
Additionally, you can make your life easier if you:
1) use fewer auto-increment values -- only have an id column if we really need an id column. The fewer auto-inc columns, the fewer auto-inc locks you have to worry about.
2) use fewer foreign keys -- only between the really critical tables like items, topics, propositions, questions, responses and auth_user maybe. The fewer foreign keys you have the fewer locks that have to be taken on parent table rows.
3) absolutely positively make sure any update/delete statement is using an index since otherwise it has to lock every record in the table
Some concrete examples (in python):
bad:
sql = "update foo set blah = baz where id in (%s)" % ',".join(ids)
good:
sql = "update foo set blah = baz where in in (%s)" % ',".join(sorted(ids))
why?
make sure all transactions lock rows in the same order
bad:
table A has a foreign key on field parent_id to table B's id field
insert into A (parent_id) values (1)
update B set blah = baz where id = 1
good:
update B set blah = baz where id = 1
insert into A (parent_id) values (1)
why?
Inserting a record into A requires a shared lock of the parent row in B. The subsequent update statement then tries to upgrade this lock to an exclusive lock. If another transaction/thread is also running this same code both may first get a shared lock on the same parent row in B and then deadlock waiting for the other thread to give up its shared lock so that the update statement can get an exclusive lock.
bad:
table foo has an auto-inc colum
delete from foo;
insert ... into foo;
good:
delete from foo where id = records you don't want to keep
insert into foo new records you want to create
why:
The delete will lock every gap between all existing records to make sure new records aren't inserted after the delete runs but before the transaction completes. However, if a second transaction tries to insert into table foo at the same time, the second transaction's insert may get the auto-inc lock and then wait for the gap lock created by the delete. But the insert after the delete in the original transaction will try to get the auto-inc lock and thus deadlock.
Saturday, December 20, 2008
blog comments powered by Disqus
Subscribe to:
Post Comments (Atom)