Tuesday, December 15, 2009

Protecting user data from SQL injection theft

TechCrunch has an article this morning about RockYou's loss of 32 million user accounts and their email and passwords.

The last thing a startup wants to get press for is how they lost all their user's personal information. Unfortunately, it's also nearly impossible to 100% protect this information. Instead, you need to work out a set of reasonable precautions such as:

1) always encrypt sensative information like usernames, email addresses and passwords in your database
2) always use some standard library for sanitizing user-input before it's used in SQL queries
3) run daily vulnerability scans from nessus or something like it.
4) do the normal system security stuff -- don't run unnecessary services that could be exploitable, use good passwords, etc

Encrypting your database content is nice since it also protects backups of the data, the data while it's being replicated, etc. This unfortunately usually means storing the decryption keys somewhere in the filesystem or in the app itself, all of which is readable by Apache. In theory an attacker could get them, but at this point your system is probably so compromised that nothing is going to save you.

Make sure that wherever you store the keys that they are not servable by Apache so store them (and ideally all of your source) outside your htdocs directory. Just to be clear, I'm talking about decryption keys for encrypted data like email addresses. Passwords should always be encrypted by a salted one-way hash that is not reversible. Something simple like DES is perfectly fine for encrypting email and passwords. SHA is a good non-reversible password hash. Ideally your app framework has standard code to do this for you so that you're not reinventing the wheel.

Next, avoid programatically building SQL queries based on user input. That is, don't do stuff like:

query = "select * from foo where username = '" + username_from_user + "'"

The classic SQL injection attack would then be for a user to enter a username like:

'; select * from foo where username not = '

Which ends the query you intended to run and instead runs the user's own query to list every record in the table.

Instead use whatever variable substitution system your db server or client provide. For example, in Python's MySQLdb you can template in user-supplied variables like so:

cursor.execute("select * from foo where username = %s", username_from_user)

MySQLdb then will do any escaping and quoting of the user supplied input to make sure the user can't inject their own SQL into the query.

Additionally, you should always validate user supplied input against some sort of regular expression. For example, in the example above you should make sure to match the variable username_from_user against a regex that makes sure it only consists of alphanumeric characters.

Finally, it doesn't hurt to run nessus against your system every day. I've never had it actually find a real vulnerability, but it's a cheap, free and easy way to make sure you haven't done something really stupid.

Tuesday, December 8, 2009

How to survive being slashdotted

There's nothing like the pride of having your content featured on slashdot, digg, reddit etc turn into the depression of watching your servers melt down under all the load. Here are some tips to make sure that doesn't happen.

First, the basics:

* if using Apache, use worker MPM with MaxClients and ServerLimit as high as you have RAM for. Better to cap max # of threads and reject some users than let the whole machine go down. Need to benchmark to figure out how high you can go before your machine dies.

* turn off keep-alive for your dynamic content servers

* benchmark your website ahead of time with realistic workloads simulating what real users will do in terms of page flow, click rate, ajax, data values they enter into forms etc.

* run explain on queries to look for queries hitting every row in your db. Looks at MySQL's slow query log!

* follow Yahoo's advice on static content (javascript minification, reduce image sizes, serve static content from several domains, etc).

* make sure you have enough OS file handles (/etc/security/limits.conf on Redhat), MySQL connections (max_connections = 5000), Tomcat threads etc.

* turn off DNS reverse lookups for Apache logging (HostnameLookups Off)

If you think a particular page or set of pages could be slashdotted, consider changing your code to serve the page out of memcache. Alternatively, generate a static version of the page and use mod_rewrite to transparently serve it. Wordpress can do this via the super-cache plugin. If your need to update these pages, use a backend process that periodically re-generates the cached or static version of the page every few minutes.

If all else fails...send traffic from slashdot to a static page asking them to try again later. This allows your normal site visitors to keep using the site, while gracefully shedding the slashdot load.

RewriteCond %{HTTP_REFERER} ^http://slashdot.org [NC]
RewriteCond %{REQUEST_URI} !^/media/slashdot.html$
RewriteRule (.*) http://www.foo.com/media/slashdot.html [NE,R=301,L]

Saturday, December 5, 2009

Measuring how fast your site really is

Google Webmaster Tools recently added a new performance section that shows the average load time for your site as reported by Google Toolbar users. This is a great addition of really unique data. Look for it in the labs section.

It's hard to measure this any other way since it's hard to predict how much bandwidth your users have, how fast DNS resolves, whether their browser cache has any content from your site already, etc etc etc.