Sunday, June 29, 2008

What I've learned about programming, part 3

I find that I write batch processing systems more often than I'd expect.

The most common reason is that I have a large problem whose input naturally comes from some non-interactive source: the system reads it, crunches on it, and stores the results back somewhere. But sometimes I'm building an interactive application and want to decompose it into an interactive portion and a batch portion.

For example, a web app might need to do a time-consuming set of computations. It may make the application easier to write if the UI front end submits a job to the batch portion of the system, waits for the result to appear, and then generates a response page for the user. There are some security advantages to this: if you get hit with 1000 requests to do the expensive operation, the batch processing system is a natural place to start throttling them. Also, the batch processing computers can talk to databases, while the web servers that are exposed to the internet can be prevented from connecting to databases directly.
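Here is a rough sketch of the throttling idea, assuming the front end and the batch side share a bounded in-process queue. The names JobQueue, submit, and next_job are made up for illustration; a real deployment would more likely put a database table or message broker between the two tiers.

```python
import queue


class JobQueue:
    """Hypothetical bounded job queue sitting between the web tier and the batch tier."""

    def __init__(self, max_pending):
        # At most max_pending jobs may be waiting at any one time.
        self._jobs = queue.Queue(maxsize=max_pending)

    def submit(self, job):
        """Called by the front end. Returns False instead of blocking when the queue is full."""
        try:
            self._jobs.put_nowait(job)
            return True
        except queue.Full:
            return False

    def next_job(self):
        """Called by a batch worker. Blocks until a job is available."""
        return self._jobs.get()
```

When submit returns False, the front end can show a "try again later" page instead of piling more expensive work onto the system.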

One thing I've learned is to try to make the batch processing application always make small, incremental progress. The ideal is that for every N seconds the batch processing system runs, it gets k * N percent closer to being done. This way, if your system runs for 24 hours and then crashes, you still have something to show for it. Breaking your tasks down into small pieces of work that can be completed and restarted without having to start from the beginning is very helpful.
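One way to get that property is to record each finished item in a checkpoint file, so a restarted run skips whatever is already done. The following is a minimal sketch under my own assumptions: the function names, the JSON checkpoint format, and the idea that each item is a hashable id (such as a string) are all illustrative, not taken from any particular system.

```python
import json
import os
import tempfile


def load_done(checkpoint_path):
    """Return the set of item ids completed by earlier runs (empty if no checkpoint)."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path) as f:
        return set(json.load(f))


def save_done(checkpoint_path, done):
    """Write to a temp file, then rename, so a crash never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def run_batch(items, process, checkpoint_path):
    """Process each item once, recording progress after every item."""
    done = load_done(checkpoint_path)
    for item in items:
        if item in done:
            continue  # finished in a previous run
        process(item)
        done.add(item)
        save_done(checkpoint_path, done)
    return done
```

If process crashes partway through an item, that item is simply retried on the next run, so process should be safe to repeat. Rewriting the whole checkpoint after every item is fine for a sketch; with millions of items, an append-only log or a database row per item would be a better fit.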

For example, suppose our batch application takes as input a list of images to download, virus scan, resize, and then store in a database. Depending on the size of the images and how long it takes to download them, you might consider having one thread/process downloading images, another virus scanning the downloaded images, and a final thread resizing them. If the downloader thread crashes, it can restart where it left off, and the other threads wouldn't be affected unless they ran out of images to process.
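The shape of that pipeline can be sketched with one thread per stage and a queue between each pair of stages. Everything here is an assumption for illustration: the download, scan, and resize callables are stand-ins supplied by the caller, and the queues live in memory, so this shows the decoupling between stages but not crash recovery. To actually restart where a stage left off, the queues would need to be durable (a directory of files or a database table, say).

```python
import queue
import threading

STOP = object()  # sentinel meaning "no more work is coming"


def stage(work, inbox, outbox):
    """Pull items from inbox, apply work to each, and push the results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            return
        outbox.put(work(item))


def run_pipeline(urls, download, scan, resize):
    """Run the three stages concurrently and collect the resized results in order."""
    q_urls, q_downloaded, q_scanned, q_resized = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(download, q_urls, q_downloaded)),
        threading.Thread(target=stage, args=(scan, q_downloaded, q_scanned)),
        threading.Thread(target=stage, args=(resize, q_scanned, q_resized)),
    ]
    for t in threads:
        t.start()
    for url in urls:
        q_urls.put(url)
    q_urls.put(STOP)

    results = []
    while True:
        item = q_resized.get()
        if item is STOP:
            break
        results.append(item)  # a real system would store into the database here
    for t in threads:
        t.join()
    return results
```

Each stage only knows about its inbox and outbox, so a slow downloader just means the scanner and resizer sit idle for a while; they don't need to know why.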

Tuesday, June 17, 2008

What I've learned about programming, part 2

This may sound like me arguing for mom and apple pie, but this post talks about various ways well-defined software interfaces make programming easier. Everyone normally thinks of interfaces as the specification for how one piece of software can invoke another. The usual reasons people care about interfaces are hiding implementation details, dividing up software development between multiple people, and allowing new implementations to be supplied later. But really, there are lots of other ways that good interfaces make complicated systems easier to build.

For example, the well-defined interface to the processor via its instruction set allows virtual machines to be built on top of real machines. That processor instruction set might itself be implemented on top of sub-instructions such as microcode or so-called micro-ops (µops) on current x86 chips. A virtual machine is a good example of how a well-specified interface allows encapsulation: one piece of software enveloping another in ways that might not have been anticipated by the original software's authors.

Proxies are another example of using well-defined interfaces to transparently insert functionality between existing pieces of software. The proxy may be a network proxy that receives an incoming HTTP request and processes it (checks for security threats, does load balancing, terminates SSL connections, etc.).

The proxy may also be one piece of software wrapping a standard piece of software to add new functionality: for example, taking a database connection factory and wrapping it with software that transparently caches connections in a connection pool. The application that used the old connection factory directly would have no idea that it's now getting connections from an underlying connection pool.
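A bare-bones sketch of that wrapping, assuming the original factory exposes a connect() method and its connections expose close(). Those two method names, and the classes PoolingFactory and PooledConnection, are hypothetical and not tied to any real database driver.

```python
class PooledConnection:
    """Looks like a real connection, but close() returns it to the pool."""

    def __init__(self, real, idle):
        self._real = real
        self._idle = idle

    def __getattr__(self, name):
        # Anything we don't define ourselves is forwarded to the real connection.
        return getattr(self._real, name)

    def close(self):
        # Keep the underlying connection open for reuse instead of closing it.
        self._idle.append(self._real)


class PoolingFactory:
    """Exposes the same connect() interface as the factory it wraps."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []  # real connections waiting to be reused

    def connect(self):
        real = self._idle.pop() if self._idle else self._factory.connect()
        return PooledConnection(real, self._idle)
```

Because PoolingFactory has the same connect() interface as the thing it wraps, the application can be handed one in place of the other without changing any of its code. The sketch is not thread-safe and never caps or expires pooled connections, which a real pool would need to do.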

Finally, language compilation is really just translating from one interface to another (or to the same one, in the case of source-to-source translation). It is only possible to translate between the languages because the source and target languages have well-specified interfaces.