Wednesday, October 15, 2008

The Forgotten Half of Computer Literacy

Gruber today noted that the New York Times is adding an API to their campaign finance data (bravo!). He also pointed to an article from 2006 titled "The programmer as journalist: a Q&A with Adrian Holovaty".

The interview is pretty concise and lays out how and why journalism can better be accomplished with help from software. Holovaty breaks down the journalistic process of collecting, filtering and disseminating information and how computer automation can be applied to each step, leading to better journalism. Near the end, he argues that all journalists should have at least some experience with programming, if for no other reason than just to know what is possible.

His points are persuasive, but they're also applicable to almost any field - not just journalism. I believe that everyone should at least know how to write software in the same way that everyone should be able to write in general.

Literacy describes the ability to read and write. When we talk about computer literacy, we usually think of the ability to use software - which I would liken to the "reading" part. We often ignore the "writing" part - the ability to make software. But writing software is incredibly powerful. If you write a page-turning novel, it inspires the imagination of everyone who reads it. If you write an informative article, all its readers are enlightened. Likewise, great software entertains and/or empowers every person who makes use of it.

The analogy holds up pretty well. No - it holds up too well - it's not an analogy at all! Writing software is writing. It might have an apparently arcane grammar, but making software is merely writing down instructions that a computer can understand. And the language is not more complicated than English. Have you seen English? It's the most bastardized, crazy language on the planet! The languages we use to talk to computers are incredibly simple - necessarily so. Computers aren't going to pick the correct meaning of a word out of a dozen possibilities based on context. They're not going to detect our sarcasm in step two of the instructions we give them. They need to be told exactly what to do in the simplest way possible, and programming languages are the way we do that. Consequently, programming languages have only gotten simpler over the years. Strong typing is going the way of the dodo. How often do you need to know what a pointer is any more? All the complicated things are getting hidden away in libraries or abstracted by the VM. There has never been a easier time to become a programmer than today.

I don't believe that everyone has to become a "programmer" any more than everyone has to become a "writer". But everyone should be able to write; a sentence, a paragraph, an essay, a script that their computer executes.

Tuesday, October 14, 2008

Exercising Git

At work I use SVN for source control, but I've been itching to try out Git. I also recently stumbled across Project Euler, a series of math problems intended to be solved by writing small programs. I think you see where this is going. So - as of a few minutes ago - my Project Euler solutions (such as they are) are up on my github account.

I am a complete Git newb, so I followed some instructions on setting up Git and a new github repository. Since I can see all the code on github, I guess it's working.

My Project Euler solutions have been very fun so far. You're allowed to implement your solution in whatever language you want. I chose Ruby because it is the most expressive language I've found.

My solutions involving factoring primes have been a mixed bag. I think I did ok finding the largest prime factor of 600851475143 (problem 3), but my solution to problem 10 (sum all the primes below two million) was an ugly brute force. I really should have known a better algorithm (I won't say it by name here, but it was referenced in one of the Dark Tower books).

Some of the problems have been quite easy, but I'm only up to problem 17. I've peeked ahead and it looks like they get pretty tough. Sometimes it feels like using Ruby is cheating - for instance, problem 16 is "sum all the digits of 2^1000". Pity the poor C++ programmer!

The tricks I have up my sleeve are mainly inject, memoization and recursion. Recursion is especially helpful in simplifying problems. Memoization can be invaluable when you're performing expensive math. Inject is just plain fun - who wants to write all that boring loop code?

One reason that people might shy away from Ruby in Project Euler is the perceived performance benefit of more low-level languages. That is a complete red herring. When I was writing assembly language in high school, I read Michael Abrash's "Graphics Programming Black Book". One thing I learned was that extremely optimized assembly language will only net you a 2x to 10x improvement over compiled code. "Isn't that a lot?" you say? No - it is nothing compared to the gains you can usually get from improving your algorithms - often orders of magnitude. Using an expressive high-level language like Ruby allows you to concentrate on the high-level algorithms. I was especially proud of my solution to problem 15. I first broke the problem down with recursion. When that ran for more than a minute I realized I was solving the same problem more than once so I memoized it. Now it runs in less than a second even on my super-slow EeePC. Of course, there are much more elegant solutions out there - my solution that I was so proud of was far from the best. It just goes to show how far a good algorithm can take you.

This turned out to be a rambling post - sorry about that. In conclusion, Git and github are great, Project Euler is fun and algorithms matter. Good night!