Thursday, January 13, 2005

Thoughts on Code Reuse IV

by Asim Jalis

Here is a more concrete discussion of why reuse is so hard in object-oriented systems compared to the Unix command line. Consider a thought experiment. Suppose you wanted to write a blog program. How easy would it be to reuse the wiki-to-HTML converter (WikiText) in it? Let's not get distracted by abstract metaphors and ideas; let's just focus on the mechanics of reuse. How easy or hard would it be? Here are the steps I imagine.

First, the code would have to be refactored to eliminate all hardwired wiki constants and other dependencies. Let's assume that this is easy, that the code was largely independent, and that the refactoring is trivial. Second, you would have to extract WikiText into a separate library that could be called from multiple programs. It would need a new place to live. I suspect it currently lives with the wiki code, but now that it needs to be shared between the wiki and the blog it would need to live somewhere more general. Third, you would need to test the wiki to make sure it still works after the WikiText code has been refactored out. Note that the refactoring was made up of two steps: first there was the refactoring in code, and then there was the refactoring on the file system. Fourth, you would have to put some magic incantations in your blog code to invoke the library.

Now consider the same thought experiment if wiki2html were a Unix utility. What would you have to do? Nothing. You could just insert it into the pipe in your blog code and you would be done.

The point of PolyBloodyHardReuse (on c2) is not that reuse is conceptually difficult in object-oriented systems. Rather, it is that reuse is physically a lot of work. There is something about the way the Unix system is designed that makes a certain kind of reuse really easy. Here are some reasons that Unix utilities are easy to reuse:

1. Everything is always staged. There is no separate staging step.
As long as you drop your executable in the bin directory, it is accessible.

2. The binding of components is delayed. The user decides which programs to bind together; the programmer just provides the pieces, which the user then combines in whichever way he wants. In most programming languages there is a distinction between a user and a programmer. Outside the Unix world, a user (on a command line) cannot recombine the components in a different way.

3. The Unix commands approach empowers the user by increasing the number of options he has. The monolithic application approach disempowers the user by restricting what he can do with the pieces of the application. He can only combine the pieces of the application in pre-determined ways.

There is another reason Unix programs are so recombinant, especially on the command line. Suppose you wanted to write a command line program that takes a directory of wiki text files and converts them into a directory of HTML files. With wiki2html available as a filter, that is a short shell loop. You can get some of the benefits of 1-3 by using a Python or a Perl shell as the user environment: your application can load up a shell, and the user can then recombine functions in whatever order he wants. I believe the Smalltalk environment gives precisely this facility, which is what people like so much about it. Even so, the conciseness of the Unix pipe syntax makes combining components relatively easy compared to Perl or Python; there is simply less syntax to type.

Let me give you an example of how this approach can be used. I wrote a program in Java, and then also in Python, to help me analyze stock market movements. The problem with this program was that each time I thought of a new kind of query I would have to open up the editor and modify the code. At one point, out of frustration, I split it up into utilities. The first was fn-highs, which returns the ticker symbols of stocks that hit their daily highs.
Another was fn-returns, which took a list of symbols as arguments and returned their 3, 6, and 12 month percentage changes in stock price, one per line. fn-headlines took one symbol argument and listed the news headlines for that symbol on the command line. fn-daily took a list of symbols and for each symbol showed its daily high, low, open, and close prices. fn-profile took one symbol and printed a one-paragraph summary of the company (where it is located, what its business is).

Some observations about these functions:

1. These commands are not ignorant about the data types of their input. They are highly specialized and understand that their arguments are stock market symbols. They are not general in the way that grep and sort are.

2. Each one of them is trivial to write. Most call lynx -dump on a URL and extract their information from Yahoo, MSN MoneyCentral, and a few other financial sites.

3. And yet this environment is far more usable and useful than the interfaces provided by Yahoo, MSN, and the other financial sites. The reason is that Yahoo and MSN only let me recombine their functions in pre-determined ways.

4. Here is a simple example. While Yahoo can show me the names of all the companies in the software sector, and it can show me the revenues of any company that I want, it cannot show me the revenues of all the companies in the software sector on one page. To get at this data I would have to click all day.

5. These commands play nicely with the standard Unix commands. For example, I can sort all the companies that hit their high today in order of decreasing employee size, by combining these commands with sort.

These commands were written in Bash and Perl. If I had to rewrite them I might write some of them in Ruby. Consider now how hard it would be to write a monolithic application with the same functionality. Let's consider this set of commands to be a single application.
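To make the sort example concrete, here is a minimal sketch of that pipeline. Since the real commands scrape financial sites, the stubs below just echo canned sample data; fn_employees is a hypothetical helper (the post does not name the command that reports employee counts), and the underscored names keep the sketch portable to shells that reject hyphens in function names.

```shell
# Stand-in for fn-highs, which really scrapes a financial site with
# lynx -dump; here it just echoes sample tickers, one per line.
fn_highs() { printf '%s\n' ADBE MSFT ORCL; }

# Hypothetical helper: for each symbol read on stdin, print
# "SYMBOL EMPLOYEES". The counts are made-up illustrative numbers.
fn_employees() {
  while read -r sym; do
    case $sym in
      MSFT) echo "MSFT 57000" ;;
      ORCL) echo "ORCL 41000" ;;
      ADBE) echo "ADBE 3700" ;;
    esac
  done
}

# Companies that hit their daily high, in decreasing order of
# employee size: a numeric reverse sort on the second field.
fn_highs | fn_employees | sort -k2,2nr
# Output:
#   MSFT 57000
#   ORCL 41000
#   ADBE 3700
```

The whole "query" is the last line; a monolithic application would need a new feature, a new screen, and a rebuild to offer the same view.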
This application was easy to write because it leverages the Unix environment. The Unix environment is the glue that allows these tools to combine in fruitful ways. Also note that this is an example from a specific application domain. This is not a collection of general text utilities. And yet the collection is reusable. I can easily write a wrapper around it that publishes the most interesting pieces of information on the web, or sends them out as e-mail. I will never have to rewrite the function to download the daily highs and lows of a given stock.
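As a sketch of such a wrapper, assume fn-highs emits one ticker per line and fn-returns takes symbols as arguments, as described above. The stubs and their numbers below are stand-ins for the real scraping commands, and the mail address is hypothetical.

```shell
# Stand-ins for the real commands, which scrape Yahoo and MSN;
# the tickers and return figures here are made up for illustration.
fn_highs()   { printf '%s\n' MSFT ORCL; }
fn_returns() { for sym in "$@"; do echo "$sym 3.1 8.2 15.4"; done; }

# The wrapper: today's daily-high stocks with their percentage returns.
report() {
  echo "Stocks at their daily highs (3, 6, 12 month returns):"
  fn_returns $(fn_highs)
}

report
# In real use the same report could be mailed out instead:
#   report | mail -s "Daily highs" somebody@example.com
```

Nothing in the fn- commands had to change to support this new use; the wrapper is just one more recombination.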