Thursday, January 13, 2005

Thoughts on Code Reuse IV

by Asim Jalis

Here is a more concrete discussion of why reuse is so hard in object-oriented systems compared to the Unix command line. Consider a thought experiment. Suppose you wanted to write a blog program. How easy would it be to reuse the wiki-to-HTML converter (WikiText) in it? Let's not get distracted by abstract metaphors and ideas; let's just focus on the mechanics of reuse. How easy or hard would it be? Here are the steps I imagine.

First, the code would have to be refactored to eliminate all hardwired wiki constants and other dependencies. Let's assume that this is easy, that the code was largely independent, and that the refactoring is trivial. Second, you would have to extract WikiText into a separate library that could be called from multiple programs. It would need a new place to live. I suspect it currently lives with the wiki code, but now that it needs to be shared between the wiki and the blog it would need to live somewhere more general. Third, you would need to test the wiki to make sure it still works after the WikiText code has been refactored out. Note that the refactoring was made up of two steps: first there was the refactoring in code, and then there was the refactoring on the file system. Fourth, you would have to put some magic incantations in your blog code to invoke the library.

Now consider the same thought experiment if wiki2html were a Unix utility. What would you have to do? Nothing. You could just insert it into the pipe in your blog code and you would be done.

The point of PolyBloodyHardReuse (on c2) is not that reuse is conceptually difficult in object-oriented systems. Rather, it is that reuse is physically a lot of work. There is something about the way the Unix system is designed that makes a certain kind of reuse really easy. Here are some reasons that Unix utilities are easy to reuse:

1. Everything is always staged. There is no separate staging step.
As long as you drop your executable in the bin directory, it is accessible.

2. The binding of components is delayed. The user decides which programs to bind together; the programmer just provides the pieces, which the user then combines in whichever way he wants. In most programming languages there is a distinction between a user and a programmer. Outside the Unix world, a user (on a command line) cannot recombine the components in a different way.

3. The Unix commands approach empowers the user by increasing the number of options he has. The monolithic application approach disempowers the user by restricting what he can do with the pieces of the application. He can only combine the pieces of the application in pre-determined ways.

There is another reason Unix programs are so recombinant, especially on the command line. Suppose you wanted to write a command line program that takes a directory of wiki text files and converts them into a directory of HTML files. With wiki2html available as a filter, that is a short shell loop. You can get some of the benefits of 1-3 by using a Python or a Perl shell as the user environment: your application can load up a shell, and the user can then recombine functions in whatever order he wants. I believe the Smalltalk environment gives precisely this facility, which is what people like so much about it. Even so, the conciseness of the Unix pipe syntax makes combining components relatively easy compared to Perl or Python; there is simply less syntax to type.

Let me give you an example of how this approach can be used. I wrote a program in Java, and then also in Python, to help me analyze stock market movements. The problem with this program was that each time I thought of a new kind of query I would have to open up the editor and modify the code. At one point, out of frustration, I split it up into utilities. The first was fn-highs, which returns the ticker symbols of stocks that hit their daily highs.
Another was fn-returns, which took a list of symbols as arguments and returned their 3, 6, and 12 month percentage changes in stock price, one per line. fn-headlines took one symbol argument and listed the news headlines for that symbol on the command line. fn-daily took a list of symbols and for each symbol showed its daily high, low, open, and close prices. fn-profile took one symbol and printed a one-paragraph summary of the company (where it is located, what its business is).

Some observations about these functions:

1. These commands are not ignorant about the data types of their input. They are highly specialized and understand that their arguments are stock market symbols. They are not general in the way that grep and sort are.

2. Each one of them is trivial to write. Most call lynx -dump on a URL and extract their information from Yahoo, MSN MoneyCentral, and a few other financial sites.

3. And yet this environment is far more usable and useful than the interfaces provided by Yahoo, MSN, and the other financial sites. The reason is that Yahoo and MSN only let me recombine their functions in pre-determined ways.

4. Here is a simple example. While Yahoo can show me the names of all the companies in the software sector, and it can show me the revenues of any company that I want, it cannot show me the revenues of all the companies in the software sector on one page. To get at this data I would have to click all day.

5. These commands play nicely with the standard Unix commands. For example, I can sort all the companies that hit their high today in order of decreasing employee size, by combining these commands with sort.

These commands were written in Bash and Perl. If I had to rewrite them I might write some of them in Ruby. Consider now how hard it would be to write a monolithic application with the same functionality. Let's consider this set of commands to be a single application.
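To make the sort example concrete, here is a minimal sketch of that pipeline. Since the real commands scrape financial sites, the stubs below just echo canned sample data; fn_employees is a hypothetical helper (the post does not name the command that reports employee counts), and the underscored names keep the sketch portable to shells that reject hyphens in function names.

```shell
# Stand-in for fn-highs, which really scrapes a financial site with
# lynx -dump; here it just echoes sample tickers, one per line.
fn_highs() { printf '%s\n' ADBE MSFT ORCL; }

# Hypothetical helper: for each symbol read on stdin, print
# "SYMBOL EMPLOYEES". The counts are made-up illustrative numbers.
fn_employees() {
  while read -r sym; do
    case $sym in
      MSFT) echo "MSFT 57000" ;;
      ORCL) echo "ORCL 41000" ;;
      ADBE) echo "ADBE 3700" ;;
    esac
  done
}

# Companies that hit their daily high, in decreasing order of
# employee size: a numeric reverse sort on the second field.
fn_highs | fn_employees | sort -k2,2nr
# Output:
#   MSFT 57000
#   ORCL 41000
#   ADBE 3700
```

The whole "query" is the last line; a monolithic application would need a new feature, a new screen, and a rebuild to offer the same view.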
This application was easy to write because it leverages the Unix environment. The Unix environment is the glue that allows these tools to combine in fruitful ways. Also note that this is an example from a specific application domain. This is not a collection of general text utilities. And yet the collection is reusable. I can easily write a wrapper around it that publishes the most interesting pieces of information on the web, or sends them out as e-mail. I will never have to rewrite the function to download the daily highs and lows of a given stock.
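As a sketch of such a wrapper, assume fn-highs emits one ticker per line and fn-returns takes symbols as arguments, as described above. The stubs and their numbers below are stand-ins for the real scraping commands, and the mail address is hypothetical.

```shell
# Stand-ins for the real commands, which scrape Yahoo and MSN;
# the tickers and return figures here are made up for illustration.
fn_highs()   { printf '%s\n' MSFT ORCL; }
fn_returns() { for sym in "$@"; do echo "$sym 3.1 8.2 15.4"; done; }

# The wrapper: today's daily-high stocks with their percentage returns.
report() {
  echo "Stocks at their daily highs (3, 6, 12 month returns):"
  fn_returns $(fn_highs)
}

report
# In real use the same report could be mailed out instead:
#   report | mail -s "Daily highs" somebody@example.com
```

Nothing in the fn- commands had to change to support this new use; the wrapper is just one more recombination.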