Eric Lippert Tree Challenge

In his blog, Eric Lippert issued an interesting programming challenge. (Follow the link for details of the requirements.) Here is my solution.

#
# Programming challenge from
#  http://blogs.msdn.com/b/ericlippert/archive/2010/09/09/old-school-tree-display.aspx
#
# Done in Python, by Michael Chermside
#

import unittest
import itertools

# ============== Provided Problem ==============
class Node:
    """This class is, by gentleman's agreement, immutable.
    Do not modify it after creation."""
    def __init__(self, text, *children):
        self.text = text
        self.children = children

# ============== SOLUTION ==============
NEWLINE = '\n'

def dumper(root):
    """Passed top node of the tree. Returns string for the tree."""
    return ''.join( dumper_helper(root, ()) )

def sequence_and_item(sequence, item):
    """Returns an iterable (one which can be iterated multiple times)
    whose items are taken from the given sequence, followed by the
    given item."""
    return [x for x  in itertools.chain(sequence, (item,))]

def dumper_helper(node, prefix):
    """Generator which is passed a node and a sequence of prefix
    strings to be applied for each line. (Note: it must be possible
    to iterate the prefix sequence multiple times.)

    Yields a series of strings that, when assembled, will contain
      (1) the node name,
      (2) a newline,
      (3) line contents for all its children, each preceeded by the specified prefix."""
    # -- the node name --
    yield node.text
    # -- newline --
    yield NEWLINE
    children = node.children
    if len(children) >= 1:
        # -- All but the last child --
        for child in children[:-1]:
            for x in prefix:
                yield x
            yield '├─'
            for x in dumper_helper(child, sequence_and_item(prefix, '│ ')):
                yield x
        # -- Last child --
        child = children[-1]
        for x in prefix:
            yield x
        yield '└─'
        for x in dumper_helper(child, sequence_and_item(prefix, '  ')):
            yield x




# ============== Unit Tests ==============

class TestDumper(unittest.TestCase):
    def test_min_tree(self):
        tree = Node('a')
        self.assertEqual(dumper(tree), 'a\n')
    def test_one_top_child(self):
        tree = Node('a', Node('b'))
        self.assertEqual(dumper(tree),
                         'a\n'
                         '└─b\n')
    def test_some_top_childs(self):
        tree = Node('a', Node('b'), Node('c'), Node('d'))
        self.assertEqual(dumper(tree),
                         'a\n'
                         '├─b\n'
                         '├─c\n'
                         '└─d\n')
    def test_three_levels(self):
        tree = Node('a', Node('b', Node('c')))
        self.assertEqual(dumper(tree),
                         'a\n'
                         '└─b\n'
                         '  └─c\n')
    def test_full_to_three_levels(self):
        tree = Node('a',
                    Node('aa',
                         Node('aaa'),
                         Node('aab'),
                         Node('aaz')
                    ),
                    Node('ab',
                         Node('aba'),
                         Node('abb'),
                         Node('abz')
                    ),
                    Node('az',
                         Node('aza'),
                         Node('azb'),
                         Node('azz')
                    )
                )
        self.assertEqual(dumper(tree),
                         'a\n'
                         '├─aa\n'
                         '│ ├─aaa\n'
                         '│ ├─aab\n'
                         '│ └─aaz\n'
                         '├─ab\n'
                         '│ ├─aba\n'
                         '│ ├─abb\n'
                         '│ └─abz\n'
                         '└─az\n'
                         '  ├─aza\n'
                         '  ├─azb\n'
                         '  └─azz\n')
    def test_erics_example(self):
        tree = Node("a",
                   Node("b",
                       Node("c",
                           Node("d")),
                       Node("e",
                           Node("f"))),
                   Node("g",
                       Node("h",
                           Node("i")),
                       Node("j")))
        self.assertEqual(dumper(tree),
                         'a\n'
                         '├─b\n'
                         '│ ├─c\n'
                         '│ │ └─d\n'
                         '│ └─e\n'
                         '│   └─f\n'
                         '└─g\n'
                         '  ├─h\n'
                         '  │ └─i\n'
                         '  └─j\n')



if __name__ == '__main__':
    unittest.main()

DEVELOPMENT AND DESIGN NOTES:

(1) I chose to write this in Python (Python 3). I don't know .Net that well and didn't have a compiler handy. Python seemed a good, readable choice. Its performance characteristics are different, but I don't care about performance.

(2) I chose to write unit tests. Unusually for me, I wrote this using "Test Driven Design": writing a test, then afterward modifying the code to make it work. TDD is perfect for this kind of small, contained problem when I have a fairly good idea of how to proceed but I need to be careful of subtle errors (always a danger with recursion).

(3) I chose a functional, not imperative, approach. This was for two reasons: because I wanted to practice that style of coding, and because the nature of the problem (tree processing) is well-suited for such an approach.

(4) I was NOT smart enough to write perfectly clean code on the first try. There was a final pass where I renamed functions and variables and removed unnecessary layers.

(5) Nor was I smart enough to see the elegant recursive subroutine on my first glance at the problem. At first, I passed the list of children (and a prefix) to my recursive subroutine and returned complete lines. Partway through, I looked at the structure of my code and realized that it was far cleaner if I passed in nodes (and a prefix) and returned "the node name, a newline, and the lines underneath it". Perhaps I have figured that out from the start by examining the sample output better... but the cool thing was that making the change was a simple, easy transformation, and realizing there was a cleaner approach came from looking at the structure of my code. So I conclude that just starting in without a perfect understanding of the ideal recursive unit was *just fine* because I could clean it up later with refactoring. This was, perhaps, the most interesting thing I learned from the exercise.

(6) I chose to write my recursive function as a "pure function" (no side effects, immutable arguments). So a call returned the output for that sub-section of the tree. The other alternative would have been to pass some sort of "output object" to the function that it would write to as it went. I chose this approach mostly to practice functional programming. I feel it still came out very readable.

(7) At first, I was returning strings. But I felt bad about constantly building strings. After all, strings are immutable (in so many modern languages, including Python, .Net, and Java) and I worried that all the string manipulation would be a performance problem. Even though I didn't care about performance, I chose to return lists of lines instead. Then I realized I could return lists of bits of string and never have to concatinate any strings until the final step. This approach would look perfectly normal in a language like Haskell. By the way, I never did actual performance testing so my belief that the string manipulation would be slow might be completely wrong. I know that intuition is unreliable for performance questions, but one often still must make a decision without complete information.

(8) Python has this nifty feature called a "generator". It's a function where you use the keyword "yield". The compiler turns the function into one returning an iterator which will give out the values you would get if you ran the function and each "yield" put an item onto a list. Except that it does not actually run any bit of code until needed, and it does not materialize the list in memory. Another way to think about it is that when you pull a value from the iterator, the function executes until the first "yield" statement, then it exits. The NEXT time you pull a value from the iterator the function picks up where it left off with all instance variable state intact and continues until the next "yield". It is a VERY cool feature, and I used it because of the amazing way it made my function be very readable. Other languages should adopt this idea!

(9) There was one place where I wanted to do something simple, but the most readable syntax I could find for it was unbearably ugly and complex. So I made it a subroutine, just for the sake of readability and of having a function comment explaining what the heck it was doing. That is why the function "sequence_and_item" exists.

Posted Thu 09 September 2010 by mcherm in Programming