How Useful is Big-O?

Here is a question I was answering on Reddit recently:

Say you have 2 algorithms and you determine both are O(log n) (or anything else that's the same). Does this mean both algorithms are just as good as one another?

Absolutely not. One algorithm might run 30x faster than the other. Or one might run faster on the size of data that your system sees typically. Or one might take far less memory. Or might be more cache-friendly. Most commonly of all, one algorithm might be far simpler than another (and therefore less prone to bugs).

Well, in that case, how useful is big-O, really?

Despite the above, it is QUITE useful. I could take 20 minutes to describe algorithm A, then another 20 minutes to describe algorithm B, then another 20 minutes to describe algorithm C. Then for my second hour, I could describe some of the various strengths and weaknesses of the algorithms. After a couple hours of discussion you might be able to understand those three options and decide which was appropriate for your use case.

OR, I could spend 30 seconds telling you that one algorithm is O(n²) while the other is O(n log n). You immediately know that if you're going to be working on data values where n is in the thousands or more, that the second algorithm is going to be enormously better.

Basically, big-O captures only one particular dimension of the differences between algorithms -- the degree to which that algorithm "blows up" when working on large problems. But that dimension is perhaps the single MOST important dimension to consider, and big-O captures it quite cleanly.

As a working programmer focused on delivering working software, rather than on theoretical computer science, I would not expect the average mid-level (or even senior) developer on my team to be able to perform the mathematical analysis of an algorithm to determine its average-case big-O performance. But I absolutely WOULD expect the average mid-level developer on my team to know the big-O performance of different data structures for different purposes and to make specific choices based on that information like "I'll keep this in a list, but ALSO in a hashtable because of the access patterns" or "I'll use a bloom filter here to get 10X performance improvement".

Posted Sun 17 July 2022 by mcherm in Programming