Buy this book on Amazon
Algorithms to Live By is a practical guide to the algorithms we use in our day to day life without even realising it.
Brian Christian highlights the common challenges we face on a daily basis and how knowledge of time and search complexity combined with the right algorithm can dramatically increase your chances of making better decisions.
The book is split into 11 sections covering different areas of computer science, from searching and sorting algorithms to caching and from Bayesian statistics to Game Theory. Each of these computer science concepts is explained along with the algorithms and strategies employed by computer scientists to deal with scale, constraints and ambiguity.
Christian then joins the dots from computer programming to human decision making, providing actionable advice for approaching complex decisions from a computer scientist’s point of view.
Who Should Read it?
This is an excellent book for anyone looking to improve their decision making and free up cognitive load by taking some tips and tricks from computer science’s approach to ambiguous problem-solving.
The book is marketed to a general audience, however, I believe to get the most value from the book the reader should have some familiarity with common computer science concepts. The book covers several technical concepts, such as algorithmic time complexity, networking protocols and caching. If you are already aware of the types of intractable problems faced by computers or have ever struggled with poorly performing algorithms yourself, you will find it easier to understand and link the concepts to your own daily decision making.
There are cases where computer scientists and mathematicians have identified good algorithmic approaches that can be transferred to human problems
We often face the same set of problems that we ask computers to complete for us. For example:
- sorting (organising emails, entering competitions, organising your kitchen cupboards etc.)
- caching (remembering things or having things easily available),
- communication (messaging)
- searching (finding things quickly)
When we face a complex problem we can apply some of the learnings from computer science to tackle the problem in more efficient ways
You can get to “good enough” in a relatively short amount of time. Perfection often takes orders of magnitude longer
- Many problems in computer science and daily life are intractable - problems where there might be an ‘optimal’ solution, however, due to complexity it cannot be found in a reasonable time period
- Using approximations and good assumptions (based on educated priors) you can get very close to an optimal solution relatively quickly. Sometimes, knowing that you are using a good algorithm to get a roughly optimal solution is the best you can do
- If you wind up stuck in an intractable scenario, remember that heuristics, approximations, and strategic use of randomness can help you find workable solutions. Make sure you evaluate whether the extra effort to achieve the optimal result is worth it - in most cases it is not.
Cognitive load is a real issue. The underlying directive of any good algorithm is to minimise the labor of thought
- The best algorithims are all about doing what makes the most sense in the least amount of time.
- Effective algorithms make assumptions and show a bias towards simpler solutions, evaluate the cost of accuracy vs timeliness and sometimes even take chances
“Optimal stopping tells us when to look and when to leap. The explore/exploit tradeoff tells us how to find the balance between trying new things and enjoying our favorites. Sorting theory tells us how (and whether) to arrange our offices. Caching theory tells us how to fill our closets. Scheduling theory tells us how to fill our time."
”…the tactical dropping of balls is a critical part of getting things done under overload"
"…the best algorithms are all about doing what makes the most sense in the least amount of time, which by no means involves giving careful consideration to every factor and pursuing every computation to the end. Life is just too complicated for that."
"…people are almost always confronting what computer science regards as the hard cases. Up against such hard cases, effective algorithms make assumptions, show a bias toward simpler solutions, trade off the costs of error against the costs of delay, and take chances"
1. Optimal Stopping - When to Stop Looking
The best optimal stopping algorithms have surprisingly low success rates. Allow yourself a calibration phase to understand the search space but then be decisive when the next best option comes along
Optimal Stopping is analogous to a ‘look-then-leap’ scenario.
Time is an important factor in decision making. There is often a cost to taking too much time when making a decision. Consider whether the extra time spent searching all options to find the ‘best’ really better than a slightly less optimal (but still good) decision found in much less time.
When the number of possibilities is uncertain (or too large to completely iterate), allowing yourself a limited search or ‘calibration’ phase to build up knowledge of the search space will improve your ‘priors’ and get you ‘most of the way there’.
A famous optimal stopping problem is the Secretary Problem . The problem involves finding the best person for a secretary role from n applicants. Each candidate is interviewed in order and after each interview you have to make a decision to hire the candidate or not. The decision is irrevocable - i.e. at what point should you stop interviewing candidates and take the next best applicant you find (even if there is potentially a better one later in the list of applicants).
The best algorithm for solving this problem is only correct 37% of the time - surprisingly low!
The best stopping rule defines an arbitrary search period where you assess the search space before making any hiring decisions. After this period (i.e. the first 30 candidates), you should immediately hire the next applicant you see that was better than any of the previous applicants already interviewed.
Even the most ‘optimal’ algorithms yield surprisingly poor success rates for finding the best applicant. This demonstrates problems involving uncertainty are hard, even for computers! We should stop stressing about whether we have made the absolute best possible decision and accept an almost optimal solution is about as good as we can get.
2. Explore vs. Exploit - Latest vs Greatest
Give yourself enough time to explore. Exploration has a higher payoff than exploitation in the early stages. However, when time is running out, exploit what you know!
Explore vs Exploit is a central tenant of reinforcement learning (see multi-armed bandit problems and the Bellman Equation).
The value of exploration can only go down over time. The value of exploitation can only go up over time. Explore when you have time to use the resulting knowledge, exploit when you are ready to cash in.
The Gittin's Index provides a formal justification for favouring exploration early on, provided we have some opportunity to exploit the results later on. The act of searching is valuable in itself even if the payoff in the short term is not as big as an already found option. The Gittin’s Index shows that, early on, a previously untried option has a higher chance of a payoff than an option that you already know to payout.
Trying new things increases our chances of finding the best. Regret is the result of comparing what we actually did with what would have been best in hindsight. Exploration should reduce this eventual regret.
A/B testing is an example of exploration vs exploitation. You are risking worse performance for some customers in the hope the knowledge you gain will be more beneficial in the long run.
Be sensitive to how much time you have left. Explore early, exploit late. This can be seen in human psychology. The young are fickle whereas the old are set in their ways. This is perfectly logical when time is taken into consideration - the young are in their exploration phase, the old are exploiting. The old do not have the time left to exploit any new information/experiences they find, so why bother?
3. Sorting: Making Order
Sorting is ubiquitous in our daily lives. You sort something so it can be searched later. When sorting, err on the side of messiness - almost perfectly sorted is better than no sorting but significantly quicker than perfect sorting.
As obvious as it sounds - “You sort something so it can be searched later” - I never really consciously thought about it. Most of the time I think I am sorting things so they are ‘organised’ and not ‘messy’, rather than thinking “is this the best place for me to put things so I can find it later quickly?”.
Sorting is essential for working with almost any kind of information. Sorted items are ubiquitous in our daily life and we probably never notice. E.g. Email inboxes, todo lists, Google search, Twitter feeds, items in a kitchen cupboard etc..
Sorting is a problem humans have always faced and was the first major application for computers. The company IBM was created out of the need to sort records.
“Effort expended on sorting materials is just a preemptive strike against the effort it will take to search through them later”
There is a trade-off between searching and sorting. Whenever you sort something you are making an assumption about when and how often you will need to search that information at a later date. Sometimes the time and effort spent sorting is not worth the effort if you don’t need to regularly find that information/item.
Scale hurts. Sorting algorithms do not scale well. The best accurate searching algorithms, such as merge sort, have a time complexity of O(nlog(n)). Before starting a task you should think of the ‘worst-case’ scenario - what is the maximum amount of time you could end up spending on the task - it might not be worth it!
There are some strategies which can dramatically reduce the complexity of sorting at the expense of absolute accuracy. For example, Bucket sort. Sorting items into buckets (but unordered within buckets) will quickly achieve some level of sorting provided the number of buckets is relatively small compared to the number of items. This ‘messy’ sorting is much better than no sorting and will reduce the search space when you eventual try and look for a given item.
Err on the side of messiness - sorting something that you will never search is a complete waste; searching something that was never sorted is only inefficient.
Generally it is a waste of time to sort email inboxes into folders. The number of times you actually need to find an email is low compared to the effort of sorting in the first place. And even when you need to find something there is a global search function.
Competitions are an application of sorting. Pairwise comparison (e.g. knockout competition) is inefficient. Having a benchmark (e.g. fastest times) removes the need for pairwise comparison and reduces the computational complexity.
4. Caching - Forget About it
Least Recently Used (LRU) algorithm is the most effective form of caching. Keep documents or items you used recently close to you as you are most likely to use them again in the near future
Caching is about storing something for future use.
Taking out a library book is an example of caching. Rather than going to the library every day to read it, you check it out so you can read it at home without the ‘overhead’ of travelling to the library. We do this without thinking - it is a logical thing to do.
“Cache management is to minimise the number of times you can’t find what you’re looking for in the cache and must go to the slower main memory to find it”
The modern internet uses CDNs (content delivery networks) to store a cache of website content closer to users.
The Noguchi filing system is simple but surprisingly efficient. Noguchi filing is a left-side insertion rule: every time you pull out a file to use its content, you put it back as the leftmost file when you return it to the box. The most recently accessed files are fastest to find.
Tossing things back on the top of the pile is actually one of the most efficient filing systems for finding content, shy of knowing the future. ‘Messy’ filing is actually a near optimal solution!
“the fundamental challenge of memory really is one of organisation rather than total storage” - memory is more about your ability to easily find something when you need it than it is about total storage.
Cognitive decline - lags and retrieval errors - in human memory as we age may not be about the search process slowing or deteriorating, but could be (partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger.
5. Scheduling - First Things First
Optimal scheduling is hard and sometimes impossible. Identify blockers, beware of context switching overheads and use heuristics to identify which tasks to complete first. Don’t stress about it - starting in the wrong order is better than not starting at all!
Before you can have a plan, you must first choose a metric, which metric you choice will directory affect which scheduling approaches are best.
In computer science, there is a concept called blocking . Blocking occurs when the main task is ‘blocked’ because resources are being used by another, less important, task. In which case, the main priority should be to clear the trivial task rather than waiting it out and returning to the main task.
Similarly, in day to day life, sometimes that which matters most cannot be done until that which matters least is finished. There is no choice but to treat that unimportant thing as being every bit as important as whatever it is blocking. Make sure you identify any trivial blockers and get them out of the way!
Most scheduling problems have no easy solutions. Stressing over the most optimal workflow or schedule may actually be slower than just starting, even if it is in a sub-optimal fashion. Doing something in the wrong order is better than doing nothing at all.
Context switching comes with an overhead - sometimes it is better to just get what you are currently doing done before moving to a ‘higher priority’ task.
Optimal scheduling is hard, don’t strive for perfection, but you can use some algorithmic strategies to improve your changes. E.g. Weighted shortest processing time - divide a task’s importance by the amount of time it will take to complete. If that figure is higher than for the task you are currently doing, switch to the new one.
6. Bayes' Rule - Predicting the Future
Sometimes we don’t need many data points to make good decisions. Knowledge of a couple data points (priors) and their expected distribution (e.g. normal, exponential, binomial etc.) is extremely valuable
Even when we have only a few observations - or only one - it offers us practical guidance for predicting future events
Copernican Principle: If we want to predict how long something will last, and have no other knowledge about it, the best guess we can make is that it will continue just as long as it has gone on so far.
Knowing what distribution you are up against can make all the difference and allow you to make better guesses in the absence of hard data.
Good priors are essential for good predictions. Make sure your good priors collected from your own experiences are not biased by others. E.g. Negative news about crime may unfairly bias you own experience of low crime in your own neighbourhood.
7. Overfitting - When to Think Less
Considering too many options when making a decision can lead to overfitting. We can make better decisions by applying regularisation - deliberately thinking and doing less.
In machine learning overfitting is a very common concept. Considering more and more factors for a model can actually lead to worse performing models. We should apply this thinking to our daily lives. Over analysing a situation or decision can actually be counter-productive.
Giving yourself more time to decide about something does not necessarily mean you will make a better decision. But does guarantee you will end up considering more factors, more hypotheticals, more pros and cons and thus risk overfitting.
We can make better decisions by deliberately thinking and doing less (regularisation).
8. Relaxation - Let it Slide
If a problem is difficult, try relaxing the constraints and solve a simpler problem. You can apply these learning to the harder problem or might find that the solution to the simpler problem is sufficient.
If you can’t solve the problem in front of you, solve an easier version of it. The solution to the easier problem might provide additional insight for solving the original solution or at least provides an almost optimal solution which might be ‘good enough’.
“There are consequences to everything, you just decide whether you want to face those consequences” - when applying constraints to problems think of the ‘cost’ of the constraint. Is the penalty/cost actually tolerable.
There are three effective techniques for relaxing a difficult problem:
- Constraint Relaxation - remove or add constraints and make progress on a loser form of the problem before coming back.
- Continuous Relaxation - turn discrete or binary choices into continua.
- Lagrangian Relaxation - turn impossibilities into penalties (bending the rules), sometimes the penalties are worth it.
9. Randomness - When to Leave it to Chance
Sampling is a very effective method for learning about an unknown quantity. If you are stuck, try sampling the quantity with trial and error to learn more about it.
Sampling can be a highly effective method for estimating a quantity, particularly when it is complex and has a probabilistic component.
There will always be an error associated with sampling, but if you take many samples this can be minimised.
Sampling can succeed where analysis fails. You can never be certain but you can be awfully close.
10. Networking - How we Connect
In our modern world of technology we are ‘always buffered’ - every message we receive is recorded and stored for future processing. This leads to a build up of messages which takes cognitive effort to clear. Tactical dropping of messages can dramatically reduce cognitive effort
Networking in computer science concerns the way computers send messages to each other. For example, which protocol to use TCP/HTTP etc. and how those protocols relay information.
“the tactical dropping of balls is a critical part of getting things done under overload”
Modern technology has meant we are ‘always connected’, people can reach us anywhere, anytime. This can lead to communication overload. However, the main problem is not that we are always connected, it is that we are always ‘buffered’.
Historically, if someone wanted to contact us via letter or telephone call (before voicemail) if we didn’t receive the letter or call, the message was ‘dropped’. If it was important the sender would try again.
However, today if we miss a message it goes into our ‘buffer’ for us to deal with later. E.g. Email inbox, list of notifications etc. We have to expend cognitive effect to deal with even the most trivial notification than Facebook sends us.
Tactically dropping messages that aren’t important will dramatically reduce cognitive effort and workload
11. Game Theory - The Minds of Others
People do not always act rationally. Be wary of information cascades where you are acting based on another actors actions rather than knowing why they are doing those actions.
If the rules of the game force a bad strategy, may we shouldn’t try to change strategies, we should try to change the game. Reducing the number of options can make decisions less computationally challenging and also lead to better outcomes.
For example, organising a meeting with someone it is better to suggest a couple of arbitrary times rather than say let me know when is convenient. In the latter case, the recipient has to use mental energy to choose a time from their dairy (potentially also worrying about whether it would work for you). By limiting the options it becomes easier for the recipient and you are more likely to get a reply as it is less effort for them to decide.
When interacting with others where you need them to make a decision, you should narrow their choices to have a higher chance of a quicker response.
‘Information cascades’ can lead to rational decision making. Be wary of cases where public information seems to exceed private information, where you know more about what people are doing rather than why they are doing it.
People’s actions are not necessarily their beliefs. Information cascades get caused in part when we misinterpret what others think based on what they do.