There are many people who seem to believe that every problem in economics can be solved by removing regulation and "letting the markets decide". Other people disagree and produce all sorts of "hand waving" arguments to explain why that is sub-optimal. I recently realized that a certain, reasonably well-studied statistical conundrum has a striking parallel to a problem in economics, and that its study could replace some of these "hand waving" arguments with mathematical proof. The conundrum is known as "the multi-armed bandit problem" - but before I explain what it is, or its solution, I'd better explain the problem in economics that I believe it so neatly parallels.

The problem in economics is this: who should make new product X? Communists might say "let's have an expert government committee choose a single company Y and only allow them to make it", whereas the free marketeers would say "let multiple companies A, B, C, D and E make it, let the market decide which is the best, let the others go bust - and for God's sake don't let the government interfere with this process!".

Now I will introduce the statistical conundrum. It's called the multi-armed bandit problem:

Imagine you have a collection of one-armed bandits in a casino. Each one has a certain "payout rate" which corresponds to the percentage of the money paid into it that it will pay out (in the long run). In real casinos this is often set at something like 80-90 percent, but imagine that this particular model of one-armed bandit can be set to any predefined payout rate (0% to 100%) using a dial inside the machine that the casino owner can set with a screwdriver. Now let's say that one night the casino owner comes in and sets each bandit to a different payout rate, no two the same. You arrive the following morning with a great big bag of coins. You are determined to spend the whole day playing on these bandits and you have complete freedom to choose which ones you play on... you are allowed to switch from one to another at will. Now the question is: what is your strategy for selecting bandits such that you come home with the greatest winnings?
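To make the setup concrete, here is a minimal sketch of such a machine in Python. The pay-one-coin-or-nothing (Bernoulli) model and all the names here (`make_bandit`, the particular rates) are my own illustrative assumptions, not part of the puzzle's standard statement:

```python
import random

def make_bandit(payout_rate):
    """Return a function simulating one pull of a slot machine.

    Each coin played either pays out one coin (with probability
    payout_rate) or nothing, so in the long run the machine returns
    payout_rate of the money paid into it.
    """
    def pull():
        return 1 if random.random() < payout_rate else 0
    return pull

random.seed(42)

# The casino owner sets each machine to a different rate overnight.
bandits = [make_bandit(rate) for rate in (0.55, 0.60, 0.70, 0.80, 0.90)]

# Playing one coin in each machine:
results = [pull() for pull in bandits]
```

The gambler, of course, sees only the 0/1 results, never the dial settings - which is the whole difficulty.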

I'll give you one possible solution: put 50 coins in each of the bandits in order to estimate the payout rates, then stick to the one that appeared to have the highest rate for the rest of the day. This solution is certainly better than simply selecting bandits at random, but it can be mathematically proven to be sub-optimal, i.e. there are known strategies that will lead to greater winnings. One problem with this strategy is that if two bandits paid out rather good, but very similar, amounts then it may not be very clear which is better. It may be more profitable to continue playing these two for a greater number of trials to gain more confidence in your determination of which one really is the best. The problem illustrates what is known as an "exploration-exploitation dilemma": "exploration" refers to the effort spent discovering which bandit may be the best (e.g. the 50-coin trial at the start), while "exploitation" refers to simply repeatedly playing the bandit which you estimate is the best.
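The "50 coins in each, then commit" strategy can be sketched as a small simulation. The Bernoulli payout model, the function name and the coin counts are all illustrative assumptions of mine:

```python
import random

def explore_then_commit(rates, explore_coins=50, total_coins=1000, seed=0):
    """Play `explore_coins` in each bandit, then spend the remaining
    coins exclusively on the one with the best observed payout rate.
    Returns (total winnings, index of the bandit committed to).
    """
    rng = random.Random(seed)
    pulls = [0] * len(rates)
    wins = [0] * len(rates)

    def play(i):
        pulls[i] += 1
        win = 1 if rng.random() < rates[i] else 0
        wins[i] += win
        return win

    # Exploration phase: estimate each bandit's rate.
    total = 0
    for i in range(len(rates)):
        for _ in range(explore_coins):
            total += play(i)

    # Exploitation phase: commit to the apparent best for the rest of the day.
    best = max(range(len(rates)), key=lambda i: wins[i] / pulls[i])
    for _ in range(total_coins - explore_coins * len(rates)):
        total += play(best)
    return total, best
```

Running this repeatedly with two close rates such as 60% and 70% shows that the bandit committed to is sometimes the wrong one - exactly the weakness described above.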

I believe this conundrum is analogous to the process of choosing companies to make products in a free market. The bandits are like the companies, the payouts are like the goods and the gambler is like the public, choosing the “company” that produces the best “goods”. At the start of the process the gambler/public does not know for sure who can make the best version of product X so he must try each one. Then, if it becomes obvious that some companies are better than others, the known bad companies will cease to be tried (= “go bust”) while the still-possibly-best will get tried some more.

Now there is one more complication that needs to be added to the standard multi-armed bandit problem to make it even more analogous to real-life business. There is a variation called the "restless bandit problem" where the payout rates are not fixed but rather evolve over time. This is more like a real company, where the management and employees will change over time. Their manufacturing equipment may wear out, break or become redundant, and a host of other things may happen that will change the ability of the company to produce good products. Now in the restless bandit problem it is essential to do more "exploration" than in the case of the standard multi-armed bandit problem. You would never want to entirely give up trying a previously poorly performing bandit because it may have since evolved into a better performing bandit.
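One simple way to model a restless bandit is to let each payout rate take a small random walk at every time step. The drift model below (uniform steps clamped to 0-1) is just one illustrative assumption, standing in for management changes, equipment wearing out and so on:

```python
import random

def drifting_rate(rate, rng, step=0.05):
    """One time-step of drift in a machine's payout rate: a small
    random perturbation, clamped so the rate stays a valid percentage.
    """
    return min(1.0, max(0.0, rate + rng.uniform(-step, step)))

rng = random.Random(7)
rates = [0.3, 0.8]          # today's worst and best machines
history = [list(rates)]
for _ in range(500):        # 500 time steps of drift
    rates = [drifting_rate(r, rng) for r in rates]
    history.append(list(rates))
```

After enough drift, yesterday's worst machine can overtake yesterday's best, which is why writing a machine off forever loses money in the long run.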

It can be mathematically proven that for the restless bandit problem, a strategy of "playing all for a short while and then exclusively playing the one that appeared best forever more" is sub-optimal. There is too little "exploration". It is sub-optimal for at least two reasons:

1. You may be mistaken in your estimate of which one is the best (how could this be?**)

2. The true best bandit may change over time.

This result has important implications for free marketeers. I believe it proves that the free market is sub-optimal, because a free market acts like the "too little exploration" strategy. In a free market, companies that fall short of producing the best goods tend to go bust even if they only fall short by a small margin. Obviously when a company goes bust it can never be "tried" again; it doesn't get a second chance. The succeeding company (or very small number of companies) tends to grow and dominate the market. Once a company dominates a market it can start to raise its prices and employ a plethora of strategies to suppress rivals that have nothing to do with producing the best goods for the consumer. For example:

• Tying up exclusive distribution channels.

• Using your size to get raw materials for less than any new rival can.

• Using your size to negotiate higher prices from retailers than any new rival could.

These factors make the “too little exploration” strategy even more sub-optimal in the restless bandit domain because it’s as if, as soon as we make up our minds and settle on the bandit that we think is best, it almost certainly reduces its payout rate.

Now in the exploration-exploitation dilemma, it is perfectly possible to do too much exploration. In the extreme, that would be like playing all of the bandits equally often, and this can easily be proven to be sub-optimal. So there is a balance to be struck.
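For completeness, one textbook-standard way of striking this balance - not something the argument above depends on - is the "epsilon-greedy" strategy: a small fraction of the time, explore a bandit chosen at random; otherwise, exploit the apparent best so far. A sketch, with all parameter values chosen purely for illustration:

```python
import random

def epsilon_greedy(rates, epsilon=0.1, total_coins=1000, seed=0):
    """Spend `total_coins` on the bandits: with probability epsilon
    explore a random bandit, otherwise exploit the one with the best
    observed payout rate. Returns total winnings.
    """
    rng = random.Random(seed)
    pulls = [0] * len(rates)
    wins = [0] * len(rates)
    total = 0
    for _ in range(total_coins):
        if rng.random() < epsilon or 0 in pulls:
            # Explore (also used until every bandit has been tried once).
            i = rng.randrange(len(rates))
        else:
            # Exploit the current apparent best.
            i = max(range(len(rates)), key=lambda j: wins[j] / pulls[j])
        win = 1 if rng.random() < rates[i] else 0
        pulls[i] += 1
        wins[i] += win
        total += win
    return total
```

Because it never entirely stops exploring, this kind of strategy can also recover when a bandit's rate drifts, which explore-then-commit cannot.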

In the real world there are many things that could be done to make sure that there is enough "exploration" in an economy, many of which are already in place to a greater or lesser extent in many countries around the world. Any laws that aim to prevent monopolies or encourage overly large companies to split into smaller parts are a good start. So one might say that, effectively, the world is already aware of the problem. But I hope that this article A) gives some mathematical support for these kinds of policies and B) proves that free market fundamentalism is a sub-optimal strategy.

---

** Say you have 2 bandits, A and B. A has a payout rate of 60% (in the long run), B has a payout rate of 70% (in the long run). If the sample you have measured so far is small (e.g. 10 coins or so) then it is very easy for A to have paid out more than B just by fluke. The same kind of "mistake" can happen in the economic world with two companies. Say company A is currently fundamentally better than company B: its management is smarter, its workers are more hard-working, and so on, so that in the long run, given a choice of 10 yet-to-be-invented products to make, it would make 9 of them better than B would. Unfortunately the first of these products to be invented was the one product it makes worse than B, so the consumer would incorrectly guess that B was the better company.
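The footnote's 60%-versus-70% example can be checked numerically. This sketch estimates, by simulation, how often the truly worse bandit A looks at least as good as B after only 10 coins in each (all names and parameters are mine, chosen to match the example):

```python
import random

def misranking_rate(rate_a=0.6, rate_b=0.7, coins=10, trials=20000, seed=0):
    """Estimate the probability that bandit A (true rate 60%) appears
    at least as good as bandit B (true rate 70%) after `coins` plays
    of each machine.
    """
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(trials):
        a = sum(rng.random() < rate_a for _ in range(coins))
        b = sum(rng.random() < rate_b for _ in range(coins))
        if a >= b:       # A looks at least as good: a potential misjudgement
            mistakes += 1
    return mistakes / trials
```

With samples this small, the misranking happens a substantial fraction of the time: 10-coin estimates are simply too noisy to separate a 60% machine from a 70% one reliably.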