Thoughts on "Markets Are Efficient if and only if P=NP", Part 1

Ruminations on an essay relating the efficient market hypothesis and the P = NP problem, including discussions of the methods used

By: TheHans255

5/28/2023

View Part 2 here

This blog post is a response to the 2010 paper "Markets are efficient if and only if P = NP" by Philip Maymin. The paper's synopsis is as follows:

I prove that if markets are weak-form efficient, meaning current prices fully reflect all information available in past prices, then P = NP, meaning every computational problem whose solution can be verified in polynomial time can also be solved in polynomial time. I also prove the converse by showing how we can "program" the market to solve NP-complete problems. Since P probably does not equal NP, markets are probably not efficient. Specifically, markets become increasingly inefficient as the time series lengthens or becomes more frequent. An illustration by way of partitioning the excess returns to momentum strategies based on data availability confirms this prediction.

The paper has never been published to my knowledge, but is available on Arxiv here: https://arxiv.org/abs/1002.2284

ALSO: A DISCLAIMER: I AM NOT A FINANCIAL ADVISER AND CLAIM NO SPECIAL KNOWLEDGE OF THE FINANCIAL SECTOR. Most of my understanding of finances comes from self-study of asset classes and experience with my own funds, and it is quite possible that this understanding is partially or completely wrong. While I aim to provide correct information for the best of my ability, I do not intend to encourage or discourage any specific financial decisions in this post, and do not take responsibility for any financial decisions thus made.

I first learned about this paper in 2019 from a coworker. I did not have the chance to read it fully at the time, but I was intrigued by its claim that it could connect the free market, a prevalent concept in American economy and politics, to the P = NP problem, a notoriously difficult computer science problem that has incredible implications in computation and cryptography. In particular, I was impressed by its claim that the two were apparently pitted against each other:

The majority of financial academics believe in weak form efficiency and the majority of computer scientists believe that P ≠ NP. The result of this paper is that they cannot both be right: either P = NP and the markets are weakly efficient, or P ≠ NP and the markets are not weakly efficient. (Maymin 4)

Earlier this year, I took the time to read the paper in full and analyze its methods. While I would say the connection is still interesting, I don't really believe the result is valid, which would explain why the paper had never been published. In this post, I will discuss my analysis, while in Part 2, I will discuss some of my own ideas for how this relationship could be tested.

Background

Of the 30 or so double-spaced pages of the paper, the first 10 or so are spent providing background for the efficient market hypothesis and the P = NP problem, including definitions, literature review, and expert polling. I will paraphrase the definitions below.

The efficient market hypothesis claims that "all information relevant to future prices is immediately reflected in the current prices of assets". In other words, you cannot consistently make money by analyzing and exploiting patterns found in publicly available information. This hypothesis has multiple forms, and this essay focuses on the weakest form, which claims that future prices cannot be predicted by analyzing prices from the past (though other information is fair game).

The mechanism where this is specifically explored is the ability to find and exploit patterns. Essentially, if being able to find a consistent strategy for making money is difficult and inefficient, then a single actor in the market would find it and continually use it to make money. However, if more people are able to find the strategy, the strategy itself is affected and eventually collapses, ruining the effectiveness of technical analysis.

The P versus NP problem concerns itself with two "computational complexity classes" of problems:

P, short for "Polynomial", is the set of problems that a computer can solve from scratch in time polynomial to the length of the input n. For instance, sorting a list of n elements can be done by comparing every element to every other element, for a total of n² comparisons.
NP, short for "Nondeterministic Polynomial", is the set of problems for which a solution that has already been found can be verified in time polynomial to the length of the input n. For instance, sorting a list of n elements is also in NP, since you can run through the proposed sorted list and verify that everything is indeed in order. Another problem that we know is in NP is factoring numbers - we don't actually know how efficiently we can figure out the prime factorization of an arbitrarily large number, but if we are given a proposed prime factorization, we can quickly tell if it's right by doing long division.

The P versus NP problem asks if these two sets are equal, rather than P just being a subset of NP. There are a great many problems that we know to be in NP (such as the traveling salesman problem, knapsack problem, and graph coloring problem) that we do not know whether they are in P - learning that P = NP would mean that they could all be solved easily. Indeed, there is a set of problems in NP, known as NP-Complete problems, that can all be efficiently restated in terms of each other - if a polynomial-time solution to even one of them is found, then P = NP.

Analysis of the Paper

The paper argues its claim from two directions. It first argues that being able to efficiently find patterns in the market is an NP-Complete problem, meaning that the markets are efficient if P = NP. It then argues that if the market is efficient, then the market could be "programmed" with carefully placed orders to solve an arbitrarily large problem in 3-SAT, a known NP-Complete problem.

Analyzing the "Market Programming" Argument

We will start by discussing the second argument because, frankly, it is laughably bad. It does a fair enough job of setting up its mechanism: each variable of the 3-SAT expression is represented as a stock, and each clause is encoded by placing an "order-cancels-order" order, which groups together three buy or sell orders and requires that only one of them be filled. For instance, a clause of (a OR NOT b OR c) might be encoded as "buy 100 shares of KO, sell 100 shares of PEP, or buy 100 shares of KDP, order-cancels-order".

It places one of these orders for each clause, setting the price at the midpoint and setting the volume to the minimum possible in order to minimize stock disruption, proposes waiting a constant amount of time in order to give the market time to process the orders - and then gives a punchline that fails to expound on any of the market's actual mechanisms and instead reeks of smarm:

So what should the market do? If it is truly efficient, and there exists some way to execute all of those separate OCO orders in such a way that an overall profit is guaranteed, including taking into account the larger transactions costs from executing more orders, then the market, by its assumed efficiency, ought to be able to find a way to do so. (Maymin 25)

By itself, this suggests that the purpose of the paper is to defeat the efficient market hypothesis, not to simply provide an equivalence between the two as originally stated. More importantly, however, this response hides any of the pitfalls that might arise when giving the problem to solve to the market like this - in particular, the fact that since the market consists of multiple independent entities, the most rational decision for the market to make would be to execute orders in an inconsistent manner (i.e. taking one clause's buy order while taking a different clause's sell order for the same security).

Analyzing the "Efficient Analysis" Argument

The first argument, by contrast, is a little better thought-out. It focuses on a more specific aspect of the efficient market hypothesis, namely that if there is a particular pattern that can be found in the price data, it can be found "immediately". For instance, if it can be calculated that a set of three UP days is followed by another UP day far too often (and thus money can be made by buying on day 3), this pattern will be found quickly. (This is paired with the "no free lunch" assertion, which states that if the pattern becomes public knowledge, people will trade on it earlier and earlier and thus destroy its advantage).

To perform this prediction, we can first generalize a run of price change data (that is, the prices of a security at regular intervals along a given time scale) as simple UP or DOWN movements (e.g. a sequence of UP, UP, DOWN, UP, DOWN, UP, DOWN, etc.). We can then feed this sequence into a computer and ask it whether we should buy or sell (i.e. go long on the security or get out of it). If the sequence is kept to a fixed time interval t, then there are 2^t possible inputs, meaning that there are 2^t+1 total possible buy/sell recommendation programs you could write (since each one will map each of those price sequences to either going long or going out). We will call these programs "strategies".

We then take a critical profit value K (the amount of money that a strategy should make for it to be statistically significant, which would depend on the market's equilibrium), and a budget constraint B (the amount of capital you have available to spend), and ask the question: is there a strategy that, given a collection of n price series each of length t, makes at least K on each of them without ever exceeding the budget constraint B?

This is shown (mostly) to be equivalent to the knapsack problem, which is NP-Complete - each object is one of our time-series, whose size is the price of the security involved and whose value is the security's return, and the set of objects we put in our knapsack is the set of time-series that result in a "go long" decision. Hence, according to the paper, if P = NP, then we can efficiently find strategies even as the time series lengthen out and we consider more and more of them.

This argument kind of works. However, there are some issues with the argument that significantly weaken it.

The introduction of a budget constraint is an assumption drawn both from needing to consider multiple securities and having no access to leverage/debt. Any cursory look at how move-fast-and-break-things businesses operate (and the United States' gigantic debt ceiling) would tell you that this assumption is false - businesses and traders take seemingly endless debt all the time. (The paper does show that if this budget constraint does not exist, the problem of finding an efficient mapping is in P, which, if everything else about the argument works, suggests that the relative confidence of economists in the efficient market hypothesis comes from how easy it is to find strategies when you have endless cash to play with).
The analogue to the knapsack problem is not quite correct - the knapsack problem requires that the sum of the included items be less than the budget constraint, while this problem requires only that the maximum cost from any one time series be less than the budget constraint. I believe this is still NP-Complete, though I'm not sure.
The critical profit K is meant to encode all assumptions of the market's equilibrium. While the NP-completeness of finding a statstically significant pattern does not depend on how K is chosen, there is still the fact that choosing K is left as an exercise to the reader, and may indeed only be able to be approximated and never calculated exactly.
The computational ease of finding a consistent money-making technical strategy is only a problem if such a strategy exists, since the efficient market hypothesis is only challenged if some agents, by luck, are able to find a pattern that no one else can (according to the "no free lunch" assertion). Even if it is intractable for us to know whether or not there is an effective technical strategy, if there isn't one, then no one is actually getting a technical advantage and the efficient market hypothesis continues to hold.
Indeed, the ability to find a consistent technical strategy only matters if the "no free lunch" assertion is true! That is, if agents in the market don't actually try trading on a strategy earlier and earlier in order to profit from it (or worse, if the additional piling on of agents does not cause the strategy to collapse), then the ability to efficiently calculate patterns would not uphold the efficient market hypothesis. (I think this one is the least likely to be true, but should be mentioned anyway unless an emprical proof for this assertion exists).

That's it for Part 1! In Part 2, I will go further into a possible avenue for how to analyze the relationship between these two concepts, one involving simulation of agents in an actual market and the computational complexity in reacting to a piece of information that changes the value of a security.