A Dissection of NKS

Stephen Wolfram, myself, and coworkers at Harvard

Wolfram’s A New Kind of Science (NKS) probably tops the list of computer-science monoliths accessible to everyone. Unfortunately, the work doesn’t help itself with its grandiose title (although that did improve sales), and in many academic circles it was disparaged or shunned. However, Stephen is one of the most brilliant people I have ever worked with. If you can look past the seeming hubris of the cover page, the book contains some very interesting (if not paradigm-shifting) ideas, which I will try to explain in the fewest strokes possible.

Background

Every year since 2003, Stephen Wolfram and his group of instructors from Wolfram Research have organized a summer school; I participated in 2010 as a student and in 2011 as an instructor. Since 2003, more than 250 people have participated, some of whom went on to develop their three-week research projects into Master’s or Ph.D. theses or publications.

The primary subject of NKS is the study of simple abstract rules, that is, elementary computer programs. In the early 1980s Wolfram discovered a class of one-dimensional cellular automata which he called the elementary cellular automata (ECA). Simple programs are capable of a remarkable range of behavior, and NKS argues that this is evidence that simple programs are enough to capture the essence of almost any complex system.
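
To make "elementary" concrete, here is a minimal sketch (mine, not the book's, and in Python rather than the Wolfram Language): each cell is 0 or 1, its next value depends only on itself and its two nearest neighbors, and an 8-bit rule number encodes the entire update table in Wolfram's numbering scheme.

```python
def eca_rule_table(rule_number):
    """Decode an ECA rule number (0-255) into {(left, center, right): new_center}."""
    return {
        (left, center, right): (rule_number >> (4 * left + 2 * center + right)) & 1
        for left in (0, 1) for center in (0, 1) for right in (0, 1)
    }

# The 8 bits of 30 = 00011110, read off against the neighborhoods 111 down to 000.
for neighborhood, new_cell in sorted(eca_rule_table(30).items(), reverse=True):
    print(neighborhood, "->", new_cell)
```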

Generally speaking, NKS has three interpretations:

  1. The experimental science and enumeration of simple programs

  2. The technological application of interesting simple programs

  3. A razor for thinking of the universe in computational terms

The central thesis of NKS posits two ideas: that the nature of computation must be explored experimentally, and that the results of these experiments have relevance to understanding the natural world, which is assumed to be fundamentally digital. Wolfram conjectures that the universe can be modeled by a mobile automaton (a kind of "Turing spider") crawling over some graph-like structure at a subatomic level, perhaps far below the scale of quarks.

In physics and cosmology, digital physics is a collection of theoretical perspectives based on the premise that the world is, at heart, describable by information and is therefore computable. In fact, Edward Fredkin and Konrad Zuse pioneered the idea of a computable universe long ago. This is not to be confused with the term computational universe, which refers to the space of all possible computer programs.

 

Mapping and mining the computational universe

In order to study simple rules and their often complex behavior, Wolfram believes it is necessary to systematically explore all of these computational systems and document what they do. Traditional wisdom holds that the real world is complex, but that in formal deterministic systems everything is simple. Rule 30 shows that this is not the case. The richness of the world can be faithfully reproduced within simple symbolic systems or simple programs.
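
Here is Rule 30 itself, as a quick sketch (again mine, not code from the book): starting from a single black cell, an utterly simple local rule, new cell = left XOR (center OR right), produces a pattern whose right-hand half looks random.

```python
def rule30_step(row):
    # Cells beyond the current edges are treated as white (0).
    padded = [0, 0] + row + [0, 0]
    return [padded[i] ^ (padded[i + 1] | padded[i + 2])
            for i in range(len(padded) - 2)]

row = [1]                                   # a single black cell
for _ in range(20):
    print("".join("#" if cell else "." for cell in row).center(45))
    row = rule30_step(row)
```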

The randomness displayed in Rule 30 is completely deterministic. Indeed, in NKS randomness almost always refers to the outcome of a deterministic process. There are three mechanisms for randomness:

  1. Randomness from initial conditions

  2. Randomness from the environment

  3. Intrinsic randomness generation

Sensitive dependence on initial conditions in deterministic nonlinear systems has been well studied since Henri Poincaré’s work on the three-body problem in 1890. Randomness from the environment is not a true source of randomness because the environment is just another system; that is, the external environment of one system is the internal environment of a larger system. The third type of randomness was investigated by Wolfram and encountered on numerous occasions in his exploration of the computational universe. A common question is “why wasn’t intrinsic randomness discovered earlier?” Page 48 replies: “Indeed, even very early in the history of traditional mathematics there were already signs of the basic phenomenon of complexity. One example known for well over two thousand years concerns the distribution of prime numbers: the rules for generating primes are simple, yet their distribution seems in many respects random. However almost without exception mathematical work on primes has concentrated not on this randomness, but rather on proving the presence of various regularities in the distribution.”
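
As a small illustration of intrinsic randomness generation (the sketch is mine, not the book's), take the bits running down the center column of Rule 30: the process is fully deterministic and starts from the simplest possible initial condition, yet the resulting bit stream looks statistically random, and this center column was in fact long used as a pseudorandom generator.

```python
def rule30_step(row):
    padded = [0, 0] + row + [0, 0]
    return [padded[i] ^ (padded[i + 1] | padded[i + 2])
            for i in range(len(padded) - 2)]

def center_column(steps):
    row, bits = [1], []
    for _ in range(steps):
        bits.append(row[len(row) // 2])     # the cell directly below the seed
        row = rule30_step(row)
    return bits

bits = center_column(2000)
print("first 64 bits :", "".join(map(str, bits[:64])))
print("fraction of 1s:", sum(bits) / len(bits))     # hovers near 0.5
```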

The NKS strategy for exploring the computational universe can be summarized in six steps (a rough code sketch of steps 2, 3, and 5 follows the list):

  1. Identify a computational system to explore

  2. Devise an enumeration scheme for your system

  3. Construct an efficient implementation of your system

  4. Carry out a visual exploration of the space of system behavior

  5. Use machine learning to build detectors and filters for programmatic searching

  6. Investigate in more detail specific features and properties of selected rules
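
Here is that rough sketch for the 256 elementary rules. The filter is my own crude stand-in, a compression-ratio score rather than the machine-learned detectors of step 5, but it illustrates the idea of programmatically ranking a rule space by how interesting the behavior looks.

```python
import zlib

def eca_rule_table(rule_number):
    return {(l, c, r): (rule_number >> (4 * l + 2 * c + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def evolve(rule_number, steps=64):
    """Space-time history of the rule started from a single black cell."""
    table, row, history = eca_rule_table(rule_number), [1], []
    for _ in range(steps):
        history.append(row)
        padded = [0, 0] + row + [0, 0]
        row = [table[(padded[i], padded[i + 1], padded[i + 2])]
               for i in range(len(padded) - 2)]
    return history

def incompressibility(rule_number):
    """Crude complexity filter: how poorly does the space-time pattern compress?"""
    flat = bytes(cell for row in evolve(rule_number) for cell in row)
    return len(zlib.compress(flat)) / len(flat)

ranked = sorted(range(256), key=incompressibility, reverse=True)
print("least compressible rules:", ranked[:10])   # chaotic rules such as 30 and 45 tend to rank high
```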

 

Computational irreducibility

Loosely speaking, a process is computationally irreducible [pp. 737-750] if it cannot be computed by a faster process. An example of reducibility is Rule 90: given the initial condition and the coordinates of some cell, there is a quick procedure [pp. 608-609] to determine whether the cell will be black or white. On the other hand, irreducibility means that there can be no way to predict how a system will behave except by going through almost as many steps of computation as the evolution of the system itself [p. 739]. Rule 150 is computationally reducible because there exists a program that computes the color of a cell on row T faster than the automaton itself can compute it. Rule 30 is computationally irreducible because (as far as anyone knows) there is no such "shortcut" program. But we must ask whether intrinsic randomness and computational irreducibility are the same thing. I think that computational irreducibility implies intrinsic randomness generation. But intrinsic randomness generation does not imply computational irreducibility, because one can imagine modifying Rule 30 so that it runs more slowly: the slowed-down version still generates the same random-looking output, yet it is reducible, since ordinary Rule 30 computes that output faster.
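
To make the Rule 90 shortcut concrete, here is a sketch of a direct formula for the single-black-cell case (the Lucas-theorem formulation is mine; the book describes the shortcut in terms of the nested Pascal's-triangle-mod-2 pattern): the color of any cell can be computed straight from its coordinates, with no need to run the automaton row by row.

```python
def rule90_cell_direct(t, d):
    """Color of the cell at step t, offset d from the initial black cell."""
    if abs(d) > t or (t + d) % 2:
        return 0
    k = (t + d) // 2
    # C(t, k) is odd exactly when the binary digits of k are a subset of t's.
    return 1 if (t & k) == k else 0

def rule90_row_simulated(t):
    """Brute force: run Rule 90 (new cell = left XOR right) for t steps."""
    row = [1]
    for _ in range(t):
        padded = [0, 0] + row + [0, 0]
        row = [padded[i] ^ padded[i + 2] for i in range(len(padded) - 2)]
    return row

t = 32
direct = [rule90_cell_direct(t, d) for d in range(-t, t + 1)]
assert direct == rule90_row_simulated(t)    # the shortcut agrees with the full evolution
print("".join("#" if c else "." for c in direct))
```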

Wolfram posits a conjecture on computational irreducibility:

Conjecture - Let WorstTime(T, n) be the longest time that Turing machine T takes to halt over all inputs of length ≤ n on which it does halt (see computations of this on pp. 761, 763). Call T reducible if there is some Turing machine U with the same inputs and outputs for which

WorstTime(U, n)/WorstTime(T, n) → 0 as n → ∞.

Then, as the number of states grows to infinity, the fraction of reducible Turing machines vanishes.

The previous conjecture is an instance of the more general Principle of Computational Equivalence (PCE). The principle states that systems found in the natural world can perform computations up to a maximal universal level of computational power. Most systems can attain this level and in principle can compute the same things as a universal Turing machine.

  • Weak PCE: Almost every rule is either trivial or universal.

  • Strong PCE: Almost every computation is either trivial or universal.

Irreducibility is present within the theory of graphs; for instance, the theory of finite cubic planar graphs is undecidable. Consider a Turing machine with a two-way tape, a single head, and states s1, s2, · · · , sn. Let the tape have a certain initial configuration and allow the machine to run. Assume the machine performs a finite computation and halts. The main idea is to code the printout of all such computations as finite cubic planar graphs. Notice that in order to do this for a given machine we need only a fixed number of configurations to code our information, i.e. blank, s1, s2, · · · , sn, head, left end of the tape, right end of the tape, and halt. The requirement that the graphs be cubic appears to offer no more difficulty than that of guaranteeing this fixed number of unambiguous configurations. Thus, we can identify certain vertices with squares of the tape while the others sit in configurations describing what is in those squares and how the machine moves.

Theorem - The theory of finite cubic planar graphs is undecidable.

Proof: Consider a Turing machine as above. For each possible beginning position of the tape, consider the sentence in the language of graphs which says

(*) [there exists (x1, ..., xn) where the xi are in the configurations which represent this initial position] implies [there exists (z1, ..., zm) where the zi are in the halt configuration].

A solution to the decision problem for finite cubic planar graphs would imply a solution to the collection of all sentences (*), but these simply express the halting problem for the machine, thus proving the theorem. QED

Note that the PCE is not about functions or sets but rather concerns the repeated application of some updating procedure to some state, and the history of states that results. In these terms, the PCE states: for any updating procedure, either that procedure always yields simple results, or it can be made to emulate any computation via an appropriate choice of initial conditions. Computation is therefore simply a question of translating inputs and outputs from one system to another. Consequently, most systems are computationally equivalent. Proposed examples of such systems are the workings of the human brain and the evolution of weather systems.

Originally, in 1936, Alan Turing proposed that all functions which could be effectively computed by a human could also be computed by a Turing Machine. (Of course, the converse is obviously true: anything computed by a Turing Machine can be computed by a human.) Since that time, Turing Machines have become widely accepted as the formal definition of computation, and Turing’s claim that this definition conforms to our intuitive understanding of the concept of computation is known as the Church-Turing Thesis. Note that the Church-Turing Thesis, like all of science, cannot be definitively proved. It is an empirical claim supported by observation, like Newton's law of gravity or the laws of quantum mechanics. Just as it is conceivable that those laws could be proved wrong by future observations, so too could the Church-Turing Thesis be proved wrong by future observations (which it probably will be).

OK, so in every system there is some hidden threshold for complexity. And although it is often low, there are many things that fall below it and can therefore be reduced in one form or another. Traditional mathematics has very successfully explored what lies below this threshold. But most systems from traditional mathematics are in fact capable of showing irreducible complexity: the digits of real numbers, continued fractions, Diophantine equations, differential and difference equations, solvability by radicals, and so on. But phenomena such as the complexity of the digits of π were simply considered meaningless rather than indicative of a general class of behavior.

We know there is no general procedure to determine whether a computation halts. Nevertheless, there is a partial solution: if a computation does halt, then we can always verify that it halts by explicitly performing the computation and waiting. But could there be a more efficient procedure of which we are unaware? Wolfram suggests that the answer is no. One of his central claims is that there exist computational systems whose behavior is computationally irreducible, meaning that there is no shorter way to determine the system’s behavior than by explicitly following its computation. If computationally irreducible systems do indeed exist, then there is no faster way to determine whether an arbitrary computation halts than to explicitly perform that computation.
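
A tiny sketch of that partial solution (the toy program and step budget below are my own illustration): halting is semi-decidable, so if a computation does halt, simply running it long enough will eventually confirm the fact; if it never halts, no budget ever suffices and all we can do is keep waiting.

```python
def halts_within(program, state, budget):
    """Run `program` (a step function that returns None once halted) for at
    most `budget` steps.  True means it halted; False means we gave up."""
    for _ in range(budget):
        state = program(state)
        if state is None:
            return True
    return False

# A toy computation: repeatedly halve an integer, halting (None) at zero.
halve = lambda n: None if n == 0 else n // 2

print(halts_within(halve, 10**30, budget=1000))   # True -- it does halt
```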

Computational Universality

Wolfram discovered the ECA decades earlier, and the fact that these simple systems are universal was, in my opinion, the most interesting point in the book. However, the PCE and many other claims in the book hinge on the strength of classical computation, and on the claim that any universal system is equivalent to Turing machines in computational power.

Whenever a system is equivalent to Turing machines in its computational power, we say that it is computationally universal.

Here the name universal derives from the Church-Turing thesis, which claims that Turing machines attain the upper limit of computational sophistication, i.e. they can compute anything that can in principle be computed.

The Strong Church-Turing thesis goes further than this equivalence between the mathematical concept of algorithmic computation and the Turing machine: it asserts that all universal computers can efficiently simulate each other, i.e. with at most polynomial overhead.

The technique used to show that one system is equivalent in computational power to another system is called emulation. The following graphic shows a cellular automaton (on the right) emulating a particular Turing Machine (on the left).

[Image popup_1.jpg: a cellular automaton (right) emulating a Turing machine (left)]

To perform the emulation, there are three key ideas:

  1. Associate the Turing machine’s tape with the cellular automaton’s line of cells.

  2. Associate the white and black cells in the Turing machine with white and light-gray cells in the cellular automaton.

  3. Identify the Turing machine’s head with a single cell in the cellular automaton, where the color of the cell encodes the state of the head together with the color that the head is reading.

The Turing machine halts when the head (or the cell simulating the head) moves left of its initial position. It is clear that the behavior of this cellular automaton corresponds directly to the behavior of the Turing machine.
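
Here is a sketch of that construction (the two-state transition table below is made up for illustration; it is not the machine in the picture): the tape becomes a row of cells, ordinary cells carry a tape symbol, and the single head cell carries the pair (state, symbol being read), exactly as in the three ideas above. Every cell is then updated by a purely local, radius-1 rule.

```python
# delta[(state, symbol)] = (symbol_to_write, move, next_state); move is -1 or +1.
# This transition table is illustrative only.
delta = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (0, +1, "A"),
}

def is_head(cell):
    return isinstance(cell, tuple)          # head cells are (state, symbol) pairs

def update(left, center, right):
    """The local CA rule: each cell looks only at itself and its two neighbors."""
    if is_head(center):                     # the head writes its symbol and leaves
        state, symbol = center
        write, _, _ = delta[(state, symbol)]
        return write
    if is_head(left):                       # a head to our left moving right arrives here
        state, symbol = left
        _, move, nxt = delta[(state, symbol)]
        if move == +1:
            return (nxt, center)
    if is_head(right):                      # a head to our right moving left arrives here
        state, symbol = right
        _, move, nxt = delta[(state, symbol)]
        if move == -1:
            return (nxt, center)
    return center                           # ordinary tape cell: unchanged

def ca_step(cells):
    padded = [0] + cells + [0]
    return [update(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

cells = [0] * 8 + [("A", 0)] + [0] * 8      # blank tape, head in state A
for _ in range(8):
    print("".join("H" if is_head(c) else str(c) for c in cells))
    cells = ca_step(cells)
```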

Having had a small taste of the intuition behind emulation, here is the statement of the big universality result in NKS (the proof itself fills many pages of the book):

Two colors and nearest-neighbor rules are sufficient for producing universality in a one-dimensional cellular automaton (2002, pp. 675-691). Specifically, the Rule 110 elementary cellular automaton was proven universal by Matthew Cook, with the proof published in 2004.

So even one-dimensional cellular automata can be universal, which is neat, but sadly the philosophical ramifications are not quite as astounding as Wolfram thinks.

Where it Falls Down

The real reason Wolfram’s core premise breaks down is quantum supremacy. Stephen believes that practical quantum computation is impossible and that no amount of error correction will allow scaling beyond some tens of qubits.

To be fair, supremacy hasn’t actually been established (yet), but drawing that conclusion now is far too hasty. Indeed, Google has already constructed such a processor with 72 qubits! I predict that within the next few years the Strong Church-Turing Thesis will be disproven, and general-purpose quantum computers will soon solve many search problems exponentially faster than our quaint central processing units.