## The Investors vs. the Tabby

There's an amusing article making the rounds of the internet today, about the successful investment strategy of a cat named Orlando.

A group of people at the Observer put together a fun experiment.
They asked three groups to pretend that they had 5000 pounds, and asked each of them to invest it, however they wanted, in stocks listed on the FTSE. They could only change their investments at the end of a calendar quarter. At the end of the year, they compared the result of the three groups.

Who were the three groups?

1. The first was a group of professional investors - people who are, at least in theory, experts at analyzing the stock market and using that analysis to make profitable investments.
2. The second was a classroom of students, who are bright, but who have no experience at investment.
3. The third was an orange tabby cat named Orlando. Orlando chose stocks by throwing his toy mouse at a target board randomly marked with investment choices.

As you can probably guess by the fact that we're talking about this, Orlando the tabby won, by a very respectable margin. (Let's be honest: if the professional investors came in first, and the students came in second, no one would care.) At the end of the year, the students had lost 160 pounds on their investments. The professional investors ended with a profit of 176 pounds. And the cat ended with a profit of 542 pounds - more than triple the profit of the professionals.

Most people, when they saw this, had an immediate reaction: "see, those investors are a bunch of idiots. They don't know anything! They were beaten by a cat!"
And on one level, they're absolutely right. Investors and bankers like to present themselves as the best of the best. They deserve their multi-million dollar earnings because, so they tell us, they're more intelligent, harder-working, and more insightful than the people who earn less. And yet, despite their self-alleged brilliance, professional investors can't beat a cat throwing a toy mouse!

It gets worse, because this isn't a one-time phenomenon: there've been similar experiments that selected stocks by throwing darts at a news-sheet, or by rolling dice, or by picking slips of paper from a hat. Many times, when people have done these kinds of experiments, the experts don't win. There's a strong implication that "expert investors" are not actually experts.

Does that really hold up? Partly yes, partly no. But mostly no.

Before getting to that, there's one thing in the article that bugged the heck out of me: the author went out of their way to defend the humans, presenting their performance as if positive outcomes were due to human intelligence, and negative ones were due to bad luck. In fact, I think that in this experiment, it was all luck.

For example, the author discusses how the professionals were making more money than the cat up to the last quarter of the year, and presents that as human intelligence outperforming the random cat. But there's no reason to believe that. There's no evidence that there's anything qualitatively different about the last quarter that made it less predictable than the first three.

The headmaster at the students' school actually said "The mistakes we made earlier in the year were based on selecting companies in risky areas. But while our final position was disappointing, we are happy with our progress in terms of the ground we gained at the end and how our stock-picking skills have improved." Again, there's absolutely no reason to believe that the students' stock-picking skills miraculously improved in the final quarter; it's much more likely that they just got lucky.

The real question that underlies this is: is the performance of individual stocks in a stock market actually predictable, or is it dominantly random? Most of the evidence that I've seen suggests that there's a combination: on a short timescale, it's predominantly random, but on longer timescales it becomes much more predictable.

But people absolutely do not want to believe that. We humans are natural pattern-seekers. It doesn't matter whether we're talking about financial markets, pixel-patterns in a bitmap, or answers on a multiple choice test: our brains look for patterns. If you randomly generate data, and you look at it long enough, with enough possible strategies, you'll find a pattern that fits. But it's an imposed pattern, and it has no predictive value. It's like the images of Jesus on toast: we see patterns in noise. So people see patterns in the market, and they want to believe that it's predictable.

Second, people want to take responsibility for good outcomes, and excuse bad ones. If you make a million dollars betting on a horse, you're going to want to say that it was your superior judgement of the horses that led to your victory. When an investor makes a million dollars on a stock, of course he wants to say that he made that money because he made a smart choice, not because he made a lucky choice. But when that same investor loses a million dollars, he doesn't want to say that he lost it because he's stupid; he wants to say that he lost money because of bad luck - random factors beyond his control that he couldn't predict.

The professional investors were doing well during part of the year: therefore, during that part of the year, they claim that their good performance was because they did a good job judging which stocks to buy. But when they lost money during the last quarter? Bad luck. But overall, their knowledge and skills paid off! What evidence do we have to support that? Nothing: but we want to assert that we have control, that experts understand what's going on, and are able to make intelligent predictions.

The students' performance was lousy, and if they had invested real money, they would have lost a tidy chunk of it. But their teacher believes that their performance in the last quarter wasn't luck - it was that their skills had improved. Nonsense! They were lucky.

On the general question: Are "experts" useless for managing investments?

It's hard to say for sure. In general, experts do perform better than random, but not by a huge margin - certainly not by as much as they'd like us to believe. The Wall Street Journal used to run an experiment where they compared dartboard stock selection against human experts, and against passive investment in the Dow Jones Index stocks, over a one-year period. The pros won 60% of the time. That's better than chance: the experts' knowledge and skills were clearly benefiting them. But blindly throwing darts at a wall could beat the experts two times out of five!

When you actually do the math and look at the data, it appears that human judgement does have value. Taken over time, human experts do outperform random choices, by a small but significant margin.

What's most interesting is a time-window phenomenon. In most studies, human performance relative to random choice is directly related to the amount of time that the investment strategy is followed: the longer the timeframe, the better the humans perform. In daily investments, like day-trading, most people don't do any better than random: the performance of day-traders is pretty much in line with what you'd expect from random choice. Monthly, it's still mostly a wash. But if you look at yearly performance, you start to see a significant difference: humans typically outperform random choice by a small but definite margin. If you look at longer timeframes, like five or ten years, then you start to see really sizeable differences. The data makes it look like daily fluctuations of the market are chaotic and unpredictable, but that there are long-term trends that we can identify and exploit.

## Define Distance Differently: the P-adic norm

As usual, sorry for the delay. Most of the time when I write this blog, I'm writing about stuff that I know about. I'm learning about the p-adic numbers as I write these posts, so it's a lot more work for me. But I think many of you will be happy to hear about the other reason for the delay: I've been working on a book based on some of my favorite posts from the history of this blog. It's nearly ready! I'll have more news about getting pre-release versions of it later this week!

But now, back to p-adic numbers!

In the last post, we looked a bit at the definition of the p-adic numbers, and how p-adic arithmetic works. But what makes p-adic numbers really interesting and valuable is metrics.

Metrics are one of those ideas that are simultaneously simple and astonishingly complicated. The basic concept of a metric is straightforward: I've got two numbers, and I want to know how far apart they are. But it becomes complicated because it turns out that there are many different ways of defining metrics. We tend to think of metrics in terms of Euclidean geometry and distance - the words that I use to describe metrics, like "how far apart", come from our geometric intuition. In math, though, you can't ever just rely on intuition: you need to be able to define things precisely. And precisely defining a metric is difficult. It's also fascinating: you can create the real numbers from the integers and rationals by defining a metric, and the metric will reveal the gaps between the rationals. Completing the metric - filling in those gaps - gives you the real numbers. Or, in fact, if you fill them in differently, the p-adic numbers.

To define just what a metric is, we need to start with fields and norms. A field is an abstract algebraic structure which describes the behavior of numbers. It's an abstract way of talking about the basic structure of numbers with addition and multiplication operations. I've talked about fields before, most recently when I was debunking the crackpottery of E. E. Escultura here.

A norm is a generalization of the concept of absolute value. If you've got a field $F$, then a norm on $F$ is a function $|\cdot|$ from values in $F$ to non-negative real numbers, with the following properties:

1. $|x| = 0$ if and only if $x = 0$.
2. $|xy| = |x| \cdot |y|$.
3. $|x + y| \le |x| + |y|$.

A norm on $F$ can be used to define a distance metric between $x$ and $y$ in $F$ as $d(x, y) = |x - y|$.

For example, the absolute value is clearly a norm over the real numbers, and it defines the Euclidean distance between them.

So where do the gaps come from?

You can define a sequence $a$ of values in $F$ as $a = \{ a_1, a_2, a_3, \ldots \}$ for some set of values $a_i \in F$. There's a special kind of sequence called a Cauchy sequence, which is a sequence where $\lim_{i,j \to \infty} |a_i - a_j| = 0$.

You can show that any Cauchy sequence converges to a real number. But even if every element of a Cauchy sequence is a rational number, it's pretty easy to show that many (in fact, most!) Cauchy sequences do not converge to rational numbers. There's something in between the rational numbers which Cauchy sequences of rational numbers can converge to, but it's not a rational number. When we talk about the gaps in the rational numbers, that's what we mean. (Yes, I'm hand-waving a bit, but getting into the details
would be a distraction, and this is the basic idea!)
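To make that concrete, here's one such sequence, built from the decimal expansion of $\sqrt{2}$ (this particular example is mine, not part of the definition):

$$a = \left\{ 1,\ \frac{14}{10},\ \frac{141}{100},\ \frac{1414}{1000},\ \frac{14142}{10000},\ \ldots \right\}$$

Every element is rational, and the differences $|a_i - a_j|$ shrink toward zero as $i$ and $j$ grow, so it's a Cauchy sequence of rationals; but the value it's converging to, $\sqrt{2}$, isn't rational. That's one of the gaps.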

When you're playing with number fields, the fundamental choice that you get is just how to fill in those gaps. If you fill them in using a metric based on a Euclidean norm, you get the real numbers. What makes the p-adic numbers is just a different norm, which defines a different metric.

The idea of the p-adic metric is that there's another way of describing the distance between numbers. We're used to thinking about distance measured like a ruler on a numberline, which is what gives us the reals. For the p-adics, we're going to define distance in a different way, based on the structure of numbers. The way that the p-adic metric works is based on how a number is built relative to the prime-number base.

We define the p-adic metric in terms of the p-adic norm exactly the way that we defined Euclidean distance in terms of the absolute value norm. For the p-adic numbers, we start off with a norm on the integers, and then generalize it. In the p-adic integers, the norm of a number is based on the largest power of the base that's a factor of that number: for an integer $n$, if $p^k$ is the largest power of $p$ that's a factor of $n$, then the p-adic norm of $n$ (written $|n|_p$) is $p^{-k}$. So the more times you multiply a number by the p-adic base, the smaller the p-adic norm of that number is.

The way we apply that to the rationals is to extend the definition of p-factoring: if $p$ is our p-adic base, then we can define the p-adic norm of a rational number $x$ as:

• $|x|_p = 0$ if and only if $x = 0$.
• For other rational numbers $x$: $|x|_p = p^{-\text{ord}_p(x)}$, where:
• If $x$ is a natural number, then $\text{ord}_p(x)$ is the exponent of the largest power of $p$ that divides $x$.
• If $x$ is a rational number $a/b$, then $\text{ord}_p(x) = \text{ord}_p(a) - \text{ord}_p(b)$.

Another way of saying that is based on a property of rational numbers and primes. For any prime number $p$, you can take any rational number $x$, and represent it as a p-based ratio $x = p^k \frac{a}{b}$, where neither $a$ nor $b$ is divisible by $p$. That representation is unique - there's only one possible set of values for $k$, $a$, and $b$ where that's true. In that case, the p-adic norm of $x$ is $|x|_p = p^{-k}$.
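To make the definition concrete, here's a minimal sketch of it in Scala (the names are my own; it assumes $p$ is prime and the rational is given as a non-zero numerator and denominator):

```scala
// ordP(n, p): the exponent of the largest power of p that divides n (n != 0).
def ordP(n: BigInt, p: Int): Int =
  if (n % p != 0) 0 else 1 + ordP(n / p, p)

// The p-adic norm of the rational a/b: p^-(ordP(a) - ordP(b)), and 0 for 0.
def pAdicNorm(a: BigInt, b: BigInt, p: Int): Double =
  if (a == 0) 0.0
  else math.pow(p, -(ordP(a.abs, p) - ordP(b.abs, p)))

// pAdicNorm(35, 2, 7) == 1.0 / 7   (35/2 = 7^1 * 5/2, so the norm is 7^-1)
// pAdicNorm(9, 1, 3)  == 1.0 / 9   (9 = 3^2, so the norm is 3^-2)
```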

Ok, that's a nice definition, but what on earth does it mean?

Two p-adic numbers $x$ and $y$ are close together if $x - y$ is divisible by a large power of $p$.

In effect, this is the exact opposite of what we're used to. In the real numbers, written out in decimal form as a series of digits, the metric says that the more digits numbers have in common moving from left to right, the closer together they are. So 9999 and 9998 are closer than 9999 and 9988.

But with P-adic numbers, it doesn't work that way. The P-adic numbers are closer together if, moving right to left, they have a common prefix. The distance ends up looking very strange. In 7-adic, the distance between 56666 and 66666 is smaller than the distance between 66665 and 66666!
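You can check that against the norm definition. Reading the numerals as base-7 digits, $66666_7 - 56666_7 = 10000_7 = 7^4$, while $66666_7 - 66665_7 = 1$. So:

$$|66666_7 - 56666_7|_7 = |7^4|_7 = 7^{-4}, \qquad |66666_7 - 66665_7|_7 = |1|_7 = 1$$

and the first pair really is much closer together in the 7-adic metric.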

As strange as it looks, it does make a peculiar kind of sense. The p-adic distance is measuring a valuable and meaningful kind of distance between numbers - their distance in terms of their relationship to the base prime number $p$. That leads to a lot of interesting stuff, much of which is, to be honest, well beyond my comprehension! For example, Wiles' proof of Fermat's last theorem uses properties of the p-adic metric!

Without getting into anything as hairy as FLT, there are still ways of seeing why the p-adic metric is valuable. Next post, we'll look at something called Hensel's lemma, which both shows how something like Newton's method for root-finding works in the p-adic numbers, and also shows some nice algebraic properties of root-finding that aren't nearly as clear for the real numbers.

I've been having a pretty rough month. The day after Thanksgiving, I was diagnosed with shingles. Shingles is painful - very, very painful. Just the friction of clothing brushing over the shingles when I moved caused so much pain that I spent weeks being as immobile as possible.

Needless to say, this has been extremely boring. When I wasn't fuzzed out on pain-killers, I've been doing some reading on something called p-adic numbers. p-adics are a very strange kind of number: they're strange-looking, strange-acting, and strangely useful.

P-adics are an alternative to the real numbers. In fact, in a way, they are the alternative to real numbers.

If you go back to number theory, there's a reason why the rational numbers aren't sufficient, and why we have to extend them to the reals by adding irrational numbers. If you take all of the possible rational numbers, you find that there are gaps - places where we know that there must be a number, and yet, if we limit ourselves to fractions, there isn't anything to fit. For example, we know that there must be some number where, if we multiply it by itself, it's equal to 2. It's more than 14/10ths, by about 1/100th. It's more than 141/100ths, by about 4/1,000ths. It's more than 1,414/1,000ths, by about 2/10,000ths. No matter how far we go, there's no fraction that's exactly right. But we know that there's a number there! If we look at those gaps carefully, we'll find that most numbers are actually in those gaps! The real numbers and the p-adics are both ways of creating number systems that allow us to define the numbers that fit into those gaps.

The easiest way to understand p-adic numbers is to think about how we represent real numbers in base-P. In real numbers, we represent them using the powers of the base. So, for example, we'd write 24.31 in base-5 to mean 2·5^1 + 4·5^0 + 3·5^-1 + 1·5^-2. Using our normal notation for real numbers, we fill in the gaps by writing numbers as an integer part, followed by a decimal point, followed by a fractional part. For real numbers that aren't rationals, we say that the fractional part goes on forever. So the square root of two starts 1.4142135624..., and continues on and on, forever, without repeating. That gives us the real numbers. In that system, every number can be written using a finite number of digits to the left of the decimal point, and an infinite number of digits to the right.

P-adic numbers are exactly the opposite: every P-adic number has a finite number of digits to the right of the decimal, but it can have an infinite number to the left!

Defining p-adic numbers starts off being pretty similar to how we compute the representation of numbers in a standard numeric base. Take a number for your base, like 10. For a number n in base 10, take n modulo 10. That's the right-most digit of your number. Divide by ten, dropping the fractional part, and take the result modulo 10, and that's the second-rightmost digit. Continue this until there's nothing left.

For example, take the number 222 in base-10. If we wanted to represent that in base-7, we'd do:

1. If we divide 222 by 7, we get 31 with a remainder of 5. So the rightmost digit is 5.
2. We take 31, and divide it by 7, giving us 4, with a remainder of 3. So the second digit is 3.
3. We're left with 4, so the last digit is 4.

So - 222 in base-7 is 435. It's the same in 7-adic. For a particular base B, all positive integers are written the same in both base-B and B-adic. Integer arithmetic that doesn't involve negative numbers is also the same.
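Here's that digit-by-digit procedure as a quick Scala sketch (my own illustration, for positive integers):

```scala
// Peel off base-p digits right to left: n mod base is the rightmost digit,
// then integer-divide by the base and repeat until nothing is left.
def digits(n: Int, base: Int): List[Int] =
  if (n == 0) Nil else (n % base) :: digits(n / base, base)

// digits(222, 7) == List(5, 3, 4): the digits right-to-left, i.e. 435 in base 7.
```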

There's one really big catch, and it's one that would make a lot of cranks (like E. E. Escultura) very happy: for real numbers, the decimal notation (for example) is just a representation. 35 in base 10 is written differently from 43 in base 8, but they're the same actual number, just written differently. 35 in 10-adic is not the same number as 43 in 8-adic. As we'll see when we get to metrics, they're quite different. Each p-adic base produces a distinct system of p-adic numbers, and you can't convert between them as freely as you can in the conventional reals. Decimal notation and hexadecimal notation are ways of writing numbers in the same number system; 2-adic and 3-adic are different number systems!

The first essential difference between p-adic numbers and the reals comes when you try to do arithmetic.

As I said above, for integers, if you don't
do anything with negative numbers, P-adic arithmetic is the same as real number integer arithmetic. In fact, if you're working with P-adic numbers, there are no negative numbers at all! In a P-adic number system, subtracting 1 from 0 doesn't give you -1. It "rolls over" like an infinite odometer. So for 7-adic, 0-1 = ...666666666! That means that arithmetic gets a bit different. Actually, it's really pretty easy: you just do arithmetic from the right to the left, exactly the way that you're used to.

For addition and subtraction, P-adic works almost like normal real-number arithmetic using decimals. Addition is basically what you know from decimal arithmetic. Just go right to left, adding digits, and carrying to the left.

So, for example, in 5-adic, if you have a number ...33333, and 24, to add them, you'd go through the following steps.

1. 3 + 4 is 7, which is 12 in base-5. So the first digit of the sum is 2, and we carry 1.
2. 3 + 2 is 5, plus the carried 1 is 6 - which is 11 in base-5. So the second digit is 1, and we again carry 1.
3. 3 + 0 is 3, plus the carried 1 is 4, so the third digit is 4.
4. For all the rest of the infinite digits streaming off to the left, it's 3 + 0 = 3.

So the sum is ...3333412.
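Since a p-adic integer can have infinitely many digits streaming off to the left, one natural way to sketch this in code is as an infinite lazy stream of digits, rightmost first. This is just my own illustration of the right-to-left carrying algorithm:

```scala
// A p-adic integer as an infinite stream of base-p digits, rightmost first.
type Padic = LazyList[Int]

// Grade-school addition, right to left, propagating the carry forever.
def add(a: Padic, b: Padic, p: Int, carry: Int = 0): Padic = {
  val s = a.head + b.head + carry
  (s % p) #:: add(a.tail, b.tail, p, s / p)
}

val allThrees: Padic  = LazyList.continually(3)               // ...33333
val twentyFour: Padic = 4 #:: 2 #:: LazyList.continually(0)   // digits of 24, base 5

// add(allThrees, twentyFour, 5).take(7).toList
//   == List(2, 1, 4, 3, 3, 3, 3), i.e. ...3333412, matching the steps above.
```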

To do subtraction, it's still basically the same as what you're used to from decimal. There's just one simple change: infinite borrowing. In normal subtraction, you can borrow from the position to your left if there's anything to your left to borrow from. For example, in decimal, if you wanted to subtract 9 from 23, you'd borrow 10 from the 2, then subtract 9 from 13, giving you a result of 14. But if you wanted to subtract 3 - 9, you couldn't borrow, because there's nothing to the left to borrow from. In p-adic, you can always borrow. If you're subtracting 3 - 9 in 10-adic, then you can borrow from the 0 to the left of the 3. Of course, there's nothing there - so it needs to borrow from its left. And so on - giving you an infinite string of 9s. So 3 - 9 in 10-adic gives you ...999999994.

Let's do a full example: ...33333 - 42 in 5-adic.

1. As always, we start from the right. 3 - 2 = 1, so the first digit is 1.
2. Since 3 is smaller than 4, we need to borrow 1 - so we have 13 base 5, which is 8. 8 - 4 = 4. So
the second digit is 4.
3. For the third digit, we just subtract the borrowed 1, so it's 2.

So the result is ...3333241.

Multiplication and division get even stranger in P-adic. Because we can't have an infinite number of digits to the right of the decimal, p-adic ends up representing fractions using infinite numbers of digits on the left of the decimal. And that means that we get collisions between fractions and negative numbers. (This should start to give you a clue why each p-adic base is really a different number system: the correspondence between roll-over negatives and infinitely long fractions is different in each base.) It's easiest to see how this works by looking at a concrete example.

The fraction 1/3 can't be written as a finite-length string in base-5. In 5-adic, that means we can't write it using digits to the right of the decimal point - we would need an infinite number of digits, and we're not allowed to do that. Instead, we need to write it with an infinite number of digits to the left! 1/3 in 5-adic is: ...1313132.

Looks crazy, but it does work: if you do a tabular multiplication, right to left, multiplying ...1313132 by 3 gives you one! Let's work it through:

• Start from the right: the rightmost digit is 2. 3*2 is 6, which is 11 in base 5; so the rightmost digit of the result is 1, and you carry a 1.
• The next digit is 3: 3 times 3 is 9, plus the carried 1 gives 10 - which is 20 in base-5, so the next digit is a 0, and you carry 2.
• The next digit is 1: 3*1 is 3, plus the carried 2 is 5 - which is 10 in base-5; so the digit is 0, and you carry 1.

And so on - ...1313132 * 3 = 1, so ...1313132 == 1/3 in 5-adic.

How can we compute the value of 1/3? Just like decimal real numbers: division.

Division in p-adics is actually easy. The trick is that, like all of the other arithmetic, it goes from right to left. Suppose we want to divide N by M. To make it easy, we'll talk about the digits of M and N using subscripts. The rightmost digit of a number X is X1; the second-rightmost is X2, etc. The division algorithm is:

1. Start at the rightmost digit of both numbers.
2. Find the smallest number d which, multiplied by M, has N1 as its rightmost digit.
3. Subtract d*M from N.
4. Drop the trailing zero from N, giving N'.
5. Now divide N' by M, and put d on the right.

Let's walk through 1/3 in 5-adic:

• The rightmost digit of 1 is 1.
• What, multiplied by 3 will have a trailing digit of 1 in base-5? 2*3=6, which is 11 in base-5. So d = 2.
• Now we subtract the "11" from 1 - which is really ...00001. So it becomes ...444440.
• We drop the trailing 0, and N' is ...4444.
• So now we divide ...4444 by 3. What's the smallest number which, multiplied by 3, ends in a 4 in base-5? 3*3=9, which is 14 in base-5. So the next digit of the result is 3.
• Now, we subtract 14 from ...4444. That gives us ...44430. Drop the zero, and it's ...4443
• Next digit is a 1, leaving ...444

Crazy, huh? There is one catch about division: it only really works if the p-base in your p-adic system is a prime number. Otherwise, you get trouble - your p-adic system of numbers isn't a field if p is non-prime.
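Here's that right-to-left division as a toy Scala sketch - my own illustration, computing the first k digits of 1/m in base p, for a prime p that doesn't divide m:

```scala
// Each round: pick the digit d whose product d*m ends in the current
// rightmost digit of n, subtract d*m, then divide by p and repeat.
def inverseDigits(m: Int, p: Int, k: Int): List[Int] = {
  def step(n: BigInt, remaining: Int): List[Int] =
    if (remaining == 0) Nil
    else {
      val target = (((n % p) + p) % p).toInt             // rightmost digit of n
      val d = (0 until p).find(d => (d * m) % p == target).get
      d :: step((n - d * m) / p, remaining - 1)
    }
  step(BigInt(1), k)
}

// inverseDigits(3, 5, 7) == List(2, 3, 1, 3, 1, 3, 1): digits right-to-left,
// i.e. ...1313132, which is 1/3 in 5-adic, just as in the walkthrough above.
```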

If you think about it, arithmetic with the P-adics is actually simpler than it is with conventional real numbers. Everything goes right to left. It's all more uniform. In fact, p-adic has been seriously proposed as a number representation for computer hardware, because the hardware is much easier to build when everything can be done uniformly right to left!

There's a lot more weirdness in the p-adics. We haven't even started to talk about metrics and distances in p-adic numbers. That's both where the p-adics get even stranger, and where they actually get useful. That'll be the next post, hopefully within a day or two!

## Everyone should program, or Programming is Hard? Both!

I saw something on twitter a couple of days ago, and I promised to write this blog post about it. As usual, I'm behind on all the stuff I want to do, so it took longer to write than I'd originally planned.

My professional specialty is understanding how people write programs. Programming languages, development environments, code management tools, code collaboration tools, etc. - that's my bread and butter.

So, naturally, this ticked me off.

The article starts off by, essentially, arguing that most of the programming tutorials on the web stink. I don't entirely agree with that, but to me, it's not important enough to argue about. But here's where things go off the rails:

But that's only half the problem. Victor thinks that programming itself is broken. It's often said that in order to code well, you have to be able to "think like a computer." To Victor, this is absurdly backwards - and it's the real reason why programming is seen as fundamentally "hard." Computers are human tools: why can't we control them on our terms, using techniques that come naturally to all of us?

For some reason, many people have this bizarre idea that programming is a really easy thing that programmers just make difficult out of spite, or elitism, or cluelessness, or something - I'm not sure what. And as long as I've been in the field, there's been a constant drumbeat of people saying that it's all easy, that programmers just want to make it difficult by forcing you to think like a machine. That what we really need to do is just humanize programming, and then it will all be easy, and everyone will do it, and the world will turn into a perfect computing utopia.

First, the whole "think like a machine" think is a verbal shorthand that attempts to make programming as we do it sound awful. It's not just hard to program, but those damned engineers are claiming that you need to dehumanize yourself to do it!

To be a programmer, you don't need to think like a machine. But you need to understand how machines work. To program successfully, you do need to understand how machines work - because what you're really doing is building a machine!

When you're writing a program, on a theoretical level, what you're doing is designing a machine that performs some mechanical task. That's really what a program is: it's a description of a machine. And what a programming language is, at heart, is a specialized notation for describing a particular kind of machine.

No one will go to an automotive engineer, and tell him that there's something wrong with the way transmissions are designed, because they make you understand how gears work. But that's pretty much exactly the argument that Victor is making.

How hard is it to program? That all depends on what you're trying to do. Here's the thing: the complexity of the machine that you need to build is what determines the complexity of the program. If you're trying to build a really complex machine, then a program describing it is going to be really complex.

Period. There is no way around that. That is the fundamental nature of programming.

In the usual argument, one thing that I constantly see is something along the lines of "programming isn't plumbing: everyone should be able to do it". And my response to that is: of course they should. Just like everyone should be able to do their own plumbing.

That sounds like an amazingly stupid thing to say. Especially coming from me: the one time I tried to fix my broken kitchen sink, I did over a thousand dollars worth of damage.

But: plumbing isn't just one thing. It's lots of related but different things:

• There are people who design plumbing systems for managing water distribution and waste disposal for an entire city. That's one aspect of plumbing. And that's an incredibly complicated thing to do, and I don't care how smart you are: you're not going to be able to do it well without learning a whole lot about how plumbing works.
• Then there are people who design the plumbing for a single house. That's plumbing, too. That's still hard, and requires a lot of specialized knowledge, most of which is pretty different from the city designer.
• There are people who don't design plumbing, but are able to build the full plumbing system for a house from scratch using plans drawn by a designer. Once again, that's still plumbing. But it's yet another set of knowledge and skills.
• There are people who can come into a house when something isn't working, and without ever seeing the design, and figure out what's wrong, and fix it. (There's a guy in my basement right now, fixing a drainage problem that left my house without hot water, again! He needed to do a lot of work to learn how to do that, and there's no way that I could do it myself.) That's yet another set of skills and knowledge - and it's still plumbing.
• There are non-professional people who can fix leaky pipes, and replace damaged bits. With a bit of work, almost anyone can learn to do it. Still plumbing. But definitely: everyone really should be able to do at least some of this.
• And there are people like me who can use a plumbing snake and a plunger when the toilet clogs. That's still plumbing, but it requires no experience and no training, and absolutely everyone should be able to do it, without question.

All of those things involve plumbing, but they require vastly different amounts and kinds of training and experience.

Programming is exactly the same. There are different kinds of programming, which require different kinds of skills and knowledge. The tools and training methods that we use are vastly different for those different kinds of programming - so different that for many of them, people don't even realize that they are programming. Almost everyone who uses computers does do some amount of programming:

• When someone puts together a presentation in powerpoint, with things that move around, appear, and disappear on your command: that is programming.
• When someone puts formula into a spreadsheet: that is programming.
• When someone builds a website - even a simple one - and uses either a set of tools, or CSS and HTML, to put the site together: that is programming.
• When someone writes a macro in Word or Excel: that is programming.
• When someone sets up an autoresponder to answer their email while they're on vacation: that is programming.

People like Victor completely disregard those things as programming, and then gripe about how all programming is supercomplexmagicalsymbolic gobbledygook. Most people do write programs without knowing about it, precisely because they're doing it with tools that present the programming task as something that's so natural to them that they don't even recognize that they are programming.

But on the other hand, the idea that you should be able to program without understanding the machine you're using or the machine that you're building: that's also pretty silly.

When you get beyond the surface, and start to get to doing more complex tasks, programming - like any skill - gets a lot harder. You can't be a plumber without understanding how pipe connections work, what the properties of the different pipe materials are, and how things flow through them. You can't be a programmer without understanding something about the machine. The more complicated the kind of programming task you want to do, the more you need to understand.

Someone who does Powerpoint presentations doesn't need to know very much about the computer. Someone who wants to write spreadsheet macros needs to understand something about how the computer processes numbers, what happens to errors in calculations that use floating point, etc. Someone who wants to build an application like Word needs to know a whole lot about how a single computer works, including details like how the computer displays things to people. Someone who wants to build Google doesn't need to know how computers render text clearly on the screen, but they do need to know how computers work, and also how networks and communications work.

To be clear, I don't think that Victor is being dishonest. But the way that he presents things often does come off as dishonest, which makes it all the worse. To give one demonstration, he presents a comparison of how we teach programming to how we teach cooking. In it, he talks about how we'd teach people to make a soufflé. He shows a picture of raw ingredients on one side, and a fully baked soufflé on the other, and says, essentially: "This is how we teach people to program. We give them the raw ingredients, and say fool around with them until you get the soufflé."

The thing is: that's exactly how we really teach people to cook - his example just takes it far out of context. If we want them to be able to prepare exactly one recipe, then we give them complete, detailed, step-by-step instructions. But once they know the basics, we don't do that anymore. We encourage them to start fooling around. "Yeah, that soufflé is great. But what would happen if I steeped some cardamom in the cream? What if I left out the vanilla? Would it turn out as good? Would that be better?" In fact, if you never do that experimentation, you'll probably never learn to make a truly great soufflé! Because the ingredients are never exactly the same, and the way that it turns out is going to depend on the vagaries of your oven, the weather, the particular batch of eggs that you're using, the amount of gluten in the flour, and so on.

To write complicated programs is complicated. To write programs that manipulate symbolic data, you need to understand how the data symbolizes things. To write a program that manipulates numbers, you need to understand how the numbers work, and how the computer represents them. To build a machine, you need to understand the machine that you're building. It's that simple.

## DNS and Yesterday's Outage

As I'm sure at least some of you noticed, Scientopia - along with a huge number of other sites - was unreachable for most of yesterday. I say "unreachable" instead of "down" because the site was up. But if you wanted to get to it, and you didn't know its numeric address, you couldn't reach it. That's because your computer uses a service called DNS to find things.

So what's this DNS thing? And what brought down so much of the net yesterday?

DNS stands for "domain name service". What it does is provide a bridge between the way that you think of the name of a website, and the way that your computer actually talks to the website. You think of a website by a URL, like http://scientopia.org/blogs/goodmath. That URL basically says "the thing named ''blogs/goodmath'' on a server named ''scientopia.org'', which you can access using a system called ''http''.".

The catch is that your computer can't just send a message to something named "scientopia.org". Scientopia is a server in a datacenter somewhere, and to send a message to it, it needs to know a numeric address. Numeric addresses are usually written as a set of four numbers, each ranging from 0 to 255. For example, scientopia.org is currently 184.106.221.182.

You don't want to have to remember those four meaningless numbers in order to connect to scientopia. But there's more to it than just remembering: those numbers can change. The physical computer that scientopia works on has been moved around several times. It lives in a datacenter somewhere, and as that datacenter has grown, and hardware has needed to be either expanded or replaced, the numeric address has changed. Even though I'm the administrator of the site, there've been a couple of times where it's changed, and I can't pinpoint exactly when!

The reason that all of that works is DNS. When you tell your computer to connect to another computer by name, it uses DNS to ask "What's the numeric address for this name?", and once it gets that numeric address, it uses that to connect to the other computer.

When I first got access to the internet in college, it was shortly before DNS first got deployed. At the time, there was one canonical computer somewhere in California. Every other computer on the (then) arpanet would contact that computer every night, and download a file named "hosts.txt". hosts.txt contained a list of every computer on the internet, and its numeric address. That was clearly not working any more, and DNS was designed as the replacement. When they decided to replace hosts.txt, they wanted to replace it with something much more resilient, and something that didn't need to have all of the data in one place, and which didn't rely on any single machine in order to work.

The result was DNS. It's a fascinating system. It's the first real distributed system that I ever learned about, and I still think it's really interesting. I'm not going to go into all of the details: distributed systems always wind up with tons of kinks, to deal with the complexity of decentralized information in the real world. But I can explain the basics pretty easily.

The basic way that DNS works is really simple. You take a domain name. For example, my personal machine is named "sunwukong", and my family network is "phouka.net" - so the hostname that I'm typing this on right now is "sunwukong.phouka.net". To look it up, what happens is you take the host name, and split it at the dots. So for the example, that's ["sunwukong", "phouka", "net"].

You start with the last element of the list, which is called the top-level domain (TLD), and you contact a special server called the root nameserver and ask "who do I ask for addresses ending with this TLD?" ("Who do I ask for '.net'?") It sends a response to you, giving you the address of another nameserver. Then you contact that, and ask it "who do I ask for the next element?" ("Who do I ask for phouka?") Finally, when you get to the last one, the address that it gives you is the address of the machine you're looking for, instead of the name of another DNS server. At each level of the hierarchy, there's a server or set of servers that's registered with the level above to be the canonical server for some set of names. The ".com" servers are registered with the root-level servers. The next level of domains is registered with the TLD servers. And so on.

So, to look up sunwukong.phouka.net, what happens is:

1. You send a request to the root nameserver, and ask it for the server for ".net".
2. The root server responds, giving you the address of the ".net" nameserver.
3. You send a request to the .net server address that you just got, asking "Who can tell me where things are in phouka.net?"
4. The .net server responds, and gives you an address for a DNS server that can tell you where things are in phouka.net.
5. Finally, you send a request to the phouka.net nameserver, and it responds with the numeric address of sunwukong.phouka.net itself.

This distributes the information nicely. You can have lots of root name servers - all they need to be able to do is find DNS servers for the list of top-level domains. And there can be - and are - thousands and thousands of nameservers for addresses in those top-level domains. And then each administrator of a private network can have a top-level nameserver for the computers that they directly manage. In this mess, any nameserver at any level can disappear, and all that will be affected are the names that it manages.
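To make the delegation structure concrete, here's a toy model in Scala - entirely my own sketch, with made-up names and addresses; real resolvers handle timeouts, multiple servers per zone, and much more:

```scala
// Each nameserver either delegates a name suffix to another server,
// or answers with a final numeric address.
sealed trait Answer
case class Delegate(server: Nameserver) extends Answer
case class Address(ip: String) extends Answer
case class Nameserver(zone: Map[String, Answer])

val phoukaNs = Nameserver(Map("sunwukong.phouka.net" -> Address("192.0.2.17"))) // made-up address
val netNs    = Nameserver(Map("phouka.net" -> Delegate(phoukaNs)))
val rootNs   = Nameserver(Map("net" -> Delegate(netNs)))

// Walk down from the root, following delegations until we hit an address.
def resolve(ns: Nameserver, host: String): String = {
  val key = host.split('.').toList.tails
    .map(_.mkString("."))
    .find(ns.zone.contains)
    .get                              // the suffix this server knows about
  ns.zone(key) match {
    case Address(ip)    => ip
    case Delegate(next) => resolve(next, host)
  }
}

// resolve(rootNs, "sunwukong.phouka.net") == "192.0.2.17"
```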

The problem is, there's no one place to get information; and more importantly, every time you need to talk to another computer, you need to go through this elaborate sequence of steps.

To get around that last issue, you've got something called caches, on multiple levels. A cache is a copy of the canonical data, which gets kept around for some period of time. In the DNS system, you don't need to talk to the canonical DNS server for a domain if someone nearby has the data you need in a cache. You can (and generally do) talk to caching DNS servers. With a caching DNS server, you tell it the whole domain name that you want to look up, and it does the rest of the lookup process, and gives you the address you're looking for. Every time it looks something up, it remembers what it did, so that it doesn't need to ask again. So when you look up a ".com" address, your caching server will remember who it can ask for ".com" addresses. So most of the time, you only talk to one server, which does the hard work, and does it in a smart way so that it doesn't unnecessarily repeat requests.

So now, we can get to what happened yesterday. GoDaddy, one of the major American DNS registrars, went down. Exactly why it went down is not clear: a group of jerks claimed to have done a denial-of-service attack against them, but the company claims to have just had a configuration glitch. Honestly, I'm inclined to believe that it was the hackers, but I don't know for sure.

But - the canonical server for scientopia.org was gone. So if you needed to look up scientopia.org and it wasn't in your cache, then your cacheing nameserver would go to the .org server, and ask for scientopia; the .org nameserver would return the address of GoDaddy's nameserver, and that wouldn't respond. So you'd never reach us.

That's also why some people were able to reach the site, and others weren't. If your cacheing server had cached the address for scientopia, then you'd get the server address, and since the server was up the whole time, you'd connect right up, and everything would work.

## Nulls

So, my post on monads apparently set off a bit of a firestorm over my comments about avoiding null pointer exceptions. To show you what I mean, here's a link to one of the more polite and reasonable posts.

Ok. So. Let me expand a bit.

The reason that I think it's so great that I don't get NPEs when I use an option type like Scala's Option isn't because it makes me a super-programmer who's better than the lowly slime who deal with NPEs. It's because it changes how I write code, in a way that helps me avoid making mistakes - mistakes that I make because I'm just a fallible idiot of a human.

There are two fundamental things about an option type, like what we have in Scala, that make a huge difference.

First, it narrows the field of errors. When I'm programming in Java, any call that returns a pointer could return a null. The language makes no distinction between a function/expression that could return a null, and one that can't. That means that when I get an NPE, the source of that null pointer could be anything in the dynamic slice leading to the error. With an option type, I've got two kinds of functions: functions that always return a non-null value, and functions that sometimes return a non-null value, and sometimes return a None. That's incredibly valuable.

Second, it forces me to explicitly deal with the None case. In Java, programmers constantly build code without null checks, because they know that a function won't return null. And then it does, and ker-splat. With an option type, I have no choice: I have to explicitly deal with the potential error case. Sure, I can forcibly code around it - in Scala, I can use Option.get, which will turn into an error analogous to an NPE. But it forces me to make that choice, and make it explicitly.

Even if I'm going the stupid, brute-force route and assuming that I know, without fail, that a function is going to return a non-null value... Consider an example:

Java:

T g = f.doSomething();
g.doSomethingElse();

Scala:
val g: Option[T] = f.doSomething()
g.get.doSomethingElse()

The scala case has to explicitly deal with the fact that it's dealing with a potentially empty value, and using a statement that asserts the non-emptiness.

But in reality, if you're a decent programmer, you never use .get to directly access an option. (The only exception is in cases where you call the .get in a context dominated by a non-empty test; but even then, it's best to not, to avoid errors when the surrounding code is modified.) In real code, you pretty much always explicitly de-option a value using a function like getOrElse:

val f: User = getUser("markcc").getOrElse(new User("markcc"))
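For illustration, here are a couple of other idiomatic ways to handle the None case explicitly - still using the getUser example above, and assuming (my assumption) that User has a name field:

```scala
// Pattern matching makes both cases visible right in the code:
getUser("markcc") match {
  case Some(user) => println(s"found ${user.name}")
  case None       => println("no such user")
}

// Or keep the value wrapped, transform it, and only supply a default at the end:
val name: String = getUser("markcc").map(_.name).getOrElse("unknown")
```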

As I hope has become plain, the point of avoiding NPEs through option-like type structures isn't that it somehow makes the entire category of unexpected-result-value errors disappear. It's that it changes the way that you code, to distinguish where those errors can occur, and to force you to deal with them.

I think that ultimately, things like this are really just manifestations of the good old static vs. dynamic typing wars. Type errors in a dynamically typed language are really just unexpected-value errors. Strong typing doesn't stop you from making those errors. It just takes a bunch of common cases of those errors, and converts them from run-time errors to compile-time errors. Whether you want them to be run-time or compile-time depends on the project you're working on, on your development team, and on your personal preferences.

I find in practice that I get many fewer errors by being forced to explicitly declare when a value might be null/None, and by being required to explicitly deal with the null/None case when it might occur. I've spent much less time debugging that kind of error in my year at foursquare than in the 15 years of professional development that I did before. That's not because I magically became a better programmer a year ago when I joined foursquare. It's because I'm using a better tool that helps me avoid mistakes.

Sorry things have been so slow around here. I know I keep promising that I'm going to post more frequently, but it's hard. Life as an engineer at a startup is exhausting. There's so much work to do! And the work is so fun - it's easy to let it eat up all of your time.

Anyway... last good-math post 'round these parts was about monoids and programming. Today, I'm going to talk about monads and programming!

If you recall, monoids are an algebraic/categorical construct that, when implemented in a programming language, captures the abstract notion of foldability. For example, if you've got a list of numbers, you can fold that list down to a single number using addition. Folding collections of values is something that comes up in a lot of programming problems - capturing that concept with a programming construct allows you to write code that exploits the general concept of foldability in many different contexts.

Monads are a construct that have become incredibly important in functional programming, and they're very, very poorly understood by most people. That's a shame, because the real concept is actually simple: a monad is all about the concept of sequencing. A monad is, basically, a container that you can wrap something in. Once it's wrapped, you can form a sequence of transformations on it. The result of each step is the input to the next. That's really what it's all about. And when you express it that way, you can begin to see why it's such an important concept.

I think that people are confused by monads for two reasons:

1. Monads are almost always described in very, very abstract terms. I'll also get into the abstract details, but I'll start by elaborating on the simple description I gave above.
2. Monads in Haskell, which is where most people are introduced to them, are very confusing. The basic monad operations are swamped with tons of funny symbols, and they're named in incredibly misleading ways. ("return" does almost the exact opposite of what you expect return to do!)

In programming terms, what's a monad?

Basically, a monadic computation consists of three pieces:

1. A monadic type, M, which is a parametric type wrapper that can wrap a value of any type.
2. An operation which can wrap a value in M.
3. An operation which takes a function that transforms a value wrapped in M into another value (possibly with a different type) wrapped in M.

Whenever you describe something very abstractly like this, it can seem rather esoteric. But this is just a slightly more formal way of saying what I said up above: it's a wrapper for a series of transformations on the wrapped value.

Let me give you an example. At foursquare, we do all of our server programming in Scala. In a year at foursquare, I've seen exactly one null pointer exception. That's amazing - NPEs are ridiculously common in Java programming. But in Scala at foursquare, we don't allow nulls to be used at all. If you have a value which could either be an instance of A, or no value, we use an option type. An Option[T] can be either Some(t: T) or None.

So far, this is nice, but not really that different from a null. The main difference is that it allows you to say, in the type system, whether or not a given value might be null. But: Option is a monad. In Scala, that means that I can use map on it. (map is one of the transformation functions!)

val t: Option[Int] = ...
val u: Option[String] = t.map( _ + 2 ).map(_.toString)

What this does is: if t is Some(x), then the first map adds two, returning Some(x + 2); then the second map takes that result, and converts it to a string, returning an Option[String]. If t is None, then running map on it always returns None. So I can write code which takes care of the null case without writing any conditional tests for nullness - because optionality is a monad.

In a good implementation of a monad, I can do a bit more than that. If I've got a Monad[T], I can use a map-like operation with a function that takes a T and returns a Monad[U].
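With Option, that operation lets you chain together a whole series of steps that might each fail, without a single null check. A sketch, using a hypothetical user table of my own invention:

```scala
case class User(name: String, managerName: Option[String])

val users = Map(
  "markcc" -> User("markcc", Some("alice")),
  "alice"  -> User("alice", None)
)

def getUser(name: String): Option[User] = users.get(name)

// flatMap sequences the lookups; a None at any step makes the result None.
def managersManager(name: String): Option[User] =
  getUser(name)
    .flatMap(u => u.managerName.flatMap(getUser))
    .flatMap(m => m.managerName.flatMap(getUser))

// managersManager("markcc") == None: alice has no manager, and no NPE in sight.
```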

For another example, we can look at lists - because List is a monad:

val l: List[Int] = List(1, 2, 3)
l.flatMap({ e => List( (e, e), (e+1, e+2) )  })
res0: List[(Int, Int)] = List((1,1), (2,3), (2,2), (3,4), (3,3), (4,5))

The monad map operation does a flatten on the map steps. That means a lot of things. You can see one in the rather silly example above.

You can take values, and wrap them as a list. Then you can perform a series of operations on those elements of a list - sequencing over the elements of the list. Each operation, in turn, returns a list; the result of the monadic computation is a single list, concatenating, in order, the lists returned by each step. In Scala, the flatMap operation captures the monadic concept: basically, if you can flatMap something, it's a monad.

Let's look at it a bit more specifically.

1. The monadic type: List[T], a parametric type that can wrap a value of any type.
2. A function to wrap a value into the monad: the constructor function from List, def apply[T](value: T): List[T].
3. The map operation: def flatMap[T, U](op: T => List[U]): List[U].

(In the original version of this post, I put the wrong type in flatMap in the list above. In the explanation demonstrating flatMap, the type is correct. Thanks to John Armstrong for catching that!)

You can build monads around just about any kind of type wrapper where it makes sense to map over the values that it wraps: collections, like lists, maps, and options. Various kinds of state - variable environments (where the wrapped values are, essentially, functions from identifiers to values), or IO state. And plenty of other things. Anywhere you perform a sequence of operations over a wrapped value, you've got a monad.
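Here's a sketch of those pieces as a Scala trait - my own formulation of the concept, not a standard library interface:

```scala
// The three pieces: a parametric wrapper M, a way to wrap a value,
// and a way to sequence a transformation that itself returns an M.
trait Monad[M[_]] {
  def wrap[T](value: T): M[T]
  def flatMap[T, U](m: M[T])(op: T => M[U]): M[U]
}

// An instance for Option, delegating to the built-in operations:
val optionMonad = new Monad[Option] {
  def wrap[T](value: T): Option[T] = Some(value)
  def flatMap[T, U](m: Option[T])(op: T => Option[U]): Option[U] = m.flatMap(op)
}
```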

Now that we have some understanding of what this thing we're talking about is, what is it in mathematical terms? For that, we turn to category theory.

Fundamentally, in category theory a monad is a category with a particular kind of structure. It's a category with one object. That category has a collection of arrows which (obviously) are from the single object to itself. That one-object category has a functor from the category to itself. (As a reminder, a functor is an arrow between categories in the category of (small) categories.)

The first trick to the monad, in terms of theory, is that it's fundamentally about the functor: since the functor maps from a category to the same category, you can almost ignore the category; it's implicit in the definition of the functor. So we can almost treat the monad as if it were just the functor - that is, a kind of transition function.

The other big trick is closely related to that. For the programming language application of monads, we can think of the single object in the category as the set of all possible states. So we have a category object, which is essentially the collection of all possible states; and there are arrows between the states representing possible state transitions. So the monad's functor is really just a mapping from arrows to different arrows - which basically represents the way that changing the state causes a change in the possible transitions to other states.

So what a monad gives us, in terms of category theory, is a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that's translated into programming languages, that becomes a value that implicitly takes an input state, possibly updates it, and returns an output state. Sound familiar?

Let's take a moment and get formal. As usual for category theory, first there are some preliminary definitions.

1. Given a category C, 1_C is the identity functor from C to C.
2. Given a category C with a functor T : C → C, T² = T ∘ T.
3. Given a functor T, 1_T : T → T is the identity natural transformation from T to T.

Now, with that out of the way, we can give the complete formal definition of a monad. Given a category C, a monad on C is a triple (T : C → C, η : 1_C → T, μ : T² → T), where T is a functor, and η and μ are natural transformations. The members of the triple must make the following two diagrams commute.

Commutativity of composition with μ

Commutativity of composition with η

What these two diagrams mean is that successive applications of the state-transition functor over C behave associatively, and that the monadic structure is always preserved: any sequence of composed monadic functors results in a functor with full monadic structure. Together, these mean that any sequence of operations (that is, applications of the monad functor) is itself a monad functor. Building a sequence of monadic state transformers is guaranteed to behave as a proper monadic state transition, so whatever happens inside of the monadic functor is fine: to the rest of the universe, the difference between a sequence and a single simple operation is indistinguishable. The state will be consistently passed from application to application with the correct chaining behavior, and to the outside world, the entire monadic chain looks like a single atomic monadic operation.

Now, what does this mean in terms of programming? Each element of a monadic sequence in Haskell is an instantiation of the monadic functor - that is, it's an arrow between states - a function, not a simple value - which is the basic trick to monads. They look like a sequence of statements; in fact, each statement in a monad is actually a function from state to state. It looks like we're writing sequential code, when what we're actually doing is writing function compositions - so that when we're done writing a monadic sequence, what we've actually done is written a function definition in terms of a sequence of function compositions.

Understanding that, we can now clearly understand why we need the return function to use a non-monad expression inside of a monadic sequence - because each step in the sequence needs to be an instance of the monadic functor; an expression that isn't an instance of the monadic functor couldn't be composed with the functions in the sequence. The return function is really nothing but a function that combines a non-monadic expression with the id functor.

In light of this, let's go back and look at the definition of Monad in the Haskell standard prelude.

class  Functor f  where
    fmap :: (a -> b) -> f a -> f b

class  Monad m  where
    (>>=)  :: m a -> (a -> m b) -> m b
    (>>)   :: m a -> m b -> m b
    return :: a -> m a
    fail   :: String -> m a

    -- Minimal complete definition:
    --      (>>=), return
    m >> k  =  m >>= \_ -> k
    fail s  = error s

The declaration of Monad is connected with the definition of Functor - if you look, you can see the connection. The fundamental operation of Monad is >>=, the chaining operation: its type, m a -> (a -> m b) -> m b, is deeply connected with Functor's fmap operation - the a in m a is generally going to be a type which can be a Functor. (Remember what I said about Haskell and monads? I really prefer map and flatMap to >> and >>=.)

So the value type wrapped in the monad is a functor - in fact, the functor from the category definition! And the ">>=" operation is just the composition operation from the monad definition: map the second argument over the wrapped value, then collapse the doubled-up structure with μ.
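That decomposition can be written directly in Haskell. Here's a sketch of my own ("bind" is my name for it), using join from Control.Monad, which is the Haskell name for μ; this isn't part of the prelude definition above, but the identity is standard:

import Control.Monad (join)

-- ">>=" split into its categorical pieces: fmap is the functor action on
-- the function f, and join collapses T² down to T - it plays the role of μ.
bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = join (fmap f m)

-- For example: bind (Just 3) (\x -> Just (x + 1)) evaluates to Just 4,
-- exactly like Just 3 >>= \x -> Just (x + 1).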

A proper implementation of a monad needs to follow some fundamental rules - the rules are, basically, just Haskell translations of the structure-preserving rules about functors and natural transformations in the category-theoretic monad. There are two groups of laws: laws about the Functor class, which should hold for the transition function wrapped in the monad; and laws about the monadic operations in the Monad class. One important thing to realize about the functor and monad laws is that they are not enforced - in fact, they cannot be enforced! - but monad-based code using monad implementations that don't follow them may not work correctly. (A general compile-time check that an implementation satisfies these laws can be shown to be equivalent to solving the halting problem.)

There are two simple laws for Functor, and it's pretty obvious why they're fundamentally just structure-preservation requirements. The Functor class only has one operation, called fmap, and the two functor laws are about how it must behave.

1. fmap id = id
(Mapping the identity function over any structured value gives back an unmodified value.)
2. fmap (f . g) = (fmap f) . (fmap g)
("." is the function composition operation; this just says that fmap preserves structure: mapping a composed function is the same as composing the mapped functions.)

The monad laws are a bit harder, but not much. They mainly govern how monadic operations interact with non-monadic operations, in terms of the "return" and ">>=" operations of the Monad class.

1. return x >>= f = f x
(injecting a value into the monad is basically the same as passing it as a parameter down the chain - return is really just the identity functor passing its result on to the next step. I hate the use of "return". In a state functor, in exactly the right context, it does sort-of look like a return statement in an imperative language. But in pretty much all real code, return is the function that wraps a value into the monad.)
2. f >>= return = f
(If you don't specify a value for a return, it's the same as just returning the result of the previous step in the sequence - again, return is just identity, so passing something into return shouldn't affect it.)
3. seq >>= return . f = fmap f seq
(composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad - in other words, it's just composition with identity.)
4. seq >>= (\x -> f x >>= g) = (seq >>= f) >>= g
(Your implementation of ">>=" needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)
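Here's the same kind of spot-check for the monad laws, using the Maybe monad (again my own throwaway example; f and g are hypothetical monadic steps):

f :: Int -> Maybe Int
f x = Just (x + 1)

g :: Int -> Maybe Int
g x = Just (x * 2)

leftIdentity, rightIdentity, fmapLaw, associativity :: Bool
leftIdentity  = (return 3 >>= f) == f 3                       -- law 1
rightIdentity = (Just 3 >>= return) == Just 3                 -- law 2
fmapLaw       = (Just 3 >>= return . g) == fmap g (Just 3)    -- law 3
associativity = (Just 3 >>= (\x -> f x >>= g))
                == ((Just 3 >>= f) >>= g)                     -- law 4

All four evaluate to True for Maybe; a broken Monad instance would make at least one of them False, and do-notation code using it would silently misbehave.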

## The Meaning of the Higgs

This isn't exactly my area of expertise, but I've gotten requests by both email and Twitter to try to explain yesterday's news about the Higgs boson.

The questions:

• What is this Higgs boson thing?
• How did they find it?
• What does the five sigma stuff mean?
• Why do they talk about it as a "Higgs-like particle"?

So, first things first. What is a Higgs boson?

When things in the universe interact, they usually don't actually interact by touching each other directly. They interact through forces and fields. What that means is a bit tricky. I can define it mathematically, but it won't do a bit of good for intuition. But the basic idea is that space itself has some properties. A point in space, even when it's completely empty, has some properties.

Outside of empty space, we have particles of various types. Those particles interact with each other, and with space itself. Those interactions are what end up producing the universe we see and live in.

A field is, at its simplest, a property of space that has a value at every point in space.

When particles interact with fields, they can end up exchanging energy. They do that through a particular kind of particle, called an exchange particle. For example, think about an electromagnetic field. An electron orbits an atomic nucleus, due to forces created by the electromagnetic fields of the electrons and protons. When an electron moves to a lower-energy orbital, it produces a photon; when it absorbs a photon, it can jump to a higher orbital. The photon is the exchange particle for the electromagnetic field. Exchange particles are instances of a kind of particle called a boson.

So.. one of the really big mysteries of physics is: why do some particles have mass, and other particles don't? That is, some particles, like protons, have masses. Others, like photons, don't. Why is that?

It's quite a big mystery. Based on our best model - called the standard model - we can predict all of the basic kinds of particles, and what their masses should be. But we didn't have a clue about why there's mass at all!

So, following the usual pattern in particle physics, we predict that there's a field. Particles moving through that field, if they interact with the field, experience a sort of drag. That drag is mass. So - just as neutrinos aren't affected by electromagnetic fields, particles like photons won't have mass, because they don't interact with the field that produces mass. We call that field the Higgs field.


So physicists proposed the existence of the Higgs field. But how could they test it?

It's a field. Fields have exchange particles. What would the exchange particle of the Higgs field be? Exchange particles are bosons, so this one is, naturally, called the Higgs boson. So if the Higgs field exists, then it will have an exchange particle. If the standard model of physics is right, then we can use it to predict the mass that that boson must have.

So - if we can find a particle whose mass matches what we predict, and it has the correct properties for a mass-field exchange particle, then we can infer that the Higgs field is real, and is the cause of mass.

How did they find the Higgs boson?

We have a pretty good idea of what the mass of the Higgs boson must be. We can describe that mass in terms of a quantity of energy. (See the infamous Einstein equation, E = mc²!) We can take particles that we can easily see and manipulate, accelerate them up to extremely high speeds, and collide them together. If the energy of a collision matches the mass of a particle, it can create that kind of particle. So if we slam together, say, two protons at high enough energy, we can get a Higgs boson.

But things are never quite that easy. There are a bunch of problems. First, the kind of collision that can produce a Higgs doesn't always produce one. It can produce a variety of results, depending on the specifics of the collision as well as purely random factors. Second, it produces a lot more than just a Higgs. We're talking about an extremely complex, extremely high energy collision, with a ton of complex results. And third, the Higgs boson isn't particularly stable. It doesn't really like to exist. So like many unstable things in particle physics, it decays, producing other particles. And many of those particles are themselves unstable, and decay into other particles. What we can observe is the last products of the collision, several steps removed from the Higgs. But we know what kinds of things the Higgs can decay into, and what they can decay into, and so on.

So, we slam these things together a couple of thousand, or a couple of million times. And we look at the results. We look at all of the results of all of the collisions. And we specifically look for a bump: if there's really a specific collision energy at which Higgs bosons are produced, then we'll see a bump in the number of Higgs decay products produced by collisions at that energy. And what the announcement yesterday showed is that that's exactly what they saw: a bump in the observations inside the expected range of values of the mass of a Higgs boson.

What does five sigmas mean?

Whenever we're making observations of a complex phenomenon, there are all sorts of things that can confound our observations: measurement errors, calculation errors, and random noise, among many other things. So we can't just look at one, or two, or ten data points. We need to look at a lot of data. And when you've got a lot of data, there's always a chance that you'll see what appears to be a pattern in the data, but which is really just the product of random noise.

For example, there are some people who've won the lottery multiple times. That seems crazy - it's so unlikely to win even once! But probabilistically, if you keep observing lotteries, you'll find repeat winners. Or you'll find apparent patterns in the winning numbers, even though they're being drawn randomly.

We don't want to be fooled by statistics. So we create standards. We can compute how unlikely a given pattern would be if it were occurring due to pure randomness. We can't ever absolutely rule out randomness, but for any desired degree of certainty, we can determine just how unlikely a given observation is to be due to randomness.

We describe that in terms of standard deviations. An observation of a phenomenon has a roughly 68% chance of being measured within one standard deviation (one sigma) of the actual value, or a roughly 32% chance of being observed outside of one sigma. At two sigmas, there's only a roughly 5% chance of being outside. At three sigmas out, you're down to a roughly 0.3% chance of randomly observing an event outside. The odds continue that way.
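If you want to play with those numbers yourself, here's a little sketch (mine, not anything from the announcement) that computes the two-sided tail probabilities using a standard polynomial approximation to the error function:

-- Abramowitz & Stegun approximation 7.1.26 to the error function.
-- It's only accurate to about 1.5e-7, so the 5-sigma figure below is rough.
erf :: Double -> Double
erf x = signum x * (1 - t * exp (-(x * x)) * poly)
  where t    = 1 / (1 + 0.3275911 * abs x)
        poly = 0.254829592 + t * (-0.284496736 + t * (1.421413741
                 + t * (-1.453152027 + t * 1.061405429)))

-- Probability of a measurement landing outside n standard deviations.
outsideNSigma :: Double -> Double
outsideNSigma n = 1 - erf (n / sqrt 2)

-- outsideNSigma 1 ~ 0.32, outsideNSigma 2 ~ 0.046,
-- outsideNSigma 3 ~ 0.0027, outsideNSigma 5 ~ 6e-7.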

So, the Higgs hunters computed the probability of observing the data they found under the assumption that there was no Higgs. The data they saw was more than 5 sigmas away from what you would expect by random chance if there were no Higgs. That translates as: damned unlikely. The choice of 5 sigmas is ultimately arbitrary, but it's the accepted threshold for certainty in particle physics. At five sigmas, we can realistically rule out random chance.

Why do they keep saying "Higgs-like particle"?

Remember up above, I said: "So - if we can find a particle whose mass matches what we predict, and it has the correct properties for a mass-field exchange particle, then we can infer that the Higgs field is real, and is the cause of mass"? There are two things we need to show to conclude that we've found the mediator of the Higgs field. There needs to be a particle with the right mass, and it needs to have the properties of a mass-mediator. What we've got right now is an observation that yes, there is a particle at the mass we'd expect for a Higgs. But we don't have observations yet of any properties of the particle other than its mass. Assuming the standard model is right, the odds of finding another particle with that mass are damned small - but the standard model could be wrong. It's not likely at this point, but people like to be careful. So at this point, to be precise, we've observed a Higgs-like particle: a particle that, according to all of the observations we've made so far, appears to be a Higgs; but until we observe some properties other than mass, we can't be absolutely certain that it's a Higgs.

## Turing Machines - what they are, what they aren't.

It's the anniversary of the birth of Alan Turing. So there's a ton of people writing commemorative stories. And naturally, a ton of those stories get it wrong. And they get it wrong in a very sad way.

Of course, when you talk about Turing, you talk about Turing machines. Despite the fact that Turing did lots of stuff besides just that machine, it's always the machine that people focus on. Partly that's laziness - the machine has his name after all, so it's one of the first things you find if you Google Turing. It's also the easiest thing to write about.

What do they say about the Turing machine? It's "the simplest computing device". It's the "basis for modern computers". It's "the theoretical model for the microchips in your laptop". It's the "mathematical description of your computer". None of those things are true. And they all both over- and under-state what Turing really did. In terms of modern computers, the Turing machine's contribution to the design of real computers is negligible, if not non-existent. But his contribution to our understanding of what computers can do, and to our understanding of how mathematics really works - those are far, far more significant than the architecture of any single machine.

The Turing machine isn't a model of real computers. The computer that you're using to read this has absolutely nothing to do with the Turing machine. As a real device, the Turing machine is absolutely terrible.

The Turing machine is a mathematical model not of computers, but of computation. That's a really important distinction. The Turing machine is an easy-to-understand model of a computing device. It's definitely not the simplest model. There are simpler computing devices (for example, I think that the rule 110 automaton is simpler) - but their simplicity makes them harder to understand.

The reason that the Turing machine is so important comes down to two important facts. First, which machine you use to talk about computation doesn't matter. There's a limit to what a mechanical device can do. There are lots of machines out there - but ultimately, no machine can go past the limit. Any machine that can reach that limit is, for the purposes of the theory of computation, pretty much the same. When we talk about studying computation, what we're talking about is the set of things that can be done by a machine - not by a specific machine, but by any machine. The specific choice of machine isn't important. And that's the point: computation is computation. That's what Turing figured out.

Second, the Turing machine is a brilliant creation. It's a simple machine, and it's really easy to understand. And it's easy to tweak - that is, it's easy to do experiments where you modify the machine, and see what effect the changes have.

So let's take a step back, and see: what is a Turing machine?

The Turing machine is a very simple kind of theoretical computing device. In fact, it's almost downright trivial. But according to everything that we know and understand about computation, this trivial device is capable of any computation that can be performed by any other computing device.

The basic idea of the Turing machine is very simple. It's a machine that runs on top of a tape, which is made up of a long series of little cells, each of which has a single character written on it. The machine is a read/write head that moves over the tape, and which can store a little bit of information. Each step, the machine looks at the symbol on the cell under the tape head, and based on what it sees there, and whatever little bit of information it has stored, it decides what to do. The things that it can do are: change the information it has stored, write a new symbol onto the current tape cell, and move one cell left or right.

That's really it. People who like to make computing sound impressive often have very complicated explanations of it - but really, that's all there is to it. The point of it was to be simple - and simple it certainly is. And yet, it can do anything that's computable.

To really understand how that trivial machine can do computations, it helps to be a bit formal. In formal terms, we talk about a Turing machine as a tuple (S, s0, A, T), where:

• S is a finite list of possible states that the machine can be in. The state is the information that the machine can store in the head to make decisions. It's a very limited amount of information; you can think of it as either a specific list of states, or a small set of small numbers. For most Turing machines, we'll use it as a specific list of states. (You'll see what I mean in a minute.)
• s0 ∈ S is the initial state - the state that the machine will be in when it starts a computation.
• A is the machine's alphabet, which is the set of symbols which can appear on the machine's tape.
• T is the machine's transition function. This is the real heart of the machine, where the computation is defined. It's a function from the machine state and the alphabet character on the current tape cell to the action that the machine should take. The action is a triple consisting of a new value for the machine's state, a character to write onto the current tape cell, and a direction to move the tape head - either left or right.

So, for example, let's look at a simple machine. This is one of the classic examples: a Turing machine that does subtraction using unary numbers. A unary number "N" is written as a series of N 1s. For the program, we'll give the machine an input in the format "N-M=" written onto the tape; after running the machine, the tape will contain the value of M subtracted from N. So, for example, we could use "1111-11=" as an input; the output would be "11".

The alphabet is the characters "1", " " (blank space), "-" (minus sign), and "=" (equal sign). The machine has four states: {"scanright", "eraseone", "subone", "skip"}. It starts in the state "scanright". Its transition function is given by the following table:

FromState   Symbol  ToState    WriteChar  Dir
scanright   space   scanright  space      right
scanright   1       scanright  1          right
scanright   minus   scanright  minus      right
scanright   equal   eraseone   space      left
eraseone    1       subone     equal      left
eraseone    minus   HALT       space      n/a
subone      1       subone     1          left
subone      minus   skip       minus      left
skip        space   skip       space      left
skip        1       scanright  space      right

What this machine does is move to the right until it sees the equal sign; it erases the equal sign and moves to the left, erases one digit off of the second number, and replaces it with the equal sign (so the second number is reduced by one, and the equal sign is moved over one position). Then it scans back to the minus sign (which separates the two numbers), erases one digit of the first number, and switches back to scanning to the right for the equal sign. So, one at a time, it erases one digit from each of the two numbers. When it scans back to erase a digit from the second number and hits the "-" before it finds a digit, that means it's done, so it stops. Let's look at a simple execution trace. In the trace, I'll write the machine state, followed by a colon, followed by the tape contents surrounded by "[]", with the current tape cell surrounded by "{}".

scanright:  [ {1}1111111-111= ]
scanright:  [ 1{1}111111-111= ]
scanright:  [ 11{1}11111-111= ]
scanright:  [ 111{1}1111-111= ]
scanright:  [ 1111{1}111-111= ]
scanright:  [ 11111{1}11-111= ]
scanright:  [ 111111{1}1-111= ]
scanright:  [ 1111111{1}-111= ]
scanright:  [ 11111111{-}111= ]
scanright:  [ 11111111-{1}11= ]
scanright:  [ 11111111-1{1}1= ]
scanright:  [ 11111111-11{1}= ]
scanright:  [ 11111111-111{=} ]
eraseone :  [ 11111111-11{1}  ]
subone   :  [ 11111111-1{1}=  ]
subone   :  [ 11111111-{1}1=  ]
subone   :  [ 11111111{-}11=  ]
skip     :  [ 1111111{1}-11=  ]
scanright:  [ 1111111 {-}11=  ]
scanright:  [ 1111111 -{1}1=  ]
scanright:  [ 1111111 -1{1}=  ]
scanright:  [ 1111111 -11{=}  ]
eraseone :  [ 1111111 -1{1}   ]
subone   :  [ 1111111 -{1}=   ]
subone   :  [ 1111111 {-}1=   ]
skip     :  [ 1111111{ }-1=   ]
skip     :  [ 111111{1} -1=   ]
scanright:  [ 111111 { }-1=   ]
scanright:  [ 111111  {-}1=   ]
scanright:  [ 111111  -{1}=   ]
scanright:  [ 111111  -1{=}   ]
eraseone :  [ 111111  -{1}    ]
subone   :  [ 111111  {-}=    ]
skip     :  [ 111111 { }-=    ]
skip     :  [ 111111{ } -=    ]
skip     :  [ 11111{1}  -=    ]
scanright:  [ 11111 { } -=    ]
scanright:  [ 11111  { }-=    ]
scanright:  [ 11111   {-}=    ]
scanright:  [ 11111   -{=}    ]
eraseone :  [ 11111   {-}     ]
Halt:       [ 11111  { }-     ]

See, it works!

One really important thing to understand here is that we do not have a program. What we just did was define a Turing machine that does subtraction. The machine does not take any instructions: the states and the transition function are an intrinsic part of the machine. So the only thing this machine can do is to subtract.

The real genius of Turing wasn't just defining this simple computing machine. It was realizing the concept of the program: you could design a Turing machine whose input tape contained a description of a Turing machine - that is, a program - followed by an input to the program. This single machine - the universal Turing machine - could simulate any other Turing machine: one machine which could perform any computation.

Turing machines are a lot of fun to play with. Figuring out how to do things with them can be fascinating. Figuring out how to define a universal Turing machine's transition function is an amazing exercise; it's astonishing how simple the universal machine can be!

As I said earlier - you can't make a mechanical computing device that does anything that a Turing machine can't do. One of the beauties of the Turing machine is that it lets you explore that. You can try adding and removing features to the basic machine, and see what happens.

For example: if you can do lots of great stuff with a Turing machine with one tape, what if you had a two-tape Turing machine? That is, take the basic Turing machine, and say that it's got two tapes, each with a read/write head. Each state transition rule on this machine depends on the pair of values found on the two tapes. For now, we'll say that the tapes move together - that is, the transition rule says "move the heads right" or "move the heads left".

Seems like this should represent a real increase in power, right? No. Take a single-tape Turing machine. Take the alphabet for tape one, and call it A1; take the alphabet for tape two, and call it A2. We can create a single-tape Turing machine whose alphabet is the cross product of A1 and A2. Now each symbol on the tape is equivalent to a pair of symbols: one from tape 1 and one from tape 2. So we've got a single-tape machine which is equivalent to the two-tape machine. Bingo.
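In code, the alphabet construction is nothing more than a pair type. Here's a minimal sketch (the two alphabets are hypothetical stand-ins of mine):

-- Two hypothetical tape alphabets for the two tapes.
data A1 = One | BlankA   deriving (Show, Eq)
data A2 = Mark | BlankB  deriving (Show, Eq)

-- The cross-product alphabet: one cell of the single tape holds one
-- symbol from each of the simulated tapes.
type PairSym = (A1, A2)

-- Combining two equal-length tapes into one. (Pad with blanks first if
-- the tapes differ in length; zip truncates to the shorter one.)
combineTapes :: [A1] -> [A2] -> [PairSym]
combineTapes = zip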

We can lift the restriction on the heads moving together, but it's a lot more work. A two-tape machine can do things a lot faster than a one-tape machine, and the simulation will necessarily lose some of that speed - but it's entirely doable. How about a two-dimensional tape? We can simulate that pretty easily with a two-tape machine, which means we can do it with a one-tape machine. For the two-tape machine, what we do is map the two-D tape onto a one-D tape with a diagonal sweep, so that cell 0 on the one-D tape corresponds to cell (0,0) on the two-D tape; cell (0,1) on the two-D corresponds to cell 1 on the one-D; cell (1,1) on the two-D is cell 2 on the one-D; and so on. Then we use the second tape for the book-keeping necessary to do the equivalent of two-D tape moves. And we've got a two-D Turing machine simulated with a two-tape one-D; and we know that we can simulate a two-tape one-D with a one-tape one-D.
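Any reversible mapping from coordinate pairs to single cell indices will do for this. Here's a sketch of one standard choice, the Cantor pairing function (not necessarily the exact sweep described above, but it does the same job):

-- Map a 2-D cell coordinate onto a single 1-D cell index, reversibly.
cantorPair :: Integer -> Integer -> Integer
cantorPair x y = (x + y) * (x + y + 1) `div` 2 + y

-- Recover the 2-D coordinate from a 1-D index.
cantorUnpair :: Integer -> (Integer, Integer)
cantorUnpair z = (w - y, y)
  where w = (isqrt (8 * z + 1) - 1) `div` 2
        y = z - w * (w + 1) `div` 2
        -- good enough for a sketch; an exact integer square root
        -- would be better for very large indices
        isqrt m = floor (sqrt (fromInteger m :: Double))

For example, cantorPair 0 0 is 0, cantorPair 1 0 is 1, and cantorPair 0 1 is 2; cantorUnpair inverts each of them.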

This is, to me, the most beautiful thing about the Turing machine. It's not just a fundamental theoretical construction of a computing device; it's a simple construction of a computing device that's really easy to experiment with. Consider, for a moment, lambda calculus. It's more useful than a Turing machine for lots of purposes - we write real programs in lambda calculus, when no one would build a real application as a Turing machine program. But imagine trying things like the alternate constructions above - it's a whole lot harder to build experiments like those in lambda calculus. Likewise for Minsky machines, Markov machines, etc.

For your enjoyment, I've implemented a Turing machine programming language. You feed it a Turing machine description, and an input string, and it will give you a trace of the machine's execution like the one above. Here's the specification of the subtraction machine written in my little Turing language:

states { "scanright" "eraseone" "subtractOneFromResult" "skipblanks" } initial "scanright"
alphabet { '1' ' ' '=' '-' } blank ' '
trans from "scanright" to "scanright" on (' ') write ' ' move right
trans from "scanright" to "scanright" on ('1') write '1' move right
trans from "scanright" to "scanright" on ('-') write '-' move right
trans from "scanright" to "eraseone" on ('=') write ' ' move left
trans from "eraseone" to "subtractOneFromResult" on ('1') write '=' move left
trans from "eraseone" to "Halt" on ('-') write ' ' move left
trans from "subtractOneFromResult" to "subtractOneFromResult" on ('1') write '1' move left
trans from "subtractOneFromResult" to "skipblanks" on ('-') write '-' move left
trans from "skipblanks" to "skipblanks" on (' ') write ' ' move left
trans from "skipblanks" to "scanright" on ('1') write ' ' move right

I think it's pretty clear as a syntax, but it still needs explanation.

• The first line declares the possible states of the machine, and what state it starts in. This machine has four possible states - "scanright", "eraseone", "subtractOneFromResult", and "skipblanks". When the machine starts, it will be in the state "scanright".
• The second line declares the set of symbols that can appear on the tape, including what symbol will initially appear on any tape cell whose value wasn't specified by the input. For this machine the symbols are the digit 1, a blank space, the equal sign, and the subtraction symbol; the blank symbol is on any tape cell whose initial value wasn't specified.
• After that is a series of transition declarations. Each declaration specifies what the machine will do for a given pair of initial state and tape symbol. So, for example, if the machine is in state "scanright", and the current tape cell contains the equals sign, then the machine will change to state "eraseone", write a blank onto the tape cell (erasing the "=" sign), and move the tape head one position to the left.

Finally, the code. You'll need GHCI to run it; at the moment, it won't work in Hugs (I don't have the parsing library in my version of Hugs), and I haven't worked out the linkage stuff to make it work under the GHC compiled mode.

--
-- A Simple Turing machine interpreter
-- Copyright 2007 by Mark C. Chu-Carroll
--    markcc@gmail.com
--   http://scienceblogs.com/goodmath
--
-- You can do anything you want with this code so long as you
-- leave this copyright notice intact.
--
module Turing where
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language
import System.Environment

data Motion = MoveLeft  | MoveRight deriving (Show)

-- Transition from to on move write
data Transition = Transition String String [Char] Motion  Char deriving (Show)

-- TuringMachine states initial alphabet blank transitions
data TuringMachine = Machine [String] String   [Char]    Char    [Transition] deriving (Show)

data MachineState = TMState [Char] Char [Char]  String  deriving (Show)
--                          tape-left  cur-cell  tape-right  cur-state

getBlankSym :: TuringMachine -> Char
getBlankSym (Machine _ _ _ bl _) = bl

findMatchingTransition :: [Transition] -> String -> Char -> Maybe Transition
findMatchingTransition [] _ _ = Nothing
findMatchingTransition translist state c =
    let matches = filter (transitionMatches state c) translist
    in case matches of
         []   -> Nothing
         t:[] -> Just t
         _    -> error "Ambiguous transition"
  where transitionMatches state c (Transition from to on move write) =
            (from == state) && (elem c on)

runTransition :: TuringMachine -> [Char] -> Char -> [Char] -> String -> Transition -> MachineState
-- Moving left with tape remaining to the left of the head.
runTransition m (l:left) c right state (Transition from tostate on MoveLeft write) =
    TMState left l (write:right) tostate
-- Moving left off the end of the recorded tape: shift in a blank cell.
runTransition m [] c right state (Transition from tostate on MoveLeft write) =
    TMState [] (getBlankSym m) (write:right) tostate
-- Moving right off the end of the recorded tape: shift in a blank cell.
runTransition m left c [] state (Transition from tostate on MoveRight write) =
    TMState (write:left) (getBlankSym m) [] tostate
-- Moving right with tape remaining to the right.
runTransition m left c (r:right) state (Transition from tostate on MoveRight write) =
    TMState (write:left) r right tostate

stepMachine :: TuringMachine -> MachineState -> MachineState
stepMachine machine@(Machine _ _ _ _ translist) st@(TMState tapeleft c taperight state) =
    let trans = findMatchingTransition translist state c
    in case trans of
         (Just t) -> runTransition machine tapeleft c taperight state t
         Nothing  -> error "No applicable transition from state"

getMachineStateString (TMState left c right state) =
    state ++ ":[" ++ (reverse left) ++ "{" ++ [c] ++ "}" ++ right ++ "]"

runTracedMachine :: TuringMachine -> [Char] -> [String]
runTracedMachine tm@(Machine states initial alphabet blank transitions) (t:tape) =
    runTracedMachineSteps tm (TMState [] t tape initial)
  where runTracedMachineSteps machine state =
            let newstate@(TMState left c right st) = stepMachine machine state
            in if st == "Halt"
               then [getMachineStateString newstate]
               else let trace = runTracedMachineSteps machine newstate
                    in (getMachineStateString newstate) : trace

runMachine :: TuringMachine -> [Char] -> [Char]
runMachine tm@(Machine states initial alphabet blank transitions) (t:tape) =
    runMachineSteps tm (TMState [] t tape initial)
  where runMachineSteps machine state =
            let newstate@(TMState left c right st) = stepMachine machine state
            in if st == "Halt"
               then concat [reverse left, [c], right]
               else runMachineSteps machine newstate

---------------------------------------------------------------------------
-- Parsing code - implemented using the Parsec monadic parser library.
---------------------------------------------------------------------------

-- Basic setup stuff - use a standard haskell style lexer; set up the reserved words
-- and symbols in the lexer.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser
          (haskellDef { P.reservedNames = ["states","alphabet","trans","from","to","on",
                                           "write","move","left","right","initial","blank"] })

reserved = P.reserved lexer
symbol = P.symbol lexer
braces = P.braces lexer
parens = P.parens lexer
charlit = P.charLiteral lexer
strlit = P.stringLiteral lexer
ident = P.identifier lexer
whitespace = P.whiteSpace lexer

states = reserved "states"
alphabet = reserved "alphabet"
trans = reserved "trans"
from = reserved "from"
to = reserved "to"
on = reserved "on"
write = reserved "write"
move = reserved "move"
initial = reserved "initial"
blank = reserved "blank"

left = do { reserved "left"
          ; return MoveLeft
          }

right = do { reserved "right"
           ; return MoveRight
           }

-- statesDecl ::= "states" "{"  strlit+  "}" "initial" strlit
statesDecl = do { states
                ; strlst <- braces (many1 strlit)
                ; initial
                ; i <- strlit
                ; return (i, strlst)
                }

-- alphaDecl ::= "alphabet" "{" charlit+  "}" "blank" charlit
alphaDecl = do { alphabet
               ; charlst <- braces (many1 charlit)
               ; blank
               ; bl <- charlit
               ; return (bl, charlst)
               }

-- transDecl ::= "trans" "from" strlit "to" strlit "on" "(" charlit+ ")" "write" charlit "move" ("left" | "right")
transDecl = do { trans
               ; from
               ; fromState <- strlit
               ; to
               ; targetState <- strlit
               ; on
               ; chrs <- parens (many1 charlit)
               ; write
               ; wrchar <- charlit
               ; move
               ; dir <- (left <|> right)
               ; return (Transition fromState targetState chrs dir wrchar)
               }

-- machine ::= statesDecl alphaDecl transDecl+
machine = do { (i, sts) <- statesDecl
             ; (bl, alpha) <- alphaDecl
             ; trans <- many1 transDecl
             ; return (Machine sts i alpha bl trans)
             }

run :: (Show a) => Parser a -> String -> IO a
run p input =
    case (parse p "" input) of
      Left err -> do { putStr "parse error at "
                     ; print err
                     ; error "Parse error"
                     }
      Right x  -> return x

runTParser :: String -> IO TuringMachine
runTParser input =
    run (do { whitespace
            ; x <- machine
            ; eof
            ; return x }) input

--------------------------------------------------------------------------
-- A sample Turing program - first in the comment, and then coded into
-- a string literal, to make it easy to run tests. This sample program
-- is a basic Turing subtract - it takes a unary equation "111111-111=",
-- and leaves the result of subtracting the second from the first.
--
-- states { "one" "two" "three" "four" } initial "one"
-- alphabet { '1' ' ' '=' '-' } blank ' '
-- trans from "one" to "one" on (' ') write ' ' move right
-- trans from "one" to "one" on ('1') write '1' move right
-- trans from "one" to "one" on ('-') write '-' move right
-- trans from "one" to "two" on ('=') write ' ' move left
-- trans from "two" to "three" on ('1') write '=' move left
-- trans from "two" to "Halt" on ('-') write '-' move left
-- trans from "three" to "three" on ('1') write '1' move left
-- trans from "three" to "four" on ('-') write '-' move left
-- trans from "four" to "four" on (' ') write ' ' move left
-- trans from "four" to "one" on ('1') write ' ' move right

sampleMachine = concat ["states { \"one\" \"two\" \"three\" \"four\" } initial \"one\"\n ",
" alphabet { '1' ' ' '=' '-' } blank ' '\n ",
"trans from \"one\" to \"one\" on (' ') write ' ' move right\n ",
"trans from \"one\" to \"one\" on ('1') write '1' move right\n ",
"trans from \"one\" to \"one\" on ('-') write '-' move right\n ",
"trans from \"one\" to \"two\" on ('=') write ' ' move left\n ",
"trans from \"two\" to \"three\" on ('1') write '=' move left\n ",
"trans from \"two\" to \"Halt\" on ('-') write '-' move left\n ",
"trans from \"three\" to \"three\" on ('1') write '1' move left\n ",
"trans from \"three\" to \"four\" on ('-') write '-' move left\n ",
"trans from \"four\" to \"four\" on (' ') write ' ' move left\n ",
"trans from \"four\" to \"one\" on ('1') write ' ' move right"  ]

runTracedMachineOnString :: String -> String -> IO [String]
runTracedMachineOnString m str =
    do tm <- runTParser m
       return (runTracedMachine tm str)

runMachineOnString :: String -> String -> IO String
runMachineOnString m str =
    do tm <- runTParser m
       return (runMachine tm str)

sampleInput = " 11111111-111= "

------------------------------------------------------------------------
-- Main program execution scaffolding
-- main still needs a bit of work so that ghci will link correctly;
-- runs fine in GHCI, but linkage errors in GHC. For now, just load
-- this file, and then execute "runFromFile" from the command line.
------------------------------------------------------------------------
main :: IO ()
main =
    do [file] <- getArgs
       runFromFile file

runFromFile :: String -> IO ()
runFromFile filename =
    do m <- parseFromFile (do { whitespace
                              ; x <- machine
                              ; eof
                              ; return x }) filename
       case m of
         Right machine -> do
             putStrLn "Enter input for machine:"
             s <- getLine
             let result = runMachine machine s
             putStrLn (concat ["Result:[", result, "]"])
         Left err -> do
             putStrLn "Parse error:"
             print err

## A Neat Trick with Partial Evaluators

This is something I was talking about with a coworker today, and I couldn't quickly find a friendly writeup of it online, so I decided to write my own. (Yes, we do have a lot of people who enjoy conversations like this at foursquare, which is one of the many reasons that it's such a great place to work. And we are hiring engineers! Feel free to get in touch with me if you're interested, or just check our website!)

Anyway - what we were discussing was something called partial evaluation (PE). The idea of PE is almost like an eager form of currying. Basically, you have a two-parameter function, f(x, y). You know that you're going to invoke it a bunch of times with the same value of x, but different values of y. So what you want to do is create a new function, f_x(y), which evaluates as much of f as possible using the known parameter.

It's often clearer to see it in programmatic form. Since these days I pretty much live in Scala, I'll use Scala syntax. We can abstractly represent a program as an object which can be run with a list of arguments, and which returns some value as its result.

trait Program {
  def run(inputs: List[String]): String
}

If that's a program, then a partial evaluator would be:

object PartialEvaluator {
  def specialize(prog: Program, knownInputs: List[String]): Program = {
    ...
  }
}

What it does is take a program and a partial input, and return a new program: the original program, specialized for the partial inputs supplied to the partial evaluator. So, for example, imagine that you have a program like:

object Silly extends Program {
  def run(inputs: List[String]): String = {
    val x: Int = inputs(0).toInt
    val y: Int = inputs(1).toInt
    if (x % 2 == 0) {
      (y * y).toString
    } else {
      ((y + 1) * (y + 1)).toString
    }
  }
}

This is obviously a stupid program, but it's good for an example, because it's so simple. If we're going to run Silly a bunch of times with 1 as the first parameter, but with different second parameters, then we could generate a specialized version of Silly:

val sillySpecialOne = PartialEvaluator.specialize(Silly, List("1"))

Here's where the interesting part comes, where it's really different from currying. A partial evaluator evaluates everything that it possibly can, given the inputs that it's supplied. So the value produced by the specializer would be:

object Silly_1 extends Program {
  def run(inputs: List[String]): String = {
    val y: Int = inputs(0).toInt
    ((y + 1) * (y + 1)).toString
  }
}

This can be a really useful thing to do. It can turn into a huge optimization. (Which is, of course, why people are interested in it.) In compiler terms, it's sort of like an uber-advanced form of constant propagation.

But the cool thing that we were talking about is going a bit meta. Suppose that you run a partial evaluator on a programming language interpreter?

object MyInterpreter extends Program {
  def run(inputs: List[String]): String = {
    val code = inputs(0)
    val inputsToCode = inputs.tail
    interpretProgram(code, inputsToCode)
  }

  def interpretProgram(code: String, inputs: List[String]): String = {
    ...
  }
}

We can run our partial evaluator to generate a version of the interpreter that's specialized for the code that the interpreter is supposed to execute:

1. We have an interpreter for language M, written in language L.
2. We have a partial evaluator for language L.
3. We run the partial evaluator on the interpreter, with a program written in M that we want to interpret: PartialEvaluator.specialize(M_Interpreter, List(M_Code)).
4. The partial evaluator produces a program written in L.

So the partial evaluator took a program written in M, and transformed it into a program written in L!

We can go a step further - and generate an M to L compiler. How? By running the partial evaluator on itself. That is, run the partial evaluator like this: PartialEvaluator.specialize(PartialEvaluator, List(source(M_Interpreter))). The result is a program which takes an M program as input, and generates an L program: that is, it's an M to L compiler!

We can go yet another step, and turn the partial evaluator into a compiler generator: PartialEvaluator.specialize(PartialEvaluator, List(source(PartialEvaluator))). What we get is a program which takes an interpreter written in L, and generates a compiler from it.

It's possible to actually generate useful tools this way. People have actually implemented Lisp compilers this way!
