Finally: Gödel's Proof of Incompleteness!

Mar 12 2013 Published by under Incompleteness

Finally, we're at the end of our walkthrough of Gödel great incompleteness proof. As a refresher, the basic proof sketch is:

  1. Take a simple logic. We've been using a variant of the Principia Mathematica's logic, because that's what Gödel used.
  2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering.
  3. Show that you can express meta-mathematical properties of logical statements in terms of arithemetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
  4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.

What came before:

  1. Gödel numbering: The logic of the Principia, and how to encode it as numbers. This was step 1 in the sketch.
  2. Arithmetic Properties: what it means to say that a property can be expressed arithemetically. This set the groundwork for step 2 in the proof sketch.
  3. Encoding meta-math arithmetically: how to take meta-mathematical properties of logical statements, and define them as arithmetic properties of the Gödel numberings of the statements. This was step 2 proper.

So now we can move on to step three, where we actually see why mathematical logic is necessarily incomplete.

When we left off with Gödel, we'd gone through a very laborious process showing how we could express meta-mathematical properties of logical statements as primitive recursive functions and relations. We built up to being able to express one non-primitive recursive property, which describes the property that a given statement is provable:

pred provable(x) =
  some y {
    proofFor(y, x)
  }
}

The reason for going through all of that was that we really needed to show how we could capture all of the necessary properties of logical statements in terms of arithmetic properties of their Gödel numbers.

Now we can get to the target of Gödel's effort. What Gödel was trying to do was show how to defeat the careful stratification of the Principia's logic. In the principia, Russell and Whitehead had tried to avoid problems with self-reference by creating a very strict stratification, where each variable or predicate had a numeric level, and could only reason about objects from lower levels. So if natural numbers were the primitive objects in the domain being reasoned about, then level-1 objects would be things like specific natural numbers, and level-1 predicates could reason about specific natural numbers, but not about sets of natural numbers or predicates over the natural numbers. Level-2 objects would be sets of natural numbers, and level-2 predicates could reason about natural numbers and sets of natural numbers, but not about predicates over sets of natural numbers, or sets of sets of natural numbers. Level-3 objects would be sets of sets of natural numbers... and so on.

The point of this stratification was to make self-reference impossible. You couldn't make a statement of the form "This predicate is true": the predicate would be a level-N predicate, and only a level N+1 predicate could reason about a level-N predicate.

What Gödel did in the laborious process we went through in the last
post is embed a model of logical statements in the natural numbers. That's the real trick: the logic is designed to work with a set of objects that are a model of the natural numbers. By embedding a model of logical statements in
the natural numbers, he made it possible for a level-1 predicate (a predicate about a specific natural number) to reason about any logical statement or object. A level-1 predicate can now reason about a level-7 object! A level-1 predicate can reason about the set defined by a level-1 predicate: a level-1 predicate can reason about itself!.

Now, we can finally start getting to the point of all of this: incompleteness! We're going to use our newfound ability to nest logical statements into numbers to construct an unprovable true statement.

In the last post, one of the meta-mathematical properties that we defined for the Gödel-numbered logic was immConseq, which defines when some statement x is an immediate consequence of a set of statements S. As a reminder, that means that x can be inferred from statements in S in one inferrence step.

We can use that property to define what it means to be a consequence of a set of statements: it's the closure of immediate consequence. We can define it in pseudo-code as:

def conseq(κ) = {
  K = κ + axioms
  added_to_k = false
  do {
    added_to_k = false
    for all c in immConseq(K) {
      if c not in K {
        add c to K
        added_to_k = true
      }
    }
  } while added_to_k
  return K
}

In other words, Conseq(κ) is the complete set of everything that can possibly be inferred from the statements in κ and the axioms of the system. We can say that there's a proof for a statement x in κ if and only if x ∈ Conseq(κ).

We can the idea of Conseq use that to define a strong version of what it means for a logical system with a set of facts to be consistent. A system is ω-consistent if and only if there is not a statement a such that: a ∈ Conseq(κ) ∧ not(forall(v, a)) ∈ Conseq(κ).

In other words, the system is ω-consistent as long as it's never true that both a universal statement and it. But for our purposes, we can treat it as being pretty much the same thing. (Yes, that's a bit hand-wavy, but I'm not trying to write an entire book about Gödel here!)

(Gödel's version of the definition of ω-consistency is harder to read than this, because he's very explicit about the fact that Conseq is a property of the numbers. I'm willing to fuzz that, because we've shown that the statements and the numbers are interchangable.)

Using the definition of ω-consistency, we can finally get to the actual statement of the incompleteness theorem!

Gödel's First Incompleteness Theorem: For every ω-consistent primitive recursive set κ of formulae, there is a primitive-recursive predicate r(x) such that neither forall(v, r) nor not(forall(v, r)) is provable.

To prove that, we'll construct the predicate r.

First, we need to define a version of our earlier isProofFigure that's specific to the set of statements κ:

pred isProofFigureWithKappa(x, kappa) = {
  all n in 1 to length(x) {
    isAxiom(item(n, x)) or
    item(n, x) in kappa or
    some p in 0 to n {
      some q in 0 to n {
        immedConseq(item(n, x), item(p, x), item(q, x))
      }
    }
  } and length(x) > 0
}

This is the same as the earlier definition - just specialized so that it ensures that every statement in the proof figure is either an axiom, or a member of κ.

We can do the same thing to specialize the predicate proofFor and provable:

pred proofForStatementWithKappa(x, y, kappa) = {
  isProofFigureWithKappa(x, kappa) and
  item(length(x), x) = y
}

pred provableWithKappa(x, kappa) = {
  some y {
    proofForStatementWithKappa(y, x, kappa)
  }
}

If κ is the set of basic truths that we can work with, then
provable in κ is equivalent to provable.

Now, we can define a predicate UnprovableInKappa:

pred NotAProofWithKappa(x, y, kappa) = {
  not (proofForKappa(x, subst(y, 19, number(y))))
}

Based on everything that we've done so far, NotAProofWithKappa is primitive recursive.

This is tricky, but it's really important. We're getting very close to the goal, and it's subtle, so let's take the time to understand this.

  • Remember that in a Gödel numbering, each prime number is a variable. So 19 here is just the name of a free variable in y.
  • Using the Principia's logic, the fact that variable 19 is free means that the statement is parametric in variable 19. For the moment, it's an incomplete statement, because it's got an unbound parameter.
  • What we're doing in NotAProofWithKappa is substituting the numeric coding of y for the value of y's parameter. When that's done, y is no longer incomplete: it's unbound variable has been replaced by a binding.
  • With that substitution, NotAProofWithKappa(x, y, kappa) is true when x does not prove that y(y) is true.

What NotAProofWithKappa does is give us a way to check whether a specific sequence of statements x is not a proof of y.

We want to expand NotAProofWithKappa to something universal. Instead of just saying that a specific sequence of statements x isn't a proof for y, we want to be able to say that no possible sequence of statements is a proof for y. That's easy to do in logic: you just wrap the statement in a "∀ x ( )". In Gödel numbering, we defined a function that does exactly that. So the universan form of provability is:
∀ a (NotAProofWithKappa(a, y, kappa)).

In terms of the Gödel numbering, if we assume that the Gödel number for the variable a is 17, and the variable y is numbered as 19, we're talking about the statement p = forall(17, ProvableInKappa(17, 19, kappa).

p is the statement that for some logical statement (the value of variable 19, or y in our definition), there is no possible value for variable 17 (a) where a proves y in κ.

All we need to do now is show that we can make p become self-referential. No problem: we can just put number(p) in as the value of y in UnprovableInKappa. If we let q be the numeric value of the statement UnprovableInKappa(a, y), then:

r = subst(q, 19, p)

i = subst(p, 19, r)

i says that there is no possible value x that proves p(p). In other words, p(p) is unprovable: there exists no possible proof that there is no possible proof of p!

This is what we've been trying to get at all this time: self-reference! We've got a predicate y which is able to express a property of itself. Worse, it's able to express a negative property of itself!

Now we're faced with two possible choices. Either i is provable - in which case, κ is inconsistent! Or else i is unprovable - in which case κ is incomplete, because we've identified a true statement that can't be proven!

That's it: we've shown that in the principia's logic, using nothing but arithmetic, we can create a true statement that cannot be proven. If, somehow, it were to be proven, the entire logic would be inconsistent. So the principia's logic is incomplete: there are true statements that cannot be proven true.

We can go a bit further: the process that we used to produce this result about the Principia's logic is actually applicable to other logics. There's no magic here: if your logic is powerful enough to do Peano arithmetic, you can use the same trick that we demonstrated here, and show that the logic must be either incomplete or inconsistent. (Gödel proved this formally, but we'll just handwave it.)

Looking at this with modern eyes, it doesn't seem quite as profound as it did back in Gödel's day.

When we look at it through the lens of today, what we see is that in the Principia's logic, proof is a mechanical process: a computation. If every true statement was provable, then you could take any statement S, and write a program to search for a proof of either S or ¬ S, and eventually, that program would find one or the other, and stop.

In short, you'd be able to solve the halting problem. The proof of the halting problem is really an amazingly profound thing: on a very deep level, it's the same thing as incompleteness, only it's easier to understand.

But at the time that Gödel was working, Turing hadn't written his paper about the halting problem. Incompletess was published in 1931; Turing's halting paper was published in 1936. This was a totally unprecedented idea when it was published. Gödel produced one of the most profound and surprising results in the entire history of mathematics, showing that the efforts of the best mathematicians in the world to produce the perfection of mathematics were completely futile.

5 responses so far

Models and Why They Matter

Feb 19 2012 Published by under lambda calculus, Logic, Numbers

As I said in the last post, Church came up with λ-calculus, which looks like it's a great formal model of computation. But - there was a problem. Church struggled to find a model. What's a model, and why would that matter? That's the point of this post. To get a quick sense of what a model is, and why it matters?

A model is basically a mapping from the symbols of a logical system to some set off objects, such that all statements that you can prove in the logical system will be true about the corresponding objects. Note that when I say object here, I don't necessarily mean real-world physical objects - they're just something that we can work with, which is well-defined and consistent.

Why does it matter? Because the whole point of a system like λ-calculus is because we want to use it for reasoning. When you have a logical system like λ-calculus, you've built this system with its rules for a reason - because you want to use it as a tool for understanding something. The model provides you with a way of saying that the conclusions you derive using the system are meaningful. If the model isn't correct, if it contains any kind of inconsistency, then your system is completely meaningless: it can be used to derive anything.

So the search for a model for λ-calculus is really important. If there's a valid model for it, then it's wonderful. If there isn't, then we're just wasting our time looking for one.

So, now, let's take a quick look at a simple model, to see how a problem can creep in. I'm going to build a logic for talking about the natural numbers - that is, integers greater than or equal to zero. Then I'll show you how invalid results can be inferred using it; and finally show you how it fails by using the model.

One quick thing, to make the notation easier to read: I'm going to use a simple notion of types. A type is a set of atoms for which some particular one-parameter predicate is true. For example, if P(x) is true, I'll say that x is a member of type P. In a quantifier, I'll say things like \forall x \in P: \mbox{\em foo} to mean \forall x : P(x) \Rightarrow \mbox{\em foo}. Used this way, we can say that P is a type predicate.

How do we define natural numbers using logic?

First, we need an infinite set of atoms, each of which represents one number. We pick one of them, and call it zero. To represent the fact that they're natural numbers, we define a predicate {\cal N}(x), which is true if and only if x is one of the atoms that represents a natural number.

Now, we need to start using predicates to define the fundamental properties of numbers. The most important property of natural numbers is that they are a sequence. We define that idea using a predicate, \mbox{\em succ}(x,y), where \mbox{\em succ}(x,y) is true if and only if x = y + 1. To use that to define the ordering of the naturals, we can say: \forall x \in {\cal N}: \exists y: \mbox{\em succ}(x, y).

Or in english: every natural number has a successor - you can always add one to a natural number and get another natural number.

We can also define predecessor similarly, with two statements:

  1. \forall x \in {\cal N}: \exists y \in {\cal N}: \mbox{\em pred}(x, y).
  2. \forall x,y \in {\cal N}: \mbox{\em pred}(y,x) \Leftrightarrow \mbox{\em succ}(x,y)

So every number has a predecessor, and every number has a successor, and x is the predecessor of y if y is the successor of x.

To be able to define things like addition and subtraction, we can use successor. Let's define addition using a predicate Sum(x,y,z) which means "z = x + y".

  1. \forrall x,y \in {\cal N}: \exists z \in {\cal N} : Sum(x,y,z)
  2. \forall x,y \in {\cal N}: Sum(x, 0, x)
  3. \forall x,y,z \in {\cal N}: \exists a,b \in {\cal N}: Sum(a,b,z) \land \mbox{\em succ}(a,x) \land \mbox{\em succ}(y,b) \Rightarrow Sum(x, y, z)

Again, in english: for any two natural numbers, there is a natural number that it their sum; x + 0 always = x; and for any natural number, x + y = z is true if (x + 1) + (y - 1) = z.

Once we have addition, subtraction is easy: \forall x,y,z \in {\cal N} : diff(x,y,z) \Leftrightarrow sum(z,y,x)

That's: x-y=z if and only if x=y+z.

We can also define greater than using addition:

\forall x,y \in {\cal N} : x > y \Leftrightarrow

  • \mbox{\em succ}(x,y)) \lor
  • \exists z \in {\cal N}: \mbox{\em succ}(x, z)) \land exists z \in {\cal N}: \mbox{\em succ}(x,z) and z > y.

That's x > y if you can get to x from y by adding one repeatedly.

So, we've got a nice definition of natural numbers here, right?

Almost. There's one teeny little mistake.

We said that every natural number has a successor and a predecessor, and we also said that a number x is greater than a number y if you can get from y to x using a sequence of successors. That means that 0 has a predecessor, and that the predecessor of 0 is less than 0. But we're supposed to be defining the natural numbers! And one of the standard axioms of the natural numbers is that \forall x \in {\cal N}: 0 \le x. But we've violated that - we have both \forall x \in {\cal N}: 0 \le x, and
\exists x \in {\cal N}: 0 > x. And with a contradiction like that in the system, we can prove anything we want, anything at all. We've got a totally worthless, meaningless system.

That's why mathematicians are so particular about proving the validity of their models: because the tiniest error can mean that anything you proved with the logic might not be true - your proofs are worthless.

10 responses so far

Sidetrack from the CCCs: Lambda Calculus

Jan 18 2012 Published by under Category Theory, lambda calculus

So, last post, I finally defined closed cartesian categories. And I alluded to the fact that the CCCs are, essentially, equivalent to the simply typed λ calculus. But I didn't really talk about what that meant.

Before I can get to that, you need to know what λ calculus is. Many readers are probably familiar, but others aren't. And as it happens, I absolutely love λ calculus.

In computer science, especially in the field of programming languages, we tend to use λ calculus a whole lot. It's also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. λ calculus is great for a lot of reasons, among them:

  1. It's very simple.
  2. It's Turing complete: if a function can be computed by any possible computing device, then it can be written in λ-calculus.
  3. It's easy to read and write.
  4. Its semantics are strong enough that we can do reasoning from it.
  5. It's got a good solid model.
  6. It's easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing λ calculus is a big deal. It's led to the development of a lot of extremely good programming languages based, to one degree or another, on the λ calculus: Lisp, ML, Haskell, and my current favorite, Scala, are very strongly λ calculus based.

The λ calculus is based on the concept of functions. In the pure λ calculus, everything is a function; there are no values except for functions. In fact, we can pretty much build up all of mathematics using λ-calculus.

With the lead-in out of the way, let's dive in a look at λ-calculus. To define a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

Lambda Calculus Syntax

The λ calculus has exactly three kinds of expressions:

  1. Function definition: a function in λ calculus is an expression, written: λ param . body, which defines a function with one parameter.
  2. Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
  3. Function application: applying a function is written by putting the function value in front of its parameter, as in x y to apply the function x to the value y.

There's a trick that we play in λ calculus: if you look at the definition above, you'll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint - how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that functions are, themselves, values. Instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function, which can then operate on the second parameter. In the end, it's effectively the same thing as a two parameter function. Taking a two-parameter function, and representing it by two one-parameter functions is called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We'd like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.

Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter: λ x . (λ y . x + y).

Now that we know that adding multiple parameter functions doesn't really add anything but a bit of simplified syntax, we'll go ahead and use them when it's convenient.

One important syntactic issue that I haven't mentioned yet is closure or complete binding. For a λ calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it a parameter in an enclosing λ expression; if an identifier is not bound in any enclosing context, then it is called a free variable. Let's look quickly at a few examples:

  • λ x . p x y: in this expression, y and p are free, because they're not the parameter of any enclosing λ expression; x is bound because it's a parameter of the function definition enclosing the expression p x y where it's referenced.
  • λ x y.y x: in this expression both x and y are bound, because they are parameters of the function definition, and there are no free variables.
  • λ y . (λ x . p x y). This one is a tad more complicated, because we've got the inner λ. So let's start there. In the inner λ, λ x . p x y, y and p are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner λ, and y is bound by the other λ. "p" is still free.

We'll often use "free(x)" to mean the set of identifiers that are free in the expression "x".

A λ calculus expression is valid (and thus evaluatable) only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables - and making sure that the variables that are free in subexpressions are treated right is very important.

Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in λ calculus; they're called α and β. α is also called "conversion", and β is also called "reduction".

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in λ calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So - for instance, if we had an expression like:


λ x . if (= x 0) then 1 else x^2

We can do an α to replace X with Y (written "α[x/y]" and get):


λ y . if (= y 0) then 1 else y^2

Doing α does not change the meaning of the expression in any way. But as we'll see later, it's important because without it, we'd often wind up with situations where a single variable symbol is bound by two different enclosing λs. This will be particularly important when we get to recursion.

β reduction is where things get interesting: this single rule is all that's needed to make the λ calculus capable of performing any computation that can be done by a machine.

β basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it's actually pretty easy when you see it in action.

Suppose we have the application expression: (λ x . x + 1) 3. By performing a beta reduction, we can replace the application by taking the body x + 1 of the function, and substituting (or αing) the value of the parameter (3) for the parameter variable symbol (x). So we replace all references to x with 3. So the result of doing a beta reduction xs 3 + 1.

A slightly more complicated example is the expression:

λ y . (λ x . x + y)) q

It's an interesting expression, because it's a λ expression that when applied, results in another λ expression: that is, it's a function that creates functions. When we do beta reduction in this, we're replacing all references to the parameter y with the identifier q; so, the result is λ x . x + q.

One more example, just for the sake of being annoying. Suppose we have: (λ x y. x y) (λ z . z * z) 3

That's a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter x in the body of the first function with λ z . z * z; and we replace the parameter y with 3, getting: (λ z . z * z) 3. And we can perform beta on that, getting 3 * 3.

Written formally, beta says: λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e])

That condition on the end, "if free(e) ⊂ free(B[x := e]" is why we need α: we can only do beta reduction if doing it doesn't create any collisions between bound identifiers and free identifiers: if the identifier "z" is free in "e", then we need to be sure that the beta-reduction doesn't make "z" become bound. If there is a name collision between a variable that is bound in "B" and a variable that is free in "e", then we need to use α to change the identifier names so that they're different.

As usual, an example will make that clearer: Suppose we have a expression defining a function, λ z . (λ x . x+z). Now, suppose we want to apply it: (λ z . (λ x . x + z)) (x + 2). In the parameter (x + 2), x is free. Now, suppose we break the rule and go ahead and do beta. We'd get "λ x . x + x + 2". The variable that was free in x + 2 is now bound! We've changed the meaning of the function, which we shouldn't be able to do. If we were to apply that function after the incorrect β, we'd get (λ x . x + x + 2) 3. By beta, we'd get 3 + 3 + 2, or 8.

What if we did α the way we were supposed to?

First, we'd do an α to prevent the name overlap. By α[x/y], we would get λ z . (λ y . y + z) (x+2).

Then by β, we'd get "λ y . y + x + 2". If we apply this function the way we did above, then by β, we'd get 3+x+2.
3+x+2 and 3+3+2 are very different results!

And that's pretty much it. There's another optional rule you can add called η-conversion. η is a rule that adds extensionality, which provides a way of expressing equality between functions.

η says that in any λ expression, I can replace the value f with the value g if/f for all possible parameter values x, f x = g x.

What I've described here is Turing complete - a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I'll talk about those in my next post.

It'l also important to point out that while I've gone through a basic definition of λ calculus, and described its mechanics, I haven't yet defined a model for λ-calculus. That's quite an important omission! λ-calculus was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although it looked correct, the early attempts to define a model for it were failures! And without a valid model, the results of the system are meaningless. An invalid model in a logical system like calculus is like a contradiction in axioms: it means that nothing that it produces is valid.

3 responses so far

3-Valued Semantics

Mar 23 2011 Published by under Fuzzy Logic

Before we can move from three-valued logic to fuzzy logic, we need to take a look at semantics - both how conventional two-valued logic handle semantics, and how three-valued logics extend the basic semantic structure. This isn't exactly one of the more exciting topics I've ever written about - but it is important, and going through it now will set the groundwork for the interesting stuff - the semantics of a true fuzzy logic.

What we've looked at so far has been propositional 3-valued logics. Propositional logics aren't particularly interesting. You can't do or say much with them. What we really care about is predicate logics. But all we need to do is take the three-valued logics we've seen, and allow statements to be predicate(object).

In a conventional first-order predicate logic, we define the semantics in terms of a model or interpretation of the logic. (Technically, a logic and an interpretation aren't quite the same thing, but for our purposes here, we don't need to get into the difference.)

An interpretation basically takes a domain consisting of a set of objects or values, and does two things:

  1. For each atomic symbol in the logic, it assigns an object from the domain. That value is called the interpretation of the symbol.
  2. For each predicate in the logic, it assigns a set, called the extension of the predicate. The extension contains the tuples for which the predicate is true.

For example, we could use logic to talk about Scientopia. The domain would be the set of bloggers, and the set of blogs. Then we could have predicates like "Writes", which takes two parameters - A, and B - and which is true is A is the author of the blog B.

Then the extension of "Writes" would be a set of pairs: { (MarkCC, Good Math/Bad Math), (Scicurious, Neurotic Physiology), ... }.

We can also define the counterextension, which is the set of pairs for whiche the predicate is not true. So the counterextension of "writes" would contain values like { (MarkCC, Neurotic Physiology), ...}.

Given a domain of objects, and the extension of each predicate, we know the meaning of statements in the logic. We know what objects we're reasoning about, and we know the truth or falsehood of every statement. Importantly, we don't need to know the counterextension of a predicate: if we know the extension, then the counterextension is simple the complement of the extension.

In three-valued Lukasiewicz logic, that's not true, for reasons that should be obvious: if I(A) is the interpretation of the predicate A, then the complement of I(A) is not the same thing as I(\lnot A). L_3 requires three sets for a predicate: the extension, the counterextension, and the fringe. The fringe of a predicate P is the set of values x for which P(x) is N.

To be more precise, an interpretation I for first order L_3 consists of:

  1. A set of values, D, called the domain of the logic. This is the set of objects that the logic can be used to reason about.
  2. For each predicate P of arity n (that is, taking n arguments), three sets ext(P), cext(P), and fringe(P), such that:
    • the values of the members of all three sets are members of D^n.
    • the sets are mutually exclusive - that is, there is no value that is in more than one of the sets.
    • the sets are exhaustive: ext(P) \cup cext(P) \cup fringe(P) = D^n.
  3. For each constant symbol a in the logic, an assignment of a to some member of D: I(a) \in D

With the interpretation, we can look at statements in the logic, and determine their truth or falsehood. But when we go through a proof, we'll often have statements that don't operate on specific values - they use variables inside of them. In order to make a statement have a truth value, all of the variables in that statement have to be bound by a quantifier, or assigned to a specific value by a variable assignment. So given a statement, we can frequently only talk about its meaning in terms of variable assignments for it.

So, for example, consider a simple statement: P(x,y,z). In the interpretation I, P(x, y, z) is satisfied if (I(x), I(y), I(z)) \in ext(P); it's dissatisfied if (I(x), I(y), I(z)) \in cext(P). Otherwise, (I(x), I(y), I(z)) must be in fringe(P), and then the statement is undetermined.

The basic connectives - and, or, not, implies, etc., all have defining rules like the above - they're obvious and easy to derive given the truth tables for the connectives, so I'm not going to go into detail. But it does get at least a little bit interesting when we get to quantified statements. But to talk about we need to first define a construction called a variant. Given a statement with variable assignment v, which maps all of the variables in the statement to values, an x-variant v' of v is a variable assignment where for every variable y except x, v'(y) == v(y). In other words, it's an assignment where all of the variables except x have the same value as in v.

Now we can finally get to the interpretation of quantified statements. Given a statement \forall x P, P is satisfied by a variable assignment v if P is satisfied by every x-variant of v; it's dissatisfied if P is dissatisfied by at least one x-variant of v. Otherwise, it's undetermined.

Similarly, an existentially quantified statement \exists x P is satisfied by v if P is satisfied by at least one x-variant of v; it's dissatisfied if P is dissatisfied by every x-variant of v. Otherwise, it's undetermined.

Finally, now, we can get to the most important bit: what it means for a statement to be true or false in L_3. A statement S is T (true) if it is satisfied by every variable assignment on I; it's F (false) if it's dissatisfied by every variable assignment on I, and it's N otherwise.

9 responses so far

Computability

Feb 06 2011 Published by under Computability, Computation

I just recently realized that I only wrote about computability back in the earliest days of this blog. Those posts have never been re-run, and they only exist back on the original blogger site. When I wrote them, I was very new to blogging - looking back, I think I can do a much better job now. So I'm going to re-do that topic. This isn't just going to be a re-post of those early articles, but a complete rewrite.

The way that I'm going to cover this is loosely based on the way that it was first taught to me by a wonderful professor, Dr. Eric Allender at Rutgers, where I went to college. Dr. Allender was a really tremendous professor: he managed to take an area of computer science that could seem hopelessly abstract and abstruse, and turned it into something fun and even exciting to learn about.

Computability is the most basic and fundamental sub-field of theoretical computer science. It's the study of what a mechanical computing device can do. Not just what a specific mechanical computing device can do, but what can any mechanical computing device do? What are the limits of what you can do mechanically? And once we know the limits, what can we discover about the nature of computation?

Continue Reading »

25 responses so far

Fuzzy Logic vs Probability

Feb 02 2011 Published by under Fuzzy Logic

In the comments on my last post, a few people asked me to explain the difference between fuzzy logic and probability theory. It's a very good question.

The two are very closely related. As we'll see when we start looking at fuzzy logic, the basic connectives in fuzzy logic are defined in almost the same way as the corresponding operations in probability theory.

The key difference is meaning.

There are two major schools of thought in probability theory, and they each assign a very different meaning to probability. I'm going to vastly oversimplify, but the two schools are the frequentists and the Bayesians

First, there are the frequentists. To the frequentists, probability is defined by experiment. If you say that an event E has a probability of, say, 60%, what that means to the frequentists is that if you could repeat an experiment observing the occurrence or non-occurrence of E an infinite number of times, then 60% of the time, E would have occurred. That, in turn, is taken to mean that the event E has an intrinsic probability of 60%.

The other alternative are the Bayesians. To a Bayesian, the idea of an event having an intrinsic probability is ridiculous. You're interested in a specific occurrence of the event - and it will either occur, or it will not. So there's a flu going around; either I'll catch it, or I won't. Ultimately, there's no probability about it: it's either yes or no - I'll catch it or I won't. Bayesians say that probability is an assessment of our state of knowledge. To say that I have a 60% chance of catching the flu is just a way of saying that given the current state of our knowledge, I can say with 60% certainty that I will catch it.

In either case, we're ultimately talking about events, not facts. And those events will either occur, or not occur. There is nothing fuzzy about it. We can talk about the probability of my catching the flu, and depending on whether we pick a frequentist or Bayesian interpretation, that means something different - but in either case, the ultimate truth is not fuzzy.

In fuzzy logic, we're trying to capture the essential property of vagueness. If I say that a person whose height is 2.5 meters is tall, that's a true statement. If I say that another person whose height is only 2 meters is tall, that's still true - but it's not as true as it was for the person 2.5 meters tall. I'm not saying that in a repeatable experiment, the first person would be tall more often than the second. And I'm not saying that given the current state of my knowledge, it's more likely than the first person is tall than the second. I'm saying that both people possess the property tall - but in different degrees.

Fuzzy logic is using pretty much the same tools as probability theory. But it's using them to trying to capture a very different idea. Fuzzy logic is all about degrees of truth - about fuzziness and partial or relative truths. Probability theory is interested in trying to make predictions about events from a state of partial knowledge. (In frequentist terms, it's about saying that I know that if I repeated this 100 times, E would happen in 60; in Bayesian, it's precisely a statement of partial knowledge: I'm 60% certain that E will happen.) But probability theory says nothing about how to reason about things that aren't entirely true or false.

And, in the other direction: fuzzy logic isn't particularly useful for talking about partial knowledge. If you allowed second-order logic, you could have fuzzy meta-predicates that described your certainty about crisp first-order predicates. But with first order logic (which is really where we want to focus our attention), fuzzy logic isn't useful for the tasks where we use probability theory.

So probability theory doesn't capture the essential property of meaning (partial truth) which is the goal of fuzzy logic - and fuzzy logic doesn't capture the essential property of meaning (partial knowledge) which is the goal of probability theory.

30 responses so far

More 3-valued logic: Lukasiewicz and Bochvar

Jan 31 2011 Published by under Fuzzy Logic

Last time I wrote about fuzzy logic, we were looking at 3-valued logics, and I mentioned that there's more than one version of 3-valued logic. We looked at one, called K^S_3, Kleene's strong 3-valued logic. In K^S_3, we extended a standard logic so that for any statement, you can say that it's true (T), false (F), or that you don't know (N). In this kind of logic, you can see some of the effect of uncertainty. In many ways, it's a very natural logic for dealing with uncertainty: "don't know" behaves in a very reasonable way.

For example, suppose I know that Joe is happy, but I don't know if Jane is happy. So the truth value of "Happy(Joe)" is T; the truth value of "Happy(Jane)" is N. In Kleene, the truth value of "Happy(Joe) ∨ Happy(Jane)" is T; since "Happy(Joe)" is true, then "Happy(Joe) ∨ anything" is true. And "Happy(Joe) ∧ Happy(Jane)" is N; since we know that Joe is happy, but we don't know whether or not Jane is happy, we can't know whether both Joe and Jane are happy. It works nicely. It's a rather vague way of handling vagueness, (that is, it lets you say you're not sure, but it doesn't let you say how not sure you are) but in so far as it goes, it works nicely.

A lot of people, when they first see Kleene's three-valued logic think that it makes so much sense that it somehow defines the fundamental, canonical three-valued logic in the same way that, say, first order predicatelogic defines the fundamental two-valued predicate logic.

It isn't.

There are a bunch of different ways of doing three-valued logic. The difference between them is related to the meaning of the third value - which, in turn, defines how the various connectives work.

There are other 3-valued logics. We'll talk about two others. There's Bochvar's logic, and there's Lukasiewicz's. In fact, we'll end up building our fuzzy logic on Lukasiewicz's. But Bochvar is interesting in its own right. So we'll take a look at both.

Continue Reading »

13 responses so far

Grandiose Crackpottery Proves Pi=4

Nov 16 2010 Published by under Bad Geometry, Bad Logic, Bad Math, Bad Physics

Someone recently sent me a link to a really terrific crank. This guy really takes the cake. Seriously, no joke, this guy is the most grandiose crank that I've ever seen, and I doubt that it's possible to top him. He claims, among other things, to have:

  1. Demonstrated that every mathematician since (and including) Euclid was wrong;
  2. Corrected the problems with relativity;
  3. Turned relativity into a unification theory by proving that magnetism is part of the relativistic gravitational field;
  4. Shown that all of gravitational/orbital dynamics is completely, utterly wrong; and, last but not least:
  5. proved that the one true correct value of \pi is exactly 4.

I'm going to focus on the last one - because it's the simplest illustration of both his own comical insanity, of of the fundamental error underlying all of his rubbish.

Continue Reading »

86 responses so far

Mr. Spock is Not Logical (book draft excerpt)

Mar 17 2009 Published by under Logic

As I mentioned, I'll be posting drafts of various sections of my book here on the blog. This is a rough draft of the introduction to a chapter on logic. I would be extremely greatful for comments, critiques, and corrections.

I'm a big science fiction fan. In fact, my whole family is pretty
much a gaggle of sci-fi geeks. When I was growing up, every
Saturday at 6pm was Star Trek time, when a local channel show
re-runs of the original series. When Saturday came around, we
always made sure we were home by 6, and we'd all gather in front of
the TV to watch Trek. But there's one one thing about Star Trek for
which I'll never forgive Gene Roddenberry or Star Trek:
"Logic". As in, Mr. Spock saying "But that would
not be logical.".

The reason that this bugs me so much is because it's taught a
huge number of people that "logical" means the same
thing as "reasonable". Almost every time I hear anyone
say that something is logical, they don't mean that it's logical -
in fact, they mean something almost exactly opposite - that it
seems correct based on intuition and common sense.

If you're being strict about the definition, then saying that
something is logical by itself is an almost meaningless
statement. Because what it means for some statement to be
logical is really that that statement is inferable
from a set of axioms in some formal reasoning system. If you don't
know what formal system, and you don't know what axioms, then the
statement that something is logical is absolutely meaningless. And
even if you do know what system and what axioms you're talking
about, the things that people often call "logical" are
not things that are actually inferable from the axioms.

Continue Reading »

17 responses so far

Bad Behavior has blocked 1559 access attempts in the last 7 days.