Categorical Computation Characterized By Closed Cartesian Categories

Jan 03 2012 Published by under Category Theory, lambda calculus

One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I'm a computer scientist/software engineer, it's a natural: CCCs are, basically, the categorical structure of lambda calculus - and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need - what else? - more definitions.

Cartesian Categories

A cartesian category C (note not cartesian closed category) is a category:

  1. With a terminal object t, and
  2. \forall a, b \in Obj(C), the objects and arrows of the categorical product a \times b \in C.

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, And of course, the meaning of the categorical product in set? Cartesian product of sets.

Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to included in a category. The exponential is a complicated definition, and it's a bit hard to really get your head around, but it's well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of "all functions from A to B".

Given two objects x and y from a category C, their categorical exponential xy, if it exists in the category, is defined by a set of values:

  • An object x^y,
  • An arrow \mbox{eval}_{y,x}: x^y \times y \rightarrow x, called an evaluation map.
  • \forall z \in Obj(C), an operation \Lambda_C: (z \times y \rightarrow x) \rightarrow (z \rightarrow x^y). (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

  1. \forall f : z \times y \rightarrow x, g : z \rightarrow x^y:
    • \mbox{val}_{y,x} \circ (\Lambda_C(f)\times 1_y)
    • \forall f : z \times y \rightarrow x, g : z \rightarrow x^y: \Lambda_C(\mbox{eval}_{y,x} \circ (z \times 1_y) = z

To make that a bit easier to understand, let's turn it into a diagram.

exponent.jpg

As I alluded to earlier, you can also think of it as a generalization of a function space.x^y is the set of all functions from y to x. The evaluation map is simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in an a value from b.

So what does the categorical exponential mean? I think it's easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then X^Y is the set of functions from Y to X.

Now, look at the diagram:

  1. The top part says, basically, that g is a function from Z to to X^Y: so g takes a member of Z, and uses it to select a function from Y to X.
  2. The vertical arrow says:
    1. given the pair (z,y), f(z,y) maps (z,y) to a value in X.
    2. given a pair (z,y), we're going through a function. It's almost like currying:
      1. The vertical arrow going down is basically taking g(z,y), and currying it to g(z)(y).
      2. Per the top part of the diagram, g(z) selects a function from y to x. (That is, a member of X^Y.)
      3. So, at the end of the vertical arrow, we have a pair (g(z), y).
    3. The "eval" arrow maps from the pair of a function and a value to the result of applying the function to the value.

    Cartesian Closed Categories

    Now - the abstraction step is actually kind of easy: all we're doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.

    With exponentials and products, we can finally say what the cartesian closed categories (CCCs). A Cartesian closed category is a category that is closed with respect to both products and exponentials.

    Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs - with its closure WRT product and exponential - is an expression of the basic capability of an effective computing system. So next, we'll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

    Share

3 responses so far

What if it's not Regular? Pump it!

May 01 2011 Published by under Computation

At this point, we've seen a fair bit about regular languages, and we've gone through the introduction to context free languages. We know one way of showing that a language is regular or context free: if you can write a (regular/context free) grammar for a language, then that language is necessarily (regular/context free). But... what if we have a language that we suspect is not regular?

Continue Reading »

Share

5 responses so far

The Next Step in Computation: Level-2 Languages

Apr 26 2011 Published by under Computation

Time to move on to Chomsky level 2! Level two languages are also known as context free languages, or CFLs. Level 2 languages are wonderful things. They're simple enough to be parsed easily, but expressive enough to support a very wide range of useful languages. Pretty much every programming language that's widely used can have its syntax specified with a level-2 language.

Grammars for Context Free Languages

In terms of the grammar, a CFL is easy to describe: it's a language where the left-hand side of every grammar rule consists of exactly one non-terminal symbol. That's it: the right hand side of a rule in a CFL grammar can be anything at all. Unlike the regular grammars, there are no restrictions on the position or the number of NTSs on the right hand side.

This change makes a huge difference in what you can do. In a CFL, you can count. You can have distant relationships - things like a sub-string that can occurs at the end of a string only if a match for it occurs at the beginning. The canonical example of this is paren matching: you can write a language that makes sure that you have matching parens: the same number of open and close parens.

  • A ::= '(' ')'
  • A ::= A A

This language includes ()((())()()(())), but not ()((())()()(()) (the same thing, but with one trailing paren omitted - 8 opens, 7 closes), or ()((())()()(()))) (the same thing, but with an extra close paren at the end - 8 opens, 9 closes).

As a quick aside: this also illustrates what we mean when we say that a language supports counting. When we say that a language requires counting, what we mean is that is that some feature of a string in the language has to have a number of repetitions dictated by another feature in the same string. In a paren-matching grammar, we require that the number of close parens must be the same as the number of open parens. We don't just make sure that that's true for 1, or 2, or 3 open parens, we have matching close parens. For any number of parens, the number of closes must be the same as the number of opens.

We can look at a much less trivial example of a simple grammar. As I've written about at other times, in computer science, there's a formal language that we use for a ton of valuable things called lambda calculus. Lambda calculus is the formal mathematical basis of the Haskell language, and the basic tool used for specifying formal semantics of computational systems. The complete grammar of the simply typed lambda calculus is:

  • ident ::= "A" | "B" | "C" | ... | "Z"
  • expr ::= "\lambda" ident "." expr
  • expr ::= ident
  • expr ::= expr expr
  • expr ::= "(" expr ")"

You can see a practical example of counting in this grammar. It guarantees that expressions in the lambda calculus are well-formed. We couldn't do that in a regular language. That's a huge boost in capability.

So why do we call this a context free language? In terms of the grammar, when we're constructing a string, we're always doing it by replacing non-terminal symbols with sequences of other symbols. When we do one of those replacements, if there's an NTS "S" in the current string, then we can replace it. We can't look at anything but the individual non-terminals in the string. We can't consider the context in which a non-terminal occurs before we decide whether or not we're allowed to replace it.

Capability of Context Free Languages

As we've seen, CFLs give us some pretty significant additional capabilities. That abilities to count and to describe non-consecutive relationships between different parts of a string in a language are a huge upgrade in computational capability. But CFLs are still pretty limited in many ways. You can't write a grammar for a spoken language using CFGs - natural languages aren't context free. We use CFLs and CFGs all the time for compilers and such in real programs, but there's always an extra step involved, because there are aspects of real computer languages that can't be captured in context-free form.

So what can't you do in a CFL? I'm not going to try to formally characterize the limits of CFLs, but here are two examples:

Complex counting/Arithmetic
if you want a language of strings with the same number of Xs and Ys, that language is a CFL. If you want a string with the same number of Xs, Ys, and Zs - that isn't.
Constrained relationships
We can have some distant relationships in a context grammar - but it's a very limited capability. You can't specify that a particular symbol can only occur in one place in the string if it already occured in a prior part of the string. For example, in the lamba calculus example, it really should say that you can only use the "expr ::= ident" rule if the ident occured in an enclosing LAMBA ident". CFLs can't do that.

Computing CFLs: the PushDown Automaton

So - now that we know what a CFL is, and what it can express: what's
the basic computational model that underlies them? CFLs are computable using a kind of machine called a pushdown automaton (PDA). Pushdown automata are relatively simple: take a finite state machine, and add a stack.

For the non-CS folks out there, a stack is a last in first out (LIFO) storage system. What that means is that you can store something on the top of the stack whenever you want to (called pushing), look at what's on top of the stack (peeking), and removing the element on top (popping). For a PDA, the transitions look like:

(\mbox{state},\mbox{top-of-stack},\mbox{input}) \rightarrow (\mbox{state}, \mbox{stack-action})

  • The top-of-stack in the transition can be either a symbol from the machine's alphabet, or it can be "*". If it's a symbol, then the transition can only be taken if both the machine state and the top-of-stack match. If it's "*", then the transition can be taken regardless of the value on top of the stack.
  • The stack-action can be "push(symbol)"; "pop", or "none".
  • The machine accepts the input if it reaches a final state with the stack empty. (There are alternate formulations that don't require an empty stack, or that only require an empty stack but don't use final states. They're exactly equivalent to empty stack + final state.)

As usual, it's easiest to understand this with an example. So let's look at a simple language consisting of parens and identifiers, where the parens have to match. So, for example, "((I)(((I)I)(II)))" would be accepted, but "(I)(((I)I)(II)))" (the same string, but with the first open removed) would be rejected.

Our machine has an alphabet of "(", ")", and "I". It has two states: 0, and 1. 0 is both the initial state, and the only final state. The available transitions are:

  • (0, *, : in state 0, if you see an open paren, push it onto the stack, and stay in state 0 It doesn't matter what's on the stack - if there's an open-paren in state 0, you can do this.
  • (0, : in state 0, if you see a close paren and there's an open-paren on top of the stack, then you've matched a pair, so you can pop the top open off of the stack.
  • (0, \epsilon, : if you're in state 0, and you see a close paren, but the stack in empty, then go to state one. State one is an error state: it means that you saw a close paren without a corresponding open paren. That's not allowed in this grammar. Once you're in state 1, you're stuck.
  • (0, *, : in state zero, if you see an "I", then you consume it and continue in state zero. It doesn't matter what's on the stack, and you don't change the stack.

Or graphically:

So - if it sees a "(", it pushes a "(" on the stack. If it sees an identifier, it just keeps going past it. If it sees a ")", and there's an "(" on top of the stack, it pops the stack; if it sees a ")" and there's no "(" on the stack, then it switches into state 1. Since state 1 is not a final state, and there is no transitions that can be taken from state 1, the machine rejects the string if it sees an extra ")". If there's a "(" without a matching close, then when the machine finishes, it will have a non-empty stack, and so it will reject the string.

Finally, one nifty little note. The pushdown automaton is a very limited kind of machine. It can't do complex arithmetic, or process complex grammatical constructions. There's a lot that it can't do. So what happens if we take this very limited machine, and give it a second stack?

It becomes maximally powerful - Turing complete. In fact, it becomes a Turing machine. We'll see more about Turing machines later, but a Turing machine is equivalent to a two-stack PDA, and a Turing
machine can perform any computation computable by any mechanized computing process. So one stack is extremely constrained; two stacks is as un-constrained as any computing device can ever be.

Share

15 responses so far

Regular Expressions and Derivatives

Apr 16 2011 Published by under Computation, Haskell

When you're working with regular languages specified in regular expression form, there's a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It's called the Brzozozwksi derivative of a regular expression - or just simply the derivative of a regexp.

The basic idea of the derivative is that given a regular expression, r, you can derive a new regular expression called the derivative with respect to symbol c, D_c(r). D_c(r) is a regular expression describing the string matched by r after it's matched an r.

Continue Reading »

Share

3 responses so far

Nondeterminism in Finite State Automata

Apr 11 2011 Published by under Computation

In the last post, I mentioned the fact that regular expressions specify the same set of languages as regular grammars, and that finite state machines are the computational device that can recognize those languages.

It's even pretty easy to describe how to convert from regular expressions to FSMs. But before we do that, to make it a bit easier, we'll extend our finite state machines. Doing that is interesting in itself: What we're going to do is create non-deterministic finite state machines - NFA (for nondeterministic finite automata) for short.

Continue Reading »

Share

7 responses so far

Bad Behavior has blocked 1183 access attempts in the last 7 days.