 # Introduction to the LAST programming language

2022-09-01

Inspired by John Tromp’s remarkable 206-bit self-interpreter for Binary Lambda Calculus (BLC), which I found via Justine Tunney’s amazing SectorLambda, I got an idea for a minimal programming language.

The language is called LAST and this is an introduction to it.

# What is it?

LAST is an elegant encoding for lambda calculus which uses no explicit variables or indices and only 4 symbols: `L`, `A`, `S`, and `T`.

LAST maps trivially to binary and has a BLC-style self-interpreter which is only 194 bits long.

LAST features S-optimization which is a novel form of expression reduction and equivalence, not possible in other variants of lambda calculus.

# How does it look like?

To get a visual sense of the language, below are a few classic lambda calculus functions written in LAST.

name lambda calculus LAST
identity λx.x LT
Y combinator λf.(λx.f (x x)) (λx.f (x x)) LALASTATTLASTATT
Church encoding
true λx.λy.x LLST
false λx.λy.y LLT
successor λn.λf.λx.f (n f x) LLLASTAASSTSTT
plus λm.λn.λf.λx.m f (n f x) LLLLAASSSTSTAASSTSTT
pair λx.λy.λz.z x y LLLAATSSTST

For a more complicated function, see the LAST self-interpreter.

# Design

## Syntax

The formal grammar of the language is as follows (in ABNF):

``Term = "L" Term / "A" Term Term / "S" Term / "T"``

So a `Term` (an expression) in `LAST` is either:

• `L`ambda followed by another `Term` (the body of the lambda)
• `A`pplication followed by two consecutive `Term`s. The first represents the function to apply, the second represents the argument to pass to the function.
• `S`kip followed by another `Term` (the body of the skip).
• `T`op

## Semantics

The semantics are defined by the LAST machine, which is a slightly simplified Krivine machine. The difference is that instead of using de Bruijn indices, the machine uses two simple (albeit more powerful) terms: `S` and `T` (explained below).

The operation of the machine is described in detail in the following section. It is mostly the same as the Krivine machine, so you may skip to the next section if you are already familiar with it. Conversely, you can use descriptions of the Krivine machine to understand the LAST machine.

## The LAST machine

The machine state is initialized with the expression to be evaluated and two stacks: one for arguments, one for current environment (typically both empty). Pseudocode type of the machine state is thus:

``state = expression * arguments * environment``

Both stacks will contain pairs, each made of an expression together with an environment. Such a pair is called a closure.

``````arguments = stack of closure
environment = stack of closure
closure = expression * environment``````

The machine loops until the current expression is a `L`ambda and the argument stack is empty (that’s the stop condition). The result of the evaluation is the expression together with the environment stack.

Every iteration of the loop changes the machine state according to the current expression, as follows:

• `L`ambda causes the machine to stop if the stop condition is met. Otherwise a closure is moved from the top of the argument stack to the top of the environment stack. The current expression becomes the body of the lambda.
• `A`pplication causes the machine to push a new closure onto the argument stack. The closure is made of the argument expression together with the current environment. The current expression becomes the function expression.
• `S`kip causes the machine to pop a closure off the environment stack. The current expression becomes the body of the skip.
• `T`op causes the machine to pop a closure off the environment stack. The current environment becomes the closure’s environment. The current expression becomes the closure’s expression.

Before interpreting `S` or `T` the environment stack must be nonempty – otherwise the machine stops with an error.

## Correspondence between LAST and lambda calculus with de Bruijn indices

The Krivine machine interprets lambda calculus with de Bruijn indices. These indices are natural numbers which identify variables in the environment stack.

Instead of numerical indices LAST uses two primitives (`S` and `T`) which can emulate them: `T` is equivalent to `0` and `S` is equivalent to the successor function.

In other words, indices can be translated into LAST as follows:

• The index `0` is translated into `T`
• An index `n + 1` is translated to `S` followed by a translation of index `n`

Meaning an index `n` is n repetitions of `S` followed by a `T`. For example:

index in BLC in LAST
0 10 T
1 110 ST
2 1110 SST
3 11110 SSST
4 111110 SSSST
5 1111110 SSSSST

This shows that the LAST machine can fully emulate the Krivine machine.

However the Krivine machine cannot fully emulate the LAST machine, as `S` can also precede `L` and `A` (semantics explained in S-optimization).

In effect, all programs in BLC can be translated to LAST, but not vice versa.

Therefore LAST can be seen as a superset of lambda calculus with de Bruijn indices.

# S-optimization

LAST introduces the notion of S-optimization which is a kind of optimization that can turn some programs into shorter and more efficient equivalents.

The ability to encode these more efficient equivalents is unique to LAST.

S-optimization can be automated, so a compiler from BLC (or a higer-level language) to LAST is viable.

S-deoptimization, an inverse of S-optimization, can also be automated, so a compiler from LAST to BLC or lambda calculus with de Bruijn indices is also viable.

To explain S-optimization, let’s take the following lambda expression:

``λx.λy.x x``

which can be encoded for the Krivine machine as:

``λλ 1 1``

This in turn can be translated to LAST as:

``LLASTST``

However, there is also an equivalent, yet shorter and more efficient equivalent program which can be written in LAST, but not lambda calculus with de Bruijn indices:

``LLSATT``

The difference is that the `S` has been “factored out” of the `A`: an unused variable on top of the stack has been discarded before applying the variable below to itself. This has been done only once as opposed to two times in the previous program.

This may be easier to explain with a visualization:

Screenshot from Hai (Paul) Liu’s Lambda Viewer tool

As we can see `x` is referred to twice in the application while `y` is not referred to at all, but “stands in the way”. With de Bruijn indices and Krivine machine we have no choice but to implicitly skip over it twice while in LAST we can skip over only once.

The ability to do this in LAST enables more efficient and shorter programs. The more variables stand in the way and the more references to variables that precede them, the more the length of the program is reduced and the efficiency improved. For a slightly more complex example:

``````λx.λy.λz.x x x x
; before optimization:
LLLAAASSTSSTSSTSST
; after optimization:
LLLSSAAATTTT``````

Screenshot from Hai (Paul) Liu’s Lambda Viewer tool

As we can see we have 4 references to `x` and 2 variables (`y` and `z`) to skip over.

A nontrivial example of an S-optimized program is presented in the section BLC-style LAST self-interpreter.

# LAST programs are quaternary numbers

If we consider the symbols L, A, S, T to be digits, a LAST program is a number in quaternary.

For example:

``LALASTATTLASTATT``

is equivalent to the bijective quaternary number:

``1212342441234244``

where we map:

``````L <-> 1
A <-> 2
S <-> 3
T <-> 4``````

# BLC-style I/O

We shall now define an I/O protocol for LAST similar to that of BLC, except input and output in it will be a string of quaternary digits rather than bits.

For example the following LAST program:

``LTLALALA``

will output `LALALA`.

The program has 2 parts (separated by space for clarity):

``LT LALALA``

The first part, `LT`, is the main function (which is the identity function).

The second part, `LALALA`, is the input string.

Upon execution, the input is converted to a list of digits as follows.

Each digit is converted to a function which represents it:

``````L = LLLLSSST
A = LLLLSST
S = LLLLST
T = LLLLT``````

These functions are then paired together into a list (in pseudocode):

``````pair(L,
pair(A,
pair(L,
pair(A,
pair(L,
pair(A,
NIL
)
)
)
)
)
)``````

where `pair` is the standard pairing function, which is represented in LAST as:

``pair = LLLAATSSTST``

and `NIL` is identical to the standard `false` function:

``NIL = LLT``

The result of this conversion for our `LALALA` example can be represented in LAST as (whitespace for clarity):

``````LAAT LLLLSSST
LAAT LLLLSST
LAAT LLLLSSST
LAAT LLLLSST
LAAT LLLLSSST
LAAT LLLLSST
LLT``````

This list is then fed to the main function.

The main function should operate on that input and then output a result as a similar list of digits. Here the operation is identity, so the input is turned directly into output without change. So the output of our main function will be exactly the same list:

``````LAAT LLLLSSST
LAAT LLLLSST
LAAT LLLLSSST
LAAT LLLLSST
LAAT LLLLSSST
LAAT LLLLSST
LLT``````

That list is then converted back to digits and displayed:

``LALALA``

# BLC-style LAST self-interpreter

The operation of the BLC self-interpreter is explained in detail in this paper.

A similar albeit more efficient self-interpreter for LAST is 97 characters:

``````ALATTLALLLATSLAAAATSASTLASTLLASSTLAATSTSSTSASTLASS
TLASSTLAASSTTASTTSASTLASTLASTATLLTSATLATLLSTATT``````

The general interface of the interpreter is the same, except that thanks to the quaternary I/O (which fits neatly with the LAST syntax) as well as S-optimization it is more streamlined and efficient than the original.

Moreover, as we shall see, it can be trivially compressed into 194 bits, surpassing the original by 12 bits.

# Trivial binary encoding

Since LAST programs are quaternary numbers, there is a trivial mapping between LAST and binary:

``````L <-> 00
A <-> 01
S <-> 10
T <-> 11``````

# LAST-B: binary LAST

With the above mapping to binary, we can define LAST-B – a binary version of LAST which works exactly the same, except it uses binary rather than quaternary.

Since each quaternary digit turns into two binary digits, every LAST-B program fits into an even number of bits and the length of a LAST-B program in bits is twice the length of the equivalent LAST program.

This is unlike BLC, where odd de Bruijn indices take odd number of bits, so programs can have an odd length. The advantage of BLC indices is that they are shorter than LAST-B “indices” (index n is shorter by n bits).

# The 194-bit LAST-B self-interpreter

Having defined LAST-B, we can now translate the 97-character BLC-style LAST self-interpreter:

``````ALATTLALLLATSLAAAATSASTLASTLLASSTLAATSTSSTSASTLASS
TLASSTLAASSTTASTTSASTLASTLASTATLLTSATLATLLSTATT``````

into 194 bits:

``````01000111110001000000011110000101010111100110110001
10110000011010110001011110111010111001101100011010
11000110101100010110101111011011111001101100011011
00011011011100001110011100011100001011011111``````

# Playground

Enter LAST code or select an example and click one of the buttons below.

Example:

All characters except L, A, S, and T will be ignored.

Output will appear here.