λDNA programming language

Darius J Chuck

2023-12-04

Oops, I did it again: I implemented another minimal programming language based on lambda calculus, taking the correspondence between it and DNA to a new extreme.

Here is the version of its README file as of 2023-12-04. For the most recent version, check out the source code.

λDNA programming language

This is a working proof-of-concept for a programming language called λDNA. There are two implementations: one in plain old C and one in JavaScript which you can run directly in your browser here. λDNA is an evolution of the LAST programming language.

λDNA is modeled after biological DNA. It is based on lambda calculus and is possibly the most minimal complete programming language. λDNA can run the shortest-possible self-interpreter of lambda calculus which can be in principle implemented in a binary variant of λDNA (to be implemented?) in 4 bits.

λDNA builds on the work of great scientists, programmers, and researchers. It is inspired by Justine Tunney’s SectorLambda implementation of John Tromp’s Binary Lambda Calculus (BLC), a minimal version of Alonzo Church’s lambda calculus which incorporates N.G. de Bruijn’s namefree lambda expressions.

Cool features

Programs look like DNA sequences
‼️ Directly executable ‼️
Ultra-tiny: the entire logic of the language, including a REPL is on the order of TENS of lines of code
It supports I/O without any encoding or decoding
It can thus run the shortest possible self-interpreter of lambda calculus

Programs look like DNA sequences

Programs in λDNA are written as sequences of letters GACT – same as conventionally used to represent DNA sequences. Each of the four letters directly corresponds with a virtual machine code instruction and a fundamental operation of the lambda calculus.

‼️ Directly executable ‼️

No preprocessing or translation or even parsing is done on the programs λDNA takes as input. The minimal virtual machine directly executes sequences of letters as machine code. This simplifies the implementation and remarkably enables an I/O protocol that also does not need to encode the inputs or decode the outputs.

REPL

The way to run λDNA is through the REPL (Read-Evaluate-Print Loop). The REPL works as follows:

It reads a string of letters representing the program from standard input (stdin). The program can request input from stdin.
The program is then evaluated, and the result is printed to standard output (stdout). That output can then be examined by the next program.
Go to 1., consuming the next program which will have access to this program’s results.

This loop continues until stdin is closed or the λDNA machine is terminated by the user (the C implementation also terminates when an empty program is provided).

I/O

λDNA uses the following I/O protocol.

Input

When the REPL is first initialized, the first program that is read from stdin can use T (top of environment) to refer to external input. When the program tries to evaluate the T, it will request the input from stdin. The input should be another program, which will be ran in the same environment as T. This gives λDNA the capacity to process input lazily and interactively.

Output

Every time the REPL completes an iteration, a result is printed to stdout. That result is then made available to the next program under T (top of environment). Results from previous iterations (if any) are also available under CT, CCT, etc. (TODO: put a limit on the number of results available.) This gives λDNA the capacity to process output lazily and interactively.

Comparison to a BLC-like I/O protocol

As far as I can tell, the λDNA I/O protocol is in principle more general than and can be used to simulate a BLC-like I/O protocol.

For example, let’s consider the following REPL session:

λ> ATAGGAGGTACGGGCGACATCCTACCAGGCGCGCTAGGGCGCTAGGGGCTAGGGGTTTTGAGAATTCTGAATTCT
T> GACTAGGGGTT
GACATCCTACCAGGCGCGCTAGGGCGCTAGGGGCTAGGGGTTT
λ> AGGCTT
GGCGCGCT
λ> AGGTCT
T> GACTAGGGGCTT
GACATCCTACCAGGCGCGCTAGGGCGCTAGGGGCTAGGGGTTT
λ> AGGCTT
GGGCGCT
λ> AGGTCT
T> GGT
GGT

Note: the lines that begin with λ> contain programs that are read from stdin; the lines that begin with T> contain the input that these programs request; the lines without any prompt are the results that go to stdout.

Here we run a program¹ written for a BLC-style I/O protocol (specifically the flavor of the protocol that LAST uses), feeding it input symbol-by-symbol as plain unencoded lambda terms (like GACTAGGGGTT). This is in contrast to LAST which translates the letters L, A, S, or T into lambda terms before feeding them to the program – similarly to BLC, except BLC works on bits (0, 1).

Every time we feed an input symbol to the program, we get a result, which is a term that evaluates to a lazy output stream.

We can then destructure this output stream symbol-by-symbol.

Every time we run AGGCTT on it, we get the head of the output stream, which is an unencoded term representing a single output symbol. In LAST that that would be automatically encoded into one of the four letters (L, A, S, or T), in BLC into a bit (0, 1).

Every time we run AGGTCT on the output stream, we get its tail, which will cause the machine to request another symbol from the input. Note: we must use CT rather than T, because of the way the REPL works in λDNA; to get to the tail, we have to skip over the most recent result, which is the head that we extracted in the previous step/program.

We can then provide the next input symbol and continue on like this forever or until we provide GGT (nil) as the input, which will cause this particular program to terminate and close the output stream, producing GGT as its final output.

We could in principle write an encoder/decoder on top of this that would make it behave exactly like the BLC I/O protocol or the LAST I/O protocol or any other I/O protocol we can come up with.

Note

I am not 100 % sure that this I/O protocol is indeed solid, and I certainly haven’t formally proven it. I just came up with it, tested it on what came to mind and haven’t found a flaw (yet?). If there should be a problem with it, we can always revert to a BLC-style I/O protocol. An early implementation of λDNA which used a protocol like that is here.

Self-interpreter

A self-interpreter of language X is a program written in X which:

takes as an input an arbitrary program in X encoded in some way
and evaluates that encoded program, producing the same result as running the unencoded program directly.

In other words, using lambda calculus terminology, a term E is a self-interpreter, if for any term M, E(encode(M)) evaluates to the normal form of M if it exists (see [Lynn] for a more in-depth explanation).

There have been many attempts at finding the shortest possible self-interpreters for lambda calculus (see Research on self-interpreters). As far as I know λm.m(λx.x) is the shortest known self-interpreter, devised by [Brown and Palsberg].

Both this research and a close look at the definition leads to the following conclusion:

as we relax the restrictions on the encoding of the input, it seems that we can decrease the length of the self-interpreter almost arbitrarily

Well, what if we use identity as our encoding function? Then our self-interpreter itself becomes the identity function:

λm.m

Is that ridiculous? In my opinion it isn’t. This is in fact the shortest possible self-interpreter for lambda calculus (also the shortest possible closed lambda term).

‼️ NB This self-interpreter is also a self-reducer (and the only self-interpreter that is also a self-reducer?), as its output is effectively encoded with the same encoding as its input (identity).

We can make it actually work in λDNA, because we don’t use any encoding or even parsing for external input.

The self-interpreter, which is the identity function, is written in λDNA as GT. The following λDNA program applies external input to this self-interpreter:

ATGT

Running this via the REPL will request input from stdin, which can be an arbitrary lambda term written in λDNA, which will then be evaluated by the self-interpreter, producing the same output as running the lambda term directly.

Comparison with the Binary Lambda Calculus self-interpreter

Perhaps the most notable version of lambda calculus that has real implementations capable of processing input from the operating system and has a working self-interpreter that operates on that input is BLC.

BLC has a model of I/O where the input is a stream of bits which are lazily encoded into lambda calculus terms using Church pairs.

In Functional Bits: Lambda Calculus based Algorithmic Information Theory, John Tromp presents a self-interpreter for BLC for that model of I/O which works by unpacking these bits (decoding) and interpreting them according to the syntax of BLC. This self-interpreter can be theoretically squeezed into 206 bits, which, as conjectured by Tromp, is close to optimal size.

The limiting factor here being the input encoding.

This factor does not apply to λDNA, as it does not encode its input at all. Therefore its shortest possible self-interpreter that operates on external input is the same as the shortest possible self-interpreter for lambda calculus described here.

That self-interpreter can be theoretically squeezed into 4 bits, as we can implement a binary variant of the λDNA machine that operates directly on bits, using e.g. 00 instead of G, 01 instead of A, 10 instead of C, and 11 instead of T.

For that machine the self-interpreter would be:

Still, the plain λDNA version is only 12 bits longer, so perhaps it’s not worth the effort. ;D

Research on self-interpreters

See Lambda calculus with lambda calculus by Ben Lynn which points to:
- Efficient Self-Interpretation in Lambda Calculus by Torben Mogensen which describes the self-interpreter E = Y(\e m.m (\x.x) (\m n.(e m)(e n)) (\m v.e (m v)))
- Linear Time Self-Interpretation of the Pure Lambda Calculus by Torben Mogensen which describes the self-interpreter E=\q.q(\x.x)(\x.x)
- Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega by Matt Brown and Jens Palsberg which describes the self-interpreter E=\q.q(\x.x)

How it works?

The crucial thing that allowed to eliminate the need for preprocessing or parsing the code is changing the syntax of applications. In other lambda calculus flavors, applications are written by first specifying the operator (the function being applied) and then specifying the operand (the argument that the function is being applied to). For example, to apply the function f to the argument x, we’d write:

f(x)          in conventional mathematical notation
(f x)         in conventional lambda calculus notation
f x           in conventional lambda calculus notation when order is clear from precedence rules
01 10 110     in Binary Lambda Calculus (BLC), assuming f is under variable 10 and x is under 110
A T ST        in LAST, assuming f is under variable T and x is under ST
𝛂 τ στ        in LAST/BLC, using Greek letters

Note: in BLC and LAST the spaces are not part of the syntax, just added here for clarity.

When evaluating applications in lambda calculus, we first process the argument. This means that to evaluate code written like this we need to first parse it, because we have to know the extent of the function and the argument in the source code beforehand.

However, if we swap f and x, the order in the source code will correspond to the order in which an evaluator processes the code.

Since this was true for abstractions and variable references to begin with, with this change and a slight modification to the evaluator, we can eliminate parsing altogether.

This is exactly what was done in λDNA. Here we apply f to x like so:

A CT T        assuming f is under variable T and x is under CT
𝛂 στ τ        the same using Greek letters

With this change the language becomes at the same time Forth-like (concatenative, stack-oriented – we provide arguments before the operation) and Brainfuck-like (naively-implemented) – we execute the instructions in sequence without parsing. Still, at the same time it remains fundamentally a simple variant of lambda calculus, a functional language.

This in itself is pretty remarkable – how multiple paradigms seem to collapse into one just by changing the syntax.

Speaking of Brainfuck. The slight modification to the evaluator I mentioned is the introduction of a counter which switches the machine between two modes.

If it is > 0, then we are in “stack mode” or “argument mode” or, as I’d like to think of it “transcription mode”, where we don’t execute the instructions, but merely manipulate the counter. On every A, we increment it, and on every T, we decrement it – similarly to what we’d do in a naive Brainfuck interpreter on [ and ]. When we get to 0, we push the fragment of code we processed in “stack mode” onto the argument stack.

When the counter is = 0, we are in “execution mode” which pretty much works like a good old Krivine/LAST machine, except that when we find an A, we increment the counter (entering into “stack mode”) and remember the position in code, so we can put it onto the arguments stack when we leave the “stack mode”.

Yes, this whole thing is in principle slower than parsing beforehand, but speed is not the major factor we’re trying to optimize here. Also for small programs (which are mostly what we’re dealing here) it is not necessarily slower, because… there is no parsing.

What would be nice to implement

In the C version: garbage collection.
Binary λDNA that operates on actual bits, which uses a mapping like G->00, A->01, C->10, T->11 and is able to run the 4-bit self-interpreter directly – would make for an impressive demo.
A hand-optimized version of λDNA written in assemby that makes SectorLambda seem large in comparison – 333 bytes should be easily doable. I suspect we can go even lower than that and fit the REPL in half a boot sector! Imagine: HalfSectorLambda. This is a challenge (#HalfSectorLambdaChallenge :D) for somebody who has more fun playing with that than I do. :D I’m sure jart could wizard it up in a jiffy.

Why is it cool?

Pretty sure we are at some kind of limit of simplicity here.

By simplifying we can find the essence. In this case, going from lambda calculus, through BLC, to LAST, to λDNA, we find that the following things are not essential for computation:

variable names; we don’t need an infinite supply of variable names to refer to; we don’t need to know what is a variable name, where does it come from and how it works; we don’t need alpha-renaming
numerical indices; we don’t even need an infinite supply of natural numbers to refer to variables; we don’t need to know what is a number, where it comes from and how it works; this is great: since we use our calculus to describe the foundations of computation, including natural numbers (see Church numerals), it seems like cheating to use them in the definition of the calculus
operator precedence and parentheses; we don’t need to overcomplicate our syntax with additional rules of operator precedence and parentheses, which we have to then additionally explain and confuse students with
“nonlinear” order of operations; if we change the syntax of applications the way λDNA does, we can just execute our 4 fundamental operations one after the other
thus we don’t need any parsing or preprocessing before execution
thus we can have a machine which can interact with the outside world purely by lambda terms encoded in its native machine code – there is no need for it to encode input or decode output

Here are some more incredible things, distilled from the descriptions above:

the fact that just by changing the syntax, the line between paradigms gets blurred,
really, a lot of things seem to get blurred, align or otherwise reduce to each other,
the correspondence (however acctidental it may be) to biology,
that removing encoding and decoding also enables to run the shortest possible self-interpreter which is also a self-reducer.

It is very possible that some of this is circular/self-referential, but perhaps that is because this is a fundamental property of reality. Certainly, discovering all this feels like learning something profound. The insights gained in the process seem to expand understanding of the fundamentals of computation – and this understanding can be generally applicable.

How did it originate?

Not long after reading the excellent paper (a swan song?) Replication is Recursion; or, Lambda: the Biological Imperative by Scott Federhen, I started thinking about the correspondence between lambda calculus/LAST and DNA and imagining how DNA is likely to work on the physical level. I figured it cannot work by any sort of addressing mechanism, any sort of jumping around between addresses. It’s unlikely that any parsing happens before interpreting the DNA code. From there I thought about how LAST could be brought closer to that model, and εὕρηκα!

Running

There are a few options to run λDNA. Choose one that suits you best.

Play with λDNA right away in your browser

The fastest way is to run the REPL in your browser here.

Alternatively, you can serve it from your local machine.

Run λDNA on your machine

First clone this repo and navigate to it.

C implementation

To compile and run the REPL:

gcc repl.c && ./a.out

Node.js implementation

To run the REPL:

node repl.node.js

Note: Ctrl+D exits.

Browser implementation

For an up-to-date local in-browser experience, start a http server in the repo directory, e.g. by running:

python3 -m http.server

Then go to the URL that the server is serving on (something like http://0.0.0.0:8000/) and enjoy.

The most minimal, the shortest, etc.

The term “possibly the most minimal complete programming language” is obviously controversial.

I use it here as an attention-grabber (you can call it clickbait), because it concisely describes the extremely minimal nature of λDNA. Let me try and justify using that term by going into a little more details.

To actually determine “the most minimal complete programming language” we would have to strictly define what the terms “minimal” and “complete” mean, which I won’t attempt here.

Still, λDNA is derived from programming languages which are notable for their minimalism – lambda calculus itself is sometimes cited as the most minimal programming language.

λDNA can be considered an extremely minimal variant of lambda calculus that removes certain inessential elements from existing extremely minimal variants (specifically rooted in the Binary Lambda Calculus (BLC) by John Tromp), thus making it even more minimal.

Notably, in λDNA there is no parsing and no encoding/decoding of I/O – instead these things are handled in a much simpler way.

If there is no error (as there very well may be) or loss of expressive power, this should be enough to conceptually justify why λDNA is in principle more minimal than BLC.

As for minimal implementations, thus far (December 2023) only this one exists, and it isn’t written to compete with the original implementation of BLC or with the notable SectorLambda implementation of BLC by Justine Tunney.

Conceptually though, based on the above, a version of λDNA that improves upon SectorLambda on its metrics of minimalism is only a matter of putting in the work.

At this point I am satisfied with what I achieved here and I will leave it for others to prove or disprove whether λDNA is “possibly the most minimal complete programming language”. Possibly it isn’t. Even if it is, it certainly won’t be once the right kind of programmer attempts to beat it. This programmer may be you! ;)

Credits

λDNA was created by Darius J Chuck, derived from LAST, which was itself inspired by Justine Tunney’s SectorLambda implementation of John Tromp’s Binary Lambda Calculus (BLC), a minimal version of Alonzo Church’s lambda calculus which incorporates N.G. de Bruijn’s namefree lambda expressions.

License

MIT

Comments

Comments welcome on Mastodon.

Feel free to post this on social media. You can email the link to your post to tao at xtao.org to let me know, so I can participate in the discussion.

Support TAO

If you like, you can support my work with a small donation.

Thank you!

— Darius J Chuck