2021-11-27
In this article I propose a minimal syntax definition for representing rose trees.
This is a step on a path to explaining how a minimal syntax like Jevko has direct correspondence to familiar tree structures and thus can be used to represent them with minimal accidental complexity.
A rose (or multiway) tree is a tree data structure where each node contains a label/value alongside a number of branches/subtrees. The number of branches is variable and unbounded.
A rose tree can be defined as follows in Haskell:
data Tree a = Node {
label :: a,
branches :: [Tree a]
}
Haskell’s lazy semantics are irrelevant here, so the following TypeScript definition is analogous for our purposes:
type Tree<a> = {
: a,
label: Tree<a>[],
branches }
The above definitions have the following minimal syntactical analog (in ABNF):
= label branches
Tree = *(open Tree close) branches
where we might define:
= "["
open = "]"
close = *character label
where character
is any Unicode character excluding the
square brackets.
Note that if we take the original definition:
= label branches
Tree = *(open Tree close) branches
then drop label
:
= branches
Tree = *(open Tree close) branches
then inline branches
:
= *(open Tree close) Tree
then inline both open
and close
:
= *("[" Tree "]") Tree
we are left with a definition that describes the Dyck language.
We can then rewirte the original definition as follows:
= label *("[" Tree "]")
Tree = *character label
to see that it is the same as a Dyck language extended with a (possibly empty) label that goes with each (possibly empty) sequence of bracket-delimited branches.
The following example tree (source):
1
|
+- 2
| |
| +- 4
| |
| `- 5
|
`- 3
|
+- 6
|
`- 7
can be represented with our syntax as follows:
1[2[4][5]][3[6][7]]
Our label
definition allows using almost
arbitrary strings as labels.
We might complete it with a simple escape mechanism (as in the definition of Jevko data) to effectively allow arbitrary Unicode strings.
To represent a value of type a
we would serialize it to
such a string.
Alternatively, if we weren’t building up to a definition of a generic
syntax, we could restrict the label
definition to allow
only specific kinds of values.