2021-12-06
Newick format was developed in 1986 as a minimal representation for phylogenetic trees for the PHYLogeny Inference Package.
More generally, it can represent different kinds of tree structures.
Following up on the previous article, here I show how Jevko could be used as an even more general and minimal alternative for the task.
The Wikipedia example tree, which is represented in the Newick format in several ways:
(,,(,));
(A,B,(C,D));
(A,B,(C,D)E)F;
(:0.1,:0.2,(:0.3,:0.4):0.5);
(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;
can be represented in Jevko as follows:
[[][][[][]]]
[[A][B][[C][D]]]
[[A][B][[C][D]E]F]
[0.1[]0.2[]0.5[0.3[]0.4[]]]
0.0[0.1[]0.2[]0.5[0.3[]0.4[]]]
[0.1[A]0.2[B]0.5[0.3[C]0.4[D]]]
[0.1[A]0.2[B]0.5[0.3[C]0.4[D]E]F]
[0.1[0.2[B]0.5[0.3[C]0.4[D]E]F]A]
Let’s call this format Phylo-Jevko.
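The mapping between the two lists above is mechanical, and a small converter makes it explicit. The following is only a sketch: the function name newick_to_phylo_jevko is made up for illustration, and it assumes plain Newick without quoted names or comments.

```python
def newick_to_phylo_jevko(newick: str) -> str:
    """Convert an unquoted, comment-free Newick string to Phylo-Jevko.

    Hypothetical helper for illustration; quoted labels are not handled.
    """
    # Assumption: unquoted Newick names contain no whitespace, so drop it all.
    text = "".join(newick.split())
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    text = text[:-1]
    pos = 0

    def read_until(stops: str) -> str:
        # Read characters up to (not including) the next stop character.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def branch() -> str:
        # Newick: Subtree then Length. Phylo-Jevko: Length "[" Tree "]".
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(branch())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(branch())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1  # consume ")"
        name = read_until(":,()")
        length = ""
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = read_until(",()")
        # Length moves in front; Name stays after the child branches.
        return length + "[" + "".join(children) + name + "]"

    result = branch()
    if pos != len(text):
        raise ValueError("unexpected trailing input")
    return result


print(newick_to_phylo_jevko("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"))
# prints [0.1[A]0.2[B]0.5[0.3[C]0.4[D]E]F]
```

Note that the root is emitted as a branch of its own, which is why a root length such as 0.0 ends up in front of the outermost bracket.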
Compared to the Newick format’s grammar:
Tree → Subtree ";"
Subtree → Leaf | Internal
Leaf → Name
Internal → "(" BranchSet ")" Name
BranchSet → Branch | Branch "," BranchSet
Branch → Subtree Length
Name → empty | string
Length → empty | ":" number
Phylo-Jevko is simpler:
Tree = *Branch Name
Branch = Length "[" Tree "]"
Length = number / ""
Name = string / ""
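To get a feel for how little machinery this grammar needs, here is a rough recursive-descent parser for it. It is a sketch, not a reference implementation: the Tree class and parse_phylo_jevko are names invented here, Length is assumed to be a decimal number that Python's float accepts, whitespace around Name and Length is trimmed, and escapes are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Tree:
    # Tree = *Branch Name; each branch is stored as a (length, subtree) pair.
    # Hypothetical structure, chosen only for this example.
    branches: List[Tuple[Optional[float], "Tree"]] = field(default_factory=list)
    name: str = ""


def parse_phylo_jevko(source: str) -> Tree:
    pos = 0

    def text() -> str:
        # Read up to the next bracket and trim surrounding whitespace.
        nonlocal pos
        start = pos
        while pos < len(source) and source[pos] not in "[]":
            pos += 1
        return source[start:pos].strip()

    def tree() -> Tree:
        nonlocal pos
        node = Tree()
        while True:
            chunk = text()
            if pos < len(source) and source[pos] == "[":
                # Branch = Length "[" Tree "]"; chunk is the Length.
                pos += 1  # consume "["
                child = tree()
                if pos >= len(source) or source[pos] != "]":
                    raise ValueError("missing ']'")
                pos += 1  # consume "]"
                # Assumption: a non-empty Length parses as a float.
                length = float(chunk) if chunk else None
                node.branches.append((length, child))
            else:
                # No more branches: the remaining text is the Name.
                node.name = chunk
                return node

    root = tree()
    if pos != len(source):
        raise ValueError("unexpected ']'")
    return root


root = parse_phylo_jevko("[0.1[A]0.2[B]0.5[0.3[C]0.4[D]E]F]")
named_f = root.branches[0][1]
print(named_f.name, [length for length, _ in named_f.branches])
```

The top-level result is itself a Tree with a single branch, matching how the examples wrap the whole tree in one pair of brackets.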
In Phylo-Jevko, Length comes before its branch with no ":" prefix, while Name comes after the branches, as it is in the Newick format.
Both Name and Length may be surrounded with whitespace. Whitespace is also allowed within Name, no quoting needed.
A simple escape mechanism (as in the definition of Jevko) could be introduced to allow Names with brackets.
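As a sketch of what such an escape mechanism might look like, the snippet below escapes and unescapes Names. It assumes the backtick is the escape character and that it may precede only a bracket or another backtick, which is my reading of Jevko's definition; the function names are made up for this example.

```python
ESCAPE = "`"              # assumed escape character
SPECIAL = "[]" + ESCAPE   # characters that must be escaped inside a Name


def escape_name(name: str) -> str:
    # Prefix every special character with the escape character.
    return "".join(ESCAPE + ch if ch in SPECIAL else ch for ch in name)


def unescape_name(escaped: str) -> str:
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == ESCAPE:
            # The escape character must be followed by a special character.
            nxt = next(chars, None)
            if nxt is None or nxt not in SPECIAL:
                raise ValueError("invalid escape sequence")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)


name = "E. coli [K-12]"
print(escape_name(name))
assert unescape_name(escape_name(name)) == name
```

A parser that supports this would additionally have to skip escaped brackets while scanning, which the bracket-only grammar above does not yet do.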
Thanks to these simplifications, no extra separators such as ":", ";", ",", or "'" are needed, only brackets.
Comments could be implemented as branches prefixed with "#" instead of Length:
Comment = "#" "[" Name "]"
Nested comments could be allowed like so:
Comment = "#" "[" NestedComment "]"
NestedComment = *("[" NestedComment "]" / Name)
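One way to support the nested variant is a pre-pass that removes comment branches before parsing. The sketch below assumes comments are simply discarded and that the text before the bracket, once trimmed, is exactly "#"; strip_comments is a hypothetical name, and escapes are not considered.

```python
def strip_comments(source: str) -> str:
    """Remove "#[...]" comment branches, including nested brackets."""
    out = []
    chunk_start = 0  # start of the text since the last bracket
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "[" and source[chunk_start:i].strip() == "#":
            # Comment branch: skip to the matching "]" by counting depth.
            depth = 0
            while i < len(source):
                if source[i] == "[":
                    depth += 1
                elif source[i] == "]":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if i == len(source):
                raise ValueError("unterminated comment")
            i += 1  # consume the closing "]"
            chunk_start = i
        elif ch in "[]":
            # Ordinary bracket: keep the preceding text and the bracket.
            out.append(source[chunk_start:i + 1])
            i += 1
            chunk_start = i
        else:
            i += 1
    out.append(source[chunk_start:])
    return "".join(out)


print(strip_comments("[#[leaf A]0.1[A]0.2[B]#[outer [inner] text]F]"))
# prints [0.1[A]0.2[B]F]
```

Because a comment is recognized only when "#" is directly followed by an opening bracket, a tree whose Name happens to be "#" is left alone.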
Extremely minimal formats for encoding all kinds of tree structures can be built based on Jevko, offering the most bang for the buck in terms of complexity. Less accidental complexity means less trouble and better efficiency. This is a good direction to go in when building a new system.
For existing systems, whether the extra efficiency is worth the work necessary to simplify – that’s a question that is best answered on an individual basis.