2021-07-16
Previously we defined TAO in one line of abstract grammar. Today we’ll instantiate that grammar as a variant of TAO in which only a single character ever needs to be escaped, reducing escape friction to a minimum:
TAO = *("`[" TAO "`]" / 1*symbol)
symbol = "`" any / (any - "`")
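To make the grammar concrete, here is a minimal recursive-descent sketch of a parser for this variant. It is an illustration rather than a reference implementation: the name parseTao and the output shape (nested arrays, with runs of symbols joined into strings and escapes resolved) are assumptions made for this example. It also assumes that "`[" and "`]" are always read as brackets, never as escaped symbols.

```javascript
// Hypothetical helper: parse the backtick variant into nested arrays.
// A run of symbols becomes one string; each bracketed TAO becomes an array.
function parseTao(input) {
  let pos = 0;

  function parseLevel(isRoot) {
    const parts = [];
    let leaf = '';
    const flush = () => {
      if (leaf !== '') {
        parts.push(leaf);
        leaf = '';
      }
    };

    while (pos < input.length) {
      const ch = input[pos];
      if (ch !== '`') {
        // symbol = any - "`"
        leaf += ch;
        pos += 1;
        continue;
      }
      if (pos + 1 >= input.length) {
        throw new Error('dangling escape character at end of input');
      }
      const next = input[pos + 1];
      pos += 2;
      if (next === '[') {
        // "`[" opens a nested TAO
        flush();
        parts.push(parseLevel(false));
      } else if (next === ']') {
        // "`]" closes the current TAO
        if (isRoot) throw new Error('unmatched closing bracket');
        flush();
        return parts;
      } else {
        // symbol = "`" any: keep the escaped character literally
        leaf += next;
      }
    }

    if (!isRoot) throw new Error('missing closing bracket');
    flush();
    return parts;
  }

  return parseLevel(true);
}
```

Under these assumptions, parseTao('age `[27`]') yields ['age ', ['27']], and an empty pair of brackets yields an empty array.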
The Data TAO example encoded with this variant would look like this:
first name `[John`]
last name `[Smith`]
is alive `[true`]
age `[27`]
address `[
street address `[21 2nd Street`]
city `[New York`]
state `[NY`]
postal code `[10021-3100`]
`]
phone numbers `[
`[
type `[home`]
number `[212 555-1234`]
`]
`[
type `[office`]
number `[646 555-4567`]
`]
`]
children `[`]
spouse `[`]
The clear disadvantage is that compactness is reduced, because digraphs are used for bracketing. In exchange, the parser becomes even more regular, and arbitrary strings of data can be inserted as leaves into TAO trees, with escaping realized by the equivalent of a simple string replacement:
.replaceAll('`', '``')
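As a small sketch of what that looks like in practice, the helpers below escape a data string before it is placed between "`[" and "`]", and reverse the escaping afterwards. The names escapeLeaf and unescapeLeaf are made up for this example; String.prototype.replaceAll is standard JavaScript (ES2021).

```javascript
// Hypothetical helpers; the names are illustrative only.

// Double every backtick so that no backtick in the data can pair up
// with a following "[" or "]" and be mistaken for a bracket.
function escapeLeaf(data) {
  return data.replaceAll('`', '``');
}

// Inverse operation: drop each escape character and keep whatever
// single character follows it.
function unescapeLeaf(text) {
  return text.replace(/`([\s\S])/g, '$1');
}

// Example: wrap arbitrary data as a leaf of a TAO tree.
const data = 'use `code` here';
const encoded = 'note `[' + escapeLeaf(data) + '`]';
```

Here encoded is the string "note `[use ``code`` here`]". Data that happens to contain "`[" or "`]" is handled the same way, since the doubled backtick is consumed as an escaped symbol before the bracket character is reached.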
Since we selected ` as the escape character, escaping is rarely needed
in practice, because the backtick is on average the second least
frequently used character. This is a major reason why it is part of
the canonical grammar.
To go further and practically eliminate escape friction, we could use
the ASCII Escape character ␛ (code 27 = 0x1B). This might be a good
choice when using TAO for serialization/deserialization and we are
certain that this character can’t occur in our data. In that case
escaping is not necessary at all, so we gain extra speed. The cost is
dramatically reduced portability, because the character is
non-printable and hard to copy-paste. Translation to a portable form
is trivial, though.
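One way such a translation could look is sketched below, converting between the Escape-character variant and the backtick variant. The function names are invented for this example, and the sketch relies on the assumption stated above: the Escape character never occurs in the data itself, so in the Escape variant it appears only as part of a bracket.

```javascript
// Hypothetical helpers; assumes ESC never occurs inside the data.
const ESC = '\x1B';

// ESC variant -> backtick variant: first double the literal backticks
// in the data, then turn each ESC bracket prefix into a backtick.
function toPortable(escTao) {
  return escTao.replaceAll('`', '``').replaceAll(ESC, '`');
}

// Backtick variant -> ESC variant: brackets get the ESC prefix,
// any other escaped character is emitted without its escape.
function fromPortable(tao) {
  if (tao.includes(ESC)) {
    throw new Error('input contains ESC, which this sketch assumes is absent');
  }
  return tao.replace(/`([\s\S])/g, (match, c) =>
    c === '[' || c === ']' ? ESC + c : c
  );
}
```

With these definitions, toPortable applied to the Escape-variant encoding of "age" followed by a bracketed 27 gives "age `[27`]", and fromPortable reverses it.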
Extra-compact binary variants of TAO are not far off from here.