One-escape TAO

Darius J Chuck

2021-07-16

Previously we defined TAO in one line of abstract grammar. Today we’ll instantiate that grammar as a variant of TAO in which only a single character ever needs to be escaped, reducing escape friction to a minimum:

TAO = *("`[" TAO "`]" / 1*symbol)
symbol = "`" any / (any - "`")
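To make the grammar concrete, here is a minimal sketch of a recursive-descent parser for it. The function name `parseTao` and the `{ leaf }` / `{ tree }` node shapes are illustrative choices, not part of TAO. The sketch also assumes that the bracket digraphs take priority over the escape rule, and that an unmatched bracket is an error.

```javascript
// Sketch only: parseTao and the node shapes are hypothetical names.
// A tree is an array of nodes; each node is either
//   { leaf: "text" }  for a run of symbols, with escapes resolved, or
//   { tree: [...] }   for a nested `[ ... `] subtree.
function parseTao(input) {
  let pos = 0;

  function parseItems(depth) {
    const items = [];
    let text = '';
    const flush = () => {
      if (text !== '') { items.push({ leaf: text }); text = ''; }
    };
    while (pos < input.length) {
      const ch = input[pos];
      if (ch !== '`') {            // symbol = any - "`"
        text += ch;
        pos += 1;
        continue;
      }
      const next = input[pos + 1];
      if (next === undefined) throw new Error('dangling ` at end of input');
      pos += 2;
      if (next === '[') {          // "`[" opens a subtree
        flush();
        items.push({ tree: parseItems(depth + 1) });
      } else if (next === ']') {   // "`]" closes the current subtree
        if (depth === 0) throw new Error('unmatched `]');
        flush();
        return items;
      } else {                     // symbol = "`" any: keep the escaped character
        text += next;
      }
    }
    if (depth > 0) throw new Error('missing `]');
    flush();
    return items;
  }

  return parseItems(0);
}
```

Under these assumptions, `parseTao('age `[27`]')` yields a leaf `'age '` followed by a subtree holding the single leaf `'27'`.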

The Data TAO example encoded with this variant would look like this:

first name `[John`]
last name `[Smith`]
is alive `[true`]
age `[27`]
address `[
  street address `[21 2nd Street`]
  city `[New York`]
  state `[NY`]
  postal code `[10021-3100`]
`]
phone numbers `[
  `[
    type `[home`]
    number `[212 555-1234`]
  `]
  `[
    type `[office`]
    number `[646 555-4567`]
  `]
`]
children `[`]
spouse `[`]

The clear disadvantage is that compactness is reduced, because digraphs are used for bracketing. In exchange, though, the parser becomes even more regular, and arbitrary strings of data can be inserted as leaves into TAO trees, with escaping done by the equivalent of a simple

string.replaceAll('`', '``') 
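As a rough illustration of the round trip, the sketch below pairs that escaping step with one possible inverse. The helper names `escapeLeaf`, `unescapeLeaf` and `wrapLeaf` are hypothetical, and the inverse assumes its input is the text of a single leaf, with no bracket digraphs in it.

```javascript
// Sketch only: helper names are illustrative, not part of TAO.

// Escape arbitrary data so it can be embedded as a leaf:
// every backtick is doubled, nothing else changes.
const escapeLeaf = (data) => data.replaceAll('`', '``');

// Inverse for a single leaf: drop each escaping backtick and
// keep the character that follows it.
const unescapeLeaf = (leaf) => leaf.replace(/`([\s\S])/g, '$1');

// Wrap escaped data in the bracket digraphs to form a subtree.
const wrapLeaf = (data) => '`[' + escapeLeaf(data) + '`]';

const data = 'use `[ and `] freely';
const encoded = wrapLeaf(data);   // the data's own backticks are now doubled
const decoded = unescapeLeaf(encoded.slice(2, -2));
console.log(decoded === data);
```

Because every backtick in the data becomes a pair, a bracket digraph can never appear in an escaped leaf by accident.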

Since we selected ` as the escape character, escaping is rarely needed in practice: on average, ` is the second least often used character. This is a major reason why it is part of the canonical grammar.

To go further and practically eliminate escape friction, we could use the ASCII Escape character (code 27 = 0x1B) instead of `. This might be a good choice when using TAO for serialization/deserialization, when we are certain that this character can’t occur in our data. In such a case escaping is not necessary at all, so we gain extra speed. The cost is dramatically reduced portability, because the character is non-printable and can’t easily be copy-pasted. Translation to a portable form is trivial, though.
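One way such a translation could look is sketched below. The function names `toPortable` and `fromPortable` are hypothetical, and the sketch assumes that in the Escape-character form the Escape character occurs only as the first half of a bracket digraph, never inside the data.

```javascript
// Sketch only: function names are illustrative, not part of TAO.
const ESC = '\x1b'; // ASCII Escape, code 27

// Escape-character form -> portable backtick form.
// Backticks already in the data are doubled first, then every
// ESC (assumed to appear only in brackets) becomes a backtick.
function toPortable(escTao) {
  return escTao.replaceAll('`', '``').replaceAll(ESC, '`');
}

// Portable backtick form -> Escape-character form.
// Bracket digraphs get ESC back; any other escaped character is
// emitted as-is, since this form needs no escaping.
function fromPortable(tao) {
  if (tao.includes(ESC)) {
    throw new Error('data contains ESC, cannot use the ESC form');
  }
  return tao.replace(/`([\s\S])/g, (match, c) =>
    c === '[' || c === ']' ? ESC + c : c);
}

const compact = 'code ' + ESC + '[a`b' + ESC + ']';
const portable = toPortable(compact);
console.log(fromPortable(portable) === compact);
```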

Extra-compact binary variants of TAO are not far off from here.