2021-12-18
JSON was made to be a strict syntactic subset of JavaScript.
JSON is, however, not a semantic subset of JavaScript.
This article presents the semantics of JSON, as defined by official standards, and discusses the major differences between JavaScript and JSON and their implications for interoperability.
The JSON.org website links to ECMA-404, the official standard for JSON. ECMA-404 in turn contains a reference to STD 90/RFC 8259, a complementary standard for JSON.
All three sources agree on the syntax, defining it in slightly different, but equivalent ways.
When it comes to semantics though, there is no agreement.
ECMA-404 notes:
RFC 8259, also defines various semantic restrictions on the use of the JSON syntax. Those restrictions are not normative for this specification.
RFC 8259 counters with:
Note, however, that ECMA-404 allows several practices that this specification recommends avoiding in the interests of maximal interoperability.
ECMA-404 opens by pronouncing the following about the role of JSON in data interchange:
[JSON] does not attempt to impose ECMAScript’s internal data representations on other programming languages. Instead, it shares a small subset of ECMAScript’s syntax with all other programming languages. The JSON syntax is not a specification of a complete data interchange. Meaningful data interchange requires agreement between a producer and consumer on the semantics attached to a particular use of the JSON syntax. What JSON does provide is the syntactic framework to which such semantics can be attached.
This point is then reiterated:
The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax. It also intentionally does not define how a valid JSON text might be internalized into the data structures of a programming language. There are many possible semantics that could be applied to the JSON syntax and many ways that a JSON text can be processed or mapped by a programming language. Meaningful interchange of information using JSON requires agreement among the involved parties on the specific semantics to be applied.
When describing specific parts of JSON, ECMA-404 either implies this by omitting any mention of semantics or restates it explicitly.
According to ECMA-404 then, JSON is only a syntax with no inherent semantics. Any semantics must be agreed upon by the parties using the syntax to exchange data.
RFC 8259 does not outright disagree, but contains no similar concession.
Its abstract says this:
This document […] offers experience-based interoperability guidance.
Indeed, such guidance is offered when describing specific parts of JSON. This includes semantic suggestions.
These will now be discussed in detail.
ECMA-404 says this about the semantics of objects:
The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.
whereas RFC 8259 says this:
An object is an unordered collection of zero or more name/value pairs
The names within an object SHOULD be unique.
An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings.
JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.
N.b. JSON.org also imposes the semantic of unorderedness on the definition of objects:
An object is an unordered set of name/value pairs.
Nothing is said here about uniqueness of names though.
Meanwhile ECMA-262, the official JavaScript standard, defines an ordering for object properties. Duplicate names are also allowed: the value of a duplicate key overwrites the lexically preceding value for the same key. The same applies to JSON objects parsed in JavaScript with JSON.parse:
In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.
The same is not necessarily true for other JSON parsers, leading to interoperability and security issues.
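As a minimal sketch of the JavaScript side of this, the snippet below shows how `JSON.parse` resolves duplicate names and in what order it exposes members. The variable names are illustrative only, and parsers in other languages may keep the first value, keep both, or reject the input.

```javascript
// Duplicate names: with JSON.parse, the last value wins.
const duplicated = JSON.parse('{"a": 1, "a": 2}');
console.log(duplicated.a); // 2

// Ordering: integer-like keys are enumerated first in ascending order,
// followed by the remaining string keys in insertion order.
const ordered = JSON.parse('{"b": 1, "a": 2, "1": 3}');
const keys = Object.keys(ordered);
console.log(keys); // [ '1', 'b', 'a' ]
```

A consumer that depends on either behavior is relying on JavaScript semantics, not on anything the JSON standards guarantee.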
Additionally, there exists an edge case which makes the same object have different meaning or be outright invalid when evaluated as part of JavaScript source versus as JSON text:
However, because B.3.1 applies when evaluating ECMAScript source text and does not apply during JSON.parse, the same source text can produce different results when evaluated as a PrimaryExpression rather than as JSON. Furthermore, the Early Error for duplicate “__proto__” properties in object literals, which likewise does not apply during JSON.parse, means that not all texts accepted by JSON.parse are valid as a PrimaryExpression, despite matching the grammar.
More details can be found here.
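The edge case can be reproduced with a short sketch. It assumes an environment where indirect `eval` is available (for example Node.js without restrictions on code generation); the variable names are made up for illustration.

```javascript
const text = '{"__proto__": {"x": 1}}';

// Parsed as JSON: "__proto__" is an ordinary own property.
const parsed = JSON.parse(text);
const parsedHasOwn = Object.prototype.hasOwnProperty.call(parsed, "__proto__");
console.log(parsedHasOwn, parsed.x); // true undefined

// Evaluated as JavaScript source: the same text sets the prototype instead.
const evaluated = (0, eval)("(" + text + ")");
const evaluatedHasOwn = Object.prototype.hasOwnProperty.call(evaluated, "__proto__");
console.log(evaluatedHasOwn, evaluated.x); // false 1

// Duplicate "__proto__" names: accepted by JSON.parse,
// but an early SyntaxError when evaluated as an object literal.
const duplicateText = '{"__proto__": 1, "__proto__": 2}';
const parsedDuplicate = JSON.parse(duplicateText);
let evalError = null;
try {
  (0, eval)("(" + duplicateText + ")");
} catch (error) {
  evalError = error;
}
console.log(parsedDuplicate["__proto__"], evalError instanceof SyntaxError); // 2 true
```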
ECMA-262 defines the Number type as follows:
The Number type has exactly 18,437,736,874,454,810,627 (that is, 2^64 - 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754-2019 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic, except that the 9,007,199,254,740,990 (that is, 2^53 - 2) distinct “Not-a-Number” values of the IEEE Standard are represented in ECMAScript as a single special NaN value. (Note that the NaN value is produced by the program expression NaN.)
There are two other special values, called positive Infinity and negative Infinity. For brevity, these values are also referred to for expository purposes by the symbols +∞𝔽 and -∞𝔽, respectively. (Note that these two infinite Number values are produced by the program expressions +Infinity (or simply Infinity) and -Infinity.)
The other 18,437,736,874,454,810,624 (that is, 2^64 - 2^53) values are called the finite numbers. Half of these are positive numbers and half are negative numbers; for every finite positive Number value there is a corresponding negative value having the same magnitude.
In other words, the JavaScript Number type follows the IEEE754 standard: numbers are 64-bit floating point values with limited precision. There are also three special numeric values, NaN, +Infinity, and -Infinity, likewise in accordance with IEEE754.
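A few expressions illustrate what this means in practice. This is only a sketch; the constant names are chosen for the example.

```javascript
// Limited precision: above 2**53 not every integer is representable.
const limit = 2 ** 53;
const precisionLost = limit + 1 === limit;
console.log(precisionLost); // true

// Decimal fractions are approximated in binary.
const sumIsExact = 0.1 + 0.2 === 0.3;
console.log(sumIsExact); // false

// The three special values are Numbers, but none of them is finite.
const specials = [NaN, Infinity, -Infinity];
const finiteFlags = specials.map((value) => Number.isFinite(value));
console.log(finiteFlags); // [ false, false, false ]
console.log(1 / 0, -1 / 0, Number.isNaN(0 / 0)); // Infinity -Infinity true
```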
In contrast, ECMA-404 says this about the semantics of numbers in JSON:
JSON is agnostic about the semantics of numbers.
ECMA-404 also notes:
Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted.
This makes JSON numbers incompatible with JavaScript numbers as well as with the IEEE754 standard in general.
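The restriction is easy to observe with `JSON.parse`. The helper `isValidJson` below is a hypothetical name introduced for this sketch, not part of any standard API.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(isValidJson("NaN")); // false
console.log(isValidJson("Infinity")); // false
console.log(isValidJson("-Infinity")); // false
console.log(isValidJson("1e5")); // true
```

All three special values are valid JavaScript expressions, yet none of them is a valid JSON text.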
In light of this the following guidance from RFC 8259 might seem confusing:
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
If not read carefully, one might take it to mean that adopting IEEE754 semantics offers good interoperability.
But in fact the recommendation is only to keep numbers in JSON within the precision and range defined by IEEE754.
Because there is no way to represent NaN or Infinity in JSON, the values that JSON and IEEE754 can both represent form only a strict subset of the IEEE754 values.
This is implicitly acknowledged by RFC 8259. Similarly to ECMA-404, it notes:
Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.
This state of affairs forces workarounds, such as serializing all special values to null, as described in ECMA-262:
Finite numbers are stringified as if by calling ToString(number). NaN and Infinity regardless of sign are represented as the String “null”.
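A short sketch of this workaround and its cost: the round trip below loses the distinction between the three special values and an explicit null. The object and its property names are invented for the example.

```javascript
const special = { nan: NaN, inf: Infinity, negInf: -Infinity, finite: 1.5 };

// Non-finite numbers are written out as null.
const serialized = JSON.stringify(special);
console.log(serialized); // {"nan":null,"inf":null,"negInf":null,"finite":1.5}

// After parsing, the original values cannot be recovered.
const roundTripped = JSON.parse(serialized);
console.log(roundTripped.nan, roundTripped.inf, roundTripped.negInf); // null null null
```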
Another side of the problem is that valid JSON numbers that fall outside of the IEEE754 range or precision are silently rounded to the nearest representable value, or overflow to Infinity, when they are deserialized in JavaScript.
The same is true for parsers in other languages which try to represent JSON numbers as IEEE754.
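The two numbers that RFC 8259 uses as examples, plus an integer just outside the interoperable range, show the effect in JavaScript. This is a sketch with illustrative variable names; other IEEE754-based parsers are likely to behave similarly, but that is an assumption and not something the standards require.

```javascript
// Magnitude too large: the value overflows to Infinity.
const overflow = JSON.parse("1E400");
console.log(overflow); // Infinity

// Too many digits: the value is rounded to the nearest double.
const rounded = JSON.parse("3.141592653589793238462643383279");
console.log(rounded); // 3.141592653589793

// An integer just above 2**53 - 1 silently changes value.
const bigInteger = JSON.parse("9007199254740993");
console.log(bigInteger); // 9007199254740992

// Re-serializing the overflowed value turns a valid JSON number into null.
const reserialized = JSON.stringify(overflow);
console.log(reserialized); // null
```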
These interoperability issues lead to serious bugs in practice (e.g. [1], [2], [3], [4]).
Even though JSON is a syntactic subset of JavaScript, its semantics are different. The standards which define JSON do not align with each other or the JavaScript standard on the meaning of objects and numbers in JSON. Furthermore, JSON numbers are incompatible with the IEEE754 standard used for numbers in JavaScript and other programming languages. This makes full interoperability via this standard impossible in practice, leading to bugs and workarounds.
This is certainly worth having in mind when working with JSON in any programming language.
Whether this can be rectified and how is a subject for another article.