Tim Bray beat me to writing about this, with some thoughts very similar to mine: Fixing JSON. I especially like his idea of native times, along with prefixing them with @ as a parser hint. I'd like to propose some tweaks, however, based on my experience of writing JSON parsers (twice).
Commas and colons
Not only do you not need either of them, they actually make parsing more complicated. When you're inside an array or an object, you already know when to expect the next value or key, yet you have to diligently check for commas and colons for the sole purpose of signaling an error when one isn't where it's expected. Add the edge cases of trailing commas and empty containers, and you get a really complicated state machine with no real purpose.
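For a sense of what that bookkeeping looks like, here is a sketch of just the comma handling for a single flat array in a conventional, strict parser. The function name and the pre-tokenized input are assumptions made for illustration; a real parser also has to deal with nesting, objects and colons, each with similar checks. Note that every flag exists only to raise errors: the collected values would be the same without them.

```python
# Hypothetical strict parser for a single flat array, shown only to
# illustrate comma bookkeeping. Assumption: the input is already tokenized
# and every value is a number, so the strings "[", "]" and "," are
# unambiguous structural tokens. Nesting and objects are left out.

def parse_strict_array(tokens):
    if not tokens or tokens[0] != "[":
        raise ValueError("expected '['")
    items = []
    expect_value = True   # True right after '[' or ','
    after_comma = False   # separates "[]" (fine) from "[1,]" (error)
    for tok in tokens[1:]:
        if tok == "]":
            if after_comma:
                raise ValueError("trailing comma")
            return items
        if tok == ",":
            if expect_value:
                raise ValueError("comma where a value was expected")
            expect_value, after_comma = True, True
        elif not expect_value:
            raise ValueError("missing comma between values")
        else:
            items.append(tok)
            expect_value, after_comma = False, False
    raise ValueError("unterminated array")
```

Inputs such as ["[", ",", "]"], ["[", 1, ",", "]"] and ["[", 1, 2, "]"] are all rejected, each by a different branch.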
My proposal is simpler than Tim's, though: there's no need to actually remove them, just treat them as whitespace. As in:
whitespace = ['\t', '\n', '\r', ' ', ',', ':']. That's it.
It removes all of those complications from parsing, and humans can still write commas and colons for aesthetics. And by the way, Clojure already treats commas as whitespace, and it works fine for its vectors and maps.
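Here is a minimal sketch of a parser that follows this rule, to show how little is left once commas and colons are just whitespace: arrays and objects simply read values until the closing bracket. The function names are hypothetical, and to keep it short the sketch assumes strings only use single-letter escapes and numbers go through Python's int() and float(); it is an illustration, not a complete or validating JSON parser.

```python
# Relaxed JSON sketch: ',' and ':' are plain whitespace.
# Assumptions (not part of any spec): strings only support single-letter
# escapes, numbers go through int()/float(), and errors are bare ValueErrors.

WHITESPACE = "\t\n\r ,:"
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
NUMBER_CHARS = "+-0123456789.eE"
LITERALS = (("true", True), ("false", False), ("null", None))


def parse(text):
    value, pos = _value(text, _skip(text, 0))
    if _skip(text, pos) != len(text):
        raise ValueError(f"trailing data at {pos}")
    return value


def _skip(text, pos):
    # Commas and colons are skipped exactly like spaces and newlines.
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def _value(text, pos):
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "[":
        items, pos = [], _skip(text, pos + 1)
        while pos < len(text) and text[pos] != "]":
            item, pos = _value(text, pos)
            items.append(item)
            pos = _skip(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated array")
        return items, pos + 1
    if ch == "{":
        obj, pos = {}, _skip(text, pos + 1)
        while pos < len(text) and text[pos] != "}":
            key, pos = _value(text, pos)
            if not isinstance(key, str):
                raise ValueError("object keys must be strings")
            val, pos = _value(text, _skip(text, pos))
            obj[key] = val
            pos = _skip(text, pos)
        if pos >= len(text):
            raise ValueError("unterminated object")
        return obj, pos + 1
    if ch == '"':
        return _string(text, pos + 1)
    for word, literal in LITERALS:
        if text.startswith(word, pos):
            return literal, pos + len(word)
    end = pos
    while end < len(text) and text[end] in NUMBER_CHARS:
        end += 1
    if end == pos:
        raise ValueError(f"unexpected character {ch!r} at {pos}")
    token = text[pos:end]
    number = float(token) if any(c in ".eE" for c in token) else int(token)
    return number, end


def _string(text, pos):
    out = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(out), pos + 1
        if ch == "\\":
            esc = text[pos + 1] if pos + 1 < len(text) else ""
            if esc not in ESCAPES:
                raise ValueError(f"unsupported escape at {pos}")
            out.append(ESCAPES[esc])
            pos += 2
        else:
            out.append(ch)
            pos += 1
    raise ValueError("unterminated string")
```

With this sketch, {"a": [1, 2, 3]} and {"a" [1 2 3]} produce the same result. The trade-off is visible too: something like [1,,2,] now parses as [1, 2], since a stray comma is no longer an error.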
JSON is defined as a UTF-8 encoded stream of bytes, which is already enough to encode all of Unicode. Yet on top of that there's another encoding scheme: \uXXXX escapes. One could speculate it was added to support authoring tools that could only handle the ASCII subset of UTF-8, but thankfully we've left those dark ages behind.
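Handling those is a pain in the ass for a parser, especially a streaming one. Dealing with single-letter escapes like \n is easy, but with \uXXXX you need an extra buffer, you need to handle the edge case where not enough characters have arrived yet, and you're probably going to need a whole separate class of errors for them. On top of that, characters outside the Basic Multilingual Plane are written as a surrogate pair of two such escapes, which have to be matched up before you can emit anything. Gah…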
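To make the asymmetry concrete, here is a sketch of a streaming decoder for the inside of a string literal, assuming it receives the characters between the quotes in arbitrary chunks. The class, method and exception names are hypothetical. The single-letter branch is one dictionary lookup; everything else (the pending buffer, the hex validation, the surrogate pairing, the dedicated error type) exists only because of \uXXXX. This sketch chooses to reject unpaired surrogates; real parsers differ on that.

```python
# Hypothetical streaming decoder for the body of a JSON string literal.
# Assumptions: input arrives as arbitrary str chunks, the closing quote is
# handled elsewhere, and unpaired surrogates are treated as errors.

HEX_DIGITS = set("0123456789abcdefABCDEF")
SIMPLE_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
                  "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


class EscapeError(ValueError):
    """Errors that exist only because of \\uXXXX escapes."""


class StringBodyDecoder:
    def __init__(self):
        self.pending = ""  # extra buffer: an escape split across chunks
        self.high = None   # high surrogate waiting for its low half

    def _emit(self, out, text):
        if self.high is not None:
            raise EscapeError("high surrogate not followed by a low one")
        out.append(text)

    def feed(self, chunk):
        data, self.pending = self.pending + chunk, ""
        out, i = [], 0
        while i < len(data):
            if data[i] != "\\":
                self._emit(out, data[i])
                i += 1
            elif i + 1 == len(data):
                self.pending = data[i:]  # lone backslash: wait for more
                break
            elif data[i + 1] in SIMPLE_ESCAPES:
                self._emit(out, SIMPLE_ESCAPES[data[i + 1]])  # the easy case
                i += 2
            elif data[i + 1] != "u":
                raise EscapeError(f"bad escape \\{data[i + 1]}")
            elif i + 6 > len(data):
                self.pending = data[i:]  # fewer than 4 hex digits so far
                break
            else:
                digits = data[i + 2:i + 6]
                if not set(digits) <= HEX_DIGITS:
                    raise EscapeError(f"bad hex digits {digits!r}")
                code = int(digits, 16)
                i += 6
                if 0xD800 <= code <= 0xDBFF:
                    if self.high is not None:
                        raise EscapeError("two high surrogates in a row")
                    self.high = code
                elif 0xDC00 <= code <= 0xDFFF:
                    if self.high is None:
                        raise EscapeError("low surrogate without a high one")
                    # Combine the pair into one code point above U+FFFF.
                    out.append(chr(0x10000 + ((self.high - 0xD800) << 10)
                                   + (code - 0xDC00)))
                    self.high = None
                else:
                    self._emit(out, chr(code))
        return "".join(out)

    def finish(self):
        if self.pending or self.high is not None:
            raise EscapeError("string ended inside an escape")
```

Feeding it the chunks "b\u00" and "e9" shows the buffering at work: the first call returns just "b", and the é only appears once the remaining digits arrive.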
Just do away with the thing.