The world *desperately* needs a replacement for YAML. TOML is fine for configura...

diggan · on April 3, 2021

Seems you're missing my personal favorite, Extensible Data Notation, EDN (https://github.com/edn-format/edn). I'm probably a bit biased, coming from Clojure where it's widely used, but I haven't really found a format that comes close to EDN when it comes to succinctness and features.

Some of the neat features: custom literals (tagged elements) whose support can be added at runtime or compile time, so that, for example, dates can be represented, parsed and turned into proper dates in your language. Being able to namespace data also makes things a bit easier to manage without having to resort to nesting or other hacks. Very human-friendly, plus machine-friendly.
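A minimal Python sketch of how tagged elements can be wired up, assuming a parser has already turned something like `#inst "..."` into a (tag, value) pair. `#inst` and `#uuid` are tags EDN itself defines; the registry, `read_tagged` and the namespaced `myapp/point` tag are hypothetical, and this sketch simply treats unknown tags as an error.

```python
import uuid
from datetime import datetime

# Hypothetical registry: EDN tag name -> function that turns the tagged value
# (already read as a plain string or list) into a native object.
TAG_READERS = {}

def reader(tag):
    """Decorator registering a reader function for one tag."""
    def register(fn):
        TAG_READERS[tag] = fn
        return fn
    return register

@reader("inst")
def read_inst(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

@reader("uuid")
def read_uuid(value: str) -> uuid.UUID:
    return uuid.UUID(value)

@reader("myapp/point")  # hypothetical application-defined, namespaced tag
def read_point(value) -> tuple:
    x, y = value
    return (float(x), float(y))

def read_tagged(tag: str, value):
    """Dispatch a tagged element such as #inst "..." to its registered reader."""
    if tag not in TAG_READERS:
        raise ValueError(f"no reader registered for #{tag}")
    return TAG_READERS[tag](value)

print(read_tagged("inst", "1985-04-12T23:20:50Z"))
```

The point is that the format only carries the tag; each program decides what native type a tag maps to.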

The biggest drawback so far seems to be parsing performance, although I'm not sure whether that's about the format itself or about its small adoption, which means few parsers focused on speed have been written.

rubyn00bie · on April 3, 2021

Your list is like a graveyard of my dreams and hopes. Anything that doesn't validate the format of the underlying data is pretty much dead to me...

The problem with most of these is that they're useless for describing the data. Honestly, it is not at all useful to have the following to describe data:

email => string

name => string

dob => string

IMHO, it is akin to having a dictionary (like the Oxford English Dictionary) read like:

email - noun

name - noun

birthday - noun

It says next to nothing except, yes, they are nouns. All too often I waste time fighting nils and bullshit in fields or duplicating validation logic all over the place.
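For contrast, a sketch of describing the same fields by what they must parse as rather than "string", in plain Python with no schema library. The field names come from above (plus an `id` UUID); `SCHEMA`, `parse_record` and the deliberately crude email check are hypothetical.

```python
import re
import uuid
from datetime import date

# Deliberately crude: "something@something.tld" with no whitespace.
# Real email validation is much hairier than this.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def parse_email(raw: str) -> str:
    if not EMAIL_RE.fullmatch(raw):
        raise ValueError(f"not an email address: {raw!r}")
    return raw

def parse_name(raw: str) -> str:
    name = raw.strip()
    if not name:
        raise ValueError("name must not be empty")
    return name

# Hypothetical schema: field -> parser that returns a typed value or raises ValueError.
SCHEMA = {
    "id": uuid.UUID,            # malformed input (e.g. an emoji in place of a hex digit) raises
    "email": parse_email,
    "name": parse_name,
    "dob": date.fromisoformat,  # "YYYY-MM-DD" -> datetime.date; "1990-02-30" raises
}

def parse_record(raw: dict) -> dict:
    missing = SCHEMA.keys() - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return {name: parse(raw[name]) for name, parse in SCHEMA.items()}
```

Because every parser returns a typed value, code downstream of `parse_record` never has to re-check the same fields.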

"Oh wow, this field... is a string..? That's great... smiles gently except... THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID, SCHEMA-CHUD. GET THE FUCK OFF MY LAWN!"

sangnoir · on April 3, 2021

It sounds to me like XML with a DTD and XSD would solve your problem. XML is no longer fashionable, but its validation is Turing-complete.

Nitramp · on April 3, 2021

My experience is that validation quickly becomes surprisingly complex, to the point of being infeasible to express in a message format.

Not only are the constraints very hard to express (remember that one 2000-char regexp that really validates email addresses?), they are also contextual: the correct validation in an Android client is not the same as on the server side. E.g. you might want to check uniqueness or foreign key constraints that you cannot check on the client. Sometimes you want to store and transmit invalid messages (e.g. partially completed user input). And then you have evolving validation requirements: what do you do with the messages from three years ago that don't have field X yet?

Unfortunately I don't think you can express what you need in a declarative format. Even minimal features such as regexp validation or enums have pitfalls.

I think it's better to bite the bullet and implement the contextually required validation on each system boundary, for any message crossing boundaries.
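A small Python sketch of validating the same message differently at two boundaries, assuming a client that may hold partial drafts and a server that can also check uniqueness. The field names, `ServerContext` and its `known_emails` set (a stand-in for a database lookup) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REQUIRED = ("email", "name")

@dataclass
class ServerContext:
    # Hypothetical stand-in for a database uniqueness lookup.
    known_emails: set = field(default_factory=set)

def validate_draft(msg: dict) -> list[str]:
    """Client side: partial input is allowed, so only check fields that are present."""
    errors = []
    if "email" in msg and "@" not in msg["email"]:
        errors.append("email: missing '@'")
    return errors

def validate_on_server(msg: dict, ctx: ServerContext) -> list[str]:
    """Server side: the draft checks, plus required fields and checks only the server can do."""
    errors = validate_draft(msg)
    errors += [f"{name}: required" for name in REQUIRED if name not in msg]
    if msg.get("email") in ctx.known_emails:
        errors.append("email: already registered")
    return errors
```

Neither function is "the" schema; each boundary composes the checks it can actually perform.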

scythe · on April 3, 2021

If you want automatic built-in string validation, one particularly interesting option is a variant of Lua patterns, which are weaker and easier to understand than regular expressions but still provide a significant degree of "sanity" for something like an email address. The original version works on bytes, not runes, but you could simply write a parser that works on runes instead, and the pattern-matching code is just 400 old, battle-tested lines of C89. You might want to add one extension: treat an escape sequence as a single character, so that repetition operators apply to it and quoted strings can be matched. With this extension, I think you could implement full email address validation:

https://i.stack.imgur.com/YI6KR.png

Lua patterns have also shown up elsewhere, such as OpenBSD's httpd, and there is an implementation for Rust:

https://www.gsp.com/cgi-bin/man.cgi?section=7&topic=PATTERNS

https://github.com/stevedonovan/lua-patterns

http://lua-users.org/wiki/PatternsTutorial

neop1x · on April 3, 2021

Amazon Ion [1] supports schema [2] and it all looks quite nice to me. Maybe it deserves wider adoption.

[1] https://amzn.github.io/ion-docs/

[2] https://amzn.github.io/ion-schema/

tormeh · on April 3, 2021

I agree with this, something RON/JSON-like with type annotations would be great:

    {
      "isTrue":false:Boolean,
      "id":"123e4567-e89b-12d3-a456-426614174000":UUID
    }

BlueTemplar · on April 4, 2021

Sounds like your issue is that a UUID is NOT a string, but a 128-bit integer?
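Python's standard `uuid` module illustrates this: the hyphenated text is just one encoding of a 128-bit number, and the object exposes that number directly. The helper names are hypothetical; the UUID is the one from tormeh's example.

```python
import uuid

def uuid_to_int(text: str) -> int:
    """Parse the usual hyphenated text form and return the underlying 128-bit integer."""
    return uuid.UUID(text).int

def int_to_uuid(value: int) -> str:
    """Render a 128-bit integer in the hyphenated text form (ValueError if out of range)."""
    return str(uuid.UUID(int=value))

n = uuid_to_int("123e4567-e89b-12d3-a456-426614174000")
print(hex(n), n.bit_length() <= 128, int_to_uuid(n))
```

Storing the 16 raw bytes (`.bytes`) instead of a 36-character string also makes an emoji-laden "UUID" unrepresentable.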

geoduck14 · on April 3, 2021

>THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID

thanks for the lolz

djedr · on April 3, 2021

Still early, but here's my baby, which I hope can improve things:

website with grammar spec: https://tree-annotation.org/

prototype of a JSON/YAML alternative for JS: https://github.com/tree-annotation/tao-data-js

same thing, even less finished for C#: https://github.com/tree-annotation/tao-data-csharp

working on it constantly, more to come soon

fmakunbound · on April 3, 2021

XML and XML Schema solved this more than 20 years ago. It had to be replaced with JSON by the web developers though, so they could just “eval() it” to get their data.

jdeisenberg · on April 3, 2021

XML with RelaxNG (https://relaxng.org/) would have made life so much better than using XML Schema, but, as they say, that ship has long since sailed.

servercobra · on April 3, 2021

All except the easily written by humans part. Which is kind of a key part.

MrPatan · on April 3, 2021

If all the smart people like you used XML, how come it was so painful to use and it died?

takeda · on April 3, 2021

Because it offered all the things the parent mentioned, but that made it too complex. Either you provide a schema and get the benefits of describing your data, or you don't.

I had a chance to use SOAP at one point, with an F5 device and a Python library. What I really liked was that, when the library connected to the device, it downloaded its schema and used that to generate an object. From then on you just communicated with the device as you would with any Python object.

We abandoned it for inferior technologies like REST and JSON because XML was harder to use from JS, as the parent mentioned.

MrPatan · on April 3, 2021

Parent didn't say it was harder to use from JS. Parent said "It had to be replaced with JSON by the web developers though, so they could just “eval() it” to get their data."

First of all, I was there 20 years ago. I had to deal with XML, XSLT, one kind of Java XML parsers that didn't fully do what I needed, another kind of Java XML parsers that didn't fully do what I needed. And oh boy was it a pain. I just wanted to get a few properties of a bunch of entities in a bigger XML document, that's all. Big fail.

Second, JSON always had a parser in JS, so I don't know where that eval nonsense is coming from.

Third, JS actually had the best dev UX for XML of all languages 20 years ago. Maybe you know JavaScript from Node.js, but 20 years ago it ran almost exclusively in web browsers, which even then were pretty good at parsing XML documents. The browser of course had a JS DOM traversal API known to every single JS developer, and very soon (although TBH I can't remember whether before or after JSON) it also had XPath querying functions, all built in.

XML was so bad that its replacement came from the language where it was actually easiest to use. Think about that for a second.

So the answer to the question "Why was XML replaced?" is not "Because webdevs lol".

I suspect it was because it has both content and attributes, which all but guarantees it's impossible to create a bunch of simple, common data structures from it (like JSON does).
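To make the attributes-versus-content point concrete, here is a naive XML-to-dict converter using Python's standard `xml.etree.ElementTree`. It is a hypothetical sketch, not any library's mapping. It has to pick a rule for merging attributes and child elements, and some perfectly valid documents defeat that rule.

```python
import xml.etree.ElementTree as ET

def naive_to_dict(elem: ET.Element) -> dict:
    """Merge an element's attributes and child elements into one dict.

    Leaf children become their text (their attributes are dropped); nested
    children recurse. Mixed content (text between child elements) is lost.
    """
    result = dict(elem.attrib)
    for child in elem:
        value = naive_to_dict(child) if len(child) else (child.text or "")
        if child.tag in result:
            # An attribute and a child share a name, or a child repeats:
            # a dict has no natural place for the second value.
            raise ValueError(f"ambiguous key {child.tag!r} in <{elem.tag}>")
        result[child.tag] = value
    return result

print(naive_to_dict(ET.fromstring('<user id="1"><name>Ann</name></user>')))
```

JSON never has to make this choice, because its objects, arrays and scalars map one-to-one onto common in-memory structures.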

fmakunbound · on April 4, 2021

> Second, JSON always had a parser in JS, so I don't know where that eval nonsense is coming from.

Firstly, it sounds like XML ran over your dog or something. Sorry to hear about that. It wasn’t particularly hard to use at all, and if you’re dealing with the possibility of emojis in your JSON UUIDs in 2021, one might even say it’s easier to use.

If you’re referring to JSON.parse() in “had a parser” above, then you have a temporal problem. Regarding eval(), it’s suggested right in the original RFC for JSON. Check it out. Web developers at the time were following that advice.

BlueTemplar · on April 4, 2021

Another issue is that due to their age, a lot of XML tools ignore the existence of Unicode (or UTF-8).

dragonwriter · on April 3, 2021

> The world desperately needs a replacement for YAML.

The world desperately needs support for YAML 1.2, which solves the problems the article addresses fairly completely (largely in the “default” Core schema[0], but more completely with the support for schemas in general), plus a bunch of others, and has for more than a decade. But YAML 1.2 libraries aren’t available for most languages.

[0] Not actually an official default, but it reflects a cleanup of the YAML 1.1 behavior without the optional types, so it's default-ish. Back when it looked like YAML 1.3 might happen in the reasonably near future, team members indicated that the JSON Schema for YAML (not to be confused with the JSON Schema spec) would be the explicit default YAML schema in 1.3, which has a lot to recommend it.

tormeh · on April 3, 2021

Nope nope nope. YAML is awful and needs to die. The more you look at it the worse it gets. The basic functionality is elegant (at least until you consider stuff like The Norway Problem), but the advanced parts of YAML are batshit insane.

dragonwriter · on April 3, 2021

“The Norway Problem” is a YAML 1.1 problem, of which there are many.

What advanced parts of YAML are you talking about that remain problems in YAML 1.2?

medstrom · on April 3, 2021

From the article:

> The most tragic aspect of this bug, however, is that it is intended behavior according to the YAML 2.0 specification.

dragonwriter · on April 3, 2021

The article is simply, factually wrong; there is no “YAML 2.0 specification” [0], and everything they point to is YAML 1.1, and addressed in YAML 1.2 (the most recent YAML spec, from 2009.)

[0] https://yaml.org/

svnpenn · on April 3, 2021

You seem pretty quick to disregard TOML. I switched all my JSON and YAML for TOML. Do you care to detail what is missing?

atombender · on April 3, 2021

TOML quickly breaks down with lots of nested arrays of objects. For example:

    a:
      b:
      - c: 1
      - d:
        - e: 2
        - f:
            g: 3

Turns into this, which is unreadable:

    [[a.b]]
    c = 1

    [[a.b]]
    [[a.b.d]]
    e = 2

    [[a.b.d]]
    [a.b.d.f]
    g = 3

TOML also has a few restrictions, such as not supporting arrays at the root of the data and, in versions before TOML 1.0, not supporting mixed-type arrays like [1, "hello", true]. JSON can represent any TOML value (as far as I know), but TOML cannot represent every JSON value; there is no null, for example.

At my company we use YAML a lot for table-driven tests (e.g. [1]), and this not only means lots of nested arrays, but also having to represent pure data (i.e. the expected output of a test), which requires a format that supports encoding arbitrary "pure" data structures of arrays, numbers, strings, booleans, and objects.

[1] https://github.com/sanity-io/groq-test-suite/

svnpenn · on April 3, 2021

Looks fine to me:

    [[a.b]]
    c = 1

    [[a.b]]
    d = [
       { e = 2 },
       { f = { g = 3 } }
    ]

timClicks · on April 3, 2021

An improvement, but the original YAML is still significantly better, in my opinion.

Arnavion · on April 3, 2021

Also many (most? all?) serializers don't let you control which fields are serialized inline vs not. So if you have a program that generates configuration, you're going to end up with the original unreadable form anyway.

kji · on April 3, 2021

S-expressions are super easy to parse and are fairly easy for humans to read. See e.g. using s-expressions in OCaml: https://dev.realworldocaml.org/data-serialization.html
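To illustrate how little a bare S-expression reader needs, a minimal Python sketch (not the OCaml library linked above): it tokenises on parentheses and whitespace and builds nested lists. It deliberately skips quoted strings, comments and escapes. Every atom comes back as a string; deciding whether `8080` is a number or `true` a boolean is left to whatever sits on top, which is exactly the type question plain S-expressions leave open.

```python
def tokenize(text: str) -> list:
    # Pad parentheses with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(text: str):
    """Parse one S-expression into nested Python lists of string atoms."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing ')'
            return items
        if token == ")":
            raise SyntaxError("unexpected ')'")
        return token  # atoms stay strings; typing them is up to the caller

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return result

print(parse("(config (port 8080) (debug true))"))
```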

Nihilartikel · on April 3, 2021

Apropos of this, in Clojure-land the idiomatic serialization format is EDN [1], which is pretty ergonomic to work with IMO, since in most cases it is the same as a data literal in Clojure.

My feeling is that :keywords reduce the need and temptation to conflate strings and boolean/enumerations that occurs when there's no clear way to convey or distinguish between a string of data and a unique named 'symbol'. I miss them when I'm in Pythonland.

[1] https://www.compoundtheory.com/clojure-edn-walkthrough/

gnud · on April 3, 2021

S-expressions inherit all the trouble with data types from JSON (dates, times, booleans, integer size, number vs numeric string).

You get neat ways of nesting data, but that is not enough for a robust and mistake-resilient configuration language.

The problem isn't parsing in itself. The problem is having clear semantics without devolving into full SGML DTDs (or, worse still, XML schemas).

diggan · on April 3, 2021

> S-expressions inherit all the trouble with data types from JSON (dates, times, booleans, integer size, number vs numeric string).

Hm, I'm not sure that's true. S-expressions only define the "shape" of how you're defining something, not the semantics. EDN (https://github.com/edn-format/edn) is, for all intents and purposes, S-expressions, and it has support for custom literals and more, to avoid "the trouble with data types from JSON".

gnud · on April 4, 2021

Yes, EDN is S-expressions plus a bunch of semantic rules. Parsing EDN is quite a bit more complex than just parsing S-expressions, because you need to support a bunch of built-in types, as well as arbitrary extensions through 'tags'.

The tag system is quite brilliant though.

dqpb · on April 3, 2021

I’ve used most of the technologies you listed. Cue is the best, and the only one with strong theoretical foundations. I’ve been using it for some time now and won’t go back to the others.

ng12 · on April 3, 2021

Jsonnet hasn't taken off because it's Turing-complete. It's a really great language for generating JSON, but not a replacement for JSON.

hansvm · on April 3, 2021

> The world desperately needs a replacement for YAML.

For situations like TFA you really want a configuration language that behaves exactly like you think it will, and since you don't have to interop with other organizations you don't really need a global standard.

Moreover, broadly used config languages can be somewhat counterproductive to that goal. Take JSON as an example; idiomatic JSON serdes in multiple programming languages has discrepancies in minint, maxfloat, datetime, timezone, round-tripping, max depth, and all kinds of other nuanced issues. Existing tooling is nice when it does what you expect, but for a no-frills, no-surprises configuration language I would almost always just prefer to use the programming language itself or otherwise write a parser if that doesn't suffice (e.g., in multilingual projects).
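One concrete instance of the max-int discrepancy, sketched with Python's standard `json` module: Python decodes JSON integers as arbitrary-precision ints, while a decoder that maps every number to an IEEE 754 double, as JavaScript's `JSON.parse` does, cannot represent 2^53 + 1 exactly. Passing `parse_int=float` makes Python imitate the latter.

```python
import json

TEXT = '{"id": 9007199254740993}'  # 2**53 + 1

exact = json.loads(TEXT)["id"]                       # Python int, arbitrary precision
as_double = json.loads(TEXT, parse_int=float)["id"]  # every integer becomes a 64-bit float

print(exact, as_double, exact == int(as_double))
# prints: 9007199254740993 9007199254740992.0 False
```

The same document silently means two different numbers depending on which side of a language boundary reads it.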

Mildly off-topic: The problem here, more or less, was that the configuration change didn't have the desired effect on an in-memory representation of that configuration. We can mitigate that at the language level, but as a sanity check it's also a good idea to just diff the in-memory objects and make sure the change looks kind of like what you'd expect.
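A minimal sketch of that sanity check in Python, assuming both versions of the configuration have already been loaded into plain dicts and lists; `diff` and the `MISSING` sentinel are hypothetical helpers. Listing the paths that changed makes a surprise like a country code turning into `False` hard to miss.

```python
class _Missing:
    def __repr__(self):
        return "<missing>"

MISSING = _Missing()  # marks a key or index present on only one side

def diff(old, new, path="$"):
    """Yield (path, old_value, new_value) for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys(), key=str):
            yield from diff(old.get(key, MISSING), new.get(key, MISSING), f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else MISSING
            b = new[i] if i < len(new) else MISSING
            yield from diff(a, b, f"{path}[{i}]")
    # Compare types too: in Python 0 == False and 1 == 1.0.
    elif old != new or type(old) is not type(new):
        yield (path, old, new)

intended = {"countries": ["GB", "NO"], "debug": 0}
loaded = {"countries": ["GB", False], "debug": False}
for change in diff(intended, loaded):
    print(change)
```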

atombender · on April 3, 2021

You don't need wide adoption for internal projects in an organization, but you do want great toolchain support.

For example, the fact that NestedText is a Python library means a Python team could use it, but it's a poor fit for an organization whose other teams use Go and JavaScript/TypeScript.

We use YAML for much more than configuration, by the way. I feel like YAML hits a nice sweet spot where it's usable for almost everything.

BlueTemplar · on April 4, 2021

> and since you don't have to interop with other organizations

Until you have to, and all hell breaks loose?

Now, the example of code pages maybe isn't really appropriate for companies, but it's still a good enough metaphor?

ak217 · on April 3, 2021

I don't think YAML is going anywhere, largely because it was the first format to prioritize readability and conciseness, and has used that advantage to achieve critical mass.

It's far more productive to push for incremental changes to the YAML spec (or even a fork of it) to make it more sane and better defined. Things like a StrictYAML subset mode for parsers in other popular languages.

dragonwriter · on April 3, 2021

> It's far more productive to push for incremental changes to the YAML spec

The problems this article raises, and that strictyaml purports to address, were addressed in YAML 1.2, which is already supported in Python via ruamel.yaml. YAML 1.2 addresses much of this in the Core schema, the closest successor to the default behavior of earlier spec versions, and does so more completely through its general support for schemas. A schema defines both the supported “built-in” tags (roughly, types) and how they are matched from the low-level representation, which consists only of strings, sequences, and maps. Incidentally, those are the only three tags of the “Failsafe” schema; there's also a “JSON” schema between Failsafe and Core, whose tags correspond to the types JSON supports.
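A simplified Python sketch of the resolution step described here: an untagged plain scalar arrives as a string, and the schema's rules decide what it becomes. The Failsafe schema has no rules, so everything stays a string. The Core patterns are a subset transcribed from the YAML 1.2 spec (null, booleans, integers; floats omitted), and the last table covers only the YAML 1.1 boolean type, which is where the Norway problem comes from. `resolve` and the table names are hypothetical, not ruamel.yaml's API.

```python
import re

def _rules(*pairs):
    return [(re.compile(pattern), convert) for pattern, convert in pairs]

# Subset of the YAML 1.2 Core schema (floats omitted for brevity).
CORE = _rules(
    (r"null|Null|NULL|~|", lambda s: None),  # the empty alternative matches ""
    (r"true|True|TRUE", lambda s: True),
    (r"false|False|FALSE", lambda s: False),
    (r"[-+]?[0-9]+", int),
    (r"0o[0-7]+", lambda s: int(s[2:], 8)),
    (r"0x[0-9a-fA-F]+", lambda s: int(s[2:], 16)),
)

# Only the YAML 1.1 boolean type, for comparison.
YAML11_BOOL = _rules(
    (r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON", lambda s: True),
    (r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF", lambda s: False),
)

SCHEMAS = {"failsafe": [], "core": CORE, "yaml11-bool": YAML11_BOOL}

def resolve(scalar: str, schema: str):
    """Resolve an untagged plain scalar; anything no rule matches stays a string."""
    for pattern, convert in SCHEMAS[schema]:
        if pattern.fullmatch(scalar):
            return convert(scalar)
    return scalar

for s in ["NO", "yes", "true", "~", "0x1F"]:
    print(repr(s), {name: resolve(s, name) for name in SCHEMAS})
```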

IshKebab · on April 3, 2021

JSON5 is the best option currently. A fair number of tools in the JS ecosystem support it.

atombender · on April 3, 2021

JSON5 is better than JSON on my points, but it has downsides compared to YAML. For example, YAML is very good at multiline strings that don't require any sort of quoting, and knows to remove preceding indentation:

  foo: |
    "This is a string that goes across
    multiple lines," he wrote.

In JSON5, you'd have to write:

  {
    foo: "\"This is a string that goes across\n\
  multiple lines,\" he wrote.\n"
  }

This sort of ergonomic approach is why YAML is so well-liked, I think. (Granted, YAML's use of obscure Perl-like sigils to indicate whitespace mode is annoying, but it does cover a lot of situations.)
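Roughly what a YAML parser does with a `|` block like the one above, as a simplified Python sketch: the indentation of the first non-empty line sets the block's indentation, that much is stripped from every line, and the default "clip" chomping keeps exactly one trailing newline. `literal_block` is a hypothetical helper. It ignores explicit indentation indicators, the `|-`/`|+` variants and other edge cases, and assumes no non-empty line is less indented than the first.

```python
def literal_block(lines: list) -> str:
    """Simplified YAML literal block scalar ("|") with default clip chomping."""
    # Block indentation = indentation of the first non-empty line.
    first = next((line for line in lines if line.strip()), "")
    indent = len(first) - len(first.lstrip(" "))
    body = [line[indent:] if line.strip() else "" for line in lines]
    text = "\n".join(body).rstrip("\n")
    # Clip chomping: exactly one final newline (none for an empty block).
    return text + "\n" if text else ""

print(repr(literal_block(['    "This is a string that goes across',
                          '    multiple lines," he wrote.'])))
```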

YAML is also great at arrays, mimicking how you'd write a list in plaintext:

  foo:
  - "hello"
  - 42
  - true

geraldbauer · on April 3, 2021

You might look at JSON Next variants (if you remember - "classic" JSON is a subset of YAML), see https://github.com/json-next/awesome-json-next

My own little JSON Next entry / format is called JSON 1.1 or JSONX, that is, JSON with eXtensions, see https://json-next.github.io

orthoxerox · on April 3, 2021

The list is missing http://www.relaxedjson.org/

Also, there's no explanation of what <..-..> and <..+..> do.

tormeh · on April 3, 2021

Also RON: https://github.com/ron-rs/ron

A bit like JSON5, but I believe even more advanced.

imtringued · on April 4, 2021

I will keep using YAML because I don't want to learn the pitfalls of your alternatives. With YAML everyone is complaining about the pitfalls, and therefore everyone is aware of them. A random replacement may not have this particular problem, but it may have other problems that remain unknown.

debug-desperado · on April 3, 2021

Thanks for this list, I’ve never heard of Ion. I’ll consider it for config and even replacing Avro & Protobuf in future projects.

BlueTemplar · on April 4, 2021

Besides this issue, what's wrong with YAML?