Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It’s a feature, not a bug. Assert assumptions and crash on bad one.

Crashing is not an outage. It’s a restart and a stack trace for you to fix.





Even if you want it to crash, you almost never unwrap. At the very least you would use .expect() so you get a reasonable error message -- or even better you handle the potential error.

> Crashing is not an outage.

Are you in the right thread?


Did you skip all the other context about the other systems that failed?

The problem was a query producing incorrect data. The crash helped them find it.

What do you think happens when a program crashes?


> The crash helped them find it.

Barely given the initial impression it was a DDoS.

Although you can argue that's bad observability.


The type system is for asserting assumptions like "this cannot fail". You don't crash at all.

Most properties of programs cannot be validated at compile time and must be checked at runtime.

But you’re still missing it. Crashing is not bad. It’s good. It’s how you leverage OS level security and reliability.


This wasn't a runtime property that could not be validated at compile time. And you don't need to fall back on "OS level security and reliability" when your type system is enforcing an application-level invariants.

In fact I'd argue that crashing is bad. It means you failed to properly enumerate and express your invariants, hit an unanticipated state, and thus had to fail in a way that requires you to give up and fall back on the OS to clean up your process state.

[edit]

Sigh, HN and its "you're posting too much". Here's my reply:

> Why? The end user result is a safe restart and the developer fixes the error.

Look at the thread your commenting on. The end result was a massive world-wide outage.

> That’s what it’s there for. Why is it bad to use its reliable error detection and recovery mechanism?

Because you don't have to crash at all.

> We don’t want to enumerate all possible paths. We want to limit them.

That's the exact same thing. Anything not "limited" is a possible path.

> If my program requires a config file to run, crash as soon as it can’t load the config file. There is nothing useful I can do (assuming that’s true).

Of course there's something useful you can do. In this particular case, the useful thing to do would have been to fall back on the previous valid configuration. And if that failed, the useful thing to do would be to log an informative, useful error so that nobody has to spend four hours during a worldwide outage to figure out what was going wrong.


> I'd argue that crashing is bad.

Why? The end user result is a safe restart and the developer fixes the error.

> fall back on the OS to clean up your process state.

That’s what it’s there for. Why is it bad to use its reliable error detection and recovery mechanism?

> It means you failed to properly enumerate and express your invariants

We don’t want to enumerate all possible paths. We want to prune them.

If my program requires auth info to run, crash as soon as it can’t load it. There is nothing useful I can do (assuming that’s true).


> The end result was a massive world-wide outage.

The world wide outage was actually caused by deploying several incorrect programs in an incorrect system.

The root one was actually a bad query as outlined in the article.

Let’s get philosophical for a second. Programs WILL be written incorrectly - you will deploy to production something that can’t possibly work. What should you do with a program that can’t work? Pretend this can’t happen? Or let you know so you can fix it?


> Programs WILL be written incorrectly - you will deploy to production something that can’t possibly work. What should you do with a program that can’t work? Pretend this can’t happen? Or let you know so you can fix it?

Type systems provide compile time guarantees of correctness such that systems cannot fail in ways covered by the type system.

In this case, they used an unsound hole in the type system to do something that unnecessarily abandoned those compile-time invariants and in the process caused a world-wide outage.

The answer is not to embrace poking unsound holes in your type system in the first place.


It's not philosophical, half of the internet broke. There is a notion between "I really have no other choice but to crash" and "I might wanna crash at this moment because something is wrong but I won't and I'll try recovering".

In this particular case, it was the "limit 200" because "performance reasons" so I think there was more space to implement the latter than the former.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: