
I can absolutely get behind the "nobody knows what log levels are for" point.

First of all, logs come from application components, and the bigger your application is, the more levels there are in this hierarchy (components of components, etc.). Authors of individual components may have no idea how the performance of their component is going to affect the whole application.

So... does the log level need to be reinterpreted? Do logs need to be collected hierarchically? Do logs need to become part of the application's public interface? -- That is too high a price to pay, and in practice nobody is going to do it. So in reality, outside of very simple cases, it's just easier to ignore the level of a log message.

Another aspect is that log levels aren't really a sequence. A message can belong to several categories at the same time, but those categories need not form a contiguous sub-sequence. And the more detailed you make the levels, the more obvious this problem becomes.

Yet another aspect: people writing logs may rely on another (dumb) program to process them. So they will set the semantics of the logs aside and concentrate on the side effect they want from the program that processes them.

---

My personal experience with, e.g., Kubernetes is that its authors consistently underestimate the severity of some conditions and overestimate the severity of others. I often find that what was labeled a "warning" was the reason the whole cluster stopped functioning, while something reported as an "error" was a completely expected condition that the program knew perfectly well how to recover from.



> logs come from application components, and the bigger your application is, the more levels there are in this hierarchy (components of components, etc.). Authors of individual components may have no idea how the performance of their component is going to affect the whole application.

There is no way to solve this problem. If you have 50 unrelated sub-components in an application, there is no way to know if a given event is critical or just informational, because it requires specific context to understand what the event is actually impacting in that moment.

That doesn't mean the log level is useless. It just means it is one aspect of the signal you are getting from a sub-component. You can then filter on that specific signal from that sub-component, in cases where the same sub-component might emit a different message at a different log level. Is it perfectly accurate? No. Does that make it useless? No.
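A minimal sketch of per-sub-component filtering, using Python's standard `logging` module and assuming each component logs under its own dotted logger name. The class name and the component names `app.db` and `app.cache` are hypothetical, chosen only for illustration:

```python
import logging


class PerComponentThreshold(logging.Filter):
    """Drop records below a per-component minimum level.

    `thresholds` maps a logger-name prefix (hypothetical component names)
    to the lowest level we are willing to trust from that component.
    Components not listed fall back to `default`.
    """

    def __init__(self, thresholds, default=logging.WARNING):
        super().__init__()
        self.thresholds = dict(thresholds)
        self.default = default

    def threshold_for(self, name):
        # Longest matching prefix wins, so "app.db.pool" inherits the
        # threshold of "app.db", while "app.dbx" does not match "app.db".
        best = None
        for prefix in self.thresholds:
            if name == prefix or name.startswith(prefix + "."):
                if best is None or len(prefix) > len(best):
                    best = prefix
        return self.thresholds[best] if best is not None else self.default

    def filter(self, record):
        # Keep the record only if it meets its component's threshold.
        return record.levelno >= self.threshold_for(record.name)
```

It would be attached with `handler.addFilter(PerComponentThreshold({...}))`. Note that logger levels are checked before handler filters, so the loggers themselves still have to be set low enough for records to reach the handler.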

> Another aspect is that log levels aren't really a sequence. A message can belong to several categories at the same time, but those categories need not form a contiguous sub-sequence. And the more detailed you make the levels, the more obvious this problem becomes.

This just means you have a low-quality signal. Is every low-quality signal useless? No.

> Yet another aspect: people writing logs may rely on another (dumb) program to process them. So they will set the semantics of the logs aside and concentrate on the side effect they want from the program that processes them.

Again, low quality signals aren't useless, and you don't have to do this if you don't want to. Throwing away log levels is literally throwing away more signal, which is going to make life harder, not easier.


> You can then filter on that specific signal from that sub-component

I already answered why this is a bad idea, but I will repeat: this makes logs part of the public interface, which in turn imposes far more restrictions on component providers than they currently have, or than is plausible to expect from them. And if you (the mothership application developer) decide on your own to rely on a feature that is not part of the public interface anyway (i.e. you decide to filter logs from components and translate them somehow) -- well, you've done a crappy job as an engineer by essentially planting a time bomb in your application.

So, your plan is not good.

This also means you cannot automate a response to log levels. As a human reading the logs, in most, if not all, cases you could probably get to the bottom of why a particular log message had the level it had, but it's not humanly possible to write a program that does that. This is what makes log levels worthless (in the context of monitoring).

> low quality signals aren't useless

You seem to be confusing the goal OP set for logs (using them for monitoring, i.e. in an automated way) with other possible goals (e.g. anthropological ones, where you study how human understanding of log messages has evolved over time).


> Throwing away log levels is literally throwing away more signal, which is going to make life harder, not easier.

Signal that can't be used is just noise, and noise complicates solving a problem by interfering with the signal.

In projects I work on, I generally only use two logging levels: info and error. Error indicates unrecoverable conditions that mean an execution context is terminating. Info is everything else. "Warn" is useless because something is only a warning in a particular context and I don't have that context when I'm building the logs. "Debug" is a lie; logging isn't debugging, and if I need to debug I need to slap an actual debugger on the binary with source code available.
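A rough sketch of that info/error split with Python's standard `logging`, assuming an "execution context" is a unit of work wrapped in a context manager. The helper `execution_context` and the logger name `app.worker` are hypothetical:

```python
import logging
from contextlib import contextmanager

# Hypothetical logger name; its effective level must be INFO or lower
# for the info lines to be emitted.
log = logging.getLogger("app.worker")


@contextmanager
def execution_context(name):
    """Run one unit of work, logging ERROR only if it terminates abnormally."""
    log.info("starting %s", name)
    try:
        yield
    except Exception:
        # The only place ERROR is used: this execution context is ending.
        # log.exception logs at ERROR level and attaches the traceback.
        log.exception("%s terminated", name)
        raise
    else:
        log.info("finished %s", name)
```

Everything that happens inside the `with` block and is not fatal to it would be logged at INFO; there is deliberately no WARN or DEBUG call anywhere.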

I couple that with the ability to turn on and off logging at a fine-grained module level and (if I'm living my best life) being able to instrument the production code for breakpointing and logging on the fly (via systems such as Google Cloud Debugger).
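With Python's standard `logging`, module-level on/off switching can be approximated by setting levels on dotted logger names, since a child logger left at NOTSET inherits the effective level of its nearest configured ancestor. A minimal sketch; the helper `set_module_logging` and the name `app.db` are hypothetical, and this does not cover the on-the-fly breakpointing part:

```python
import logging


def set_module_logging(module, enabled):
    """Turn logging for one module (and its child loggers) on or off at runtime.

    "On" means INFO and above, matching an info/error-only scheme.
    "Off" uses a level above CRITICAL, so no standard level passes.
    """
    logger = logging.getLogger(module)
    logger.setLevel(logging.INFO if enabled else logging.CRITICAL + 1)
```

Child loggers such as `app.db.pool` follow the switch automatically as long as they have no level of their own.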


Not only logs: the year I spent with CPUThrottlingHigh as an "error" alert (on a cluster with mandatory CPU limits) was awful.