When I was learning to code in C, I would often spend all night trying to chase down a bug, then finally go to bed in frustration at 5 am. After a few hours of sleep, I'd look at the code again and immediately see the problem. I would feel so stupid for missing it before. Surrender works. Or maybe debugging all night doesn't work...
A good night's sleep can get you a few dozen extra IQ points. Amazing how many bugs get solved by ideas that magically pop into your head while you're showering the next morning.
I have a sense of when I'm getting closer to a bug. As long as I'm making progress, I won't stop until I've found the bug. That's not wasted time. If you need two hours to do this, then you spent those two hours working on the bug.
On the other hand, if you spend a lot of time with no progress, then maybe you should stop and come back to it later, depending on priority, etc.
I just solved a bug in someone's CUDA agent-based modelling library which caused a runtime error when compiled for a more recent CUDA compute architecture.
The actual bug was due to the more recent architecture enforcing pointer alignment, while the older implementations just ignored it.
My fix for the bug created other bugs (largely due to my lack of CUDA experience), and I eventually solved it by coincidence when I reset and applied my fixes across the whole code, rather than at a local level.
I may have made no progress towards the task in the 2 full days I spent solving this issue. However, I have greatly improved my CUDA programming knowledge and debugging technique, which I would otherwise never have done, so when I inevitably reach another bug it should (hopefully) be much quicker for me to solve.
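To make the alignment point concrete, here is a rough sketch in Python (not CUDA, and not the library's actual code) of what "aligned" means when you carve one byte buffer into typed fields. The field names and sizes are made up for illustration. The idea is that each field's offset has to be a multiple of its alignment, so you round the cursor up before placing it.

```python
def is_aligned(offset, alignment):
    """True if a byte offset is a multiple of the required alignment."""
    return offset % alignment == 0


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (must be a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(fields):
    """Assign aligned offsets to (name, size, alignment) fields in one buffer.

    Returns (offsets_by_name, total_bytes_used).
    """
    offsets, cursor = {}, 0
    for name, size, alignment in fields:
        cursor = align_up(cursor, alignment)  # skip padding bytes if needed
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


if __name__ == "__main__":
    # Hypothetical fields: a 1-byte flag, a 4-byte count, an 8-byte weight.
    fields = [("flag", 1, 1), ("count", 4, 4), ("weight", 8, 8)]
    offsets, total = layout(fields)
    print(offsets, total)
    # Naive back-to-back packing would have put "count" at offset 1:
    print(is_aligned(1, 4))
```

Packing the fields back to back without the rounding step is the kind of thing that can appear to work on hardware that tolerates it and fail on hardware that doesn't.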
You can also avoid getting frustrated by using a more test-driven style of development. Incrementally write code, then test (could be just printf-ing); chances are that the bug is in the last chunk of code written.
I am working on a project that aims to standardize the use of bisection to narrow down and arrive at the root cause of bugs. One approach is to run regexes on your output logs to determine which modules are running into issues. Check out http://bisect.cc
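For anyone who hasn't used bisection for debugging, here is a minimal generic sketch in Python. It is not taken from the project linked above; the function names and the log format ("ERROR [module] message") are assumptions made for illustration. It assumes the candidates are ordered oldest to newest, the last one is known to be bad, and once something is bad everything after it is bad too.

```python
import re
from collections import Counter


def first_bad(candidates, is_bad):
    """Return the index of the first bad candidate using binary search.

    Assumes candidates[-1] is bad and badness never reverts to good.
    """
    lo, hi = 0, len(candidates) - 1  # invariant: candidates[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid        # first bad one is at mid or earlier
        else:
            lo = mid + 1    # first bad one is after mid
    return lo


# Hypothetical log format: "<LEVEL> [<module>] <message>"
ERROR_LINE = re.compile(r"^ERROR \[(?P<module>[\w.]+)\]")


def failing_modules(log_text):
    """Count ERROR lines per module name in a log dump."""
    hits = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            hits[match.group("module")] += 1
    return hits


if __name__ == "__main__":
    revisions = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
    # Stand-in for "check out this revision, build it, run the failing test".
    broken = {"r6", "r7", "r8"}
    print(revisions[first_bad(revisions, lambda rev: rev in broken)])
    print(failing_modules("ERROR [net.io] timeout\nINFO [db] ok\nERROR [net.io] reset"))
```

With 8 revisions this needs at most 3 checks instead of 8, which matters when each check is a full build and test run.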
TDD can be a great way to debug, particularly if it takes a lot of effort to get your running app into the state where the bug occurs. Write a Cucumber scenario that sets up the conditions where you think the bug occurs and tweak it until you're able to replicate the bug. If you know when a bug occurred, you can take a look at your logs and then use those to construct your bug scenario in Cucumber. Once you're able to replicate it in Cucumber, it's usually far faster to fix because your test cycles are much shorter (automatic vs. having to put the running app in the correct state each time).
Testing your code as you develop is very useful, but it isn't a panacea for avoiding all bugs. Lots of times, bugs show up because of something you didn't realize could go wrong. And if you didn't realize it could go wrong, you also didn't realize that you should write a test for it. If it's a rarely occurring problem, it might first be reported by a customer a couple of years later, so you can only narrow it down to something that's changed in the last few years.
5 days to fix a bug! Jesus, write better code, write tests, and add good logging. Break your code apart into testable modules. Test your hypothesis from the ground up. If you can't localise the bug quickly, there is something wrong with your methodology. You need to use a debugger. Clearly you are trying to debug a house built on sand.
I find this is a common problem. People don't realise 80% of your time is spent debugging and checking your assumptions. If you are writing reams of code before checking your assumptions, then you pay quadruple the cost. The first thing I do when building a big data processing project is think about how I am gonna debug it. Invest tons of time at the beginning getting readable output in a console. I have gone as far in some projects as outputting trees and grids in ASCII art to visualise data structures (in the console). I have never spent 5 days debugging a bug since university. The days spent at the beginning of the project making the debug output as good as possible pay for themselves a hundred times over.
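As an illustration of the ASCII-art idea, here is a small Python sketch that dumps a tree to the console. The `(label, children)` tuple representation and the function name are assumptions for this example, not anything from the comment above; adapt it to whatever your real data structure looks like.

```python
def render_tree(node, prefix="", is_last=True, is_root=True):
    """Return a list of text lines drawing a (label, children) tree."""
    label, children = node
    if is_root:
        lines = [str(label)]
        child_prefix = ""
    else:
        # "`-- " marks the last child, "|-- " marks any earlier sibling.
        lines = [prefix + ("`-- " if is_last else "|-- ") + str(label)]
        child_prefix = prefix + ("    " if is_last else "|   ")
    for i, child in enumerate(children):
        last_child = i == len(children) - 1
        lines.extend(render_tree(child, child_prefix, last_child, False))
    return lines


if __name__ == "__main__":
    tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])
    print("\n".join(render_tree(tree)))
    # Prints:
    # root
    # |-- a
    # |   |-- a1
    # |   `-- a2
    # `-- b
```

Returning a list of lines rather than printing directly makes it easy to send the same dump to a log file or compare it in a test.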
Sometimes bugs are genuinely hard to debug and appear in hostile environments. Many people who have been programming for a long time have seen bugs that take much longer than five days to debug. For example:
- The bug doesn't reproduce when run under the debugger (not so uncommon with C code). Or the bug goes away when you change the timing of the code by writing to a log file.
- The bug could be in code that you didn't write and don't yet completely understand; the people who wrote the code have all left the company.
- The bug can be in a third-party library that you don't even have the source code to. You can only infer what the library is doing by seeing how it reacts to what you're passing it. Or you can try disassembling its code.
- The bug can be due to bad machine code generated by a compiler.
- The bug can be due to undocumented behavior in an operating system.
- The bug could be in a really complex algorithm.
- The bug might only reproduce at a customer's site with the customer's confidential data.
- The bug might be sporadic, perhaps depending on certain patterns of network traffic.
- The bug could be due to a fundamental misunderstanding of the original project requirements that will only become apparent after days of investigation.
No offense, but you sound like a programmer at either a small or maybe medium-sized company, where the code is all known to you and touched by you (based upon your 'when I code..' sentence). I regularly run into bugs that take longer than several hours to find, not even talking about fixing them. I admit it's not what you'd get in an ideal world, but it's what you run into all the more in the real world. Deadlines and 10+ year old platforms introduce hacks and untested code.
Back to the original post: I kind of like this, in a mild way. As said in another comment, I also tend to hunt until I recognize that I'm stuck. At that point I step back, walk to a drawing board and invite some colleagues to listen to me thinking out loud and running over several options while visualizing the pipeline on the board. This gives me three things:
- experience of other colleagues
- visualization of the problem to get clear what is happening.
- probably coffee and some small talk about completely different things, which at that moment seem to follow quite logically from the thing being explained. It relaxes me.
Back at my desk I'm all fresh again, with new ideas to try out, etc. Bottom line: sometimes taking a step back helps me as well. I tend to see it as a last resort, although I (now again) know I should do this more often, and especially at an earlier stage.
To end where we started: when you run into a multi-day bug, you should immediately write a test for it after solving it. Make sure you don't run into it again :).
All of the techniques you mention help a lot, but they're only going to reduce the incidence of really terrible bugs, not eliminate them. No matter how great your tests and logging and code, a subtle compiler, OS, or hardware bug can still give you days of pain.
The toughest bug I encountered took a month to find. The program (a Unix daemon) was single process, with no threads. The program would crash (seg fault) after a few hours (some systems) or a few days (other systems). There did not seem to be any rhyme or reason for the crash. And running it under a debugger didn't help, as it kept crashing at different points.
I finally found the issue, but only after close study of the code. I realized that one signal handler was different from all the others, and that signal handler was calling some non-thread-safe code. But because I wasn't using threads, the thought of a critical section of code being interrupted never occurred to me.
Now I spend my work time finding bugs in a distributed system. Fun times. (Latest possible bug: one component stopped logging, but only after what I would call "ludicrous load" (say, 1000+ load average). It was still running, but apparently got wedged on a mutex.)
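The usual defence against the signal-handler problem described above is to have the handler do almost nothing: set a flag, and let the main loop do the real work at a point where it is safe. Below is a minimal sketch of that pattern in Python (the original daemon was presumably C, where the flag would typically be a `volatile sig_atomic_t`). The names `reload_requested`, `on_sighup` and `drain_pending_work` are hypothetical, and the demo assumes a Unix system.

```python
import os
import signal

reload_requested = False  # hypothetical flag, polled by the main loop


def on_sighup(signum, frame):
    """Do the bare minimum in the handler: record that the signal arrived."""
    global reload_requested
    reload_requested = True


def drain_pending_work(state):
    """Run the non-reentrant work from the main loop, never from the handler."""
    global reload_requested
    if reload_requested:
        reload_requested = False
        state["reloads"] += 1  # stand-in for the real, non-reentrant work
    return state


if __name__ == "__main__":
    signal.signal(signal.SIGHUP, on_sighup)
    state = {"reloads": 0}
    os.kill(os.getpid(), signal.SIGHUP)  # simulate an external signal (Unix only)
    drain_pending_work(state)            # the main loop would call this each iteration
    print(state["reloads"])
```

The point is that a signal can arrive while the main code is halfway through updating something, even in a single-threaded program, so anything the handler touches has to tolerate being interrupted at any moment.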
You are very right, and spot on. I used that example mainly to bring out the message.
This particular bug wasn't in my code but on the platform the code needed to run on.
Because of my wrong approach to debugging, I ended up spending more time looking in the wrong place for the bug. So what I thought was debugging was actually just time-wasting.
If a bug takes 5 days to solve, it's a bug that needed 5 days to solve. If you've not had to spend 5 days on one since uni, you're a lucky man. I didn't for years, then I've had quite a few appear over the last 18 months.