Debug Framework
Over the past week I’ve been in two situations which have encouraged me to think about the way I ‘debug’ or problem solve.
After reflection, I thought I’d share an unconscious framework I seem to follow, whether it be for solving problems in life or in code. Having a simple pattern to follow may empower you to solve problems in a more timely manner with less anxiety.
Hold onto the following two scenarios as we explore further:
- In my band, a friend plugged his bass guitar into the amp. It buzzed for a couple of seconds, went silent, and when he tried playing the bass, the amp remained silent. After several seconds of tapping and twisting different things, he frustratedly remarked: ‘This amp has sh*& itself, it’s completely busted…we need to get a new amp’
- At work, a colleague (web developer) was tasked with fixing an issue where styles weren’t appearing as they should. Almost two hours after starting the investigation, disheartened that he hadn’t progressed, he reached out for assistance.
We’ll explore the following pattern, which is iterative: we repeat it over and over until we get to the root cause and then the solution:
- Frame
- Identify
- Reduce
01. FRAME accurately
Before you go any further, it’s really helpful to frame the issue accurately. For example:
Bass Guitar example
- The overall problem wasn’t that the ‘amp is busted’. The problem was that ‘we’re not hearing the amplified sound of the bass guitar’.
Website example
- The overall problem wasn’t that ‘the production site just broke randomly’. The problem was that ‘in Safari on iOS the checkout page is missing all elements’.
Why frame the problem like this? Framing the overall problem avoids assumptions. Notice in the bass guitar example that the friend immediately assumed the amp was the problem, and was therefore fixated on ‘fixing the amp’ while ignoring all other possibilities.
02. IDENTIFY possible causes
In most cases, there could be a number of things causing the problem, or even a specific combination of two or more things. It’s helpful to identify each of these broader pieces. For example:
Bass Guitar example - the relevant parts that could cause the issue may be:
- Bass Guitar
- Guitar Lead (from guitar to pedal)
- Pedal
- Guitar Lead (from pedal to amp)
- Amp
- Power
Website example - the relevant parts could be:
- Web browser (eg. application bug, cache)
- Internet connection
- Web server
- Continuous deployment tools
- Website code (Git tracked code that we’ve written)
- Website state (database, any files that store data, non-Git tracked files such as dependencies and plugins)
Notice that, for now, we’ve only identified the broader pieces like ‘Bass Guitar’. It’s not yet helpful to break the bass guitar into smaller pieces (eg. strings, pickups, knobs, wiring etc). Remember, this process is cyclical, so in the next loop we may need to identify the pieces of the bass (if we find that the bass guitar is relevant to the issue).
03. REDUCE the surface area
Now that we’ve identified each of the pieces, we need to focus on testing and eliminating each piece one-by-one until we find the piece that’s broken. We can be tempted to lazily “scatter-shoot” our attention across all pieces at once; however we must procedurally isolate and test one piece at a time to prove that it is or isn’t a suspect in this problem.
Isolating might mean swapping out whole parts or disabling whole features to determine if that piece has an effect on the issue.
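The swap-one-piece-at-a-time idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `BROKEN` set, and the pretend rig are all hypothetical, and the "test" is simulated rather than a real check of real equipment.

```python
from typing import Callable, List, Optional


def find_faulty_part(
    parts: List[str],
    works_with_swapped: Callable[[str], bool],
) -> Optional[str]:
    """Swap one part at a time for a known-good spare.

    Returns the first part whose replacement makes the problem go away,
    or None if no single swap fixes it.
    """
    for part in parts:
        if works_with_swapped(part):
            return part
    return None


# Hypothetical rig: for this sketch we pretend the first lead is broken.
BROKEN = {"lead (guitar to pedal)"}


def rig_works_with_swapped(swapped: str) -> bool:
    # The simulated rig only works once every broken part has been swapped out.
    return BROKEN <= {swapped}


parts = [
    "bass guitar",
    "lead (guitar to pedal)",
    "pedal",
    "lead (pedal to amp)",
    "amp",
    "power",
]
culprit = find_faulty_part(parts, rig_works_with_swapped)
print(culprit)
```

The point of the loop is that only one thing changes per test, so a change in the result can be pinned on that one piece. If the fault is a combination of pieces, no single swap will fix it and the function returns None, which is a hint to test pairs next.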
Let’s go back to our examples:
Bass Guitar example
- The bass guitarist could plug the guitar straight into the amp with one of the leads. This eliminates the pedal and the other lead. Rather than troubleshooting all 6 parts, we’re now only focusing on 4 (bass guitar, one lead, amp and power). When we did this, music still wasn’t coming out of the speaker, so we started swapping the remaining parts with other parts, and continued isolating.
- After swapping one lead with the other lead, glorious music sounded from the amp!
Great, so the problem was the first lead (not the amp)!
If we’re interested in digging further into why the first lead doesn’t work, we could repeat the process with the smaller problem: frame the problem as ‘the lead isn’t transmitting a signal’, identify the pieces as 2 plugs and wires, and reduce the pieces one-by-one.
Website example
- It turns out the website was working a week ago, and there haven’t been any code changes since then. So we can eliminate “website code” as a possibility.
- Does it work in any other server environments (ie. local computer or staging server)? What do you know, we found that the error didn’t happen on the staging server.
- What are the differences between staging and live? There’s a ‘social sharing’ plugin which was active on live, but not in staging. What’s more, when we activated the social sharing plugin on staging, that environment had the same error.
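Comparing a working environment against a broken one is essentially a set difference. Here is a minimal sketch, assuming we can list the active plugins in each environment; the plugin names other than the social sharing one are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, List, Set


def environment_differences(broken: Set[str], working: Set[str]) -> Dict[str, List[str]]:
    """List what is present in one environment but not in the other."""
    return {
        "only_in_broken": sorted(broken - working),
        "only_in_working": sorted(working - broken),
    }


# Hypothetical plugin lists for the live (broken) and staging (working) sites.
live_plugins = {"payments", "seo", "social-sharing"}
staging_plugins = {"payments", "seo"}

diff = environment_differences(live_plugins, staging_plugins)
print(diff["only_in_broken"])
```

Anything that shows up in only one environment is a suspect. Enabling it in the working environment, as we did on staging, confirms or clears it.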
Ok, good news. We know that the Social Sharing plugin is causing this issue. We can continue by repeating this process, focusing on the Social Sharing plugin:
- Frame the problem as ‘when enabled, the Social Sharing plugin is causing elements not to display in Safari in the checkout’
- Identify the pieces as CSS styles it’s injecting, JS it injects, HTML layout it manipulates
- Reduce the surface area by eliminating each piece one-by-one
We may then find that it’s some CSS, and therefore need to repeat the process within the CSS. As they say, rinse and repeat.
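The whole "rinse and repeat" loop can be sketched as drilling down through a tree of pieces. Everything here is an assumption for illustration: the `PIECES` breakdown is a made-up, non-exhaustive map, and `is_faulty` stands in for whatever real test isolates a piece.

```python
from typing import Callable, Dict, List

# Hypothetical breakdown of each piece into smaller pieces.
PIECES: Dict[str, List[str]] = {
    "website": ["web browser", "web server", "website code", "website state"],
    "website state": ["database", "social sharing plugin"],
    "social sharing plugin": ["injected CSS", "injected JS", "manipulated HTML"],
}


def drill_down(problem: str, is_faulty: Callable[[str], bool]) -> List[str]:
    """Repeat frame / identify / reduce until no smaller culprit is found."""
    path = [problem]
    current = problem
    while True:
        # Identify: list the broader pieces of the current problem area.
        candidates = PIECES.get(current, [])
        # Reduce: test one piece at a time and stop at the first culprit.
        culprit = next((p for p in candidates if is_faulty(p)), None)
        if culprit is None:
            return path
        # Frame: the culprit becomes the new, smaller problem.
        path.append(culprit)
        current = culprit


# Simulated test results for this sketch.
faulty = {"website state", "social sharing plugin", "injected CSS"}
trail = drill_down("website", lambda piece: piece in faulty)
print(" > ".join(trail))
```

Each pass through the loop shrinks the problem, and the returned trail records how we got from the broad symptom to the specific cause.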