Yer Trouble
yertrouble.bsky.social
I help IT people fix problems by getting better at troubleshooting.
When seeking help from senior colleagues or outside vendors, be prepared to share specifics. Gather the network addresses, phone numbers, etc. No one who takes escalations wants to hear "I've tried nothing and I'm all out of ideas."
September 26, 2025 at 6:14 PM
Have a monitoring system. Have something you can look back at to see the exact moment things went from good to bad. That can often lead you to the particular event that started the issue, which is very helpful in figuring out the fix.
September 19, 2025 at 6:09 PM
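A minimal sketch of the monitoring tip above, assuming your monitoring system can export time-stamped health samples. The `samples` data and the function name are made up for illustration; the idea is simply to scan the history for the first bad reading that directly follows a good one.

```python
from datetime import datetime

# Hypothetical monitoring samples: (timestamp, healthy?) pairs, oldest first.
samples = [
    (datetime(2025, 9, 19, 17, 58), True),
    (datetime(2025, 9, 19, 17, 59), True),
    (datetime(2025, 9, 19, 18, 0), False),
    (datetime(2025, 9, 19, 18, 1), False),
]

def first_failure_after_good(samples):
    """Return the timestamp of the first bad sample that directly follows a good one."""
    for (_, prev_ok), (ts, ok) in zip(samples, samples[1:]):
        if prev_ok and not ok:
            return ts
    return None  # no good-to-bad transition in this window

went_bad_at = first_failure_after_good(samples)
print(went_bad_at)
```

With that timestamp in hand, you can check change logs, tickets, and vendor notices for whatever happened around the same moment.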
It is becoming increasingly common in our cloud-connected world to be impacted by troubles that are not in our control to fix. We need to know how to quickly narrow down whether it is in fact a vendor trouble, which vendor, and how to get hold of that vendor. (2/2)
September 12, 2025 at 5:36 PM
Know the vendors in your environment. Again, preparation is key. Have a list of all the circuits and systems from outside vendors that are vital for your environment to work. Know what each would look like if it were bad. Test, and if it's bad, call in a ticket to that vendor. (1/2)
September 12, 2025 at 5:35 PM
Look for patterns in the error. Does it always happen at the same time of day? Is the trouble in a particular city, or part of the city? Or is it "everywhere"? Does the physical environment have anything to do with it? Does the trouble happen every time it rains for example?
August 30, 2025 at 9:49 PM
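One way to look for a time-of-day pattern, sketched under the assumption that you have pulled error timestamps out of a log. The `error_times` values here are invented for illustration, and the timestamp format is an assumption that will vary by system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical error timestamps copied from a log.
error_times = [
    "2025-08-27 02:03:11",
    "2025-08-28 02:01:47",
    "2025-08-28 14:22:05",
    "2025-08-29 02:04:30",
]

# Count how many errors fall in each hour of the day.
by_hour = Counter(
    datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour for t in error_times
)

busiest_hour, count = by_hour.most_common(1)[0]
print(f"Most errors occur in hour {busiest_hour:02d} ({count} of {len(error_times)})")
```

The same grouping works for city, site, or device name in place of the hour.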
Obtain logs of the error, packet captures, and other objective, time-stamped measurements and data.
August 24, 2025 at 6:15 PM
The scientific method usually is not one and done; often you need 2 or 3 experiments. If a certain 3 things are all true, that should lead you to a very specific conclusion. (2/2)
August 16, 2025 at 5:17 PM
Develop your "scientific method". Develop a hypothesis for what you think could be wrong. Think of experiments to prove or disprove this theory. If A is bad, we would see error B. Test it. If error B is happening, there is a good chance A is bad, and that can lead you to further things to look at. (1/2)
August 16, 2025 at 5:17 PM
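The hypothesis-and-experiment idea in the two posts above can be sketched as a small lookup. The hypotheses, test names, and results below are all invented for illustration: each hypothesis lists what it predicts you would observe, and you keep only the ones that match what you actually saw.

```python
# Hypothetical hypotheses, each mapped to the observations it predicts.
hypotheses = {
    "DNS server down": {"ping by IP works": True, "ping by name works": False},
    "default gateway down": {"ping by IP works": False, "ping by name works": False},
    "nothing wrong": {"ping by IP works": True, "ping by name works": True},
}

# Hypothetical results of the experiments you actually ran.
observed = {"ping by IP works": True, "ping by name works": False}

# Keep only the hypotheses whose every prediction matches an observation.
consistent = [
    name
    for name, predicted in hypotheses.items()
    if all(observed.get(test) == result for test, result in predicted.items())
]
print(consistent)
```

If more than one hypothesis survives, that tells you which further experiment would separate them.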
Know what "good" looks like for your particular system. Preparation and study are key. You can't fake this when a trouble is upon you. If you know what "good" looks like, you can test and measure each component against it and narrow down what could be bad.
August 5, 2025 at 11:35 PM
Divide and Conquer. Break the system up into functional or geographical sections, test each section, narrow down where the trouble is.
July 30, 2025 at 1:46 AM
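A minimal sketch of divide and conquer as a binary search, assuming the sections sit in order along a single path and that everything before the fault tests good while the fault and everything after it tests bad. The `path` names and the pretend test results are hypothetical; in practice `is_good` would be whatever real test you run against each section.

```python
def find_first_bad(segments, is_good):
    """Binary search for the first segment whose test fails.

    Assumes segments are ordered along the path, and that every segment
    before the fault tests good while the rest test bad.
    """
    lo, hi = 0, len(segments)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(segments[mid]):
            lo = mid + 1  # fault is further along the path
        else:
            hi = mid      # fault is here or earlier
    return segments[lo] if lo < len(segments) else None

# Hypothetical path from a user's desk to an application.
path = ["desk switch", "floor switch", "core router",
        "firewall", "WAN circuit", "app server"]
reachable = {"desk switch", "floor switch", "core router"}  # pretend test results

culprit = find_first_bad(path, lambda seg: seg in reachable)
print(culprit)
```

Testing the middle of the path first means six sections take about three tests instead of up to six.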
Is any part of the system good? Process of elimination, rule out the parts that are good, to narrow down what could be bad.
July 23, 2025 at 6:36 PM
Walk through the sequence of events that led up to the trouble. User was in so-and-so program, user ran such-and-such command.
July 15, 2025 at 5:44 PM
Down for everyone or just me? The troubleshooting you do is different, depending on if the trouble is affecting a whole floor, or a whole building, or a whole city, or just one user.
July 12, 2025 at 8:46 PM
Users lie. Users often don't know they are lying. They just might not fully understand what you're asking or why it's important. You need to develop skill in asking specific, layman-level questions and interpreting the answers to eliminate the grey areas.
July 9, 2025 at 3:28 AM
On the other hand, is it a situation where you CAN'T make things any worse? Such as if a particular device is down anyway, rebooting it might not make things any worse, but might fix it.
July 5, 2025 at 7:57 PM
Know the consequences of your actions. Don't make things any worse by guessing. Know the worst case scenario. Could a particular action make things worse? Do you (or your manager) accept that risk? Are there any human safety risks from an action you want to take?
July 5, 2025 at 7:57 PM
Have a good understanding or description of what the trouble actually is. That way you know what "fixed" looks like. Then you will know when to stop.
July 1, 2025 at 7:35 PM
Break the trouble down into small manageable parts.
June 28, 2025 at 2:47 AM
Reposted by Yer Trouble
An example of making sense of problems first before going straight to procedures:
June 24, 2025 at 3:00 PM
If you don't write it down, it never happened. Keep a notebook. You won't remember next week when your manager wants to know what you did on a particular trouble. Even better, ticket all the things. Could be a career-saving move if things really go sideways on you.
June 25, 2025 at 1:27 AM
90% of problems are operator error.
June 21, 2025 at 12:16 AM
Make sure the problem is still happening. If time has passed, check again before spending a lot of time and effort. Sometimes problems go away by themselves. Or somebody else found the problem and fixed it first. It makes sense to quickly check if the problem is still happening.
June 19, 2025 at 5:41 PM
Read the error message. Google it.
June 19, 2025 at 12:01 AM
Don't rely on others to tell you what was tried already. Test it yourself. If you have documented proof such as a ticket with logs and time stamps, that can be relied on slightly more. However, conditions can change over time and there is value in re-checking to see it with your own eyes.
June 17, 2025 at 10:45 PM
Turn it off and turn it back on again.
June 16, 2025 at 5:15 PM