Electrical Safety: The Troubleshooting Process and Staying Alive

Tim Conley, CBS Field Services

Features, Summer 2025

Far too many technicians, electricians, and other tradespeople are severely or fatally injured each year due to an electrical issue. From a technician’s standpoint, we are wired to jump on the problem, fix it, and get the plant back up and running, regardless of the necessary actions, because downtime results in lost revenue. We must examine ourselves and address the concept of successfully troubleshooting, transitioning from troubleshooting to repairing, retesting, and returning the equipment to service, all while prioritizing safety as our primary concern. 

As a U.S. Navy Nuclear Electrician’s Mate (since retired), I learned a seven-step troubleshooting process:

  1. Symptom recognition
  2. Symptom elaboration
  3. List probable faulty functions
  4. Localize the faulty function
  5. Localize the faulty component
  6. Failure analysis
  7. Retest and update the material history

Others look at troubleshooting differently:

  1. Identify the problem
  2. Establish a theory of probable causes
  3. Create a plan of action
  4. Implement the plan
  5. Verify full functionality
  6. Document findings and outcomes 

Did you notice that things are missing? The second method adds the concepts of creating a plan (Step 3) and implementing it (Step 4), which the Navy’s process lacks. However, even this method is missing something, and I challenge everyone to compare the two methods of troubleshooting, identify and fill in the holes, and then merge them into a more solid troubleshooting program. The Navy’s plan was to “find it and fix it.” However, as NETA Certified Technicians, we must look at how to analyze the problem safely and transition from troubleshooting to repair.

NFPA 70E

NFPA 70E Informative Annex Q, Human Performance and Workplace Electrical Safety, was long overdue. It was first published and incorporated into our safety programs in 2018. It reiterates the Hierarchy of Risk Control Methods and covers the following topics:

  • Principles of human performance
  • Information processing and attention
  • Human performance modes and associated errors
  • Error precursors
  • Human performance tools
  • Human performance warning flags
  • Workplace culture

Annex Q emphasizes the critical role of human factors in electrical safety, recognizing that even the most skilled workers are susceptible to errors. By integrating human performance principles into safety programs, NFPA 70E encourages a proactive approach to mitigating risk. 

Understanding how individuals process information, recognize hazards, and respond to dynamic work environments allows organizations to implement effective training, error-reduction strategies, and real-time corrective actions. Including human performance tools and warning flags helps workers anticipate and address potential issues before they escalate, reinforcing a culture of accountability and continuous improvement. As workplace culture plays a vital role in safety outcomes, this framework serves as a guide for fostering awareness, reducing errors, and ultimately enhancing electrical safety across all levels of an organization.

CASE STUDIES

Texas: Mine Arc Fault Fatality

In the fall of 2024, an electrician working in a Texas mine suffered severe burns in an arc flash event and succumbed to his injuries a few days later. The incident remains under investigation, as the Root Cause Analysis (RCA) has not yet been released. At this stage, any conclusions are based on the Mine Safety and Health Administration (MSHA) fatality alert and are purely speculative. The alert suggests that troubleshooting was conducted without proper PPE, appropriate tools, or the correct test equipment, highlighting potential safety lapses that may have contributed to the fatality. It also appears that access door interlocks had been overridden to replace line fuses across the 4,160 V AC contactor with the door open.

On the surface, this incident bears a striking resemblance to another fatality in North Carolina that was captured on video and is readily available on YouTube under the title “Weyerhaeuser OSB Mill Incident.” This documented event serves as a stark reminder of the devastating consequences of electrical hazards, emphasizing the critical importance of proper PPE, safe work practices, and adherence to established safety protocols.

At this point, any direct comparisons remain speculative until the RCA is officially released. However, it is important to acknowledge that the voltage involved, 4,160 V AC, exceeds the 600-volt threshold in OSHA 1910.269(l)(2)(i), which requires at least two qualified individuals to be present when working on high-voltage equipment. This regulation exists to provide an added layer of oversight and protection, reinforcing the necessity of strict procedural adherence, comprehensive risk assessments, and a strong safety culture to prevent similar incidents in the future.

MSHA’s Fatality Alert Best Practice statement (Figure 1) includes:

  • Use properly rated electrical meters and personal protective equipment such as electrically rated gloves, arc flash protection suits, insulated blankets or mats, and polycarbonate barriers.  
  • Establish safe procedures before beginning work and discuss them with all miners involved in the task.

Figure 1: MSHA Fatality Alert
PRINTED WITH PERMISSION FROM MINE SAFETY AND HEALTH ADMINISTRATION (MSHA)

Washington: Hydroelectric Dam Arc Flash

At a hydroelectric dam in Eastern Washington, six employees were seriously injured in an arc flash incident in the Fall of 2015. I visited the plant in 2016, unaware of the event that had occurred the year before. I learned that a generator had been undergoing repairs, but details on why the repairs were being performed were not shared. It wasn’t until a few years later that I received a copy of the RCA, and in 2018, I had the honor of interviewing two survivors who were injured.

In the last 30 minutes of a 10-hour shift, an electrical supervisor announced that a generator output circuit breaker was experiencing excessive hydraulic pump runtime alarms. The generator had been out of service for two months for unrelated repairs. Following the repairs, mechanical and electrical trades performed pre-startup procedures to restore the generator to service. Problems with generator circuit breaker hydraulic actuating systems after long outages are not entirely uncommon, but they are not anticipated either.

Together, the supervisor, a team of electricians, and an operator set out to investigate the abnormal condition and the hydraulic alarms. The plan (or lack thereof) was to perform a visual inspection. This relatively simple and seemingly safe “let’s go look at it” task led to an explosion in the generator output circuit breaker cubicle.

The incident sent six employees to Seattle’s Harborview Burn Center with injuries ranging from first- to third-degree burns. The worst case was an employee who survived but was burned on 72% of his body and required multiple resuscitation efforts at the hospital. Another worker was burned on 53% of his body; he watched his arc-rated clothing and spandex underwear burn away and has permanent lung and vocal cord damage. It was from these first-hand accounts that I truly understood the toll of an arc flash incident and the long road to recovery that survivors face.

How did this incident happen? The answer: “We all went because we didn’t want to get held over on overtime.” The bottom line is that the six people (four electricians, one operator, and the supervisor) transitioned from “let’s go take a look” (no testing, no touching) to troubleshooting with no plan and no safeties in place. As a result, one of the electricians depressed the manual override of the circuit breaker hydraulic system. Because system pressure was low, the generator output circuit breaker closed slowly, with 13.8 kV on the line side and an idle generator on the other side.

How did six people allow themselves to end up fighting for their lives in a burn center just minutes from the end of a successful shift? The supervisor came in at 4:10 PM, barely 20 minutes from the end of the work week, with the crew in various states of getting ready to go home. The workers had mentally disengaged from the responsibility of work for the day, and the work week was over in their minds. They were also aware that the time to catch their rides was approaching. Additionally, the plant has ten generators, and this generator had already been down for two months. The loss of its output capacity could have been made up by any of the numerous dams on the river. So, what was the sense of urgency?

Because of the training I received in the Navy, as well as my many years working in the industry, I look at things in a different light. I ask, “Am I or my coworkers at risk of death or injury because of actions or inactions that might occur in a true emergency?” How do you classify a true emergency? Yes, smoke or fire are examples of emergencies that warrant specific immediate actions based on your company’s policies and procedures. But in this case, a false sense of urgency arose because of the perception created by the supervisor coming into the office and directing employees to get fire-rated PPE and begin investigative actions. What should have been said was, “Monday morning, we’re going to investigate and troubleshoot Generator X’s output circuit breaker,” or “I’m looking for volunteers for overtime to troubleshoot Generator X’s output circuit breaker tomorrow.”  

In the end, the arc flash incident at the Eastern Washington hydroelectric dam was not the result of a single failure, but a cascade of human factors, flawed assumptions, and cultural oversights. A seemingly minor task at the end of a long work week, driven by fatigue, perceived urgency, and a desire to avoid overtime, escalated into a life-altering tragedy for six dedicated workers. This case is another stark reminder that troubleshooting — especially involving energized equipment — is not casual work. It demands planning, alertness, and discipline guided by established safety principles and procedures. The lack of an electrical work permit, the absence of a defined troubleshooting strategy, and a failure to recognize and respond to human performance warning flags all contributed to the catastrophe. The survivors’ stories are not only testaments to resilience but also a sobering call to recommit ourselves to a culture of safety, where no troubleshooting or investigative task is routine.

ELECTRICAL WORK PERMIT

NFPA 70E, Article 130.2(C), Exemptions to Work Permit, allows us to work without an energized electrical work permit (EEWP) if a qualified person is provided with and uses appropriate safe work practices and PPE in accordance with Chapter 1. This applies under any of the following conditions:

  • Testing
  • Troubleshooting
  • Voltage measuring  

By eliminating the requirement for an EEWP, we remove one of the backup safety measures a technician would have if energized work was deemed necessary. An EEWP identifies a plan and requires higher levels of management authorization; hence, more eyes. In allowing the exemption for troubleshooting, we become dependent solely on the technician’s skills and compliance with safety guidance.

NFPA 70E Article 130.8(A) Alertness (2) When Impaired, states: 

Employees shall not be permitted to work where electrical hazards exist while their alertness is recognizably impaired due to illness, fatigue, or other reasons.    

At the end of a scheduled work week and a ten-hour shift, I believe fatigue and being mentally checked out of work were factors in the poor decision-making process.  

Article 130.2(C) Exemptions to Work Permit states that electrical work shall be permitted without an energized electrical work permit if a qualified person is provided with and uses appropriate safe work practices and PPE in accordance with NFPA 70E Chapter 1 Safety-Related Work Practices — for example, when troubleshooting. Safe work practices are the key point. The moment it becomes a perceived crisis, the mind’s judgment regarding safe work practices can easily become clouded.  

TROUBLESHOOTING

Troubleshooting is defined as tracing and correcting faults in a mechanical or electronic system. Elsewhere, it is defined as a systematic approach to problem-solving that is often used to find and correct issues with complex machines, electronics, computers, and software systems. Note that each definition includes correcting the problem. Correcting a problem may be as simple as pressing a reset button after a system overload that is known to have caused the trip.

NFPA 70E does not define troubleshooting, yet it allows us to do it without mandating an EEWP. The procedural limitations set by the employer limit what is permitted to be performed during a troubleshooting evolution. The employer must establish clear and concise limitations on where live testing may be performed, when testing stops, and when lockout/tagout is implemented for repairs.

Troubleshooting requires a higher level of thought and concentration and, in many cases, a written plan to investigate, test, and troubleshoot. Time should be spent to develop a job safety plan, perform the risk assessment, and adopt the risk control methods outlined in Informative Annex F. The plan should include the symptoms, an assessment of the hazards and how to mitigate them, and safe points where the equipment is in a safe condition so technicians can evaluate results.

Think about Article 110.4(C)(1), Contact Release Training, where the plan includes emergency response and contact release. The days of grabbing your tool pouch and meter and running off to be the hero are over for very good reasons. Lost production and lost generation are not true emergencies requiring immediate and prompt action.

The standard seven-step troubleshooting process includes:

  1. Symptom recognition
  2. Symptom elaboration
  3. List the probable faults
  4. Localize the faulty function
  5. Localize the trouble to a faulty component
  6. Repairs > Retest 
  7. Failure Analysis > Documentation > Material History 

Where in this list of actions do you transition from troubleshooting to repair? During your thought process, you spent time figuring out the reason for the equipment failure, you visualized the repair, and you decided it’s a quick fix that will get us back online. This is a recipe for a crippling disaster.

Step 1, Step 2, and Step 3 are frequently accomplished before racing off to get tools and test equipment. Step 4 and Step 5 are the physical actions you intend to take to obtain data through physical testing. Testing is often considered touching, as the tools and test equipment used to investigate the issue will make physical contact with the system. In Step 4 and Step 5, you must go back to the definition of a qualified person, using appropriate precautions, knowledge, and techniques to safely obtain data to determine the fault.

Electric shock and arc flash risk assessments must be performed to determine whether you can safely perform the testing evolution. If you are doing an electric shock and arc flash risk assessment, take the next logical step and develop a procedure. Tools to enhance your safety before any hands-on activities include modern microprocessor relays and SCADA-type devices that allow a technician to access data collected and retained during an event.   
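
For those who build checklists or digital job plans, the transition points discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phase assigned to each step, the prerequisite names, and the placement of lockout/tagout before Step 6 are my reading of the argument in this article, not requirements quoted from NFPA 70E or from any employer’s procedure. The names Phase, Step, PREREQUISITES, missing_prerequisites, and may_proceed are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    ANALYSIS = "analysis"  # thinking and paperwork, no contact with equipment
    TESTING = "testing"    # meters and tools make contact with the system
    REPAIR = "repair"      # corrective work on the equipment


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    phase: Phase


# The seven steps as listed in this section; the phase labels are assumptions.
SEVEN_STEPS = (
    Step(1, "Symptom recognition", Phase.ANALYSIS),
    Step(2, "Symptom elaboration", Phase.ANALYSIS),
    Step(3, "List the probable faults", Phase.ANALYSIS),
    Step(4, "Localize the faulty function", Phase.TESTING),
    Step(5, "Localize the trouble to a faulty component", Phase.TESTING),
    Step(6, "Repairs > Retest", Phase.REPAIR),
    Step(7, "Failure Analysis > Documentation > Material History", Phase.ANALYSIS),
)

# Hypothetical prerequisites per phase; an employer's procedure would define
# the real list.
PREREQUISITES = {
    Phase.ANALYSIS: frozenset(),
    Phase.TESTING: frozenset(
        {"shock_risk_assessment", "arc_flash_risk_assessment", "written_plan"}
    ),
    Phase.REPAIR: frozenset({"lockout_tagout"}),
}


def missing_prerequisites(step, completed):
    """Return the prerequisite names still outstanding for this step."""
    return set(PREREQUISITES[step.phase]) - set(completed)


def may_proceed(step, completed):
    """True only when every prerequisite for the step's phase is complete."""
    return not missing_prerequisites(step, completed)


if __name__ == "__main__":
    done = {"written_plan"}  # example: a plan exists, nothing else is done
    for step in SEVEN_STEPS:
        gaps = sorted(missing_prerequisites(step, done))
        status = "OK" if not gaps else "STOP, missing: " + ", ".join(gaps)
        print(f"Step {step.number} ({step.phase.value}): {status}")
```

The point of the sketch is the hard stop between phases: moving from Step 3 to Step 4, or from Step 5 to Step 6, is a change in the kind of work being done, and each change should force a pause to confirm that the controls for the next phase are in place. The retest half of Step 6 may involve re-energizing the equipment, which this simple model does not capture.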

CONCLUSION: STAYING ALIVE MEANS THINKING AHEAD

The hard truths found in each of these incidents reveal the same common thread: a breakdown in planning, communication, and respect for the risk. Too often, “just troubleshooting” becomes a justification for bypassing essential safety measures, especially when urgency, fatigue, or production pressure creep in. 

But troubleshooting isn’t a casual step in the process. It’s a high-risk activity that demands discipline, deliberate thinking, and clearly defined limits. As NETA Certified Technicians, we must recognize that our role is not only to diagnose and repair but to do so in a way that sends everyone home safe. That means knowing when to stop, when to escalate, and when to say no. It means using tools like risk assessments, written procedures, and the seven-step process as more than just guidelines — they’re lifelines. 

Most importantly, we must foster a culture where no one feels forced to choose between safety and schedule. Staying alive in this field is about more than technical skill — it’s about leadership, communication, and the courage to treat every job, no matter how routine, as potentially life-altering. Because sometimes, the only thing separating “just a look” from a life-altering tragedy is a plan that never got written.

REFERENCES

  1. National Fire Protection Association (NFPA). NFPA 70E-2024®, Standard for Electrical Safety in the Workplace®. Available at https://link.nfpa.org/free-access/publications/70e/2024.
  2. Occupational Safety and Health Administration (OSHA). OSHA 1910.269. Accessed at www.osha.gov/laws-regs/regulations/standardnumber/1910/1910.269.
  3. Mine Safety and Health Administration (MSHA). Fatality Alert. Accessed at www.msha.gov. 

Editor’s Note: The Fatality Final Report and Root Cause Analysis for the Texas mine incident were released on April 7, 2025, too late for this publication. The report is available at https://www.msha.gov/data-reports/fatality-reports/2024/august-9-2024-fatality/final-report.

Tim Conley, CESCP, is a Senior Technical Advisor at CBS Field Services. He develops procedures for specialized testing, designs and installs protection system upgrades, and performs partial discharge testing and analysis and root cause failure analysis of electrical distribution systems, helping owners and insurance companies identify causes and courses of action. Conley is a NETA Level 4 Certified Senior Technician and is certified through NFPA as a Certified Electrical Safety Compliance Professional (CESCP).