An Approach to Integrated Problem Solving - Maretha Price Sasol Secunda Operations: Reliability Engineering
An Approach to Integrated Problem Solving Maretha Price Sasol Secunda Operations: Reliability Engineering Copyright ©, 2018, Sasol SMARTER approaches in Asset Management www.saama.org.za
Why Integrative Problem Solving?
• RCA is often mistaken for problem solving; it is a tool to use in problem solving!
• This leads to inadequate use of RCA in the larger context.
• Latent cause = root cause… all types of causes are important!
• Understand the entire picture and integrate findings to understand the entire failure mechanism.
• Knowing how to use the various methods and how to integrate them is key.
• Often a single method does not lead to a successful failure investigation.
• Referring to RCM, RCA can be a valuable pro-active tool:
  • Understand consequences
  • Understand degradation mechanisms
  • Conduct a proper failure modes & effects analysis (FMEA) and update asset strategies
Fault Finding vs. Root Cause Analysis
Fault Finding
• Fault finding is when you find the reason for a technical error.
• The aim is to find the component that failed or is causing the problem in equipment / systems.
• Physical cause: technical and specialized knowledge is needed.
• Normally you can identify one specific failure.
• Technically oriented: basic and specialized technical knowledge required.
Root Cause Analysis
• Root Cause Analysis looks at the entire package, including the underlying reasons why the incident / failure was not prevented.
• Normally more than one cause: latent, systems, etc.
• Knowledge of the law, procedures, business, policies, people, maintenance, etc. required.
When to do a Root Cause Analysis
• Injuries/Safety
• Process upsets (trips, blockages, bottlenecks, etc.)
• Equipment failures (Process/Non-process Equipment)
• Process Safety Incident (Fires, Product releases or Explosions)
• Environmental Incidents
• Health related Incidents
• Security Incidents
• Quality related incidents
To contribute to Plant Stability & Reliability
Typical Steps in an Investigation
1 Problem Statement.
2 History and Basic Conditions.
3 Data Gathering and Multi-Disciplinary Interpretation.
4 Sequence of Events.
5 Generate all Possible Causes.
6 The Formal RCA Evaluation against a Defined Method.
7 Classify the Causes.
8 Generate and Document Solutions and Recommendations.
The Importance of a Good Problem Statement
• List Specific Date and Time
• List Conditions when incident occurred.
• Clarify Location of the incident.
• Capture Initial observations
What? When? How?
On Friday Evening (31 July 2016) around 20:14, high bypass pond levels were recorded. The booster pumps were commissioned and the bypass pond level was returned to normal.
People like to be informed: always show them where they are in the process and what to expect.
Understand the History and Impact
• Explain the function of the process and the function of the main equipment.
• Clarify specific products, temperatures and pressures.
• List critical control measures.
• List conditions that will lead to failures.
• List Equipment / System criticality.
This criticality is based on …
Data Gathering and Multi-Disciplinary Interpretation
• Capture Equipment History: Installation, Modifications, Strategies, etc.
• Review previous incidents and findings, and ensure that their next steps were executed.
• Capture all abnormalities, statistics and observations per discipline.
• Discuss in a team to understand how the various findings are interlinked to cause the incident.
Sometimes asking “WHY?” can go a long way …
Don’t underestimate the value of a photo!
The All Important Sequence of Events
• A sequence of events gives a chronological account of events.
• Often contains the “hidden traces” of the solution to a problem.
• It is important to be factual and only list items that can be proven as fact.
• Start the sequence of events from the closest similar occurrence to the investigation in question.
• If the incident is truly first in its nature, the equipment history and modifications may form the sequence of events.
• Always play it back to the investigation team in as simple terms as possible and let it follow a storyline flow.

DATE: 12 December 2017 (Tuesday)
TIME     EVENT
+ 12:22  A trip was logged due to a loss of the uninterrupted power supply. Instrumentation department was called, as the operator reported that all HMI graphics were lost and that plant status could not be assured. Acid dosing control on another unit was also lost.
+ 12:30  Instrumentation department reviewed and checked all alarm logs and received indication that there was an apparent loss of power (mains) on the specified units. The cabinets and power supplies were physically opened. The power from the mains was confirmed to be in an off position. All other controllers and modules (inverter/rectifier) were also in the off position. All distribution boards were also confirmed to be off.
+ 12:50  Electrical department was called out and found that the substation was in an alarm state. It further indicated a rectifier / inverter offline as a result of power lost from the uninterrupted power supply. The rectifier / inverter went offline as per the design of this circuit and stopped working when the UPS battery ran flat. The functionality of the rectifier / inverter was validated and confirmed to be in a working condition.
+ 13:00  Electrical department proceeded to check the substation and confirmed that the power was intact and that there was no loss of power.
+ 13:10  All systems and components were reset and production proceeded to start up the plant successfully.
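The rules for a sequence of events (chronological order, only provable facts) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the `Event` class, its `evidence` field, the evidence labels, and the unproven "loose cable" entry are all hypothetical; the remaining descriptions paraphrase the example timeline.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    description: str
    evidence: str  # hypothetical field: where the fact can be verified


def build_sequence(events):
    """Keep only events backed by evidence and return them in time order."""
    proven = [e for e in events if e.evidence.strip()]
    return sorted(proven, key=lambda e: e.timestamp)


# Entries are captured in the order people reported them, not in time order.
events = [
    Event(datetime(2017, 12, 12, 12, 50), "Substation found in alarm state", "alarm panel"),
    Event(datetime(2017, 12, 12, 12, 22), "Trip logged; HMI graphics lost", "trip log"),
    # Hypothetical opinion with no proof: it must not enter the sequence.
    Event(datetime(2017, 12, 12, 12, 40), "Someone suspects a loose cable", ""),
    Event(datetime(2017, 12, 12, 13, 10), "Plant restarted successfully", "production log"),
]

sequence = build_sequence(events)
for e in sequence:
    print(f"{e.timestamp:%H:%M}  {e.description}")
```

Filtering before sorting keeps opinions out of the timeline while leaving them in the raw list, so they can still be raised later as potential causes.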
Sometimes it is as Easy as 1, 2, 3 …
Generate ALL Causes, including the Mechanism
• List all potential causes, even causes that are improbable.
• Evaluate improbable causes for immediate elimination and capture the reasons for elimination.
• Ensure that the remaining potential causes align with the sequence of events.
… and ALWAYS play it back…

Dip in Plant Air Supply
A dip in plant air supply could have resulted in solenoid activation, causing the firewater system to be activated, without sending an alarm to the control panel.
Eliminated as a Root Cause: Apart from the fact that the solenoid was found to be in working order after both incidents, no fluctuations in plant air supply were observed at the time of either incident.

Switch Manually Pressed in Error
If the switch in the control room was activated, it would have activated the firewater system without sending an alarm to the panel, as this switch is not linked to a panel alarm. This is, however, highly unlikely, as this switch is of a “sunken in” type and has to be pressed with deliberate action. It is possible that this switch could have been pressed in error, due to not understanding its purpose, but based on the probability of failures, the most probable cause is that this switch was faulty.
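One way to make sure that no cause is dropped without a recorded reason is to keep the causes in a small register. The sketch below is illustrative only: the `Cause` class and its `eliminate` method are hypothetical names, and the three causes are taken from the firewater example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cause:
    name: str
    eliminated: bool = False
    reason: Optional[str] = None

    def eliminate(self, reason: str) -> None:
        """Mark the cause as eliminated; a reason is mandatory."""
        if not reason.strip():
            raise ValueError("an elimination must record its reason")
        self.eliminated = True
        self.reason = reason


causes = [
    Cause("Dip in plant air supply"),
    Cause("Switch manually pressed in error"),
    Cause("Faulty switch"),
]

causes[0].eliminate(
    "Solenoid found in working order; no plant air fluctuations observed"
)

# Causes that still have to be checked against the sequence of events.
remaining = [c.name for c in causes if not c.eliminated]
print(remaining)
```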
Which Method is Which …?
Barrier Analysis
● Identify barriers to prevent incidents
● Barriers are evaluated to see where barriers failed or worked less effectively
● Reasons for barrier failures and preventative actions are listed and implemented
● It often does not identify missing barriers
Causal Factor Tree
● Align the sequence of events with the conditions that caused them
● Indicate events / conditions with evidence in a solid line to represent the incident
● Evaluate the incident by evaluating changes in events / conditions
● Identify the root cause, as well as causal factors
Fault Tree Analysis
● Identify faults that could lead to the incident, as well as a cause for every fault
● Group logically related items using “AND” or “OR” between faults and causes
● Continue identifying causes for each fault until you reach a root cause
● List countermeasures for each root cause
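The “AND” / “OR” grouping used in Fault Tree Analysis can be illustrated with a small recursive evaluation. This is a sketch under stated assumptions: the `Node` class, the `occurs` function, and the example tree (loosely inspired by the power-loss timeline earlier in the deck) are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    gate: str = "BASIC"  # "AND", "OR", or "BASIC" for a leaf cause
    children: List["Node"] = field(default_factory=list)


def occurs(node: Node, confirmed: Set[str]) -> bool:
    """Return True if the fault occurs, given the confirmed basic causes."""
    if node.gate == "BASIC":
        return node.name in confirmed
    results = [occurs(child, confirmed) for child in node.children]
    if node.gate == "AND":
        return all(results)
    if node.gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {node.gate}")


# Hypothetical tree: control power is lost only if the mains supply is off
# AND the UPS battery is flat; the plant trips on that OR on a manual trip.
control_power_lost = Node(
    "Control power lost", "AND",
    [Node("Mains supply off"), Node("UPS battery flat")],
)
plant_trip = Node("Plant trip", "OR", [control_power_lost, Node("Manual trip")])

print(occurs(plant_trip, {"Mains supply off"}))
print(occurs(plant_trip, {"Mains supply off", "UPS battery flat"}))
```

Each leaf that makes the top event true is a candidate for a countermeasure, which matches the last step of the method.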
Which Method is Which … (continued)?
Fish-bone Diagram
● Group all causes into categories (include people, process and procedures)
● Group causes into sub-causes and evaluate the causes against proof
● Can be used effectively with any other technique
Pareto Analysis
● 20 % of the most important failures cause 80 % of the incidents
● List all potential causes and rank the causes according to highest probability
● The top 20 % of the causes will be the most likely root causes
● Used effectively when multiple causes are to be evaluated
Failure Mode & Effect (FMEA)
● Identify all the various failure modes as well as frequency and preventative actions
● Evaluate if all preventative actions were implemented and followed
(Insert Strategy Here …)
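The Pareto ranking can be sketched as a cumulative-percentage table. Two assumptions are made here that the slide does not state: causes are ranked by how often they were recorded (as a stand-in for probability), and the "vital few" are the causes needed to reach a cumulative 80 % of incidents. The function names and the incident log are hypothetical.

```python
from collections import Counter


def pareto_table(cause_log):
    """Rank causes by frequency and add the cumulative share in percent."""
    counts = Counter(cause_log).most_common()
    total = sum(n for _, n in counts)
    table, running = [], 0
    for cause, n in counts:
        running += n
        table.append((cause, n, round(100 * running / total, 1)))
    return table


def vital_few(table, threshold=80.0):
    """Return the top-ranked causes needed to reach the cumulative threshold."""
    selected = []
    for cause, _, cumulative in table:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical log: one recorded cause per incident (20 incidents in total).
log = (["seal"] * 8 + ["bearing"] * 6 + ["alignment"] * 3
       + ["lubrication"] * 2 + ["operator"] * 1)

table = pareto_table(log)
for row in table:
    print(row)
print(vital_few(table))
```

With real data the split is rarely exactly 80/20; the table shows where the cumulative curve flattens, which is usually the more useful signal.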
Which Method is Which … (continued)?
Kepner-Tregoe
● Appraise the situation
● Compare the situation against a workable scenario
● List the differences and changes to derive a root cause
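The comparison step, where the failing situation is set against a workable scenario and the differences are listed, can be sketched as a dictionary comparison. This is a deliberate simplification of the Kepner-Tregoe method, not a full implementation; the `differences` function and all attribute names and values are hypothetical.

```python
def differences(working, failing):
    """Return {attribute: (working value, failing value)} where they differ."""
    keys = sorted(set(working) | set(failing))
    return {
        key: (working.get(key), failing.get(key))
        for key in keys
        if working.get(key) != failing.get(key)
    }


# Hypothetical observations of a workable scenario and the failing situation.
working = {"operating_mode": "auto", "strainer": "clean", "shift": "day"}
failing = {"operating_mode": "manual", "strainer": "blocked", "shift": "day"}

changes = differences(working, failing)
for attribute, (was, now) in changes.items():
    print(f"{attribute}: {was} -> {now}")
```

Attributes that are identical in both scenarios drop out, leaving only the changes that still need to be explained.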
If all else Fails …
… evaluate on the balance of probabilities …
Understanding the Types of Causes
Root Cause
● Origin of an incident.
● Most basic cause which can be fixed and / or improved to prevent recurrences.
Direct Cause
● Primary cause leading to an incident.
● Incident would not have occurred without this cause.
Contributing Cause
● Not a self-sufficient cause.
● Contributes to the severity / frequency of an incident.
True Cause
● True reason for an incident.
● Can be physical or latent in nature.
Physical Roots
● Causes with tangible roots and visible after the incidents.
● In other words, causes that can be seen physically.
Human Roots
● Refers to human intervention actions.
● For instance, tasks that were executed / not executed.
Latent Cause
● Underlying reasons that explain physical & human causes.
Remember when Closing Out an Investigation
• Verify your findings: follow the trail of cause and effect to verify that the root cause is correct and that all causes and effects are listed to prevent the failure from recurring.
• Group all causes into the various disciplines.
• List corrective actions for every cause to minimise severity or to prevent recurrence.
• Develop an action plan with responsible people and due dates to implement findings.
• Implement all actions and solutions.
• Review performance and track effectiveness of solutions.
• Remember to capture the lessons learnt from the investigation.
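The close-out checks (a corrective action for every cause, a responsible person and a due date for every action, and tracking of what is still open) can be sketched as follows. The `Action` class, the helper functions, the owners and the dates are hypothetical; the causes are paraphrased from the firewater example earlier in the deck.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    cause: str
    description: str
    owner: str
    due: date
    done: bool = False


def uncovered(causes, actions):
    """Causes that do not yet have any corrective action."""
    covered = {a.cause for a in actions}
    return [c for c in causes if c not in covered]


def overdue(actions, today):
    """Open actions whose due date has passed."""
    return [a for a in actions if not a.done and a.due < today]


causes = [
    "Faulty switch",
    "Switch not linked to a panel alarm",
    "Purpose of switch not understood",
]
actions = [
    Action("Faulty switch", "Replace the switch",
           "Instrumentation", date(2018, 3, 1), done=True),
    Action("Switch not linked to a panel alarm", "Link the switch to a panel alarm",
           "Instrumentation", date(2018, 3, 15)),
]

print(uncovered(causes, actions))
print([a.description for a in overdue(actions, date(2018, 4, 1))])
```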
A Few Practical Learnings
• Capture information relating to the incident as soon as possible.
• Involve all relevant parties from the beginning of the investigation to ensure alignment.
• Identify an incident owner who will ensure that all next steps are executed.
• Ensure that the RCA is conducted within a multi-disciplinary team.
• Ensure that the Sequence of Events is accurate and captures all incident conditions.
• Ensure that the team is aligned with the RCA methodology to be followed.
• Ensure that dates and responsible people are assigned to next steps.
• Always intend to solve the problem and not to blame people.
• Ensure that everyone is allowed to voice their opinion.
• Evaluate all possible listed causes.
• Substantiate causes with facts.
In the Words of Jack Sparrow
Thank you
Maretha Price
Sasol Secunda Operations: Reliability Engineering