Reliability Engineering in Product Design
This article, featured in Engineering Designer by Wilde’s Mike McCarthy, Principal Engineer, looks at how reliability engineering is used in product design.
Most people have an innate sense of what ‘reliability’ is. However, if you ask for a formal definition you receive answers such as “it will work all the time” or “it does what it says on the tin…”
In the world of reliability engineering we have a more precise definition:
‘Reliability is the ability of an item to perform a required function under stated conditions for a stated period of time.’ Source: BS 4778 “Glossary of terms used in quality assurance (including reliability and maintainability terms)”.
A more practical definition is:
‘Reliability is the probability R(t) that an item can perform a required function without failure under given conditions for a given time interval (to a specified confidence level)’
Reliability uses probability theory because we need to make risk-based decisions using imperfect or incomplete information.
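The practical definition can be written compactly in symbols. The notation here is an assumption introduced for illustration and is not taken from the standard quoted above: T is the (random) age at which the item fails, and F(t) is the probability that it has failed by age t.

```latex
R(t) = P(T > t) = 1 - F(t)
```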
What does ‘reliability’ mean for engineers / practitioners?
- Predicting product/process failures, understanding why the failures occurred.
- Improving the product/process in an objective way
- Creating optimised test plans
- Planning/scheduling maintenance activities and predicting spare part requirements.
- Making risk based decisions with sparse data
What does ‘reliability’ mean for managers?
- Lower manufacturing costs (overtime, rework, downtime, audits and inspection)
- Lower ‘through life’ costs (warranty costs, product recalls, customer dissatisfaction)
- Ensuring product or process capability
- Providing evidence for dependability claims
- Insight into contractual compliance
- Managing supply chain performance
- Making risk based decisions with sparse data
The same tools and methods address both groups’ issues and concerns.
‘Reliability’ and ‘Quality’ are sometimes confused as being the same thing. They are intimately linked but are actually distinct concepts. ‘Quality’ is the extent to which the product is free from defects and how well it conforms to requirements at one point in time – usually at the point of manufacture. Quality is compliance to specification.
Reliability is the extent to which a product continues to conform to requirements over its operational cycle. In essence, ‘Reliability equals Quality over time under stated conditions.’
As product designers and manufacturers, we are always asked to make predictions about how our products will perform in their operating environments and under their associated usage patterns. This is typically driven by the need to show that the product meets customer requirements prior to release, or for spare part provisioning through life or for warranty management purposes. Usually we must accomplish this with limited time, resources and budgets.
We are also constantly asked to make trade-offs between engineering and commercial constraints. This is a difficult problem that is illustrated in Figure 1 below – the chart shows that there is a notional ‘optimum’ level of reliability that will minimise the life-cycle cost of our products.
There is a trade-off between the cost of producing the products (ie design, test, manufacture) and the cost of maintaining them in the field. Of course we must also ensure that this ‘optimised reliability’ level is acceptable to our customers.
Thankfully there are a variety of tools and methods that we can use to uncover, characterise, quantify, validate and predict issues with products throughout the product development cycle.
The list below shows some of the most important reliability topics in relation to product development:
- FRACAS (Failure Reporting Analysis & Corrective Action System)
- RCA (Root Cause Analysis)
- FMECA (Failure Mode Effects & Criticality Analysis)
- FTA (Fault Tree Analysis)
- DoE (Design of Experiments)
- Life Data Analysis (‘Weibull Analysis’)
- Accelerated Testing
- System Reliability Modelling (Reliability Block Diagrams)
- Reliability Growth Analysis
These methods are not new. FMECA in aerospace dates back to the late 1940s, Weibull analysis was first described in a 1951 paper by Waloddi Weibull, and modern DoE dates back to 1935 with the work of Sir Ronald Fisher. However, in my experience they are not deployed across industry to their maximum business effectiveness. This brief article addresses a few of these important practical tools in relation to product design.
FRACAS
All high reliability companies have a formal FRACAS system. This could be implemented in a bespoke relational database or in a series of spreadsheets. A FRACAS system is simply a closed-loop recording and control system for capturing, collating and analysing failure data in order to prioritise and manage corrective actions. It is the starting point for many reliability engineering processes such as FMECA, DoE and accelerated testing.
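As a rough illustration of what ‘closed-loop’ means in practice, the sketch below models a single failure report and treats it as closed only once a root cause, a corrective action and a verification have all been recorded. The class, its field names and the example records are hypothetical; a real FRACAS would hold far more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureReport:
    """One FRACAS entry. All field names are hypothetical."""
    report_id: int
    item: str
    failure_mode: str
    root_cause: Optional[str] = None         # filled in by the analysis
    corrective_action: Optional[str] = None  # filled in once an action is agreed
    verified: bool = False                   # True once the fix is shown to work

    @property
    def is_closed(self) -> bool:
        # The loop is closed only when cause, action and verification all exist.
        return bool(self.root_cause and self.corrective_action and self.verified)


def open_reports(reports):
    """Return the reports whose corrective-action loop is still open."""
    return [r for r in reports if not r.is_closed]


# Illustrative records only, not real failure data.
reports = [
    FailureReport(1, "pump", "seal leak", "wrong elastomer", "change seal material", True),
    FailureReport(2, "pump", "seal leak"),
    FailureReport(3, "controller", "connector fretting", "vibration"),
]
print([r.report_id for r in open_reports(reports)])
```

Keeping the ‘closed’ test as a derived property rather than a manually set flag means a report cannot be marked complete while any part of the loop is missing.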
FMECA
FMECA (Failure Mode Effects & Criticality Analysis) is a process used in design, process/manufacturing and in field service applications. It is a structured ‘what-if’ approach to uncovering and assessing the critical items, components and operating conditions affecting product performance. FMECA is used to evaluate the effects and sequences of events caused by a specific failure mode, and it is a powerful way to identify weak spots in a design.
A common method of classifying failure modes uses a ‘Risk Priority Number’ (RPN). This is the product of three assessment criteria values: frequency of occurrence, severity of effect and detectability. FMECAs are usually employed to demonstrate the performance levels likely to be met and to estimate the significance and probability of failures. They are used to justify the level of availability/safety to users and regulators as well as to manage life cycle issues. They can involve a significant amount of resource and are only truly effective when undertaken by an experienced multi-functional team.
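The RPN calculation itself is simple arithmetic, as the sketch below shows. The 1 to 10 scoring scale is an assumption (it is common practice, but scales vary between organisations), and the failure modes and scores are invented purely for illustration.

```python
def risk_priority_number(occurrence, severity, detection):
    """RPN = occurrence x severity x detection.

    Each score is assumed to be on a 1 (best) to 10 (worst) scale.
    """
    for score in (occurrence, severity, detection):
        if not 1 <= score <= 10:
            raise ValueError("scores are assumed to lie on a 1-10 scale")
    return occurrence * severity * detection


# Hypothetical failure modes: (occurrence, severity, detection)
failure_modes = {
    "seal leak": (6, 7, 4),
    "connector fretting": (3, 8, 5),
    "label fades": (8, 2, 2),
}

# Rank the failure modes so the highest RPN is reviewed first.
ranked = sorted(
    failure_modes,
    key=lambda mode: risk_priority_number(*failure_modes[mode]),
    reverse=True,
)
for mode in ranked:
    print(mode, risk_priority_number(*failure_modes[mode]))
```

Because the RPN is a product of ordinal scores, it is best read as a ranking aid for the team rather than as a measured quantity.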
Root Cause Analysis (RCA)
A Root Cause is the underlying reason for a failure; anything else is just a symptom.
RCA is simply the systematic search for the underlying reasons for failure. The tools of RCA are equally simple – but can be effectively applied to complex problems. Typical RCA tools are brainstorming, checksheets, Pareto analysis, process maps, cause & effect diagrams, 5-why…
RCA is concerned with answering the question: “What are the factors that directly resulted in the nature, the magnitude, and the timing of the issue you are dealing with?”
It is important to realise that every product failure has:
a) set-up factors that established the vulnerability,
b) triggering factor(s) that enabled the vulnerability,
c) exacerbating factors that made the effect as bad as it was,
d) mitigating factor(s) that kept the effect from being worse.
It should be noted that in real life there is never one single root cause, and that each root cause can have a ‘physical’, a ‘systemic’ and a ‘human’ component. RCA is an essential component of an effective FRACAS system.
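Of the RCA tools listed above, Pareto analysis is the easiest to show in code. The sketch below tallies recorded causes and reports the cumulative percentage, which is the information a Pareto chart displays. The function name and the cause data are hypothetical.

```python
from collections import Counter


def pareto(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    counts = Counter(causes).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for cause, n in counts:
        running += n
        rows.append((cause, n, round(100.0 * running / total, 1)))
    return rows


# Illustrative cause records only, not real data.
observed = ["seal"] * 5 + ["connector"] * 3 + ["firmware"] + ["unknown"]
for row in pareto(observed):
    print(row)
```

Sorting by count and accumulating the percentage makes it easy to see which few causes account for most of the recorded failures.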
Life Data Analysis
This is sometimes also known as ‘Weibull Analysis’; however, it is a general analysis methodology that models the failure rate of components (or failure modes) with age. This ‘age’ can be represented by time, cycles to failure or miles/kilometres travelled… Life Data Analysis is an advanced technique that is capable of reasonable accuracy with very little data.
It is particularly useful in small sample size testing and is employed across many industrial sectors. Further information can be obtained via the international standard IEC 61649 “Weibull Analysis” or from the www.weibull.com website.
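To give a feel for the calculation, the sketch below fits a two-parameter Weibull distribution to a handful of failure ages using median rank regression with Benard’s approximation. This is one common fitting method, chosen here for illustration; it is not necessarily the one a given software package uses. The sketch assumes complete data (every unit failed, no suspensions), and the failure ages are invented.

```python
import math


def weibull_mrr(failure_times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Assumes complete data: every unit failed and all ages are positive.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0 or times[0] == times[-1]:
        raise ValueError("need at least two distinct positive failure ages")
    xs = [math.log(t) for t in times]
    # Benard's approximation to the median rank of the i-th ordered failure.
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # Linearised Weibull CDF: ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta)
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope of the fitted line
    eta = math.exp(x_bar - y_bar / beta)   # recovered from the intercept
    return beta, eta


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta) for a two-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


# Hypothetical hours to failure for five test units.
hours_to_failure = [105, 260, 410, 620, 980]
beta, eta = weibull_mrr(hours_to_failure)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, "
      f"R(200 h) = {weibull_reliability(200, beta, eta):.2f}")
```

A shape parameter below 1 suggests a failure rate that falls with age, a value near 1 a roughly constant rate, and a value above 1 wear-out behaviour.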
Accelerated Life Testing
Accelerated life testing is an advanced method designed to provide reliability information on a product, component or system using fewer samples and with shorter test duration than conventional test methods by using the techniques of Life Data Analysis (Weibull Analysis).
Data is obtained from tests that use higher stress levels or higher usage rates compared to normal operating conditions. Typical stresses for electronic products are temperature (static & cycling), humidity, voltage, etc. For mechanical components, typical stresses are temperature (static & cycling), pressure, vibration, applied load, etc. An accelerated life model using this test data is then created and used to extrapolate back to normal use conditions.
Accelerated life models usually consist of a life distribution (ie a description of the probability of failure as a function of age at that particular stress level) and a ‘Life-Stress’ relationship (ie how the product life is affected by stress). This ‘Life-Stress’ relationship usually comes from the ‘physics of failure’; however, there are many models described in the reliability literature (see www.weibull.com for example).
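As one example of a life-stress relationship, the sketch below uses the Arrhenius model, which is widely applied when temperature is the accelerating stress. The choice of model is an assumption made for illustration, as are the activation energy and temperatures; in practice the activation energy must come from test data or from the physics of the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Ratio of life at the use temperature to life at the stress temperature."""
    use_k = use_temp_c + 273.15        # convert Celsius to kelvin
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


# Hypothetical values: 0.7 eV activation energy, 40 C in use, 85 C on test.
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
test_life_hours = 1000.0               # illustrative life observed at 85 C
print(f"acceleration factor = {af:.1f}")
print(f"extrapolated life at 40 C = {test_life_hours * af:.0f} h")
```

The extrapolation is only as good as the assumption that the same failure mechanism operates at both temperatures.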
System Reliability using Reliability Block Diagrams (RBD)
A system is made up of components and subsystems. As system reliability testing may be impractical, it may be easier and less expensive to test components/subsystems than to test complete systems – particularly if the system is a nuclear reactor. If life testing is performed by the vendors of the individual components, then this information can be readily incorporated into a system level model of the product.
A reliability block diagram is a graphical representation of how the components and subsystems of a system are “reliability-wise” connected. Blocks represent the components of the system and are connected via lines, creating something that looks like a piping schematic. The structure of these connections affects the reliability of the system: it represents the ‘reliability architecture’ of the system. Each block can have both failure and repair characteristics.
Once an RBD has been designed and appropriate failure and repair information has been entered, it is possible to predict mean times to failure, availability and maintainability metrics, and warranty periods at a specified risk level, as well as to undertake scenario modelling related to duty cycle and variable operating stresses. An important application of RBDs in design is the ability to do ‘design optimisation studies’. For example, using RBDs we can simulate the reliability and effect on product cost of different architectures (including the use of parallel redundancy, for instance). We can also experiment with using components with different quality attributes in order to estimate system reliability goals in a variety of operating conditions very early in the product design cycle.
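The effect of architecture on system reliability can be illustrated with the two basic RBD building blocks, series and parallel. The sketch below assumes that block failures are independent, that each reliability figure applies to the same mission time, and that no repair takes place. The component values are invented for illustration.

```python
from math import prod


def series(*reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: 1 minus the chance that all of them fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical block reliabilities over one mission.
controller, pump, valve = 0.99, 0.90, 0.95

single_pump = series(controller, pump, valve)
redundant_pumps = series(controller, parallel(pump, pump), valve)

print(f"single pump:     {single_pump:.4f}")
print(f"redundant pumps: {redundant_pumps:.4f}")
```

Nesting the two functions mirrors the way blocks are nested in the diagram, so alternative architectures can be compared by changing one line.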
Reliability Engineering is a huge subject that covers the whole product life cycle (from early concept design, through detailed design and development, testing, manufacturing and eventually through to in-field support and end of life decommissioning).
Reliability Engineering is a mature engineering discipline that is supported by a powerful and varied set of tools that can be used across industry sectors to manage risk and costs.
Related Technology
Related Software
- ReliaSoft Weibull++
- ReliaSoft ALTA
- ReliaSoft DOE++
- ReliaSoft
- ReliaSoft RGA
- ReliaSoft BlockSim
- ReliaSoft RENO
- ReliaSoft Lambda Predict
- ReliaSoft Xfmea
- ReliaSoft FMEA Accelerator
- ReliaSoft RCM++
- ReliaSoft XFRACAS
- ReliaSoft MPC 3
- ReliaSoft Evaluation Products
Related Applications
- Failure Modes, Effects and Criticality Analysis (FMEA & FMECA)
- Life Data (Weibull) Analysis
- Accelerated Life Testing
- Experiment Design & Analysis
- Reliability Growth Analysis
- System Reliability & Maintainability Analysis
- Probabilistic Event & Risk Analysis
- Standards Based Reliability Prediction
- Reliability Centered Maintenance
- Failure Reporting & Corrective Action System (FRACAS)
- Educational Users




