Plant Maintenance Resource Center

Maintenance Task Selection - Part 3

Summarised by: Sandy Dunn
Webmaster, Plant Maintenance Resource Center

This is part three of a summary from the plantmaint Maintenance discussion forum which discusses alternative approaches to Maintenance task selection, including RCM, PM Optimisation (PMO), RCMCost, and others. It also touches on Total Productive Maintenance (TPM).

Go Back to Part 1 of this discussion

From: John Moubray

Kim

Much of what has been written, especially by Steve Turner and Sandy Dunn, in response to your initial questions about RCM invites further comment and clarification. So here goes.

1: RCM in commercial aviation

Steve Turner states that "RCM was originally developed as a design tool.... ". This comment is simply not true. To place it in perspective, however, it is necessary to review the evolution of RCM inside and outside the commercial aviation industry.

Before doing so, it is perhaps worth noting that Mr Turner bases his comments on his ten years of experience in RCM and on his copy of Nowlan & Heap's report. For the record, I also possess a copy of Nowlan & Heap's report. In addition, Stan Nowlan himself was my personal mentor in this field, from 1981 until his death in 1995. I am also currently serving on a working group commissioned by the US Air Transport Association to review MSG3, which is the name of the process used by the airlines to develop maintenance programs for commercial aircraft. The comments that follow reflect this experience.

RCM finds its roots in the early 1960's. The initial development work was done by the North American civil aviation industry. The airlines at that time began to realise that many of their maintenance philosophies were not only too expensive but also actively dangerous. This realisation prompted the industry to put together a series of "Maintenance Steering Groups" to re-examine everything they were doing to keep their aircraft airborne. These groups consisted of representatives of the aircraft manufacturers, the airlines and the FAA.

The first attempt at a rational, zero-based process for formulating maintenance strategies was promulgated by the Air Transport Association in Washington DC in 1968. The first attempt is now known as MSG 1 (from the first letters of Maintenance Steering Group). A refinement - now known as MSG 2 - was promulgated in 1970.

In the mid-1970's the US Department of Defence wanted to know more about the then state of the art in aviation maintenance thinking. They commissioned a report on the subject from the aviation industry. As mentioned by Ron Doucet, this report was written by Stanley Nowlan and Howard Heap of United Airlines. They gave it the title "Reliability Centered Maintenance". The report was published in 1978, and I agree with Steve Turner's comment that it is still one of the most important documents - if not the most important - in the history of physical asset management. It is available from the US Government National Technical Information Service, Springfield, Virginia.

Nowlan & Heap's report represented a considerable advance on MSG 2 thinking. It was used as a basis for MSG 3, which was promulgated in 1980. MSG 3 has since been revised twice. Revision 1 was issued in 1988 and revision 2 in 1993. It is used to this day to develop prior-to-service maintenance programs for new aircraft types (recently including the Boeing 777 and Airbus 330/340).

Copies of MSG 3 revision 2 are available from the Air Transport Association, Washington DC.

Several points to note from this history:

the term "reliability centered maintenance" is not used in commercial aviation. The term does not even appear in the MSG3 document. In fact, there are some fundamental differences between RCM (both as it is described in the Nowlan & Heap report and as it is described in SAE JA1011) and MSG3. Many of these differences find their roots in assumptions made about the training and skills of the maintenance technicians found in commercial aviation and in other industries. (In general, the former are much more highly trained and their training is focused - far more so than usual - on the needs of a specific industry. As a result, many of the tasks in maintenance programs developed using MSG3 are described and grouped in ways that have precisely defined meanings within commercial aviation but that are meaningless outside commercial aviation.)
the Nowlan & Heap report was commissioned by the US Department of Defence. Consequently, as the authors well knew at the time, it was written specifically for use by people outside the commercial aviation industry.

2: Maintenance strategy formulation in the US nuclear power industry.

One justification put forward by Steve Turner (and others) for using retroactive or reverse RCM approaches like PMO 2000 is that they have been used in the US nuclear power industry. I personally have not applied RCM in a US nuclear power station (although our network has assisted nuclear facilities with the application of RCM II in other countries). Consequently, I cannot comment personally on the application of RCM in the US nuclear environment. However, I am able to draw your attention to comments made by Dr David Worledge, who perhaps knows more than anyone else about the application of RCM to US nuclear power stations. Dr Worledge worked for the Electric Power Research Institute (EPRI) from 1981 to 1995, and headed the initial pilot applications of RCM in US nuclear power stations from 1982 to 1985, and their subsequent development. (He now works as an independent consultant.)

On 24 and 25 August 1999, an RCM conference was held in Denver, Colorado. It was organised by Electric Utility Consultants Inc and was aimed exclusively at the electricity transmission and distribution sectors. The speakers were a whole variety of RCM consultants and end-users, myself included. I made a few comments about RCM in the US nuclear power industry during a presentation on the history of RCM. After this presentation, Dr Worledge stood up to make some additional comments from the floor.

The gist of his comments was as follows: The initial maintenance programs in US nuclear power plants were developed in conventional fashion, relying heavily on vendor recommendations. Continuing efforts to enhance safety and reliability, and ever increasing regulatory requirements resulted in utility management at some plants questioning whether the overall result was a significant degree of over-maintenance. By the early 1980's, the nuclear power industry often seemed to be faced with a choice of either generating power or doing the prescribed PM. They had to find a way of reducing the PM workloads quickly without prejudicing safety or reliability.

EPRI became aware of the Nowlan & Heap report entitled "Reliability-centered Maintenance", which was published in 1978. This seemed to offer a solution to their problem. However, after initial applications of "classical RCM" by EPRI, many plants developed their own methods for maintenance optimization, some of which departed from RCM principles. Dr Worledge stressed that, to bring some order to this situation, it became EPRI's objective to reduce PM workloads using standardized but "streamlined" approaches which took advantage of certain features of the design of nuclear power plants, but which kept close to the philosophy of classical RCM. They took the view that high levels of redundancy in their safety systems, high levels of regulator-imposed failure-finding tasks, and the fairly simple mission of the power generating systems at such plants could validly support certain simplifications of the methodology. They also took the view that, at least in older plants, the existing operating experience had encountered all reasonably likely failure modes, further supplemented in some cases by comprehensive risk assessments and very detailed record keeping carried out by the nuclear power industry itself. In addition, each plant already had a detailed system functional review performed in its Final Safety Analysis Report, as part of obtaining its operating license. Consequently, they felt that the function analysis and the FMEA steps embodied in the RCM process could be simplified.

A further notable aspect of their situation was that in most plants all the key protective devices and safety systems used in nuclear power stations were already covered by fairly comprehensive maintenance programs. As a result, there was often more interest in removing superfluous maintenance activities, which in some cases were actually damaging to reliability and availability of safety systems. However, there was still a drive to improve reliability in power generating systems because the industry was in need of increasing plant capacity factors.

The most abbreviated of the three streamlined approaches (recommended by EPRI in EPRI TR-105365, September 1995) modified the RCM process by setting up a list of simple functional questions, such as "does this component failure lead to a plant trip, or to a power reduction of >5%, the loss of a safety function, to a plant transient, or a personnel hazard, or a delay in start-up", etc, without further functional analysis. The two additional streamlined approaches in the EPRI report closely resembled classical RCM, with some liberties taken over documentation and the early separation of clearly less important components.
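The screening questions quoted above can be read as a simple checklist. The following is a minimal sketch in Python, not EPRI's actual procedure: the class, the field names and the `is_significant` helper are all hypothetical, and only the questions quoted above are modelled (the "etc" in the quotation is not filled in).

```python
from dataclasses import dataclass


@dataclass
class FailureEffects:
    """Answers to the quoted screening questions for one component failure.

    Field names are illustrative assumptions, not terms from EPRI TR-105365.
    """
    causes_plant_trip: bool = False
    power_reduction_pct: float = 0.0  # percentage of output lost
    loses_safety_function: bool = False
    causes_plant_transient: bool = False
    personnel_hazard: bool = False
    delays_startup: bool = False


def is_significant(effects: FailureEffects, power_threshold_pct: float = 5.0) -> bool:
    """Return True if any screening question is answered 'yes'."""
    return any([
        effects.causes_plant_trip,
        effects.power_reduction_pct > power_threshold_pct,  # the ">5%" test
        effects.loses_safety_function,
        effects.causes_plant_transient,
        effects.personnel_hazard,
        effects.delays_startup,
    ])


# Hypothetical components, invented for illustration only.
feed_pump = FailureEffects(power_reduction_pct=8.0)
office_lamp = FailureEffects()

print(is_significant(feed_pump))    # True
print(is_significant(office_lamp))  # False
```

A component that fails none of the questions would be set aside without further functional analysis, which is where this approach saves effort compared with a full function-by-function review.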

A further approach, which some have described as "reverse RCM" and in which existing PM tasks are simply re-examined as to their utility and cost effectiveness, was sanctioned by EPRI only in urgent situations (under the name Outage Management Assessment) to try to reduce the pressing workload for an upcoming, already scheduled, refueling outage. Reverse RCM was never recommended by EPRI for general use, and did not form part of its recommended streamlined approaches.

Dr Worledge concluded his remarks by saying that in his opinion, these processes achieved their limited objectives in the nuclear power industry, in that they led to very substantial reductions in PM workloads without appearing to prejudice safety or reliability. However, he then went on to express the opinion that caution should be exercised when a process developed to solve a very specific set of problems in the unique environment of the US nuclear power industry is proposed for use in other industries - such as oil and gas, thermal power generation and electricity T&D - where the same initial conditions may not apply.

3: The SAE RCM standard

Since the Nowlan & Heap report was published, a great many processes have emerged that claim to be RCM. Many of them bear little or no resemblance to the process described by Nowlan & Heap. This became a cause of grave concern to many organisations. In particular, the US Naval Air Command (Navair), which was one of the sponsors of the original N&H report, found that some vendors were using all sorts of weird and wonderful processes which they described as "RCM" to develop maintenance programs for equipment that they were selling to Navair. (The history of RCM in the US military has been ably described by Dana Netherton, chairman of the SAE RCM committee, in articles that appeared in maintenance journals in Australia, the USA and the UK.)

These aberrant RCM processes led Navair to approach the SAE - as a recognised standards-setting institution with close ties both to the US Military and to the aerospace sector - for help with the development of a standard that could be used to define what is and what is not RCM. This standard (SAE JA1011) was published in August 1999 and can be obtained from the SAE at www.sae.org.

The standard is important because of a tendency for vendors of strategy formulation processes other than RCM to compare their processes with RCM, but without specifying which version of RCM. In particular, beware of comparisons to something called "Classical" RCM. Nowhere in the literature on this subject have I encountered a description of a process which is specifically labelled "Classical RCM", so it seems to be a convenient mirage. In some of the cases where the term has been used, it seems to refer to an horrendously complicated variant of the process which not only calls for the analysis to be carried out at far too low a level in the equipment hierarchy, but also requires users to prepare complex (and usually unnecessary) functional block diagrams before starting the analysis. Almost any analytical process is likely to be an order of magnitude quicker than this aberration.

All this means that when asked to compare any non-standard version of RCM with RCM, care needs to be taken to establish whether the comparison is being made with a version of RCM that complies with the SAE standard.

4: RCM in industries other than aviation and nuclear power

Sandy Dunn states that "I have recently come to the conclusion that, in contrast to the position that is put forward by John Moubray, Ron Doucet and others from the RCM II religion, the major problem is that RCM II (and by definition RCM) has NOT been sufficiently adapted to meet the needs of industry outside the airline industry."

Firstly, as discussed in section 1 above, RCM is not used by the airline industry. Secondly, as discussed in section 3 above, the SAE RCM Standard (not RCM II) defines what RCM is. (From now on, unless stated otherwise, when I use the term RCM, I will be referring to any process - of which RCM II is one - that complies fully with the SAE Standard.)

These two points apart, my view of the applicability of RCM is very different. Together with Aladon's network of licensees, I have been directly and indirectly involved with the application of RCM II on more than 1200 industrial sites spanning 42 countries. These applications have embodied the performance of several thousand RCM analyses.

It is true to say that the application of RCM has not been successful in every case. It can be said to have failed in about one third of the organisations where it has been tried, either because the organisations concerned did not derive the benefits that they hoped to from the RCM process or because the RCM initiative collapsed before it could yield much in the way of significant results. In our experience, none of the initiatives that failed did so for technical reasons. Without exception, the initiatives that failed did so for organisational reasons. Of these, the two most common reasons for failure are:

the principal internal sponsor of the initiative quit the organisation or moved to a different position before the new ways of thinking embodied in the RCM process could be institutionalised
the internal sponsor and/or the consultant who was acting as the change agent could not generate sufficient enthusiasm for the process for it to be applied in a way which would yield results.

Of course, if one third of these applications have failed, then two thirds have been successful. This success rate is at least as good as, if not better than, the success rate achieved by major organisational change initiatives in general.

At this point, it is worth noting Sandy Dunn's observation that his experience in Australia was that "for every ten organisations that started to implement RCM II, only one ever implemented the process on anything other than a "pilot project" scale". He also states that these failures were not "due to any failure on the part of the consultant concerned." Firstly, my records indicate that his personal experience of RCM II is indeed true. Only about one tenth of the RCM II projects with which he personally was associated went beyond the pilot stage. However, a great many other RCM practitioners have been active in Australia for the past ten years, and their collective experience is that about two thirds of the applications of RCM to date have progressed well beyond the pilot stage (not 1 in 10). This sharp contrast bears out my own experience gained from working with some 200 licensed RCM II practitioners worldwide over a period of 15 years - of whom about 120 are currently active: there is in fact a high correlation between the success rate of RCM II applications and the change management capabilities of the consultants involved. (Among others, the British Royal Navy, which is a major user of SAE-compliant RCM, has come to understand that the capabilities of individual consultants are every bit as important as the track record of their employers. So much so that the RN now insists on interviewing at great length every RCM consultant that is to be placed at their disposal, in addition to verifying the commercial bona fides of their employers.)

When discussing the "success" of RCM, we need to look at both the economic benefits and the question of risk.

5a: The economic benefits of RCM

Kim, you are absolutely correct to observe that "... a lot of maintenance decision makers I have met look mainly at the tangible returns (minimum cost, minimum project duration) rather than the projected expected returns of carrying out RCM." In fact, if RCM is correctly applied by properly trained people working under the direction of a skilled facilitator, and the project has been properly planned before it starts, it usually pays for itself in between two weeks and two months. (In some cases, the payback period has been measured in days and sometimes one or two years, but the norm is weeks to months.) This is a very rapid payback indeed.

In nearly every case, these economic benefits flow from improved plant performance rather than reductions in the direct cost of maintenance (although very substantial reductions in direct maintenance costs have been achieved in some cases, especially by military users). From the economic point of view, improved plant performance can manifest itself in a variety of ways, such as an increase in total throughput, a reduction in failure rates, an increase in plant availability or a reduction in scrap rates. Some examples are as follows:

a small dairy products factory in Scotland: a 20% increase in total throughput. This increased the contribution to group profits at plant level by £1 million per annum, while the total cost of the project (including the cost of the manhours spent undergoing training and attending review group meetings) was less than £200 000. The analysis of this entire plant was completed in three months
a plant manufacturing steel wheels for automobiles in England: productivity increased from 35 wheels per man per shift to 105 wheels per man per shift in the space of six months (same machines, same people)
a paper mill in Pennsylvania: a complex new boiler control system failed five times in three years, shutting off steam (and co-generated electricity supplies) to the paper mill, causing a complete mill shutdown. The total cost of these failures was US$11 million, and the company had been unable to solve the problems using conventional problem-solving techniques. They then applied RCM. The project took six months, the RCM-derived recommendations were implemented and there were no further failures in the ensuing three years. The total cost of performing the RCM analysis and implementing the proposed remedies was US$200 000. A $11m saving for an outlay of $200k amounts to a payback of about one week
an iron ore mine in Canada: at a recent conference in Toronto, Ron Doucet cited the case of an RCM analysis of an ore crusher. The analysis cost CDN$80 000 and led to an increase in throughput that was worth a nett $4.8 million per annum. A payback of less than a week.
a microelectronics assembly plant in Malta: scrap rates on one production line were reduced from 4% to 50 parts per million - an 800-fold reduction - in the space of six weeks. Payback in this case was measured in weeks.

I could cite a great many other cases. (Sandy Dunn says that RCM "... is complete overkill in most situations in most industries." If results like these are overkill, then long live overkill.)
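The payback figures in these examples follow from a simple undiscounted calculation: outlay divided by annual benefit. The sketch below is only a rough check using the figures quoted above. The `payback_years` helper is hypothetical, the dairy outlay is taken at its stated upper bound, and for the paper mill both readings of the US$11 million are shown, because the result depends on whether that sum is treated as an annual saving or spread over the three years in which the failures occurred.

```python
WEEKS_PER_YEAR = 52
DAYS_PER_YEAR = 365


def payback_years(outlay: float, annual_benefit: float) -> float:
    """Simple (undiscounted) payback period in years."""
    if annual_benefit <= 0:
        raise ValueError("annual_benefit must be positive")
    return outlay / annual_benefit


# Dairy plant: outlay under GBP 200 000 (upper bound used), GBP 1 million per annum.
dairy_weeks = payback_years(200_000, 1_000_000) * WEEKS_PER_YEAR

# Ore crusher: CDN$80 000 outlay, CDN$4.8 million per annum.
crusher_days = payback_years(80_000, 4_800_000) * DAYS_PER_YEAR

# Paper mill: US$200 000 outlay against US$11 million of failure costs.
# Assumption A: the US$11 million is treated as an annual saving.
mill_weeks_if_annual = payback_years(200_000, 11_000_000) * WEEKS_PER_YEAR
# Assumption B: the US$11 million is spread over the three years of failures.
mill_weeks_if_spread = payback_years(200_000, 11_000_000 / 3) * WEEKS_PER_YEAR

print(f"Dairy plant:      at most {dairy_weeks:.1f} weeks")
print(f"Ore crusher:      {crusher_days:.1f} days")
print(f"Paper mill (A):   {mill_weeks_if_annual:.1f} weeks")
print(f"Paper mill (B):   {mill_weeks_if_spread:.1f} weeks")
```

Under either reading of the paper mill figures, the payback falls within the "weeks to months" range described earlier.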

Both Steve Turner and Sandy Dunn also state that RCM is only worth applying in "high-risk" industries such as petrochemicals and oil & gas. Steve Turner goes further, by suggesting that it is a waste of time to apply RCM to mature plants. Suffice it to say that none of the above examples are from "high risk" petrochemical-type industries, and all the plants concerned had been in service for at least three years, and in some cases much longer.

Cost-effectiveness apart, another comment frequently made about true RCM is that "it takes too long". For instance, at one point, Steve Turner writes: "If you use PMO2000 you will have these (hazardous problems) under control in one year, if you use traditional RCM it will take you six." This implies that it would take six years to analyse all the equipment in a major facility using true RCM. Suffice it to say that the world's largest coal fired power station used RCM II to analyse all 65 of its major systems in a period of 18 months, without losing a microwatt of generating capacity due to the analysis and in circumstances where it was as difficult for them to commit key resources to this process as it has been anywhere else in the world. (Or I could cite the case of the two Malaysian CCGT power stations that also analysed all their equipment in 18 months, or the large UK candy factory - employing 55 maintenance craftsmen - that did likewise. And so on .....)

5b: RCM and risk

Everyone who has commented on RCM seems to agree that it is a good tool for developing maintenance programs in "high risk" situations. Sandy Dunn is also correct when he says "I have heard it said (even by John Moubray himself) that one cannot justify applying RCM to all equipment items - some equipment items have such low impact on business risk that the effort required to perform RCM analysis on them is greater than the potential benefits." However, as those who have heard me speaking at conferences in the recent past will be aware, my position on this subject is changing.

I am increasingly coming around to the view that no physical asset or system can be deemed to be "low risk" unless it has been subjected at the very least to a zero-based FMECA (and preferably a full RCM review) that proves beyond a reasonable doubt that it is in fact low risk. There are two reasons why my viewpoint has changed.

The first reason is actually a combination of factors: feedback from our network concerning the results of the thousands of RCM II analyses that are being performed around the world, and incidents in supposedly "low risk" industries that have had very grave business implications.

The feedback from our network speaks of case after case of supposedly innocuous systems that turn out to embody very surprising and potentially deadly failure modes. In our experience, on average about 4% (1 in 25) of failure modes are deemed to have direct safety or environmental implications. We also frequently find that as many as 25% of failure modes have potentially hazardous consequences but are not currently receiving any form of PM. Most of the latter failure modes deal with protective devices that have not been receiving any sort of attention prior to the RCM II analysis. This issue is discussed further later.

(These data differ totally from those put forward by Steve Turner when he says "Further to this, in my ten or so years of facilitating RCM analysis, I have put about 1 in every 200 failures in the hazard category. Of these, only once have I ever felt the RCM team had uncovered a potential hazard that was not receiving any PM. My rough calculations tell me that the benefit of RCM over PMO2000 is the one new hazard found in 15,000 failure modes." It is also worth noting that although Steve Turner did attend Aladon's RCM II practitioners' course, he always seems to compare PMO 2000 with one of the forms of "Classical" RCM.)

What about the supposedly "low risk" industries? Two sectors that are frequently said to be "low risk" - and hence not worth rigorous analysis - are automobile factories and food plants. In fact, simply reading the newspapers shows how inappropriate it is to dismiss either of these industries as low risk, as the following examples indicate:

the boiler that blew up (during a maintenance inspection) at Ford's River Rouge plant in Detroit in February 1999, killing six people and shutting the plant down for 1.5 weeks. A huge business risk.
the failure of the Firestone tyres on Ford Explorers which has been partly attributed to the design of the tyres, partly (and arguably) to the pressures at which the tyres were operated and partly (mainly?) to failures (failure modes) in the manufacturing process used to produce the tyres in one plant. These failures pose a serious threat to the continued existence of Firestone as a company - perhaps the ultimate business risk
the failure of a filter used in the Perrier water bottling plant in France, leading to the recall of hundreds of thousands of bottles of Perrier water at enormous cost to the company
the contamination (another failure mode) of pallets used by Coca Cola in Belgium, leading again to a massive and very expensive product recall, in addition to seriously damaging the reputation of the company in Europe.

Note that all these failures involve the failure of physical assets. In the case of the Coca Cola plant, it was pallets, which are just the sort of simple, massively redundant items that are likely to be dismissed as "non-critical" (until after the event).

The second reason why my views on criticality are changing concerns the legislative environment in which more and more users of physical assets are operating. The reaction of society as a whole to equipment failures is changing at warp speed as we move into the 21st century. The changes began with sweeping legislation governing industrial safety, mainly in the 1970's. Among the best known examples of such legislation are the Occupational Safety and Health Act of 1970 in the United States and the Health and Safety at Work Act of 1974 in the United Kingdom. These Acts are fairly general in nature, and similar laws have been passed in nearly all the major industrialised countries. Their intent is to ensure that employers provide a generally safe working environment for their employees.

These Acts were followed by a second wave of more specific safety-oriented laws and regulations such as OSHA Regulation Nº 1910.119: "Process Safety Management of Highly Hazardous Chemicals" in the United States and the "Control of Substances Hazardous to Health Regulations" in the United Kingdom. Both of these regulations were first promulgated in the early to mid-1990's. They are noteworthy examples of a then-new requirement for the users of hazardous materials to perform formal analyses or assessments of the associated systems, and to document the analyses for subsequent inspection if necessary by regulators.

These two sets of developments represent a steady increase in legal requirements to exercise - and to be able to demonstrate that we are exercising - responsible custodianship of the assets under our control. They have placed a significant burden on the managers of the assets concerned. However, they reflect the rising expectations of society in terms of industrial safety, and we have no choice but to comply as best we can.

It would be nice if it all ended there, but unfortunately this tide has not stopped rising. The late 1990's have seen even more changes, this time concerning the sanctions that society now wishes to impose if things go wrong. Until the mid-90's, if a failure occurred whose consequences were serious enough to warrant criminal proceedings, the proceedings usually ended at worst with a substantial fine imposed on the organisation found to be at fault, and the matter - at least from the criminal point of view - usually ended there. (Occasionally, the organisation's permit to operate was withdrawn, as in the case of the ValuJet airline after the crash in Florida on 11 May 1996. This effectively put the airline out of business in its then-current form.)

However, following recent disasters, a movement is now developing not only to punish the organisations concerned, but also to impose criminal sanctions on individual managers. In other words, under certain circumstances, individual managers can be sent to prison in connection with equipment failures that have sufficiently nasty consequences. Stephen Young has mentioned the pending legislation in the States of Victoria and Queensland in Australia, which proposes custodial sentences not only for specific individuals, but for whole teams of people. Ron Doucet also mentioned the changes to the Evidence Act in Victoria.

Legislative developments of this sort have not only taken place in Australia. For instance, in the United Kingdom, John Prescott, the Minister of Transport, has stated that in the light of the official inquiry into the Paddington rail crash that occurred in 1999, he will introduce a law for a crime to be called 'corporate killing', part of which will entail prison sentences for specific executives. In the United States, following the outcry about the accidents involving tire tread separation on SUV's, section 30170 of the "Motor Vehicle and Motor Vehicle Defect Notification Act" was revised in October 2000 to include prison sentences of up to 15 years for "directors, officers or agents" of vehicle manufacturers who commit specified offences in connection with vehicles that fail in a way that causes death or bodily injury.

There is considerable controversy about the reasonableness of these initiatives, and even some doubt about their ultimate enforceability. However, from the point of view of people involved in the management of physical assets, the issue is not what is reasonable, but that we are increasingly being held personally accountable for actions that we take on behalf of our employers. Not only that, but if we are called to account in the event of a serious incident, it will be in circumstances that could culminate in jail sentences.

(Kim, in this context, you were actually not joking when you wrote "With all this talk of litigation it's amazing we don't have the company legal eagles doing reviews of their equipment strategies". I know of at least one major petrochemical company that requires all FMEA's to be reviewed by the company's lawyers before they are signed off.)

The message to us all is that society is getting so sick of industrial accidents with serious consequences that not only is it seeking to call individuals as well as corporations to account, but (in the case of the Victoria Evidence Act) that it is prepared to alter well-established principles of jurisprudence to do so. Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step they take in executing their official duties is beyond reproach. It is becoming professionally suicidal to do otherwise.

6: Planned Maintenance Optimisation

As explained by Steve Turner, PMO starts not by defining the functions of the asset (as specified in the SAE RCM Standard), but starts with the existing maintenance tasks. Users of this approach are then asked to try to identify the failure mode that each task is supposed to be preventing, and then work forward again through the last three steps of the RCM decision process to re-examine the consequences of each failure and (hopefully) to identify a more cost-effective failure management policy. (This approach is what is most often meant when the term 'streamlined RCM' is used. It is also known as "backfit" RCM or "RCM in reverse".)

These retroactive approaches are superficially very appealing, so much so that I tried them myself on numerous occasions when I was new to RCM. However, in reality they are also among the most dangerous of the streamlined methodologies, for the following reasons:

they assume that existing maintenance programs cover just about all the failure modes that are reasonably likely to require some sort of preventive maintenance. In the case of every maintenance program that I have encountered to date, this assumption is simply not valid. If RCM is applied correctly, it transpires that nowhere near all of the failure modes that actually require PM are covered by existing maintenance tasks. As a result, a considerable number of tasks have to be added. Most of the tasks that are added apply to protective devices, as discussed below. (Other tasks are eliminated because they are found to be unnecessary, or the type of task is changed, or the frequency is changed. The nett effect is usually a reduction in perceived PM workloads, typically by between 40% and 70%.)
when applying retroactive RCM, it is often very difficult to identify exactly what failure cause motivated the selection of a particular task, so much so that either inordinate amounts of time are wasted trying to establish the real connection, or sweeping assumptions are made that very often prove to be wrong. These two problems alone make this approach an extremely shaky foundation upon which to build a maintenance program.
in re-assessing the consequences of each failure mode, it is still necessary to ask whether "the loss of function caused by the failure mode will become evident to the operating crew under normal circumstances". This question can only be answered by establishing what function is actually lost when the failure occurs. This in turn means that the people doing the analysis have to start identifying functions anyway, but they are now trying to do so on an ad hoc basis halfway through the analysis. If they do not, they start making even more sweeping - and hence often incorrect - assumptions that add to the shakiness of the results.
retroactive approaches are particularly weak on specifying appropriate maintenance for protective devices. As stated on page 172 of the second edition of my book on RCM: "at the time of writing, many existing maintenance programs provide for fewer than one third of protective devices to receive any attention at all (and then usually at inappropriate intervals). The people who operate and maintain the plant covered by these programs are aware that another third of these devices exist but pay them no attention, while it is not unusual to find that no-one even knows that the final third exist. This lack of awareness and attention means that most of the protective devices in industry - our last line of protection when things go wrong - are maintained poorly or not at all." So if one uses a retroactive approach to RCM, in most cases a great many protective devices will continue to receive no attention in the future because no tasks were specified for them in the past. Given the enormity of the risks associated with unmaintained protective devices, this weakness of retroactive RCM alone makes it in my opinion completely indefensible. (Some variants of the retroactive approach - such as S-RCM - try to get around this problem by specifying that protective systems should be analysed separately, often outside the RCM framework. This gives rise to the absurd situation that two analytical processes have to be applied in order to compensate for the deficiencies created by attempts to streamline one of them)
more so than any of the other streamlined versions of RCM, retroactive approaches focus on maintenance workload reduction rather than plant performance improvement (which is the primary goal of function-oriented true RCM). Since the returns generated by using RCM purely as a tool to reduce maintenance costs are usually lower - sometimes one or two orders of magnitude lower - than the returns generated by using it to improve reliability, the use of the ostensibly cheaper retroactive approach becomes self defeating on economic grounds, in that it virtually guarantees much lower returns than true RCM.
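The coverage argument in the first and fourth points above can be illustrated with a minimal sketch. Everything in it is hypothetical: the asset, the failure modes, the task names and the two helper functions are invented for illustration and are not taken from any RCM or PMO tool. It only shows that an analysis which starts from existing tasks can never surface a failure mode that has no task attached to it.

```python
# Hypothetical failure modes for a single pump; illustrative data only.
ALL_FAILURE_MODES = {
    "bearing seizes": "evident",
    "impeller worn": "evident",
    "coupling shears": "evident",
    "pressure relief valve stuck shut": "hidden (protective device)",
    "low-level trip fails to operate": "hidden (protective device)",
}

# Hypothetical existing PM program: task -> failure mode it is believed to address.
EXISTING_TASKS = {
    "grease bearing monthly": "bearing seizes",
    "inspect impeller yearly": "impeller worn",
}


def retroactive_scope(tasks):
    """Failure modes reachable when the analysis starts from existing tasks."""
    return set(tasks.values())


def function_first_scope(failure_modes):
    """Failure modes reachable when the analysis starts from functions.

    Assumes, for the sake of the sketch, that a function-first review
    identifies every reasonably likely failure mode.
    """
    return set(failure_modes)


missed = function_first_scope(ALL_FAILURE_MODES) - retroactive_scope(EXISTING_TASKS)
for mode in sorted(missed):
    print(f"never reviewed: {mode} [{ALL_FAILURE_MODES[mode]}]")
```

In this made-up case the two protective devices are among the failure modes that are never reviewed, simply because nobody specified a task for them in the past.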

7: Summary

In nearly all cases, the proponents of the retroactive approaches to RCM claim that these approaches can produce much the same results as true RCM in much less time. (Steve Turner claims that PMO is six times quicker, although he compares PMO with "Classical" RCM, not RCM II.) However, the above discussion indicates that not only do they produce nothing like the same results as true RCM, but that they contain logical or procedural flaws which increase risk to an extent that overwhelms any small advantage they might offer in reduced application costs. It also transpires that if one seeks to avoid making some of the more gratuitous assumptions required by retroactive techniques, they actually end up taking longer and costing more to apply than true RCM, so even this small advantage is lost. As a result, the business case for applying retroactive RCM is suspect at best.

However, a rather more serious point needs to be borne in mind when considering these techniques. The very word 'streamline' suggests that something is being omitted. (For instance, Steve Turner states that PMO usually omits the function identification step, and that as a result, it only identifies half of the reasonably likely failure modes that would be identified using even 'Classical' RCM.) In other words, there is to a greater or lesser extent a degree of sub-optimisation embodied in all of these techniques.

Leaving things out inevitably increases risk. More specifically, it increases the probability that an unanticipated failure, possibly one with very serious consequences, could occur. If this does happen, as suggested above, managers of the organisation involved are increasingly likely to find themselves called personally to account. If the worst comes to the worst, they will not only have to explain, often in an emotionally-charged courtroom confronted by bitterly hostile legal Rottweilers, what went wrong and why. They will also have to explain why they deliberately chose a sub-optimal decision-making process to establish their asset management strategies in the first place, rather than using one which complies fully with a Standard set by an internationally-recognised standards-setting organisation. It would not be me that they would have to convince, not their peers and not their managers, but a judge and jury.

One rationale often advanced for using the streamlined methods is that it is better to do something than to do nothing. However, this rationale misses the point that all the analytical processes described above, retroactive or otherwise, require their users to document the analyses. This means that a clear audit trail exists showing all the key information and decisions underlying the asset management strategy, in most cases where no such documentation has existed before. If a sub-optimal approach is used to formulate these strategies, the existence of written records makes every shortcut much clearer to any investigators than they would otherwise have been. (This in turn may suggest that perhaps we should simply forget about all of these formal analytical processes. Unfortunately, the demand for documented analyses embodied in the second wave of safety legislation mentioned above does not allow us this option.)

A further rationale for streamlining says something like "we have been using this approach for a few years now and we haven't had any accidents, so it must be all right." This rationale betrays a complete misunderstanding of the basic principles of risk. Specifically, no analytical methodology can completely eliminate risk. However, the difference between using a more rigorous methodology and a less rigorous methodology may be the difference between a probability of a catastrophic event of one in a million versus one in ten thousand. In both cases, the event may happen next year or it may not happen for thousands of years, but in the second case, it is a hundred times more likely. If such an event were to happen, the user of a form of RCM that complies with the SAE Standard would be able to claim that he or she exercised prudent, responsible custodianship by applying a rigorous process that complies with an internationally recognised standard, and as such would be in a highly defensible position. Under the same circumstances, the user of any "downsized" and hence non-compliant form of RCM is on much, much shakier ground.
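The arithmetic behind this point can be sketched as follows. The two probabilities are the ones quoted above; treating them as annual probabilities of independent events is an assumption made here for illustration, as are the function name and the five-year window.

```python
def prob_no_event(p_per_year: float, years: int) -> float:
    """Probability of seeing no event in `years` years.

    Assumes the same probability every year and independence between years.
    """
    return (1.0 - p_per_year) ** years


P_RIGOROUS = 1e-6     # "one in a million" (assumed to be per year)
P_STREAMLINED = 1e-4  # "one in ten thousand" (assumed to be per year)

ratio = P_STREAMLINED / P_RIGOROUS
print(f"relative likelihood: about {ratio:.0f} times")

YEARS = 5  # "a few years" of accident-free experience (assumed value)
for label, p in (("rigorous", P_RIGOROUS), ("streamlined", P_STREAMLINED)):
    print(f"{label}: P(no event in {YEARS} years) = {prob_no_event(p, YEARS):.6f}")
```

Under these assumptions both approaches would almost certainly show a clean record after a few years, so the absence of accidents over such a period says very little about which level of risk is actually being carried.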

8: Conclusion

It is interesting to note that all but one of the people who have chosen to comment at length in this discussion (myself included) are consultants. Consultants of course have commercial axes to grind, which will lead many readers to say "well, he would say that, wouldn't he." This leads me to make two final suggestions in closing:

take special note of the views of the one commentator who is a practising maintenance manager and hence who does not have a commercial axe to grind, but who feels strongly enough about all this stuff - based on personal experience - to share his thoughts at length (Ron Doucet of the Iron Ore Company of Canada), and
if you really want to satisfy yourself about the relative merits of each approach, try them both on a pilot scale, preferably on the same type of equipment. Look at the outcomes in terms of documented maintenance programs (with a special eye on defensibility) and in terms of benefits achieved related to costs. Then make up your own mind.

With best wishes for the festive season

John Moubray

From: Andrew Jardine

I was very pleased to read the RCM overview by John Moubray. It certainly helped me to put much of the recent correspondence in perspective.

Andrew Jardine

From Peter Ball

When I casually suggested to Stephen recently "Grasp the subject - the words will follow" little did I envisage the 6912 word explanation from JM himself, in defence of his RCM.

There seems to be no end of grief outpouring concerning this Classical Vs Streamlined RCM which now incorporates Conventional PMO, Reverse RCM, and "RETROACTIVE RCM" even. The big funny of it all seems to be the ordained role of SAE to provide a Standard (JA1011) to prop-up "Classical RCM". Ho Ho Ho!

Streamlined RCM appears to be considered as an anathema.

My view, for what it is worth is that there is no BIG DEAL here.

Put to one side all of these "versions" of the methodology, and get back to BASICS.

Consider basic reliability centred maintenance. All that is required is a sensible mechanical (or electrical) engineer with access to the plant asset register, and the accounts department.

The tools needed include:

FMECA (Failure Mode Effects and Criticality Analysis),
FTA (Fault Tree Analysis),
Pareto Analysis,
Block Diagrams,
Weibull Analysis, and
an understanding of Risk / Cost Management.

No need for RCM trained teams and in-house facilitators; or software that comes only with the training.

Do not overlook the human aspects of reliability; Good Management providing Good working environment, usually results in Good Reliability.

In the mid to late 80's I introduced 'basic' reliability centred maintenance into the Australian Uranium mining industry. The word Reliability up until then was a management consideration of employee performance. Using the above 'tools' I developed the appropriate maintenance strategies with the end result that the insurance cover was withdrawn from the underwriters, the first year rewards were published as in excess of $AU1million, and things stopped breaking-down.

Happy 2001 to Everyone.

Peter Ball

From Trevor Hislop

Yeeeeeah !!!

Three cheers for common "basic" sense from Peter !

Trevor

From Dana Netherton

I sympathize with the sentiments ... but sentiments are no substitute for judgement, especially when valuable and dangerous physical assets are at stake.

I know I'm new to this forum, so a word of introduction may be in order. I'm Dana Netherton, the chair of the SAE subcommittee that wrote the RCM standard. I understand that Peter (Ball) has "used my name in vain" a few times in the past. I finally thought I would take a look for myself. :-)

In the interests of fair play, I should say a little about my background and setting. I started my working life in US Navy nuclear submarines (naval officer). After I left the Navy and finished some other academic studies, I went to work for an American consulting firm with US Navy contracts. Back in the late 1970s, that firm introduced the US Navy to RCM, and so a few years after I joined the firm, about 12 years ago, I began working in the field of maintenance management consulting and RCM. Most of my SAE committee's substantive work on the RCM standard was done while I worked for that firm. My role on the committee was to protect the interests of our US Navy client.

In recent years, that company has become enamoured of a process similar to the "reverse RCM" that John Moubray described in his very thorough posting of a few days ago. My work on the SAE committee showed me wider horizons, and gave me a broader perspective, than I had had while ensconced in my consultant's office.

In that committee, I met US Navy aviation people with experience in RCM (unconnected with my employer, or with Aladon) who took one look at "reverse RCM" and recoiled -- then returned to make biting and unanswerable comments about it. I met commercial people with experience in RCM -- in the steel industry, in the chemical processing industry -- who had the same reaction and the same comments. After some very serious soul-searching, I finally decided that I could not continue to support my employer's efforts to encourage people to use "reverse RCM" -- and also retain my sense of professional integrity.

So I left them, about a year and a half ago, once the SAE RCM standard was largely put to bed and our Navy client's interests were protected. I now own my own small consulting firm, Athos Corporation, which is a member of the Aladon Network. (BTW, my Aladon license is restricted to North America, so I have no commercial interest in Australia.) As I had expected, and as I am sure that people here can appreciate, starting a new business from scratch is no gold mine by any means; but I go to bed at night with a clear conscience.

Now then. I'd like to say two things about the SAE standard. First, I'd like to address the reasons why someone might like to use it. Then, I'd like to address what questions it does *not* answer -- because there are some important questions that it was deliberately intended to sidestep.

1. Why use the SAE standard?

As people can appreciate, I'm sure, I have spoken to a lot of people about the SAE standard over the past several years -- in many cases from conference platforms (in the USA). I have seen a few complaints about it, in the year or so since it was published in Oct 1999.

So far, every complaint has come from a consultant.

So far, every comment I have seen from a user has ranged from favorable to devoutly grateful.

Why?

Because the standard is not intended to meet the needs of consultants. It is intended to meet the needs of users.

Consultants need to establish the credibility of their process to their prospective clients. If this means attaching a recognized TLA (three-letter acronym) to whatever the heck it is that they do, then hooray, go for it! (All too often.)

Users need to know what sort of pig is inside this poke that has this label on it, this TLA that seems to say that the pig is such-and-so. When they buy a poke with *this* label on it, users need to be confident that they are getting *this* sort of pig inside it. (This is especially tricky when the users are not yet experts in the process they are about to embark on.)

(Is everyone in this international forum familiar with the slang phrase, "buying a pig in a poke"? In the US, at least, this means "buying something sight unseen" -- something that is *said* to be a pig, in an unopened sack or bag (an unopened "poke"). And it is almost always used to describe something you don't want to do ("Oh, I don't want to buy a pig in a poke"), because it carries the implication that what you are carrying away after the purchase is probably not what you thought you were buying. That's how I'm using the phrase here.)

In the 20+ years since the US Department of Defense published Nowlan & Heap's report, that TLA "RCM" has been attached to a heck of a lot of pokes, with a heck of a lot of different kinds of pigs inside.

Peter's e-mail, below, shows one kind of non-Nowlan-and-Heap pig that gets stuffed into the RCM poke. He asserts that "basic" RCM consists of a single sensible engineer with access to the plant's list of its physical assets and to its accounts department, an engineer who has tools such as the following:

FMECA (Failure Mode Effects and Criticality Analysis),
FTA (Fault Tree Analysis),
Pareto Analysis,
Block Diagrams,
Weibull Analysis, and
an understanding of Risk / Cost Management.

Of these tools, at least four are not mentioned at all by Nowlan and Heap: FTA, Pareto Analysis, Block Diagrams, and Weibull Analysis.

(N&H have a decision logic tree, but it is a different logic tree from the one customarily used in Fault Tree Analysis. Being aviation people, they are focused on airplanes, and they assume that the entire airplane will be reviewed -- therefore they do not use Pareto Analysis to decide which assets do not deserve a review. Their process does not require Block Diagrams, though it might use such diagrams if already available. And the failure curves they developed -- the famous six curves -- were not produced by Weibull Analysis, but by a different process that neither uses nor generates mathematical equations (such as the Weibull function). (Appendix C of their report, "Actuarial Analysis", describes their analytic process.) )

Of the remaining tools, the process used by N&H to examine failure modes is not the same as the FMECA process that is described in the various FMECA standards available from the US military, the SAE, and other sources. For one thing, N&H use the crucial term "failure mode" to refer to something that FMECA does *not* call a "failure mode". (We in the SAE RCM subcommittee found this out when we attempted to establish liaison/contact with the SAE subcommittee that is struggling to write a new FMECA standard.)

And "an understanding of risk/cost management" is far looser, far more vague, than N&H's very specific process for addressing risks with respect to safety and economics. Does Peter mean N&H's approach to "risk/cost management"? Or does he mean someone else's approach? Or one he came up with on his own?

So. Peter's process may be very useful. It probably borrows valuable concepts and features from Nowlan and Heap's report. I'm sure he feels he has gotten very good results with it. I have no reason to dispute his results.

But the process he described in his e-mail is not the process that Nowlan and Heap lay out in their report.

So someone who *wants* Nowlan and Heap's process, and who hires Peter because Peter says he uses "RCM" (the name that Nowlan and Heap made famous), is likely to get something different from what he (the user) wants.

I don't pretend to know what's happening in Australia. I do know that, very quickly after the US Department of Defense decided to abandon Military Standards in the mid-1990s, the US Navy started getting pokes bearing the label, "RCM", with Joe Blow's pet process inside -- processes that had had only the most glancing contact (if any) with Nowlan & Heap's process.

Having had this experience, the US Navy got into the SAE's RCM standard project very quickly indeed because, as a user, the US Navy wanted to be sure -- to be *sure* -- that when it asked for RCM, it could predict what the Sam-Hill it was going to get.

Which is how *I* got into the project (on behalf of our US Navy client). To help make a standard that would help *users* know what they were getting when they asked for "RCM".

I don't know all of the organizations that have formally used the SAE standard so far, of course. I do know that the US Bureau of Land Management, in the US Department of the Interior, recently used the SAE standard in an RFP for consulting services. They seem to have been quite pleased to have access to this tool during their procurement process.

2. What questions does the SAE standard *not* answer?

When I speak to people about the SAE standard -- and as the chairman I have spoken to a lot of people about it in the last couple of years -- I point out that there are two entirely separate questions that one must answer when setting out to select a process (and sometimes, by implication, to select a consultant).

The first question is: "is this process RCM?" This is the question that SAE JA1011 is intended to answer.

If the answer is "yes", the second question is: "is this (RCM) process cost-effective?" SAE JA1011 does nothing whatsoever to help answer this question.

And it is a serious question. You see, it is possible to use any process either well or poorly. As I put it in my presentations, you can do RCM smart -- or you can do RCM "stoopid". :-)

It's still RCM. But one way is cost-effective, and the other way isn't.

A number of people here have probably heard some consultants complain about the "vast expense", and "relatively low return", of SAE-compliant RCM. I worked for one such firm, here in the US. I have seen presentations by another such American consulting firm. In both cases, the people in the firms were sincere in their complaints. They based their complaints on their own experience. Why did they make these complaints?

I'll tell you why. Because they did RCM "stoopid". Their approach was wasteful. They organized and trained themselves and their clients in such a way that it took them forever to get anything done -- and then, when it got done, it was sometimes a toss-up whether it would actually be implemented in the plant.

(The most visible defect was in the training. You don't learn to ride a bike proficiently by reading a book, or by taking a few days of classroom training. You certainly don't learn to use sophisticated and complex engineering analytic techniques proficiently that way. Why should anyone expect someone to be able to learn to do RCM proficiently that way?)

(The next most-visible defect was in the change-management area known as "buy-in". My old firm generated a lot of "shelfware" in the early 1990s, doing one-man analysis "on behalf of" the client (saved him a lot of work, didn't it?) whose reports were put on a shelf and forgotten -- because the people who were responsible for implementing the recommendations saw no reason to make those changes.)

And then, having taken forever to get things done (if anything really *got* done at all), these consultants complained about RCM itself. And cast about for ways to "streamline" the process itself -- instead of looking for ways to organize and train themselves better.

Personally, I am persuaded that it is possible to use an SAE JA1011-compliant process in a way that is cost-effective. It is not easy to come up with such a way (if it were, then everyone would have one!), but I think that at least one cost-effective SAE JA1011-compliant process does exist. I think that the US Naval aviation people on my subcommittee would say that at least two such processes exist.

But this is not a given. You don't always find that an SAE JA1011-compliant process is also a cost-effective process.

Those users who are concerned about the cost-effectiveness of the RCM process they might use would be well-advised to take the measures they would take when embarking on the use of any other sort of new process:

Decide what cost-effectiveness metrics are important to you, then check the track record of that particular process, and see what sort of experience others have had with it.

Learning to use a cost-effective RCM process at your site is not simple, and it is not easy. But was it simple or easy to build your site in the first place? How many important things are simple and easy?

-- Dana Netherton

My two cents worth :-)

From Terrence O'Hanlon

These debates about RCM and other strategies/methods are very interesting. Peter, Dana and John are obviously very experienced and well versed in what they do. I appreciate the detailed explanations as they provide a solid understanding of the foundations for these approaches. I doubt that these explanations will go very far to convince the "other side" to lay down its arms though!

I think that reports on actual results would be as interesting as the RCM debate if not more so! It would be even better to hear from field practitioners! Why did LTV file bankruptcy (citing "foreign" competition) while DoFasco (who must have the same competition) is growing its profits? Did their approach to asset management have anything to do with it? What is the "foreign" competition doing to be so competitive?

How about a New Year resolution to end the RCM debate and start filling this list with stories about what is working (and what is not) in the real world! Does anyone else agree?

RCM, RCM2, PMO, TPM, PdM, CBM and PM (now Plant Services Magazine www.plantservices.com is promoting EMM - Effective Maintenance Management!) all look great on paper.

What is working for you?

Happy New Year!

Terrence O'Hanlon

From Thomas Purackal

I would like to learn what the differences are between RCM, RCM-1 and RCM-2. I request experts such as Mr. Peter, Mr. John and others to write.

Thomas Purackal.

From John Moubray

Thomas

As a very first response to your query, please refer to Aladon's website at www.thealadonnetwork.com. The first two or three pages of this website give a brief description of RCM and RCM2.

If you want to study the original text of the document that first described "Reliability-centered Maintenance", you need to get hold of a copy of the report entitled "Reliability-centered Maintenance" by F Stanley Nowlan and Howard Heap from the National Technical Information Service, Springfield, Virginia, USA.

If you want to study RCM2 in more detail, get hold of a copy of my book entitled "Reliability-centered Maintenance" (US edition) from www.amazon.com, or "Reliability-centred Maintenance" (UK edition) from www.amazon.co.uk.

As the correspondence you have been reading indicates, there are a great many other processes that use the term "RCM" currently on the market. Some can legitimately be called RCM. Some cannot. The SAE Standard JA1011 was developed to help users to determine the difference. To obtain a copy of the standard, visit the SAE website at www.sae.org, and enter the number "JA1011" into the first search field that you encounter. This will take you to the page that will enable you to download a copy of the standard for a sum of US$59. (I have learned that some national standards organisations charge a lot more for this already expensive document. If you can, I suggest that you try to order it direct from the SAE.)

John Moubray

From Dana Netherton

Thomas,

Regarding your query, I'll say this:

"RCM", as defined by SAE JA1011, is a 7-step analytic process, based directly on the process presented in Nowlan & Heap's 1978 US government report.

No specific process bears the name "RCM-1" or "RCM1". However, RCM processes that are derived directly from Nowlan & Heap's report (such as the two official US Navy RCM processes, one for ships and the other for aircraft, both written in the early 1980s), might be called "RCM Mark 1".

One way in which Nowlan & Heap differ from SAE JA1011 is that N&H address "safety" as an explicit issue to be managed, but do not address "the environment" as an explicit issue. In the late 1970s, who did?

But today, who does not? An SAE standard that did *not* address the environment as an explicit issue would do its users a grave disservice. So SAE JA1011 requires an RCM process to address the environment as an explicit issue to be managed.

"RCM-2" or "RCM2" is Aladon's RCM process, first defined in the first edition of John's book in 1991 (about 13 years after the publication of N&H's report). It does address the environment, and has a number of other enhancements that motivated John to call it "RCM Mark 2", or "RCM2". (The final chapter of his book has a summary of these enhancements.) RCM2 does comply fully with SAE JA1011.

RCM2 also has some important features in areas outside the analytic process itself, areas not addressed by SAE JA1011. You may recall that I mentioned, in my earlier post, that a number of consultants tried to apply RCM, and failed to get results that were worth the cost of the effort. You may recall that I said that the roots of the problem generally lay in the way they organized and trained people (client people and their own people). And that I said that these consultants failed to address those issues, but instead tinkered with the analytic process itself.

RCM2 adds specific ways of organizing and training people so that they are most likely to apply the RCM process successfully. I'm not going to go into those things here -- see John's book for an introductory discussion of them; if you find that you want details, I'm sure that John or I can refer you to people who can give them in a better forum than via e-mail. But in my view these points are the most important features that differentiate RCM2 from the other SAE-compliant processes out there.

Back to Jim, and perhaps others,

In case any independent comment is needed about John's book, I'll make it, and gladly. I have read all of the books titled "Reliability-Centered Maintenance", and IMNSHO John's book is the best, hands down. Nowlan & Heap's report is just that, a report. It's groundbreaking and gives mind-bogglingly important background information about the development of RCM, but it doesn't try to *teach* RCM. John's book is a textbook. And a much better one than the other books rattling around the bookstore shelves, IMNSHO.

As to your suggestion, well thanks! I appreciate the compliment! Frankly, though, I don't feel a need to write a book of my own, given that John's book is already out there.

Again, in the interests of fair play, I will remind folks here that my firm is a member of the Aladon Network. This means that I have signed a license agreement with John's company to use his training materials, including his book (as a textbook). I did that specifically so that I *could* get access to them. I chose to go in that direction because I've seen the training materials out there and I think that John's (Aladon's) are the best.

Further, in the interests of fair play, those who have seen the most recent edition of John's book (2.3) will know that John has obtained my permission to use portions of my magazine article (portions that talked about the history of RCM) in this edition of his book. I received no financial compensation for giving that permission, just a word of thanks in the acknowledgments. :-)

And, again, I have no commercial interest in Australia, NZ, or elsewhere outside North America, so whether the Aussies, Kiwis, or Brits agree with me makes no commercial difference to me. :-)

Oh, and do note that John said, "get hold of a copy", not "buy a copy". If Kim has a friend or colleague who has a copy, I'm sure John would be happy to see Kim borrow it (so long as Kim reads it!).

Dana Netherton

Go to Part 3 of this discussion

Copyright 1996-2009, The Plant Maintenance Resource Center . All Rights Reserved. Revised: Thursday, 08-Oct-2015 11:53:49 AEDT Privacy Policy
