Plant Maintenance Resource Center
Determining the Frequency of Condition Monitoring tasks

A thread from the plantmaint Maintenance Discussion Group

This discussion thread took place within the plantmaint mailing list, a discussion forum for maintenance-related issues. What was the conclusion? Read on and make up your own mind!

From: "Holmes, Matthew"
Sent: Tuesday, November 16, 1999 10:55 PM

Can anyone on this list point me to a good source of published standard frequencies (hard-copy handbooks or online websites) for maintenance and condition-based monitoring? For example, I am looking for frequencies of the following, for various plant equipment (valves, pumps, motors, HXs, etc.):

  • lubrication
  • lubrication analysis
  • vibration monitoring
  • noise analysis
  • thermography
  • etc.
Note: Manufacturer's recommendations acceptable.

Thank you in advance for your time and efforts!


From: "Peter Ball"


Have a look at <link no longer exists>

There is enough there to get you started, but frequencies are not addressed in detail. These are a function of equipment Criticality and can be calculated as a function of Mean Time To Failure (MTTF).

Many CM users work on the basis of monthly checks, as it is less complex for trend analysis.

Hope this is of assistance, and Good Luck.
Peter Ball

From: "Steve Turner"


Might I suggest that the frequency of condition monitoring tasks is primarily a function of the rate of decay of the failure rather than of the MTBF (ref. Moubray, RCM II (1996); Nowlan and Heap (1978); et al.). MTBF only comes into the equation if the inspection confidence is known and is less than 100% (MIL-STD-2173).
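Steve's point about inspection confidence can be made concrete with a little probability. A minimal sketch, assuming that every inspection falling inside the P-F interval independently detects the developing failure with the same probability p_detect (a simplification, and not the MIL-STD-2173 procedure itself): n inspections all miss with probability (1 - p_detect)^n, which fixes how many inspections must fit inside the P-F interval and therefore how far apart they can be. The function names and figures are hypothetical.

```python
import math


def inspections_needed(p_detect: float, max_miss: float) -> int:
    """Smallest n such that n independent inspections all miss with probability <= max_miss."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be in (0, 1)")
    if p_detect == 1.0:
        return 1  # a certain inspection only needs to fall inside the P-F window once
    # (1 - p)^n <= max_miss  <=>  n >= log(max_miss) / log(1 - p)
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_detect))


def interval_for_confidence(pf_interval: float, p_detect: float, max_miss: float) -> float:
    """Inspection spacing that places n inspections inside any window of length pf_interval.

    Ignores the time needed to plan and act after detection.
    """
    return pf_interval / inspections_needed(p_detect, max_miss)


# Hypothetical case: 8-week P-F interval, 80% detection per inspection, accept a 1% miss chance.
print(inspections_needed(0.8, 0.01))            # 3 inspections
print(interval_for_confidence(8.0, 0.8, 0.01))  # roughly 2.67 weeks apart
```

With p_detect = 1 the sketch collapses to one inspection per P-F interval. In a fuller treatment, the acceptable miss probability (max_miss) is where MTBF and the cost of failure would enter; that link is not modelled here.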

There are some "RCM" algorithm software systems which use MTBF to calculate CM intervals, but I find that they give varying outcomes depending on the estimates of MTBF and the inspection confidence (and indeed other inputs such as the cost of failure and the cost of completing the inspection). One also needs to ponder the equations, as they presume that MTBF is fixed (a constant failure rate) and therefore that all failures are random. I am aware that there are formulae that account for this, but to me they begin to border on the ridiculous, as the analyst must be prepared to estimate not only the average life but also the failure pattern.

By far the most practical approach to determining the best rate of inspection is to put the question to the fitters, or whoever knows the equipment best: "How often should the condition monitoring task be done so that the failure will not occur unexpectedly?" Work this answer backward and forward until the respondent is confident that the inspection rate provides adequate prediction without overdoing it. Because the answer only needs to be correct to an order of magnitude, absolute precision is not necessary. Some may be familiar with the question "What is the PF interval?", which follows the same line of thought.

A simple assessment in hours, days, weeks or months is about the best you will get. Note that MTBF has nothing to do with this approach at this stage of the evaluation. MTBF estimates can be used to decide whether the cost of prevention is more or less than the cost of failure, as that comparison must presume a life-cycle cost, which depends on MTBF.
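Steve's closing point, that MTBF belongs in deciding whether the CM task is worth doing rather than in setting its interval, comes down to simple arithmetic. A minimal sketch, assuming a constant failure rate of 1/MTBF and assuming that CM catches every failure in time for a planned repair; the function name and all cost figures are hypothetical.

```python
def net_annual_saving(mtbf_years: float, unplanned_cost: float, planned_cost: float,
                      inspections_per_year: float, cost_per_inspection: float) -> float:
    """Failure costs avoided by CM minus the cost of doing the CM, per year.

    Assumes a constant failure rate (1 / MTBF) and that every failure is caught
    early enough to be repaired as planned work.
    """
    failures_per_year = 1.0 / mtbf_years
    avoided = failures_per_year * (unplanned_cost - planned_cost)
    spent = inspections_per_year * cost_per_inspection
    return avoided - spent


# Same machine and CM task, two different MTBF estimates (hypothetical figures).
for mtbf in (2.0, 10.0):
    saving = net_annual_saving(mtbf, unplanned_cost=50_000, planned_cost=10_000,
                               inspections_per_year=12, cost_per_inspection=500)
    print(f"MTBF {mtbf:g} years: net saving {saving:,.0f} per year")
```

With these made-up figures the task pays for itself at an MTBF of 2 years but not at 10 years, which also shows how strongly the answer depends on the MTBF estimate.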

Hope this is food for thought.


From: "Shannon Hood"

I agree with Steve and offer some additional guidelines:

  • Be wary of software that claims it can tell you how often to do your stuff. Maintenance is an art, not a science; that's why we get better at it with experience. Ever wondered why some countries refer to their tradespeople as ARTisans?
  • Software can be helpful, but I caution against relying on an algorithm with limited variables and constraints in its optimisation.
  • I have come across several sites that have used detailed RCM analysis and whiz-bang software packages that tell them to perform a particular task every 23.765984 days, only to have shop-floor reality kick in and the task get scheduled for the first RDO every month. That is a point you could probably have reached much faster (and cheaper) if you'd asked your staff!
  • I advocate full-blown RCM analysis, but only on highly critical machines, and I encourage the use of software, but within its limitations.
  • Vibration checks every 4-8 wks is a good starting point.
  • Visit your more critical stuff more often than non-critical stuff.
  • The rate of decay is usually greater the faster and heavier the thing going round and round is, so visit this stuff more often than slow-spinning, light stuff.
  • If you think the bathtub applies, increase the frequency soon after commissioning and when you think the component might be nearing the end. Note that if you think the bathtub applies, you may want to rethink doing CM in the first instance, but your site maintenance engineer or local maintenance consultant will help here. A couple of readings soon after commissioning can help to get a better baseline while the m/c settles in.
  • Infrared every 6 months is usually OK.
  • The two drivers of decay in this area are amperage and thermal oscillation. So if you've got big current drawers going on and off all the time, or high-amperage boards living in outside sheds, give these more attention than others.
  • Again, do the critical stuff more often. If it's not critical, is it worth the bucks on CM anyway? Don't get carried away with a new toy and start CMing everything.
  • Tip for infrared: Don't just limit your thinking to hot wires. If you've got the infrared fella in or hired the gear, ask what other applications it may have. Some different ones include checking for leaks in fridge units, checking alignment of large mechanical couplings and warm spots in the wrong pipes exiting/entering heat exchangers.
  • Lubrication has no hard and fast rules, but the manufacturers' manuals usually have pretty good stuff.
  • Be EXTREMELY careful not to use the wrong lubricant. You can do more harm than good if the lubrication tasks are not clear and/or the lubes are not well labelled and arranged. Oils ain't oils!
  • Take the obvious care not to over-lubricate (not so obvious to some process/operation/production staff, who usually find out the hard way!).
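The 23.765984-day anecdote suggests rounding any calculated interval to something the shop floor can actually schedule. A minimal sketch, assuming a hypothetical list of calendar slots; it rounds down to the next shorter slot (inspecting slightly more often than calculated), which is a deliberately conservative choice here, whereas the site in the anecdote ended up on a monthly RDO.

```python
# Hypothetical scheduling slots; a real site would use its own shift/RDO calendar.
SLOTS_DAYS = [
    ("daily", 1),
    ("weekly", 7),
    ("fortnightly", 14),
    ("4-weekly", 28),
    ("quarterly", 91),
    ("6-monthly", 182),
]


def practical_interval(calculated_days: float) -> tuple[str, int]:
    """Longest slot that does not exceed the calculated interval (rounds towards more frequent)."""
    if calculated_days < SLOTS_DAYS[0][1]:
        raise ValueError("shorter than the shortest slot: consider continuous monitoring")
    chosen = SLOTS_DAYS[0]
    for name, days in SLOTS_DAYS:  # list is sorted by length
        if days <= calculated_days:
            chosen = (name, days)
    return chosen


print(practical_interval(23.765984))  # ('fortnightly', 14)
```

Rounding to the nearest slot instead would trade some conservatism for fewer visits; either way the precision of the original calculation is lost, which is rather the point of the anecdote.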
Don't forget Steve's suggestion about getting the trades staff to advise you on all this.

Don't forget that all the new wanky technologies are not a patch on the best condition monitoring device ever invented: the human. Tap into the people who use the machine every day and notice the rattles, smells, squeaks, drips and wiggles that are out of the ordinary. Every one of these will help you foresee and predict failure before it occurs.

Finally, all this CM stuff is simply attempting to predict an imminent failure. Be sure you are taking the appropriate measures to delay the failure as long as possible, through the obvious, like appropriate lubrication, but also through dusting down cooling fins on motors, vacuuming the distribution board and cleaning up the pool of oil under the machine so that an increase in 'drip rate' is noticeable. The TPMians will call this 'Defect Elimination', but the less educated amongst us call it common sense.

If you play the game with some of these guidelines in mind, hopefully the MTBF scoreboard will show your improvement.

Hope this helps


From: David Sleeman


Have you tried

From: "Stephen Young"


With respect...

CM frequencies and equipment criticality are not related.

The decision to conduct CM might depend on the criticality of the equipment or process, but the frequency of CM is based on the PF interval, that is the time between when we can detect that a failure is occurring to when the total failure occurs.

If you use MTBF to calculate the CM frequency for age-related failures, then in all probability 50% of your items will have failed by the time you reach the MTBF.

If you use MTBF to calculate the CM frequency for random failures, then 63.2% of your items will have failed before you reach the MTBF.
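Stephen's two percentages can be checked with a Weibull model, swapped in here only for illustration (the post does not name a distribution). For a Weibull distribution with shape beta and scale eta, the mean life (MTBF) is eta * gamma(1 + 1/beta), so the fraction failed by the MTBF is 1 - exp(-gamma(1 + 1/beta)^beta), whatever the value of eta. With beta = 1 (random failures, exponential distribution) this is 1 - e^-1, about 63.2%; with a wear-out shape of about beta = 3.5, where the Weibull is close to symmetric, it comes out close to 50%.

```python
import math


def fraction_failed_by_mtbf(beta: float) -> float:
    """Weibull CDF evaluated at the mean life (MTBF); the scale parameter eta cancels out."""
    mean_over_eta = math.gamma(1.0 + 1.0 / beta)  # MTBF / eta for a Weibull distribution
    return 1.0 - math.exp(-(mean_over_eta ** beta))


for beta in (1.0, 2.0, 3.5):
    print(f"beta = {beta}: {fraction_failed_by_mtbf(beta):.1%} failed by the MTBF")
```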

Stephen Young
The Asset Partnership

From: "Alexander (Sandy) Dunn"

Let me add another general note of agreement with both Steve and Shannon.

To try and put it as simply as possible, the criticality of an item of equipment and its reliability (as measured by MTBF) have nothing to do with the frequency with which Condition Monitoring should be done (but they have everything to do with whether Condition Monitoring should be done at all).

(The only exception to this, as Steve points out, is where you are not 100% certain that the Condition Monitoring task you are performing will, in fact, predict the failure. In this case, you need to be able to estimate the probability that it won't detect the failure, and in practice, in most industrial applications, this is almost impossible - so, for all practical purposes, forget the exception).

The only question that you need to ask yourself in determining the appropriate frequency for a Condition Monitoring task is "How quickly does it fail, once an incipient failure is detected?" If it fails more quickly, then inspect more often.

Clearly, the speed of failure will vary, from application to application. Consider a bearing - a more highly loaded, higher speed bearing that is running closer to its design limits, in an aggressive environment, where lubricant quality is suspect, will be likely to fail more quickly. It also depends on the mode of failure of the bearing - ball faults tend to result in bearings failing very quickly indeed, but a bearing with spalling on the inner race may happily grind away for weeks or months. So operating context is highly important.

Having said that, there are some general "rules of thumb" for Condition Monitoring frequencies. Shannon has, I think quite adequately covered these. I would also agree with the suggestion of considering using thermography for more than just electrical inspections. We have quite successfully used thermography to detect conditions such as silt build up in process water tanks, partial blockages of pipework, incorrectly fitted seals on pumps (leading to rubbing), broken bolts on large open geared mills, failing bearings on conveyor idlers and much more.

As far as the "bathtub" is concerned, the only effective way to deal with this using Condition Monitoring techniques is to perform a baseline check immediately after the item is returned to service after overhaul or repair. This is particularly effective when combating alignment or balancing issues by using Vibration Analysis. In some instances, it can even be used to monitor the quality of the repair being effected.

We recently had a case where, despite the alignment on an agitator gearbox supposedly being performed correctly, the baseline reading showed a significant alignment problem. After some detailed investigation (and rechecking the alignment several times), it was discovered that the new coupling on the agitator had been machined incorrectly, and was not concentric! Incidentally, the repair had been performed off-site by a contractor, and no tolerances had been specified for concentricity.

I hope this helps...

Alexander (Sandy) Dunn
Plant Maintenance Resource Center

From: "Stephen Young"

Hey guys

Have a read of Appendix 4 of John Moubray's book Reliability-centred Maintenance II. It explains how the period for condition monitoring should be determined... Yes, much of the best information comes from experienced artisans, but the information does need to be applied correctly.

Stephen Young
The Asset Partnership

From: "Peter Ball"


I understand the direction you are coming from. Appendix 4: Condition Monitoring Techniques in RCM II, Second Edition, by John Moubray describes the P-F interval quite nicely. However, it is still only the 'bathtub curve' turned upside down in an attempt to provide a better understanding of P-F intervals.

For those who need the basic model details, as a negative exponential distribution for mean time to failure (MTTF), which is what I proposed earlier, a very well presented text is Maintenance, Replacement and Reliability by Professor Andrew Jardine. The book is published by Sir Isaac Pitman in the UK, USA & Canada, ISBN 0 273 31654 0, and the cost is quite reasonable; or at least mine was in 1993.

This could well provide Ken Bates with the depth of knowledge needed to convince his management, as cost of inspections is considered in the MTTF model. It could perhaps be of use to Matthew Holmes, also.

Oh yes! take comfort in the knowledge that ALL models are flawed in some way; some more than others. Just don't pitch too heavily for one above all others. Other techniques which may mitigate some of the associated risks are: FMECA, Pareto, Weibull, and LCC.

Best regards,
Peter Ball

From: "Steve Turner"


How is the PF Curve the Bathtub curve upside down?

Steve Turner

From: "Shannon Hood"

The PF curve has about as much to do with the S bend as the bathtub curve!


From: "Peter Ball"

How about a U bend?


From: "Peter Ball"

Err Hum!

How about if you turn it 'bottom-side up' instead?

Both commence at infancy, and progress through life with increasing decay, culminating in ultimate failure (death even).

Substitute inspection periods with CM tests (non-invasive), and hope you are good enough to detect an impending failure. If you get an adverse report, then shorten the interval. If all appears to be going well within the limits of acceptability, then extend the interval.
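Peter's adjust-as-you-go approach can be sketched as a small interval controller. A minimal sketch, assuming hypothetical tuning values: halve the interval after an adverse result, stretch it by 25% after an acceptable one, and keep it between a fixed floor and ceiling. In practice the ceiling would be set from the P-F interval discussed elsewhere in this thread.

```python
def next_interval(current_days: float, adverse: bool,
                  floor_days: float = 7.0, ceiling_days: float = 91.0,
                  shrink: float = 0.5, grow: float = 1.25) -> float:
    """Shorten the interval after an adverse CM result; extend it cautiously otherwise."""
    proposed = current_days * (shrink if adverse else grow)
    return min(max(proposed, floor_days), ceiling_days)  # clamp to [floor_days, ceiling_days]


interval = 28.0
for adverse in (False, False, True, False):  # a hypothetical run of CM results
    interval = next_interval(interval, adverse)
    print(f"adverse={adverse}: next check in {interval:.1f} days")
```

The asymmetry (cut hard, extend gently) is a design choice for this sketch, reflecting that a missed failure usually costs more than an extra inspection.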

Naturally you will not be doing this if the item is not Critical, as pragmatic management will not like paying for something that is not really necessary in their view.


From: "Stephen Young"


Interesting thought about the inverted bathtub curve. When inverted, the later part of the bathtub curve could LOOK the same as the PF curve, but the bathtub curve illustrates an increasing PROBABILITY of failure with age, while the PF curve defines HOW LONG a potential failure will take to become a total failure. Two quite different animals, and not related.

Stephen Young
The Asset Partnership

From: "Peter Ball"


With due respect .... I beg to differ, perhaps!

Both curves can look alike, upside down or the reverse. It is only the words attached to them that may vary. To my way of thinking there is no valid reason why a point P (potential) cannot be imposed on the bathtub curve, and in reality it often is. It is 'Lead Time To Failure' we are monitoring, and both curves (or animals) can accommodate this factor.

As this interesting discussion has developed from an initial query concerning Condition Monitoring frequency setting, it may be of some interest to close Moubray's RCM II book and open Patton's Maintainability and Maintenance Management book to page 197 (in the 1980 edition), where a Typical Reliability (bathtub) Curve incorporates a reference to "Monitor Condition Closely" during the Optimum Operation period (usually associated with random failures). Enjoy.

Peter Ball

From: "Shannon Hood"

I'm afraid I'm with Stephen on this one (as much as it pains me to admit it!)

The PF interval is the time between a change in some parameter away from the 'norm', indicating the point of commencement of decay, and the failure itself. This is extremely useful (if not essential) for good condition monitoring. For example, the PF interval for a bearing in a particular application may be 6 weeks. In the first week, the change may be so infinitesimally small it cannot be detected by any means. In week 2, a small change in vibration may be detectable if an accelerometer were used. Into week 3 (3 weeks prior to failure), an increase in metal content may be noticeable if an oil sample were taken. In week 4 the bearing housing may be getting noticeably warmer; by week 5 the operator may notice a funny smell; and by week 5 and 6.5 days there's a machine making big rattling sounds and about to go... BANG!

One can see the relative importance of the various CM techniques, and why understanding the PF curve is important in deciding which technique to use and what the frequencies should be. Using the above example, the time from a noticeable deviation in the accelerometer reading to BANG is 5 weeks. Therefore a CM interval of 4 weeks would theoretically capture any imminent failure.
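Shannon's arithmetic generalises. If inspections are T weeks apart and a technique's P-F interval is PF weeks (with T no longer than PF), a fault is caught at most T weeks after it becomes detectable, leaving at least PF - T weeks of warning. A minimal sketch, assuming detection is certain once the fault is detectable; the function names and the 2-week repair lead time are hypothetical, and the half-P-F rule of thumb that Stephen Young recommends later in the thread is printed for comparison.

```python
def worst_case_warning(pf_interval: float, inspection_interval: float) -> float:
    """Least warning left after detection; zero or negative means no guaranteed warning."""
    return pf_interval - inspection_interval


def max_interval(pf_interval: float, lead_time_needed: float) -> float:
    """Longest inspection interval that still leaves lead_time_needed to plan and act."""
    if lead_time_needed >= pf_interval:
        raise ValueError("this technique gives too little warning for the lead time needed")
    return pf_interval - lead_time_needed


pf_vibration = 5.0  # weeks, accelerometer deviation to BANG in the example above
print(worst_case_warning(pf_vibration, 4.0))  # 1.0 week of guaranteed warning
print(max_interval(pf_vibration, 2.0))        # 3.0 weeks if a repair takes 2 weeks to organise
print(pf_vibration / 2)                       # 2.5 weeks under the half-P-F rule
```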

The bathtub curve is completely unrelated and attempts to look at the actual attrition (or probability of attrition if the sample number is used as a denominator). The bathtub says (say) within the first week of commissioning, 10% of items will fail, in the next week 3% will fail and in weeks 3-77 1% of items will fail, then 3% will fail in week 78 and those that still live will probably die in week 79. Imagine we're in week 20 and an item deviates in its performance away from 'the norm'. The time it takes to go BANG (PF interval) may be nanoseconds (in which case it will appear in the bathtub curve in week 20) or the time it takes to go BANG may be 8 weeks, in which case it will be part of the group that appears on the bathtub curve in week 28.
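Shannon's point that the bathtub describes a population can be made explicit by converting the example's attrition figures into a hazard (conditional failure) rate. A minimal sketch, assuming the percentages are fractions of the original population, that '1% in weeks 3-77' means 1% in each of those weeks, and that all survivors fail in week 79; the function names are hypothetical. Each week's rate is the fraction failing that week divided by the fraction still alive at its start.

```python
def weekly_failure_fractions() -> dict[int, float]:
    """Fraction of the ORIGINAL population failing in each week, per the assumed reading."""
    fractions = {1: 0.10, 2: 0.03}
    fractions.update({week: 0.01 for week in range(3, 78)})  # weeks 3-77 inclusive
    fractions[78] = 0.03
    fractions[79] = 1.0 - sum(fractions.values())  # every remaining survivor fails in week 79
    return fractions


def hazard_rates(fractions: dict[int, float]) -> dict[int, float]:
    """Conditional failure rate: failures in a week divided by survivors at the start of it."""
    surviving = 1.0
    rates = {}
    for week in sorted(fractions):
        rates[week] = fractions[week] / surviving
        surviving -= fractions[week]
    return rates


rates = hazard_rates(weekly_failure_fractions())
for week in (1, 2, 3, 40, 77, 78, 79):
    print(f"week {week:2d}: {rates[week]:.3f} of survivors fail")
```

Under this reading a steady 1% of the original population is a growing share of a shrinking group of survivors, so the conditional rate creeps up through weeks 3-77. How the percentages are stated changes the shape of the curve, but in every case it describes the population rather than any one bearing.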

It must be realised that the PF curve describes an individual component in a specific application. Put a new component in the same application, or the same component in a new application, and the curve changes! A common mistake with the bathtub curve is that people believe it describes a specific component. IT DOES NOT. That is, it is not describing a bearing 'wearING in', then operating normally, then deteriorating. INSTEAD it is describing the failure probability of a population of identical components.

Except for their geometric appearance, the PF curve (when flipped upside down or the right way up) has no REAL relationship to the bathtub whatsoever.


From: "TIPS from Joseph"

Best FREQUENCY/Interval is addressed by RELCODE.


From: "Peter Ball"

Seems that my innocent little statement concerning CM frequencies has produced some extraordinarily useful information. Shannon's remark about wanky technologies is indeed very relevant, and has been addressed to a certain degree by the recent publication of the SAE Standard JA1011 for RCM. 'Let the buyer beware'. I note that Aladon UK are now stating that their software RCM Toolkit fully conforms to this new standard.

I have considerable reservations as to the real ability of the average maintenance tradesman / artisan to provide significant guidance on the issue of CM frequencies. I would even go as far as to suggest that you could ask 100 different tradesmen the question, and receive close to 100 completely differing responses.

Regards to all,
Peter Ball

From: "Michael Doolan"


Perhaps that would be true if you asked 100 different tradesmen, but all the suggestions from those people would be just as relevant to your particular problem or query, just seen from 100 different perspectives and experience bases.

Give your tradesmen a little credit... they're the ones actually doing the work on this equipment, and they see or hear the changes in the performance of that plant on a day-to-day basis.

Unlike "most" engineers, the tradesmen get a feel for the performance of the particular machinery they come into frequent contact with, and as such get a better understanding of its particular characteristics, as each machine tends to have its own "sounds, hums, or temperatures".

Even though you may have 100 of the exact same pump, for instance, many will behave slightly differently (marginal differences in flow or pressure, perhaps even temperature); these are things most engineers don't understand, because you're so far removed from that environment.

Your tradesmen will know there's a problem with particular plant, especially if the rebuilds/overhauls tend to be more frequent than should be necessary. Some may not know a particular product or modification to remedy the particular fault, but that's where an engineer who "listens" comes into the equation: your knowledge of current technology is the base from which they can draw.

Be open to suggestions from your trades base, they have to be open to your orders!

One thing I have noticed over the last 20 years in engineering is that Communication between Engineers and the shop floor Tradesmen is Critical; unfortunately it is most often ignored or overlooked by the management team of that business. Sad but True.

Michael Doolan
Specialist Maintenance Tradesman

From: "Steve Turner"

Agreed completely, but I'm sure Peter's comments were not intended that way.


From: "Steve Turner"

I heard on the grape vine that the SAE Standard for RCM was not well received at the Society for Maintenance and Reliability Professionals conference recently and is heading back for another go. Is this true? Can anyone confirm this?

By the way, I would not agree with the statement that 100 tradespeople would provide different answers to the question of rates of decay or wear etc. Obviously we need to ask the right people: trades folk will know about bearings because every time a bearing starts to get noisy, the management asks them how long they have got to run. It's a bit like asking a taxi driver how long his diff, or perhaps his wheel bearing, will last with that noise... they seem to know precisely, because so many taxis have these noises.

To determine the rate of crack growth, we may need to be a bit more scientific. I'm glad that there are specialists who do this for aircraft, 'cos I do a lot of flying.

In industrial plant, we don't need absolute precision, just orders of magnitude, and in practice we tend to err on the side of conservatism anyway.

Steve Turner

From: "Peter Ball"


I have checked out your 'grape vine' comment regarding SAE JA1011, with the Standard committee chairman Dana Netherton, who advises that there was one (1) hostile vendor present at the SMRP Conference. He states, Quote "There is certainly no intention to rewrite JA1011. (However much that vendor, or other noncompliant vendors, may wish that it would get rewritten.)" Unquote.

Hope that this will clear the air on this issue. Thanks.

Peter Ball

From: "Ray Beebe"

In the early 1970s, when applying routine condition monitoring by vibration analysis to a new fossil-fired power plant, we decided that for plant auxiliaries (pumps, coal mills, fans) that we would take data on the basis of service hours for each individual item of plant. The service hours were read weekly by operations staff, and reported on log sheets, so that information was readily available.

As much of this type of plant was spared, on any given day, some items were not in operation, but were on standby, or on maintenance.

We found when walking around the plant that it was more trouble than it was worth to select only the nominated items. The extra time to test all that were operating was minimal. We therefore decided that the practical way was to measure all items of a type on a calendar-time basis. Some would be sampled more than others, but the reduced complexity balanced this. Therefore, monthly became the usual interval (and still is).
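The two ways of choosing what to test on a round, as Ray describes them, can be contrasted in a few lines. A minimal sketch with hypothetical plant items, hours and threshold (none of these figures come from the post): the service-hours rule needs each item's hours since its last test and selects only those past a threshold, while the calendar rule simply tests everything running on the day.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    running: bool            # in service on the day of the round (not standby or on maintenance)
    hours_since_test: float  # service hours logged since this item was last tested


def due_by_service_hours(items: list[Item], threshold_hours: float) -> list[str]:
    """Running items whose service hours since their last test have reached the threshold."""
    return [i.name for i in items if i.running and i.hours_since_test >= threshold_hours]


def due_by_calendar(items: list[Item]) -> list[str]:
    """Every item that happens to be running when the monthly round is done."""
    return [i.name for i in items if i.running]


plant = [
    Item("feed pump A", True, 700),
    Item("feed pump B", False, 120),  # on standby
    Item("coal mill 1", True, 350),
    Item("ID fan", True, 720),
]
print(due_by_service_hours(plant, threshold_hours=672))  # 672 h = 4 weeks of continuous running
print(due_by_calendar(plant))
```

The calendar rule needs no per-item bookkeeping at all, which is the reduced complexity Ray traded against some items being sampled more often than strictly necessary.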

From: Trevor Hislop

Of course you could ask 100 graduate engineers the same question and get either no answers, or 200 different answers!!

Trevor Hislop

From: "Stephen Young"


A minor correction if I might....

Aladon state that the RCM II process fully complies with the SAE JA 1011 standard. RCM Toolkit is the supporting data handling tool for RCM facilitators. The decisions are all made by the RCM analysis team and recorded in Toolkit. I would hate to think readers of your note gained the impression RCM Toolkit was yet another magic box solution.


Stephen Young
The Asset Partnership

From: "Ken Bates"

Hey Everyone,

This discussion has been really interesting. My department currently reads data every 5 weeks, but the recommendation has been made to go to every 3 months.

Can anyone out there help me to provide my management with the appropriate reasons, and supporting links and documents, that show why this doesn't supply the support that they need? The reason for the suggestion is to save the labor cost of reading the data. Any help would be extremely appreciated.

Ken Bates

From: "Shannon Hood"

What if every three months does supply the support they want?

I've no idea of your plant, so here are some high-level suggestions:

Look back through your trends and see if you can approximately quantify the PF interval. You may see you have quite a long PF interval and may be able to extend the intervals on these machines.
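Shannon's first suggestion, reading an approximate P-F interval off existing trends, can be sketched as follows. A minimal sketch, assuming hypothetical data: dated readings for one machine up to a known failure date, and an alert level taken as the point at which the deterioration counts as detectable (the 'P' point). The estimate is the time from the first reading at or above that level to the failure.

```python
from datetime import date
from typing import Optional


def estimate_pf_days(readings: list[tuple[date, float]], alert_level: float,
                     failure_date: date) -> Optional[int]:
    """Days from the first reading at or above alert_level to the failure, or None if never crossed."""
    for when, value in sorted(readings):
        if when > failure_date:
            break
        if value >= alert_level:
            return (failure_date - when).days
    return None


# Hypothetical 5-weekly vibration readings (mm/s) for one pump that failed on 1 May 1999.
history = [
    (date(1999, 1, 4), 2.1),
    (date(1999, 2, 8), 2.3),
    (date(1999, 3, 15), 4.8),
    (date(1999, 4, 19), 7.9),
]
print(estimate_pf_days(history, alert_level=4.0, failure_date=date(1999, 5, 1)))  # 47
```

Because the readings are 5 weeks apart, the true P point lies somewhere in the preceding gap, so this figure can understate the P-F interval by up to one reading interval; that errs on the safe side.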

However, if you notice that you have had a few close calls (or even unexpected failures of machines that have been undergoing vibration monitoring), then you may want to dig your heels in on these and retain (or shorten) the frequencies.

This is a classic case of why condition monitoring IS dependent on machine criticality. From a 'technical' point of view, criticality should make no difference, but most of us have limited budgets and are driven by bottom-line requirements, so in real life criticality does make a difference. Obviously, if those above are absolutely insistent that you extend the interval, I'd offer a compromise, suggesting that you will yield on some machines because they're not as critical as this other list of machines that are very critical. Criticality depends on a complicated interaction of direct cost to repair, OHS/environmental issues and production impact, to name but a few.

I often think we maintenance practitioners should be a bit more experimental, so another suggestion would be to undertake a bit of R&D. If (say) you have two pumps in a duty/standby situation, why not change the frequency on one and not the other? Provided you regularly switch between the pumps (say monthly), you'll eventually notice a detrimental change if the extended interval is wrong. An important note on this suggestion: if you come back with it and outline that it's going to take at least 12 months to get some decent data, you've probably bought some time while being seen to respond to their need in a positive way!


From: "Stephen Young"


The frequency of CM should be based on the PF interval and nothing else. The decision to CM or not is then an evaluation of the consequences of failure and the cost of conducting the CM.

Varying the CM frequency arbitrarily to reduce cost fails to appreciate the process of failure.

It could be that your arbitrary CM period of 5 weeks is too frequent for some items and not frequent enough for others. The correct frequency can only be determined by identifying the failure modes that might affect that item of equipment, determining the PF interval for each failure mode, and gearing your CM to half the PF interval.
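Stephen's procedure (find the P-F interval for each failure mode, then inspect at half of it) can be sketched as below. The failure modes and P-F figures are hypothetical placeholders. Where a single task covers several modes, this sketch lets the shortest resulting interval govern, which matches Ber van Loon's "fastest known failure mode" guidance later in the thread but is an assumption about how the modes are grouped.

```python
# Hypothetical P-F intervals (weeks) for the failure modes one vibration task covers on a pump.
PF_WEEKS = {
    "bearing inner-race spalling": 10.0,
    "misalignment": 16.0,
    "impeller imbalance": 12.0,
}


def task_intervals(pf_weeks: dict[str, float], fraction: float = 0.5) -> dict[str, float]:
    """Inspection interval per failure mode, as a fraction (default one half) of its P-F interval."""
    return {mode: pf * fraction for mode, pf in pf_weeks.items()}


intervals = task_intervals(PF_WEEKS)
governing_mode = min(intervals, key=intervals.get)  # the fastest-developing mode sets the pace
print(f"Inspect every {intervals[governing_mode]:g} weeks, governed by {governing_mode}")
```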

Your managers need to appreciate that maintenance is a valid and very effective risk management tool when it is based on a sound and defensible logic. Using guess work is not a defensible strategy.

Kind regards

Stephen Young

From: "Ber van Loon"

The objective of condition monitoring is to predict upcoming maintenance need by monitoring condition indicators.

To minimize downtime, the desired monitoring interval should be a fraction of the time in which the fastest known failure mode develops from "not measurable" to "significant defect amplitude level".

Sometimes we're lucky, when historical data is available that can be used to analyze the occurrence and development speed of different failure modes. Tip for those who are desperately seeking a numerical solution: take a good look at the standard deviation of TBF.
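One reading of Ber's tip (our interpretation, not necessarily his): for purely random failures, with exponentially distributed time between failures, the standard deviation of TBF is about equal to the mean, giving a coefficient of variation (CV) near 1, while failures clustered around a characteristic life give a CV well below 1. A minimal sketch with hypothetical TBF data in days:

```python
import statistics


def tbf_summary(tbf_days: list[float]) -> tuple[float, float, float]:
    """Mean, sample standard deviation and coefficient of variation of time between failures."""
    mean = statistics.mean(tbf_days)
    sd = statistics.stdev(tbf_days)
    return mean, sd, sd / mean


mean, sd, cv = tbf_summary([410, 385, 450, 398, 430, 420])  # hypothetical history
print(f"mean = {mean:.0f} d, sd = {sd:.0f} d, CV = {cv:.2f}")
```

A low CV, as in this made-up history, points towards age-related failure; with only a handful of failures the CV is noisy, so treat it as a hint rather than a verdict.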

When you've invented the ultimate monitoring interval: don't forget to ask a specialist about his thoughts on this.

Condition monitoring should not be regarded as a process which can be managed from behind a desk. It's a "bottom-up" process performed by specialists.

Ber van Loon
Uptime! Condition Monitoring

From: Graham Oliver

Another subscriber has already, kindly, gotten the word out that our software RELCODE might be the answer to the determination of maintenance frequencies. Might we add that, if one is into Condition-Based Maintenance, our product EXAKT would be worth a look as well.

For the record, we would have grave misgivings if maintenance people were to look for, and possibly find, "standard" frequencies that they could apply. The number of variables that affect the answer is actually immense: machinery type, speed at which it is run, operator skills, operating environment, product being produced, and so on.

The only way, we think, to get reliable maintenance frequencies is to input one's own data which would be totally pertinent to your machinery working in your environment, and so on.

To look at RELCODE and EXAKT go to

Regards...Graham Oliver
Oliver Interactive, Inc.

From: "George English"

There are many factitious gurus out there - after close scrutiny of their "philosophy", background and credentials - G-d forbid you should ever adhere to their "gospel". This applies especially to compressed air technology and applications.

Just a comment, George English

From: Trevor Hislop

Thank you George English for quoting some down-to-earth comments in relation to this "discussion". In over 40 years in the Maintenance Management business around the world, I doubt if I have ever heard such a wide range of input, from good sensible "talk to the trades-people" through to some absolute "cloud 9" waffle from "ivory tower" experts!

Trevor Hislop
Ebony Associates

From: "Peter Ball"

Shannon & Stephen,

Can each or either of you provide academically accepted references to support your stated view that the P-F curve provides a satisfactory basis for calculating CM frequency, one that cannot be resolved with the universally accepted bathtub (or hazard rate) curve?

Even a few photocopies to my Fax No. would be appreciated, as most of the model criteria would be difficult to enter on E-mail, unless of course you have the details in attachment form suitable for Word 97, or pdf files, even. Thanks guys.


From: "Stephen Young"


I am in NZ at the moment but will arrange something when I get back to the office late this week.

From: "Shannon Hood"


In terms of the "monitor condition closely" quote (as applied to the random failure zone), I ask the next question of How? And whether the answer is through vibration analysis or oil sampling or waiting for a nasty grinding sound (all of which are quite legitimate strategies), then the next obvious question is How often? If I have a bathtub for two identical bearings, one of which goes from shake to bang in hours and the other is in a particular application that may cause it to go from shake to bang in weeks, how does the bathtub help me make the How often decision? This is where PF can help.

I'm certainly one for sound references and I wouldn't trust my inane ramblings! However, one of the problems with maintenace engineering is the fact that it is a relatively new discipline and very little formal academic research has been done until (relatively) recently. Compare the distinct discipline to (say) the discipline of organic chemistry or thermodynamics and you'll understand where I'm coming from. Add to this the fact that while people have always been able to place the screwdriver between the ear and bearing housing to check for excessive vibration, the maintenace application of some of the other technologies are also only relatively new. Personally, its these unchartered waters that I find so exciting about maintenace engineering.

There is always a tragic lag between leading (perhaps bleeding) edge (formal and informal) research and academic acceptance, let alone adoption into university curiculum to drive the production of texts. It was only four years ago that I took a subject in maintenace engineering at a university where we analysed the bathtub curve, did the Wiebul thing etc etc. When I asked about the research done in the aerospace industry that discovered 6 failure curves (not the one bathtub), I was met with a blank stare from my lecturer! Perhaps the application of the bathtub when the world was typically more 'machanical' is less relevant now that we must deal with the reliability issues associated with failures of 'non-mechanical' stuff.

Unfortunately, I am on site at the moment and have all my texts at home, but I will be certain to pass on references for the material you seek to back up this idea. However, I note your reference is from 1980. Assuming it took a couple of years from concept to publishing, the content of that text is approximately 25 years old. How much have ideas changed in 5 years (let alone 25)? I would caution against academic texts ever being seen as gospel, particularly those older than 5-10 years. When I read the phrase 'universal acceptance', part of me hears 'tried and tested' but another part of me hears 'tired and testing'. I believe it is forums such as this, and the papers offered at the numerous conferences and in periodicals, that provide the latest information. I worry about the organisation that is willing to wait for universal acceptance of a concept by the academic community before sizing it up to see if it fits its competitive needs.

I already have one thesis in the academic world, and by the end of this year will have inflicted more guff in the form of a second (Masters) thesis on the Optimisation of Maintenance strategy using financial data. However, just because some university supervisor has given it the nod does not make it any more relevant (or useful) than some of the ideas that linger in the brains of other maintenance practitioners. I guess I'm sceptical about academically accepted documents because I know how easy they are to fudge, and how out of date most of academia is with the rest of the known universe. There are exceptions, but I believe that is the rule.

I note that you are based in Australia and I'd recommend attending Mainstream next year if you want to hear a good mix of academic research and wins and sins of various sites.

Will be in touch next week with the 'academic' guff, for what it's worth.


From: "Andrew G Starr"

I read with interest the exchange on inspection frequencies. The reference to Jardine is of course an old book but a seminal work. Andrew Jardine was here on Friday, and he's still a firm advocate of modelling risk of failure based on a combined run-time/wear-out model, fed with rich failure data. He markets some software incorporating the model.

The problem for most users, perhaps, is the lack of good data. The wear-out probability distribution is usually modelled as Gaussian, or as Weibull with a large beta. But like any model, it needs plenty of data to fit! The difficult question, posed by all new users, is how to choose a sensible measurement interval that will prevent most failures. Using the Gaussian or normal distribution (simpler to explain than Weibull), you need sufficient failure data for a good estimate of the mean time between failures (MTBF) and its standard deviation. You probably need about ten failures to get acceptable confidence in the estimate.
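A minimal sketch of the normal-distribution approach described above, using Python's standard `statistics.NormalDist`. The ten failure times are invented for illustration, and the 5% "acceptable risk" is an assumption, not a recommended value. The standard error shows why about ten failures are needed: the uncertainty in the MTBF estimate shrinks only with the square root of the number of failures.

```python
from statistics import NormalDist

# Hypothetical run-to-failure times (hours) for ten identical components.
failure_times = [4100, 4550, 4800, 5000, 5150, 5300, 5450, 5700, 6000, 6350]

dist = NormalDist.from_samples(failure_times)  # sample mean and sample std dev
mtbf, sd = dist.mean, dist.stdev
std_error = sd / len(failure_times) ** 0.5      # uncertainty in the MTBF estimate itself

# Run time by which an assumed 5% of components are expected to have failed.
acceptable_risk = 0.05
t_risk = dist.inv_cdf(acceptable_risk)

print(f"MTBF ~ {mtbf:.0f} h, sd ~ {sd:.0f} h, std error of MTBF ~ {std_error:.0f} h")
print(f"{acceptable_risk:.0%} of failures expected by ~{t_risk:.0f} h")
```

This gives a time at risk rather than a monitoring interval; how often to measure before that point still depends on the P-F interval.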

The next problem is how to get a good estimate of the P-F interval. This is hard enough in the laboratory, where we can control the degradation of a component, but very difficult in the field because of the number of variables. Because of these limitations, most users and consultants use rules of thumb for initial intervals (e.g. accepted P-F intervals for the major techniques), followed by fine tuning when data becomes available. Clearly, the logistics of lugging equipment round the plant also dictate the fine tuning to some degree.
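One possible way to express "rule of thumb first, fine tune later" in code, as a sketch under stated assumptions: the 60-day starting P-F is a placeholder, not an accepted value for any technique; once real P-F observations exist, the shortest one is used (a conservative choice); and the interval is rounded down to whole route cycles to reflect the logistics of the monitoring round. The function names are hypothetical.

```python
import math
from typing import Sequence


def route_interval_days(pf_days: float, route_period_days: int = 7, fraction: float = 0.5) -> int:
    """Largest whole number of route cycles not exceeding fraction * P-F (fraction assumed)."""
    target = fraction * pf_days
    cycles = math.floor(target / route_period_days)
    if cycles < 1:
        raise ValueError("P-F too short for this route; consider on-line monitoring")
    return cycles * route_period_days


def refine_pf(initial_pf_days: float, observed_pf_days: Sequence[float]) -> float:
    """Use the shortest observed P-F once field data exists, else the rule of thumb."""
    return min(observed_pf_days) if observed_pf_days else initial_pf_days


initial = route_interval_days(refine_pf(60, []))           # placeholder rule of thumb
tuned = route_interval_days(refine_pf(60, [45, 38, 52]))  # hypothetical field observations
print(initial, tuned)
```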

The best strategies are flexible - this is certainly true of Jardine's method, and is also embedded in other software philosophies (e.g. Wolfson's MIMIC): increase measurement frequency when a monitoring threshold is exceeded. On this basis, all users would expect to gather too much data at first, before refining parameters and frequency, and then only increasing frequency when necessary.
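A minimal sketch of a threshold-driven flexible schedule: halve the interval whenever a reading reaches the alert level and relax back towards the baseline when readings are healthy again. The halving/doubling factors, the vibration units and the name `AdaptiveSchedule` are assumptions for illustration; this is not the actual logic of Jardine's software or MIMIC.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSchedule:
    baseline_days: float      # normal interval while readings are healthy
    min_days: float           # shortest interval the monitoring round can support
    alert_level: float        # e.g. overall vibration velocity in mm/s (assumed)
    interval_days: float = 0.0

    def __post_init__(self) -> None:
        self.interval_days = self.baseline_days

    def update(self, reading: float) -> float:
        """Halve the interval after an alert reading; otherwise relax towards baseline."""
        if reading >= self.alert_level:
            self.interval_days = max(self.min_days, self.interval_days / 2)
        else:
            self.interval_days = min(self.baseline_days, self.interval_days * 2)
        return self.interval_days


sched = AdaptiveSchedule(baseline_days=28.0, min_days=7.0, alert_level=4.5)
history = [sched.update(r) for r in [2.1, 4.8, 5.2, 5.5, 3.0]]  # hypothetical readings
print(history)
```

Relaxing after a single healthy reading is a deliberately simple choice; a real scheme might require several consecutive good readings first.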

Andrew Starr
Dr Andrew Starr
Manchester School of Engineering
The University of Manchester

From: "Bill Roos"

It is with keen interest that I followed this extremely interesting discussion, and I would like to add my few pennies' worth. I have found that, in order to be truly pro-active, one needs to maintain such massive volumes of data that it becomes almost impossible to do so without the assistance of a very comprehensive computer-based management system. Monitoring equipment and component condition alone is simply not enough; even on-line monitoring and analysis will not present the total picture required to optimize plant availability and, ultimately, profitability.

Finding that fine balance between reliability and productivity, or understanding the impact of the cost of maintenance as opposed to the cost of plant non-availability, requires a far wider look than just the condition of the individual objects that make up the plant. In a fair-sized refinery there could be as many as 100,000 pieces of equipment, each with an average of approximately 120 components and parts. Managing the configuration and relationships of well over a million objects (100,000 × 120 is some 12 million), tracking their locations and relationship to the process, the similarities in applications as well as commonalities between failures, appears to be an impossible task. Expanding this view to include the events and conditions that preceded failures, vendor and individual influences on maintenance and operating activities, skills and training, production demands, as well as the real-time financial impact, is something that can only be achieved with a comprehensive ERP management system such as SAP.

A rule-based approach to sampling on-line data, contained in a system that does the number crunching, must result in a situation where you can actually have a work-force that is only alerted to pre-defined situations requiring their attention, at the time when it is needed. My belief is that frequencies should not always be set to fixed intervals. In the ideal world, a system should automatically adjust frequencies once a pre-defined rate of deterioration is exceeded, and in some cases even trigger new activities.
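A minimal sketch of a rate-of-deterioration rule, as opposed to a fixed level threshold: fit a least-squares slope to recent readings, shorten the interval when the slope exceeds an alert rate, and raise a new activity when it exceeds a higher escalation rate. The rates, units and the "raise work request" action are hypothetical stand-ins; nothing here represents SAP functionality.

```python
from typing import Sequence, Tuple


def slope_per_day(days: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of readings against time (units per day)."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(days, values))
    return sxy / sxx


def next_action(days: Sequence[float], values: Sequence[float], interval_days: float,
                min_days: float, alert_rate: float, escalate_rate: float) -> Tuple[float, str]:
    """Adjust the interval, or call a new activity, based on the rate of deterioration."""
    rate = slope_per_day(days, values)
    if rate >= escalate_rate:
        return min_days, "raise work request"  # hypothetical new activity
    if rate >= alert_rate:
        return max(min_days, interval_days / 2), "increase frequency"
    return interval_days, "no change"


days = [0, 7, 14, 21]  # weekly readings (hypothetical)
for readings in ([2.0, 2.1, 2.2, 2.3], [2.0, 2.7, 3.4, 4.1], [2.0, 4.0, 6.0, 8.0]):
    print(next_action(days, readings, interval_days=28, min_days=7,
                      alert_rate=0.05, escalate_rate=0.2))
```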

It is not my intention to contradict the valuable comments made by Shannon, Peter and Steve, but merely to add a fresh perspective!

Thank you for a great forum!

Bill Roos

From: "Peter Ball"

The comments from Andrew Starr appear very relevant to this discussion. My interpretation is that you can use the P-F curve and then apply your own conclusions drawn from experience. It is interesting to note the similarity of wording in Moubray's RCM II description of P-F and Wolfson's. I wonder just who invented this 'curve' theory. If it is so good for RCM, why does no-one unconnected with RCM appear to use it? My investigations through the MCM community suggest that they have never heard of it, but they certainly know of, and make use of, the 'bathtub' curve!!!!!

To date we have received no convincing evidence to support using P-F to calculate monitoring intervals for machine condition (vibration & oil analysis). It seems it may have an application in monitoring machine or system performance.

Further argument will be appreciated. Thanks,

Peter Ball.

From: "R. Keith Mobley"

I have followed the exchange of comments on this topic with interest. It seems that most of the responses assume that all machines must fail and that condition monitoring is simply a tool to predict when catastrophic failure will occur. In the almost forty years that I have been using predictive techniques, the goal has always been the same: to extend the useful operating life, reliability and capacity of critical production systems. If the assumption that failure is not preventable were true, I would have been wasting my time. However, the results of our work prove otherwise.

One important reason that few use the P-F curve is that it must be adjusted to the actual application, installation, mode of operation and quality of maintenance of each production system, machine-train or related component. As with most of the other methods used in RCM, the relationship between the theory (or ideal) and the real world is practically non-existent. The same is true for bathtub curves. The duration of the flat, low-probability-of-failure zone is variable: it depends strictly on the installation, the application and especially the mode of operation. For example, when a system is applied, installed, operated and maintained properly, the interval of low probability of degradation, damage or failure on both the bathtub and P-F curves can be extended almost indefinitely. Moreover, the ability of today's condition monitoring equipment to detect the first minor deviation from optimum operating condition permits plant personnel to make minor adjustments or repairs that can further extend the useful life of production systems.

Before establishing monitoring intervals, you must first decide what results you want from your program. If it is simply to predict imminent failure, follow the advice that has been so freely given over the past few weeks. If you want to optimize performance, minimize costs and extend the useful life of your critical production systems, you must base the interval and methods used on a design review of the installed system. This review must include the designed operating envelope of the system (i.e. what was it designed to do?), the actual installation and how it is really being operated. This review will provide the answer to your question.

R. Keith Mobley

From: "Shannon Hood"


I read your comments with interest, and I agree that the link between theory and practice is often tenuous. I think that through various methods we can actually adjust the curve(s), so assuming they're constant is a big mistake. A good example of this was a cable manufacturing site which (as you can imagine) had a huge number of high-speed spinning winders and unwinders. When the cable drums got full they were pretty damn heavy, so in addition to high speed, we had large bending moments hanging off some poor bearings. We were experiencing large numbers of early-life failures, which were simply accepted as a way of life, and all sorts of strategies were in place to deal with them. Not content with this, we took the time to try to understand why there were so many failures, and discovered some pretty poor fitting techniques, along with a 28-year-old press which, when used, actually pressed the bearings in out of alignment! Needless to say, with a bit of training from the bearing provider (incidentally, provided free of charge) and replacement of the press cylinder, we were able to almost eliminate the early-life failures. The point of the story is that I don't believe we should blindly accept the current failure pattern; through other techniques we can actually have an impact on undesirable patterns.

However, I find myself disagreeing with your statement about predictive maintenance being able to extend useful operating life. Perhaps it's a terminology thing, but by my way of thinking predictive maintenance DOES NOT extend useful life. I like to think of it this way:

  1. Use the TPM approaches of defect elimination to eliminate the cause of the failure. For example, if motors are overheating because the cooling fins are filled with dust, then eliminate the source of the dust, improve the ventilation or install extraction fans.
  2. If the source can't be eliminated (or doing so is cost prohibitive), then fall back to PREVENTative maintenance. Preventative maintenance is so named because it attempts to extend the useful life of the equipment (or prevent the failure from happening within a given timeframe). This may be through cleaning off the dust on a regular basis, regular lubrication, arbitrary replacement or whatever. Each of these approaches attempts to postpone the failure.
  3. Once all efforts have been made to eliminate the source AND postpone the failure, if the failure is still so undesirable or 'unforecastable' (from historical records and due to its failure pattern), then we need some PREDICTive maintenance. Predictive maintenance DOES NOT extend component life. For example, regular vibration monitoring of a bearing WILL NOT increase the life of the bearing. It WILL, however, ensure we get maximum use out of the bearing by running it until it is about to go bang (as opposed to arbitrarily replacing a reasonably good bearing). By PREDICTing imminent failure and addressing it at a time of our convenience, we are preventing a breakdown situation, but we are not preventing the failure. Nor are we extending the useful life.
  4. Of course, at the end of the day, if all our efforts to eliminate the cause of the failure, extend the life through preventative maintenance and predict the failure through CM are still unsatisfactory or too costly, there may well be a need for re-design or some form of real-time alarmed monitoring that does not rely on CM inspection intervals. All of this needs to be considered in the light of risk, criticality and cost.
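The ordering of the four steps can be sketched as a simple escalation: work down the list and stop at the first step after which the remaining risk, criticality and cost are judged acceptable. The step labels paraphrase the list; the `acceptable_after` judgements are inputs the analyst would supply, and the function name `plan` is hypothetical.

```python
from typing import Dict, List

# The four steps, in the order described in the list.
STEPS = (
    "eliminate the cause (TPM defect elimination)",
    "postpone the failure (preventative maintenance)",
    "predict the failure (condition monitoring)",
    "re-design or real-time alarmed monitoring",
)


def plan(acceptable_after: Dict[str, bool]) -> List[str]:
    """Return the steps considered, stopping once residual risk is acceptable.

    acceptable_after[step] is True if, with that step and the ones before it
    in place, the remaining risk, criticality and cost are acceptable.
    """
    considered: List[str] = []
    for step in STEPS:
        considered.append(step)
        if acceptable_after.get(step, False):
            break
    return considered


# Example: dust cannot be eliminated and cleaning alone is not enough, but CM is.
print(plan({STEPS[2]: True}))
```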

I'm interested in hearing more of your thoughts.


From: "Keith Mobley"


You did a great job of defining the failure of most predictive maintenance programs. My question to you is: why do you need TPM and other methods to eliminate the root causes of problems or to extend the useful life of equipment? That should be the role of the predictive maintenance program. Used correctly, the predictive technologies provide all of the data needed to accomplish these goals.

A survey that we conducted, in conjunction with Plant Services magazine, indicates that fewer than 3% of the companies using predictive maintenance generate enough benefits to offset the program's cost. In most cases, the reason is that they have limited the program to simple predictions of failure rather than using it as a plant optimization tool. Using predictive technologies combined with other process-related data, we have shown plant personnel how to eliminate problems related to capacity, product quality and reliability. The result has consistently been a return on investment of 100 to 1 or better.

Please do not sell these technologies short. They provide the means to really make a difference in overall plant performance.

Keith Mobley

Copyright 1996-2009, The Plant Maintenance Resource Center. All Rights Reserved.
Revised: Thursday, 08-Oct-2015 11:51:55 AEDT