Plant Maintenance Resource Center
Using Speech Recognition To Increase Maintenance Management Efficiency

Author : Michael J. Dougherty

Advances in speech-based technologies now allow computers to recognize and synthesize speech cost-effectively. At the same time, wireless communications have grown to the point where mobile phones will soon outnumber land-based phones, and the Internet has become a commonplace communication mechanism for businesses. The confluence of these technologies opens interesting opportunities for maintenance management.

Maintenance, by its very nature, is a highly mobile activity. This mobility requirement constrains a craftsman's ability to receive and provide information that can improve productivity, reduce costs, and improve overall management of the maintenance process. Once the worker ventures beyond their wired environment, their options to gain access to information resources diminish.

An additional factor is that maintenance workers, along with other "skilled" practitioners such as physicians, often view computer technology as extraneous to the job at hand. Their affinity for, and exposure to, computer applications are often less than optimal for maintaining and fully utilizing information resources.

Organizations realize that management of the maintenance process can lead to significant cost savings as well as productivity improvements. Trends towards planned and scheduled maintenance programs to improve efficiency and effectiveness have prompted maintenance organizations to deploy computerized maintenance management systems (CMMS). While a CMMS is the backbone for automating the management approach, efficiently capturing data and providing easy access for system users are key success factors for the system.

Speech-enabled Applications

A compelling feature of speech-enabled maintenance management is that users can interact with applications to retrieve information and keep the systems up-to-date via standard or mobile phones. A prototypical dialogue between a maintenance worker and a speech-enabled application would sound something like this:

C: Welcome back Marshall. Your choices are Work Orders, Equipment, Inventory, or More choices.
P: Work orders
C: Update, Create, Work Order Details?
P: Update
C: What is the Work Order number?
P: One zero zero one
C: What would you like to update? Say "Choices" to hear a list of options.
P: Set failure class to pumps
P: Set problem to low volume
P: Set status to approved
C: Are you done with this work order?
P: Yep
C: Anything else?
P: No
C: Goodbye.

In this scenario, Marshall is a maintenance foreman on his way back from inspecting a failed piece of equipment. Rather than wait until he returns from the site, he is able to call in his information to the application and approve the work order for dispatch to a maintenance crew.

A few years ago, a scenario like this would have been highly improbable, but the technology exists today to build solid, fluent speech-enabled applications. The features available in speech technology can adequately deal with the complexity of human speech. For example, when the foreman responds "one zero zero one" (1001) to the work order number question, the speech recognition engine could also have recognized "one thousand and one" or "ten o one".
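To make the idea concrete, here is a minimal Python sketch of how the three spoken forms above could be reduced to the same work order number. It is an illustration only: the function names and word tables are assumptions made for this example, and a commercial recognition engine would normally handle this inside its own number grammar rather than in application code.

```python
# Hypothetical helper: normalizes a few spoken number styles to a digit string.
DIGITS = {"zero": 0, "oh": 0, "o": 0, "one": 1, "two": 2, "three": 3,
          "four": 4, "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SMALL = {**DIGITS, **TEENS, **TENS}


def _cardinal(tokens):
    """Parse 'one thousand and one' style phrases into an integer."""
    total = current = 0
    for tok in tokens:
        if tok == "and":
            continue
        if tok in SMALL:
            current += SMALL[tok]
        elif tok == "hundred":
            current = max(current, 1) * 100
        elif tok == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unrecognized word: {tok!r}")
    return total + current


def _digit_groups(tokens):
    """Parse digit-by-digit or paired styles ('one zero zero one', 'ten o one')."""
    pieces = []
    for i, tok in enumerate(tokens):
        if tok in DIGITS:
            pieces.append(str(DIGITS[tok]))
        elif tok in TEENS:
            pieces.append(str(TEENS[tok]))
        elif tok in TENS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            followed_by_unit = nxt in DIGITS and DIGITS[nxt] != 0
            # 'twenty one' contributes '2' then '1'; a bare 'twenty' contributes '20'.
            pieces.append(str(TENS[tok] // 10) if followed_by_unit else str(TENS[tok]))
        else:
            raise ValueError(f"unrecognized word: {tok!r}")
    return "".join(pieces)


def spoken_to_number(utterance: str) -> str:
    """Return the digit string for a spoken work order number."""
    tokens = utterance.lower().replace("-", " ").split()
    if "thousand" in tokens or "hundred" in tokens:
        return str(_cardinal(tokens))
    return _digit_groups(tokens)
```

With this sketch, "one zero zero one", "one thousand and one", and "ten o one" all come back as "1001", so the application only ever has to look up one form of the work order number.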

Another example of the robustness of current speech technology is a feature known as mixed-initiative dialogs. This allows a user to make a statement such as "set the failure class to pumps". The application can extract the two key pieces of information, "failure class" and "pumps". This is analogous to filling out a form on a computer screen.
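The form-filling analogy can be sketched in a few lines of Python. This is a simplified illustration that works on already-recognized text; the regular expression, the `apply_utterance` and `next_prompt` functions, and the form layout are all assumptions made for this example, not part of any vendor's product.

```python
import re

# Matches commands such as "Set failure class to pumps" or
# "set the failure class to pumps" (pattern is an assumption for this sketch).
COMMAND = re.compile(
    r"^set\s+(?:the\s+)?(?P<field>.+?)\s+to\s+(?P<value>.+)$", re.IGNORECASE
)


def apply_utterance(form: dict, utterance: str) -> bool:
    """Fill one field of the form from a recognized utterance.

    Returns True if a known field was filled, False otherwise.
    """
    match = COMMAND.match(utterance.strip())
    if not match:
        return False
    field = match.group("field").lower()
    if field not in form:
        return False
    form[field] = match.group("value").lower()
    return True


def next_prompt(form: dict):
    """Return the first field still empty, or None when the form is complete."""
    for field, value in form.items():
        if value is None:
            return field
    return None


# The work order form from the sample dialogue, initially empty.
work_order = {"failure class": None, "problem": None, "status": None}
for said in ("Set failure class to pumps",
             "Set problem to low volume",
             "Set status to approved"):
    apply_utterance(work_order, said)
```

The caller decides which field to fill and in what order; the application only prompts (via `next_prompt`) for whatever is still missing.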

Mark Agan is a Director with Professional Services Facility Management of Fort Washington, PA, which provides facility management and maintenance services to hospitals and universities. The company currently uses a standard voice response application to help dispatch services. "Right now the caller has to punch in their location code and the required service using the phone keys. We then dispatch a maintenance worker to the location via a beeper. However, with speech recognition, the staff could say 'water leak in room 407'. The advantages are that the caller can interact in a more natural manner, we can respond to many more types of requests, and we can also load the request directly into our work order database. This would let us do a better job managing work requests."

Elements of a Speech-enabled Application

Users of current MS-Windows or Web-based applications will quickly recognize that speech-enabled applications are in many ways similar to traditional computer applications. The main components of a speech-enabled application include Dialogs, Navigation, Forms/Fields, and Grammars. These compare with the modules, menus, forms/pages, and valid value lists of traditional applications.

Dialogs - These are the main components of a speech application. They group together the similar conversational elements of an application much the way functions are grouped into modules within a traditional computer application. An example of this would be a work order module versus a storeroom module in a CMMS application.

Navigation - Just as users can click on menu items within a computer application, speech-based applications provide a similar audio menu to the caller. The user is then transitioned to another sub-menu or a specific form. An important feature in speech applications is the ability to provide links directly to any portion of the application without traversing the audio menus. Together with "barge-in", which lets the caller speak over a prompt while it is still playing, this allows the caller to jump directly to the desired function. Experienced users can therefore use the system more efficiently.
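As a rough illustration of menu navigation with direct links, the Python sketch below models the audio menus from the sample dialogue as a small table. The menu names come from that dialogue; the shortcut phrase, the path notation, and the `navigate` function are hypothetical and exist only for this example.

```python
# Audio menus taken from the sample dialogue earlier in the article.
MENUS = {
    "main": ["work orders", "equipment", "inventory"],
    "work orders": ["update", "create", "work order details"],
}

# Hypothetical direct link: a phrase that jumps straight to a function
# from anywhere in the application.
SHORTCUTS = {"update work order": "work orders/update"}


def navigate(menu: str, said: str) -> str:
    """Return the caller's new location after saying `said` in `menu`."""
    said = said.lower().strip()
    if said in SHORTCUTS:
        # Direct link: skip the intermediate audio menus entirely.
        return SHORTCUTS[said]
    if said in MENUS.get(menu, []):
        # A choice that is itself a menu opens that sub-menu;
        # otherwise it selects a function within the current menu.
        return said if said in MENUS else f"{menu}/{said}"
    # Not understood: stay where we are so the menu can be replayed.
    return menu
```

A new user would say "Work orders" and then "Update", passing through both menus, while an experienced user could say "update work order" at the first prompt and arrive at the same place in one step.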

Forms/Fields - Speech-based forms serve the same purpose as traditional computer application forms. They provide the user with fields in which data can be entered or retrieved. As mentioned above, mixed-initiative dialogs allow the caller to fill out the form more naturally.

Grammars - The primary role of grammars is to predefine what a user can say and the application can recognize. This is similar to the valid value lists that users pick from when entering data into an application. It should be noted that grammars have a broader context in speech applications, as the user must also specify field names along with their values. In the above example where the speaker said "Set the failure class to pumps", the term "failure class" maps to the field name and the term "pumps" maps to the field value.
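A grammar of this kind can be pictured as a table of field names and their valid values, from which every acceptable phrase can be enumerated. The Python sketch below shows the idea. Only "pumps", "low volume", and "approved" come from the sample dialogue; the other values, and the `phrases` and `is_in_grammar` functions, are placeholders invented for this example. Real engines use their own grammar formats rather than Python dictionaries.

```python
# Field names mapped to valid values. Values other than "pumps",
# "low volume" and "approved" are placeholders for illustration.
GRAMMAR = {
    "failure class": ["pumps", "motors", "valves"],
    "problem": ["low volume", "no flow"],
    "status": ["approved", "closed"],
}


def phrases(grammar):
    """Yield every 'set <field> to <value>' phrase the grammar allows."""
    for field, values in grammar.items():
        for value in values:
            yield f"set {field} to {value}"


def is_in_grammar(utterance: str, grammar=GRAMMAR) -> bool:
    """Check whether an utterance is one the application can recognize."""
    # Treat "set the ..." and "set ..." as the same command and
    # collapse any extra whitespace.
    normalized = " ".join(utterance.lower().replace("set the ", "set ").split())
    return normalized in set(phrases(grammar))
```

Here "Set the failure class to pumps" is accepted, while "set failure class to turbines" is rejected because "turbines" is not in the list of valid values, just as a pick list would refuse an unknown entry.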

Considerations and Limitations

While there are obvious advantages to voice user interfaces (VUI, pronounced vooee), they do have limitations when compared to the more traditional graphical user interfaces (GUI). Some of the limitations of speech-based applications include:

  • People can read much faster than they can speak. For example, listening to a long list of open work orders over the phone is not practical.
  • People have a difficult time remembering what they just heard. Consider what happens when you stop to ask someone for driving directions. Speech applications are not efficient if unfamiliar data needs to be committed to short-term memory or repeated continuously.
  • People can say anything, but computers are limited in scope. Because a conversation between two individuals is unbounded, people will naturally infer that a conversation with a speech-based computer application is open-ended. However, the speech-based application must be carefully constructed to guide a user through the valid options.

Other considerations for the maintenance management world are the noisy environments where maintenance is often performed and the prevalence of two-way radio communications, both of which limit some of the functionality of speech-based applications. Several years ago, Professional Services tried to deploy speech recognition to help inventory the equipment at customer locations. "Some of the areas like boiler rooms were just too noisy," said Mark Agan. However, as the technology improves to filter out background noise, properly designed speech applications will expand their reach.

And while talking over the phone is a task anyone can master, talking to a computer application still presents challenges. Professional Services still sees some of the functions being handled by more sophisticated users. "On some of the data entry activities, you still need to have a knowledge of what the application is supposed to do. You simply cannot hide all of that from the caller or explain it all over the phone."


Some of the major speech-technology software vendors in the market include IBM, Nuance Communications, SpeechWorks International, and Motorola. Information on the ROI advantages related to speech-based applications can be downloaded from their web sites. Another important source of information can be found at

There are a number of technical details related to fully developing and deploying a speech-based application; however, one of the primary technological components, VoiceXML, bears mentioning. Simply put, VoiceXML is an open standard used by developers to build speech-enabled applications. It has evolved out of the same technology architecture that produced HTML, which led to the rapid adoption of the World Wide Web over the Internet.

VoiceXML version 1.0 was released in March 2000. A draft of Version 2.0 was submitted in April 2002. Many speech recognition software vendors already support the changes proposed for Version 2.0. The importance of this technology is that it allows companies to build platform-independent speech-enabled applications and to leverage their existing investments in web-based applications.
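Because VoiceXML is an XML vocabulary, the same web application code that generates HTML pages can generate voice dialogs. The Python sketch below builds a minimal VoiceXML document for the "What is the Work Order number?" step of the sample dialogue, using the standard library's ElementTree module. The form id, the field name, and the `build_work_order_form` function are assumptions made for this example; a real deployment would add grammars, error handling, and a submit target.

```python
import xml.etree.ElementTree as ET


def build_work_order_form() -> str:
    """Return a minimal VoiceXML 2.0 document as a string."""
    vxml = ET.Element(
        "vxml",
        {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"},
    )
    # A <form> groups the fields to be collected, like a screen form.
    form = ET.SubElement(vxml, "form", {"id": "update_work_order"})
    # type="digits" asks the platform to use its built-in digit grammar.
    field = ET.SubElement(
        form, "field", {"name": "work_order_number", "type": "digits"}
    )
    # The <prompt> text is spoken to the caller by the text-to-speech engine.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "What is the Work Order number?"
    return ET.tostring(vxml, encoding="unicode")


document = build_work_order_form()
```

The resulting document would be served to a VoiceXML platform in much the same way an HTML page is served to a browser, which is how existing investments in web infrastructure carry over.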

The other technical components or subsystems that round out the requirements include a network interface; a telephony (i.e., telephone) interface; a text-to-speech (TTS) engine that translates computer text into spoken words; a speech recognition engine that translates spoken words into computer text; and an audio subsystem to record and play back audio files.


Major organizations in the airline, financial service, and transportation markets have successfully deployed speech-based applications to provide higher levels of customer service as well as save millions of dollars in operating costs. Some of the industry segments that could benefit from speech-enabled maintenance management include Utilities, Municipal Transportation Authorities, Food & Beverage, Academic institutions, Medical facilities, and Retail chains.

As organizations look to make their workforces more efficient by implementing a CMMS, they need to address the accessibility of these applications by a mobile workforce. While arming workers with laptop computers, PDAs, and other hand-held devices can surely help, the low cost, ease of use, and ubiquity of voice communications present an alternative that will be hard to ignore.

Organizations that want to ensure that their investments in CMMS applications are fully utilized should begin looking at how speech-enabled maintenance management can extend current applications to workers that are highly mobile and need to stay that way.

About the author: Michael Dougherty is President of Crossbar Solutions and has over 20 years experience developing asset, facility, and maintenance based solutions. He holds a BS in Computer Science/Mathematics from Drexel University and has completed graduate coursework in Industrial Engineering from Penn State University. He can be contacted at

Copyright 1996-2009, The Plant Maintenance Resource Center . All Rights Reserved.
Revised: Thursday, 08-Oct-2015 11:54:46 AEDT