Hazard Prioritization by the Numbers by
John H. Lindorfer CSP, P.E., CM (retired)

ABSTRACT

A revised hazard prioritization method is proposed which enhances the visibility of risks to management and the precision of forecasting the probability of occurrence in the absence of historical failure or problem data. Among its advantages, it provides an absolute priority ranking of hazards by hazard risk, by cost and schedule risk, or by both, either together with or in lieu of a probabilistic risk assessment.

DISCUSSION

One of the many developments which followed the Challenger accident of 1986 was a growing realization within the aerospace community that there was a basic communication difficulty between the safety professionals and engineers who were familiar with the problems encountered in hardware use and the managers who were responsible for that hardware. Safety and management were talking to each other, but it was not certain that each fully understood what the other was saying. To improve the communication of risks to management, NASA established the NSTS Hazard Prioritization Working Group in March of 1987. The working group published a careful and comprehensive description of the problem and recommendations for the priority of effort toward its solution. However, their findings and recommendations have not yet taken full effect as a medium of communication for the safety community.

The purpose of the Working Group was to develop a hazard prioritization technique that would improve the manner in which risks are worked and brought to higher management attention. Their three objectives, improving the evaluation and review process (related to severity and likelihood of occurrence), improving resolution, and improving management decision making, are partly carried out in the instructions for hazard reporting in the current revision of NSTS 22254, "Methodology for Conduct of NSTS Hazard Analyses." The hazard evaluation and safety review process is certainly enhanced by an activity which causes safety personnel to evaluate and review hazards. Also, ranking hazards in order of resolution priority should improve whatever resolution process currently exists. But hazard prioritization, by itself, does not necessarily improve the decision making ability of either NASA or contractor managers unless: 1) it provides sufficient quantitative information for management understanding and analysis, and 2) it provides cost and schedule impact information. This paper suggests techniques to improve the process in both areas.

Initial experience demonstrated that a ranking process in the 3 x 3 format originally used did not serve as an effective discriminator. All of the hazards reported for the major elements of the STS potentially resulted in the loss of personnel or the Shuttle vehicle, but were extremely unlikely due to the imposition of historically effective controls. They therefore all fell in the lower right hand box of the matrix. While each hazard was potentially catastrophic, the risk measuring technique originally used did not provide any means by which hazards could be ranked in order of priority for management attention. This difficulty was partially alleviated by a NASA request for element contractors to present their "top ten" hazards. Although the contractors did present a list of ten hazards, there was no indication that the current hazard prioritization technique had been helpful in assigning precedence, or, indeed, what the basis for that assignment was. This was partly due to the imprecision of the 3 x 3 ranking and partly due to the exhaustive effort which had already been expended by all of the contractors on the resolution of potential hazards. A further problem was the difficulty of assigning realistic likelihoods of occurrence to events which had never happened, and for which no experience data therefore existed. These same problems apply to the 3 x 4 ranking eventually adopted by the NSTS Hazard Prioritization Working Group and incorporated into NSTS 22254. This hazard matrix is the one currently in use and is shown below.

Risk Matrix Test for Agreement Between Closure Classification and Risk

An essentially identical hazard prioritization technique has been used extensively by the military. This technique is called a hazard risk assessment, and is described in Appendix A of MIL-STD-882B, "System Safety Program Requirements," and DN 2E1 of AFSC Design Handbook 1-6, "System Safety." This assessment uses four hazard categories and five frequencies of occurrence to arrive at a 5 x 4 matrix which prioritizes risks by the hazard risk index below.

The techniques presented by the Hazard Prioritization Working Group improved somewhat on the precision of the military system by proposing a 5 x 5 matrix, but the Group eventually decided upon the current 3 x 4 matrix. In both the NASA and DoD prioritization matrices, frequency of occurrence increases from bottom to top, but NASA shows an increase in severity from left to right (the conventional Cartesian X-axis), while the military technique shows an increase in severity from right to left. Neither technique provides precise information on the expected frequency of occurrence or a numerical evaluation of the hazard severity. If the MIL-STD-882B matrix is reversed for compatibility, to show an increase in severity from left to right, a comparison of the NASA and military matrices can be made as follows.

This matrix compares the adjective descriptions of hazard severity and likelihood or frequency of occurrence in the NASA and military matrices. The use of the terms "frequency" and "likelihood" may suggest that the former assumes a number of occurrences, whereas the latter assumes a probability of occurrence of less than one, although similar adjectives are used. The areas of agreement are shaded. There is no NASA equivalent for the DoD definitions of "impossible" frequency of occurrence or "negligible" hazard, which is perhaps why the Hazard Prioritization Working Group omitted them. However, there is as yet no precise means of determining where a given hazard should lie in either of these matrices.

To increase the precision of the prioritization technique, the matrix can be divided by continuous, rather than discrete measurements, both horizontally and vertically. In an attempt to provide some numeric indication of risk, the vertical axis can be arbitrarily assigned a dollar value. Such a matrix is shown below. The potential cost of a hazard ranges from one thousand dollars, which is near the lower limit of concern for the cost of a single mishap involving flight hardware, to ten billion dollars, the order of magnitude of the overall cost of the Challenger accident or the loss of the Mars Observer. The proposed scale is logarithmic. Upper and lower limits are, of course, arbitrary. NASA management may choose to assign different limits based upon its own definitions of acceptable risk. The horizontal and vertical lines do not delineate different boxes; they are merely measurements along each axis. The vertical scale is also logarithmic, ranging from a minimum frequency of occurrence of "one in a million" to unity (absolute certainty) per exposure. The inclusion of exposure as an element of hazard prioritization takes into account the possibility of reducing a necessary risk by reducing the exposure to it. The Air Force (AFSC DH 1-6, DN 2D1) considers a one in a million probability of fatality as a generally acceptable level of individual risk. Although admittedly arbitrary, the estimate of the potential cost of a hazard and the likelihood of incurring that cost provides an expected expenditure per exposure which can be considered by NASA management in making programmatic decisions, including limitation of exposure.
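The expected expenditure per exposure described above is simply the estimated probability of the undesired event multiplied by its estimated dollar cost. A minimal sketch in Python (the function name and range check are illustrative, not part of the proposal):

```python
def expected_cost_per_exposure(probability, cost_dollars):
    """Expected expenditure per exposure: the probability of incurring
    the cost, per exposure, times the estimated dollar cost of the hazard."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability per exposure must be in (0, 1]")
    return probability * cost_dollars

# A hypothetical hazard: a 1-in-10,000 chance per exposure of a $10M mishap.
risk = expected_cost_per_exposure(1e-4, 10_000_000)  # $1,000 per exposure
```

Management can then weigh this dollar figure per exposure directly, including the option of reducing total risk by limiting the number of exposures.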

The assignment of numerical values to a more precise matrix more nearly meets the purposes of improving the evaluation and review process and improving resolution, two of the three objectives of the Hazard Prioritization Working Group. The hazard risk matrices by themselves have difficulty fulfilling these purposes, although the ancillary documentation developed or obtained by the safety analyst during the use of this technique may well do so. However, at present, adequate traceability for the safety analyst to document risk already exists in the hazard analyses and failure modes and effects analyses submitted to NASA by the element contractors. The problem is that these hazard analyses are reviewed mostly by other safety professionals within the NASA organization, not by managers. What is needed is a technique to communicate more effectively to management the safety community's assessment of the risk associated with each reported hazard. The use of probabilities and dollar values presents this information in a quantified form suitable for management review and tracking. The overall risk indicated by the matrix increases from lower left to upper right.

The definition of hazard severity often refers to severity of the possible hazard effects, or the severity of the mishap or failure. There may be fertile opportunity for development of alternate terms to describe hazardous conditions, hazardous events, mishaps, failures, and hazard effects, all of which now are included in the term "hazard." The term "severity" generally refers to the consequences of a mishap or an undesired event. Since the consequences of an event generally can be estimated with respect to the property damaged or destroyed, it seems reasonable to define these consequences in terms of dollars to achieve a common measurement of them all.

The use of dollar values allows a more precise measurement of the risk to property, but cannot be equated to risk to people, either the flight crew or the public. NASA's most severe category is a hazard which could result in a mishap causing fatal injury to personnel, and/or loss of one or more major elements of the flight vehicle or ground facility. The least severe is a hazard which could result in a mishap of a minor nature inflicting first-aid injury to personnel, and/or damage to flight or ground equipment which can be tolerated without abort or repaired without significant program delay. In terms of dollars, it seems reasonable to use the equivalents below, which are compatible with current NASA thinking as expressed in NSTS 22244 and NHB 5300.4 (1D-2). While it is not possible to assign a dollar value to an injury, the definitions below are based on considerations of the damages a court might award for injuries or death for which NASA or a contractor were found liable, which is a reasonable way to quantify the risk to people.

Negligible - Potential for less than $10,000 damage or first-aid injury

Marginal - Potential for recoverable injury or damage from $10,000 to $1,000,000

Critical - Potential for major or permanent injury or damage of $1,000,000 to $100,000,000

Catastrophic - Potential for death or damage greater than $100,000,000
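These dollar bands can be applied mechanically. A short Python sketch using the thresholds given above (the function name is illustrative, and the handling of values exactly at a boundary is an assumption, since the bands as stated overlap at their endpoints):

```python
def severity_category(cost_dollars):
    """Map an estimated dollar consequence onto the proposed severity
    bands. Thresholds follow the dollar equivalents in the text; a value
    exactly at a boundary is assigned to the more severe band."""
    if cost_dollars < 10_000:
        return "Negligible"
    if cost_dollars < 1_000_000:
        return "Marginal"
    if cost_dollars < 100_000_000:
        return "Critical"
    return "Catastrophic"
```

Assigning the boundary value to the more severe band is the conservative choice for a safety analysis.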

Assigning a likelihood of occurrence is a simple matter of statistical analysis if the system has an extensive operational history. Unfortunately, the choices for likelihood of occurrence are somewhat obscure in any ranking system yet proposed where historical problem data are lacking. Part of the problem lies in the definition of the word "hazard," which varies within the safety community. DN 2E1 of AFSC Design Handbook 1-6 defines a hazard as "An existing or potential condition that can result in a mishap." NHB 5300.4 (1D-2) defines a hazard as "the presence of a potential risk situation caused by an unsafe act or condition." MIL-STD-882B defines a hazard as a "condition that is prerequisite to a mishap." Many attempts at quantification appear to equate hazards with mishaps, which are essentially events. There is little specific reference to hazards which are essentially conditions and which are prevented from becoming undesired events by appropriate controls. Such conditions may be impossible to prevent (such as the low temperature of cryogens), and such hazards therefore always occur. Without a distinction between hazards which are conditions and hazards which are events, appropriate appreciation for, and visibility of, the control measures on conditions are difficult to create. As working definitions, therefore, the definitions shown as "causes" are suggested. Note that none of these sources of danger are described by the word "hazard."

The graphic method of ranking produces a two dimensional description and avoids a single subjective assessment of the severity of risk, in which a highly probable minor injury might be ranked the same as unlikely major damage, and a highly probable serious injury the same as unlikely equipment loss. The abrupt transitions from one box to the next favored by both the military and NASA suggest some absolute distinction between the degrees of risk in adjoining boxes. In the proposed matrix, these distinctions are avoided in favor of a continuous gradient of both degree of severity and probability of occurrence. A single assessment of the degree of risk is represented by a vector: the distance from the origin, the lower left corner, to the point on the graph representing each risk. Each hazard may thus be ranked in overall risk in terms of this distance. This provides a quantitative measurement of risk, which is the quantity management is interested in. It also preserves both severity and likelihood of occurrence as observable parameters, leaving the judgement of the risk level at which each level of management should become involved to the discretion of the respective managers.
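The distance-from-origin ranking can be computed directly from the two coordinates. A Python sketch, assuming the axis floors proposed earlier ($1,000 on the cost axis, one in a million on the likelihood axis); the floors and the function name are assumptions the analyst may change:

```python
import math

def risk_distance(probability, cost_dollars, cost_min=1e3, prob_min=1e-6):
    """Distance from the lower left origin of the proposed log-log risk
    matrix; a larger distance means a greater overall risk.  Coordinates
    are measured in decades above each axis floor."""
    x = math.log10(cost_dollars / cost_min)   # decades along the cost axis
    y = math.log10(probability / prob_min)    # decades along the likelihood axis
    return math.hypot(x, y)
```

Ranking hazards then amounts to sorting by this distance. Because both scales are logarithmic, twice the distance does not indicate twice the risk, only a higher relative priority.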

A recurring problem in any management assessment of risk is that subjective judgement based on the cost and/or schedule impact of the corrective action, or perhaps a lack of understanding of the logic of the acceptance rationale, results in a bias in the manager's perception of degrees of severity or probability of occurrence. While cost or schedule impacts are important management criteria, any change to a measurement of either severity or likelihood of occurrence based on some other criterion, however important, may well disguise the true nature of the risk and impair its visibility to management. The purpose of hazard prioritization is the communication of safety concerns which, by themselves, have nothing whatever to do with budget or schedule. A presentation to management by the safety community should be limited to facts and opinions based solely on safety considerations. The resulting decisions, based on cost or schedule impact, should be made by the managers involved, using cost or schedule information supplied by experts in those fields. The most serious risks, in terms of likelihood of occurrence or severity, may well be those which are most costly to resolve, either in terms of funding, schedule impact, or both. Adjusting these criteria may in such a case reduce the management visibility of precisely those risks which require the most management attention. For these reasons, any risk assessment should include not only a quantitative description of the risk of occurrence of an undesired event, but also an estimate of the cost and schedule impacts of preventing that occurrence.

The introduction of cost and schedule criteria into the consideration of when (or whether) a hazard should be resolved suggests that a two-dimensional matrix may not be sufficient for proper management visibility of the overall risk. For this reason, a separate cost/schedule analysis matrix may be used as shown above. The horizontal axis is logarithmic, the same as the cost scale on the hazard risk assessment matrix. Management may, of course, choose a different upper limit. The vertical scale is also logarithmic, representing an overall launch schedule of up to 631 days, a little less than two years. Resolution of the concerns caused by the Apollo 1 fire and the Challenger accident falls within this time scale. A longer resolution period therefore does not seem reasonable, but here, too, management may choose a different scale. Once again, the overall quantitative degree of risk is represented by the distance from the lower left hand origin to the point on the graph representing the risk.

The instructions and graphics beginning on page 8 of the Hazard Prioritization Working Group report assume that a hazard is caused by a failure or event. The reason for this is that the Group focused on fault tree analysis, which, by its nature, is limited to the investigation of faults. Since hazardous conditions can exist independently of faults, the use of fault tree analysis as a generic tool for hazard analysis cannot deal appropriately with other hazards which require as much attention as faults, if not more. On the other hand, fault tree analysis does provide a means by which the relationship between a probable event as cause and an undesired event as effect can be identified for the imposition of necessary controls. However, it does not effectively identify the contribution of a known hazardous condition (indicated by a "house" or "ribbon" in a fault tree) leading to an undesired event, where it may also be appropriate to interpose an effective control. Indeed, if the hazardous condition is necessary for the undesired event, the imposition of an effective control on the condition would be sufficient to prevent the undesired event under all circumstances. The identification of the required nature and location of such a control is difficult, if not impossible, by fault tree analysis alone. For these reasons, the proposed hazard prioritization technique does not require the use of fault tree analysis, although it does not rule out its use as an ancillary technique when justified by the nature of the hazard being analyzed.

As a result of eliminating total reliance on fault tree analysis, new definitions which permit at least a subjective quantification are required. For example, one hazard cause could be "A condition or event which is required for operation of the system" or "A condition or event which can be expected to be encountered in normal operation." Since these hazards are inherent in the design of the system, they are not faults, and fault tree analysis cannot effectively analyze them. The cold temperature of cryogens and the flammability of ET TPS in oxygen are examples of such hazards. These two conditions are included in the list of causes, which may be used as a list of generic hazard causes.

The listing of hazard controls beginning on page 10 of the Working Group report does not include control by inspections or procedures. Also, in each of the given choices, the word "precluded" is used. This suggests that the hazard cannot occur if the controls considered are imposed. Perhaps the word "controlled" is better in certain cases, since there are some hazards (lightning damage, for example) which cannot be precluded within the limits of current technology but which are controlled by various design features. As an alternative to the NASA list, the list of generic hazard controls which follows could be used.

For essentially the same reasons that hazard conditions, in addition to events, should be considered, the term "failure history" might more appropriately be called "problem history." This term would acknowledge that a system might have problems other than failures. Consideration may be given to systems or components in which a potential failure has been identified and resolved by inspection, test, experience in other than critical applications, or tolerance for degraded operation, and to similar conditions other than failures. High failure rates of components do not necessarily imply that such failures are hazardous. A common verification technique is to subject a component to stresses greater, or of longer duration, than those encountered during flight in order to make it fail deliberately. Components that do not fail during such testing sometimes may be used with a high degree of confidence that they will not fail during flight. Failures during verification testing may produce a misleadingly high failure rate, yet excluding such failures from consideration in assessing the failure history produces a history which is also misleading. The degrees of problem history which follow address this consideration.

Item D on page 11 of the Working Group report may be the first acknowledgement in the NASA report that hazards may be conditions. However, the lack of an entry for a hazardous condition which always exists implies either that the hazardous condition is that which is to be precluded or, alternatively, that the condition to be detected is a hazardous effect. In either case, the list does not discriminate between the ability to detect the condition and the likelihood of doing so. For example, a TPS fire on the ET can be detected by television monitors and IR scanners, but if nobody is watching the monitors, the hazard has a low likelihood of detection. It may be preferable to title this list "methods of detecting the onset of the undesired condition," which would allow a wider range of detection choices. Also, the division of time to effect into only three categories may not provide sufficient discrimination for a precise analysis. The expanded time to effect categories which follow are therefore proposed.

Subjective descriptions of failure probability are often unavoidable, even if a rigorous probabilistic risk assessment is available. The problem with assigning mathematical probabilities is twofold. First, a confidence level for the probability must be agreed upon; second, the acceptable reliabilities of components having catastrophic failure modes must be unrealistically high (0.999999 for the one in a million failure). Because the public poorly understands the limitations of probability assessments, probabilities of risk arrived at by a rigorous technique may appear to the average person to be unacceptably high. This may make any realistic assignment of the probability of failure politically or socially unacceptable. Rigorous assignment is, of course, impossible when failure rates have not been obtained by experience. However, the more realistic the estimate of the risk, the more useful it is to the management which must decide whether or not to accept it.

The Hazard Prioritization Working Group addressed this issue by defining an alternate method of determining the probability of occurrence: a numerical rating based upon values assigned to subjective subparameters. This method of defining a value for the likelihood of occurrence is amenable to some improvement. The weighted values of hazard causes, hazard controls, failure history and system maturity, methods of detecting the hazardous condition, and time to effect result in minimum (best case) values of 4, 4, 3, 1, and 1, and maximum (worst case) values of 20, 20, 15, 5, and 5, respectively. This makes the likelihood of occurrence four times as sensitive to changes in hazard causes or hazard controls as to methods of detection and time to end result. The precise definition of what occurrence is being measured or expressed in each case is not absolutely clear.

Assigning values from one to five for all of the subparameters results in the worst case in each category being five times the value of the best case, whatever the weighting factor might be. There does not appear to be any underlying reason why this should be so, as opposed to, say, the worst case for one subparameter being 20 times its best case and the worst case for another being four times its best case. Also, the best case value for the subparameter with the highest weighting factor is four, almost the same as the worst case value for a subparameter with a weighting factor of one. This situation therefore offers some opportunity for improvement.

One possible improvement is to eliminate multiplication by weighting factors. Another is to eliminate the step of dividing by 13. This would allow a range of values from 5 (best case) to 25 (worst case) for the current value assignment. However, there is no overwhelming reason for subparameter value assignments to be restricted to the range from one to five. Assigning values from zero (best case) to a maximum represented by the relative risks of the worst cases would allow a range of values from zero to some maximum (perhaps 60 or so), permitting a more precise assignment of relative values as shown below. Parameters are assigned values according to their contribution to the total risk, and the values are added together to arrive at an assumed frequency of occurrence. The total of these values is 60, which is the number of divisions on the vertical axis of the proposed hazard risk matrix. Values for each of the subparameters are assigned as follows:

a. Causes:

0. A combination of two or more CIL hardware faults or failures, where each is independent of the others.
1. A combination of two or more hardware faults or failures, where each is independent of the others and at least one is a CIL hardware component.
2. A combination of two or more non-CIL hardware faults or failures, where each is independent of the others.
3. Fault or failure of two or more redundant CIL components.
4. Fault or failure of a CIL component which has a non-CIL redundant backup.
5. A combination of two or more CIL hardware faults or failures, where the fault or failure of one component may jeopardize the reliability of the other(s).
6. Two or more dependent failures or faults of CIL hardware.
7. Two or more dependent failures or faults of hardware, at least one of which is not a CIL component.
8. Failure of a single CIL single failure point component.
9. Failure of a single non-CIL single failure point component.
10. Failure of personnel to follow the requirements of a written procedural step which is required to be verified by an inspector.
11. Failure of personnel to follow the requirements of a written procedural step which is not normally verified by an inspector.
12. Failure of the flight crew to take appropriate action not contained in a written procedure.
13. Inadvertent personnel error which is not normally subject to inspection.
14. Any one of several unique or barely credible causes.
15. Any one of several credible causes.
16. A condition or event which can be expected to be encountered in normal operation.
17. A condition or event which is required for operation of the system.
b. Controls:
0. Potential for damage or injury is controlled by two or more hardware controls within the limits of current technology.
1. Potential for damage or injury is controlled by one hardware control within the limits of current technology.
2. Potential for damage or injury is controlled to the extent practical by at least one hardware control or design feature.
3. The undesired condition can be detected and corrected by test, inspection or other means prior to commitment to flight.
4. Potential for damage or injury is controlled by an automatic safety device.
5. The undesired condition is controlled by a manual written procedure initiated by a warning device or signal.
6. The undesired condition is controlled by a procedure.
7. One or more sufficient causes of the undesired condition cannot be or are not controlled.
c. Problem History:
0. The system is mature and is intrinsically safe within the limits of current technology.
1. The system is mature and has no problem or failure history.
2. The system is new but is intrinsically safe within the limits of current technology.
3. The system is mature and has had some problems or failures, all of which have been benign.
4. The system is mature and has had some hazardous problem or failure history, none of which have been serious enough to warrant design, inspection or procedure modification.
5. The system is new and not intrinsically safe, but has had no problem history.
6. The system is based on mature technology, but is so new that a reliability history has not been established.
7. The system employs immature technology and is so new that a reliability history has not been established.
8. The system is mature and has a problem history sufficiently hazardous to require special procedures, controls or inspections.
9. The system is new and has an unfavorable problem history.
10. The system is so new that a reliability history has not been established, but the design is based on technology of questionable reliability.
d. Detection:
0. The undesired condition is known to exist and all remedial actions have already been taken.
1. The undesired condition is known to exist only as the result of conditions which can be reliably predicted.
2. The undesired condition is known to exist only as the result of circumstances which can be predicted with reasonable reliability.
3. All necessary causes of the undesired condition can be detected by two or more independent means normally employed.
4. All necessary causes of the undesired condition can be detected by at least a single means normally employed.
5. At least one necessary cause of the undesired condition can be detected by two or more independent means normally employed.
6. At least one necessary cause of the undesired condition can be detected by a single means normally employed.
7. All necessary causes of the undesired condition can be detected by sensory perception other than vision.
8. All necessary causes of the undesired condition can be detected by sensory perception, and at least one of them can be detected by visual inspection.
9. The existence of the undesired condition is detectable by visual inspection normally carried out prior to launch.
10. The existence of the undesired condition may or may not be detected by means normally employed prior to launch.
11. The existence of the undesired condition cannot be reliably detected.
12. The existence of the undesired condition is not normally detectable.
13. The existence of the undesired condition cannot be detected prior to commitment to launch.
14. The existence of the undesired condition cannot be detected.
e. Time to Effect:
0. Onset of the system anomaly or undesired condition is slow enough to assure detection and correction by minor remedial action.
1. Onset of the system anomaly or undesired condition is slow enough to assure detection and correction by major remedial action.
2. Onset of the system anomaly or undesired condition is slow enough to assure detection and correction by extensive, disruptive remedial action.
3. Onset of the system anomaly or undesired condition is slow enough to allow detection and remedial action, but so rapid that such action cannot be assured without causing a launch delay.
4. Onset of the system anomaly or undesired condition is slow enough to allow detection and remedial action under most circumstances, but so rapid that such action cannot be assured.
5. Onset of the system anomaly or undesired condition is so rapid that complete remedial action is questionable.
6. Onset of the system anomaly or undesired condition is so rapid that complete remedial action is doubtful.
7. Onset of the system anomaly or undesired condition is so rapid that a mission scrub is the only feasible remedial action.
8. Onset of the system anomaly or undesired condition is so rapid that crew safety is jeopardized in spite of a mission scrub.
9. Onset of the system anomaly or undesired condition is so rapid that intact abort is the only feasible remedial action.
10. Onset of the system anomaly or undesired condition is so rapid that the possibility of resolution by intact abort is questionable.
11. Onset of the system anomaly or undesired condition is so rapid that the possibility of resolution by intact abort is doubtful.
12. Onset of the system anomaly or undesired condition is so rapid that any effective remedial action is doubtful.
13. Onset of the system anomaly or undesired condition is so rapid that no remedial action is possible.

On both the proposed hazard risk matrix and the program risk matrix, the distance from the lower right corner to the location representing the hazard provides a quantitative discriminator between degrees of risk. This discriminator is logarithmic, so twice the distance does not indicate twice the degree of risk; the logarithmic scale reflects the fact that higher estimates of cost and time are less precise than lower ones. However, differences in distance do indicate differences in degree of risk, which allows ranking of hazards in relative order of priority. This may be sufficient as an immediate indicator of the risk involved.

If an absolute value of risk is desired, it may be computed by the following formula:

R = 10^(p + c)   or

R = 10^(S/10 + c - 6)   where:

R is the expected cost of the risk in dollars (or equivalent injury) per occurrence,
p is the common (base 10) logarithm of the probability of occurrence per exposure (on the y axis),
c is the common (base 10) logarithm of the estimated cost of the risk per occurrence (on the x axis), and
S is the sum of the subjective indicators listed on pages 12 through 15.
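The two forms of the formula are equivalent, since S/10 - 6 serves as the estimate of p derived from the subjective indicators. A minimal sketch of the computation (function names are illustrative, not from the paper):

```python
def hazard_risk_index(p, c):
    """Expected cost of the risk per occurrence: R = 10^(p + c)."""
    return 10.0 ** (p + c)

def hazard_risk_index_from_score(S, c):
    """Same index with p estimated from the subjective score: R = 10^(S/10 + c - 6)."""
    return 10.0 ** (S / 10 + c - 6)

# Illustrative numbers: a 1-in-100 probability per exposure (p = -2) and a
# $1,000,000 potential loss (c = 6) give R = 10^4 = $10,000 per occurrence.
print(hazard_risk_index(-2, 6))             # → 10000.0
# A subjective score of S = 40 implies p = 40/10 - 6 = -2, the same index.
print(hazard_risk_index_from_score(40, 6))  # → 10000.0
```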

If more than one exposure is considered, the above formulas become:

T = (1 - (1 - 10^p)^n) x 10^c   or

T = (1 - (1 - 10^(S/10 - 6))^n) x 10^c   where:

T is the total expected cost of the risk in dollars for n exposures. R or T may also be computed using the following nomograph.
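In code, the n-exposure form applies the single-occurrence cost to the probability of at least one occurrence in n exposures; a sketch under the same definitions (names illustrative):

```python
def total_risk(p, c, n):
    """Total expected cost for n exposures: T = (1 - (1 - 10^p)^n) x 10^c."""
    prob_at_least_one = 1.0 - (1.0 - 10.0 ** p) ** n   # P(one or more occurrences)
    return prob_at_least_one * 10.0 ** c

# With n = 1, the formula reduces to R = 10^(p + c), as expected.
print(total_risk(-2, 6, 1))    # ≈ 10000.0
# Over 100 exposures the expected cost rises, but less than 100-fold, because
# the formula counts the chance of at least one occurrence, not n independent losses.
print(total_risk(-2, 6, 100))
```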

Either R or T could be considered a hazard risk index for each hazard. This index is measured in dollars, and so is an absolute measurement of risk. It represents the cost of the hazard and provides a direct comparison with the cost of resolution. If the cost of resolution is higher than the hazard risk index, it may be cost effective (without considering political or social factors) to accept the hazard rather than to spend the funds to correct it. Given a low probability of occurrence, the hazard risk index can be expected to be low for all but the most catastrophic hazards.

If injuries are considered, then R or T is the expectation of injury based on the previously listed equivalencies.

The definition of a hazard risk index in this way also provides for the expression of uncertainty. For example, if the probability of occurrence for all exposures is estimated to be between 1/200 and 1/80, and the potential cost of the occurrence is estimated to be between $150,000 and $800,000, then the hazard risk index varies from a low of $750 to a high of $10,000. From a cost effectiveness standpoint alone, management may wish to accept the hazard if resolving it would cost $10,000 or more. However, if this hazard involves the potential for injury, a decision to resolve it might be made on ethical grounds, especially since the value of the hazard risk index, $10,000, represents the expectation of minor injury. A graphical representation of this hazard is shown below:
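The dollar bounds in this example can be checked numerically; a quick sketch, taking the stated probabilities as the aggregate probability over all exposures:

```python
def expected_cost(prob, cost):
    """Expected cost of the risk: probability of occurrence times cost per occurrence."""
    return prob * cost

low  = expected_cost(1 / 200, 150_000)   # least likely, least costly outcome
high = expected_cost(1 / 80, 800_000)    # most likely, most costly outcome
print(f"${low:,.0f} to ${high:,.0f}")    # → $750 to $10,000
```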

A single program risk index could be determined by the formula:

I = 10^(d + s)   where:

d is the logarithm of the estimated schedule impact in days (on the y axis) and
s is the logarithm of the estimated cost of resolution of the hazard in dollars.

However, the units of measurement of I are dollar-days, which is not a meaningful measurement except as a relative indicator. Management needs to know both the value of d and the value of s (or the estimated time 10^d in days and the estimated cost 10^s in dollars) to make reasonable decisions based on both.
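A sketch of the index makes the limitation concrete: very different hazards can collapse to the same value of I, which is why d and s must also be reported (function name illustrative):

```python
def program_risk_index(d, s):
    """I = 10^(d + s), in dollar-days -- meaningful only as a relative ranking."""
    return 10.0 ** (d + s)

# A 10-day schedule impact (d = 1) at a $100,000 resolution cost (s = 5)
# and a 100-day impact (d = 2) at $10,000 (s = 4) yield the same index,
# so I alone cannot tell management which impact dominates.
print(program_risk_index(1, 5))  # → 1000000.0
print(program_risk_index(2, 4))  # → 1000000.0
```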

If management desires to have a single indicator for ranking of relative risks, the following formula could be used:

H = (1 - (1 - 10^p)^n) x 10^(c + d + s)   or

H = (1 - (1 - 10^(S/10 - 6))^n) x 10^(c + d + s)   where:

H is the relative risk index in dollar^2-days. Again, this is not a meaningful measurement except as a means of relative ranking.
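As a ranking key, H can be sketched directly from the definitions above (function name and example values are illustrative):

```python
def relative_risk_index(p, c, d, s, n):
    """H = (1 - (1 - 10^p)^n) x 10^(c + d + s), in dollar^2-days.
    Use only to rank hazards; the units have no direct interpretation."""
    return (1.0 - (1.0 - 10.0 ** p) ** n) * 10.0 ** (c + d + s)

# Ranking example: sort hazards, given as (p, c, d, s, n), from highest to lowest H.
hazards = [(-2, 6, 1, 5, 1), (-3, 5, 2, 4, 10)]
hazards.sort(key=lambda h: relative_risk_index(*h), reverse=True)
```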

An alternate hazard risk index, which is a much more useful indicator of the risk of both the hazard and its resolution, is a vector defined by the formula:

V = [T, D, S]   where:

V is the overall risk index, expressed as a vector; T is the total risk in expected dollars (or equivalent injury); D = 10^d is the estimated schedule impact for resolution, expressed in days; and S = 10^s is the estimated cost of resolution of the hazard, expressed in dollars.

The value of T reduces to the value of R when the number of exposures considered is 1. The vector V preserves all of the information needed by management for programmatic decisions.
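A sketch of the vector form; nothing is collapsed, so each component stays in its own units (function name illustrative):

```python
def overall_risk_vector(p, c, d, s, n=1):
    """V = [T, D, S]: expected dollars at risk, schedule impact in days,
    and resolution cost in dollars, kept as separate components."""
    T = (1.0 - (1.0 - 10.0 ** p) ** n) * 10.0 ** c   # total hazard risk, dollars
    D = 10.0 ** d                                    # schedule impact, days
    S = 10.0 ** s                                    # resolution cost, dollars
    return (T, D, S)

# With n = 1, the first component reduces to R = 10^(p + c).
T, D, S = overall_risk_vector(p=-2, c=6, d=1, s=5)
print(T, D, S)   # ≈ 10000.0, 10.0, 100000.0
```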

Using these values, management can determine which of its various levels should become involved in the reduction of hazards. Hazards having a low dollar risk, or a low cost of resolution or schedule impact could be resolved at the contractor level, while hazards having high dollar risk or high cost of resolution or schedule impact would be reported to and tracked by progressively higher levels of NASA management.
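One way this routing could be mechanized is a simple threshold check on the components of V. The thresholds below are purely hypothetical placeholders for illustration, not values from this paper:

```python
def review_level(T, D, S, dollar_limit=100_000.0, day_limit=30.0):
    """Route a hazard to a review level from its risk vector V = (T, D, S).
    dollar_limit and day_limit are hypothetical thresholds, for illustration only."""
    if T < dollar_limit and S < dollar_limit and D < day_limit:
        return "contractor"          # low dollar risk and low resolution impact
    return "NASA program office"     # tracked at a higher management level

print(review_level(T=750.0, D=5.0, S=10_000.0))        # → contractor
print(review_level(T=500_000.0, D=60.0, S=250_000.0))  # → NASA program office
```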

In summary, the technique of hazard prioritization presented in this paper appears to represent a significant improvement over current methods. This technique:

a. Provides both relative and absolute values of hazard and program risk.

b. Incorporates cost and schedule risks as concerns to be communicated to management.

c. Permits consideration of hazards other than faults or equipment failures.

d. Allows numerical values to be computed for hazard severity, frequency of occurrence, cost of resolution, schedule impact of resolution, hazard risk, and program risk.

e. Increases precision of prioritization over the 3 x 4 matrix while preserving a graphical format.

f. Provides a reasonable means of determining probability of occurrence in the absence of reliable problem experience.

g. Enhances management visibility of risks associated with hazards and their resolution.

I recommend that the hazard prioritization technique described in this paper be adopted to prioritize hazards in reports to NASA and its element contractor managers.

John Lindorfer