Reliability and Validity
Reliability and validity are presented together because they are related and are often confused with one another.
Reliability
Reliability is a property of a measure that refers to its precision, or the degree to which multiple observations of a given phenomenon yield identical results. In public health, measures such as death rates or birth outcomes are often used to indicate the true underlying risk of illness or disability in a population. But sometimes these measures of risk fluctuate when the true underlying risk of disease does not. The reasons for the variability usually include one or more of the following factors: 1) the health event is relatively rare, 2) the population size is relatively small, or 3) the health events do not occur at regular time intervals.
Even for complete count datasets, such as birth and death certificate datasets, random fluctuations over time can yield estimates that are not reliable. Consider the case of low birth weight in a small community. In this community one low birth weight infant is born each month, on average. But low birth weight is a health event that does not necessarily occur at regular intervals - there is randomness in the timing of low birth weight occurrence. In our small community, if three mothers give birth to low birth weight infants at the end of December of Year 1, and none do in January or February of Year 2, it may appear as though the rate of low birth weight births has declined from Year 1 to Year 2.
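The low birth weight example can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that monthly counts follow a Poisson distribution with a mean of one event per month; the Poisson model, the function names, and the seed are assumptions and are not part of the original text. The underlying risk is held constant, yet the yearly totals still vary.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_yearly_counts(mean_per_month=1.0, years=10, seed=0):
    """Return yearly event totals when the monthly risk never changes."""
    rng = random.Random(seed)
    return [
        sum(poisson_draw(rng, mean_per_month) for _ in range(12))
        for _ in range(years)
    ]


# Hypothetical community: one low birth weight infant per month, on average.
yearly = simulate_yearly_counts()
print(yearly)
```

Any year-to-year differences in the printed totals come from chance alone, because the simulated monthly mean is identical in every year.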
Fortunately, statistical techniques can be used to help assess whether there was a significant difference in rates from Year 1 to Year 2. The confidence interval is the statistical measure that conveys the reliability of an estimate. A wide confidence interval indicates an imprecise estimate, and it makes it less likely that a difference between two estimates will be found statistically significant.
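As a rough sketch of how interval width depends on the number of events, the example below computes an approximate 95% confidence interval for a rate using a normal approximation to the Poisson distribution. The function name `rate_ci`, the approximation itself, and the counts are assumptions chosen for illustration; for very small counts an exact method is generally preferable.

```python
import math


def rate_ci(events, population, z=1.96):
    """Approximate confidence interval for a rate (events / population).

    Assumes the event count is roughly Poisson, so its standard error
    is sqrt(events). The lower bound is clipped at zero.
    """
    rate = events / population
    margin = z * math.sqrt(events) / population
    return rate, max(0.0, rate - margin), rate + margin


# Two hypothetical communities with the same observed rate (8%)
# but very different numbers of births.
small = rate_ci(events=12, population=150)
large = rate_ci(events=1200, population=15000)

for label, (rate, low, high) in (("small", small), ("large", large)):
    print(f"{label}: rate = {rate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Both communities have the same point estimate, but the interval for the smaller community is much wider, which is why its rate would be treated as less reliable.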
Rates that fluctuate over time, in the absence of changes in underlying risk, are considered unreliable. Such rates are also commonly referred to as "unstable." Since the underlying risk typically changes very slowly, the term "unstable" is commonly used to refer to any rates that fluctuate in a random pattern over relatively short timeframes.
Validity
Validity is a property of a measurement that refers to its accuracy, or the degree to which observations reflect the true value of a phenomenon. In public health, the validity of most measures is quite good. The cause of death on a death certificate is certified by a physician, survey measures have been tested to maximize validity, and birthweight is measured and reported at the birth hospital. There are some measures that we question, for instance self-reported drug and alcohol use, but on the whole, public health measures have a high degree of validity.
The Bulls-Eye Example
In the three figures below, the bulls-eye of the target represents the true underlying risk of disease in a population and the holes in the target represent multiple objective measurements of the risk. In the first figure, the measure is reliable - it measures nearly the same value each time. But the measure in Figure 1 is not valid - the average of the scores is not close to the true underlying risk. In the second figure, the scores are not very reliable - there is a lot of variability in the scores, but they center around the true risk value, so they are valid (at least on average). In the third figure, the measure is both reliable and valid.
The term "precision" is often used in relation to reliability, while the term "accuracy" is used to describe validity.
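The three bulls-eye scenarios can also be expressed numerically. The sketch below is illustrative only: it assumes normally distributed measurement error, and the true value, bias, and spread figures are hypothetical. The mean error stands in for validity (accuracy) and the standard deviation stands in for reliability (precision).

```python
import random
import statistics

TRUE_RISK = 50.0  # hypothetical true value (the bulls-eye)


def simulate_measure(bias, spread, n=1000, seed=0):
    """Simulate n measurements with a fixed bias and random spread."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_RISK + bias, spread) for _ in range(n)]


def summarize(values):
    """Return (mean error, standard deviation) for a set of measurements."""
    mean_error = statistics.fmean(values) - TRUE_RISK
    return mean_error, statistics.stdev(values)


scenarios = {
    "reliable, not valid": simulate_measure(bias=5.0, spread=0.5, seed=1),
    "valid, not reliable": simulate_measure(bias=0.0, spread=5.0, seed=2),
    "reliable and valid": simulate_measure(bias=0.0, spread=0.5, seed=3),
}

results = {name: summarize(values) for name, values in scenarios.items()}
for name, (mean_error, sd) in results.items():
    print(f"{name}: mean error = {mean_error:+.2f}, spread (SD) = {sd:.2f}")
```

A mean error near zero corresponds to holes centered on the bulls-eye, while a small standard deviation corresponds to holes clustered tightly together.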