A.
Lawyers are increasingly
required to deal with a range of econometric and statistical concepts
in the courtroom. Regression analysis
informs a range of legal questions in important ways because it can
be used to detect relationships between variables and to distinguish
true relationships from spurious ones.
In Re: Polypropylene Carpet Antitrust Litigation, 996 F. Supp. 18
(N.D. Ga. 1997), at *25
("Multiple regression analysis is a statistical tool for understanding
the relationships among two or more 'variables'. . . . Use of regression
analysis allows one to . . . sort out those correlations that are
spurious from those that are not.").
See especially the case's citations to Rubinfeld, Econometrics in the Courtroom, 85
Colum. L. Rev. 1048 and Sobel
v. Yeshiva Univ.
Regression can be used
to show that a publicly traded company's stock fell in price as a
result of the financial press revealing irregularities in the company's
financial statement. It can be used to show the absence of a stock
price reaction to such a disclosure. Regression can be used to discern
illegal relationships between age and termination rates or between
gender and rate of pay. Sheehan v. Daily Racing Form, Inc.,
104 F.3d 940 (7th Cir. 1997). It can also be used to sort out
whether the variable correlated with the termination rate is actually
age or if termination patterns are being driven by lack of computer
sophistication, a characteristic notably present in older workers.
Just to fix the idea of a spurious
correlation, if a company adopts a new computer-based method in its
production and lays off its non-computer literate employees, the terminations
will tend to fall on older workers. If such terminated workers bring
suit they will likely be able to introduce evidence of a strong correlation
between age and layoff rates even if the layoff was done with no regard
to age. Regression and certain other statistical tools can inform
a decision as to whether that correlation is real or spurious.
Like many powerful tools,
regression must be used properly if it is to inform, and it has great
potential to mislead if it is used improperly. There are carefully articulated
standards for the use of regression analysis and courts are beginning
to look carefully at whether regression-based testimony comports with
the established standards for use of the regression model.
See, for example, Estate of Bud Hill v. ConAgra, 1997
U.S. Dist. Lexis (1997, N.D.Ga.).
PRACTICE POINTER: Opinions
at every level make a point of noting that judges are not scientists
and, with all due respect to the courts, what this means is that there
is an even chance that your judge may not understand the science put
before him. This is an opportunity for the scientifically informed
lawyer because a little scientifically informed lawyering may go a
long way in informing critical issues. Of course this can be bad news if
it is opposing counsel that informs the court. An important point here is that
there is non-trivial precedent in the circuits that is based on misinformed
readings of the science presented in Daubert hearings.
1.
Properly executed regression
studies apparently meet all of the Daubert criteria. Of course, the fact
that properly executed regression studies apparently meet all of the
Daubert criteria makes "properly
executed" the battleground. In the academic literature, researchers
may spend years debating the finer points of what constitutes proper
regression procedure. Courts
do not have that luxury and must move quickly to decide such questions.
That notwithstanding, in recent cases, courts are even investigating
the assumptions that underlie regression analysis in order to determine
whether the expert's regression has been properly executed before
rendering decisions. Most
of the regression failures that courts are noting can be traced to
the failure of the model or data to meet the assumptions of the regression
model. See discussion in
sections V.B, V.C and V.D, especially of Bud Hill v. ConAgra. Since regression
analysis is vulnerable to attack and exclusion when the assumptions
of the regression model can be shown to have been violated, some consideration
of the assumptions is warranted. Sections V.A.1 and V.A.2, immediately
following, discuss the assumptions of the regression model and the
model itself. But before we take that
up, it is useful to consider briefly what is at stake in the admissibility
decision. In the cases, successful attacks on the expert's methods
are voiding damage awards. Black v. Food Lion, Inc., 171 F.3d
308 (5th Cir. 1999) (plaintiff's disease was not shown
to have been caused by fall); Moore
v. Ashland Chemical Inc., 151 F.3d 269 (5th Cir. 1998) (the district court
did not abuse its discretion in finding that the "analytical gap between
Dr. Jenkins's causation opinion and the scientific knowledge and available
data advanced to support that opinion was too wide. The district court
was entitled to conclude that Dr. Jenkins's causation opinion was
not based on scientific knowledge that would assist the trier of fact
as required by Rule 702 of the Federal Rules of Evidence"). The
Eleventh Circuit has applied Daubert/Kumho analysis in affirming accountant
testimony in U.S. v. Majors. Because
cases often turn on the admissibility of expert testimony, when confronted
with the proffer of adverse regression-based expert testimony it is
prudent to investigate whether the model is vulnerable to attack. Because this often turns on whether
the assumptions that underlie the regression model are met, we consider
those assumptions now. This may seem a daunting task because it is
pretty easy to make those assumptions sound intimidating, but in truth,
the assumptions are not too difficult to command at an intuitive level,
which is usually the level of command that a lawyer needs.
2.
For
lawyers, the central scientific point on regression in the post-Daubert
The assumptions that underlie the regression model mostly relate to
requirements that the data and models that the scientist is using not
be misleading. The assumptions are most easily and precisely expressed
as requirements on what are called "regression residuals." Economists
use tests of residuals to diagnose the presence of problems and errors
in regression in much the same way that physicians use tests of blood,
etc. to diagnose the presence of disease processes. The term "residual"
is the name given to the (typically small) errors made by the regression
model. These errors are assumed to average out to zero and are assumed
to have a constant standard deviation, a term with which we will associate
some intuition directly. The residuals are assumed to be uncorrelated
with each other and they are assumed to be normally distributed, which
basically means that they have the familiar bell-curve shape normally
associated with exam grades like the LSAT or Bar exam grades. The important
expert testimony cases analyzed in the balance of the chapter address
the results of the failure of these assumptions.
3.
The
basic regression model is known as the ordinary least squares (OLS)
model. When the assumptions of the OLS model are met, the estimates
that the model generates have an important set of "desirable properties,"
that allow them to establish inferences that meet the Daubert criteria for admissibility
of expert testimony. When these assumptions are met OLS estimates
are said to be "BLUE," an acronym for "best, linear, unbiased estimate."
See Kmenta, Elements of Econometrics, at 161.
This chapter considers only linear estimators because other
classes of estimators are somewhat esoteric and rarely found in litigation. A BLUE estimator is desirable because
it possesses the characteristics of being "best" and
"unbiased." Id.
a.
Intuitively speaking, an estimate is unbiased if, in repeated
trials, it misses in one direction with the same propensity as it
misses in the opposite direction.
Its errors are sort of like those of darts thrown at a bulls-eye
target by a sighted, competent, dart player. The throws are pretty well aimed
at the right target. They
group around the bulls-eye, although they may only rarely actually
hit the bulls-eye. Unbiased estimates have error, but the errors are
distributed around the right target.
By way of contrast, a biased estimator is not like a poor dart
player. The poor dart
player still centers his throws around the target, even though he
may miss very badly on frequent attempts.
Biased estimates are distributed around a different target. They are like the throws of a dart
player who is looking through distorted glasses that make the target
seem to be somewhere that it is not.
The resulting dart throws will now be centered around something
that is not the bulls-eye, maybe the lamp next to the dart board. A biased estimator has errors that
center on a target other than the target alleged.
b.
Standard Deviation
and Best Estimates.
Scientists have a measure of the size of a typical error of
an estimate that is known as the "standard deviation." This is a concept that has its most
elegant representation as an algebraic expression, but I have invited
thousands of beginning university finance and economics students to
think, on an intuitive level, of the standard deviation as being a
"typical deviation." Lawyers will rarely
need to be concerned with the actual calculation of the standard deviation
(which begins with the errors contained in a group of estimates and
weights these errors in ways that take account of their size and frequency
and then produces a number that represents the typical error made
in the estimates). One can understand the concept by imagining a man
who weighs 200 pounds going into WalMart and buying five different
scales, all of the same model.
Imagine that the man weighs himself on each scale and that
the five scales measure the man's weight as 197, 199, 200, 201 and
203 pounds. Since the man weighs 200 pounds,
the errors in these measurements are -3, -1, 0, 1 and 3. The standard deviation of these
errors is 2 (take my word for it).
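The figures in the scale examples (this set of readings and the second set discussed just below) can be checked with a few lines of Python. This is only a sketch: the function name `typical_error` is made up for illustration, and it assumes the "population" form of the standard deviation (dividing by the number of readings), since that is the convention that reproduces the 2 and 7.1 quoted in the text. Because the errors in both examples average to zero, the result is the same as the root-mean-square error.

```python
import statistics

TRUE_WEIGHT = 200  # the man's actual weight, as stated in the text


def typical_error(readings, true_value=TRUE_WEIGHT):
    """Population standard deviation of the measurement errors.

    Illustrative helper, not a standard library function.
    """
    errors = [reading - true_value for reading in readings]
    # pstdev divides by the number of readings (population formula);
    # statistics.stdev would divide by one less and give larger values.
    return statistics.pstdev(errors)


first_model = [197, 199, 200, 201, 203]   # readings from the first set of scales
second_model = [190, 195, 200, 205, 210]  # readings from the second set of scales

sd_first = typical_error(first_model)
sd_second = typical_error(second_model)
print(round(sd_first, 1), round(sd_second, 1))
```

Using the sample formula instead (`statistics.stdev`) would give somewhat larger numbers for both sets of scales, which is one reason an expert report should state which formula was used.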
Comparing the actual deviations of 3, 3, 1, 1 and 0 pounds
to the standard deviation of 2, a deviation of 2 seems "typical" of
the actual deviations associated with the different scales' measurements
(and therefore measurement errors) of the man's weight. If five scales of a different
model measured the man's weight as 190, 195, 200, 205, and 210, the
standard deviation of those scales would be 7.1 (take my word for
it). Comparing the standard
deviation of 7.1 to the actual deviations (or errors) of 0, 5, 5,
10 and 10, a deviation of 7.1 appears to be a fairly typical deviation. These examples' respective
deviations of 2 and 7.1 are "typical" in the sense that the actual
errors cluster around them, some being higher, and some being lower. The estimator with the
smallest standard deviation of all possible estimators is called the
"best" estimator. Such
an estimator is also said to be "efficient." Kmenta at 158. This use of the term
"efficient" is unrelated to the use of "efficient" in the phrase " The best estimator is
like the best dart player in a group. A weak dart player, even when throwing
at the right target will miss often and badly. A good dart player will still miss
the bullseye, but his throws will tend to be more closely clustered
around the bullseye. The
"best"
c.
Often one of the most daunting impediments to understanding
the ideas of a foreign discipline is the terminology specific to that
discipline. In regression
lingo the "independent variable" is said to "explain" the "dependent
variable." For example, age and computer skill (independent variables)
might "explain" separation rates (the dependent variable). A mnemonic
is that the dependent variable "depends" on the independent variables,
while the independent variables are assumed to be independent of other
variables in the model. Regression
equations are typically written in the form Y=a+bX, where "a" is the
value of Y when X is zero and "b" tells how much Y rises when X rises
by one unit of measure. This
number "b" is called the slope coefficient.
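To make the equation Y=a+bX concrete, the following sketch fits "a" and "b" by ordinary least squares using the standard closed-form formulas. The data are invented purely for illustration (hypothetical years of age above 40 as X and hypothetical terminations per 100 workers as Y), and the function name `fit_line` is likewise an assumption rather than anything taken from the cases or from Kmenta.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + b*X for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-movement of X and Y divided by the variation in X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, invented for illustration only.
years_over_40 = [0, 5, 10, 15, 20]
terminations_per_100 = [2.0, 3.1, 3.9, 5.2, 5.8]

a, b = fit_line(years_over_40, terminations_per_100)
print(f"intercept a = {a:.2f}, slope b = {b:.3f}")
```

Here "b" would be read as the estimated rise in terminations per 100 workers for each additional year of age, and "a" as the estimated rate when X is zero.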
We have discussed that an estimator is "unbiased" if it is
correct on average, Kmenta at 162, while a biased estimator produces
estimates that differ systematically from the true value that they
are attempting to estimate.
A related notion to that of "unbiasedness" is that of "consistency"
which approximately means that as more observations are added to the
data used for calculations, the estimate becomes more accurate. Kmenta
at 165. Estimators that meet the regression assumptions are also consistent,
and consistency is perhaps the most important of the OLS "desirable
characteristics." Consistency is absolutely not an
empty technical concept.
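Consistency can be illustrated with a small simulation. The sketch below assumes a "true" relationship Y = 1 + 2X plus random noise (all of these numbers are invented), generates data sets of increasing size, and reports how far the estimated slope falls from the true value of 2. In any single run the error need not shrink at every step, but it tends toward zero as observations are added.

```python
import random


def ols_slope(xs, ys):
    """Least squares slope of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulated_slope(n, true_a=1.0, true_b=2.0, noise_sd=1.0, seed=0):
    """Estimate the slope from n simulated observations (assumed model)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Residuals are drawn to satisfy the regression assumptions:
    # mean zero, constant standard deviation, uncorrelated, normal.
    ys = [true_a + true_b * x + rng.gauss(0, noise_sd) for x in xs]
    return ols_slope(xs, ys)


for n in (10, 100, 1000, 10000):
    estimate = simulated_slope(n)
    print(n, round(abs(estimate - 2.0), 4))
```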
d.
The first two of these are the statistical manifestations of
well-known conceptual problems and the presence of either of them
could cause the exclusion of proffered expert testimony if they are
too severe. First, model
misspecification means that an analyst has, for example, modeled termination
rates as depending on age, when those termination rates could depend
on computer skill.
1.
A model is said to be "misspecified" if the true relationship
between the two variables of interest is given by one equation, but
the economist models the relationship as excluding some of the important
variables. Kmenta, at
391-405 (discussing model specification and econometric tests to determine
if a model is misspecified).
See also Judge et al., The
Theory and Practice of Econometrics.
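The age and computer-skill example can be simulated to show what misspecification does to an estimate. In the sketch below every number is an assumption made for illustration: computer skill declines with age, and the termination score depends only on skill, never on age. A regression that omits skill nonetheless reports a sizable age effect, while a regression that includes both variables puts the age coefficient near zero.

```python
import random


def cross_sum(u, v):
    """Sum of cross-products of two lists after subtracting their means."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def slope_one(x, y):
    """Slope from regressing y on a single variable x."""
    return cross_sum(x, y) / cross_sum(x, x)


def slopes_two(x1, x2, y):
    """Slopes from regressing y on x1 and x2 together (normal equations)."""
    s11, s22, s12 = cross_sum(x1, x1), cross_sum(x2, x2), cross_sum(x1, x2)
    s1y, s2y = cross_sum(x1, y), cross_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(1)
n = 5000
age = [rng.uniform(25, 65) for _ in range(n)]
# Assumption: skill falls with age, with individual variation.
skill = [100 - a + rng.gauss(0, 10) for a in age]
# Assumption: terminations are driven by skill alone; age plays no role.
termination = [50 - 0.5 * s + rng.gauss(0, 5) for s in skill]

age_only = slope_one(age, termination)            # misspecified model
age_coef, skill_coef = slopes_two(age, skill, termination)  # both variables
print(round(age_only, 2), round(age_coef, 2), round(skill_coef, 2))
```

The misspecified model attributes to age an effect that belongs to the omitted variable, which is the statistical form of the spurious correlation described earlier in the chapter.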
2.
Errors in the variables is the name that economists give to
the problem of attempting to estimate a relationship using data that
is measured with error. Kmenta
at 307-9. If a variable is measured with error, the estimators are
inconsistent, Kmenta at 309, and the hypothesis tests seem not to
meet the Daubert standards
because no error rate calculations can be carried out, the technique
has not been peer reviewed with approval, and it is not generally accepted. There is still technically a hypothesis
test, but the test is known to be incorrect in its execution, and
likely incorrect in its conclusion. Lest this be dismissed as a statistical
technicality, consider a simple example. A nurse takes the temperature of
a patient immediately after the patient has consumed a cup of hot
coffee. The thermometer
registers 106 degrees.
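The effect of measurement error on a regression estimate can also be sketched by simulation. The code below assumes a true slope of 2 and adds random error to the explanatory variable before estimating; all of the numbers are invented. Under these assumptions (random error that is unrelated to the true value), the estimated slope is pulled toward zero, here to roughly half its true size, and adding more observations does not cure the problem.

```python
import random


def ols_slope(xs, ys):
    """Least squares slope of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)
n = 20000
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * x + rng.gauss(0, 0.5) for x in true_x]
# Assumption: the recorded value equals the true value plus random error
# with the same spread as the true values themselves.
measured_x = [x + rng.gauss(0, 1) for x in true_x]

slope_clean = ols_slope(true_x, y)      # uses accurately measured data
slope_noisy = ols_slope(measured_x, y)  # uses data measured with error
print(round(slope_clean, 2), round(slope_noisy, 2))
```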
3.
Heteroskedasticity is another failure of the regression assumptions,
in particular, the failure of the residuals to have a constant variance
through time. This is
a problem that perhaps most often manifests in the event study method
discussed in section V.B. Heteroskedasticity does not cause inconsistency,
but does present its own set of problems that may keep heteroskedastic
regression from meeting Daubert’s
To the extent that the OLS estimates are best, linear and unbiased
(BLUE) and consistent, hypothesis tests done with them should meet
the Daubert standards. Hypothesis tests done with parameters
estimated by misspecified models or with inaccurately measured data
seem to fail the Daubert
standards, and at a minimum they seem to fail the testing, error rate
and general acceptance criteria. It cannot be overemphasized that
hypothesis tests that fail the Daubert
standard should not be excluded from being introduced into evidence
simply because they fail to meet a technicality that the Supreme Court
has imposed. Such hypothesis
tests should be excluded from evidence because they are wrong.