Econometric Issues in
Business Litigation Expert Testimony
Lawyers are increasingly required to deal with a range of econometric and statistical concepts in the courtroom. This section discusses statistical techniques used in litigation, focusing on how regression analysis is used as the basis of expert testimony in several business litigation practice areas.
Regression analysis informs a range of legal questions in important ways because it can be used to detect relationships between variables and to distinguish true relationships from spurious ones. In re Polypropylene Carpet Antitrust Litigation, 996 F. Supp. 18 (N.D. Ga. 1997), at *25 ("Multiple regression analysis is a statistical tool for understanding the relationships among two or more 'variables'. . . . Use of regression analysis allows one to . . . sort out those correlations that are spurious from those that are not."). See especially the case's citations to Rubinfeld, Econometrics in the Courtroom, 85 Colum. L. Rev. 1048, and Sobel v. Yeshiva Univ., 839 F.2d 18 (2d Cir. 1988), both of which are cited repeatedly throughout this chapter.
Regression can be used to show that a publicly traded company's stock fell in price as a result of the financial press revealing irregularities in the company's financial statement. It can be used to show the absence of a stock price reaction to such a disclosure. Regression can be used to discern illegal relationships between age and termination rates or between gender and rate of pay. Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997). It can also be used to sort out whether the variable correlated with the termination rate is actually age or whether termination patterns are being driven by lack of computer sophistication, a characteristic notably present in older workers. Sheehan.
Just to fix the idea of a spurious correlation, if a company adopts a new computer-based method in its production and lays off its non-computer literate employees, the terminations will tend to fall on older workers. If such terminated workers bring suit they will likely be able to introduce evidence of a strong correlation between age and layoff rates even if the layoff was done with no regard to age. Regression and certain other statistical tools can inform a decision as to whether that correlation is real or if it is spurious.
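The layoff example above can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: the workforce is invented, the probabilities (half the workers are older; 40 percent of older and 80 percent of younger workers are computer literate; 10 percent of literate and 60 percent of non-literate workers are laid off) are hypothetical numbers chosen for illustration, and the function names are mine. The layoff rule never looks at age.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

def simulate_workforce(n=100_000):
    """Build a made-up workforce in which layoffs ignore age entirely."""
    workers = []
    for _ in range(n):
        older = random.random() < 0.5
        # Assumption: older workers are less often computer literate.
        literate = random.random() < (0.4 if older else 0.8)
        # The layoff rule looks only at computer literacy, never at age.
        laid_off = random.random() < (0.1 if literate else 0.6)
        workers.append((older, literate, laid_off))
    return workers

def layoff_rate(workers, older, literate=None):
    """Share laid off in one age group, optionally within one skill group."""
    group = [w for w in workers
             if w[0] == older and (literate is None or w[1] == literate)]
    return sum(1 for w in group if w[2]) / len(group)

workers = simulate_workforce()
print("all workers:       older", round(layoff_rate(workers, True), 2),
      " younger", round(layoff_rate(workers, False), 2))
for literate in (True, False):
    label = "computer literate: " if literate else "not literate:      "
    print(label, "older", round(layoff_rate(workers, True, literate), 2),
          " younger", round(layoff_rate(workers, False, literate), 2))
```

By construction, older workers are laid off at roughly twice the rate of younger workers overall (about 40 percent against 20 percent), yet within each skill group the two age groups are laid off at essentially the same rate. The correlation with age is real in the raw data and disappears once computer skill is held fixed, which is the sense in which it is spurious.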
Like many powerful tools, regression must be used properly if it is to inform, and it has great potential to mislead if it is used improperly. There are carefully articulated standards for the use of regression analysis, and courts are beginning to look carefully at whether regression-based testimony comports with the established standards for use of the regression model. See, for example, Estate of Bud Hill v. ConAgra, 1997 U.S. Dist. Lexis (1997, N.D.Ga.). The courts cite to some of the peer-reviewed regression literature, and this chapter cites to those and other intuitive scientific sources.
Perhaps the best news is that most of the scientific (if not the statistical) issues presented in the cases are almost completely non-mathematical and have intuitive explanations that bring home the central issues. Even most of the statistics have an intuition to them that allows full discussion with only minimal mathematics. This section concentrates on these intuitive notions. For the more ambitious, one day there will be an appendix that collects some of the more technical material and builds on the intuitive discussions presented here.
The Legal View of Regression Analysis
Properly executed regression studies apparently meet all of the Daubert criteria. Properly executed regression studies perform tests and specify the error rates associated with those tests. They are pervasively published in the peer-reviewed scientific journals of a panoply of scientific disciplines and properly executed regression is a generally accepted scientific research technique in dozens of disciplines. Regression has been widely used by scientists in a wide range of non-litigation settings for purely scientific purposes.
Of course, the fact that properly executed regression studies apparently meet all of the Daubert criteria makes "properly executed" the battleground. In the academic literature, researchers may spend years debating the finer points of what constitutes proper regression procedure. Courts do not have that luxury and must move quickly to decide such questions. That notwithstanding, in recent cases, courts are even investigating the assumptions that underlie regression analysis in order to determine whether the expert's regression has been properly executed before rendering decisions. Most of the regression failures that courts are noting can be traced to the failure of the model or data to meet the assumptions of the regression model. See discussion in sections V.B, V.C and V.D, especially of Bud Hill v. ConAgra. Since regression analysis is vulnerable to attack and exclusion when the assumptions of the regression model can be shown to have been violated, some consideration of the assumptions is warranted. Sections V.A.1 and V.A.2, immediately following, discuss the assumptions of the regression model and the model itself. These sections are the most technical in the chapter, but one needs only a passing familiarity with them to see their application in the cases.
But before we take that up, it is useful to consider briefly what is at stake in the admissibility decision. In the cases, successful attacks on the expert's methods are voiding damage awards. Black v. Food Lion, Inc., 171 F.3d 308 (5th Cir. 1999) (plaintiff's disease was not shown to have been caused by fall); Moore v. Ashland Chemical Inc., 151 F.3d 269 (5th Cir. 1998) (the district court did not abuse its discretion in finding that the "analytical gap between Dr. Jenkins's causation opinion and the scientific knowledge and available data advanced to support that opinion was too wide. The district court was entitled to conclude that Dr. Jenkins's causation opinion was not based on scientific knowledge that would assist the trier of fact as required by Rule 702 of the Federal Rules of Evidence"); and Dupont v. Castillo, 1999 Fla. App. LEXIS 1447 (excluding teratologist's testimony that contact with pesticide was cause of birth defect).
The Eleventh Circuit has applied Daubert/Kumho analysis in affirming the admission of an accountant's testimony in U.S. v. Majors, 196 F.3d 1206 (11th Cir. 1999), and sections V.B, V.C and V.D discuss securities, antitrust and employment discrimination cases where expert testimony was successfully defended.
Because cases often turn on the admissibility of expert testimony, when confronted with the proffer of adverse regression-based expert testimony it is prudent to investigate whether the model is vulnerable to attack. Because this often turns on whether the assumptions that underlie the regression model are met, we consider those assumptions now. This may seem a daunting task because it is pretty easy to make those assumptions sound intimidating, but in truth, the assumptions are not too difficult to command at an intuitive level, which is usually the level of command that a lawyer needs. And it's only one page.
Lawyering and Regression Assumptions
For lawyers, the central scientific point on regression in the post-Daubert era is that if the regression model is properly done, then the regression estimators have a set of desirable properties that allow economists to do the testing and error rate analysis that is required for admissibility as expert testimony in federal courts. If counsel can establish that the regression is improperly done (perhaps because the regression assumptions have been substantially violated), the testimony's scientific basis is discredited and the testimony loses evidentiary reliability.
The assumptions that underlie the regression model mostly relate to requirements that the data and models that the scientist is using not be misleading. The assumptions are most easily and precisely expressed as requirements on what are called "regression residuals." Economists use tests of residuals to diagnose the presence of problems and errors in regression in much the same way that physicians use tests of blood, etc. to diagnose the presence of disease processes.
The term "residual" is the name given to the (typically small) errors made by the regression model. These errors are assumed to average out to zero and are assumed to have a constant standard deviation, a term with which we will associate some intuition shortly. The residuals are assumed to be uncorrelated with each other and they are assumed to be normally distributed, which basically means that they have the familiar bell-curve shape normally associated with exam grades like the LSAT or Bar exam grades. The important expert testimony cases analyzed in the balance of the chapter address the results of the failure of these assumptions.
The basic regression model is known as the ordinary least squares (OLS) model. When the assumptions of the OLS model are met, the estimates that the model generates have an important set of "desirable properties" that allow them to establish inferences that meet the Daubert criteria for admissibility of expert testimony. When these assumptions are met, OLS estimates are said to be "BLUE," an acronym for "best, linear, unbiased estimate." See Kmenta, Elements of Econometrics, at 161. Strictly speaking, Kmenta discusses these as properties of estimators, not of estimates; however, BLUE estimators produce BLUE estimates and the distinction is not important for present purposes.
This chapter considers only linear estimators because other classes of estimators are somewhat esoteric and rarely found in litigation. A BLUE estimator is desirable because it possesses the desirable characteristics of being "best" and "unbiased." Id. Both of these have mathematical definitions that are discussed in Kmenta, but this section concentrates on developing the intuition of these two characteristics. Analogies to throwing darts at a target are useful in developing this intuition.
a. Unbiased Estimates
Intuitively speaking, an estimate is unbiased if, in repeated trials, it misses in one direction with the same propensity as it misses in the opposite direction. Its errors are sort of like those of darts thrown at a bulls-eye target by a sighted, competent, dart player. The throws are pretty well aimed at the right target. They group around the bulls-eye, although they may only rarely actually hit the bulls-eye. Unbiased estimates have error, but the errors are distributed around the right target.
By way of contrast, a biased estimator is not like a poor dart player. The poor dart player still centers his throws around the target, even though he may miss very badly on frequent attempts. Biased estimates are distributed around a different target. They are like the throws of a dart player who is looking through distorted glasses that make the target seem to be somewhere that it is not. The resulting dart throws will now be centered around something that is not the bulls-eye, maybe the lamp next to the dart board. A biased estimator has errors that center on a target other than the target alleged. Such an estimator does not measure what it purports to measure. It measures something else.
b. Standard Deviation and Best Estimates
Scientists have a measure of the size of a typical error of an estimate that is known as the "standard deviation." This is a concept that has its most elegant representation as an algebraic expression, but I have invited thousands of beginning university finance and economics students to think, on an intuitive level, of the standard deviation as being a "typical deviation," or the typical error that an estimator makes when making an estimate.
Lawyers will rarely need to be concerned with the actual calculation of the standard deviation (which begins with the errors contained in a group of estimates and weights these errors in ways that take account of their size and frequency and then produces a number that represents the typical error made in the estimates). One can understand the concept by imagining a man who weighs 200 pounds going into WalMart and buying five different scales, all of the same model. Imagine that the man weighs himself on each scale and that the five scales measure the man's weight as 197, 199, 200, 201 and 203 pounds. Since the man weighs 200 pounds, the errors in these measurements are -3, -1, 0, 1 and 3. The standard deviation of these errors is 2 (take my word for it). Comparing the actual deviations of 3, 3, 1, 1 and 0 pounds to the standard deviation of 2, a deviation of 2 seems "typical" of the actual deviations associated with the different scales' measurements (and therefore measurement errors) of the man's weight. If five scales of a different model measured the man's weight as 190, 195, 200, 205, and 210, the standard deviation of those scales would be 7.1 (take my word for it). Comparing the standard deviation of 7.1 to the actual deviations (or errors) of 0, 5, 5, 10 and 10, a deviation of 7.1 appears to be a fairly typical deviation. These examples' respective standard deviations of 2 and 7.1 are "typical" in the sense that the actual errors cluster around them, some being higher, and some being lower.
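For readers who would rather not take my word for it, the two figures can be checked in a few lines. This is a minimal sketch, assuming the "population" form of the standard deviation (the square root of the average squared error, dividing by the number of readings); the function name is my own.

```python
from math import sqrt

def standard_deviation(errors):
    """Square root of the average squared error (divides by n, not n - 1)."""
    return sqrt(sum(e * e for e in errors) / len(errors))

true_weight = 200
first_model = [197, 199, 200, 201, 203]
second_model = [190, 195, 200, 205, 210]

for readings in (first_model, second_model):
    errors = [r - true_weight for r in readings]
    print(errors, round(standard_deviation(errors), 1))
# [-3, -1, 0, 1, 3] 2.0
# [-10, -5, 0, 5, 10] 7.1
```

Many statistical packages divide by n - 1 rather than n when working from a sample. On these numbers that convention would give about 2.2 and 7.9 instead, so it is worth asking an expert which convention a reported standard deviation uses.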
The estimator with the smallest standard deviation of all possible estimators is called the "best" estimator. Such an estimator is also said to be "efficient." Kmenta at 158. This use of the term "efficient" is unrelated to the use of "efficient" in the phrase "efficient markets."
The best estimator is like the best dart player in a group. A weak dart player, even when throwing at the right target, will miss often and badly. A good dart player will still miss the bulls-eye, but his throws will tend to be more closely clustered around the bulls-eye. The "best" estimator will produce estimates that are more tightly grouped around the true value of the variable that it is estimating than will any other estimator. Estimators with small standard deviations are desirable for a variety of reasons, one of which is that a small standard deviation makes for more powerful hypothesis tests.
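The dart analogies for "unbiased" and "best" can be illustrated with three simulated players. This is a sketch under assumptions of my own: throws are reduced to a single left-right coordinate, the bulls-eye sits at zero, and the aim points and spreads are made-up numbers.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the illustration is repeatable
BULLS_EYE = 0.0

def throws(aim_point, spread, n=50_000):
    """Simulated landing positions: centered on aim_point, scattered by spread."""
    return [random.gauss(aim_point, spread) for _ in range(n)]

good = throws(aim_point=0.0, spread=1.0)       # unbiased, tightly grouped
weak = throws(aim_point=0.0, spread=5.0)       # unbiased, widely scattered
distorted = throws(aim_point=3.0, spread=1.0)  # biased: aims at the lamp

for name, landing in (("good", good), ("weak", weak), ("distorted glasses", distorted)):
    misses = [x - BULLS_EYE for x in landing]
    print(f"{name}: average miss {mean(misses):+.2f}, spread {pstdev(misses):.2f}")
```

The good and weak players both average out close to the bulls-eye, so both are unbiased, but the good player's spread is far smaller, which is what makes him "best." The player with distorted glasses is as tightly grouped as the good player, yet his throws center about three units away from the target: precise, but measuring the wrong thing.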
c. Some Regression Terminology
Often one of the most daunting impediments to understanding the ideas of a foreign discipline is the terminology specific to that discipline. In regression lingo the "independent variable" is said to "explain" the "dependent variable." For example, age and computer skill (independent variables) might "explain" separation rates (the dependent variable). A mnemonic is that the dependent variable "depends" on the independent variables, while the independent variables are assumed to be independent of other variables in the model. Regression equations are typically written in the form Y=a+bX, where "a" is the value of Y when X is zero and "b" tells how much Y rises when X rises by one unit of measure. This number "b" is called the slope coefficient. We have discussed that an estimator is "unbiased" if it is correct on average, Kmenta at 162, while a biased estimator produces estimates that differ systematically from the true value that they are attempting to estimate. A related notion to that of "unbiasedness" is that of "consistency," which approximately means that as more observations are added to the data used for calculations, the estimate becomes more accurate. Kmenta at 165. Estimators that meet the regression assumptions are also consistent, and consistency is perhaps the most important of the OLS "desirable characteristics." Consistency is absolutely not an empty technical concept. Rather, it plays a central and critical role not only in science, but also in the law, because if the estimates of the regression parameters are inconsistent then the hypothesis tests are scientifically invalid and evidentiarily invalid, as are the error rates associated with those tests.
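To see the equation Y=a+bX and the idea of consistency at work, the sketch below fits a line by ordinary least squares to simulated data and repeats the exercise with more and more observations. The "true" values a = 2 and b = 0.5, the noise level, and the function names are assumptions chosen only for illustration; the slope and intercept formulas are the textbook ones for a single independent variable.

```python
import random
from statistics import mean

def ols(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope coefficient
    a = y_bar - b * x_bar    # value of Y when X is zero
    return a, b

random.seed(2)  # fixed seed so the illustration is repeatable
TRUE_A, TRUE_B = 2.0, 0.5   # hypothetical true relationship

def sample(n):
    """Data that satisfy the regression assumptions: well-behaved noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_A + TRUE_B * x + random.gauss(0, 1) for x in xs]
    return xs, ys

for n in (10, 100, 1000, 10000):
    a, b = ols(*sample(n))
    print(f"n={n:>5}: a={a:.3f}, b={b:.3f}")
```

With only ten observations the estimates can wander noticeably from 2 and 0.5; as the sample grows they settle ever closer to the true values. That settling down is what consistency means, and it is what is lost when the regression assumptions fail in the ways discussed next.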
d. Three Regression Problems Common in the Cases
The first two of these are the statistical manifestations of well-known conceptual problems, and the presence of either of them could cause the exclusion of proffered expert testimony if they are too severe. First, model misspecification means that an analyst has, for example, modeled termination rates as depending on age, when those termination rates could depend on computer skill. Second, errors in the variables means that data used to model a relationship have been measured with particular kinds of error, so that the data used in the analysis is itself corrupt.
1. Model Misspecification.
A model is said to be "misspecified" if the true relationship between the two variables of interest is given by one equation, but the economist models the relationship as excluding some of the important variables. Kmenta, at 391-405 (discussing model specification and econometric tests to determine if a model is misspecified). See also Judge et al., The Theory and Practice of Econometrics at 405-41 (providing an overview of regression model specification tests). Regression estimates from misspecified models are considered scientifically unreliable. This is an important consideration in a range of courtroom situations and comes into play in the securities cases discussed in section V.B, the antitrust cases discussed in section V.C and in the employment discrimination cases discussed in section V.D.
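A small simulation shows what omitting an important variable does to the estimates. This is a sketch on invented data: termination risk is constructed to depend on computer skill only, skill is assumed to decline with age, and all coefficients are hypothetical. The two-variable formulas are the standard least squares solution written out by hand so that nothing beyond the Python standard library is needed.

```python
import random
from statistics import mean

random.seed(3)  # fixed seed so the illustration is repeatable
n = 20_000
age = [random.uniform(25, 65) for _ in range(n)]
# Assumption: computer skill tends to fall with age in this made-up workforce.
skill = [10 - 0.15 * a + random.gauss(0, 1) for a in age]
# True relationship: risk depends on skill only; the true age coefficient is zero.
risk = [5 - 0.5 * s + random.gauss(0, 1) for s in skill]

def centered(values):
    m = mean(values)
    return [v - m for v in values]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a_c, s_c, r_c = centered(age), centered(skill), centered(risk)

# Misspecified model: risk explained by age alone (skill is left out).
b_age_alone = dot(a_c, r_c) / dot(a_c, a_c)

# Correctly specified model: risk explained by age and skill together.
s11, s22, s12 = dot(a_c, a_c), dot(s_c, s_c), dot(a_c, s_c)
s1y, s2y = dot(a_c, r_c), dot(s_c, r_c)
det = s11 * s22 - s12 ** 2
b_age = (s22 * s1y - s12 * s2y) / det
b_skill = (s11 * s2y - s12 * s1y) / det

print(f"age coefficient, skill omitted:  {b_age_alone:.3f}")
print(f"age coefficient, skill included: {b_age:.3f}")
print(f"skill coefficient:               {b_skill:.3f}")
```

With skill omitted, age picks up a clearly positive coefficient (near 0.075 under these assumptions, the product of the two hypothetical effects) even though age plays no role in how the data were generated. Once skill is included, the age coefficient falls to roughly zero and the skill coefficient lands near its true value of -0.5.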
2. Errors in the variables.
Errors in the variables is the name that economists give to the problem of attempting to estimate a relationship using data that is measured with error. Kmenta at 307-9. If a variable is measured with error, the estimators are inconsistent, Kmenta at 309, and the hypothesis tests seem not to meet the Daubert standards because no error rate calculations can be carried out, and the technique has not been peer reviewed with approval and is not generally accepted. There is still technically a hypothesis test, but the test is known to be incorrect in its execution, and likely incorrect in its conclusion. Lest this be dismissed as a statistical technicality, consider a simple example. A nurse takes the temperature of a patient immediately after the patient has consumed a cup of hot coffee. The thermometer registers 106 degrees. The nurse cannot correctly reject the hypothesis that the patient has a normal temperature, because the patient's temperature has been measured with error.
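The inconsistency caused by errors in the variables can also be seen in a simulation. This sketch assumes the simplest textbook case: a true slope of 2, and an independent variable recorded with random measurement error as large as the variable's own spread. All numbers and names are hypothetical.

```python
import random

random.seed(4)  # fixed seed so the illustration is repeatable
TRUE_SLOPE = 2.0

def slope(xs, ys):
    """Least squares slope of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def estimated_slopes(n):
    """Return (slope using the true X, slope using X measured with error)."""
    x_true = [random.gauss(0, 1) for _ in range(n)]
    y = [1 + TRUE_SLOPE * x + random.gauss(0, 1) for x in x_true]
    # Assumption: recorded X equals true X plus independent measurement error.
    x_noisy = [x + random.gauss(0, 1) for x in x_true]
    return slope(x_true, y), slope(x_noisy, y)

for n in (100, 1000, 50_000):
    clean, noisy = estimated_slopes(n)
    print(f"n={n:>6}: true X -> {clean:.2f}   mismeasured X -> {noisy:.2f}")
```

With accurately measured data the estimate closes in on 2 as the sample grows. With mismeasured data it closes in on about 1 instead, and adding more observations only makes the estimate more confidently wrong. That is what it means for an estimator to be inconsistent.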
3. Heteroskedasticity.
Heteroskedasticity is another failure of the regression assumptions, in particular, the failure of the residuals to have a constant variance through time. This is a problem that perhaps most often manifests in the event study method discussed in section V.B. Heteroskedasticity does not cause inconsistency, but does present its own set of problems that may keep heteroskedastic regression from meeting Daubert's criteria for assessing the reliability of expert testimony.
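A rough way to see heteroskedasticity is to fit the regression and compare the spread of the residuals in different stretches of the data. The sketch below is an informal illustration, not one of the formal tests an expert would run: the data are simulated, and the assumption that the noise triples in the second half of the sample is mine.

```python
import random
from statistics import mean, pstdev

random.seed(5)  # fixed seed so the illustration is repeatable
n = 4000
x = [random.uniform(0, 10) for _ in range(n)]
# Assumption: the noise is three times as large in the second half of the period.
noise_sd = [1.0 if i < n // 2 else 3.0 for i in range(n)]
y = [2 + 0.5 * xi + random.gauss(0, sd) for xi, sd in zip(x, noise_sd)]

# Ordinary least squares fit of Y = a + bX.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Residuals are the errors the fitted line makes on each observation.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
early_sd = pstdev(residuals[: n // 2])
late_sd = pstdev(residuals[n // 2:])

print(f"estimated slope: {b:.3f}")
print(f"residual standard deviation, first half:  {early_sd:.2f}")
print(f"residual standard deviation, second half: {late_sd:.2f}")
```

The slope estimate still lands near its true value of 0.5, consistent with the point that heteroskedasticity does not cause inconsistency. The residuals, however, are roughly three times as spread out in the second half as in the first, so the assumption of a constant standard deviation plainly fails.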
To the extent that the OLS estimates are best, linear and unbiased (BLUE) and consistent, hypothesis tests done with them should meet the Daubert standards. Hypothesis tests done with parameters estimated by misspecified models or with inaccurately measured data seem to fail the Daubert standards, and at a minimum they seem to fail the testing, error rate and general acceptance criteria. It cannot be overemphasized that hypothesis tests that fail the Daubert standard should be excluded from evidence not simply because they fail to meet a technicality that the Supreme Court has imposed. Such hypothesis tests should be excluded from evidence because they are wrong.