Published on 10-Jan-2017

QUANTIFYING THE HOPE FOR REDUCING BIAS IN THE SOCIAL SCIENCES

NIKHITA LUTHRA

1. INTRODUCTION

1.1. Motivation. The sixth-leading killer of Americans is Alzheimer's disease. This fatal neurological disease has stumped researchers from wealthy countries for over two hundred years. A recent discovery, however, offers hope to the one in three senior citizens who will develop the disease. Surprisingly, that hope doesn't come from the white walls of a medical center at a well-endowed research university in the States; it comes from the remote hills of Antioquia, a village outside Medellín, Colombia. After going through historical priests' records at local churches, Dr. Francisco Lopera of Medellín's University of Antioquia discovered that members of a particular family had recorded early onset of Alzheimer's for over 300 years. More research revealed that in this family, a person has a 50 percent chance of inheriting a gene, PSEN1, that guarantees early onset of Alzheimer's. After the National Institutes of Health picked up word of this once-in-a-lifetime experimental opportunity, resources were quickly gathered to run clinical trials on an antibody that targets the amyloid protein associated with the disease.

The unearthing of this family offers a researcher's dream: the perfect conditions to run a natural experiment. Because subjects and researchers didn't want to know which members of the family carried the fatal gene, treatments could be randomly applied in a double-blind setting; a double-blind study is one in which neither the subjects nor the researchers know which groups subjects are assigned to. In addition, the fact that members of the family had mostly stayed in the same location with similar living situations meant that variables that differ based on location of residence could be controlled for.

Most researchers are not lucky enough to find the perfect conditions to infer that a treatment caused the prevention or cure of a disease. In the real world, treatments tested in the social sciences are conditionally dependent on covariates. For example, in medicine, drugs are only tested on sick patients. In economics, researchers test a welfare benefit on the disadvantaged. Without randomization, selection bias arises: are differences in the outcome caused by the selection of who is in the treatment group rather than by the treatment itself? This inability to infer causality has plagued social scientists for decades.


1.2. Overview. Luckily, statistics offers some solutions to overcome the inability to randomly apply treatments. When random assignment is missing, matching samples based on particular variables attempts to reduce the bias of estimates of treatment effects. Recent literature has focused on matching methods that attempt to reduce the bias from confounding variables that systematically differ between control and treatment populations. Rather than focusing on various matching methods, we will derive two values, Π and Π_max, which are used to evaluate the success of a matching method, both when we want to reduce the bias of a single variable (Section 2) and when we want to reduce the bias of many covariates (Section 3). To do this, we will first describe how to estimate the treatment effect on a particular outcome variable. Then, we will construct Π, which captures the reduction in bias of an estimator due to matching. Finally, we will derive Π_max, the maximum percent reduction in bias. This value is important because it intuitively acts as an upper bound on how much hope we can have in matching's ability to reduce bias, ultimately saving the social sciences from their inability to infer causality.

2. MATCHING ON ONE VARIABLE

When is it the case that adjusting for some variable X (or vector of variables X) gives unbiased estimates of the treatment effect? This happens whenever the treatment assignment is strongly ignorable. If r₁ is the outcome after receiving the treatment and r₀ is the outcome after not receiving the treatment, treatment assignment is strongly ignorable when: (i) the responses (r₁, r₀) are conditionally independent of the treatment z given X, and (ii) at each value of X, there is a positive probability of receiving each treatment [3]. These conditions are represented mathematically:

Pr(r₁, r₀, z | X) = Pr(r₁, r₀ | X) · Pr(z | X), and 0 < Pr(z = 1 | X) < 1 for all possible X.

Thus, the goal of matching is to construct samples that make treatment assignment as ignorable as possible. Figure 1 gives an overview of the process of matching.

2.1. Estimating the treatment effect. Imagine we are interested in testing the effect of a particular drug on reducing cholesterol levels, represented by Y, a continuous dependent variable. We begin by assuming that we have only one variable, X, to match on. For example, let X be the age of a patient: we assign X = 1 for patients over 50 years of age and X = 0 for patients under 50. We want to remove the effect of X on Y. Suppose we have two populations, P₁ and P₂, where P₁ is the population of patients that will receive the treatment (because they have high cholesterol), and P₂ is the population of patients that will not receive the treatment (also known as the control population). The distribution of the matching variable X differs in P₁ and P₂; patients in the treatment group


Figure 1. Summary of stages in matched sampling [3]

tend to be older than those in the control group. Then f₁(X, Y) ≠ f₂(X, Y), where f₁ and f₂ are the joint distributions of X and Y in P₁ and P₂, respectively.

The remainder of this section follows the approach of Rubin [4]. Let G₁ be a random sample from population P₁ of size N. Let G₂ be a random sample from population P₂ of size rN, r > 1. All subjects in G₁ and G₂ have recorded values for the random variable X. Let us choose a subsample of G₂ of size N using a specified matching method; call this subsample G₂*. Now, we want to estimate the effect of the treatment using G₁ and G₂*, both of size N. Note that if r = 1, G₂* would simply be a random sample from P₂, and matching would not be able to remove bias due to X. If r = ∞, then infinitely many potential matches would be available, and all bias due to X could be removed.
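As a toy illustration of this sampling scheme, here is a minimal sketch in Python (hypothetical age data, and a simple nearest-available rule standing in for "a specified matching method"; it is not the particular method Rubin analyzes) of drawing G₁ and G₂ and selecting a matched subsample G₂*:

```python
import random

random.seed(0)

N, r = 100, 4
# Hypothetical ages: the treated population P1 skews older than the control P2.
G1 = [random.gauss(60, 8) for _ in range(N)]        # random sample from P1, size N
G2 = [random.gauss(50, 8) for _ in range(r * N)]    # random sample from P2, size rN

# Nearest-available matching: for each treated subject, take the closest
# still-unused control subject on X.  The N chosen controls form G2*.
pool = list(G2)
G2_star = []
for x1 in G1:
    best = min(pool, key=lambda x2: abs(x2 - x1))
    pool.remove(best)
    G2_star.append(best)

mean = lambda v: sum(v) / len(v)
# Matching pulls the control subsample's mean age toward the treated mean.
assert abs(mean(G1) - mean(G2_star)) < abs(mean(G1) - mean(G2))
```

Because the pool G₂ is r times larger than G₁, the matched subsample's distribution of X can be made far closer to G₁'s than a random control sample of size N would be.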

Definition 1. Define the response surface for Y in Pᵢ at X = x, denoted Rᵢ(x), as:

Rᵢ(x) = E(Y | X = x).

In our example, R₁(old) = E[cholesterol level | old] gives the expectation of the cholesterol level for the treatment group given that the patient is old. R₂(old) = E[cholesterol level | old] gives the expectation of the cholesterol level for the control group given that the patient is old. R₁(young) = E[cholesterol level | young] gives the expectation of the cholesterol level for the treated group given that the patient is young. R₂(young) = E[cholesterol level | young] gives the expectation of the cholesterol level for the control group given that the patient is young.

Definition 2. The effect of the treatment at X = x is: R₁(x) − R₂(x).

Following our example, R₁(old) − R₂(old) reveals the effect of the treatment among the old patients: it is the expected cholesterol level for old people who received the treatment minus the expected cholesterol level for old people who didn't receive the treatment. R₁(young) − R₂(young) reveals the effect of the treatment among the young patients: it is the expected cholesterol level for young people who received the treatment minus the expected cholesterol level for young people who didn't receive the treatment.

There are two possible cases for the effect of the treatment: it can be constant, or it can vary with x. These two cases, also referred to as parallel and nonparallel response surfaces, are defined below.

Definition 3. If R₁(x) − R₂(x) is constant and independent of X, we call R₁(x) and R₂(x) parallel response surfaces. In this case, the goal is to estimate this constant difference. Parallel response surfaces are depicted in Figure 2.

Definition 4. If R₁(x) − R₂(x) is not constant across all values of X, we call R₁(x) and R₂(x) nonparallel response surfaces. In this case, the goal is to estimate the average difference between R₁(x) and R₂(x) across all x. Nonparallel response surfaces are depicted in Figure 3.

Definition 5. In both cases, we are interested in estimating the treatment effect among the control and treated populations, τ, which is equal to the expected difference in the response surfaces:

τ = E₁[R₁(x) − R₂(x)],

where E₁ denotes expectation over the distribution of X in P₁.

Let y₁ⱼ, x₁ⱼ represent the values of Y and X for the jth subject in G₁, and y₂ⱼ, x₂ⱼ the values of Y and X for the jth subject in G₂*, where j = 1, …, N. Then

yᵢⱼ = Rᵢ(xᵢⱼ) + eᵢⱼ,  i = 1, 2;  j = 1, …, N,  with E_c(eᵢⱼ) = 0,

where E_c denotes the conditional expectation given the xᵢⱼ. We can use this notation to express an estimator for the treatment effect that is based on data we can actually collect from our subsamples.

Figure 2. Parallel univariate response surfaces [4]

Figure 3. Nonparallel univariate response surfaces [4]


Definition 6. The estimator for the treatment effect is the average difference between the nonparallel response surfaces (or the constant difference if the response surfaces are parallel):

τ̂₀ = (1/N) Σⱼ y₁ⱼ − (1/N) Σⱼ y₂ⱼ = ȳ₁· − ȳ₂·.

This estimator takes in the data collected after running the study and outputs a numerical value that estimates the effect of the drug on cholesterol levels. That numerical value is known as the estimate: the estimator is a function, while the estimate is a number.
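A quick numerical sketch of Definition 6 (hypothetical linear response model; the "true" effect τ = −20 and all coefficients are assumptions of the example): when the control ages are matched perfectly to the treated ages, the difference in sample means recovers the treatment effect:

```python
import random

random.seed(1)

N = 500
tau = -20.0                                     # hypothetical true treatment effect
x1 = [random.gauss(60, 8) for _ in range(N)]    # ages of treated subjects
x2 = x1[:]                                      # perfectly matched control ages

# Hypothetical response surfaces: cholesterol rises with age; treatment shifts it by tau.
y1 = [200 + 1.5 * x + tau + random.gauss(0, 5) for x in x1]
y2 = [200 + 1.5 * x + random.gauss(0, 5) for x in x2]

tau_hat = sum(y1) / N - sum(y2) / N             # Definition 6: ybar1 - ybar2
assert abs(tau_hat - tau) < 2.0                 # close to the true effect
```

Because the age distributions in the two samples are identical here, the age contribution to Y cancels in the difference of means, leaving (up to noise) only τ.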

2.2. Bias of the estimator. Now that we have an estimator for the treatment effect, we need a way to assess whether the estimator with matching is better than the estimator without matching. Essentially, we want to estimate how much matching can reduce the bias of an estimator of the treatment effect. Let E be the expectation over the distribution of X in the matched samples, and let E₂* be the expectation over the distribution of X in the matched G₂* subsamples.

Theorem 1. Using the definition of bias, the expected bias of τ̂₀ over matched sampling is

E E_c(τ̂₀ − τ) = E₁[R₂(x)] − E₂*[R₂(x)].

Proof. Although this proof was not in Rubin's original 1973 paper, it is quite easy to derive. Using the above definitions of τ̂₀ and τ,

E E_c(τ̂₀ − τ) = E E_c(ȳ₁· − ȳ₂·) − E₁[R₁(x) − R₂(x)].

Since expectations add,

E E_c(τ̂₀ − τ) = E E_c(ȳ₁·) − E E_c(ȳ₂·) − E₁[R₁(x)] + E₁[R₂(x)].

We know E E_c(ȳ₁·) = E₁[R₁(x)] and E E_c(ȳ₂·) = E₂*[R₂(x)] [4], so we can rewrite and simplify:

E E_c(τ̂₀ − τ) = E₁[R₁(x)] − E₂*[R₂(x)] − E₁[R₁(x)] + E₁[R₂(x)] = E₁[R₂(x)] − E₂*[R₂(x)]. ∎

If the distribution of X in G₂* is the same as that in the random sample G₁, then E₁[R₂(x)] = E₂*[R₂(x)] and τ̂₀ has zero expected bias. If r = 1 (in other words, if G₂* is a random sample from P₂), then the expected bias of τ̂ is E₁[R₂(x)] − E₂[R₂(x)], where E₂ denotes expectation over the distribution of X in P₂; here τ̂ is the estimator of the treatment effect for the unmatched samples, and E₁[R₂(x)] − E₂[R₂(x)] is the bias of that estimator.
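Theorem 1's formula can be checked by simulation. The sketch below (hypothetical linear response surfaces with zero treatment effect, so any nonzero estimate is pure bias) estimates the expected bias of the unmatched difference-in-means estimator and compares it to β₂(μ₁ − μ₂), the form E₁[R₂(x)] − E₂[R₂(x)] takes under a linear response surface:

```python
import random

random.seed(3)

# Hypothetical parameters: R_i(x) = 100 + beta2 * x for both groups (tau = 0),
# with X ~ N(mu1, 8^2) in P1 and X ~ N(mu2, 8^2) in P2.
beta2, mu1, mu2 = 1.5, 60.0, 50.0
reps, N = 1000, 200

est = 0.0
for _ in range(reps):
    y1 = [100 + beta2 * random.gauss(mu1, 8) + random.gauss(0, 5) for _ in range(N)]
    y2 = [100 + beta2 * random.gauss(mu2, 8) + random.gauss(0, 5) for _ in range(N)]
    est += sum(y1) / N - sum(y2) / N
est /= reps                                  # Monte Carlo expected value of tau-hat

# With tau = 0, the whole estimate is bias: beta2 * (mu1 - mu2) = 15.
assert abs(est - beta2 * (mu1 - mu2)) < 0.5
```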


2.3. Measuring the reduction in bias due to matching. Now we wish to determine how much less biased the estimator τ̂₀ based on matched sampling is compared to the estimator τ̂ based on random sampling. We will use the percent reduction in expected bias to measure this: one minus the ratio of the expected bias under matched sampling to the expected bias under random sampling, times 100:

100 [1 − (E₁[R₂(x)] − E₂*[R₂(x)]) / (E₁[R₂(x)] − E₂[R₂(x)])].

The numerator of the ratio, E₁[R₂(x)] − E₂*[R₂(x)], represents the expected bias from matched sampling, and the denominator, E₁[R₂(x)] − E₂[R₂(x)], represents the expected bias from random sampling. The terms that differ are E₂[R₂(x)] and E₂*[R₂(x)]. Putting everything over a common denominator and simplifying yields the expression:

100 (E₂*[R₂(x)] − E₂[R₂(x)]) / (E₁[R₂(x)] − E₂[R₂(x)]).

We can see from this equation that the percent reduction in bias depends only on the distributions of X in P₁, P₂, and G₂*, and on the response surface in P₂. We assume that the response surface in P₂ is linear, or can be estimated by a linear regression: R₂(x) = α₂ + β₂(x − μ₂), where α₂ is the mean of Y in P₂, μᵢ is the mean of X in Pᵢ, and β₂ is the regression coefficient of Y on X in P₂. We can use this to rewrite E₁[R₂(x)] − E₂[R₂(x)] = β₂(μ₁ − μ₂) and E₂*[R₂(x)] − E₂[R₂(x)] = β₂(η₂ − μ₂), where η₂ = E₂*(X), the expected mean of X in G₂*. Substituting in these values gives the following theorem:

Theorem 2. If G₁ is a random sample and the response surface in P₂ is linear, or can be estimated by a linear approximation, the percent reduction in bias due to matched sampling is:

Π = 100 (η₂ − μ₂) / (μ₁ − μ₂).

This result allows us to measure how much a matching method can reduce bias.
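For instance, with the hypothetical values below (treated mean age 60, control mean age 50, matched-subsample mean age 58), Theorem 2 gives an 80 percent reduction in bias:

```python
# Hypothetical population and matched-sample means for the age variable X:
mu_1, mu_2 = 60.0, 50.0     # means of X in P1 and P2
eta_2 = 58.0                # expected mean of X in the matched subsample G2*

# Theorem 2: percent reduction in bias due to matched sampling.
Pi = 100 * (eta_2 - mu_2) / (mu_1 - mu_2)
assert Pi == 80.0           # matching closed 8 of the 10 years of mean imbalance
```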

2.4. Finding the maximum possible bias reduction. Various matching methods will yield different values of Π. In addition to comparing the Π's of different matching methods to each other, we also want an idea of how good a matching method is on its own. It can be costly to apply many different matching methods to see which one has the greatest percent reduction in bias; in real life, a researcher might pick a single matching method and, without trying other methods, want to see how successful or unsuccessful the matching is.

This is why it is crucial to be able to calculate the maximum possible percent reduction in bias due to matched sampling. If we can find an upper bound on how much we can decrease the bias, then it is much easier to compare a single matching method to that upper bound than to repeat the study many times with different matching methods. To get an expression for the maximum percent reduction in bias, we first state a lemma that is not proved here but can be found in Rubin's work [4].

Lemma 3. Assume that in population Pᵢ, X has mean μᵢ and variance σᵢ², and that (X − μᵢ)/σᵢ follows the distribution fᵢ, i = 1, 2. The initial bias in X is:

B = (μ₁ − μ₂) / √((σ₁² + σ₂²)/2) > 0.

This makes sense intuitively: in our example, the bias in age is the difference in mean age between the treatment and control populations divided by the spread of age in the two populations. From this we can see that if σ₁² = σ₂², then the bias is just the number of standard deviations between the means of X in the two populations. We are now ready to present the maximum percent reduction in bias, and its proof, for the case when we are matching on one X variable.

Theorem 4.

Π_max = 100 Θ₂(r, N) / (B √((1 + σ₁²/σ₂²)/2)),

where Θ₂(r, N) is the expected value of the average of the N largest observations from a sample of size rN from f₂. This sample could be the G₂ sample we selected before constructing G₂*.

Proof. We have commented on and added to the following proof, which is adapted from Rubin's version [4]. Earlier, we assumed that μ₁ > μ₂. This happens to be consistent with our example: the average age in the treated population is higher than the average age in the control population, since cholesterol is positively correlated with age. Then Π is largest whenever the average age of the control subsample, η₂ = E(x̄₂·), is greatest, which happens when we pick the oldest N subjects from G₂ to make up the matched subsample G₂*. Intuitively, this means that matching reduces the bias from age differences between the populations the most when the control subsample contains patients who are as close in age as possible to the sample of treated patients.

The expected value of the average of the N largest values from the G₂ sample of size rN is μ₂ + σ₂Θ₂(r, N). Since the maximum reduction in bias depends on how large η₂ is, and η₂'s maximum depends on Θ₂(r, N), the maximum percent reduction in bias is the ratio of this gain over the true difference in the x variable between the populations. The maximum value of Π is:

Π_max = 100 σ₂Θ₂(r, N) / (μ₁ − μ₂).


Using the lemma above, we can algebraically manipulate this result to get Π_max in terms of B:

Π_max = 100 Θ₂(r, N) / (B √((1 + σ₁²/σ₂²)/2)). ∎

This result is important because, for a particular matching method, we can now compare Π to min(100, Π_max). That tells us how well a matching method obtains a G₂* whose expected average of X is close to the average in G₁. If Π_max is small, no matching method can do this; if Π_max is large, most matching methods should perform well. In the special case where the parameters are such that Π_max ≥ 100, there exists a matching method that obtains a 100 percent reduction in expected bias. It is worth noting that Π_max is positively related to r and N and negatively related to B and σ₁²/σ₂², holding other variables constant. If a researcher wants to increase Π_max, he or she can adjust r and N.
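The dependence of Π_max on r can be made concrete with a small Monte Carlo sketch (assuming f₂ is standard normal; the function name theta2 and the parameter values B = 1 and σ₁²/σ₂² = 1 are ours):

```python
import random

random.seed(2)

def theta2(r, N, reps=1000):
    """Monte Carlo estimate of Theta2(r, N): the expected average of the N
    largest of rN draws from f2 (taken here to be standard normal)."""
    total = 0.0
    for _ in range(reps):
        draws = sorted(random.gauss(0.0, 1.0) for _ in range(r * N))
        total += sum(draws[-N:]) / N          # average of the N largest draws
    return total / reps

B, ratio = 1.0, 1.0                           # hypothetical B and sigma1^2/sigma2^2
t = theta2(r=4, N=25)
Pi_max = 100 * t / (B * ((1 + ratio) / 2) ** 0.5)   # Theorem 4

assert t > 0.0                                # averaging the largest draws is positive
assert theta2(8, 25) > theta2(2, 25)          # larger r raises Theta2, hence Pi_max
```

With these parameters Π_max exceeds 100, illustrating the special case in which a 100 percent reduction in expected bias is attainable.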

Now that we have derived this important metric for measuring the effectiveness of a matching method on a single X, it is natural to apply the same process to multiple covariates. Following our example, there might indeed be bias in the estimator for the treatment effect not just from differences in age between the control and treatment populations, but also from systematic differences in other variables, including weight, genetic history, lifestyle choices, etc.

3. MATCHING ON MULTIPLE COVARIATES

Now, the objective is to estimate the effect of a binary treatment variable on many dependent variables. The population can still be split into those who receive the treatment and those who do not. We will refer to P₁ as the population of those given the treatment, and P₂ as the population of those not given the treatment. The challenge is the same as it was with one X variable: the treatment assignment is not random. We will solve it in the same way as before: by finding samples from P₁ and P₂ in which the distributions of X are almost the same. X is now a vector of p matching variables (before, p = 1). For example, if we are estimating the effect of a drug on reducing cholesterol levels, X might be a vector consisting of age, weight, and average hours spent exercising per week. For simplicity's sake, we will assume that no element of X is categorical (so age is no longer 1 for old and 0 for young, but an actual number).

The process for constructing subsamples is similar to before; this section follows Rubin's 1976 paper [5]. First, choose random samples G₁ and G₂ of sizes r₁N₁ and r₂N₂ from P₁ and P₂ respectively, with r₁ ≥ 1 and r₂ > 1. Then record the p matching variables for all individuals in G₁ and G₂. Using some matching method, find matched subsamples G₁* and G₂* of sizes N₁ and N₂, where G₁* is chosen from G₁ and G₂* is chosen from G₂ [5].

One difference that now arises in constructing the matched subsamples is that we want to make sure that matching samples to minimize the differences in, say, age between the treated and control groups does not increase the differences in some other variable, such as hours spent exercising. Whatever matching method we use to construct the subsamples must therefore have a very special property: it should be equal percent bias reducing (EPBR). The meaning of EPBR, and the conditions under which a matching method is EPBR, are presented in the theorem below, summarizing Rubin's discussion [5]:

Theorem 5. If X is the vector of covariates, let u₁ be the finite mean vector of X for P₁, and u₂ the finite mean vector for P₂. For example, u₁ consists of the mean age, weight, and average weekly exercise for the treatment population, and u₂ consists of the same means for the control population. The true values of these means are unknown.

Let uᵢ* be the expected mean vector of X in the subsample Gᵢ*, for i = 1, 2. These vectors can be obtained by matching: given (i) fixed sample sizes, (ii) fixed distributions of X in both P₁ and P₂, and (iii) a fixed matching method for obtaining subsamples, repeating the process of randomly sampling and matching will result in the averages of the mean vectors of the matched subsamples converging to u₁* and u₂*.

We call a matching method EPBR for X if (u₁* − u₂*) = γ(u₁ − u₂), where γ is a scalar constant. The interpretation is that the percent reduction in the bias of each of the p matching variables is the same. If a matching method is not EPBR, then certain linear functions of X have their bias increased [2].

Why do we care about selecting a matching method that is EPBR? In the equation (u₁* − u₂*) = γ(u₁ − u₂), the left-hand side represents the mean imbalance of the covariates in the subsamples, and the right-hand side represents the mean imbalance of the covariates in the populations. Directly stated, the EPBR property implies that improving balance in the difference in means on one variable also improves it on all others (and on their linear combinations) by a proportional amount [1]. These matching rules are easiest to evaluate when the dependent variables can be any linear combinations of the covariates, since there is then only one percent reduction in bias of interest. Rosenbaum and Rubin overviewed the main EPBR methods; the technicalities can be found in their paper [2].
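A small numeric sketch of the EPBR condition (entirely hypothetical mean vectors and γ): when (u₁* − u₂*) = γ(u₁ − u₂), every coordinate's bias shrinks by the same 100(1 − γ) percent:

```python
# Hypothetical mean vectors for p = 3 covariates (age, weight, weekly exercise):
u1 = [60.0, 90.0, 2.0]      # treated population means
u2 = [50.0, 80.0, 4.0]      # control population means

# Suppose a (hypothetical) EPBR method leaves G1* at u1 and shrinks the
# imbalance by the factor gamma = 0.2, so u1* - u2* = gamma * (u1 - u2).
gamma = 0.2
u1_star = u1[:]
u2_star = [a - gamma * (a - b) for a, b in zip(u1, u2)]

# Per-covariate percent reduction in bias: identical for every coordinate.
reductions = [100 * (1 - (a1 - a2) / (b1 - b2))
              for a1, a2, b1, b2 in zip(u1_star, u2_star, u1, u2)]
assert all(abs(pr - 100 * (1 - gamma)) < 1e-9 for pr in reductions)
```

Note that the third covariate's population imbalance has the opposite sign from the first two; EPBR still shrinks it by the same 80 percent rather than overshooting or reversing it.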

3.1. Percent reduction in bias with multiple covariates. Now that we have defined what it means for a matching method to be EPBR, we are naturally interested in evaluating how much matching has reduced the bias due to covariates when estimating a treatment effect. The remainder of this section follows the approach of Rubin [6]. We now define the percent reduction in bias, which is how we evaluate different EPBR matching methods:

Definition 7. The percent reduction in bias for matching on multiple covariates is

Π = 100 [1 − ((u₁* − u₂*)′a) / ((u₁ − u₂)′a)] for any vector a.

(For an EPBR method, this value equals 100(1 − γ) and is the same for every choice of a.)

Π will differ based on the matching method, the distributions of X in the control and treatment populations, and the sizes of the random samples and of the subsamples. This naturally leads us to the final result of this paper: the maximum percent reduction in bias when matching on multiple covariates using an EPBR method. As in the case of a single X variable, the best-case scenario for a given EPBR matching method is min(100, Π_max). The following theorem defines Π_max. The proof is omitted because, while the algebra is untidy, the intuition is the same as in the one-variable case presented in Section 2. Essentially, the maximum percent reduction in bias occurs when (i) the members of the randomly selected treatment sample G₁ with the smallest expected values of the covariates are chosen for the treatment subsample G₁*, and (ii) the members of the randomly selected control sample G₂ with the largest expected values of the covariates are chosen for the control subsample G₂*. This minimizes the differences between the two subsamples. As when matching on one X, the proof also ends with a substitution of B, the bias formula. A formal proof can be found in [6].

Theorem 6 (Maximum percent reduction in bias). Given (a) fixed distributions of X in P₁ and P₂ with mean vectors u₁ and u₂ and covariance matrices Σ₁ and Σ₂, (b) fixed sample sizes r₁N₁ and r₂N₂ for G₁ and G₂, r₁ ≥ 1, r₂ > 1, and (c) fixed sizes N₁ and N₂ for G₁* and G₂*, the maximum percent reduction in bias for any matching method that is EPBR for X is:

Π_max = (100 / (B √((1 + σ₁²/σ₂²)/2))) [Θ₂⁺(r₂, N₂) − (σ₁/σ₂) Θ₁⁻(r₁, N₁)],

where:

σᵢ² = η′Σᵢη, the variance of the best linear discriminant X′η in Pᵢ, with η = Σ₂⁻¹(u₁ − u₂);

B = (ν₁ − ν₂)/√((σ₁² + σ₂²)/2), the number of standard deviations between the means of X′η in P₁ and P₂, where νᵢ = uᵢ′η;

Θ₂⁺(r₂, N₂) = the expectation of the sample average of the N₂ largest of the r₂N₂ randomly chosen observations from F₂, where F₂ is the distribution of X′η in P₂ normed to have zero mean and unit variance, i.e., the distribution of (X − u₂)′η/σ₂ in P₂; and

Θ₁⁻(r₁, N₁) = the expectation of the sample average of the N₁ smallest of the r₁N₁ randomly chosen observations from F₁, F₁ being the distribution of (X − u₁)′η/σ₁ in P₁.

Knowing Π_max for a given EPBR matching method gives the same kind of information as described in Section 2.4. First, we can observe that Π_max and B are inversely related. B represents the systematic differences between the populations due to the covariates; as this bias increases, it becomes harder to make the subsamples similar and to reduce the effects of confounding variables. It is worth noting that B and σ₁²/σ₂² rely on parameters unknown to the researcher, but they are easily estimated from the data.
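To make the quantities in Theorem 6 concrete, the sketch below computes η, σ₁, σ₂, and B for a hypothetical two-covariate example (all numbers invented for illustration), using plain Python in place of a linear-algebra library:

```python
# Hypothetical population parameters for p = 2 covariates (age, weight):
u1, u2 = [60.0, 90.0], [50.0, 80.0]          # mean vectors in P1 and P2
S1 = [[64.0, 10.0], [10.0, 100.0]]           # covariance matrix Sigma1
S2 = [[49.0,  8.0], [ 8.0,  81.0]]           # covariance matrix Sigma2

def solve2(A, b):
    """Solve a 2x2 linear system A x = b via Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [( A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (-A[1][0] * b[0] + A[0][0] * b[1]) / det]

d = [u1[0] - u2[0], u1[1] - u2[1]]
eta = solve2(S2, d)                          # eta = Sigma2^{-1} (u1 - u2)

def quad(S, v):                              # quadratic form v' S v
    return sum(v[i] * S[i][j] * v[j] for i in range(2) for j in range(2))

s1, s2 = quad(S1, eta) ** 0.5, quad(S2, eta) ** 0.5   # sigma_1, sigma_2
nu1 = u1[0] * eta[0] + u1[1] * eta[1]        # nu_i = u_i' eta
nu2 = u2[0] * eta[0] + u2[1] * eta[1]
B = (nu1 - nu2) / (((s1 ** 2 + s2 ** 2) / 2) ** 0.5)
assert B > 0                                 # treated means exceed control means
```

Here B comes out to about 1.6 standard deviations of imbalance along the discriminant direction.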

Figure 4. Approximate ratio of sample sizes r2, needed to obtain a maximum

percent reduction in bias close to 100 percent [6]

Secondly, we can see that for fixed N₂, as r₂ increases, Θ₂⁺(r₂, N₂) increases, which increases Π_max; simultaneously, as r₁ increases, Θ₁⁻(r₁, N₁) decreases, which also increases Π_max. This is useful because the researcher can increase the pool from which the subsamples are selected: as the pool grows, the researcher is more likely to come across values that make the samples better matched. Figure 4 shows what the ratio of the size of G₂ to that of G₁ would have to be in order to attain a maximum percent reduction in bias close to 100 percent, for different values of the total bias B and of σ₁²/σ₂². As we can see, for the largest values of B and σ₁²/σ₂², the pool from which the control subsample is chosen has to be 35 times the size of the pool from which the treatment subsample is chosen, while for the smallest values of B and σ₁²/σ₂², it would only have to be 1.1 times the size.

3.2. Choosing a matching method. Whether we have a single X or multiple X's to match on, knowing the maximum percent reduction in bias allows us to evaluate how successful a matching method is at achieving the goal: reducing the bias from systematic differences between the control and treatment populations. It gives the researcher an anchor for understanding how successful they were at limiting the confounding effects of covariates on estimating a treatment effect. For a concrete example, the results of a Monte Carlo simulation of the Mahalanobis-metric matching method's percent reduction in the bias of the covariates X are shown in Figure 5. Consistent with what we would expect from the theory derived in this paper, it is clear from the table that the percent reduction in bias is highest for low values of the bias B and of σ₁²/σ₂², and for high values of r.

Figure 5. Percent reduction in bias of X, Mahalanobis-metric matching,

N = 50, X normal, Monte Carlo values [8]

Monte Carlo results also help compare different matching methods to each other. An example of the results of a simulation comparing two matching methods, discriminant matching and metric matching, showing the percent reduction in bias for three different estimators under varying sample-size ratios, is given in Figure 6. From this table, we can see an example of a situation in which metric matching seems clearly superior to discriminant matching because it does a better job of reducing bias.

It is worth noting that while the percent reduction in bias is certainly a prime consideration when selecting a matching method, it is not the only one. In practice, different matching methods have different trade-offs. A common matching method is mean matching, where each subsample is constructed so that the subsample means are as similar as possible. While this can achieve a high percent reduction in bias, it can be hard in practice: researchers usually have one shot at choosing the members of their subsamples, and the means of the subsamples are only known after the individuals have been chosen. In real life, researchers find it easier to choose pairs of subjects with similar covariates.

Figure 6. Percentage reduction in expected squared bias, averaging over distributional conditions [7]

That leads to pairwise matching, another common method, in which the members of the treatment group are ordered from low to high on some covariate, and so are the members of the control group. A pair is constructed by matching the member of the treatment group and the member of the control group with the lowest covariate values in their respective groups; another pair is then constructed from the members with the second-lowest values in each group, and the process is repeated. The downside to this method is that if a researcher orders from low to high, for example, then the members with high values of the covariates are left out of the subsamples.
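A minimal sketch of this rank-order pairing on hypothetical data, showing how the highest-valued controls get dropped:

```python
# Hypothetical covariate values (e.g., ages) for each group:
treated = [52, 67, 58, 71, 49]
control = [45, 63, 50, 55, 60, 48, 66]   # larger control pool

# Sort each group and pair subjects of equal rank, lowest first.
pairs = list(zip(sorted(treated), sorted(control)))

# Only len(treated) pairs can be formed, so the highest-valued control
# subjects never enter the subsample -- the downside described above.
leftover = sorted(control)[len(treated):]
assert leftover == [63, 66]
```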

As we can see from this high-level discussion, there are many practical concerns for researchers when selecting a matching method. There is an abundance of recent literature, to which Rubin has contributed, concerned with various classes of matching methods; propensity score analysis in particular has recently received a great deal of attention. All of this being said, the percent reduction in bias and the maximum percent reduction in bias remain the most prominent concerns, since at the end of the day, the goal of any matching method is to reduce bias.

4. CONCLUSION

In this paper, we have examined how researchers in the social sciences produce estimators for the effect of treatments on some outcome variable between two populations, the treated and the control, which are assumed to be systematically different. These systematic differences, whether in one X variable or many, bias the treatment estimates. As a result, it is impossible to infer whether observed differences in the outcome variable are due to the treatment applied or to these systematic differences. Essentially, inferring causality becomes very challenging.


This paper summarized the process of matching, a tactic used by researchers to construct subsamples for the control and treatment groups that are as similar as possible with respect to the covariates. We walked through how to infer the treatment effect after matching, and also produced a metric that evaluates the success of matching: the percent reduction in bias, Π. Finally, for any particular matching method, this paper derived the maximum possible percent reduction in bias, Π_max.

Morally, Π_max is important because it represents the scope social scientists have to use matching to reduce bias. In a sense, it gives a level of hope for causal inference. Since many social sciences have struggled to identify causality in the real world due to research limitations, Π_max has a deep meaning attached to it: it suggests that matching offers a chance, in a way, to save the social sciences. Perhaps now, patients with other fatal diseases can feel as hopeful as those with Alzheimer's that a treatment can be found within our lifetime.


References

1. Iacus, Stefano M.; King, Gary; Porro, Giuseppe. (2011). Multivariate Matching Methods That Are Monotonic Imbalance Bounding. Journal of the American Statistical Association, 106(493), 345-361.

2. Rosenbaum, Paul R; Rubin, Donald B. (1985). Constructing a Control Group Using Multivariate

Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician, 39(1),

33-38.

3. Rosenbaum, Paul R; Rubin, Donald B. (1985). The Bias Due to Incomplete Matching. Biometrics, 41,

103-116.

4. Rubin, Donald B. (1973). Matching to Remove Bias in Observational Studies. Biometrics, 29, 159-183.

5. Rubin, Donald B. (1976). Multivariate Matching Methods That Are Equal Percent Bias Reducing, I: Some Examples. Biometrics, 32, 109-120.

6. Rubin, Donald B. (1976). Multivariate Matching Methods That Are Equal Percent Bias Reducing, II:

Maximums on Bias Reduction for Fixed Sample Sizes. Biometrics, 32, 121-132.

7. Rubin, Donald B. (1979). Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies. Journal of the American Statistical Association, 74(366), 318-328.

8. Rubin, Donald B. (1980). Bias Reduction Using Mahalanobis-Metric Matching. Biometrics, 36, 293-298.