
QUANTIFYING THE HOPE FOR REDUCING BIAS IN THE SOCIAL SCIENCES

    NIKHITA LUTHRA

    1. INTRODUCTION

1.1. Motivation. Alzheimer's disease is the sixth leading cause of death among Americans. This fatal neurological disease has stumped researchers from wealthy countries for over two hundred years. A recent discovery, however, offers hope to the one in three senior citizens who will develop the disease. Surprisingly, that hope doesn't come from the white walls of a medical center at a well-endowed research university in the States; it comes from the remote hills of Antioquia, the region surrounding Medellín, Colombia. After going through priests' historical records at local churches, Dr. Francisco Lopera of Medellín's University of Antioquia discovered that members of a particular family had recorded early onset of Alzheimer's for over 300 years. Further research revealed that in this family, a person has a 50 percent chance of inheriting a gene, PSEN1, that guarantees early onset of Alzheimer's. After the National Institutes of Health picked up word of this once-in-a-lifetime experimental opportunity, resources were quickly gathered to run clinical trials on an antibody that targets the protein amyloid, which is associated with the disease.

The unearthing of this family offers a researcher's dream: the perfect conditions to run a natural experiment. Because neither the subjects nor the researchers wanted to know which members of the family carried the fatal gene, treatments could be applied randomly in a double-blind setting. A double-blind study is one in which neither the subjects nor the researchers know which groups the subjects are assigned to. In addition, because members of the family had mostly stayed in the same location with similar living situations, variables that differ by place of residence could be controlled for.

Most researchers are not lucky enough to find the perfect conditions to infer that a treatment caused the prevention or cure of a disease. In the real world, treatments tested in the social sciences are conditionally dependent on covariates. For example, in medicine, drugs are only tested on sick patients. In economics, researchers test a welfare benefit on the disadvantaged. Without randomization, selection bias arises: are differences in the outcome caused by the selection of who is in the treatment group rather than by the treatment itself? This inability to infer causality has plagued social scientists for decades.


1.2. Overview. Luckily, statistics offers some ways to overcome the inability to randomly assign treatments. When random assignment is missing, matching samples on particular variables attempts to reduce the bias of estimates of treatment effects. Recent literature has focused on matching methods that attempt to reduce the bias from confounding variables that systematically differ between the control and treatment populations. Rather than focusing on the various matching methods themselves, we will derive two quantities, the percent reduction in bias $\Gamma$ and its maximum $\Gamma_{\max}$, which are used to evaluate the success of a matching method, both when we want to reduce the bias of a single variable (Section 2) and when we want to reduce the bias of many covariates (Section 3). To do this, we will first describe how to estimate the treatment effect on a particular outcome variable. Then we will construct $\Gamma$, which captures the reduction in bias of an estimator due to matching. Finally, we will derive $\Gamma_{\max}$, the maximum percent reduction in bias. This value is important because it intuitively acts as an upper bound on how much hope we can place in matching's ability to reduce bias, ultimately saving the social sciences from their inability to infer causality.

    2. MATCHING ON ONE VARIABLE

When is it the case that adjusting for some variable $X$ (or vector of variables $\mathbf{X}$) gives unbiased estimates of the treatment effect? This happens whenever the treatment assignment is strongly ignorable. If $r_1$ is the outcome after receiving the treatment and $r_0$ is the outcome after not receiving the treatment, treatment assignment is strongly ignorable when: (i) the responses $(r_1, r_0)$ are conditionally independent of the treatment $z$ given $X$, and (ii) at each value of $X$, there is a positive probability of receiving each treatment [3]. These conditions are represented mathematically as
\[
\Pr(r_1, r_0, z \mid X) = \Pr(r_1, r_0 \mid X)\,\Pr(z \mid X) \quad \text{and} \quad 0 < \Pr(z = 1 \mid X) < 1 \text{ for all possible } X.
\]
Thus, the goal of matching is to construct samples that make treatment assignment as ignorable as possible. Figure 1 gives an overview of the process of matching.

2.1. Estimating the treatment effect. Imagine that we are interested in testing the effect of a particular drug on reducing cholesterol levels, represented by $Y$, a continuous dependent variable. We begin by assuming that we have only one variable, $X$, to match on. For example, let $X$ be the age of a patient. We assign $X = 1$ for patients over 50 years of age and $X = 0$ for patients under 50 years of age. We want to remove the effect of $X$ on $Y$. Suppose we have two populations, $P_1$ and $P_2$, where $P_1$ is the population of patients that will receive the treatment (because they have high cholesterol), and $P_2$ is the population of patients that will not receive the treatment (also known as the control population). The distribution of the matching variable $X$ differs in $P_1$ and $P_2$; patients in the treatment group


    Figure 1. Summary of stages in matched sampling [3]

tend to be older than those in the control group. Then $f_1(X, Y) \neq f_2(X, Y)$, where $f_1, f_2$ are the joint distributions of $X$ and $Y$ in $P_1$ and $P_2$, respectively.

The remainder of this section will follow the approach of Rubin [4]. Let $G_1$ be a random sample from population $P_1$ of size $N$. Let $G_2$ be a random sample from population $P_2$ of size $rN$, $r > 1$. All subjects in $G_1$ and $G_2$ have recorded values for the random variable $X$. We choose a subsample of $G_2$ of size $N$ using a specified matching method; call this subsample $G_2^*$. Now we want to estimate the effect of the treatment using $G_1$ and $G_2^*$, both of size $N$. Note that if $r = 1$, $G_2^*$ would simply be a random sample from $P_2$, and matching would not be able to remove bias due to $X$. If $r = \infty$, then infinitely many potential matches would be available, and all bias due to $X$ could be removed.

Definition 1. Define the response surface for $Y$ in $P_i$ at $X = x$, denoted $R_i(x)$, as
\[
R_i(x) = E(Y \mid X = x), \quad \text{the expectation taken in } P_i.
\]
In our example, $R_1(\text{old}) = E[\text{cholesterol level} \mid \text{old}]$ gives the expectation of the cholesterol level for the treatment group given that the patient is old. $R_2(\text{old}) = E[\text{cholesterol level} \mid \text{old}]$ gives the expectation of the cholesterol level for the control group given that the patient is old. $R_1(\text{young}) = E[\text{cholesterol level} \mid \text{young}]$ gives the expectation of the cholesterol level for the treated group given that the patient is young. $R_2(\text{young}) = E[\text{cholesterol level} \mid \text{young}]$ gives the expectation of the cholesterol level for the control group given that the patient is young.

Definition 2. The effect of the treatment at $X = x$ is $R_1(x) - R_2(x)$.

Following our example, $R_1(\text{old}) - R_2(\text{old})$ reveals the effect of the treatment among the old patients: it is the expected cholesterol level for old people who received the treatment minus the expected cholesterol level for old people who didn't receive the treatment. $R_1(\text{young}) - R_2(\text{young})$ reveals the effect of the treatment among the young patients: it is the expected cholesterol level for young people who received the treatment minus the expected cholesterol level for young people who didn't receive the treatment.

There are two possible cases for the effect of the treatment: it can be constant, or it can vary with $x$. These two cases, also referred to as parallel and non-parallel response surfaces, are defined below.

Definition 3. If $R_1(x) - R_2(x)$ is constant and independent of $X$, we call $R_1(x)$ and $R_2(x)$ parallel response surfaces. In this case, the goal is to estimate this constant difference. Parallel response surfaces are depicted in Figure 2.

Definition 4. If $R_1(x) - R_2(x)$ is not constant across all values of $X$, we call $R_1(x)$ and $R_2(x)$ non-parallel response surfaces. In this case, the goal is to estimate the average difference between $R_1(x)$ and $R_2(x)$ across all $x$. Non-parallel response surfaces are depicted in Figure 3.

Definition 5. In both cases, we are interested in estimating the treatment effect among the control and treated populations, $\tau$, which is equal to the expected difference between the response surfaces:
\[
\tau = E_1\big[R_1(x) - R_2(x)\big].
\]

Allow $y_{1j}, x_{1j}$ to represent the values of $Y, X$ for the $j$th subject in $G_1$, and $y_{2j}, x_{2j}$ to represent the values of $Y, X$ for the $j$th subject in $G_2^*$, where $j = 1, \dots, N$. Then
\[
y_{ij} = R_i(x_{ij}) + e_{ij}, \quad i = 1, 2;\; j = 1, \dots, N, \qquad E_c(e_{ij}) = 0,
\]
where $E_c$ denotes the conditional expectation given the $x_{ij}$. We can use this notation to express an estimator for the treatment effect that is based on data we can actually collect from our subsamples.

Figure 2. Parallel univariate response surfaces [4]

Figure 3. Nonparallel univariate response surfaces [4]


Definition 6. The estimator for the treatment effect is the average difference between the non-parallel response surfaces (or the constant difference if the response surfaces are parallel):
\[
\hat{\tau}_0 = \frac{1}{N}\sum_{j=1}^{N} y_{1j} - \frac{1}{N}\sum_{j=1}^{N} y_{2j} = \bar{y}_{1\cdot} - \bar{y}_{2\cdot}.
\]
This estimator takes in the data after running the study and outputs a numerical value that estimates the effect of the drug on cholesterol levels; that numerical value is known as the estimate. Note that the estimator is a function while the estimate is a number.
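As a concrete illustration, here is a minimal Python sketch of the estimator in Definition 6, computed as the difference of sample means between the treated sample and the matched control subsample. The function name and the numbers are hypothetical and purely illustrative.

```python
import numpy as np

def treatment_effect_estimate(y_treated, y_matched_control):
    """Difference-in-means estimator tau_hat_0 = ybar_1. - ybar_2.,
    computed on the treated sample G1 and the matched control subsample G2*."""
    y1 = np.asarray(y_treated, dtype=float)
    y2 = np.asarray(y_matched_control, dtype=float)
    assert len(y1) == len(y2), "matched samples should both have size N"
    return y1.mean() - y2.mean()

# Hypothetical cholesterol reductions (illustrative numbers only)
y_treated = [12.1, 9.8, 14.3, 10.6]
y_control = [3.2, 4.1, 2.8, 3.9]
print(treatment_effect_estimate(y_treated, y_control))  # the estimate (a number)
```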

2.2. Bias of the estimator. Now that we have an estimator for the treatment effect, we need a way to assess whether the estimator with matching is better than the estimator without matching. Essentially, we want to estimate how much matching can reduce the bias of an estimator of the treatment effect. Let $E^*$ be the expectation over the distribution of $X$ in the matched samples, and let $E_{2^*}$ be the expectation over the distribution of $X$ in the matched $G_2^*$ subsamples.

Theorem 1. Using the definition of bias, the expected bias of $\hat{\tau}_0$ over matched sampling is
\[
E^*E_c(\hat{\tau}_0 - \tau) = E_1 R_2(x) - E_{2^*} R_2(x).
\]

Proof. Although the proof was not in Rubin's original 1973 paper, it is quite easy to derive. Using the above definitions of $\hat{\tau}_0$ and $\tau$,
\[
E^*E_c(\hat{\tau}_0 - \tau) = E^*E_c\Big[(\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) - E_1\big[R_1(x) - R_2(x)\big]\Big].
\]
Since expectations add,
\[
E^*E_c(\hat{\tau}_0 - \tau) = E^*E_c(\bar{y}_{1\cdot}) - E^*E_c(\bar{y}_{2\cdot}) - E^*E_c E_1 R_1(x) + E^*E_c E_1 R_2(x).
\]
We know $E^*E_c(\bar{y}_{1\cdot}) = E_1 R_1(x)$ and $E^*E_c(\bar{y}_{2\cdot}) = E_{2^*} R_2(x)$ [4], so we can rewrite and simplify:
\[
E^*E_c(\hat{\tau}_0 - \tau) = E_1 R_1(x) - E_{2^*} R_2(x) - E_1 R_1(x) + E_1 R_2(x) = E_1 R_2(x) - E_{2^*} R_2(x).
\]

If the distribution of $X$ in $G_2^*$ is the same as that in the random sample $G_1$, then $E_1 R_2(x) = E_{2^*} R_2(x)$ and $\hat{\tau}_0$ has zero expected bias. If $r = 1$ (in other words, if $G_2^*$ is a random sample from $P_2$), then the expected bias of $\hat{\tau}$ is $E_1 R_2(x) - E_2 R_2(x)$, where $E_2$ denotes the expectation over the distribution of $X$ in $P_2$; here $\hat{\tau}$ is the estimator of the treatment effect for the unmatched samples and $E_1 R_2(x) - E_2 R_2(x)$ is the bias of that estimator.
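To make the bias expression in Theorem 1 concrete, the following sketch simulates a matched design with assumed normal covariate distributions, an assumed linear control response surface, and the simple "take the $N$ largest $X$ values" matching rule, and compares the empirical bias of $\hat{\tau}_0$ with the plug-in value of $E_1 R_2(x) - E_{2^*} R_2(x)$. All parameter values are assumptions for illustration.

```python
import numpy as np

# Small simulation (all numbers assumed) checking Theorem 1:
# the expected bias of the matched estimator equals E1[R2(x)] - E2*[R2(x)].
rng = np.random.default_rng(0)
N, r, reps = 200, 3, 2000
beta2 = 3.0                      # control response surface R2(x) = beta2 * x
tau = 2.0                        # constant treatment effect, R1(x) = R2(x) + tau

bias_emp, bias_formula = [], []
for _ in range(reps):
    x1 = rng.normal(1.0, 1.0, N)          # X in treated sample G1 (P1 mean = 1)
    x2 = rng.normal(0.0, 1.0, r * N)      # X in control sample G2 (P2 mean = 0)
    x2_star = np.sort(x2)[-N:]            # matched subsample G2*: N largest X values
    y1 = beta2 * x1 + tau + rng.normal(0, 1, N)
    y2 = beta2 * x2_star + rng.normal(0, 1, N)
    bias_emp.append((y1.mean() - y2.mean()) - tau)             # tau_hat_0 - tau
    bias_formula.append(beta2 * (x1.mean() - x2_star.mean()))  # plug-in E1 R2 - E2* R2

print(np.mean(bias_emp), np.mean(bias_formula))   # the two averages should agree
```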


2.3. Measuring the reduction in bias due to matching. Now we wish to determine how much less biased the estimator $\hat{\tau}_0$ based on matched sampling is compared to the estimator $\hat{\tau}$ based on random sampling. We will use the percent reduction in expected bias to measure this. It is essentially one minus the ratio of the expected bias under matched sampling to the expected bias under random sampling:
\[
100\left[1 - \frac{E_1 R_2(x) - E_{2^*} R_2(x)}{E_1 R_2(x) - E_2 R_2(x)}\right].
\]
The numerator of the ratio, $E_1 R_2(x) - E_{2^*} R_2(x)$, is the expected bias from matched sampling, and the denominator, $E_1 R_2(x) - E_2 R_2(x)$, is the expected bias from random sampling. The terms that differ are $E_{2^*} R_2(x)$ and $E_2 R_2(x)$. Putting the bracketed expression over a common denominator and simplifying yields
\[
100\,\frac{E_{2^*} R_2(x) - E_2 R_2(x)}{E_1 R_2(x) - E_2 R_2(x)}.
\]
We can see from this expression that the percent reduction in bias depends only on the distributions of $X$ in $P_1$, $P_2$, and $G_2^*$ and on the response surface in $P_2$. We assume that the response surface in $P_2$ is linear, or can be approximated by a linear regression: $R_2(x) = \nu_2 + \beta_2(x - \mu_2)$, where $\nu_2$ is the mean of $Y$ in $P_2$, $\mu_i$ is the mean of $X$ in $P_i$, and $\beta_2$ is the regression coefficient of $Y$ on $X$ in $P_2$. We can use this to rewrite $E_1 R_2(x) - E_2 R_2(x) = \beta_2(\mu_1 - \mu_2)$ and $E_{2^*} R_2(x) - E_2 R_2(x) = \beta_2(\mu_{2^*} - \mu_2)$, where $\mu_{2^*} = E_{2^*}(X)$ in $G_2^*$. Substituting in these values gives the following theorem:

Theorem 2. If $G_1$ is a random sample and the response surface in $P_2$ is linear, or can be approximated by a linear regression, the percent reduction in bias due to matched sampling is
\[
\Gamma = 100\left(\frac{\mu_{2^*} - \mu_2}{\mu_1 - \mu_2}\right).
\]
This result allows us to measure how much a matching method reduces bias.
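A plug-in version of Theorem 2 can be computed directly from sample means of the matching variable; the sketch below does this, with hypothetical ages standing in for the populations and the function name chosen for illustration.

```python
import numpy as np

def percent_reduction_in_bias(x_treated, x_control, x_matched_control):
    """Plug-in version of Theorem 2: Gamma = 100 * (mu_2* - mu_2) / (mu_1 - mu_2),
    with population means replaced by sample means of the matching variable X."""
    mu1 = np.mean(x_treated)               # mean of X in the treated sample
    mu2 = np.mean(x_control)               # mean of X in the full control sample (proxy for P2)
    mu2_star = np.mean(x_matched_control)  # mean of X in the matched control subsample
    return 100.0 * (mu2_star - mu2) / (mu1 - mu2)

# Illustrative ages (assumed numbers): matching pulls the control mean toward the treated mean.
x_treated = [62, 58, 66, 71]
x_control = [40, 45, 48, 52, 55, 58, 60, 63]
x_matched = [55, 58, 60, 63]               # the four oldest controls, chosen as matches
print(percent_reduction_in_bias(x_treated, x_control, x_matched))
```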

2.4. Finding the maximum possible bias reduction. Various matching methods will yield different values of $\Gamma$. In addition to being able to compare the $\Gamma$'s of different matching methods to each other, we also want a sense of how good a matching method is on its own: it can be costly to apply many different matching methods to see which one has the greatest percent reduction in bias. In real life, a researcher might pick a single matching method but, without trying other methods, still want to see how successful or unsuccessful the matching is.

This is why it is crucial to be able to calculate the maximum possible percent reduction in bias due to matched sampling. If we can find an upper bound on how much the bias can be decreased, then it is much easier to compare a single matching method to that upper bound rather than repeating the study many times with different matching methods. To get an expression for the maximum percent reduction in bias, we first state a lemma that is not proved here but can be found in Rubin's work [4].

Lemma 3. Assume that in population $P_i$, $X$ has mean $\mu_i$ and variance $\sigma_i^2$, and that $(X - \mu_i)/\sigma_i$ follows the distribution $f_i$, $i = 1, 2$. The initial bias in $X$ is
\[
B = \frac{\mu_1 - \mu_2}{\sqrt{(\sigma_1^2 + \sigma_2^2)/2}} > 0.
\]
This makes sense intuitively; in our example, the bias in age is the difference in mean age between the treatment and control populations divided by the spread of age in the two populations. From this we can see that if $\sigma_1^2 = \sigma_2^2$, the bias is just the number of standard deviations between the means of $X$ in the two populations. We are now ready to present the maximum percent reduction in bias, and its proof, for the case when we are matching on one $X$ variable.

Theorem 4.
\[
\Gamma_{\max} = \frac{100\,\alpha_2(r, N)}{B\sqrt{(1 + \sigma_1^2/\sigma_2^2)/2}},
\]
where $\alpha_2(r, N)$ is the expected value of the average of the $N$ largest observations from a sample of size $rN$ from $f_2$. This sample could be the $G_2$ sample we selected before constructing $G_2^*$.

Proof. We have commented on and added to the following proof, which is adapted from Rubin's version [4]. Earlier, we assumed that $\mu_1 > \mu_2$. This happens to be consistent with our example: the average age in the treated population is higher than the average age in the control population, since cholesterol is positively correlated with age. Then $\Gamma$ is largest whenever the expected average age of the control subsample, $\mu_{2^*} = E^*(\bar{x}_{2\cdot})$, is greatest, which happens when we pick the oldest $N$ subjects from $G_2$ to make up the matched subsample $G_2^*$. Intuitively, this means that matching reduces the bias from age differences between the populations the most when the control subsample contains patients who are as close as possible in age to the sample of treated patients.

The expected value of the average of the $N$ largest values from the $G_2$ sample of size $rN$ is $\mu_2 + \sigma_2\,\alpha_2(r, N)$. Since the maximum reduction in bias depends on how large $\mu_{2^*}$ is, and the maximum of $\mu_{2^*}$ depends on $\alpha_2(r, N)$, the maximum percent reduction in bias is the ratio of this value (less $\mu_2$) to the true difference in the $X$ variable between the populations. The maximum value of $\Gamma$ is
\[
\Gamma_{\max} = \frac{100\,\sigma_2\,\alpha_2(r, N)}{\mu_1 - \mu_2}.
\]
Using the lemma above, we can algebraically manipulate this result to express $\Gamma_{\max}$ in terms of $B$:
\[
\Gamma_{\max} = \frac{100\,\alpha_2(r, N)}{B\sqrt{(1 + \sigma_1^2/\sigma_2^2)/2}}.
\]

This result is important because, for a particular matching method, we can now compare $\Gamma$ to $\min(100, \Gamma_{\max})$. That tells us how well a matching method obtains a $G_2^*$ whose expected average of $X$ is close to the average in $G_1$. If $\Gamma_{\max}$ is small, no matching method can do this. If $\Gamma_{\max}$ is large, most matching methods should perform well. In the special case where the parameters are such that $\Gamma_{\max} \ge 100$, there exists a matching method that obtains a 100 percent reduction in expected bias. It is worth noting that $\Gamma_{\max}$ is positively related to $r$ and $N$ and negatively related to $B$ and $\sigma_1^2/\sigma_2^2$, holding the other quantities constant. If a researcher wants to increase $\Gamma_{\max}$, he or she can adjust $r$ and $N$.
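Since $\alpha_2(r, N)$ is just an expectation of order statistics, it is straightforward to approximate by simulation. The sketch below assumes $f_2$ is standard normal, uses Monte Carlo to approximate $\alpha_2(r, N)$, and then evaluates the $\Gamma_{\max}$ formula from Theorem 4 for assumed values of $B$ and $\sigma_1^2/\sigma_2^2$; the function names and parameter values are illustrative.

```python
import numpy as np

def alpha2(r, N, reps=20_000, rng=None):
    """Monte Carlo approximation of alpha_2(r, N): the expected average of the
    N largest observations among r*N draws from f2 (standard normal assumed here)."""
    rng = rng or np.random.default_rng(0)
    draws = rng.standard_normal((reps, r * N))
    top_N = np.sort(draws, axis=1)[:, -N:]      # N largest values in each replication
    return top_N.mean()

def gamma_max(r, N, B, var_ratio):
    """Theorem 4: Gamma_max = 100 * alpha_2(r, N) / (B * sqrt((1 + sigma1^2/sigma2^2)/2))."""
    return 100.0 * alpha2(r, N) / (B * np.sqrt((1.0 + var_ratio) / 2.0))

# Illustrative values (assumed): initial bias of half a standard deviation, equal variances.
print(gamma_max(r=2, N=50, B=0.5, var_ratio=1.0))
print(gamma_max(r=4, N=50, B=0.5, var_ratio=1.0))   # larger reservoir gives a larger Gamma_max
```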

Now that we have derived this important metric for measuring the effectiveness of a matching method on a single $X$, it is natural to apply the same process to multiple covariates. Following our example, there might be bias in the estimator of the treatment effect not just from differences in age between the control and treatment populations, but also from systematic differences in other variables, including weight, genetic history, lifestyle choices, and so on.

    3. MATCHING ON MULTIPLE COVARIATES

Now the objective is to estimate the effect of a binary treatment variable on many dependent variables. The population can still be split into those who receive the treatment and those who do not. We will refer to $P_1$ as the population of those given the treatment and $P_2$ as the population of those not given the treatment. The challenge is the same as it was with one $X$ variable: the treatment assignment is not random. We will solve it in the same way as before: by finding samples from $P_1$ and $P_2$ in which the distributions of $X$ are almost the same. $X$ is now a vector of $p$ matching variables (before, $p = 1$). For example, if we are estimating the effect of a drug on reducing cholesterol levels, $X$ might be a vector consisting of age, weight, and average number of hours spent exercising per week. For simplicity's sake, we will assume that no element of $X$ is categorical (so age is no longer 1 for old and 0 for young, but an actual number).

The process for constructing subsamples is similar to before. The approach of this section follows Rubin's 1976 paper [5]. First, choose random samples $G_1$ and $G_2$ of sizes $N_1$ and $N_2$ from $P_1$ and $P_2$, respectively, where $N_1 \le N_2$. Then record the $p$ matching variables for all individuals in $G_1$ and $G_2$. Using some matching method, find matched subsamples $G_1^*$ and $G_2^*$ of sizes $N_1^*$ and $N_2^*$, where $G_1^*$ is chosen from $G_1$ and $G_2^*$ is chosen from $G_2$ [5].

One difference that now arises in constructing the matched subsamples is that we want to make sure that matching to minimize the differences in, say, age between the treated and control groups does not increase the differences in some other variable, such as hours spent exercising. Whatever matching method we use to construct the subsamples must therefore have a special property: it should be equal percent bias reducing (EPBR). The meaning of EPBR, and the conditions under which a matching method is EPBR, are presented in the theorem below, summarizing Rubin's discussion [5]:

Theorem 5. Let $X$ be the vector of covariates, let $u_1$ be the finite mean vector of $X$ in $P_1$, and let $u_2$ be the finite mean vector of $X$ in $P_2$. For example, $u_1$ consists of the mean age, weight, and average weekly exercise for the treatment population, and $u_2$ consists of the mean age, weight, and average weekly exercise for the control population. The true values of these means are unknown.

Let $u_i^*$ be the expected mean vector of $X$ in the subsample $G_i^*$, for $i = 1, 2$. These vectors can be obtained by matching: given (i) fixed sample sizes $N_1, N_2$, (ii) fixed distributions of $X$ in both $P_1$ and $P_2$, and (iii) a fixed matching method for obtaining subsamples, repeating the process of randomly sampling and matching will result in the averages of the mean vectors of the matched subsamples converging to $u_1^*$ and $u_2^*$.

We call a matching method EPBR for $X$ if $(u_1^* - u_2^*) = k\,(u_1 - u_2)$, where $k$ is a scalar constant. The interpretation is that the percent reduction in the bias of each of the $p$ matching variables is the same. If a matching method is not EPBR, then matching increases the bias of certain linear functions of $X$ [2].

Why do we care about selecting a matching method that is EPBR? Looking at the equation $(u_1^* - u_2^*) = k\,(u_1 - u_2)$, the left-hand side represents the expected mean imbalance of the covariates in the matched subsamples, and the right-hand side represents the mean imbalance of the covariates in the populations. Stated directly, the EPBR property implies that improving balance in the difference in means on one variable also improves it on all others (and on their linear combinations) by a proportional amount [1]. These matching rules are the easiest to evaluate when the dependent variables can be any linear combination of the covariates, since there is then only one percent reduction in bias of interest. Rosenbaum and Rubin give an overview of the main EPBR methods, and the technicalities can be found in their paper [2].
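One way to see what EPBR means in practice is to compute the coordinate-wise percent reduction in bias and check whether the $p$ values agree. The sketch below does this for simulated data with an ad hoc selection rule; the data, the selection rule, and the function name are all assumptions for illustration, and the rule is not a certified EPBR method.

```python
import numpy as np

def percent_bias_reduction_by_covariate(X1, X2, X1_star, X2_star):
    """Per-covariate percent reduction in bias: 100 * (1 - (u1* - u2*) / (u1 - u2)),
    computed coordinate-wise from sample mean vectors. Under an EPBR method these
    p numbers should all be (approximately) equal."""
    u1, u2 = X1.mean(axis=0), X2.mean(axis=0)
    u1s, u2s = X1_star.mean(axis=0), X2_star.mean(axis=0)
    return 100.0 * (1.0 - (u1s - u2s) / (u1 - u2))

# Illustrative data (assumed): columns are age, weight, weekly exercise hours.
rng = np.random.default_rng(1)
X1 = rng.normal([60, 80, 2], [8, 10, 1], size=(100, 3))   # treated sample G1
X2 = rng.normal([50, 72, 3], [8, 10, 1], size=(300, 3))   # control sample G2
# Toy "matched" control subsample: the 100 controls with the largest projection
# onto the mean-difference direction (a simple selection rule, not certified EPBR).
d = X1.mean(axis=0) - X2.mean(axis=0)
scores = X2 @ d
X2_star = X2[np.argsort(scores)[-100:]]
print(percent_bias_reduction_by_covariate(X1, X2, X1, X2_star))
```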

3.1. Percent reduction in bias with multiple covariates. Now that we have defined what it means for a matching method to be EPBR, we are naturally interested in evaluating how much matching has reduced the bias due to covariates when estimating a treatment effect. This section follows the approach of Rubin [6]. We now define the percent reduction in bias, which is how we evaluate different EPBR matching methods:

Definition 7. The percent reduction in bias for matching on multiple covariates is
\[
\Gamma = 100\left[1 - \frac{(u_1^* - u_2^*)'\alpha}{(u_1 - u_2)'\alpha}\right] \quad \text{for any vector } \alpha.
\]
(Under an EPBR matching method this ratio is the same for every $\alpha$, so the definition does not depend on the choice of vector.)
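A direct plug-in of Definition 7, with population mean vectors replaced by sample mean vectors, might look like the following; the vectors used in the example are hypothetical.

```python
import numpy as np

def gamma_multivariate(u1, u2, u1_star, u2_star, alpha):
    """Plug-in Definition 7: Gamma = 100 * [1 - (u1* - u2*)'alpha / (u1 - u2)'alpha]."""
    num = (np.asarray(u1_star) - np.asarray(u2_star)) @ np.asarray(alpha)
    den = (np.asarray(u1) - np.asarray(u2)) @ np.asarray(alpha)
    return 100.0 * (1.0 - num / den)

# Hypothetical mean vectors (age, weight, exercise) before and after matching.
u1, u2 = [60.0, 80.0, 2.0], [50.0, 72.0, 3.0]
u1_star, u2_star = [60.0, 80.0, 2.0], [57.0, 77.6, 2.3]
print(gamma_multivariate(u1, u2, u1_star, u2_star, alpha=[1.0, 0.5, -2.0]))
```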

$\Gamma$ will differ based on the matching method, the distributions of $X$ in the control and treatment populations, and the sizes of the random samples and of the subsamples. This naturally leads us to the final result of this paper: the maximum percent reduction in bias when matching on multiple covariates using an EPBR method. As in the case with only one $X$ variable, the best-case scenario for a given EPBR matching method is $\min(100, \Gamma_{\max})$. The following theorem defines $\Gamma_{\max}$. The proof has been omitted because, while the algebra is untidy, the intuition is the same as in the single-variable case presented in Section 2. Essentially, the maximum percent reduction in bias occurs when (i) the members of the randomly selected treatment sample $G_1$ with the smallest expected values of the covariates are chosen for the treatment subsample $G_1^*$, and (ii) the members of the randomly selected control sample $G_2$ with the largest expected values of the covariates are chosen for the control subsample $G_2^*$. This minimizes the differences between the two subsamples. As when matching on one $X$, the proof also ends with a substitution of $B$, the bias formula. A formal proof can be found in [6].

Theorem 6 (Maximum percent reduction in bias). Given (a) fixed distributions of $X$ in $P_1$ and $P_2$ with mean vectors $u_1$ and $u_2$ and covariance matrices $\Sigma_1$ and $\Sigma_2$, (b) fixed sample sizes of $G_1$ and $G_2$, $N_1 = r_1 N_1^*$ and $N_2 = r_2 N_2^*$ with $r_1 \ge 1$ and $r_2 > 1$, and (c) fixed sizes $N_1^*$ and $N_2^*$ of $G_1^*$ and $G_2^*$, the maximum percent reduction in bias for any matching method that is EPBR for $X$ is
\[
\Gamma_{\max} = \frac{100}{B\sqrt{(1 + \sigma_1^2/\sigma_2^2)/2}}\left[\alpha_2^+(r_2, N_2^*) - \frac{\sigma_1}{\sigma_2}\,\alpha_1^-(r_1, N_1^*)\right],
\]
where:

$\sigma_i^2 = \eta'\Sigma_i\eta$, the variance of the best linear discriminant with respect to the $P_2$ inner product in $P_i$, with $\eta' = (u_1 - u_2)'\Sigma_2^{-1}$;

$B = (\mu_1 - \mu_2)\big/\sqrt{(\sigma_1^2 + \sigma_2^2)/2}$, the number of standard deviations between the means of $\eta'X$ in $P_1$ and $P_2$, with $\mu_i = \eta'u_i$;

$\alpha_2^+(r_2, N_2^*)$ = the expectation of the sample average of the $N_2^*$ largest of the $r_2 N_2^*$ randomly chosen observations from $F_2$, where $F_2$ is the distribution of $\eta'X$ in $P_2$ normed to have zero mean and unit variance, i.e., the distribution of $\eta'(X - u_2)/\sigma_2$ in $P_2$; and

$\alpha_1^-(r_1, N_1^*)$ = the expectation of the sample average of the $N_1^*$ smallest of the $r_1 N_1^*$ randomly chosen observations from $F_1$, $F_1$ being the distribution of $\eta'(X - u_1)/\sigma_1$ in $P_1$.

Knowing $\Gamma_{\max}$ for a given EPBR matching method gives the same kind of information as described in Section 2.4. First, observe that $\Gamma_{\max}$ and $B$ are inversely related. $B$ represents the systematic differences between the populations due to the covariates; as this bias increases, it becomes harder to make the subsamples similar and to reduce the effects of confounding variables. It is worth noting that $B$ and $\sigma_1^2/\sigma_2^2$ depend on parameters unknown to the researcher, but they are easily estimated from the data.
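For instance, given samples of the covariates from the two groups, plug-in estimates of the discriminant direction, of $B$, and of $\sigma_1^2/\sigma_2^2$ could be computed as in the sketch below; sample covariance matrices stand in for $\Sigma_1$ and $\Sigma_2$, and the data are simulated purely for illustration.

```python
import numpy as np

def discriminant_bias_summary(X1, X2):
    """Plug-in estimates of the quantities in Theorem 6 that depend on unknown
    population parameters: the discriminant direction eta = Sigma2^{-1}(u1 - u2),
    the bias B along eta, and the variance ratio sigma1^2 / sigma2^2."""
    u1, u2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    eta = np.linalg.solve(S2, u1 - u2)     # best linear discriminant (P2 inner product)
    s1sq = eta @ S1 @ eta                  # variance of eta'X in sample 1
    s2sq = eta @ S2 @ eta                  # variance of eta'X in sample 2
    B = (eta @ (u1 - u2)) / np.sqrt((s1sq + s2sq) / 2.0)
    return B, s1sq / s2sq

# Illustrative samples (assumed): three covariates, shifted means in the treated group.
rng = np.random.default_rng(2)
X1 = rng.normal([60, 80, 2], [8, 10, 1], size=(200, 3))
X2 = rng.normal([50, 72, 3], [8, 10, 1], size=(600, 3))
B_hat, ratio_hat = discriminant_bias_summary(X1, X2)
print(B_hat, ratio_hat)
```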

Figure 4. Approximate ratio of sample sizes $r_2$ needed to obtain a maximum percent reduction in bias close to 100 percent [6]

Second, we can see that for a fixed $N$, as $r$ increases, $\alpha_2^+(r, N)$ increases, which increases $\Gamma_{\max}$. Simultaneously, as $r$ increases, $\alpha_1^-(r, N)$ decreases, which also increases $\Gamma_{\max}$. This is useful for the researcher because he or she can increase the pool from which the subsamples are selected; as the pool grows, the researcher is more likely to come across values that make the samples better matched. Figure 4 shows how large the ratio of the size of $G_2$ to the size of $G_1$ would have to be in order to attain a maximum percent reduction in bias close to 100 percent for different values of the total bias $B$ and of $\sigma_1^2/\sigma_2^2$. For the largest values of $B$ and $\sigma_1^2/\sigma_2^2$ shown, the pool from which the control subsample is chosen has to be 35 times the size of the pool from which the treatment subsample is chosen, while for the smallest values it need only be 1.1 times the size.

3.2. Choosing a matching method. Whether we have a single $X$ or multiple $X$'s to match on, knowing the maximum percent reduction in bias allows us to evaluate how successful a matching method is at achieving the goal: reducing the bias that comes from systematic differences between the control and treatment populations. It gives the researcher an anchor for understanding how well the confounding effects of covariates were limited when estimating a treatment effect. As a concrete example, the results of a Monte Carlo simulation of the percent reduction in bias of the covariates $X$ under Mahalanobis-metric matching are shown in Figure 5. Consistent with what we would expect from the theory derived in this paper, it is clear from the table that the percent reduction in bias is highest for low values of the bias $B$ and of the variance ratio $\sigma_1^2/\sigma_2^2$ and for high values of $r$.

Figure 5. Percent reduction in bias of $X$, Mahalanobis-metric matching, $N = 50$, $X$ normal, Monte Carlo values [8]

Monte Carlo results also help compare different matching methods to each other. Figure 6 shows the results of a simulation that compared two matching methods, discriminant matching and metric matching, reporting the percent reduction in bias for three different estimators under varying ratios of the sample sizes. From this table, we can see an example of a situation in which metric matching appears superior to discriminant matching because it does a better job of reducing bias.

Figure 6. Percentage reduction in expected squared bias, averaging over distributional conditions [7]

It is worth noting that while the percent reduction in bias is a prime consideration when selecting a matching method, it is not the only one. In practice, different matching methods involve different trade-offs. A common matching method is mean matching, where each subsample is constructed so that the means of the subsamples are as similar as possible. While this can achieve a high percent reduction in bias, it can be hard to carry out in practice: researchers usually have one shot at choosing the members of their subsample, and the means of the subsamples are only known after the individuals have been chosen. In real life, researchers find it easier to choose pairs of subjects with similar covariates.

That leads to pair-wise matching, another common method. Here, the members of the treatment group are ordered from low to high on some covariate, and so are the members of the control group. A pair is formed by matching the member of the treatment group and the member of the control group with the lowest covariate values in their respective groups. Then another pair is formed from the members with the second-lowest values in their respective groups, and the process is repeated. The downside of this method is that if a researcher orders from low to high, for example, then the members with high covariate values are left out of the subsamples.
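A minimal sketch of this rank-by-rank pairing on a single covariate, with hypothetical ages, might look as follows; note how the oldest controls end up unmatched, which is exactly the downside described above.

```python
import numpy as np

def pairwise_rank_matching(x_treated, x_control):
    """Toy version of pair-wise matching on a single covariate: sort both groups
    from low to high and pair them rank by rank. Returns index pairs
    (treated_index, control_index); leftover high-ranked controls stay unmatched."""
    t_order = np.argsort(x_treated)
    c_order = np.argsort(x_control)
    n_pairs = min(len(t_order), len(c_order))
    return list(zip(t_order[:n_pairs], c_order[:n_pairs]))

# Illustrative ages (assumed numbers)
ages_treated = np.array([55, 71, 63, 60])
ages_control = np.array([48, 66, 52, 58, 62, 70, 45])
for t_idx, c_idx in pairwise_rank_matching(ages_treated, ages_control):
    print(ages_treated[t_idx], "paired with", ages_control[c_idx])
```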

As this high-level discussion shows, there are many practical concerns for researchers when selecting a matching method. There is an abundance of recent literature, to which Rubin has contributed, concerned with various classes of matching methods; propensity score analysis in particular has received a great deal of attention. All of this being said, the percent reduction in bias and the maximum percent reduction in bias remain the most prominent concerns, since at the end of the day the goal of any matching method is to reduce bias.

    4. CONCLUSION

In this paper, we have examined how researchers in the social sciences construct estimators for the effect of a treatment on some outcome variable across two populations, the treated and the control, which are assumed to differ systematically. These systematic differences, whether in one $X$ variable or in many, bias the treatment effect estimates. As a result, it is impossible to tell whether observed differences in the outcome variable are due to the treatment applied or to these systematic differences; inferring causality becomes very challenging.


This paper summarized the process of matching, a tactic researchers use to construct subsamples of the control and treatment groups that are as similar as possible with respect to the covariates. We walked through how to infer the treatment effect after matching and produced a metric that evaluates the success of matching: the percent reduction in bias, $\Gamma$. Finally, for any particular matching method, this paper derived the maximum possible percent reduction in bias, $\Gamma_{\max}$.

Morally, $\Gamma_{\max}$ is important because it represents the scope social scientists have for using matching to reduce bias. In a sense, it gives a level of hope for causal inference. Since many of the social sciences have struggled to identify causality in the real world because of research limitations, $\Gamma_{\max}$ carries a deeper meaning: it quantifies the hope that matching offers a chance, in a way, to save the social sciences. Perhaps now patients with other fatal diseases can feel as hopeful as those with Alzheimer's that a treatment can be found within our lifetime.


    References

1. Iacus, Stefano M.; King, Gary; Porro, Giuseppe. (2011). Multivariate Matching Methods That Are Monotonic Imbalance Bounding. Journal of the American Statistical Association, 106(493), 345-361.

2. Rosenbaum, Paul R.; Rubin, Donald B. (1985). Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician, 39(1), 33-38.

3. Rosenbaum, Paul R.; Rubin, Donald B. (1985). The Bias Due to Incomplete Matching. Biometrics, 41, 103-116.

4. Rubin, Donald B. (1973). Matching to Remove Bias in Observational Studies. Biometrics, 29, 159-183.

5. Rubin, Donald B. (1976). Multivariate Matching Methods That Are Equal Percent Bias Reducing, I: Some Examples. Biometrics, 32, 109-120.

6. Rubin, Donald B. (1976). Multivariate Matching Methods That Are Equal Percent Bias Reducing, II: Maximums on Bias Reduction for Fixed Sample Sizes. Biometrics, 32, 121-132.

7. Rubin, Donald B. (1979). Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observational Studies. Journal of the American Statistical Association, 74(366), 318-328.

8. Rubin, Donald B. (1980). Bias Reduction Using Mahalanobis-Metric Matching. Biometrics, 36, 293-298.