A guide to avoiding the common pitfalls of event studies
Event studies are useful tools in the context of causal inference. They are used in quasi-experimental situations, where the treatment is not randomly assigned. Thus, in contrast to randomized experiments (i.e., A/B tests), one cannot rely on a simple comparison of the means between groups to make causal inferences. In these types of situations, event studies are very useful.
Event studies are also frequently used to see if there are any pre-treatment differences between the treated and nontreated groups as a way to pretest parallel trends, a critical assumption of a popular causal inference method called difference-in-differences (DiD).
However, recent literature illustrates a variety of pitfalls in event studies. If ignored, these pitfalls can have significant consequences when using event studies for causal inference or as a pretest for parallel trends.
In this article, I’ll discuss these pitfalls and recommendations on how to avoid them. I’ll focus on applications in the context of panel data where I observe units over time. I’ll use a toy example to illustrate the pitfalls and recommendations. You can find the full code used to simulate and analyze the data here. In this article, I limit the use of code to the most critical parts to avoid cluttering.
An Illustrative Example
Event studies are commonly used to investigate the impact of an event such as a new regulation in a country. A recent example of such an event is the implementation of lockdowns due to the pandemic. In the case of the lockdowns, many businesses were affected because people started spending more time at home. For example, a music streaming platform may want to know whether people’s music consumption patterns have changed as a result of lockdowns so that it can address these changes and serve its customers better.
A researcher working for this platform can investigate whether the amount of music consumed has changed after the lockdown. The researcher could use the countries that never imposed a lockdown or imposed a lockdown later as control groups. An event study would be appropriate in this situation. Assume for this article that the countries that impose a lockdown stay locked down until the end of our observation period and that the implementation of the lockdown is binary (i.e., ignore that the strictness of the lockdown can vary).
Event Study Specification
I’ll focus on event studies of the form:
Yᵢₜ is the outcome of interest. αᵢ is the unit fixed effect and it controls for time-constant unit characteristics. γₜ is the time fixed effect and it controls for time trends or seasonality. l is the time relative to the treatment and it indicates how many periods it has been since the treatment at a given time t. For example, l = -1 means that it is one period before the treatment, and l = 2 means that it is two periods after the treatment. Dˡᵢₜ is the treatment dummy for the relative time period l at time t for unit i. Basically, we include both the leads and lags of the treatment. ϵᵢₜ is the random error.
The coefficient of interest βₗ indicates the average treatment effect in a given relative time period l. In the observation period, there are T periods; thus, the periods range from 0 to T-1. The units get treated in different periods. Each group of units that are treated at the same time composes a treatment cohort. This type of event study is a difference-in-differences (DiD) design in which units receive the treatment at different points in time (Borusyak et al. 2021).
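For reference, the symbol definitions above imply a two-way fixed-effects specification of roughly the following form. This is a reconstruction from those definitions rather than the original figure, so the summation range is an assumption; here the sum runs over every relative period except the omitted reference period l = -1:

```latex
Y_{it} = \alpha_i + \gamma_t + \sum_{l \neq -1} \beta_l \, D^{l}_{it} + \epsilon_{it}
```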
Illustrative example continued:
In line with our illustrative example, I simulate a panel dataset. In this dataset, there are 10,000 customers (or units) and 5 periods (from period 0 to 4). I sample unit and time fixed effects at random for these units and periods, respectively. Overall, we have 50,000 (10,000 units x 5 periods) observations at the customer-period level. The outcome of interest is music consumption measured in hours.
I randomly assign the customers to 3 different countries. One of these countries imposed a lockdown in period 2, another in period 3, and one never imposed a lockdown. Thus, customers from these different countries are treated at different times. To make it easy to follow, I’ll refer to the customers by their treatment cohorts depending on when they were treated: cohort period 2 and cohort period 3 for customers treated in periods 2 and 3, respectively. One of the cohorts is never treated and, thus, I refer to it as cohort period 99 for ease of coding.
In the simulation, after these customers are randomly assigned to one of these cohorts, I create the treatment dummy variable treat, which equals 1 if period >= cohort_period and 0 otherwise. treat indicates whether a unit is treated in a given period. Next, I create a dynamic treatment effect that grows in each treated period (e.g., 1 hour in the period where treatment happens and 2 hours in the period after that). Treatment effects are zero for pre-treatment periods.
I calculate the outcome of interest hrs_listened as the sum of a constant that I randomly chose (80), unit and time fixed effects, the treatment effect, and error (random noise) for each unit and period. By construction, the treatment (lockdowns) has a growing positive impact on music consumption.
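The simulation steps described above can be sketched in base R as follows. This is a minimal sketch, not the author's make_data(): the object names (units, times, panel), the reduced sample of 300 units, and the standard deviations of the fixed effects and noise are all assumptions chosen for illustration.

```r
# Hypothetical sketch of the simulation described above (not the author's make_data()).
set.seed(1)

n_units <- 300          # assumption: far fewer than the 10,000 units in the text
periods <- 0:4
cohorts <- c(2, 3, 99)  # 99 marks the never-treated cohort

# One row per unit: random cohort assignment and a unit fixed effect
units <- data.frame(
  unit          = seq_len(n_units),
  cohort_period = sample(cohorts, n_units, replace = TRUE),
  unit_fe       = rnorm(n_units, sd = 0.5)   # assumed spread
)

# One row per period: a time fixed effect
times <- data.frame(period = periods,
                    time_fe = rnorm(length(periods), sd = 0.5))

# Unit-by-period panel (merge with by = NULL gives the Cartesian product)
panel <- merge(units, times, by = NULL)

# Treated once the calendar period reaches the cohort's treatment period
panel$treat <- as.integer(panel$period >= panel$cohort_period)

# Dynamic effect: 1 hour in the first treated period, growing by 1 each period
panel$tau_cum <- panel$treat * (panel$period - panel$cohort_period + 1)

# Outcome: constant + fixed effects + treatment effect + noise
panel$hrs_listened <- 80 + panel$unit_fe + panel$time_fe +
  panel$tau_cum + rnorm(nrow(panel), sd = 0.5)

head(panel)
```

Because the never-treated cohort is coded as 99, the comparison period >= cohort_period is never true for it, so its treatment dummy and treatment effect stay at zero without special handling.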
I skip some of the setup and simulation parts of the code to avoid cluttering, but you can find the full code here.
In the following image, I show a snapshot of the data. unit refers to customers, and cohort_period refers to when a unit was treated. hrs_listened is the dependent variable and it measures the music consumption in a given period in hours for a given customer.
rm(list = ls())
library(data.table)
library(fastDummies)
library(tidyverse)
library(ggthemes)
library(fixest)
library(kableExtra)
# Simulate the panel (make_data and select_cols come from the full code)
data <- make_data(...)
kable(head(data[, ..select_cols]), 'simple')
In the following image, I illustrate the trends in average music listening by cohort and period. I also mark when the countries imposed lockdowns for the first time. You can see that there seems to be a positive impact of the lockdowns for both the earlier- and later-treated countries compared to the customers from the untreated cohort.
# Graph average music listening by cohort and period
avg_dv_period <- data[, .(mean_hrs_listened = mean(hrs_listened)), by = c('cohort_period','period')]
ggplot(avg_dv_period, aes(fill=factor(cohort_period), y=mean_hrs_listened, x=period)) +
  geom_bar(position="dodge", stat="identity") + coord_cartesian(ylim=c(79,85))+
  labs(x = "Period", y = "Hours", title = 'Average music listening (hours)',
       caption = 'Cohort 2 is the early treated, cohort 3 is the late treated and cohort 99 is the never treated group.') +
  theme(legend.position = 'bottom',
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12)) + scale_fill_manual(values=cbPalette) +
  # Dashed lines mark the first treated period of each cohort
  geom_vline(xintercept = 1.5, color = '#999999', lty = 5)+
  geom_vline(xintercept = 2.5, color = '#E69F00', lty = 5) +
  geom_text(label = 'Cohort period 2 is treated',aes(1.4,83), color = '#999999', angle = 90)+
  geom_text(label = 'Cohort period 3 is treated',aes(2.4,83), color = '#E69F00', angle = 90) +
  guides(fill=guide_legend(title="Treatment cohort period"))
Since this dataset is simulated, I know the true treatment effect of lockdowns for each cohort and each period. In the following graph, I present the true treatment effect of the lockdowns.
In the first treated period (relative period 0 in the code, where rel_period = period - cohort_period), both cohorts increase their listening by 1 hour. In the second treated period, the treatment effect is 2 hours for both cohorts. In the third treated period, which only the early-treated cohort reaches within the observation window, the treatment effect is 3 hours.
One thing to notice here is that the treatment effect is homogeneous across cohorts over relative periods (e.g., 1 hour in the first treated period; 2 hours in the second). Later, we will see what happens if this is not the case.
# Graph the true treatment effects
avg_treat_period <- data[treat == 1, .(mean_treat_effect = mean(tau_cum)), by = c('cohort_period','period')]
ggplot(avg_treat_period, aes(fill=factor(cohort_period), y=mean_treat_effect, x=period)) +
  geom_bar(position="dodge", stat="identity") +
  labs(x = "Period", y = "Hours", title = 'True treatment effect (hrs)',
       caption = 'Cohort 2 is the early treated, cohort 3 is the late treated and cohort 99 is the never treated group.') +
  theme(legend.position = 'bottom',
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12)) + scale_fill_manual(values=cbPalette) +
  guides(fill=guide_legend(title="Treatment cohort period"))
Now, we do an event study by regressing hrs_listened on relative period dummies. The relative period is the difference between period and cohort_period. The negative relative periods indicate the periods before the treatment and the positive ones indicate the periods after the treatment. We use unit fixed effects (αᵢ) and period fixed effects (γₜ) for all the event study regressions.
In the following table, I report the results of this event study. Unsurprisingly, there are no effects detected pre-treatment. Post-treatment effects are precisely and correctly estimated as 1, 2, and 3 hours. So everything works so far! Let’s see situations where things don’t work as well…
# Create relative time dummies to use in the regression
data <- data %>%
  # make relative period indicator (99 flags the never-treated cohort)
  mutate(rel_period = ifelse(cohort_period == 99, 99, period - cohort_period))
summary(data$rel_period)
data <- data %>%
  dummy_cols(select_columns = "rel_period")
rel_per_dummies <- colnames(data)[grepl('rel_period_', colnames(data))]
# Change names w/ minuses to handle them more easily
rel_per_dummies_new <- gsub('-', 'min', rel_per_dummies)
setnames(data, rel_per_dummies, rel_per_dummies_new)
# Event study: omit the never-treated flag and the reference period (-1)
covs <- setdiff(rel_per_dummies_new, c('rel_period_99','rel_period_min1'))
covs_collapse <- paste0(covs, collapse='+')
formula <- as.formula(paste0('hrs_listened ~ ',covs_collapse))
model <- feols(formula,
               data = data, panel.id = "unit",
               fixef = c("unit", "period"))
summary(model)
Everything has worked well so far, but here are the top 4 things to be careful of to avoid the potential pitfalls when using the event study approach:
1. No anticipation assumption
Many applications of event studies in the literature impose a no-anticipation assumption. The no-anticipation assumption means that treated units do not change their behavior in expectation of the treatment before the treatment. When the no-anticipation assumption holds, one can use the period before the event as (one of) the reference period(s) and compare other periods to this period.
However, the no-anticipation assumption might not hold in some cases, e.g., when the treatment is announced to the panel before the treatment is imposed and the units can respond to the announcement by adjusting their behavior. In this case, one needs to choose the reference periods carefully to avoid bias. If you have an idea of when the subjects start to anticipate the treatment and change their behavior, you can use that period as the de facto beginning of the treatment and use the period(s) before that as the reference period (Borusyak et al. 2021).
For example, if you suspect that the subjects change their behavior in l = -1 (one period before the treatment) because they anticipate the treatment, you can use l = -2 (two periods before the treatment) as your reference period. You can do this by dropping the Dˡᵢₜ where l = -2 from the equation instead of dropping the dummy for l = -1. This way you use the l = -2 period as the reference period. To check whether your hunch about units changing their behavior in l = -1 is right, you can check if the estimated treatment effect in l = -1 is statistically significant.
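Switching the reference period amounts to changing which relative-period dummy is left out of the regression formula. The helper below is a hypothetical sketch (rel_dummy_name and build_es_formula are illustrative names, not part of any package); it assumes the dummy columns follow this article's naming convention, with negative periods written as min1, min2, and so on.

```r
# Hypothetical helpers: build an event-study formula with a chosen reference period.
# Assumes dummy columns are named rel_period_min2, rel_period_0, rel_period_1, ...
rel_dummy_name <- function(l) {
  paste0("rel_period_", ifelse(l < 0, paste0("min", abs(l)), l))
}

build_es_formula <- function(rel_periods, ref, outcome = "hrs_listened") {
  keep <- setdiff(rel_periods, ref)  # the reference period's dummy is left out
  as.formula(paste(outcome, "~", paste(rel_dummy_name(keep), collapse = " + ")))
}

rel_periods <- -3:2
f_default <- build_es_formula(rel_periods, ref = -1)  # no anticipation assumed
f_anticip <- build_es_formula(rel_periods, ref = -2)  # anticipation suspected at l = -1

print(f_default)
print(f_anticip)
```

With ref = -2, the dummy for l = -1 enters the model, so its coefficient can be inspected directly to see whether units already respond one period before the treatment.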
Illustrative example continued:
Going back to our illustrative example, lockdowns are usually announced a bit before they are imposed, which might affect the units’ pre-treatment behavior. For example, people might already start working from home once the lockdown is announced but not yet imposed.
As a result, people can change their music-listening behavior even before the actual implementation of the lockdown. If the lockdown is announced 1 period before the actual implementation, one can use relative period -2 as the reference period by dropping the dummy for relative period -2 (rather than -1) from the specification.
In line with this example, I copy and modify the original data to introduce some anticipation effects. I introduce a 0.5-hour increase in the hours listened for all units in relative period -1. I call this new dataset with anticipation data_anticip.
The next graph shows the average music listening time over relative periods. It is easy to notice that the listening time already starts to pick up in relative period -1 compared to relative periods -2 and -3. Ignoring this significant change in the listening time can create misleading results.
# Summarize the hours listened over relative period (excluding the untreated cohort)
avg_dep_anticip <- data_anticip[rel_period != 99, .(mean_hrs_listened = mean(hrs_listened)), by = rel_period]
setorder(avg_dep_anticip, 'rel_period')
rel_periods <- sort(unique(avg_dep_anticip$rel_period))
ggplot(avg_dep_anticip, aes(y=mean_hrs_listened, x=rel_period)) +
  geom_bar(position="dodge", stat="identity", fill = 'deepskyblue') + coord_cartesian(ylim=c(79,85))+
  labs(x = "Relative period", y = "Hours", title = 'Average music listening over relative time period',
       caption = 'Only for the treated units') +
  theme(legend.position = 'bottom',
        legend.title = element_blank(),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12)) + scale_x_continuous(breaks = min(rel_periods):max(rel_periods))
Now, let’s do an event study as we did before by regressing the hours listened on the relative time period dummies. Keep in mind that the only thing I changed is the effect in relative period -1 and the rest of the data is exactly the same as before.
You can see in the following table that the pre-treatment effects are negative and significant even though there are no real treatment effects in these periods. The reason is that we use relative period -1 as the reference period, so every coefficient is measured against a period that is itself affected by anticipation, and this messes up all the effect estimates. What we need to do is to use a period where there is no anticipation as the reference period.
# Same specification as before (reference period -1), now on the anticipation data
formula <- as.formula(paste0('hrs_listened ~ ',covs_collapse))
model <- feols(formula,
               data = data_anticip, panel.id = "unit",
               fixef = c("unit", "period"))
summary(model)
In the following table, I report the event study results from the new regression where I use relative period -2 as the reference period. Now, we have the right estimates! There is no effect detected in relative period -3, while an effect is correctly detected for relative period -1. Furthermore, the effect sizes for the post-treatment periods are now correctly estimated.
# Use relative period -2 as the reference period instead
covs_anticip <- setdiff(c(covs,'rel_period_min1'),'rel_period_min2')
covs_anticip_collapse <- paste0(covs_anticip,collapse = '+')
formula <- as.formula(paste0('hrs_listened ~ ',covs_anticip_collapse))
model <- feols(formula,
               data = data_anticip, panel.id = "unit",
               fixef = c("unit", "period"))
summary(model)
2. Assumption of homogeneous treatment effects across cohorts
In the equation shown before, the treatment effect can only vary by the relative time period. The implicit assumption here is that these treatment effects are homogeneous across treatment cohorts. However, if this implicit assumption is wrong, the estimated treatment effects can be significantly different from the actual treatment effect, causing bias (Borusyak et al. 2021). An example situation could be where earlier cohorts benefit more from the treatment compared to the later-treated groups. This means that the treatment effects differ across cohorts.
The simplest solution to address this issue is to allow for heterogeneity. To allow for treatment effect heterogeneity between cohorts, one can estimate relative-time- and cohort-specific treatment effects, as seen in the following specification. In the following specification, c stands for the treatment cohort. Here, everything is the same as in the previous specification except that the treatment effects are estimated for each relative time and treatment-cohort combination, with the coefficient βₗ,c. Dᵢᶜ stands for the treatment cohort dummy for a given unit i.
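Written out with the symbols defined above, the cohort-specific specification would look roughly as follows. This is a reconstruction from the text, so the summation ranges are assumptions; as before, the reference period l = -1 is taken to be the omitted one:

```latex
Y_{it} = \alpha_i + \gamma_t + \sum_{c} \sum_{l \neq -1} \beta_{l,c} \, \left( D^{c}_{i} \cdot D^{l}_{it} \right) + \epsilon_{it}
```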
Illustrative example continued:
In the lockdown example, it might be that the effect of lockdowns is different across treated countries for different reasons (e.g., maybe in one of the countries, people are more likely to comply with the new regulation). Thus, one should estimate the country- and relative-time-specific treatment effects instead of simply estimating the relative-time-specific treatment effect.
In the original simulated dataset, I introduce cohort heterogeneity in treatment effects across periods and call this new dataset data_hetero. The treatment effect for cohort period 2 is 1.5 times bigger than that of cohort period 3 across all treated periods, as illustrated in the next graph.
Now, as we did before, let’s run an event study on data_hetero. The results of this event study are reported in the following table. Even though there are no treatment or anticipation effects in the pre-treatment periods, the event study detects statistically significant effects! This is because we do not account for the heterogeneity across cohorts.
# Event study (pooled across cohorts, ignoring heterogeneity)
formula <- as.formula(paste0('hrs_listened ~ ',covs_collapse))
model <- feols(formula,
               data = data_hetero, panel.id = "unit",
               fixef = c("unit", "period"))
summary(model)
Let’s account for the heterogeneity in treatment effects across cohorts by regressing the hours listened on cohort-specific relative period dummies. In the following table, I report the results of this event study. In this table, the treatment effect estimates for each cohort and relative period are reported. By allowing the treatment effects to vary per cohort, we account for the heterogeneity and, as a result, we have the right estimates! No effects are detected for the pre-treatment periods, as it should be.
# Create dummies for the cohort-period
# (assigned back to data_hetero so the regression below can see the new columns)
data_hetero <- data_hetero %>%
  dummy_cols(select_columns = "cohort_period")
cohort_dummies <- c('cohort_period_2','cohort_period_3')
# Create interactions between relative period and cohort dummies
interact <- as.data.table(expand_grid(cohort_dummies, covs))
interact[, interaction := paste0(cohort_dummies,':',covs)]
interact_covs <- interact$interaction
interact_covs_collapse <- paste0(interact_covs,collapse = '+')
# Run the event study
formula <- as.formula(paste0('hrs_listened ~ ',interact_covs_collapse))
model <- feols(formula,
               data = data_hetero, panel.id = "unit",
               fixef = c("unit", "period"))
summary(model)
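To see the mechanics in isolation, here is a small self-contained sketch in base R. It swaps in lm with factor() fixed effects for feols so that it runs without extra packages, and it uses a noise-free toy panel with two units per cohort; the panel size, the fixed-effect values, and the dummy names (c2_l0, c3_lmin3, ...) are all assumptions made for illustration.

```r
# Toy, noise-free panel (assumption): 2 units per cohort, periods 0-4
d <- expand.grid(unit = 1:6, period = 0:4)
d$cohort_period <- c(2, 2, 3, 3, 99, 99)[d$unit]
d$rel_period <- ifelse(d$cohort_period == 99, NA, d$period - d$cohort_period)
treated <- !is.na(d$rel_period) & d$rel_period >= 0

# Early cohort's effect is 1.5 times the late cohort's, as in data_hetero
scale <- ifelse(d$cohort_period == 2, 1.5, 1)
d$tau <- ifelse(treated, scale * (d$rel_period + 1), 0)
d$hrs_listened <- 80 + 0.1 * d$unit + 0.2 * d$period + d$tau

# Cohort-specific relative-period dummies, leaving out l = -1 for each cohort
cells <- unique(d[!is.na(d$rel_period) & d$rel_period != -1,
                  c("cohort_period", "rel_period")])
for (k in seq_len(nrow(cells))) {
  nm <- paste0("c", cells$cohort_period[k], "_l",
               sub("-", "min", cells$rel_period[k]))
  d[[nm]] <- as.integer(!is.na(d$rel_period) &
                          d$cohort_period == cells$cohort_period[k] &
                          d$rel_period == cells$rel_period[k])
}
dummies <- setdiff(names(d), c("unit", "period", "cohort_period",
                               "rel_period", "tau", "hrs_listened"))

# Two-way fixed effects via factor() terms plus the cohort-specific dummies
fit <- lm(reformulate(c("factor(unit)", "factor(period)", dummies),
                      response = "hrs_listened"), data = d)
round(coef(fit)[dummies], 3)
```

Because the toy data contain no noise, the cohort-specific coefficients should reproduce the built-in effects: 1.5, 3, and 4.5 hours for the early cohort, 1 and 2 hours for the late cohort, and zero for every pre-treatment cell.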
3. Under-identification in the fully dynamic specification in the absence of a never-treated group
In a fully dynamic event study specification where one includes all leads and lags of the treatment (usually only relative time -1 is dropped to avoid perfect multicollinearity), the treatment effect coefficients are not identified in the absence of a non-treated group. The reason for this is that the dynamic causal effects cannot be distinguished from the combination of unit and time effects (Borusyak et al. 2021): relative time is simply calendar time minus the unit’s treatment time, so a linear trend in relative time is perfectly collinear with the unit and time fixed effects. The practical solution is to drop another pre-treatment dummy (i.e., another one of the lead treatment dummies) to avoid the under-identification problem.
Illustrative example continued:
Imagine that we do not have data on any untreated countries. Thus, we only have the treated countries in our sample. We can still do an event study using the variation in the treatment timing. In this case, however, we have to use not only one but at least two reference periods to avoid under-identification. One can do this by dropping the dummies for the period right before the treatment and for the most negative relative period from the specification.
In the simulated dataset, I drop the observations from the untreated cohort and call this new dataset data_under_id. Now, we have only treated cohorts in our sample. The rest is the same as the original simulated dataset. Thus, we have to use at least two reference periods by dropping the dummies for two of the pre-treatment relative periods. I choose to exclude the dummies for relative periods -1 and -3. I report the results from this event study below. As you can see, I now have only one pre-treatment relative period estimated in the model. The estimates are correct, great!
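The under-identification can be checked directly by comparing the rank of the regression design matrix with its number of columns. The sketch below uses base R on a toy panel with treated cohorts only; the panel size and the helper name design_rank are assumptions for illustration, and it inspects the design matrix rather than running the actual regression.

```r
# Toy panel with treated cohorts only (assumption: 2 units per cohort, periods 0-4)
d <- expand.grid(unit = 1:4, period = 0:4)
d$cohort_period <- c(2, 2, 3, 3)[d$unit]
d$rel_period <- d$period - d$cohort_period

# Rank and column count of the design matrix after omitting the given relative periods
design_rank <- function(d, omit) {
  keep <- setdiff(sort(unique(d$rel_period)), omit)
  D <- sapply(keep, function(l) as.integer(d$rel_period == l))
  X <- cbind(model.matrix(~ factor(unit) + factor(period), data = d), D)
  c(rank = qr(X)$rank, columns = ncol(X))
}

design_rank(d, omit = -1)         # rank below column count: under-identified
design_rank(d, omit = c(-1, -3))  # rank equals column count: identified
```

Dropping only the dummy for relative period -1 leaves one redundant column, because the remaining dummies together with the fixed effects can still reproduce a linear trend in relative time. Dropping a second lead removes that redundancy.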
4. Using event studies as a pretest for parallel trends assumption
It is a common strategy to use event studies as a pretest for the parallel trends assumption (PTA), a crucial assumption of the difference-in-differences (DiD) approach. PTA states that in the absence of the treatment, the treated and untreated units would follow parallel trends in terms of the outcome of interest. Event studies are used to see whether the treated group behaves differently than the non-treated group before the treatment occurs. It is thought that if a statistically significant difference is not detected between the treated and untreated groups, the PTA is likely to hold.
However, Roth (2022) shows that this approach can be problematic. One issue is that these types of pretests often have low statistical power. This makes it harder to detect differing trends. Another issue is that if you have high statistical power, you might detect differing pre-treatment effects (pre-trends) even though they are not so important.
Roth (2022) recommends a couple of approaches to address this problem:
- Do not rely solely on the statistical significance of the pretest coefficients and take the statistical power of the pretest into account. If the power is low, the event study won’t be very informative with regard to the existence of a pre-trend. If you have high statistical power, the results of the pretest might still be misleading as you might find a statistically significant pre-trend that is not so important.
- Consider approaches that avoid pretesting altogether, e.g., use economic knowledge in a given context to choose the right PTA, such as a conditional PTA. Another way is to use the later-treated group as the control group if you think the treated and untreated groups follow different trends and are not as comparable. Please see Callaway & Sant’Anna’s 2021 paper for potential ways to relax the PTA.
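To make the notion of pretest power concrete, the following base R sketch computes the probability that a two-sided z-test flags a single pre-treatment coefficient as significant, given a hypothetical true pre-trend delta and the coefficient's standard error. This is a deliberately simplified, single-coefficient illustration and not Roth's procedure; the function name pretest_power and the example values are assumptions.

```r
# Power of a two-sided z-test to detect a pre-trend coefficient of size delta,
# given the coefficient's standard error (single-coefficient simplification).
pretest_power <- function(delta, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  pnorm(-z - delta / se) + (1 - pnorm(z - delta / se))
}

# Hypothetical pre-trend sizes (in hours) with an assumed standard error of 0.2
deltas <- c(0, 0.1, 0.25, 0.5)
power  <- pretest_power(delta = deltas, se = 0.2)
data.frame(delta = deltas, power = round(power, 3))
```

With no pre-trend (delta = 0), the rejection probability equals the significance level. The table shows how large a violation has to be, relative to the standard error, before the pretest is likely to catch it.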
Illustrative example continued:
Going back to the original example where we have three countries, let’s say that we want to perform a DiD analysis and we want to find support indicating that the PTA holds in this context. This would mean that if the treated countries had not been treated, their music consumption would have moved in parallel to the music consumption in the untreated country.
We think of using an event study as a way to pretest the PTA because there is no way to test the PTA directly. First, we need to take the statistical power of the test into account. Roth (2021) provides some tools to do this. Although this is out of the scope of this article, I can say that in this simulated dataset we have relatively high statistical power, because the random noise is low and we have a relatively big sample size with not that many coefficients to estimate. Still, it can be good to run scenario analyses to see how big of a pre-treatment effect one can correctly detect.
Secondly, regardless of the statistical significance of the pre-treatment estimates, take the specific context into account. Do I expect the treated countries to follow the same trends as the untreated country? In my simulated data, I know this for sure as I determine what the data looks like. However, in the real world, it is unlikely that this would hold unconditionally. Thus, I would consider using a conditional PTA by conditioning the PTA on various covariates that make countries more comparable to each other.
Conclusion
Event studies are powerful tools. However, one should be aware of their potential pitfalls. In this article, I explored the most commonly encountered pitfalls and provided recommendations on how to address them using a simulated dataset. I discussed the issues relating to the no-anticipation assumption, heterogeneity of the treatment effects across cohorts, under-identification in the absence of an untreated cohort, and using event studies as a pretest for the PTA.