Title: | Causal Inference for Qualitative Outcomes |
---|---|
Description: | Implements the framework introduced in Di Francesco and Mellace (2025) <doi:10.48550/arXiv.2502.11691>, shifting the focus to well-defined and interpretable estimands that quantify how treatment affects the probability distribution over outcome categories. It supports selection-on-observables, instrumental variables, regression discontinuity, and difference-in-differences designs. |
Authors: | Riccardo Di Francesco [aut, cre, cph] |
Maintainer: | Riccardo Di Francesco |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-02-25 07:22:24 UTC |
Source: | https://github.com/riccardo-df/causalqual |
Fit two-group/two-period models for qualitative outcomes to estimate the probabilities of shift on the treated.
causalQual_did(Y_pre, Y_post, D)
Argument | Description |
---|---|
Y_pre | Qualitative outcome before treatment. Must be labeled as |
Y_post | Qualitative outcome after treatment. Must be labeled as |
D | Binary treatment indicator. |
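Under a difference-in-differences design, identification requires that the probability of class $m$ shifts over time in the same way for the treated and control groups (parallel trends on the probability mass functions of $Y_{it}(0)$). With $t \in \{0, 1\}$ indexing the pre- and post-treatment periods, and treatment administered only in period $t = 1$, this condition reads

$$P(Y_{i1}(0) = m \mid D_i = 1) - P(Y_{i0}(0) = m \mid D_i = 1) = P(Y_{i1}(0) = m \mid D_i = 0) - P(Y_{i0}(0) = m \mid D_i = 0).$$

If this assumption holds, we can recover the probability of shift on the treated for class $m$:

$$\delta_{m,T} := P(Y_{i1}(1) = m \mid D_i = 1) - P(Y_{i1}(0) = m \mid D_i = 1) = \left[ P(Y_{i1} = m \mid D_i = 1) - P(Y_{i0} = m \mid D_i = 1) \right] - \left[ P(Y_{i1} = m \mid D_i = 0) - P(Y_{i0} = m \mid D_i = 0) \right].$$

causalQual_did applies, for each class $m$, the canonical two-group/two-period method to the binary variable $1(Y_{it} = m)$. Specifically, consider the following linear model:

$$1(Y_{it} = m) = \alpha_m + \beta_m D_i + \gamma_m 1(t = 1) + \tau_m D_i 1(t = 1) + \epsilon_{itm}.$$

The OLS estimate $\hat{\tau}_m$ of $\tau_m$ is our estimate of the probability of shift on the treated for class $m$. Standard errors are clustered at the unit level and used to construct conventional confidence intervals.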
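The two-group/two-period regression for a single class can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package source: `did_class_m()` is a hypothetical helper, outcomes are integer class labels, and the unit-level clustering of standard errors is omitted. Because the 2x2 model is saturated, the interaction coefficient coincides with the difference in differences of class-$m$ sample proportions, which the example computes both ways.

```r
# Hypothetical helper (not the causalQual source): DiD regression for one class m.
# Unit-level clustering of the standard errors is omitted in this sketch.
did_class_m <- function(Y_pre, Y_post, D, m) {
  n <- length(D)
  # Stack the two periods: one row per unit and period.
  stacked <- data.frame(
    y    = as.numeric(c(Y_pre, Y_post) == m),  # binary variable 1(Y_it = m)
    D    = c(D, D),                             # group indicator, constant over time
    post = rep(c(0, 1), each = n)               # 1(t = 1)
  )
  fit <- lm(y ~ D * post, data = stacked)
  # The interaction coefficient is the DiD estimate for class m.
  unname(coef(fit)["D:post"])
}

# Hypothetical toy data: three classes, random treatment.
set.seed(1986)
n <- 200
D <- rbinom(n, 1, 0.5)
Y_pre <- sample(1:3, n, replace = TRUE)
Y_post <- sample(1:3, n, replace = TRUE)

est <- did_class_m(Y_pre, Y_post, D, m = 2)

# Same quantity as a difference in differences of sample proportions of class 2.
manual <- (mean(Y_post[D == 1] == 2) - mean(Y_pre[D == 1] == 2)) -
  (mean(Y_post[D == 0] == 2) - mean(Y_pre[D == 0] == 2))
print(c(lm = est, manual = manual))
```

Unit-clustered standard errors could be added on top of the `lm` fit, for instance with `sandwich::vcovCL()` and a unit identifier.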
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo
causalQual_iv
causalQual_rd
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_did(100, assignment = "observational", outcome_type = "ordered")
Y_pre <- data$Y_pre
Y_post <- data$Y_post
D <- data$D

## Estimate probabilities of shift on the treated.
fit <- causalQual_did(Y_pre, Y_post, D)
summary(fit)
plot(fit)
```
Fit two-stage least squares models for qualitative outcomes to estimate the local probabilities of shift.
causalQual_iv(Y, D, Z)
Argument | Description |
---|---|
Y | Qualitative outcome. Must be labeled as |
D | Binary treatment indicator. |
Z | Binary instrument. |
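Under an instrumental-variables design, identification requires that the instrument be independent of potential outcomes and potential treatments (exogeneity), that the instrument affect the outcome only through its effect on treatment (exclusion restriction), that the instrument have a nonzero effect on the treatment probability (relevance), and that the instrument can only increase, or only decrease, the treatment probability (monotonicity). If these assumptions hold, we can recover the local probabilities of shift, that is, the probabilities of shift among units whose treatment status responds to the instrument ($D_i(1) \neq D_i(0)$), for all classes:

$$\delta_{m,L} := P(Y_i(1) = m \mid D_i(1) \neq D_i(0)) - P(Y_i(0) = m \mid D_i(1) \neq D_i(0)) = \frac{P(Y_i = m \mid Z_i = 1) - P(Y_i = m \mid Z_i = 0)}{P(D_i = 1 \mid Z_i = 1) - P(D_i = 1 \mid Z_i = 0)}.$$

causalQual_iv applies, for each class $m$, the standard two-stage least squares method to the binary variable $1(Y_i = m)$. Specifically, the routine first estimates the following first-stage regression model via OLS:

$$D_i = \gamma_0 + \gamma_1 Z_i + \nu_i,$$

and constructs the predicted values $\hat{D}_i = \hat{\gamma}_0 + \hat{\gamma}_1 Z_i$. It then uses these predicted values in the second-stage regressions:

$$1(Y_i = m) = \beta_{m0} + \beta_{m1} \hat{D}_i + \epsilon_{im}.$$

The OLS estimate $\hat{\beta}_{m1}$ of $\beta_{m1}$ is then our estimate of $\delta_{m,L}$. Standard errors are computed using conventional procedures and used to construct conventional confidence intervals. All of this is done by calling the ivreg function.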
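To see what the per-class 2SLS fit computes, the sketch below compares a manual two-stage fit with the Wald ratio of reduced-form and first-stage differences in means; with a single binary instrument the two point estimates coincide exactly. Both helpers (`wald_class_m()`, `tsls_class_m()`) are hypothetical, and the simulated data are illustrative only, not the package's data-generating process.

```r
# Hypothetical helper: Wald ratio for class m with a binary instrument.
wald_class_m <- function(Y, D, Z, m) {
  y_m <- as.numeric(Y == m)                              # 1(Y_i = m)
  reduced_form <- mean(y_m[Z == 1]) - mean(y_m[Z == 0])  # effect of Z on 1(Y_i = m)
  first_stage <- mean(D[Z == 1]) - mean(D[Z == 0])       # effect of Z on D
  reduced_form / first_stage
}

# Hypothetical helper: manual two-stage least squares for class m (point estimate only).
tsls_class_m <- function(Y, D, Z, m) {
  y_m <- as.numeric(Y == m)
  D_hat <- fitted(lm(D ~ Z))               # first stage: regress D_i on Z_i
  unname(coef(lm(y_m ~ D_hat))["D_hat"])   # second stage: regress 1(Y_i = m) on fitted D
}

# Illustrative data with a monotone effect of Z on treatment.
set.seed(1986)
n <- 500
Z <- rbinom(n, 1, 0.5)
U <- runif(n)
D <- ifelse(Z == 1, as.numeric(U < 0.7), as.numeric(U < 0.3))
Y <- sample(1:3, n, replace = TRUE)

wald <- wald_class_m(Y, D, Z, m = 1)
tsls <- tsls_class_m(Y, D, Z, m = 1)
print(c(wald = wald, tsls = tsls))
```

The manual second stage recovers the right point estimate but not valid standard errors, because it treats $\hat{D}_i$ as if it were observed; a dedicated IV routine such as ivreg accounts for this.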
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo
causalQual_rd
causalQual_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_iv(100, outcome_type = "ordered")
Y <- data$Y
D <- data$D
Z <- data$Z

## Estimate local probabilities of shift.
fit <- causalQual_iv(Y, D, Z)
summary(fit)
plot(fit)
```
Fit local polynomial regression models for qualitative outcomes to estimate the probabilities of shift at the cutoff.
causalQual_rd(Y, running_variable, cutoff)
Argument | Description |
---|---|
Y | Qualitative outcome. Must be labeled as |
running_variable | Running variable determining treatment assignment. |
cutoff | Cutoff or threshold. Units with |
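Under a regression discontinuity design, identification requires that the conditional probabilities of class $m$ for the potential outcomes, $P(Y_i(1) = m \mid X_i = x)$ and $P(Y_i(0) = m \mid X_i = x)$, be continuous in the running variable $x$ at the cutoff $c$ (continuity). If this assumption holds, we can recover the probability of shift at the cutoff for class $m$ (with units at or above the cutoff treated):

$$\delta_{m,C} := P(Y_i(1) = m \mid X_i = c) - P(Y_i(0) = m \mid X_i = c) = \lim_{x \downarrow c} P(Y_i = m \mid X_i = x) - \lim_{x \uparrow c} P(Y_i = m \mid X_i = x).$$

causalQual_rd applies, for each class $m$, standard local polynomial estimators to the binary variable $1(Y_i = m)$. Specifically, the routine implements the robust bias-corrected inference procedure of Calonico et al. (2014) (see the rdrobust function).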
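As a rough illustration of the per-class idea, the sketch below computes a conventional local linear estimate of the jump in $P(Y_i = m \mid X_i = x)$ at the cutoff. It is a deliberate simplification, not what causalQual_rd runs: `local_linear_jump()` is a hypothetical helper, the bandwidth `h` is fixed by hand, a triangular kernel is assumed, units at or above the cutoff are assumed treated, and there is no bias correction. The rdrobust package supplies data-driven bandwidths and robust bias-corrected confidence intervals instead.

```r
# Hypothetical helper: conventional local linear jump in P(Y = m | X = x) at the cutoff.
# Fixed bandwidth h, triangular kernel, no bias correction (unlike rdrobust).
local_linear_jump <- function(Y, running_variable, cutoff, m, h) {
  y_m <- as.numeric(Y == m)              # binary variable 1(Y_i = m)
  x <- running_variable - cutoff         # center the running variable at the cutoff
  w <- pmax(0, 1 - abs(x) / h)           # triangular kernel weights
  side_intercept <- function(keep) {
    dat <- data.frame(y = y_m[keep], x = x[keep], w = w[keep])
    dat <- dat[dat$w > 0, ]              # keep observations inside the bandwidth
    unname(coef(lm(y ~ x, data = dat, weights = w))[1])  # fitted value at x = 0
  }
  # Assumption: units at or above the cutoff are treated.
  side_intercept(x >= 0) - side_intercept(x < 0)
}

set.seed(1986)
running_variable <- runif(2000, -1, 1)
cutoff <- 0
# Toy data: class 2 occurs exactly when the running variable is at or above the cutoff,
# so the true jump in P(Y = 2 | X = x) at the cutoff is 1.
Y <- ifelse(running_variable >= cutoff, 2, 1)
jump <- local_linear_jump(Y, running_variable, cutoff, m = 2, h = 0.5)
print(jump)  # equals 1 up to floating-point error
```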
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo
causalQual_iv
causalQual_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_rd(100, outcome_type = "ordered")
Y <- data$Y
running_variable <- data$running_variable
cutoff <- data$cutoff

## Estimate probabilities of shift at the cutoff.
fit <- causalQual_rd(Y, running_variable, cutoff)
summary(fit)
plot(fit)
```
Construct and average doubly robust scores for qualitative outcomes to estimate the probabilities of shift.
causalQual_soo(Y, D, X, outcome_type, K = 5)
Argument | Description |
---|---|
Y | Qualitative outcome. Must be labeled as |
D | Binary treatment indicator. |
X | Covariate matrix (no intercept). |
outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
K | Number of folds used for cross-fitting the nuisance functions. |
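Under a selection-on-observables design, identification requires the treatment indicator to be independent of potential outcomes conditional on the covariates (unconfoundedness), and each unit to have a nonzero probability of being treated and of being untreated (common support). If these assumptions hold, we can recover the probabilities of shift for all classes:

$$\delta_m := P(Y_i(1) = m) - P(Y_i(0) = m) = E\left[ \mu_m(1, X_i) - \mu_m(0, X_i) \right],$$

where $\mu_m(d, x) := P(Y_i = m \mid D_i = d, X_i = x)$. Let also $\pi(x) := P(D_i = 1 \mid X_i = x)$ denote the propensity score.

causalQual_soo constructs and averages doubly robust scores for qualitative outcomes to estimate $\delta_m$. For each class $m$, the doubly robust score for unit $i$ is defined as:

$$\hat{\Gamma}_{m,i} := \hat{\mu}_m(1, X_i) - \hat{\mu}_m(0, X_i) + \frac{D_i \left[ 1(Y_i = m) - \hat{\mu}_m(1, X_i) \right]}{\hat{\pi}(X_i)} - \frac{(1 - D_i) \left[ 1(Y_i = m) - \hat{\mu}_m(0, X_i) \right]}{1 - \hat{\pi}(X_i)}.$$

The estimator for $\delta_m$ is then the average of the scores:

$$\hat{\delta}_m = \frac{1}{n} \sum_{i=1}^{n} \hat{\Gamma}_{m,i},$$

with its variance estimated as:

$$\widehat{Var}(\hat{\delta}_m) = \frac{1}{n^2} \sum_{i=1}^{n} \left( \hat{\Gamma}_{m,i} - \hat{\delta}_m \right)^2.$$

causalQual_soo uses these estimates to construct confidence intervals based on conventional normal approximations.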
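A minimal sketch of the score construction and aggregation, assuming the standard augmented inverse-probability-weighting form of the doubly robust score and assuming the nuisance estimates are already available (in causalQual they come from cross-fitted forests). `dr_class_m()` is a hypothetical helper that takes those estimates as plain vectors and returns the point estimate, standard error, and normal-approximation confidence interval.

```r
# Hypothetical helper: doubly robust estimate for class m from given nuisance estimates.
# mu1_m[i], mu0_m[i]: estimates of P(Y = m | D = 1, X_i) and P(Y = m | D = 0, X_i).
# pscore[i]: estimate of P(D = 1 | X_i).
dr_class_m <- function(Y, D, m, mu1_m, mu0_m, pscore, alpha = 0.05) {
  y_m <- as.numeric(Y == m)
  # Doubly robust (AIPW) score for each unit.
  scores <- mu1_m - mu0_m +
    D * (y_m - mu1_m) / pscore -
    (1 - D) * (y_m - mu0_m) / (1 - pscore)
  n <- length(scores)
  estimate <- mean(scores)                       # average of the scores
  se <- sqrt(sum((scores - estimate)^2) / n^2)   # square root of the variance estimate
  z <- qnorm(1 - alpha / 2)                      # normal critical value
  c(estimate = estimate, se = se,
    lower = estimate - z * se, upper = estimate + z * se)
}

# Toy check with hand-computable numbers: four units, constant nuisance values.
Y <- c(2, 2, 1, 3)
D <- c(1, 1, 0, 0)
res <- dr_class_m(Y, D, m = 2,
                  mu1_m = rep(0, 4), mu0_m = rep(0, 4), pscore = rep(0.5, 4))
res  # scores are 2, 2, 0, 0, so the estimate is 1 and the standard error is 0.5
```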
If outcome_type == "multinomial", the conditional class probabilities $P(Y_i = m \mid D_i = 1, X_i)$ and $P(Y_i = m \mid D_i = 0, X_i)$ are estimated using a multinomial_ml strategy with regression forests as base learners. Else, if outcome_type == "ordered", they are estimated using the honest version of the ocf estimator. The propensity score $P(D_i = 1 \mid X_i)$ is always estimated via an honest regression_forest. K-fold cross-fitting is employed for the estimation of all these functions.
Folds are created by random splitting. If some class of Y is not observed in one or more folds for one or both treatment groups, a new random partition is drawn. This process is repeated until all classes are observed in every fold for both treatment groups, for at most 1000 attempts; if no valid partition is found, the routine raises an error.
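The fold-assignment rule can be sketched as a rejection loop. `make_balanced_folds()` is a hypothetical helper written for illustration; the package's own implementation may differ in details such as how fold sizes are balanced.

```r
# Hypothetical helper: redraw a random K-fold split until every class of Y appears
# in every fold within both treatment groups, giving up after max_tries attempts.
make_balanced_folds <- function(Y, D, K, max_tries = 1000) {
  classes <- sort(unique(Y))
  n <- length(Y)
  for (attempt in seq_len(max_tries)) {
    folds <- sample(rep(seq_len(K), length.out = n))  # random split into K folds
    # Check that every fold-by-treatment-group cell contains every class.
    ok <- all(vapply(seq_len(K), function(k) {
      all(vapply(c(0, 1), function(d) all(classes %in% Y[folds == k & D == d]),
                 logical(1)))
    }, logical(1)))
    if (ok) return(folds)
  }
  stop("No valid split found: some class is missing in some fold/treatment group.")
}

set.seed(1986)
n <- 200
D <- rbinom(n, 1, 0.5)
Y <- sample(1:3, n, replace = TRUE)
folds <- make_balanced_folds(Y, D, K = 2)
table(fold = folds, D = D, Y = Y)  # every cell is positive
```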
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_iv
causalQual_rd
causalQual_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(200, assignment = "observational", outcome_type = "ordered")
Y <- data$Y
D <- data$D
X <- data$X

## Estimate probabilities of shift.
fit <- causalQual_soo(Y, D, X, outcome_type = "ordered", K = 2)
summary(fit)
plot(fit)
```
Generate a synthetic data set with qualitative outcomes under a difference-in-differences design. The data include two time periods, a binary treatment indicator (applied only in the second period), and a matrix of covariates. In the absence of treatment, class probabilities evolve in parallel for the treated and control groups across the two periods (parallel trends on the probability mass functions).
generate_qualitative_data_did(n, assignment, outcome_type)
Argument | Description |
---|---|
n | Sample size. |
assignment | String controlling treatment assignment. Must be either "randomized" or "observational". |
outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_did computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes under treatment and under control, in both periods, by sampling from {1, 2, 3} with these class probabilities.

If instead outcome_type == "ordered", generate_qualitative_data_did first generates latent potential outcomes. It then constructs the potential outcomes by discretizing the latent ones with two threshold parameters: values below the first threshold map to class 1, values between the thresholds to class 2, and values above the second threshold to class 3. The class probabilities of the potential outcomes then have a closed form, which allows us to analytically compute the probabilities of shift on the treated.

Treatment assignment is always governed by a propensity score whose specification depends on assignment ("randomized" or "observational").

The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule.
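The multinomial branch can be illustrated with a short sketch. The coefficients, the treatment shift on the linear predictors, and the restriction to a single period are all hypothetical choices made for illustration; they are not the values used by generate_qualitative_data_did.

```r
# Hypothetical sketch: multinomial potential outcomes via softmax, single period.
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)  # subtract each row's maximum for numerical stability
  p <- exp(eta)
  p / rowSums(p)                   # each row is now a probability distribution
}

set.seed(1986)
n <- 100
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # three covariates
# Hypothetical coefficients: rows are covariates, columns are classes (class 1 = baseline).
beta <- matrix(c(0, 0, 0,
                 0.5, -0.3, 0.2,
                 -0.4, 0.6, 0.1), nrow = 3)
treatment_shift <- c(0, 0.5, 1)                # hypothetical effect on each linear predictor
eta0 <- X %*% beta                             # linear predictors under control
eta1 <- sweep(eta0, 2, treatment_shift, "+")   # linear predictors under treatment
p0 <- softmax_rows(eta0)
p1 <- softmax_rows(eta1)

# Draw potential outcomes from {1, 2, 3} with the class probabilities.
Y0 <- apply(p0, 1, function(p) sample(1:3, size = 1, prob = p))
Y1 <- apply(p1, 1, function(p) sample(1:3, size = 1, prob = p))

# In-sample average of the true probabilities of shift for classes 1, 2, 3.
pshifts <- colMeans(p1) - colMeans(p0)
pshifts
```

Since each row of class probabilities sums to one, the probabilities of shift always sum to zero across classes, which is a handy sanity check on any estimate.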
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift on the treated.
Riccardo Di Francesco
generate_qualitative_data_soo
generate_qualitative_data_iv
generate_qualitative_data_rd
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_did(100, assignment = "observational", outcome_type = "ordered")
data$pshifts_treated
```
Generate a synthetic data set with qualitative outcomes under an instrumental variables design. The data include a binary treatment indicator and a binary instrument. Potential outcomes and potential treatments are independent of the instrument. Moreover, the instrument does not directly impact potential outcomes, has an impact on treatment probability, and can only increase the probability of treatment.
generate_qualitative_data_iv(n, outcome_type)
Argument | Description |
---|---|
n | Sample size. |
outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_iv computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes $Y_i(0)$ and $Y_i(1)$ by sampling from {1, 2, 3} with these class probabilities.

If instead outcome_type == "ordered", generate_qualitative_data_iv first generates latent potential outcomes. It then constructs $Y_i(0)$ and $Y_i(1)$ by discretizing the latent outcomes with two threshold parameters: values below the first threshold map to class 1, values between the thresholds to class 2, and values above the second threshold to class 3. The class probabilities of the potential outcomes then have a closed form, which allows us to analytically compute the local probabilities of shift.

The instrument $Z_i$ is always generated independently of potential outcomes and potential treatments. Treatment is always modeled so that moving $Z_i$ from 0 to 1 can increase the probability of treatment intake but cannot decrease it.

The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule, $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$.
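A minimal sketch of one way to satisfy these properties: drawing both potential treatments from the same uniform variable guarantees $D_i(1) \geq D_i(0)$ for every unit, so the instrument can raise but never lower treatment intake. All probabilities here are hypothetical, not those of generate_qualitative_data_iv.

```r
# Hypothetical sketch: monotone treatment model and the observational rule.
set.seed(1986)
n <- 1000
Z <- rbinom(n, 1, 0.5)                  # instrument, independent of everything else
U <- runif(n)                           # unit-level propensity to take treatment
D0 <- as.numeric(U < 0.3)               # potential treatment if Z = 0
D1 <- as.numeric(U < 0.7)               # potential treatment if Z = 1
D <- Z * D1 + (1 - Z) * D0              # observed treatment
Y0 <- sample(1:3, n, replace = TRUE)                           # Y_i(0)
Y1 <- sample(1:3, n, replace = TRUE, prob = c(0.2, 0.3, 0.5))  # Y_i(1)
Y <- D * Y1 + (1 - D) * Y0              # observational rule

# The same U drives D0 and D1, so D1 >= D0 for every unit: there are no defiers.
# Potential outcomes are independent of U here, so the true local probabilities
# of shift equal c(0.2, 0.3, 0.5) - 1/3.
table(D0, D1)
```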
A list storing a data frame with the observed data, the true propensity score, the true instrument propensity score, and the true local probabilities of shift.
Riccardo Di Francesco
generate_qualitative_data_soo
generate_qualitative_data_rd
generate_qualitative_data_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_iv(100, outcome_type = "ordered")
data$local_pshifts
```
Generate a synthetic data set with qualitative outcomes under a regression discontinuity design. The data include a binary treatment indicator and a single covariate (the running variable). The conditional probability mass functions of potential outcomes are continuous in the running variable.
generate_qualitative_data_rd(n, outcome_type)
Argument | Description |
---|---|
n | Sample size. |
outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_rd computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes $Y_i(0)$ and $Y_i(1)$ by sampling from {1, 2, 3} with these class probabilities.

If instead outcome_type == "ordered", generate_qualitative_data_rd first generates latent potential outcomes. It then constructs $Y_i(0)$ and $Y_i(1)$ by discretizing the latent outcomes with two threshold parameters: values below the first threshold map to class 1, values between the thresholds to class 2, and values above the second threshold to class 3. The class probabilities of the potential outcomes then have a closed form, which allows us to analytically compute the probabilities of shift at the cutoff.

Treatment is always determined by the position of the running variable relative to the cutoff.

The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule, $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$.
A list storing a data frame with the observed data and the true probabilities of shift at the cutoff.
Riccardo Di Francesco
generate_qualitative_data_soo
generate_qualitative_data_iv
generate_qualitative_data_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_rd(100, outcome_type = "ordered")
data$pshifts_cutoff
```
Generate a synthetic data set with qualitative outcomes under a selection-on-observables design. The data include a binary treatment indicator and a matrix of covariates. Depending on the assignment argument, the treatment is either independent of potential outcomes or independent of them only conditional on the covariates.
generate_qualitative_data_soo(n, assignment, outcome_type)
Argument | Description |
---|---|
n | Sample size. |
assignment | String controlling treatment assignment. Must be either "randomized" or "observational". |
outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_soo computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes $Y_i(0)$ and $Y_i(1)$ by sampling from {1, 2, 3} with these class probabilities.

If instead outcome_type == "ordered", generate_qualitative_data_soo first generates latent potential outcomes. It then constructs $Y_i(0)$ and $Y_i(1)$ by discretizing the latent outcomes with two threshold parameters: values below the first threshold map to class 1, values between the thresholds to class 2, and values above the second threshold to class 3. The class probabilities of the potential outcomes then have a closed form, which allows us to analytically compute the probabilities of shift.

Treatment is always assigned according to a propensity score whose specification depends on assignment: if assignment == "randomized", treatment is independent of potential outcomes; if assignment == "observational", it is independent of potential outcomes only conditional on the covariates.

The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule, $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$.
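The ordered branch can be sketched as follows, assuming (hypothetically) a linear latent index, a constant treatment effect on that index, standard normal errors, and two fixed thresholds. Under those assumptions the class probabilities are differences of normal CDFs, which is what makes the true probabilities of shift available analytically. None of these values are taken from generate_qualitative_data_soo.

```r
# Hypothetical sketch: ordered potential outcomes from a discretized latent variable.
set.seed(1986)
n <- 1000
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # three covariates
beta <- c(0.5, -0.3, 0.2)                       # hypothetical latent-index coefficients
tau <- 0.8                                      # hypothetical effect on the latent index
zeta <- c(-0.5, 0.5)                            # hypothetical thresholds, zeta[1] < zeta[2]

index0 <- drop(X %*% beta)
index1 <- index0 + tau
Y0_star <- index0 + rnorm(n)                    # latent Y*(0), standard normal error
Y1_star <- index1 + rnorm(n)                    # latent Y*(1), standard normal error

# findInterval() returns 0, 1, 2 for values below, between, and above the thresholds.
discretize <- function(y_star) findInterval(y_star, zeta) + 1
Y0 <- discretize(Y0_star)
Y1 <- discretize(Y1_star)

# Closed-form class probabilities given the latent index.
class_probs <- function(index) {
  cbind(pnorm(zeta[1] - index),
        pnorm(zeta[2] - index) - pnorm(zeta[1] - index),
        1 - pnorm(zeta[2] - index))
}

# In-sample average of the true probabilities of shift for classes 1, 2, 3.
pshifts <- colMeans(class_probs(index1)) - colMeans(class_probs(index0))
pshifts
```

With a positive effect on the latent index, probability mass moves from the lowest class toward the highest, so the first shift is negative and the last is positive.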
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift.
Riccardo Di Francesco
generate_qualitative_data_iv
generate_qualitative_data_rd
generate_qualitative_data_did
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(100, assignment = "observational", outcome_type = "ordered")
data$pshifts
```
Plots a causalQual object.
## S3 method for class 'causalQual'
plot(x, hline = TRUE, ...)
Argument | Description |
---|---|
x | A causalQual object. |
hline | Logical, whether to display a horizontal line at zero in the plot. |
... | Further arguments passed to or from other methods. |
Riccardo Di Francesco
causalQual
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
Y <- data$Y
D <- data$D
X <- data$X

## Estimate probabilities of shift.
fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
plot(fit)
```
Prints a causalQual object.
## S3 method for class 'causalQual'
print(x, ...)
Argument | Description |
---|---|
x | A causalQual object. |
... | Further arguments passed to or from other methods. |
Riccardo Di Francesco
causalQual
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
Y <- data$Y
D <- data$D
X <- data$X

## Estimate probabilities of shift.
fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
print(fit)
```
Summarizes a causalQual object.
## S3 method for class 'causalQual'
summary(object, ...)
Argument | Description |
---|---|
object | A causalQual object. |
... | Further arguments passed to or from other methods. |
Riccardo Di Francesco
causalQual
```r
library(causalQual)

## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
Y <- data$Y
D <- data$D
X <- data$X

## Estimate probabilities of shift.
fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
summary(fit)
```