| Title: | Causal Inference for Qualitative Outcomes |
|---|---|
| Description: | Implements the framework introduced in Di Francesco and Mellace (2025) <doi:10.48550/arXiv.2502.11691>, shifting the focus to well-defined and interpretable estimands that quantify how treatment affects the probability distribution over outcome categories. It supports selection-on-observables, instrumental variables, regression discontinuity, and difference-in-differences designs. |
| Authors: | Riccardo Di Francesco [aut, cre, cph] |
| Maintainer: | Riccardo Di Francesco <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-22 08:15:25 UTC |
| Source: | https://github.com/riccardo-df/causalqual |
Causal Inference for Qualitative Outcomes under Difference-in-Differences
causalQual_did(Y, D, unit_id, time)
| Argument | Description |
|---|---|
| Y | Qualitative outcome. Must be labeled as |
| D | Binary treatment indicator. |
| unit_id | Unit identifier. |
| time | Time identifier. |
| treatment_start | Denotes at which |
| ... | Other arguments for the |
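Under a difference-in-differences design, identification requires that, for each class m, the probability time shift evolves similarly for the treated and control groups (parallel trends on the probability mass functions of the untreated potential outcomes). If this assumption holds, we can recover the probability of shift on the treated for class m, that is, the difference between the probabilities that a treated unit falls in class m in the post-treatment period with and without treatment.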
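In potential-outcomes notation, this estimand can be sketched as follows. The symbols are assumptions introduced here for illustration: $Y_{it}(d)$ is the potential outcome of unit $i$ in the post-treatment period $t$ under treatment status $d$, and $D_i$ is the treatment-group indicator.

```latex
\delta_{m,T} := P\big(Y_{it}(1) = m \mid D_i = 1\big) - P\big(Y_{it}(0) = m \mid D_i = 1\big)
```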
causalQual_did applies, for each class m, the canonical two-group/two-period method to the binary variable indicating whether the outcome equals m. Specifically, the routine regresses this indicator on the treatment-group indicator, a post-treatment period indicator, and their interaction. The OLS estimate of the interaction coefficient is our estimate of the probability shift on the treated for class m. Standard errors are clustered at the unit level and used to construct conventional confidence intervals.
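The class-by-class regression described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the data-generating values, the `panel` data frame, and the helper `did_class` are all hypothetical, and the sketch returns point estimates only (no clustered standard errors).

```r
## Simulated two-group/two-period panel (all names and values are hypothetical).
set.seed(1)
n <- 500
D <- rbinom(n, 1, 0.5)                       # treatment-group indicator
panel <- data.frame(
  unit_id = rep(seq_len(n), times = 2),
  D       = rep(D, times = 2),
  post    = rep(c(0, 1), each = n)           # 1 in the post-treatment period
)

## Three-class outcome; treated units shift towards class 3 in the post period.
p3 <- 0.2 + 0.1 * panel$post + 0.3 * panel$D * panel$post
panel$Y <- ifelse(runif(2 * n) < p3, 3, sample(1:2, 2 * n, replace = TRUE))

## For class m, regress 1(Y = m) on group, period, and their interaction.
did_class <- function(m, data) {
  data$ind <- as.numeric(data$Y == m)
  fit <- lm(ind ~ D + post + D:post, data = data)
  unname(coef(fit)["D:post"])
}
estimates <- sapply(1:3, did_class, data = panel)

## The same quantity computed as a difference-in-differences of class shares.
manual <- sapply(1:3, function(m) {
  share <- function(d, p) mean(panel$Y[panel$D == d & panel$post == p] == m)
  (share(1, 1) - share(1, 0)) - (share(0, 1) - share(0, 0))
})
```

Because the class indicators sum to one for every observation, the estimated shifts sum to zero across classes.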
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo causalQual_iv causalQual_rd
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_did(100, assignment = "observational", outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    unit_id <- data$unit_id
    time <- data$time
    ## Estimate probabilities of shift on the treated.
    fit <- causalQual_did(Y, D, unit_id, time)
    summary(fit)
    plot(fit)
Causal Inference for Qualitative Outcomes under Instrumental Variables
causalQual_iv(Y, D, Z)
| Argument | Description |
|---|---|
| Y | Qualitative outcome. Must be labeled as |
| D | Binary treatment indicator. |
| Z | Binary instrument. |
Under an instrumental-variables design, identification requires that the instrument is independent of potential outcomes and potential treatments (exogeneity), that the instrument influences the outcome solely through its effect on treatment (exclusion restriction), that the instrument has a nonzero effect on treatment probability (relevance), and that the instrument can only increase, or only decrease, the treatment probability (monotonicity). If these assumptions hold, we can recover the local probabilities of shift for all classes, that is, for each class m, the difference between the probabilities of class m under treatment and under control among compliers.
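In potential-outcomes notation, the local estimand can be sketched as follows. The symbols are assumptions introduced here for illustration: $Y_i(d)$ is the potential outcome under treatment status $d$, $D_i(z)$ is the potential treatment under instrument value $z$, and compliers are written for the case in which the instrument can only increase treatment take-up.

```latex
\delta_{m,L} := P\big(Y_i(1) = m \mid D_i(1) > D_i(0)\big) - P\big(Y_i(0) = m \mid D_i(1) > D_i(0)\big)
```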
causalQual_iv applies, for each class m, the standard two-stage least squares method to the binary variable indicating whether the outcome equals m. Specifically, the routine first estimates via OLS a first-stage regression of the treatment on the instrument and constructs the predicted treatment values. It then uses these predicted values as the regressor in the second-stage regression of the class indicator. The OLS estimate of the second-stage slope coefficient is then our estimate of the local probability of shift for class m. Standard errors are computed using conventional procedures and used to construct conventional confidence intervals. All of this is done by calling the ivreg function.
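The two stages can be sketched by hand in base R. This is an illustration on simulated data under assumed values, not the package's code: the helper `tsls_class` and the data-generating process are hypothetical. The manual second stage reproduces the point estimates only; its standard errors are not valid, which is one reason to rely on a dedicated routine such as ivreg in practice.

```r
## Simulated data with a binary instrument (all names and values are hypothetical).
set.seed(2)
n <- 2000
Z <- rbinom(n, 1, 0.5)                       # binary instrument
U <- runif(n)                                # unobserved confounder
D <- as.numeric(U < 0.2 + 0.5 * Z)           # Z can only increase take-up
latent <- D + U + rnorm(n)                   # outcome depends on D and on U
Y <- findInterval(latent, c(0.5, 1.5)) + 1   # classes 1, 2, 3

## For class m: first stage D ~ Z, then 1(Y = m) on the fitted treatment.
tsls_class <- function(m) {
  ind <- as.numeric(Y == m)
  D_hat <- fitted(lm(D ~ Z))                 # first-stage predicted values
  unname(coef(lm(ind ~ D_hat))["D_hat"])     # second-stage slope
}
local_shifts <- sapply(1:3, tsls_class)

## With a single binary instrument, 2SLS coincides with the Wald ratio.
wald <- sapply(1:3, function(m) {
  ind <- as.numeric(Y == m)
  (mean(ind[Z == 1]) - mean(ind[Z == 0])) / (mean(D[Z == 1]) - mean(D[Z == 0]))
})
```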
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo causalQual_rd causalQual_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_iv(100, outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    Z <- data$Z
    ## Estimate local probabilities of shift.
    fit <- causalQual_iv(Y, D, Z)
    summary(fit)
    plot(fit)
Causal Inference for Qualitative Outcomes under Regression Discontinuity
causalQual_rd(Y, running_variable, cutoff)
| Argument | Description |
|---|---|
| Y | Qualitative outcome. Must be labeled as |
| running_variable | Running variable determining treatment assignment. |
| cutoff | Cutoff or threshold. Units with |
Under a regression discontinuity design, identification requires that, for each class m, the conditional probability mass functions of potential outcomes are continuous in the running variable (continuity). If this assumption holds, we can recover the probability shift at the cutoff for class m, that is, the difference between the probabilities of class m under treatment and under control for units at the cutoff.
causalQual_rd applies, for each class m, standard local polynomial estimators to the binary variable indicating whether the outcome equals m. Specifically, the routine implements the robust bias-corrected inference procedure of Calonico et al. (2014) (see the rdrobust function).
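To illustrate the idea of estimating a jump in class probabilities at the cutoff, the sketch below uses a plain local linear regression with a uniform kernel and a hand-picked bandwidth. This is a deliberately simplified stand-in, not the robust bias-corrected procedure of rdrobust: the bandwidth `h`, the helper `rd_class`, the simulated data, and the convention that units at or above the cutoff are treated are all assumptions.

```r
## Simulated sharp design (all names and values are hypothetical).
set.seed(3)
n <- 4000
x <- runif(n, -1, 1)                         # running variable
cutoff <- 0
treated <- as.numeric(x >= cutoff)           # assumed treatment rule
p3 <- 0.2 + 0.1 * x + 0.3 * treated          # P(Y = 3) jumps by 0.3 at the cutoff
Y <- ifelse(runif(n) < p3, 3, sample(1:2, n, replace = TRUE))

## Local linear fit on each side of the cutoff within bandwidth h.
rd_class <- function(m, h = 0.25) {
  keep <- abs(x - cutoff) <= h
  d <- data.frame(
    ind     = as.numeric(Y[keep] == m),
    xc      = x[keep] - cutoff,              # centred running variable
    treated = treated[keep]
  )
  fit <- lm(ind ~ treated * xc, data = d)
  unname(coef(fit)["treated"])               # jump in P(Y = m) at the cutoff
}
shifts <- sapply(1:3, rd_class)
```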
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_soo causalQual_iv causalQual_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_rd(100, outcome_type = "ordered")
    Y <- data$Y
    running_variable <- data$running_variable
    cutoff <- data$cutoff
    ## Estimate probabilities of shift at the cutoff.
    fit <- causalQual_rd(Y, running_variable, cutoff)
    summary(fit)
    plot(fit)
Causal Inference for Qualitative Outcomes under Selection-on-Observables
causalQual_soo(Y, D, X, outcome_type, K = 5)
| Argument | Description |
|---|---|
| Y | Qualitative outcome. Must be labeled as |
| D | Binary treatment indicator. |
| X | Covariate matrix (no intercept). |
| outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
| K | Number of folds for nuisance function estimation. |
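Under a selection-on-observables design, identification requires the treatment indicator to be (conditionally) independent of potential outcomes (unconfoundedness), and that each unit has a non-zero probability of being treated (common support). If these assumptions hold, we can recover the probabilities of shift of all classes, that is, for each class m, the difference between the probabilities of class m under treatment and under control.
causalQual_soo constructs and averages doubly robust scores for qualitative outcomes to estimate these probabilities of shift. For each class m, the doubly robust score for unit i combines the difference between the estimated conditional class probabilities under treatment and under control with inverse-propensity-weighted residuals of the indicator that the observed outcome equals m.
The estimator for the probability of shift of class m is then the average of the scores, with its variance estimated from the sample variability of the scores.
causalQual_soo uses these estimates to construct confidence intervals based on conventional normal approximations.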
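The construction can be sketched in base R using the standard augmented inverse-probability-weighting form of the score. Several simplifications are assumptions made for illustration: the nuisance functions (`p1`, `p0`, `e`) are treated as known rather than estimated with cross-fitting, the data are simulated, and the standard error is taken as the sample standard deviation of the scores divided by the square root of n, which may differ in detail from the package's variance expression.

```r
## Simulated data with known nuisance functions (all values are hypothetical).
set.seed(4)
n <- 1000
M <- 3
X <- runif(n)
e <- 0.25 + 0.5 * X                                   # propensity score P(D = 1 | X)
D <- rbinom(n, 1, e)
p0 <- matrix(c(0.5, 0.3, 0.2), n, M, byrow = TRUE)    # P(Y = m | D = 0, X)
p1 <- matrix(c(0.2, 0.3, 0.5), n, M, byrow = TRUE)    # P(Y = m | D = 1, X)
probs <- D * p1 + (1 - D) * p0                        # row i uses unit i's treatment
Y <- apply(probs, 1, function(p) sample(1:M, 1, prob = p))

## Doubly robust score for each unit (rows) and class (columns).
dr_scores <- sapply(1:M, function(m) {
  ind <- as.numeric(Y == m)
  p1[, m] - p0[, m] +
    D * (ind - p1[, m]) / e -
    (1 - D) * (ind - p0[, m]) / (1 - e)
})

## Point estimates, standard errors, and normal-approximation intervals.
delta_hat <- colMeans(dr_scores)
se_hat <- apply(dr_scores, 2, sd) / sqrt(n)
ci <- cbind(lower = delta_hat - qnorm(0.975) * se_hat,
            upper = delta_hat + qnorm(0.975) * se_hat)
```

In this simulation the true probabilities of shift are -0.3, 0, and 0.3, and the scores of each unit sum to zero across classes by construction.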
If outcome_type == "multinomial", the conditional class probabilities under treatment and under control are estimated using a multinomial_ml strategy with regression forests as base learners. Else, if outcome_type == "ordered", they are estimated using the honest version of the ocf estimator. The propensity score is always estimated via an honest regression_forest. K-fold cross-fitting is employed for the estimation of all these functions.
Folds are created by random split. If some class of Y is not observed in one or more folds for one or both treatment groups, a new random partition is drawn. This process is repeated, up to 1000 times, until all classes are observed in all folds and for all treatment groups. If no valid partition is found within 1000 attempts, the routine raises an error.
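The fold re-drawing logic described above can be sketched as follows. The helper `make_folds` and its arguments are hypothetical and only mirror the description; the package's internal code may differ.

```r
## Draw random folds until every class appears in every fold for both
## treatment groups, giving up after max_tries attempts (hypothetical helper).
make_folds <- function(Y, D, K, max_tries = 1000) {
  classes <- sort(unique(Y))
  for (attempt in seq_len(max_tries)) {
    folds <- sample(rep(seq_len(K), length.out = length(Y)))
    ok <- all(sapply(seq_len(K), function(k) {
      all(sapply(c(0, 1), function(d) {
        all(classes %in% Y[folds == k & D == d])
      }))
    }))
    if (ok) return(folds)
  }
  stop("Could not find a valid partition after ", max_tries, " attempts.")
}

set.seed(5)
Y <- sample(1:3, 300, replace = TRUE)
D <- rbinom(300, 1, 0.5)
folds <- make_folds(Y, D, K = 5)

## A case with no valid partition: two units cannot fill two folds.
failed <- tryCatch(make_folds(c(1, 2), c(0, 1), K = 2, max_tries = 5),
                   error = function(e) "error")
```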
An object of class causalQual.
Riccardo Di Francesco
Di Francesco, R., and Mellace, G. (2025). Causal Inference for Qualitative Outcomes. arXiv preprint arXiv:2502.11691. doi:10.48550/arXiv.2502.11691.
causalQual_iv causalQual_rd causalQual_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_soo(200, assignment = "observational", outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    X <- data$X
    ## Estimate probabilities of shift.
    fit <- causalQual_soo(Y, D, X, outcome_type = "ordered", K = 2)
    summary(fit)
    plot(fit)
Generate a synthetic data set with qualitative outcomes under a difference-in-differences design. The data include two time periods, a binary treatment indicator (applied only in the second period), and a matrix of covariates. The probability time shifts of the treated and control groups evolve similarly across the two time periods (parallel trends on the probability mass functions).
generate_qualitative_data_did(n, assignment, outcome_type)
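| Argument | Description |
|---|---|
| n | Sample size. |
| assignment | String controlling treatment assignment. Must be either "randomized" or "observational". |
| outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |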
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_did computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes for each period and treatment status by sampling from {1, 2, 3} using these probabilities.
If instead outcome_type == "ordered", generate_qualitative_data_did first generates latent potential outcomes and then constructs the potential outcomes by discretizing the latent ones using two threshold parameters. The implied class probabilities are available in closed form, which allows us to analytically compute the probabilities of shift on the treated.
Treatment is always assigned as a Bernoulli draw with success probability equal to the propensity score. If assignment == "randomized", the propensity score is constant across units. If instead assignment == "observational", the propensity score depends on the covariates.
The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule.
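The multinomial branch described above (linear predictors, softmax, then sampling a class) can be sketched in base R. The coefficient matrix `beta`, the uniform covariates, and the helper `softmax_rows` are hypothetical choices made for illustration and are not the values used by the package.

```r
## Row-wise softmax: turns an n x M matrix of linear predictors into
## an n x M matrix of class probabilities (hypothetical helper).
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)            # subtract row maxima for stability
  exp(eta) / rowSums(exp(eta))
}

set.seed(6)
n <- 5
X <- matrix(runif(n * 3), n, 3)              # three covariates (assumed uniform)
beta <- matrix(c(1, 0, -1,
                 0, 1, 0,
                 -1, 0, 1), 3, 3)            # hypothetical coefficients, one column per class
eta <- X %*% beta                            # linear predictors for each class
probs <- softmax_rows(eta)

## Sample one class in {1, 2, 3} per unit using its probability vector.
Y <- apply(probs, 1, function(p) sample(1:3, 1, prob = p))
```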
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift on the treated.
Riccardo Di Francesco
generate_qualitative_data_soo generate_qualitative_data_iv generate_qualitative_data_rd
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_did(100, assignment = "observational", outcome_type = "ordered")
    data$pshifts_treated
Generate a synthetic data set with qualitative outcomes under an instrumental variables design. The data include a binary treatment indicator and a binary instrument. Potential outcomes and potential treatments are independent of the instrument. Moreover, the instrument does not directly impact potential outcomes, has an impact on treatment probability, and can only increase the probability of treatment.
generate_qualitative_data_iv(n, outcome_type)
| Argument | Description |
|---|---|
| n | Sample size. |
| outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_iv computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes under treatment and under control by sampling from {1, 2, 3} using these probabilities.
If instead outcome_type == "ordered", generate_qualitative_data_iv first generates latent potential outcomes and then constructs the potential outcomes by discretizing the latent ones using two threshold parameters. The implied class probabilities are available in closed form, which allows us to analytically compute the local probabilities of shift.
The instrument is always generated as a Bernoulli draw. Treatment is always modeled as a Bernoulli draw whose success probability depends on the instrument. Thus, the instrument can increase the probability of treatment intake but cannot decrease it.
The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule.
The DGPs outlined above ensure that Z_i is a valid instrument for D_i.
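One simple way to obtain an instrument that can raise but never lower treatment take-up is a threshold-crossing rule. The sketch below illustrates that idea only; the probabilities 0.2 and 0.7 and all variable names are hypothetical, and the package's actual treatment model may be specified differently.

```r
## Threshold-crossing treatment model with a monotone instrument
## (all names and values are hypothetical).
set.seed(8)
n <- 1000
Z <- rbinom(n, 1, 0.5)                       # binary instrument
U <- runif(n)                                # unobserved resistance to treatment

D0 <- as.numeric(U < 0.2)                    # potential treatment when Z = 0
D1 <- as.numeric(U < 0.7)                    # potential treatment when Z = 1
D <- ifelse(Z == 1, D1, D0)                  # observed treatment

## Compliance types implied by the potential treatments.
types <- ifelse(D1 == 1 & D0 == 1, "always-taker",
         ifelse(D1 == 1 & D0 == 0, "complier", "never-taker"))

## Relevance: the instrument shifts the treatment probability.
first_stage <- mean(D[Z == 1]) - mean(D[Z == 0])
```

Since D1 is never smaller than D0, no unit is a defier, which is what monotonicity requires.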
A list storing a data frame with the observed data, the true propensity score, the true instrument propensity score, and the true local probabilities of shift.
Riccardo Di Francesco
generate_qualitative_data_soo generate_qualitative_data_rd generate_qualitative_data_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_iv(100, outcome_type = "ordered")
    data$local_pshifts
Generate a synthetic data set with qualitative outcomes under a regression discontinuity design. The data include a binary treatment indicator and a single covariate (the running variable). The conditional probability mass functions of potential outcomes are continuous in the running variable.
generate_qualitative_data_rd(n, outcome_type)
| Argument | Description |
|---|---|
| n | Sample size. |
| outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_rd computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes under treatment and under control by sampling from {1, 2, 3} using these probabilities.
If instead outcome_type == "ordered", generate_qualitative_data_rd first generates latent potential outcomes and then constructs the potential outcomes by discretizing the latent ones using two threshold parameters. The implied class probabilities are available in closed form, which allows us to analytically compute the probabilities of shift at the cutoff.
Treatment is always assigned deterministically, based on the position of the running variable relative to the cutoff.
The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule.
The DGPs outlined above ensure identification under a standard regression discontinuity design.
A list storing a data frame with the observed data, and the true probabilities of shift at the cutoff.
Riccardo Di Francesco
generate_qualitative_data_soo generate_qualitative_data_iv generate_qualitative_data_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_rd(100, outcome_type = "ordered")
    data$pshifts_cutoff
Generate a synthetic data set with qualitative outcomes under a selection-on-observables design. The data include a binary treatment indicator and a matrix of covariates. The treatment is either independent or conditionally (on the covariates) independent of potential outcomes, depending on users' choices.
generate_qualitative_data_soo(n, assignment, outcome_type)
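| Argument | Description |
|---|---|
| n | Sample size. |
| assignment | String controlling treatment assignment. Must be either "randomized" or "observational". |
| outcome_type | String controlling the outcome type. Must be either "multinomial" or "ordered". |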
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_soo computes linear predictors for each class using the covariates and then transforms them into valid probability distributions using the softmax function. It then generates the potential outcomes under treatment and under control by sampling from {1, 2, 3} using these probabilities.
If instead outcome_type == "ordered", generate_qualitative_data_soo first generates latent potential outcomes and then constructs the potential outcomes by discretizing the latent ones using two threshold parameters. The implied class probabilities are available in closed form, which allows us to analytically compute the probabilities of shift.
Treatment is always assigned as a Bernoulli draw with success probability equal to the propensity score. If assignment == "randomized", the propensity score is constant across units. If instead assignment == "observational", the propensity score depends on the covariates.
The function always generates three independent covariates. Observed outcomes are always constructed using the usual observational rule.
Controlling for the covariates that determine treatment assignment is sufficient for selection-on-observables to hold.
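The ordered branch can be illustrated with a latent-index sketch in base R. Every specific below is an assumption made for illustration rather than the package's DGP: the treatment coefficient `tau`, the thresholds `zeta`, the uniform covariates, and the standard normal noise (which is what makes the class probabilities expressible through `pnorm`).

```r
## Latent-index model for an ordered outcome (all values are hypothetical).
set.seed(7)
n <- 1000
X <- matrix(runif(n * 3), n, 3)              # three covariates (assumed uniform)
index <- rowSums(X)
tau <- 2                                     # hypothetical treatment coefficient
zeta <- c(2, 4)                              # hypothetical thresholds

## Latent potential outcomes and their discretization into classes 1, 2, 3.
latent <- function(d) tau * d + index + rnorm(n)
Y1 <- findInterval(latent(1), zeta) + 1
Y0 <- findInterval(latent(0), zeta) + 1

## Closed-form class probabilities implied by standard normal noise.
class_probs <- function(d) {
  cuts <- c(-Inf, zeta, Inf)
  sapply(1:3, function(m) {
    pnorm(cuts[m + 1] - index - tau * d) - pnorm(cuts[m] - index - tau * d)
  })
}

## Probabilities of shift, averaged over the sampled covariates.
pshifts <- colMeans(class_probs(1)) - colMeans(class_probs(0))
```

With a positive `tau`, probability mass moves from the lowest class to the highest one, so the first shift is negative and the last is positive.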
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift.
Riccardo Di Francesco
generate_qualitative_data_iv generate_qualitative_data_rd generate_qualitative_data_did
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_soo(100, assignment = "observational", outcome_type = "ordered")
    data$pshifts
Plots a causalQual object.
    ## S3 method for class 'causalQual'
    plot(x, hline = TRUE, ...)
| Argument | Description |
|---|---|
| x | A causalQual object. |
| hline | Logical, whether to display a horizontal line at zero in the plot. |
| ... | Further arguments passed to or from other methods. |

Plots a causalQual object.
Riccardo Di Francesco
causalQual
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    X <- data$X
    ## Estimate probabilities of shift.
    fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
    plot(fit)
Prints a causalQual object.
    ## S3 method for class 'causalQual'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A causalQual object. |
| ... | Further arguments passed to or from other methods. |

Prints a causalQual object.
Riccardo Di Francesco
causalQual
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    X <- data$X
    ## Estimate probabilities of shift.
    fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
    print(fit)
Summarizes a causalQual object.
    ## S3 method for class 'causalQual'
    summary(object, ...)
| Argument | Description |
|---|---|
| object | A causalQual object. |
| ... | Further arguments passed to or from other methods. |

Summarizes a causalQual object.
Riccardo Di Francesco
causalQual
    ## Generate synthetic data.
    set.seed(1986)
    data <- generate_qualitative_data_soo(1000, assignment = "observational", outcome_type = "ordered")
    Y <- data$Y
    D <- data$D
    X <- data$X
    ## Estimate probabilities of shift.
    fit <- causalQual_soo(Y = Y, D = D, X = X, outcome_type = "ordered")
    summary(fit)