Title: | Ordered Correlation Forest |
---|---|
Description: | Machine learning estimator specifically optimized for predictive modeling of ordered non-numeric outcomes. 'ocf' provides forest-based estimation of the conditional choice probabilities and the covariates' marginal effects. Under an "honesty" condition, the estimates are consistent and asymptotically normal, and standard errors can be obtained by leveraging the weight-based representation of the random forest predictions. Please cite as Di Francesco (2025) <doi:10.1080/07474938.2024.2429596>. |
Authors: | Riccardo Di Francesco [aut, cre, cph] |
Maintainer: | Riccardo Di Francesco <[email protected]> |
License: | GPL-3 |
Version: | 1.0.3 |
Built: | 2025-02-16 14:20:12 UTC |
Source: | https://github.com/riccardo-df/ocf |
Generate a synthetic data set with an ordered non-numeric outcome, together with conditional probabilities and covariates' marginal effects.
generate_ordered_data(n)
n |
Sample size. |
First, a latent outcome is generated as a regression function of the covariates plus an independent error term.
Second, the observed outcomes are obtained by discretizing the latent outcome into three classes using uniformly spaced threshold parameters.
Third, the conditional probabilities and the covariates' marginal effects at the mean are generated using standard textbook formulas. Marginal effects are approximated using a sample of 1,000,000 observations.
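The discretization step can be sketched as follows. This is an illustrative stand-alone example, not the package's internal code, and the threshold values used here are arbitrary:

```r
## Illustrative sketch of the discretization step (not the package's
## internal code). Two thresholds split the latent outcome into three
## ordered classes; the threshold values here are arbitrary.
set.seed(1986)
latent <- rnorm(10)
thresholds <- c(-0.5, 0.5)                        # uniformly spaced thresholds
observed <- findInterval(latent, thresholds) + 1  # classes 1, 2, 3
observed
```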
A list storing a data frame with the observed data, a matrix of true conditional probabilities, and a matrix of true marginal effects at the mean of the covariates.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(1000)
head(data$true_probs)
data$me_at_mean
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)
Nonparametric estimation of marginal effects using an ocf object.
marginal_effects(
  object,
  data = NULL,
  these_covariates = NULL,
  eval = "atmean",
  bandwitdh = 0.1,
  inference = FALSE
)
object |
An ocf object. |
data |
Data set of class data.frame. |
these_covariates |
Named list with covariates' names as keys and strings denoting covariates' types as entries. Strings must be either "continuous" or "discrete". |
eval |
Evaluation point for marginal effects. Either "mean", "atmean", or "atmedian". |
bandwitdh |
How many standard deviations the covariates are shifted up and down to evaluate the marginal effects of continuous covariates. |
inference |
Whether to extract weights and compute standard errors. The weights extraction considerably slows down the program. |
marginal_effects can estimate mean marginal effects, marginal effects at the mean, or marginal effects at the median, according to the eval argument.

If these_covariates is NULL (the default), the routine assumes that covariates with at most ten unique values are categorical and treats the remaining covariates as continuous.
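The default type heuristic can be mimicked as follows. This is a sketch based on the description above; guess_type is a hypothetical helper, not a package function:

```r
## Sketch of the default covariate-type heuristic described above.
## guess_type is a hypothetical helper, not part of the package.
guess_type <- function(x) {
  if (length(unique(x)) <= 10) "categorical" else "continuous"
}

set.seed(1986)
guess_type(sample(0:1, 100, replace = TRUE))  # "categorical"
guess_type(rnorm(100))                        # "continuous"
```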
Object of class ocf.marginal.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
plot(me)

## Compute standard errors. This requires honest forests.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
plot(honest_me)

## Subset covariates and select covariates' types.
my_covariates <- list("x1" = "continuous", "x2" = "discrete", "x4" = "discrete")
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE,
                              these_covariates = my_covariates)
print(honest_me)
plot(honest_me)
Accuracy measures for evaluating ordered probability predictions.
mean_squared_error(y, predictions, use.true = FALSE)

mean_absolute_error(y, predictions, use.true = FALSE)

mean_ranked_score(y, predictions, use.true = FALSE)

classification_error(y, predictions)
y |
Either the observed outcome vector or a matrix of true probabilities. |
predictions |
Predictions. |
use.true |
If TRUE, y is treated as a matrix of true probabilities (useful for simulation studies). |
When calling one of mean_squared_error, mean_absolute_error, or mean_ranked_score, predictions must be a matrix of predicted class probabilities, with as many rows as observations in y and as many columns as classes of y.
If use.true == FALSE, the mean squared error (MSE), the mean absolute error (MAE), and the mean ranked probability score (RPS) are computed as follows:

MSE = (1/n) Σ_i Σ_m ( 1(Y_i = m) − p̂_m(X_i) )²
MAE = (1/n) Σ_i Σ_m | 1(Y_i = m) − p̂_m(X_i) |
RPS = (1/n) Σ_i (1/(M−1)) Σ_m ( 1(Y_i ≤ m) − p̂_{≤m}(X_i) )²

If use.true == TRUE, the MSE, the MAE, and the RPS are computed as follows (useful for simulation studies):

MSE = (1/n) Σ_i Σ_m ( p_m(X_i) − p̂_m(X_i) )²
MAE = (1/n) Σ_i Σ_m | p_m(X_i) − p̂_m(X_i) |
RPS = (1/n) Σ_i (1/(M−1)) Σ_m ( p_{≤m}(X_i) − p̂_{≤m}(X_i) )²

where 1(·) is the indicator function, p̂_m(X_i) is the predicted probability of class m for observation i, p_m(X_i) is the corresponding true probability, p_{≤m} denotes cumulative probabilities up to class m, and M is the number of classes.
When calling classification_error, predictions must be a vector of predicted class labels.
Classification error (CE) is computed as follows:

CE = (1/n) Σ_i 1( Y_i ≠ Ŷ_i )

where Y_i are the observed class labels and Ŷ_i the predicted ones.
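As a sanity check, the MSE and CE formulas can be computed by hand on a toy prediction matrix. This is a stand-alone sketch, not the package's implementation:

```r
## Hand-rolled toy check of the MSE and CE formulas above
## (not the package's implementation).
y <- c(1, 2, 3, 2)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.2, 0.2, 0.6,
                  0.3, 0.4, 0.3), ncol = 3, byrow = TRUE)

onehot <- t(sapply(y, function(yi) as.numeric(1:3 == yi)))
mse <- mean(rowSums((onehot - probs)^2))  # 0.245

labels <- max.col(probs)                  # predicted class labels
ce <- mean(labels != y)                   # 0
```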
The MSE, the MAE, the RPS, or the CE of the method.
Riccardo Di Francesco
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)

## Accuracy measures on test sample.
predictions <- predict(forests, X_test)
mean_squared_error(Y_test, predictions$probabilities)
mean_ranked_score(Y_test, predictions$probabilities)
classification_error(Y_test, predictions$classification)
Estimation strategy to estimate conditional choice probabilities for ordered non-numeric outcomes.
multinomial_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)
Y |
Outcome vector. |
X |
Covariate matrix (no intercept). |
learner |
String, either "forest" or "l1". |
scale |
Logical, whether to scale the covariates. Ignored if learner == "forest". |
Multinomial machine learning expresses conditional choice probabilities as expectations of binary variables:

P(Y_i = m | X_i) = E[ 1(Y_i = m) | X_i ],  m = 1, …, M
Each expectation can then be estimated separately using any regression algorithm, yielding an estimate of the conditional probabilities.
multinomial_ml combines this strategy with either regression forests or penalized logistic regressions with an L1 penalty, according to the user-specified parameter learner.

If learner == "l1", the penalty parameters are chosen via 10-fold cross-validation and model.matrix is used to handle non-numeric covariates. Additionally, if scale == TRUE, the covariates are scaled to have zero mean and unit variance.

Object of class mml.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit multinomial machine learning on training sample using two different learners.
multinomial_forest <- multinomial_ml(Y_tr, X_tr, learner = "forest")
multinomial_l1 <- multinomial_ml(Y_tr, X_tr, learner = "l1")

## Predict out of sample.
predictions_forest <- predict(multinomial_forest, X_test)
predictions_l1 <- predict(multinomial_l1, X_test)

## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))
Nonparametric estimator for ordered non-numeric outcomes. The estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class.
ocf(
  Y = NULL,
  X = NULL,
  honesty = FALSE,
  honesty.fraction = 0.5,
  inference = FALSE,
  alpha = 0.2,
  n.trees = 2000,
  mtry = ceiling(sqrt(ncol(X))),
  min.node.size = 5,
  max.depth = 0,
  replace = FALSE,
  sample.fraction = ifelse(replace, 1, 0.5),
  n.threads = 1
)
Y |
Outcome vector. |
X |
Covariate matrix (no intercept). |
honesty |
Whether to grow honest forests. |
honesty.fraction |
Fraction of honest sample. Ignored if honesty == FALSE. |
inference |
Whether to extract weights and compute standard errors. The weights extraction considerably slows down the routine. |
alpha |
Controls the balance of each split. Each split leaves at least a fraction alpha of observations in each child node. |
n.trees |
Number of trees. |
mtry |
Number of covariates to possibly split at in each node. Default is the square root of the number of covariates. |
min.node.size |
Minimal node size. |
max.depth |
Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree). |
replace |
If TRUE, observations are sampled with replacement. |
sample.fraction |
Fraction of observations to sample. |
n.threads |
Number of threads. Zero corresponds to the number of CPUs available. |
Object of class ocf.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)

## We have compatibility with generic S3-methods.
print(forests)
summary(forests)
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(Y_test, predictions$classification)

## Compute standard errors. This requires honest forests.
honest_forests <- ocf(Y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)

## Marginal effects.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
plot(me)

## Compute standard errors. This requires honest forests.
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
plot(honest_me)
Estimation strategy to estimate conditional choice probabilities for ordered non-numeric outcomes.
ordered_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)
Y |
Outcome vector. |
X |
Covariate matrix (no intercept). |
learner |
String, either "forest" or "l1". |
scale |
Logical, whether to scale the covariates. Ignored if learner == "forest". |
Ordered machine learning expresses conditional choice probabilities as the difference between the cumulative probabilities of two adjacent classes, which in turn can be expressed as conditional expectations of binary variables:

P(Y_i = m | X_i) = E[ 1(Y_i ≤ m) | X_i ] − E[ 1(Y_i ≤ m − 1) | X_i ]
Each expectation can then be estimated separately using any regression algorithm, and the difference between the m-th and the (m−1)-th estimated surfaces estimates the conditional probabilities.
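The cumulative decomposition can be sketched with a simple linear probability model standing in for the learners the package actually uses (an illustrative assumption; the package fits forests or L1-penalized logits):

```r
## Sketch of the ordered decomposition with a linear probability model
## standing in for the forest / L1 learners used by the package.
set.seed(1986)
n <- 200
x <- rnorm(n)
y <- findInterval(x + rnorm(n), c(-0.5, 0.5)) + 1  # ordered classes 1, 2, 3

## One regression of the cumulative indicator 1(Y <= m) per class but the last.
cum1 <- fitted(lm(as.numeric(y <= 1) ~ x))
cum2 <- fitted(lm(as.numeric(y <= 2) ~ x))

## Adjacent differences recover the class probabilities.
probs <- cbind(cum1, cum2 - cum1, 1 - cum2)
all(abs(rowSums(probs) - 1) < 1e-12)  # TRUE
```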
ordered_ml combines this strategy with either regression forests or penalized logistic regressions with an L1 penalty, according to the user-specified parameter learner.

If learner == "forest", the orf function from an external package is called, as this estimator has already been proposed by Lechner and Okasa (2019). If learner == "l1", the penalty parameters are chosen via 10-fold cross-validation and model.matrix is used to handle non-numeric covariates. Additionally, if scale == TRUE, the covariates are scaled to have zero mean and unit variance.

Object of class oml.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ordered machine learning on training sample using two different learners.
ordered_forest <- ordered_ml(Y_tr, X_tr, learner = "forest")
ordered_l1 <- ordered_ml(Y_tr, X_tr, learner = "l1")

## Predict out of sample.
predictions_forest <- predict(ordered_forest, X_test)
predictions_l1 <- predict(ordered_l1, X_test)

## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))
Plots an ocf.marginal object.
## S3 method for class 'ocf.marginal'
plot(x, ...)
x |
An ocf.marginal object. |
... |
Further arguments passed to or from other methods. |
If standard errors have been estimated, 95% confidence intervals are shown.
Plots an ocf.marginal object.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
plot(me)

## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
plot(honest_me)
Prediction method for class mml.
## S3 method for class 'mml'
predict(object, data = NULL, ...)
object |
An mml object. |
data |
Data set of class data.frame. |
... |
Further arguments passed to or from other methods. |
If object$learner == "l1", then model.matrix is used to handle non-numeric covariates. If additionally object$scaling == TRUE, then data is scaled to have zero mean and unit variance.
Matrix of predictions.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit multinomial machine learning on training sample using two different learners.
multinomial_forest <- multinomial_ml(Y_tr, X_tr, learner = "forest")
multinomial_l1 <- multinomial_ml(Y_tr, X_tr, learner = "l1")

## Predict out of sample.
predictions_forest <- predict(multinomial_forest, X_test)
predictions_l1 <- predict(multinomial_l1, X_test)

## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))
Prediction method for class ocf.
## S3 method for class 'ocf'
predict(object, data = NULL, type = "response", ...)
object |
An ocf object. |
data |
Data set of class data.frame. |
type |
Type of prediction. Either "response" or "terminalNodes". |
... |
Further arguments passed to or from other methods. |
If type == "response", the routine returns the predicted conditional class probabilities and the predicted class labels. If forests are honest, the predicted probabilities are honest.

If type == "terminalNodes", the IDs of the terminal node in each tree for each observation in data are returned.
Desired predictions.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)

## Predict on test sample.
predictions <- predict(forests, X_test)
head(predictions$probabilities)
predictions$classification

## Get terminal nodes.
predictions <- predict(forests, X_test, type = "terminalNodes")
predictions$forest.1[1:10, 1:20] # Rows are observations, columns are trees.
Prediction method for class oml.
## S3 method for class 'oml'
predict(object, data = NULL, ...)
object |
An oml object. |
data |
Data set of class data.frame. |
... |
Further arguments passed to or from other methods. |
If object$learner == "l1", then model.matrix is used to handle non-numeric covariates. If additionally object$scaling == TRUE, then data is scaled to have zero mean and unit variance.
Matrix of predictions.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]

## Fit ordered machine learning on training sample using two different learners.
ordered_forest <- ordered_ml(Y_tr, X_tr, learner = "forest")
ordered_l1 <- ordered_ml(Y_tr, X_tr, learner = "l1")

## Predict out of sample.
predictions_forest <- predict(ordered_forest, X_test)
predictions_l1 <- predict(ordered_l1, X_test)

## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))
Prints an ocf object.
## S3 method for class 'ocf'
print(x, ...)
x |
An ocf object. |
... |
Further arguments passed to or from other methods. |
Prints an ocf object.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Print.
print(forests)
Prints an ocf.marginal object.
## S3 method for class 'ocf.marginal'
print(x, latex = FALSE, ...)
x |
An ocf.marginal object. |
latex |
If TRUE, prints LATEX code. |
... |
Further arguments passed to or from other methods. |
Compilation of the LATEX code requires the following packages: booktabs, float, adjustbox. If standard errors have been estimated, they are printed in parentheses below each point estimate.
Prints an ocf.marginal object.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)

## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
Summarizes an ocf object.
## S3 method for class 'ocf'
summary(object, ...)
object |
An ocf object. |
... |
Further arguments passed to or from other methods. |
Summarizes an ocf object.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Summary.
summary(forests)
Summarizes an ocf.marginal object.
## S3 method for class 'ocf.marginal'
summary(object, latex = FALSE, ...)
object |
An ocf.marginal object. |
latex |
If TRUE, prints LATEX code. |
... |
Further arguments passed to or from other methods. |
Compilation of the LATEX code requires the following packages: booktabs, float, adjustbox. If standard errors have been estimated, they are printed in parentheses below each point estimate.
Summarizes an ocf.marginal object.
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
summary(me)
summary(me, latex = TRUE)

## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
summary(honest_me, latex = TRUE)
Extracts tree information from an ocf.forest object.
tree_info(object, tree = 1)
object |
An ocf.forest object. |
tree |
Number of the tree of interest. |
Node and variable IDs are 0-indexed, i.e., node 0 is the root node.

All values smaller than or equal to splitval go to the left and all values larger go to the right.
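The routing rule can be illustrated on a hypothetical one-node table; route is an illustrative helper, not a package function:

```r
## Illustration of the routing rule on a hypothetical node:
## values <= splitval go left, larger values go right.
node <- data.frame(nodeID = 0, leftChild = 1, rightChild = 2,
                   splitvarName = "x1", splitval = 0.5)
route <- function(value, node) {
  if (value <= node$splitval) node$leftChild else node$rightChild
}
route(0.5, node)  # 1 (left: 0.5 <= 0.5)
route(0.7, node)  # 2 (right)
```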
A data.frame with the following columns:
nodeID |
Node IDs. |
leftChild |
IDs of the left child node. |
rightChild |
IDs of the right child node. |
splitvarID |
IDs of the splitting variable. |
splitvarName |
Name of the splitting variable. |
splitval |
Splitting value. |
terminal |
Logical, TRUE for terminal nodes. |
prediction |
One column with the predicted conditional class probabilities. |
Riccardo Di Francesco
Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(1000)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]

## Fit ocf.
forests <- ocf(Y, X)

## Extract information from tenth tree of first forest.
info <- tree_info(forests$forests.info$forest.1, tree = 10)
head(info)