# Install fabricatr if it is not already available
if (!requireNamespace("fabricatr", quietly = TRUE)) {
  install.packages("fabricatr")
}
library(CausalQueries)
library(dplyr)
library(knitr)
Generating: To make a model you need to provide a DAG statement to make_model. For instance "X->Y", "X -> M -> Y <- X" or "Z -> X -> Y <-> X".
Graphing: Once you have made a model you can inspect the DAG:
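A minimal sketch of the generating and graphing steps, assuming CausalQueries is installed and attached. The object names xy_model and iv_model are illustrative (xy_model matches the name used in the rest of this guide), and the plot() calls assume the plot method for causal models that recent package versions provide.

```r
library(CausalQueries)

# A two-node model: X causes Y
xy_model <- make_model("X -> Y")

# An IV-style model: Z causes X, X causes Y, and "<->" adds
# unobserved confounding between X and Y
iv_model <- make_model("Z -> X -> Y <-> X")

# Draw each DAG (assumes the package's plot method for causal models)
plot(xy_model)
plot(iv_model)
```

For xy_model, X has 2 nodal types and Y has 4, which gives the 2 × 4 = 8 causal types reported by summary().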
Simple summaries: You can access a simple summary using summary():
summary(xy_model)
#>
#> Causal statement:
#> X -> Y
#>
#> Nodal types:
#> $X
#> 0 1
#>
#> node position display interpretation
#> 1 X NA X0 X = 0
#> 2 X NA X1 X = 1
#>
#> $Y
#> 00 10 01 11
#>
#> node position display interpretation
#> 1 Y 1 Y[*]* Y | X = 0
#> 2 Y 2 Y*[*] Y | X = 1
#>
#> Number of types by node:
#> X Y
#> 2 4
#>
#> Number of causal types: 8
#>
#> Note: Model does not contain: posterior_distribution, stan_objects;
#> to include these objects use update_model()
#>
#> Note: To pose causal queries of this model use query_model()
Alternatively, you can examine model details using inspect().
Inspecting: The model has a set of parameters and a default distribution over these.
xy_model |> inspect("parameters_df")
#>
#> parameters_df
#> Mapping of model parameters to nodal types:
#>
#> param_names: name of parameter
#> node: name of endogeneous node associated
#> with the parameter
#> gen: partial causal ordering of the
#> parameter's node
#> param_set: parameter groupings forming a simplex
#> given: if model has confounding gives
#> conditioning nodal type
#> param_value: parameter values
#> priors: hyperparameters of the prior
#> Dirichlet distribution
#>
#> param_names node gen param_set nodal_type given param_value priors
#> 1 X.0 X 1 X 0 0.50 1
#> 2 X.1 X 1 X 1 0.50 1
#> 3 Y.00 Y 2 Y 00 0.25 1
#> 4 Y.10 Y 2 Y 10 0.25 1
#> 5 Y.01 Y 2 Y 01 0.25 1
#> 6 Y.11 Y 2 Y 11 0.25 1
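The param_set column groups parameters into simplices, so within each set the param_value entries should sum to one. A short check with dplyr, assuming parameters_df can be read directly from the model object:

```r
library(CausalQueries)
library(dplyr)

xy_model <- make_model("X -> Y")

# Sum the default parameter values within each simplex (param_set)
set_sums <- xy_model$parameters_df |>
  group_by(param_set) |>
  summarise(total = sum(param_value), n_types = n())

set_sums
```

With the default values this reproduces the table above: 0.50 + 0.50 for X and four values of 0.25 for Y.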
Tailoring: These features can be edited using set_restrictions, set_priors and set_parameters.
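As a sketch of set_parameters, the call below replaces the default parameter values of xy_model with a hand-picked vector given in parameters_df order (X.0, X.1, Y.00, Y.10, Y.01, Y.11). The numbers in the hypothetical new_values vector are arbitrary, and the parameters argument name is our reading of the set_parameters documentation; check ?set_parameters for your installed version.

```r
library(CausalQueries)

xy_model <- make_model("X -> Y")

# Hypothetical illustrative values; each param_set (X, Y) sums to 1
new_values <- c(0.5, 0.5, 0.25, 0.125, 0.5, 0.125)

# Assumes set_parameters() accepts a full parameter vector
xy_model <- xy_model |>
  set_parameters(parameters = new_values)

xy_model$parameters_df[, c("param_names", "param_value")]
```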
Here is an example of setting a monotonicity restriction (see ?set_restrictions for more):
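A sketch of a monotonicity restriction on the IV model, assuming the decreasing() helper that CausalQueries exports for writing such statements. It rules out X nodal types in which X falls as Z rises. In nodal-type notation the first digit is X when Z = 0 and the second is X when Z = 1, so "10" is the decreasing type.

```r
library(CausalQueries)

iv_model <- make_model("Z -> X -> Y <-> X")

# Remove nodal types for X that are decreasing in Z
iv_model <- iv_model |>
  set_restrictions(decreasing("Z", "X"))

# Remaining nodal types for X: "10" should be gone
iv_model$nodal_types$X
```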
Here is an example of setting priors (see ?set_priors for more):
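A sketch that swaps the default flat priors (all hyperparameters equal to 1, as in the priors column above) for Jeffreys priors. To our knowledge "jeffreys" is one of the preset shapes that set_priors accepts through its distribution argument, setting every Dirichlet hyperparameter to 0.5.

```r
library(CausalQueries)

iv_model <- make_model("Z -> X -> Y <-> X")

# Replace the flat Dirichlet hyperparameters with Jeffreys priors
iv_model <- iv_model |>
  set_priors(distribution = "jeffreys")

# Every hyperparameter should now be 0.5
iv_model$parameters_df$priors
```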
Simulation: Data can be drawn from a model like this:
Z | X | Y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 0 |
1 | 1 | 1 |
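The table above is the kind of output make_data produces. The sketch below draws a similar four-row data set from the IV model at its current parameter values. The seed is added here only to make the draw reproducible, and the rows it produces will not necessarily match those shown.

```r
library(CausalQueries)

iv_model <- make_model("Z -> X -> Y <-> X")

# Fix the seed so the draw is reproducible
set.seed(1)

# Draw 4 observations from the model
data <- make_data(iv_model, n = 4)
data
```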
Updating: Update using update_model. You can pass all rstan arguments to update_model.
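# Simulate 100 observations in which X raises the probability that Y = 1
df <-
  data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X*.5))
# Update the model; refresh = 0 is an rstan argument that silences progress output
xy_model <-
  xy_model |>
  update_model(df, refresh = 0)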
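A sketch of passing further sampler settings, relying on update_model forwarding extra arguments to rstan as stated above. The chain count, iteration count and seed are arbitrary choices. Note that in rstan, iter counts warmup iterations, which by default are half of iter.

```r
library(CausalQueries)
library(dplyr)

set.seed(1)
df <- data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X * .5))

# chains, iter, seed and refresh are handed on to the rstan sampler
xy_model <- make_model("X -> Y") |>
  update_model(df, chains = 2, iter = 1000, seed = 1, refresh = 0)

# Posterior draws are now stored on the model
dim(xy_model$posterior_distribution)
```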
Inspecting: You can access the posterior distribution on model parameters directly thus:
X.0 | X.1 | Y.00 | Y.10 | Y.01 | Y.11 |
---|---|---|---|---|---|
0.5497411 | 0.4502589 | 0.1904580 | 0.0031660 | 0.5474077 | 0.2589684 |
0.5964844 | 0.4035156 | 0.0472313 | 0.2275006 | 0.6864478 | 0.0388203 |
0.6109730 | 0.3890270 | 0.0135648 | 0.1257091 | 0.8030706 | 0.0576555 |
0.5500959 | 0.4499041 | 0.0469942 | 0.0814410 | 0.7599636 | 0.1116013 |
0.5678592 | 0.4321408 | 0.1536443 | 0.0983049 | 0.7034709 | 0.0445798 |
0.5147611 | 0.4852389 | 0.0604909 | 0.0882231 | 0.7774623 | 0.0738237 |
where each row is a draw of parameters.
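Because each row is one draw, ordinary data-frame tools apply. The sketch assumes the draws are stored as the model's posterior_distribution element, which matches the note printed by summary() earlier. It checks that the X parameters form a simplex within every draw and then averages over draws.

```r
library(CausalQueries)
library(dplyr)

set.seed(1)
df <- data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X * .5))

xy_model <- make_model("X -> Y") |>
  update_model(df, refresh = 0)

post <- xy_model$posterior_distribution

# Within each draw, X.0 and X.1 should sum to 1
head(post[, "X.0"] + post[, "X.1"])

# Posterior mean of each parameter
colMeans(post)
```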
Querying: You can ask arbitrary causal queries of the model.
Examples of unconditional queries:
xy_model |>
query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"))
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |query |using | mean| sd| cred.low| cred.high|
#> |:---------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |priors | 0.244| 0.192| 0.007| 0.712|
#> |Y[X=1] > Y[X=0] |posteriors | 0.685| 0.079| 0.513| 0.820|
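To see the arithmetic behind these numbers, a query can also be evaluated at the model's fixed parameter values with using = "parameters" instead of over prior or posterior draws. On a fresh model every Y nodal type has weight 0.25. The probability of a positive effect is therefore the weight of type 01 (0.25), and the average effect is 0.25 - 0.25 = 0, which agrees with the prior mean of about 0.244 shown above.

```r
library(CausalQueries)

xy_model <- make_model("X -> Y")

# Evaluate at the default parameter values rather than over draws
res <- xy_model |>
  query_model(queries = c("Y[X=1] > Y[X=0]", "Y[X=1] - Y[X=0]"),
              using = "parameters")
res
```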
Examples of conditional queries:
xy_model |>
query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"),
given = "X==1 & Y == 1")
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |query |given |using | mean| sd| cred.low| cred.high|
#> |:---------------|:-------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |X==1 & Y == 1 |priors | 0.503| 0.289| 0.024| 0.976|
#> |Y[X=1] > Y[X=0] |X==1 & Y == 1 |posteriors | 0.854| 0.084| 0.679| 0.987|
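The same conditional query at the default parameter values shows where the prior mean of about 0.5 comes from. Among units with X = 1 and Y = 1, only Y types 01 and 11 remain, each with weight 0.25, so the share with a positive effect is 0.25 / (0.25 + 0.25) = 0.5. A sketch on a fresh model:

```r
library(CausalQueries)

xy_model <- make_model("X -> Y")

# Condition on observed X = 1 and Y = 1 at the default parameters
res <- xy_model |>
  query_model("Y[X=1] > Y[X=0]",
              given = "X==1 & Y == 1",
              using = "parameters")
res
```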
Queries can even be conditional on counterfactual quantities. Here is the probability of a positive effect given that there is some effect:
xy_model |>
query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"),
given = "Y[X=1] != Y[X=0]")
#>
#> Causal queries generated by query_model (all at population level)
#>
#> |query |given |using | mean| sd| cred.low| cred.high|
#> |:---------------|:----------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |priors | 0.506| 0.285| 0.031| 0.977|
#> |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |posteriors | 0.895| 0.061| 0.773| 0.993|
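Since Y type 01 denotes a positive effect and type 10 a negative one (the first digit is Y when X = 0, the second when X = 1), this query depends only on those two parameters. Writing $\lambda^Y_{01}$ and $\lambda^Y_{10}$ for their shares (notation introduced here):

$$
\Pr\big(Y[X=1] > Y[X=0] \mid Y[X=1] \neq Y[X=0]\big) = \frac{\lambda^Y_{01}}{\lambda^Y_{01} + \lambda^Y_{10}}
$$

The reported means average this ratio over prior or posterior draws. At the default values (0.25 each) it equals 0.5, close to the prior mean of 0.506. For the first posterior draw listed earlier it is 0.5474077 / (0.5474077 + 0.0031660) ≈ 0.994.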
Query output is ready for printing as tables, but can also be plotted, which is especially useful with batch requests:
batch_queries <- xy_model |>
query_model(queries = c("Y[X=1] - Y[X=0]", "Y[X=1] > Y[X=0]"),
using = c("priors", "posteriors"),
given = c(TRUE, "Y[X=1] != Y[X=0]"),
expand_grid = TRUE)
batch_queries |> kable(digits = 2, caption = "tabular output")
query | given | using | case_level | mean | sd | cred.low | cred.high |
---|---|---|---|---|---|---|---|
Y[X=1] - Y[X=0] | - | priors | FALSE | 0.00 | 0.32 | -0.63 | 0.66 |
Y[X=1] - Y[X=0] | - | posteriors | FALSE | 0.60 | 0.08 | 0.44 | 0.74 |
Y[X=1] - Y[X=0] | Y[X=1] != Y[X=0] | priors | FALSE | 0.00 | 0.58 | -0.94 | 0.94 |
Y[X=1] - Y[X=0] | Y[X=1] != Y[X=0] | posteriors | FALSE | 0.79 | 0.12 | 0.55 | 0.99 |
Y[X=1] > Y[X=0] | - | priors | FALSE | 0.25 | 0.20 | 0.01 | 0.72 |
Y[X=1] > Y[X=0] | - | posteriors | FALSE | 0.68 | 0.08 | 0.51 | 0.82 |
Y[X=1] > Y[X=0] | Y[X=1] != Y[X=0] | priors | FALSE | 0.50 | 0.29 | 0.03 | 0.97 |
Y[X=1] > Y[X=0] | Y[X=1] != Y[X=0] | posteriors | FALSE | 0.89 | 0.06 | 0.77 | 0.99 |
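The sketch below shows one hand-rolled way to plot batch output with ggplot2, using the returned columns query, given, mean, cred.low and cred.high. It is an assumption for illustration, and the package may offer its own plotting method. It uses priors only, so it runs without a fitted model.

```r
library(CausalQueries)
library(ggplot2)

xy_model <- make_model("X -> Y")

# Priors only, so no Stan sampling is needed
batch_queries <- xy_model |>
  query_model(queries = c("Y[X=1] - Y[X=0]", "Y[X=1] > Y[X=0]"),
              using = "priors",
              given = c(TRUE, "Y[X=1] != Y[X=0]"),
              expand_grid = TRUE)

# One point-range per query, with a panel per conditioning statement
p <- ggplot(batch_queries,
            aes(x = query, y = mean, ymin = cred.low, ymax = cred.high)) +
  geom_pointrange() +
  facet_wrap(~ given) +
  coord_flip()

print(p)
```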