Getting Started

if(!requireNamespace("fabricatr", quietly = TRUE)) {
  install.packages("fabricatr")
}

library(CausalQueries)
library(dplyr)
library(knitr)

Make a model

Generating: To make a model, provide a causal statement to make_model, for instance:

  • "X -> Y",
  • "X -> M -> Y <- X", or
  • "Z -> X -> Y <-> X".

# examples of models
xy_model <- make_model("X -> Y")
iv_model <- make_model("Z -> X -> Y <-> X")

Graphing: Once you have made a model you can inspect the DAG:

plot(xy_model)

Simple summaries: You can access a simple summary using summary():

summary(xy_model)
#> 
#> Causal statement: 
#> X -> Y
#> 
#> Nodal types: 
#> $X
#> 0  1
#> 
#>   node position display interpretation
#> 1    X       NA      X0          X = 0
#> 2    X       NA      X1          X = 1
#> 
#> $Y
#> 00  10  01  11
#> 
#>   node position display interpretation
#> 1    Y        1   Y[*]*      Y | X = 0
#> 2    Y        2   Y*[*]      Y | X = 1
#> 
#> Number of types by node:
#> X Y 
#> 2 4 
#> 
#> Number of causal types:  8
#> 
#> Note: Model does not contain: posterior_distribution, stan_objects;
#> to include these objects use update_model()
#> 
#> Note: To pose causal queries of this model use query_model()

Alternatively, you can examine model details using inspect().
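
Reading nodal types: The labels in the summary can be decoded by hand. The following is a minimal base R sketch that does not call CausalQueries; it assumes the convention shown in the summary output, where the first digit of a Y type is Y when X = 0 and the second digit is Y when X = 1. The names type_labels, y_types and effect are illustrative only.

```r
# Decode Y's nodal types for the model X -> Y.
# Convention taken from the summary output: digit 1 is Y | X = 0,
# digit 2 is Y | X = 1.
type_labels <- c("00", "10", "01", "11")

y_types <- data.frame(
  nodal_type = type_labels,
  y_if_x0 = as.integer(substr(type_labels, 1, 1)),
  y_if_x1 = as.integer(substr(type_labels, 2, 2))
)

# Effect of X on Y implied by each type:
# -1 (negative), 0 (none), 1 (positive).
y_types$effect <- y_types$y_if_x1 - y_types$y_if_x0

print(y_types)
```

Under this reading, only type 01 has a positive effect and only type 10 has a negative one.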

Inspecting: The model has a set of parameters and a default distribution over these.

xy_model |> inspect("parameters_df") 
#> 
#> parameters_df
#> Mapping of model parameters to nodal types: 
#> 
#>   param_names: name of parameter
#>   node:        name of endogeneous node associated
#>                with the parameter
#>   gen:         partial causal ordering of the
#>                parameter's node
#>   param_set:   parameter groupings forming a simplex
#>   given:       if model has confounding gives
#>                conditioning nodal type
#>   param_value: parameter values
#>   priors:      hyperparameters of the prior
#>                Dirichlet distribution 
#> 
#>   param_names node gen param_set nodal_type given param_value priors
#> 1         X.0    X   1         X          0              0.50      1
#> 2         X.1    X   1         X          1              0.50      1
#> 3        Y.00    Y   2         Y         00              0.25      1
#> 4        Y.10    Y   2         Y         10              0.25      1
#> 5        Y.01    Y   2         Y         01              0.25      1
#> 6        Y.11    Y   2         Y         11              0.25      1
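
From parameters to causal types: The summary reports 8 causal types, one for each pairing of an X type with a Y type. The sketch below, in base R only, rebuilds those pairings from the param_value column above. It assumes that, absent confounding, the probability of a causal type is the product of its nodal-type parameters; the object names are illustrative.

```r
# Nodal types and flat parameter values, copied from parameters_df above.
x_params <- c("0" = 0.50, "1" = 0.50)
y_params <- c("00" = 0.25, "10" = 0.25, "01" = 0.25, "11" = 0.25)

# A causal type is one nodal type per node: 2 x 4 = 8 combinations.
causal_types <- expand.grid(X = names(x_params), Y = names(y_params),
                            stringsAsFactors = FALSE)

# Assumption: with no confounding, a causal type's probability is the
# product of the parameters of its nodal types.
causal_types$prob <- unname(
  x_params[causal_types$X] * y_params[causal_types$Y]
)

print(causal_types)
```

Each param_set forms a simplex, so the eight probabilities sum to one.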

Tailoring: These features can be edited using set_restrictions, set_priors and set_parameters.

Here is an example of setting a monotonicity restriction (see ?set_restrictions for more):

iv_model <- 
  iv_model |> set_restrictions(decreasing('Z', 'X'))
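
What the restriction removes: As a rough illustration of the idea, and not of the package's internals, the base R sketch below lists X's nodal types in Z -> X and drops the one in which X decreases in Z. It assumes the same digit convention as the summary output (first digit is X when Z = 0, second digit is X when Z = 1); the object names are illustrative.

```r
# X's nodal types in Z -> X: digit 1 is X | Z = 0, digit 2 is X | Z = 1.
x_types <- c("00", "10", "01", "11")
x_if_z0 <- as.integer(substr(x_types, 1, 1))
x_if_z1 <- as.integer(substr(x_types, 2, 2))

# Types for which X strictly decreases in Z.
is_decreasing <- x_if_z1 < x_if_z0

# Dropping them leaves the types consistent with monotonicity.
kept_types <- x_types[!is_decreasing]
print(kept_types)
```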

Here is an example of setting priors (see ?set_priors for more):

iv_model <- 
  iv_model |> set_priors(distribution = "jeffreys")
#> Altering all parameters.

Simulation: Data can be drawn from a model like this:

data <- make_data(iv_model, n = 4) 

data |> kable()
|  Z|  X|  Y|
|--:|--:|--:|
|  0|  0|  0|
|  0|  1|  1|
|  1|  0|  0|
|  1|  1|  1|
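
How a draw works: Conceptually, simulating data means drawing a nodal type for every node and unit and then working out the values those types imply. The base R sketch below does this for the simpler X -> Y model rather than the IV model used above. The parameter values, seed and object names are assumptions made for illustration, and this is not a description of how make_data is implemented.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 4

# Illustrative parameter values (the flat defaults of an X -> Y model).
x_params <- c("0" = 0.5, "1" = 0.5)
y_params <- c("00" = 0.25, "10" = 0.25, "01" = 0.25, "11" = 0.25)

# Step 1: draw a nodal type for each node and each unit.
x_type <- sample(names(x_params), n, replace = TRUE, prob = x_params)
y_type <- sample(names(y_params), n, replace = TRUE, prob = y_params)

# Step 2: realise the data. X's type is its value; Y's type says what
# Y is under X = 0 (first digit) and under X = 1 (second digit).
X <- as.integer(x_type)
Y <- as.integer(substr(y_type, X + 1, X + 1))

sim_data <- data.frame(X = X, Y = Y)
print(sim_data)
```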

Update the model

Updating: Use update_model to update the model with data. You can pass rstan arguments (such as refresh, used below) to update_model.

df <- 
  data.frame(X = rbinom(100, 1, .5)) |>
  mutate(Y = rbinom(100, 1, .25 + X*.5))

xy_model <- 
  xy_model |> 
  update_model(df, refresh = 0)

Inspecting: You can access the posterior distribution over model parameters directly:

xy_model |> grab("posterior_distribution") |> 
  head() |> kable()
|       X.0|       X.1|      Y.00|      Y.10|      Y.01|      Y.11|
|---------:|---------:|---------:|---------:|---------:|---------:|
| 0.5497411| 0.4502589| 0.1904580| 0.0031660| 0.5474077| 0.2589684|
| 0.5964844| 0.4035156| 0.0472313| 0.2275006| 0.6864478| 0.0388203|
| 0.6109730| 0.3890270| 0.0135648| 0.1257091| 0.8030706| 0.0576555|
| 0.5500959| 0.4499041| 0.0469942| 0.0814410| 0.7599636| 0.1116013|
| 0.5678592| 0.4321408| 0.1536443| 0.0983049| 0.7034709| 0.0445798|
| 0.5147611| 0.4852389| 0.0604909| 0.0882231| 0.7774623| 0.0738237|

Each row is one draw of the parameters.
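
Working with draws: Because each row is a full parameter vector, any function of the parameters can be evaluated row by row and then summarised. The base R sketch below retypes the six draws of Y's parameters printed above, so it reflects only those six rows and not the full posterior. The object names are illustrative, and the expression for the average effect assumes the nodal-type convention from the model summary.

```r
# The six posterior draws of Y's parameters printed above.
draws <- data.frame(
  Y.00 = c(0.1904580, 0.0472313, 0.0135648, 0.0469942, 0.1536443, 0.0604909),
  Y.10 = c(0.0031660, 0.2275006, 0.1257091, 0.0814410, 0.0983049, 0.0882231),
  Y.01 = c(0.5474077, 0.6864478, 0.8030706, 0.7599636, 0.7034709, 0.7774623),
  Y.11 = c(0.2589684, 0.0388203, 0.0576555, 0.1116013, 0.0445798, 0.0738237)
)

# Within each draw, the parameters of one node sum to one
# (up to rounding in the printed values).
print(rowSums(draws))

# Summarise each parameter across draws.
print(round(colMeans(draws), 3))

# Evaluate a function of the parameters draw by draw: here the average
# effect, the share of 01 types minus the share of 10 types.
ate_draws <- draws$Y.01 - draws$Y.10
print(round(ate_draws, 3))
```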

Query the model

Arbitrary queries

Querying: You can ask arbitrary causal queries of the model.

An example of an unconditional query:

xy_model |> 
  query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors")) 
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |query           |using      |  mean|    sd| cred.low| cred.high|
#> |:---------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |priors     | 0.244| 0.192|    0.007|     0.712|
#> |Y[X=1] > Y[X=0] |posteriors | 0.685| 0.079|    0.513|     0.820|
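
Where the prior row comes from: The prior summary can be approximated by hand. The base R sketch below assumes the default priors shown in parameters_df (a Dirichlet with all hyperparameters equal to 1 over Y's four nodal types) and uses the fact that the query holds only for type 01. The seed, the number of draws and the object names are arbitrary choices, and this is a sketch of the logic, not of how query_model computes its output.

```r
set.seed(2)  # arbitrary seed, for reproducibility only
n_draws <- 4000

# Draw from a Dirichlet(1, 1, 1, 1) prior over Y's nodal types by
# normalising independent Gamma(1, 1) variables.
g <- matrix(rgamma(n_draws * 4, shape = 1), ncol = 4,
            dimnames = list(NULL, c("Y.00", "Y.10", "Y.01", "Y.11")))
lambda <- g / rowSums(g)

# "Y[X=1] > Y[X=0]" holds only for nodal type 01, so in each draw the
# query equals that type's share.
q <- lambda[, "Y.01"]

prior_summary <- c(mean = mean(q), sd = sd(q),
                   quantile(q, c(0.025, 0.975)))
print(round(prior_summary, 3))
```

Up to simulation error, the result should be close to the priors row in the table above.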

An example of a conditional query:

xy_model |> 
  query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"),
              given = "X==1 & Y == 1") 
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |query           |given         |using      |  mean|    sd| cred.low| cred.high|
#> |:---------------|:-------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |X==1 & Y == 1 |priors     | 0.503| 0.289|    0.024|     0.976|
#> |Y[X=1] > Y[X=0] |X==1 & Y == 1 |posteriors | 0.854| 0.084|    0.679|     0.987|

Queries can even be conditional on counterfactual quantities. Here is the probability of a positive effect given that there is some effect:

xy_model |> 
  query_model("Y[X=1] > Y[X=0]", using = c("priors", "posteriors"),
              given = "Y[X=1] != Y[X=0]") 
#> 
#> Causal queries generated by query_model (all at population level)
#> 
#> |query           |given            |using      |  mean|    sd| cred.low| cred.high|
#> |:---------------|:----------------|:----------|-----:|-----:|--------:|---------:|
#> |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |priors     | 0.506| 0.285|    0.031|     0.977|
#> |Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |posteriors | 0.895| 0.061|    0.773|     0.993|
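
Conditioning as a ratio: For a single parameter draw, the two conditional queries above can be written as ratios of nodal-type shares. Here \lambda_{ab} denotes the share of Y's nodal type ab (Y = a when X = 0, Y = b when X = 1), a notation introduced only for this note, and the expressions assume the X -> Y model without confounding, so that X is independent of Y's type.

```latex
\Pr\big(Y(1) > Y(0) \mid X = 1,\, Y = 1\big)
  = \frac{\lambda_{01}}{\lambda_{01} + \lambda_{11}},
\qquad
\Pr\big(Y(1) > Y(0) \mid Y(1) \neq Y(0)\big)
  = \frac{\lambda_{01}}{\lambda_{01} + \lambda_{10}}
```

At the flat parameter values of 0.25, for example, both ratios equal 0.5. The reported means and credibility intervals summarise such ratios across prior or posterior draws.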

Output

Query output is ready for printing as tables, but can also be plotted, which is especially useful with batch requests:

batch_queries <- xy_model |> 
  query_model(queries = c("Y[X=1] - Y[X=0]", "Y[X=1] > Y[X=0]"), 
              using = c("priors", "posteriors"), 
              given = c(TRUE, "Y[X=1] != Y[X=0]"),
              expand_grid = TRUE) 
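
# Illustrative aside in base R, not part of the package. Judging by the
# eight rows of output, expand_grid = TRUE appears to cross every query
# with every given and every using. The object names here are
# hypothetical, and "-" stands in for "no condition".

```r
queries <- c("Y[X=1] - Y[X=0]", "Y[X=1] > Y[X=0]")
using   <- c("priors", "posteriors")
given   <- c("-", "Y[X=1] != Y[X=0]")

# Every combination of the three arguments: 2 x 2 x 2 = 8 requests.
requests <- expand.grid(using = using, given = given, query = queries,
                        stringsAsFactors = FALSE)
print(requests[, c("query", "given", "using")])
```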

batch_queries |> kable(digits = 2, caption = "tabular output")
Table: tabular output

|query           |given            |using      |case_level |  mean|   sd| cred.low| cred.high|
|:---------------|:----------------|:----------|:----------|-----:|----:|--------:|---------:|
|Y[X=1] - Y[X=0] |-                |priors     |FALSE      |  0.00| 0.32|    -0.63|      0.66|
|Y[X=1] - Y[X=0] |-                |posteriors |FALSE      |  0.60| 0.08|     0.44|      0.74|
|Y[X=1] - Y[X=0] |Y[X=1] != Y[X=0] |priors     |FALSE      |  0.00| 0.58|    -0.94|      0.94|
|Y[X=1] - Y[X=0] |Y[X=1] != Y[X=0] |posteriors |FALSE      |  0.79| 0.12|     0.55|      0.99|
|Y[X=1] > Y[X=0] |-                |priors     |FALSE      |  0.25| 0.20|     0.01|      0.72|
|Y[X=1] > Y[X=0] |-                |posteriors |FALSE      |  0.68| 0.08|     0.51|      0.82|
|Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |priors     |FALSE      |  0.50| 0.29|     0.03|      0.97|
|Y[X=1] > Y[X=0] |Y[X=1] != Y[X=0] |posteriors |FALSE      |  0.89| 0.06|     0.77|      0.99|

batch_queries |> plot()