---
title: "Through the front door"
output:
  rmarkdown::html_vignette:
    md_extensions: [
      "-autolink_bare_uris"
    ]
vignette: >
  %\VignetteIndexEntry{Through the front door}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


``` r
library(CausalQueries)
library(dplyr)
library(knitr)
```

Here is an example of a model in which `X` causes `M` and  `M` causes `Y`. There is, in addition, unobservable confounding between `X` and `Y`. This is an example of a model in which you might use information on `M` to figure out whether `X` caused `Y` making use of the "front door criterion."

The DAG is defined using `dagitty` syntax like this:


``` r
model <- make_model("X -> M -> Y <-> X")
```

We might set priors thus:


``` r
model <- set_priors(model, distribution = "jeffreys")
#> Altering all parameters.
```

You can plot the dag like this:


``` r
plot(model)
```

![Front door model](dfd-1.png)


Updating is done like this:


``` r
# Lets imagine highly correlated data; here an effect of .9 at each step
data <- data.frame(X = rep(0:1, 2000)) |>
  mutate(
    M = rbinom(n(), 1, .05 + .9*X),
    Y = rbinom(n(), 1, .05 + .9*M))

# Updating
model <- model |> update_model(data, refresh = 0)
```

Finally you can calculate an estimand of interest like this:


``` r
query_model(
    model = model,
    using = c("priors", "posteriors"),
    query = "Y[X=1] - Y[X=0]",
    ) |>
  kable(digits = 2)
```


|label           |query           |given |using      |case_level | mean|   sd| cred.low| cred.high|
|:---------------|:---------------|:-----|:----------|:----------|----:|----:|--------:|---------:|
|Y[X=1] - Y[X=0] |Y[X=1] - Y[X=0] |-     |priors     |FALSE      | 0.00| 0.14|    -0.34|      0.29|
|Y[X=1] - Y[X=0] |Y[X=1] - Y[X=0] |-     |posteriors |FALSE      | 0.79| 0.02|     0.76|      0.82|


This uses the posterior distribution and the model to assess the average treatment effect estimand.

Let's compare now with the case where you do not have data on `M`:


``` r

model |>
  update_model(data |> dplyr::select(X, Y), refresh = 0) |>
  query_model(
    using = c("priors", "posteriors"),
    query = "Y[X=1] - Y[X=0]") |>
  kable(digits = 2)
```


|label           |query           |given |using      |case_level | mean|   sd| cred.low| cred.high|
|:---------------|:---------------|:-----|:----------|:----------|----:|----:|--------:|---------:|
|Y[X=1] - Y[X=0] |Y[X=1] - Y[X=0] |-     |priors     |FALSE      |  0.0| 0.14|    -0.34|      0.34|
|Y[X=1] - Y[X=0] |Y[X=1] - Y[X=0] |-     |posteriors |FALSE      |  0.1| 0.17|    -0.03|      0.60|


Here we update much less and are (relatively) much less certain in our beliefs precisely because we are aware of the confounded related between `X` and `Y`, without having the data on `M` we could use to address it.

# Try it

Say `X`, `M`, and `Y` were perfectly correlated. Would the average treatment effect be identified?