VincentWei

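Between heaven and earth a noble spirit endures: to establish a mind for heaven and earth, to secure a livelihood for the people, to carry on the lost teachings of the past sages, and to open an era of peace for all generations!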

Introduction to Causal Inference.pdf


Course Lecture Notes
Introduction to Causal Inference
from a Machine Learning Perspective
Brady Neal
December 17, 2020
Contents
Preface
Contents
1 Motivation: Why You Might Care
1.1 Simpson’s Paradox
1.2 Applications of Causal Inference
1.3 Correlation Does Not Imply Causation
1.3.1 Nicolas Cage and Pool Drownings
1.3.2 Why is Association Not Causation?
1.4 Main Themes
2 Potential Outcomes
2.1 Potential Outcomes and Individual Treatment Effects
2.2 The Fundamental Problem of Causal Inference
2.3 Getting Around the Fundamental Problem
2.3.1 Average Treatment Effects and Missing Data Interpretation
2.3.2 Ignorability and Exchangeability
2.3.3 Conditional Exchangeability and Unconfoundedness
2.3.4 Positivity/Overlap and Extrapolation
2.3.5 No interference, Consistency, and SUTVA
2.3.6 Tying It All Together
2.4 Fancy Statistics Terminology Defancified
2.5 A Complete Example with Estimation
3 The Flow of Association and Causation in Graphs
3.1 Graph Terminology
3.2 Bayesian Networks
3.3 Causal Graphs
3.4 Two-Node Graphs and Graphical Building Blocks
3.5 Chains and Forks
3.6 Colliders and their Descendants
3.7 d-separation
3.8 Flow of Association and Causation
4 Causal Models
4.1 The do-operator and Interventional Distributions
4.2 The Main Assumption: Modularity
4.3 Truncated Factorization
4.3.1 Example Application and Revisiting “Association is Not Causation”
4.4 The Backdoor Adjustment
4.4.1 Relation to Potential Outcomes
4.5 Structural Causal Models (SCMs)
4.5.1 Structural Equations
4.5.2 Interventions
4.5.3 Collider Bias and Why to Not Condition on Descendants of Treatment
4.6 Example Applications of the Backdoor Adjustment
4.6.1 Association vs. Causation in a Toy Example
4.6.2 A Complete Example with Estimation
4.7 Assumptions Revisited
5 Randomized Experiments
5.1 Comparability and Covariate Balance
5.2 Exchangeability
5.3 No Backdoor Paths
6 Nonparametric Identification
6.1 Frontdoor Adjustment
6.2 do-calculus
6.2.1 Application: Frontdoor Adjustment
6.3 Determining Identifiability from the Graph
7 Estimation
7.1 Preliminaries
7.2 Conditional Outcome Modeling (COM)
7.3 Grouped Conditional Outcome Modeling (GCOM)
7.4 Increasing Data Efficiency
7.4.1 TARNet
7.4.2 X-Learner
7.5 Propensity Scores
7.6 Inverse Probability Weighting (IPW)
7.7 Doubly Robust Methods
7.8 Other Methods
7.9 Concluding Remarks
7.9.1 Confidence Intervals
7.9.2 Comparison to Randomized Experiments
8 Unobserved Confounding: Bounds and Sensitivity Analysis
8.1 Bounds
8.1.1 No-Assumptions Bound
8.1.2 Monotone Treatment Response
8.1.3 Monotone Treatment Selection
8.1.4 Optimal Treatment Selection
8.2 Sensitivity Analysis
8.2.1 Sensitivity Basics in Linear Setting
8.2.2 More General Settings
9 Instrumental Variables
9.1 What is an Instrument?
9.2 No Nonparametric Identification of the ATE
9.3 Warm-Up: Binary Linear Setting
9.4 Continuous Linear Setting
9.5 Nonparametric Identification of Local ATE
9.5.1 New Potential Notation with Instruments
9.5.2 Principal Stratification
9.5.3 Local ATE
9.6 More General Settings for ATE Identification
10 Difference in Differences
10.1 Preliminaries
10.2 Introducing Time
10.3 Identification
10.3.1 Assumptions
10.3.2 Main Result and Proof
10.4 Major Problems
11 Causal Discovery from Observational Data
11.1 Independence-Based Causal Discovery
11.1.1 Assumptions and Theorem
11.1.2 The PC Algorithm
11.1.3 Can We Get Any Better Identification?
11.2 Semi-Parametric Causal Discovery
11.2.1 No Identifiability Without Parametric Assumptions
11.2.2 Linear Non-Gaussian Noise
11.2.3 Nonlinear Models
11.3 Further Resources
12 Causal Discovery from Interventional Data
12.1 Structural Interventions
12.1.1 Single-Node Interventions
12.1.2 Multi-Node Interventions
12.2 Parametric Interventions
12.2.1 Coming Soon
12.3 Interventional Markov Equivalence
12.3.1 Coming Soon
12.4 Miscellaneous Other Settings
12.4.1 Coming Soon
13 Transfer Learning and Transportability
13.1 Causal Insights for Transfer Learning
13.1.1 Coming Soon
13.2 Transportability of Causal Effects Across Populations
13.2.1 Coming Soon
14 Counterfactuals and Mediation
14.1 Counterfactuals Basics
14.1.1 Coming Soon
14.2 Important Application: Mediation
14.2.1 Coming Soon
Appendix
A Proofs
A.1 Proof of Equation 6.1 from Section 6.1
A.2 Proof of Propensity Score Theorem (7.1)
A.3 Proof of IPW Estimand (7.18)
Bibliography
Alphabetical Index
Last Modified: 2021-11-21 14:48