Publications
2025
- [JGR:MLC] Space-Time Causal Discovery in Earth System Science: A Local Stencil Learning Approach. J. Jake Nichol, Michael Weylandt, G. Matthew Fricke, and 3 more authors. Journal of Geophysical Research: Machine Learning and Computation, 2025
Causal discovery tools enable scientists to infer meaningful relationships from observational data, spurring advances in fields as diverse as biology, economics, and climate science. Despite these successes, applying causal discovery to space-time systems remains immensely challenging because of the high dimensionality of the data. For example, in climate science, modern observational temperature records from the past few decades regularly cover thousands of locations around the globe. To address these challenges, we introduce Causal Space-Time Stencil Learning (CaStLe), a novel meta-algorithm for discovering causal structures in complex space-time systems. CaStLe leverages regularities in local space-time dependencies to learn governing global dynamics. This local perspective eliminates spurious confounding and drastically reduces sample complexity, making space-time causal discovery practical and effective. For causal discovery, CaStLe flexibly accepts any appropriately adapted time series causal discovery algorithm to recover local causal structures. These advances enable causal discovery of geophysical phenomena that were previously unapproachable, including non-periodic, transient phenomena such as volcanic eruption plumes. Regularities in local space-time dependencies are transformed into informative spatial replicates, which actually improve CaStLe’s performance when it is applied to ever-larger spatial grids. We successfully apply CaStLe to discover the atmospheric dynamics governing the climate response to the 1991 Mount Pinatubo volcanic eruption. We provide validation experiments demonstrating the effectiveness of CaStLe over existing causal discovery frameworks on a range of geophysics-inspired benchmarks, while identifying the method’s limitations and domains where its assumptions may not hold.
We introduce a new method for learning the dynamics of causal systems, that is, the physical rules that define a system’s behavior. Although this task, causal discovery, is not new, existing tools are ill-suited for many large geophysical data sets. Current state-of-the-art approaches either use statistical techniques to search for causal relationships among all aspects of a system, examining billions of possible causal effects, or simplify the data by focusing on the most important variables. Instead of an exhaustive search or oversimplifying the data, we incorporate basic physical principles (requiring effects to be “local” and “uniform”) to massively simplify the causal discovery problem. We demonstrate that our approach can recover known geophysical dynamics by applying it to the 1991 Mt. Pinatubo eruption, validating its ability to uncover space-time causal structure from observational data.
We introduce Causal Space-Time Stencil Learning (CaStLe) for learning the local causal dynamical structure underlying space-time data. CaStLe enables previously infeasible analyses of grid-cell-level Earth system data, significantly outperforming traditional methods. We demonstrate this new capability by recovering the space-time evolution of atmospheric aerosol flow in the weeks after a volcanic eruption.
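The following minimal sketch shows how "local" and "uniform" dependencies turn one large grid into many spatial replicates. It rests on several assumptions. A hypothetical 2D linear process drives each interior cell from its 3×3 neighborhood at the previous time step, and every cell uses the same coefficients. Ordinary least squares stands in for the time series causal discovery algorithm that CaStLe would actually plug in. The function names (`simulate_grid`, `stencil_replicates`) and the coefficient values are illustrative only and do not come from the CaStLe code.

```python
import numpy as np

def simulate_grid(stencil, n_rows, n_cols, n_steps, noise=1.0, seed=0):
    """Simulate a hypothetical 2D space-time process (illustrative assumption):
    each interior cell at time t is the stencil-weighted sum of its 3x3
    neighborhood at t-1 plus Gaussian noise. Border cells stay at zero after
    t=0, so every interior cell obeys the same (uniform) local rule."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, n_rows, n_cols))
    x[0] = rng.normal(size=(n_rows, n_cols))
    inner = (n_rows - 2, n_cols - 2)
    for t in range(1, n_steps):
        prev = x[t - 1]
        acc = np.zeros(inner)
        # Shifted slices: stencil[di, dj] weights the neighbor at offset (di-1, dj-1).
        for di in range(3):
            for dj in range(3):
                acc += stencil[di, dj] * prev[di:di + inner[0], dj:dj + inner[1]]
        x[t, 1:-1, 1:-1] = acc + noise * rng.normal(size=inner)
    return x

def stencil_replicates(x):
    """Pool every interior cell's 3x3 neighborhood at t-1 (9 features) with the
    cell's own value at t (target); each (cell, time) pair is one replicate."""
    n_steps, n_rows, n_cols = x.shape
    feats, targets = [], []
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            feats.append(x[:-1, i - 1:i + 2, j - 1:j + 2].reshape(n_steps - 1, 9))
            targets.append(x[1:, i, j])
    return np.concatenate(feats), np.concatenate(targets)

# Hypothetical ground truth: persistence at the center plus transport from the west.
true_stencil = np.array([[0.0, 0.0, 0.0],
                         [0.3, 0.5, 0.0],
                         [0.0, 0.0, 0.0]])

x = simulate_grid(true_stencil, n_rows=20, n_cols=20, n_steps=50)
X, y = stencil_replicates(x)
# Least squares is a stand-in for a time series causal discovery algorithm.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
est = coef.reshape(3, 3)
print(np.round(est, 2))
```

Every interior cell contributes rows to the same nine-coefficient problem. A larger grid therefore adds samples rather than unknowns, which is the intuition behind the claim that bigger grids help rather than hurt.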
2023
- [OSTI] Benchmarking the PCMCI Causal Discovery Algorithm for Spatiotemporal Systems. J. Jake Nichol, Michael Weylandt, Mark Smith, and 1 more author. 2023
Causal discovery algorithms construct hypothesized causal graphs that depict causal dependencies among variables in observational data. While powerful, the accuracy of these algorithms is highly sensitive to the underlying dynamics of the system in ways that have not been fully characterized in the literature. In this report, we benchmark the PCMCI causal discovery algorithm on gridded spatiotemporal systems. Efficiently computing grid-level causal graphs on large grids would enable analysis of the causal impacts of transient and mobile spatial phenomena in large systems, such as the Earth’s climate. We evaluate the performance of PCMCI with a set of structural causal models, using simulated spatial vector autoregressive processes in one and two dimensions. We develop computational and analytical tools for characterizing these processes and their associated causal graphs. Our findings suggest that direct application of PCMCI is not suitable for the analysis of dynamical spatiotemporal gridded systems, such as climatological data, without significant preprocessing and downscaling of the data. PCMCI requires unrealistic sample sizes to achieve acceptable performance on even modestly sized problems and suffers from a notable curse of dimensionality. This work suggests that, even under generous structural assumptions, significant additional algorithmic improvements are needed before causal discovery algorithms can be reliably applied to grid-level outputs of Earth system models.
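The curse of dimensionality described above can be illustrated with a minimal sketch. It builds a hypothetical one-dimensional spatial VAR(1) process in which each cell is driven only by itself and its two neighbors. It then compares the number of true lagged links with the number of lagged candidate links that a grid-level search must consider. The nearest-neighbor structure, the coefficient values, and `tau_max` are illustrative assumptions, not the report's exact benchmark configuration.

```python
import numpy as np

def local_transition_matrix(n_cells, self_coef=0.4, neighbor_coef=0.25):
    """Hypothetical N x N VAR(1) coefficient matrix: each cell depends on itself
    and its immediate left/right neighbors (no wrap-around at the ends).
    Row sums of |coefficients| are at most 0.9, so the process is stable."""
    a = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        a[i, i] = self_coef
        if i > 0:
            a[i, i - 1] = neighbor_coef
        if i < n_cells - 1:
            a[i, i + 1] = neighbor_coef
    return a

def simulate_var1(a, n_steps, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with standard normal noise e_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, a.shape[0]))
    for t in range(1, n_steps):
        x[t] = a @ x[t - 1] + rng.normal(size=a.shape[0])
    return x

x = simulate_var1(local_transition_matrix(50), n_steps=200)

tau_max = 1  # maximum lag considered (assumed)
for n_cells in (10, 100, 1000):
    a = local_transition_matrix(n_cells)
    true_links = np.count_nonzero(a)                # 3N - 2 for this stencil
    candidate_links = n_cells * n_cells * tau_max   # every lagged pair (j, t-1) -> (i, t)
    print(n_cells, true_links, candidate_links)
```

The number of true links grows linearly with the grid, while the candidate set grows quadratically. This is consistent with the report's finding that required sample sizes quickly become unrealistic.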
2021
- [JCAM] Machine learning feature analysis illuminates disparity between E3SM climate models and observed climate change. J. Jake Nichol, Matthew G. Peterson, Kara J. Peterson, and 2 more authors. Journal of Computational and Applied Mathematics, Oct 2021
In September of 2020, Arctic sea ice extent was the second-lowest on record. State-of-the-art climate prediction uses Earth system models (ESMs), driven by systems of differential equations representing the laws of physics. Historically, these models have tended to underestimate Arctic sea ice loss. The issue is grave because accurate modeling is critical for economic, ecological, and geopolitical planning. We use machine learning techniques, including random forest regression and Gini importance, to show that the Energy Exascale Earth System Model (E3SM) relies too heavily on just one of the ten chosen climatological quantities to predict September sea ice averages. Furthermore, E3SM gives too much importance to six of those quantities when compared to observed data. Identifying the features that climate models incorrectly rely on should allow climatologists to improve prediction accuracy.
- [OSTI] Causal Evaluations for Identifying Differences between Observations and Earth System Models. J. Jake Nichol, Matthew Peterson, and Kara Peterson. Oct 2021
- [ICML] Learning Why: Data-Driven Causal Evaluations of Climate Models. J. Jake Nichol, Matthew Peterson, G. Matthew Fricke, and 1 more author. ICML 2021 Workshop Tackling Climate Change with Machine Learning, Oct 2021
We plan to use nascent data-driven causal discovery methods to find and compare causal relationships in observed data and climate model output. We will look at ten different features of the Arctic climate, collected from public databases and from the Energy Exascale Earth System Model (E3SM). By identifying and comparing the resulting causal networks, we hope to find important differences between observed causal relationships and those in climate models. With these, climate modeling experts will be able to improve the coupling and parameterization of E3SM and other climate models.
2020
- [OSTI] Arctic Tipping Points Triggering Global Change (LDRD Final Report). Kara J. Peterson, Amy Jo Powell, Irina Kalashnikova Tezaur, and 7 more authors. Sep 2020
2018
- [arXiv] The Swarmathon: An Autonomous Swarm Robotics Competition. Sarah M Ackerman, G Matthew Fricke, Joshua P Hecker, and 7 more authors. Sep 2018
The Swarmathon is a swarm robotics programming challenge that engages college students from minority-serving institutions in NASA’s Journey to Mars. Teams compete by programming a group of robots to search for, pick up, and drop off resources in a collection zone. The Swarmathon produces prototypes for robot swarms that would collect resources on the surface of Mars. Robots operate completely autonomously with no global map, and each team’s algorithm must be sufficiently flexible to effectively find resources from a variety of unknown distributions. The Swarmathon includes Physical and Virtual Competitions. Physical competitors test their algorithms on robots they build at their schools; they then upload their code to run autonomously on identical robots during the three-day competition in an outdoor arena at Kennedy Space Center. Virtual competitors complete an identical challenge in simulation. Participants mentor local teams to compete in a separate High School Division. In the first 2 years, over 1,100 students participated. Sixty-three percent of students were from underrepresented ethnic and racial groups. Participants had significant gains in both interest and core robotic competencies that were equivalent across gender and racial groups, suggesting that the Swarmathon is effectively educating a diverse population of future roboticists.