STAMPS@CMU

STAtistical Methods for the Physical Sciences Research Center

Spring 2026

January 30, 2026

Viviana Acquaviva (City University of New York)

Title: Learning from simulations using statistics, ML, and AI

Abstract: My research focuses on the process of learning from simulations using a variety of numerical methods, from classic statistics to machine learning to generative AI tools. I will show a few examples from my astrophysics work on validating cosmological simulations and formulating hypotheses for the physical models that drive galaxy evolution processes. I will then move on to current research in climate science, where we are developing custom metrics to assess similarity in climate model outputs and using representation learning to improve the reconstruction of full spatiotemporal fields from sparse and biased ocean data. I will conclude with some lessons learned in applying ML/AI across disciplines, and some considerations and open questions on how AI is changing the way we do science.

Bio: Dr. Acquaviva is a Professor of Physics at the City University of New York. She received her master's degree in Theoretical Physics from the University of Pisa and her PhD in Astrophysics from the International School for Advanced Studies in Trieste, and held postdoctoral positions at Princeton University and Rutgers University before joining the faculty at CUNY. After many years of research in Astrophysics with statistical tools, machine learning, and AI, she pivoted to Climate Data Science thanks to a PIVOT Fellowship, followed by a PIVOT Research Award, from the Simons Foundation. Her current research is centered on developing new metrics to assess the performance of global climate models and on reconstructing full spatio-temporal fields, in particular ocean carbon, from limited and biased data. She is also working with early career scientists to reflect on how generative AI tools can be responsibly incorporated in the scientific workflow and on developing community tools around that topic. Her textbook “Machine Learning for Physics and Astronomy”, published in 2023 by Princeton University Press, won the 2024 Chambliss Astronomical Writing Award from the American Astronomical Society.

Fall 2025

October 24, 2025

Trevor Harris (University of Connecticut)

[Harris Talk Slides]

Abstract: Neural operator models are a recent innovation in operator modeling that mimic the structure of deep neural networks. They are increasingly used in spatiotemporal forecasting, inverse problems, data assimilation, and PDE-based surrogate modeling, yet they lack an intrinsic uncertainty mechanism. We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. On a variety of synthetic Gaussian process tasks and real applications (air quality monitoring, energy demand forecasting, and weather prediction), LSCI yields tighter sets with stronger adaptivity than conformal baselines. We also demonstrate empirical robustness against biased predictors and several out-of-distribution noise regimes.
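
For readers unfamiliar with conformal methods, the split-conformal recipe that LSCI builds on can be sketched in a few lines. This is a generic sup-norm version for function-valued predictions on synthetic data, not the locally adaptive LSCI procedure itself, and the predictor below is a hypothetical stand-in for a trained neural operator.

import numpy as np

# Split-conformal prediction bands for function-valued outputs (a simplified
# baseline, not LSCI): score each calibration curve by the sup-norm of its
# residual, take the finite-sample (1 - alpha) quantile, and inflate new
# predictions by that radius.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)

def predict(x):
    # hypothetical surrogate for a trained operator model: scalar input -> curve
    return np.sin(2 * np.pi * (t + x))

x_cal = rng.uniform(0, 1, size=200)
y_cal = np.stack([predict(x) for x in x_cal]) + 0.1 * rng.standard_normal((200, t.size))

scores = np.max(np.abs(y_cal - np.stack([predict(x) for x in x_cal])), axis=1)
alpha = 0.1
rank = int(np.ceil((len(scores) + 1) * (1 - alpha)))   # finite-sample-valid rank
radius = np.sort(scores)[rank - 1]

x_new = 0.3
band_lower = predict(x_new) - radius    # lower envelope of the prediction band
band_upper = predict(x_new) + radius    # upper envelope
print("conformal radius:", radius)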

Bio: Trevor Harris received his PhD in Statistics from the University of Illinois Urbana-Champaign (2021) and joined Texas A&M University as an assistant professor before moving to the University of Connecticut in 2024. His research is heavily motivated by statistical issues in climate science, including climate model validation and assimilation, prediction under distribution shift, robust uncertainty quantification, and out-of-distribution detection. Recent interests include neural operator models and functional data analysis, conformal inference for spatiotemporal data, and sample-efficient generative models.

November 7, 2025

Natalie Klein (Los Alamos National Laboratory)

[Klein Talk Slides]

Abstract: NASA’s Curiosity and Perseverance rovers have collected rich spectroscopic data from the Martian surface using instruments such as ChemCam and SuperCam. These multimodal datasets (spanning LIBS, infrared, and Raman measurements) pose unique challenges for calibration, interpretation, and data integration across vastly different environments. This talk will highlight statistical and machine learning methods developed to meet these challenges, including Bayesian neural networks for uncertainty-aware prediction, optimal transport for aligning Earth and Mars data, multimodal fusion with interpretability metrics, and density-ratio weighting for combining heterogeneous observations. I’ll also discuss generative models for LIBS spectra and ongoing work using fast simulators for model pretraining. Together, these advances illustrate how planetary science data drive new ideas in uncertainty quantification, domain adaptation, and the fusion of physical and statistical modeling.
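
As one concrete illustration of the density-ratio weighting mentioned above (a generic classifier-based estimator on synthetic arrays, not the speaker's actual pipeline): train a probabilistic classifier to separate the two domains and convert its output into importance weights for the source-domain samples.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Classifier-based density-ratio estimation: weights w(x) ~ p_target(x) / p_source(x)
# are recovered from a probabilistic classifier that separates the two samples.
rng = np.random.default_rng(1)

x_source = rng.normal(0.0, 1.0, size=(1000, 2))   # e.g. lab/Earth calibration features (synthetic)
x_target = rng.normal(0.5, 1.2, size=(1000, 2))   # e.g. Mars observation features (synthetic)

X = np.vstack([x_source, x_target])
y = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])

clf = LogisticRegression(max_iter=1000).fit(X, y)
p = clf.predict_proba(x_source)[:, 1]
weights = p / (1 - p)                        # density ratio up to the class-prior factor
weights *= len(x_source) / len(x_target)     # correct for sample sizes (no-op here)

# The weights can then down- or up-weight source-domain samples when fitting a calibration model.
print("effective sample size:", weights.sum() ** 2 / (weights ** 2).sum())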

Bio: Dr. Natalie Klein is the AI and Advanced Predictive Modeling Team Lead in the Statistics Group at Los Alamos National Laboratory, where she has been a staff member since 2019. Her research focuses on integrating statistical methodology with machine learning to address challenges in scientific domains such as remote sensing and planetary exploration. She holds a joint Ph.D. in Statistics and Machine Learning from Carnegie Mellon University.

December 5, 2025

Jonathan Lilly (Planetary Science Institute)

[Lilly Talk Slides] 


Title: Local polynomial fitting on the sphere, a mapping solution for the Earth sciences

Abstract: The problem of mapping scattered data is considered from the perspective of the earth sciences. A particularly promising method is local polynomial fitting, which involves fitting not only a field of interest, but also its derivatives up to some specified order, in the vicinity of each grid point. Among other desirable properties, this method has the virtues of simplicity and ease of application. Local polynomial fitting is adapted for use on the sphere by recasting it in terms of the coordinates of a local tangent plane. Three algorithmic choices lead to substantially improved maps. First, the use of a variable bandwidth, in which the smoothing radius is not constant but varies to incorporate a fixed number of data points, performs well with irregularly spaced data. Second, first- and second-order fits are shown to offer considerably improved performance compared with a zeroth-order fit due to a property known as design adaptivity. Third, a generalized kernel is introduced that subsumes existing forms in the literature and allows for a wider range of possibilities. With these considerations, the problem of mapping sea surface height from along-track satellite measurements—an important data analysis problem in oceanography—is considered. Applying the method to a numerical model, for which errors can be assessed directly, a sweep through parameter space is conducted to identify optimal parameters. The results are compared with the community standard product, and sources of remaining error are discussed.
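
A rough sketch of the core computation (not the speaker's implementation): at a single grid point, a first-order local polynomial fit is a kernel-weighted least-squares problem in local tangent-plane coordinates, with the bandwidth set by the distance to the K-th nearest observation.

import numpy as np

# Kernel-weighted local polynomial fit (order 1) at one grid point, with a
# variable bandwidth chosen to enclose a fixed number K of observations.
rng = np.random.default_rng(2)

obs_xy = rng.uniform(-1, 1, size=(500, 2))                        # tangent-plane coords of data (synthetic)
obs_z = np.sin(obs_xy[:, 0]) + 0.1 * rng.standard_normal(500)     # observed field values

grid_point = np.array([0.2, -0.1])
K = 50                                                            # number of points entering the fit

d = np.linalg.norm(obs_xy - grid_point, axis=1)
h = np.sort(d)[K - 1]                                             # variable bandwidth
w = np.clip(1 - (d / h) ** 2, 0, None) ** 2                       # compactly supported kernel weights

dx, dy = (obs_xy - grid_point).T
A = np.column_stack([np.ones_like(dx), dx, dy])                   # first-order (plane) design matrix
beta, *_ = np.linalg.lstsq(A * np.sqrt(w)[:, None], obs_z * np.sqrt(w), rcond=None)

field_estimate, deriv_x, deriv_y = beta                           # mapped value and its first derivatives
print(field_estimate, deriv_x, deriv_y)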

Figure: A continuous map of sea surface height generated by applying local polynomial fitting to satellite altimetric measurements taken along the black lines, with one satellite track shown in bold.  


Bio: Jonathan M. Lilly was born in Lansing, Michigan, in 1972. He received the B.S. degree in atmospheric and oceanic physics from Yale University, New Haven, Connecticut, in 1994, and the M.S. and Ph.D. degrees in physical oceanography from the University of Washington in Seattle, Washington, in 1997 and 2002, respectively. From 2003 to 2005 he was a postdoctoral researcher in the Laboratoire d'Océanographie Dynamique et de Climatologie, Université Pierre et Marie Curie, Paris, France. Since 2005 he has worked as a research scientist or senior research scientist at various institutes, including Earth and Space Research, NorthWest Research Associates, and Theiss Research. Since July 2021 he has been a Senior Scientist at the Planetary Science Institute in Tucson, Arizona, and since June 2024 he has been a visiting scientist in the Department of Physics at the University of Toronto in Toronto, Canada. Since September 2025 he has also been serving half-time as Lead Oceanographer for Oceanbox AS in Tromsø, Norway. His research interests include oceanic vortex structures, statistical methods for data analysis, Lagrangian observations, and high-latitude oceanography.

Spring 2025

February 21, 2025

Stefano Castruccio - University of Notre Dame


Location: Zoom 
Title: New Perspectives on Balancing Physics with Data-Driven Models: the Case for Physics Informed Neural Networks in Environmental Statistics


Abstract: The idea of performing data analysis by leveraging physical information within a data-driven model has a long history in environmental statistics. Physical-statistical models are predicated on the idea that hierarchical Bayesian models could have a spatio-temporal process informed by a partial differential equation (PDE) which expresses some well-known physical information about the system. The machine learning literature has recently focused on the same problem by proposing a different yet related solution: instead of devising a purely data-driven neural network, inference can be penalized by means of a PDE expressing the physics of the system. This approach allows for “soft” constraints on the model instead of the “hard” specification of the process dynamics used in physical-statistical models. In this talk, I will discuss two recent works on this topic developed by my research group, and discuss the relative merits of this new approach from the perspective of a statistician. The first work focuses on a deep double reservoir model informed by the two-dimensional incompressible Navier–Stokes equations, while the second one discusses the link between a PDE-driven penalty and physics-informed priors. I will also briefly discuss some of my recent work on physics-informed convolutional autoencoders and transformers with attention mechanisms.
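
The "soft constraint" idea can be illustrated in a deliberately simple linear-in-basis setting, where the PDE penalty is quadratic and the penalized fit has a closed form. This toy example uses a one-dimensional equation du/dx + u = 0 rather than the Navier–Stokes-informed reservoir model discussed in the talk.

import numpy as np

# Soft physics constraint in a toy setting: fit u(x) = Phi(x) @ w to noisy data
# while penalizing the finite-difference residual of du/dx + u = 0. The weight
# lam interpolates between purely data-driven (lam = 0) and nearly hard physics
# (lam very large).
rng = np.random.default_rng(3)

x = np.linspace(0, 2, 100)
y = np.exp(-x) + 0.05 * rng.standard_normal(x.size)        # data generated from the true solution

Phi = np.column_stack([x ** k for k in range(6)])           # polynomial basis
D = (Phi[1:] - Phi[:-1]) / (x[1] - x[0])                    # finite-difference derivative of the basis
P = D + Phi[:-1]                                            # rows of the PDE residual operator

lam = 10.0
w = np.linalg.solve(Phi.T @ Phi + lam * P.T @ P, Phi.T @ y) # ridge-like closed-form solution
u_hat = Phi @ w

print("RMSE against the true solution:", np.sqrt(np.mean((u_hat - np.exp(-x)) ** 2)))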

Bio: Stefano Castruccio is the Notre Dame Collegiate Associate Professor in Statistics at the University of Notre Dame. He obtained his PhD in 2013 at the University of Chicago, and he was later a postdoctoral fellow at King Abdullah University of Science and Technology (Saudi Arabia) and a Lecturer at Newcastle University (UK) before moving to his current institution. He works on spatio-temporal models for complex environmental problems, from air pollution to climate change.

April 25, 2025 

Joel Leja - The Pennsylvania State University


Location: Steinberg Auditorium (BH A53) + Zoom
Title: Rapid inference of galaxy properties in the age of deep and large-scale surveys of the universe


Abstract: The inference of the physical properties of galaxies at cosmological distances requires modeling a wide range of physics, including stellar evolution and atmospheres; dust attenuation and re-emission; nebular physics; and AGN emission. Bayesian inference is often used to map the inevitable degeneracies, and the large amount of physics and wide parameter space mean these codes are typically not fast (~1-10 hours/object). Yet current and near-future surveys of the universe will yield spectra for millions of galaxies and imaging for billions. I will discuss the tactics employed to speed up these codes, ranging from neural net emulators of key physics (photoionization modeling; stellar spectra) to efficient gradient-enhanced GPU-accelerated high-dimensional sampling to rapid simulation-based inference. These yield speed-ups of somewhere between 100x and 100,000x, with unavoidable trade-offs in flexibility and accuracy. I will discuss applications of these techniques to model modern astronomical data, including both industrial-scale modeling of galaxy observations and newly possible directions such as spatially resolved galaxy modeling. Finally, time permitting, I will discuss some of the exciting new discoveries made with these techniques in the very distant universe seen by JWST.

Bio: Joel Leja is the Dr. Keiko Miwa Ross Early Career Endowed Faculty Chair and an Assistant Professor of astronomy and astrophysics at Penn State University. His research aims to understand how galaxies form using large ground- and space-based telescopes, large surveys, and fast computers. He specializes in modeling observations of distant galaxies and in data-intensive astrophysical methodologies. Joel was named a Clarivate Highly Cited Researcher in 2023 and in 2024 (top 1% of cited researchers in astrophysics), and was awarded Yale University's Brouwer Prize in 2019 for a PhD thesis of unusual merit.

May 16, 2025

Brian Nord - Fermilab


Location: Zoom 
Title: Simulation-based Inference and the Design and Operation of Science Experiments


Abstract: Simulation-based inference (SBI) is a highly efficient approach for inferring expressive density distributions in many areas of research – economics, climate, population genetics, physics, astronomy. Moreover, SBI has potential application not only in measuring properties of nature in those areas, but also in the design of experiments – including instruments and data acquisition. However, some key challenges remain before SBI will be ready for these tasks – e.g., domain adaptation, trustworthy/credible uncertainty quantification, and high-dimensional parameter space sampling. In this talk, I will discuss some work by my group and the rest of the community in working toward these goals.

Bio: Brian Nord’s work focuses on how to improve the ways in which we make scientific discoveries --- developing algorithms, building statistical models, and auto-designing experiments. Brian started his career in large-scale structure cosmology, analyzing galaxy clusters and strong gravitational lenses. More recently, he has been exploring the potential of AI algorithms to address critical challenges in cosmological data analysis. Currently, he is integrating AI with rigorous statistical methods and using this to aid in the design of scientific experiments.

Fall 2024

October 11, 2024 

Gwendolyn Eadie (University of Toronto)



Title: Studying the Universe with Astrostatistics

Abstract: Astrostatistics is a growing interdisciplinary field at the interface of astronomy and statistics. Astronomy is a field rich with publicly available data, but inference using these data must acknowledge selection effects, measurement uncertainty, censoring, and missingness. In the Astrostatistics Research Team (ART) at the University of Toronto --- a joint team between the David A. Dunlap Department of Astronomy & Astrophysics and the Department of Statistical Sciences --- we take an interdisciplinary approach to analysing astronomical data from a range of objects such as stars, old clusters, and galaxies. In this talk, I will cover three ART projects that employ Bayesian inference techniques to: (1) find stellar flares in time series data from stars using hidden Markov models, (2) investigate the relationship between old star cluster populations and their host galaxies using hurdle models, and (3) discover potential "dark" galaxies within an inhomogeneous Poisson Process framework.
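
As a minimal illustration of the hurdle models mentioned in project (2) (maximum likelihood on synthetic counts, not the team's hierarchical Bayesian formulation): one component models whether any counts occur at all, and a zero-truncated count model handles the positive part.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Hurdle model sketch: P(N = 0) = 1 - p;  P(N = n | N > 0) follows a zero-truncated Poisson(mu).
rng = np.random.default_rng(4)

# synthetic cluster counts per galaxy: 40% structural zeros, the rest positive counts
n = np.where(rng.random(500) < 0.4, 0, rng.poisson(5.0, size=500).clip(min=1))

p_hat = np.mean(n > 0)                          # hurdle probability (MLE)
pos = n[n > 0]

def neg_loglik_trunc_poisson(mu):
    # zero-truncated Poisson negative log-likelihood for the positive counts
    return -np.sum(pos * np.log(mu) - mu - gammaln(pos + 1) - np.log1p(-np.exp(-mu)))

mu_hat = minimize_scalar(neg_loglik_trunc_poisson, bounds=(0.1, 50), method="bounded").x
print("P(any clusters) =", round(p_hat, 3), " rate given > 0 ~", round(mu_hat, 3))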

Bio: Gwendolyn Eadie is an Assistant Professor of Astrostatistics at the University of Toronto, jointly appointed between the Department of Statistical Sciences and the David A. Dunlap Department of Astronomy & Astrophysics. She is the founder and co-leader of UofT's Astrostatistics Research Team, and works on a range of projects that use hierarchical Bayesian inference to study galaxies, globular star clusters, stars, and fast radio bursts. She is also the current Chair of the Astrostatistics Interest Group of the American Statistical Association and the Chair of the Working Group on Astroinformatics & Astrostatistics of the American Astronomical Society.

November 15, 2024 

STAMPS Webinar: Brendan Byrne (Qube Technologies)


Title: Quantifying greenhouse gas emissions through atmospheric inversion systems

Location: Zoom

Abstract: Quantifying greenhouse gas sources and sinks from the atmosphere is essential for assessing the success of emission reduction efforts. Observations of atmospheric greenhouse gases have become major tools for tracking these fluxes. Because atmospheric transport links emissions to downwind concentration changes, this presents an inverse problem. In this presentation, I’ll introduce the general methodology for quantifying emissions, highlighting key successes and ongoing challenges. I’ll also provide an overview of diverse applications, from tracking emissions at individual facilities to monitoring trends on a global scale.
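
In its simplest linear-Gaussian form, a flux inversion of this kind combines a prior flux estimate with concentration observations through a transport operator. The sketch below is a stylized toy version of what operational systems do at much larger scale, with all matrices synthetic.

import numpy as np

# Linear-Gaussian flux inversion: y = H x + noise, with a Gaussian prior on x.
# Posterior mean: x_hat = x_prior + B H^T (H B H^T + R)^-1 (y - H x_prior).
rng = np.random.default_rng(5)

n_flux, n_obs = 20, 60
H = np.abs(rng.standard_normal((n_obs, n_flux)))   # toy transport / footprint operator
x_true = rng.uniform(0, 2, n_flux)                 # true surface fluxes
R = 0.05 * np.eye(n_obs)                           # observation-error covariance
B = 1.0 * np.eye(n_flux)                           # prior flux-error covariance

y = H @ x_true + rng.multivariate_normal(np.zeros(n_obs), R)
x_prior = np.ones(n_flux)

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)       # gain matrix
x_hat = x_prior + K @ (y - H @ x_prior)
A_post = B - K @ H @ B                             # posterior flux-error covariance

print("prior RMSE:", np.sqrt(np.mean((x_prior - x_true) ** 2)))
print("posterior RMSE:", np.sqrt(np.mean((x_hat - x_true) ** 2)))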

Bio: Brendan Byrne is a Senior Data Scientist at Qube Technologies, where he works on continuous monitoring of methane emissions. Previously, he was a Scientist at the NASA Jet Propulsion Laboratory (JPL), where he led research on the global carbon cycle and its implications for climate change. He is an expert on flux inversion analyses, in which surface-atmosphere trace gas fluxes are estimated from atmospheric concentration data, and holds a Ph.D. from the University of Toronto and both M.Sc. and B.Sc. degrees from the University of Victoria.

December 6, 2024 


David Shih (Rutgers University)


Location: Wean 5409 + Zoom at 1:15 p.m.
Title: Searching for the Unexpected from Colliders to Stars with Modern Machine Learning


Abstract: Modern machine learning and generative AI are having an exciting impact on fundamental physics, allowing us to see deeper into the data and enabling new kinds of analyses that were not possible before. I will describe how we are using generative AI to develop powerful new model-agnostic methods for new physics searches at the Large Hadron Collider, and how these methods can also be applied to data from the Gaia Space Telescope to search for stellar streams. I will also describe how these same generative AI techniques can be used to perform a novel measurement of the local dark matter density using stars from Gaia as tracers of the Galactic potential.

Bio: David Shih is a Professor in the New High Energy Theory Center and the Department of Physics & Astronomy at Rutgers University. His current research focuses on developing new machine learning methods to tackle the major open questions in fundamental physics -- such as the nature of dark matter and new particles and forces beyond the Standard Model -- using big datasets from particle colliders and astronomy. His work has touched on many key topics at the intersection of ML and fundamental physics, including generative models, anomaly detection, AI fairness, feature selection, and interpretability. Shih is the recipient of a DOE Early Career Award, a Sloan Foundation Fellowship, the Macronix Prize, and the Humboldt Bessel Research Award.

Spring 2024

November 15, 2024

Ashley Villar (Harvard)


Title: Time-domain Astrophysics in the Era of Big Data


Abstract: The eruptions, collisions and explosions of stars drive the universe’s chemical and dynamical evolution. The upcoming Legacy Survey of Space and Time will drastically increase the discovery rate of these transient phenomena, bringing time-domain astrophysics into the realm of “big data.” With this transition comes the important question: how do we classify transient events and separate the interesting “needles” from the “haystack” of objects? In this talk, I will discuss efforts to discover and classify unexpected phenomena using semi-supervised machine learning techniques. I will highlight the interplay between data-informed physics and physics-informed machine learning required to best understand the future LSST dataset of extragalactic transients.



Bio: Ashley Villar is an assistant professor of Astronomy at Harvard University. Her research focuses on data-driven analysis of optical transients, including core-collapse supernovae and kilonovae. She is particularly interested in representation learning for sparse, multivariate light curves. Ashley is the co-Chair of the LSST Informatics and Statistics Science Collaboration.

March 22, 2024

Matthias Katzfuss (University of Wisconsin–Madison)



Title: Non-Gaussian Emulation of Climate Models via Scalable Bayesian Transport Maps

Location: Zoom

Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and accounting for uncertainty in the map estimation, while resulting in a closed-form invertible posterior map. We then focus on inferring the distribution of a spatial field from a small number of replicates. We develop specific transport-map priors that are highly flexible but shrink toward a Gaussian field with Matern-type covariance. The approach is scalable to high-dimensional fields due to data-dependent sparsity and parallel computations. We present numerical results to demonstrate the accuracy, scalability, and usefulness of our generative methods, including emulation of non-Gaussian climate-model output.
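
A drastically simplified version of the triangular-map construction (linear, Gaussian conditionals fitted by regression, rather than the Gaussian-process priors of the talk) conveys the basic structure:

import numpy as np

# Linear triangular (Knothe-Rosenblatt-style) transport map: component j maps
# x_j to an approximately standard-normal variable after conditioning on x_1..x_{j-1}.
rng = np.random.default_rng(6)

# synthetic "replicates of a spatial field" at d locations
d, n = 5, 400
L = np.tril(rng.standard_normal((d, d))) + 2 * np.eye(d)
X = rng.standard_normal((n, d)) @ L.T                 # correlated training replicates

coefs, sigmas = [], []
for j in range(d):
    A = np.column_stack([np.ones(n), X[:, :j]])       # regress x_j on the preceding components
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    coefs.append(beta)
    sigmas.append(resid.std())

def forward(x):
    # map one sample to the reference (approximately N(0, I)) space
    z = np.empty(d)
    for j in range(d):
        a = np.concatenate([[1.0], x[:j]])
        z[j] = (x[j] - a @ coefs[j]) / sigmas[j]
    return z

print(forward(X[0]))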

Bio: Matthias Katzfuss is a Professor in the Department of Statistics at University of Wisconsin–Madison. His research interests include computational spatial and spatio-temporal statistics, Gaussian processes, uncertainty quantification, and data assimilation, with applications to environmental and satellite remote-sensing data. His research has been funded by NSF, NASA, NOAA, USDA, Sandia National Laboratory, Jet Propulsion Laboratory, and Texas A&M Institute of Data Science. Matthias is the recipient of an NSF Career Award, a Fulbright Scholarship, and an Early Investigator Award from the American Statistical Association’s Section on Statistics and the Environment.

April 5, 2024

Wouter Verkerke (University of Amsterdam/Nikhef)



Title: Uncertainty modeling in particle physics

 

Abstract: Wouter will present a pedagogical introduction to uncertainty modeling in particle physics. He will mostly focus on the methods used at the Large Hadron Collider experiments, where systematic effects are explicitly parameterized in the likelihood function in terms of nuisance parameters. Accurate modeling of systematic effects is of increasing importance at the LHC, as the abundant data have decreased statistical uncertainties in many measurements to the point where they are on par with systematic uncertainties. He will discuss the reasoning behind the modeling approaches commonly chosen, as well as common challenges in the parametric modeling and in the interpretation of the corresponding uncertainties. He will conclude with the special considerations involved in the modeling of theoretical uncertainties, which are often incompletely defined.
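
A minimal counting-experiment sketch of this parameterization (a toy single-bin example, not an LHC likelihood): a systematic effect enters through a nuisance parameter with a unit-Gaussian constraint and is profiled out when scanning the parameter of interest.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson, norm

# Single-bin counting experiment: n ~ Poisson(mu*s + b*(1 + 0.1*theta)), where
# theta is a constrained nuisance parameter encoding a 10% background systematic.
s, b = 10.0, 50.0
n_obs = 62

def nll(params):
    mu, theta = params
    lam = mu * s + b * (1.0 + 0.1 * theta)
    if lam <= 0:
        return np.inf
    return -(poisson.logpmf(n_obs, lam) + norm.logpdf(theta))   # Poisson term + Gaussian constraint

def profiled_nll(mu):
    # minimize over the nuisance parameter at fixed mu
    return minimize(lambda t: nll([mu, t[0]]), x0=[0.0]).fun

mu_grid = np.linspace(0, 3, 61)
scan = np.array([profiled_nll(m) for m in mu_grid])
q = 2 * (scan - scan.min())                    # profile-likelihood test statistic
print("best-fit mu:", mu_grid[scan.argmin()])
print("approx. 68% interval:", mu_grid[q < 1.0][[0, -1]])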

Bio: Wouter Verkerke is a professor in the physics department of the University of Amsterdam and is head of the ATLAS group at Nikhef, the Dutch Institute for particle physics. His research interests include Higgs boson physics and top quark physics at the Large Hadron Collider. Wouter has been a member of the ATLAS experiment at CERN for 20 years and has coordinated several of its analysis groups, including the top quark physics group, the Higgs combination modelling group, and the statistics forum. He is the author of one of the most popular statistical modelling tools in particle physics, RooFit, which has been part of the ROOT data analysis package since 2003.

April 19, 2024

Mark Risser (LBNL)



Title: Detecting multiple anthropogenic forcing agents for attribution of regional precipitation change

Abstract: Daily rainfall accumulations are a critical component of the global water cycle, and comprehensive understanding of human-induced changes in rainfall is essential for water resource management and infrastructure design. Detection and attribution methods reveal cause and effect relationships between anthropogenic forcings and changes in daily precipitation by comparing observed changes with those from climate models. However, at sub-continental scales, existing studies are rarely able to conclusively identify human influence on precipitation. In this work, we show that anthropogenic aerosol and greenhouse gas emissions are the primary drivers of precipitation change over the United States and, by simultaneously accounting for both agents, we explicitly decompose the uncertain regional human influence into the individual effects of these agents. Greenhouse gas (GHG) emissions increase mean and extreme precipitation in rain gauge measurements across all seasons, while the decadal-scale effect of global aerosol emissions decreases precipitation. Local aerosol emissions further offset GHG increases in the winter and spring but enhance rainfall during the summer and fall. Our results show that conflicting literature on trends in precipitation over the historical record can be explained by offsetting aerosol and greenhouse gas signals.

Bio: Mark is a Research Scientist in the Climate and Ecosystem Sciences Division at Lawrence Berkeley National Laboratory. He received his Ph.D. in Statistics from the Ohio State University in 2015 (thesis advisor: Catherine Calder). Mark’s primary goal as a statistician is to use data science, Bayesian modeling, and computational tools to identify and quantify climate change. His research focuses on statistical climatology, extreme value analysis, Gaussian processes, and Bayesian modeling.

Fall 2023

October 13, 2023

Laurence Perreault-Levasseur (Université de Montréal / Mila)


Title: Data-Driven Strong Gravitational Lensing Analysis in the Era of Large Sky Surveys


Abstract: Despite the remarkable success of the standard model of cosmology, the lambda CDM model, at predicting the observed structure of the universe over many scales, very little is known about the fundamental nature of its principal constituents: dark matter and dark energy. In the coming years, new surveys and telescopes will provide an opportunity to probe these unknown components. Strong gravitational lensing is emerging as one of the most promising probes of the nature of dark matter, as it can, in principle, measure its clustering properties on sub-galactic scales. The unprecedented volumes of data that will be produced by upcoming surveys like LSST, however, will render traditional analysis methods entirely impractical. In recent years, machine learning has been transforming many aspects of the computational methods we use in astrophysics and cosmology. I will share our recent work in developing machine learning tools for the analysis of strongly lensed systems.



Bio: Laurence Perreault-Levasseur is the Canada Research Chair in Computational Cosmology and in Artificial Intelligence. She is an assistant professor at the University of Montréal and an Associate Member of Mila, where she conducts research in the development and application of machine learning methods to cosmology. She is also a Visiting Scholar at the Flatiron Institute in New York City. Prior to that, she was a Flatiron research fellow at the Center for Computational Astrophysics in the Flatiron Institute and a KIPAC postdoctoral fellow at Stanford University. Laurence completed her PhD degree at the University of Cambridge, where she worked on applications of open effective field theory methods to the formalism of inflation.

October 27, 2023

Michael Wehner (Lawrence Berkeley National Laboratory)



Title: Extreme Weather Impact Attribution, Environmental Justice and Loss & Damages

Abstract: Reporting the effect of global warming on certain classes of individual extreme weather events has become relatively routine. Research is now turning to quantifying the resulting effects on the impacts of these extreme weather events. I will use Hurricane Harvey and its exceptional flooding of the greater Houston area in 2017 to demonstrate an “end-to-end” attribution. The human-induced warming of the Gulf of Mexico increased the storm’s precipitation, causing an increase in the flooded area that in turn caused an increase in the number of homes flooded. A disproportionate number of these flooded homes were in low-income Hispanic neighborhoods. This example will then be used to motivate how climate scientists can inform the recently approved UNFCCC Loss and Damage fund established to aid nations that are “particularly vulnerable” to the impacts of climate change.

Bio: Michael F. Wehner is a senior staff scientist in the Applied Mathematics and Computational Research Division at the Lawrence Berkeley National Laboratory. Dr. Wehner’s current research concerns the behavior of extreme weather events in a changing climate, especially heat waves, intense precipitation, drought and tropical cyclones. Before joining the Berkeley Lab in 2002, Wehner was an analyst at the Lawrence Livermore National Laboratory in the Program for Climate Modeling Diagnosis and Intercomparison.

He is the author or co-author of over 230 scientific papers and reports. He was a lead author for both the 2013 Fifth and 2021 Sixth Assessment Reports of the Intergovernmental Panel on Climate Change and for the 2nd, 3rd, 4th, and upcoming 5th US National Assessments on climate change. Dr. Wehner earned his master’s degree and Ph.D. in nuclear engineering from the University of Wisconsin-Madison, and his bachelor’s degree in Physics from the University of Delaware.

November 10, 2023

Michael Kagan (SLAC National Accelerator Laboratory) 



Title: Using gradients to get more out of High Energy Physics

 

Abstract: High Energy Physics experiments, like those at the Large Hadron Collider at CERN, have developed intricate data analysis pipelines to search for rare hints of new particles and forces. With the goal of maximizing our sensitivity to signs of new physics, how can we optimize our data analysis pipelines, which rely on a mixture of physics driven computations and data-driven ML models, and optimize future experiments to get the most of out the data? This talk will discuss progress towards building differentiable data analysis and simulation components that are amenable to gradient-based optimization and challenges that arise in gradient estimation in these settings.
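
One small ingredient of such differentiable pipelines is replacing a hard selection cut with a smooth sigmoid weight, so that gradients of a downstream figure of merit flow back to the cut threshold. The PyTorch sketch below uses synthetic events and an assumed, simplified figure of merit, not an actual ATLAS workflow.

import torch

# Relax a hard cut "x > threshold" into sigmoid weights so that an approximate
# significance s / sqrt(b) is differentiable in the threshold and can be tuned
# by gradient descent.
torch.manual_seed(0)

sig = torch.normal(1.5, 1.0, size=(5000,))      # discriminant values for signal events (synthetic)
bkg = torch.normal(0.0, 1.0, size=(20000,))     # discriminant values for background events (synthetic)

threshold = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([threshold], lr=0.05)

for step in range(200):
    opt.zero_grad()
    w_sig = torch.sigmoid((sig - threshold) / 0.1)   # soft selection weights
    w_bkg = torch.sigmoid((bkg - threshold) / 0.1)
    s, b = w_sig.sum(), w_bkg.sum()
    loss = -s / torch.sqrt(b + 1e-6)                 # maximize the approximate significance
    loss.backward()
    opt.step()

print("optimized threshold:", threshold.item())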

Bio: Michael Kagan is a Lead Staff Scientist at SLAC National Accelerator Laboratory. He received his Ph.D. in physics from Harvard University, and his B.S. in Physics and Mathematics from the University of Michigan. After his postdoctoral work at SLAC, Michael was a Panofsky Fellow there from 2016 to 2021. Michael’s work focuses on the study of the Higgs Boson and the search for new physics at the ATLAS experiment at the LHC, and on the development of Machine Learning for fundamental physics.

December 8, 2023

Haruko Wainwright (MIT)



Title: Physics-infused Environmental Monitoring for Soil and Groundwater Contamination

Abstract: Environmental monitoring – which has traditionally relied on collecting point samples – is undergoing transformational changes with new technologies such as remote sensing, in situ sensors, and various imaging techniques at different scales. At the same time, environmental simulation capabilities are advancing rapidly, predicting environmental flow and contaminant transport in complex systems and quantifying the associated uncertainty. However, there are still significant challenges in integrating these multi-type, multiscale datasets with model simulations. In particular, these datasets are often indirectly correlated with the variables of interest, and have different scales and accuracies. Simulation results are often not perfect due to natural heterogeneities or fine-scale processes not captured in conceptual models.

The Advanced Long-term Environmental Monitoring Systems (ALTEMIS) project aims to establish a new paradigm of long-term monitoring of soil and groundwater contamination by integrating these new technologies through machine learning (ML). This talk highlights two new developments involving groundwater flow and contaminant transport simulations. First, I will talk about an emulator based on the Fourier neural operator, considering the uncertainty in subsurface parameters and climate forcing. This emulator aims to enable the off-line assessment of future climate change impacts on residual contaminants. Second, I will introduce a Bayesian hierarchical approach coupled with Gaussian process models to integrate in situ sensor data, groundwater sampling data, and ensemble simulations. It enables us to infuse physics – such as flow direction and contaminant mobility – into the spatiotemporal characterization of contaminant plumes. Lastly, I will discuss the pathway to actual deployment with the understanding of environmental regulations and site needs, as well as the citizen science efforts to improve environmental literacy in impacted regions and beyond.

Bio: Haruko Wainwright is the Mitsui Career Development Professor in Contemporary Technology and Assistant Professor in the Department of Nuclear Science and Engineering and the Department of Civil and Environmental Engineering at the Massachusetts Institute of Technology. She received her MS in nuclear engineering in 2006, MA in statistics in 2010, and PhD in nuclear engineering in 2010 from the University of California, Berkeley. Before joining MIT, she was a Staff Scientist in the Earth and Environmental Sciences Area at Lawrence Berkeley National Laboratory.

Her research focuses on environmental modeling and monitoring technologies, with a particular focus on nuclear waste and nuclear contamination.

Spring 2023

January 27, 2023

Bobby Gramacy (Department of Statistics at Virginia Tech)


Title: Deep Gaussian Process Surrogates for Computer Experiments


Abstract: Deep Gaussian processes (DGPs) upgrade ordinary GPs through functional composition, in which intermediate GP layers warp the original inputs, providing flexibility to model non-stationary dynamics. Recent applications in machine learning favor approximate, optimization-based inference for fast predictions, but applications to computer surrogate modeling – with an eye towards downstream tasks like calibration, Bayesian optimization, and input sensitivity analysis – demand broader uncertainty quantification (UQ). We prioritize UQ through full posterior integration in a Bayesian scheme, hinging on elliptical slice sampling the latent layers. We demonstrate how our DGP’s non-stationary flexibility, combined with appropriate UQ, allows for active learning: a virtuous cycle of data acquisition and model updating that departs from traditional space-filling design and yields more accurate surrogates for fixed simulation effort. But not all simulation campaigns can be developed sequentially, and many existing computer experiments are simply too big for full DGP posterior integration because of cubic scaling bottlenecks. For this case we introduce the Vecchia approximation, popular for ordinary GPs in spatial data settings. We show that Vecchia-induced sparsity of Cholesky factors allows for linear computational scaling without compromising DGP accuracy or UQ. We vet both active learning and Vecchia-approximated DGPs on numerous illustrative examples and a real simulation involving drag on satellites in low-Earth orbit. We showcase implementation in the deepgp package for R on CRAN.
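
The Vecchia idea is shown below in its plain-GP form rather than the deepgp implementation referenced in the abstract: the joint Gaussian log-likelihood is replaced by a product of conditionals, each conditioning on at most m previously ordered neighbors.

import numpy as np

# Vecchia approximation for an ordinary GP: log p(y) is approximated by
# sum_i log p(y_i | y_{c(i)}), where c(i) holds at most m nearest previously
# ordered points. Cost drops from O(n^3) to roughly O(n m^3).
rng = np.random.default_rng(7)

def kernel(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

n, m = 300, 10
x = np.sort(rng.uniform(0, 1, n))                     # ordered 1-D inputs
K = kernel(x, x) + 1e-6 * np.eye(n)
y = np.linalg.cholesky(K) @ rng.standard_normal(n)    # one draw from the GP

logpdf = 0.0
for i in range(n):
    if i == 0:
        mean, var = 0.0, 1.0 + 1e-6
    else:
        c = np.argsort(np.abs(x[:i] - x[i]))[:m]      # nearest previous neighbors
        k_cc = kernel(x[c], x[c]) + 1e-6 * np.eye(len(c))
        k_ic = kernel(x[[i]], x[c])
        mean = (k_ic @ np.linalg.solve(k_cc, y[c])).item()
        var = 1.0 + 1e-6 - (k_ic @ np.linalg.solve(k_cc, k_ic.T)).item()
    logpdf += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)

print("Vecchia log-likelihood:", logpdf)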



Bio: Bobby Gramacy is Professor of Statistics at Virginia Tech and affiliate faculty in VT’s Computational Modeling and Data Analytics program, and a Fellow of the American Statistical Association (ASA). He currently serves as Editor-in-Chief for Technometrics, an ASA journal, and as President for the ASA’s Uncertainty Quantification Interest Group. Recently he completed tours as President of the ASA’s Section on Physical and Engineering Sciences, and as Treasurer for the International Society of Bayesian Analysis.

Prof Gramacy’s research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. He recently published a fully reproducible (and open source) textbook called “Surrogates: Gaussian process modeling, design and optimization for the applied sciences”.

February 24, 2023

Aneta Siemiginowska (Harvard-Smithsonian Center for Astrophysics)



Title: Statistical Methodology for High-Energy Astronomical Datasets

Abstract: Modern X-ray telescopes and detectors collect high quality multi-dimensional data marking arrival time, location, and energy of each incoming photon. The data are sparse in all dimensions, requiring that they be described as a Poisson process. In the standard analysis, these data are collapsed along two or more dimensions – light curves for time, spectra for energy, images for space – and are analyzed independently. The multi-domain approaches that simultaneously take into account 3D, or the full 4D, information carried by each photon are emerging leading to higher quality results. I will discuss traditional and emerging methods for high resolution X-ray observations and applications to images obtained with the Chandra X-ray Observatory.

Bio: Aneta Siemiginowska is a Senior Astrophysicist at the High Energy Astrophysics Division of the Center for Astrophysics | Harvard & Smithsonian, and a member of the Science Data System team at the Chandra X-ray Center. She is an expert in extragalactic X-ray astronomy specializing in active galaxies, quasars, powerful jets, and has discovered several hundred kiloparsec long relativistic X-ray jets associated with distant quasars. She is a founding member of the International CHASC AstroStatistics Collaboration and works on applications of modern statistical methodology to Poisson data in spectral, timing and imaging domains. She is the current president of the International Astrostatistics Association (2021-2023 term).

March 31, 2023

Pietro Vischia (University of Oviedo and ICTEA)



Title: Optimizing experiment design with machine learning

 

Abstract: In physics and other disciplines, future experimental setups will be so complex that it will be unfeasible for humans to find an optimal set of design parameters. We parameterize the full design of an experiment in a differentiable way and introduce a definition of optimality based on a loss function that encodes the end goals of the experiment. Crucially, we also account for construction constraints, as well as budget, resulting in a constrained optimization problem that we solve using gradient descent.

In this seminar, I will describe the goals and activities of the MODE Collaboration, focussing on our ongoing work on the optimization of a muon tomography experiment.
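
The constrained-optimization setup can be illustrated with a toy differentiable design problem (all quantities synthetic, not the MODE muon-tomography case): sensor positions are the design parameters, a surrogate loss encodes the science goal, and a budget constraint enters as a penalty so plain gradient descent applies.

import torch

# Toy differentiable experiment design: choose sensor positions z_k in [0, 1]
# to minimize a surrogate reconstruction error for targets spread over [0, 1],
# subject to a soft "budget" penalty on the instrumented length.
torch.manual_seed(1)

targets = torch.rand(256)                        # quantities the experiment must resolve (synthetic)
z = torch.rand(8, requires_grad=True)            # design parameters: sensor positions
opt = torch.optim.Adam([z], lr=0.02)

for step in range(500):
    opt.zero_grad()
    # surrogate science loss: squared distance from each target to its nearest sensor
    d = (targets[:, None] - z[None, :]).abs().min(dim=1).values
    science_loss = (d ** 2).mean()
    budget_penalty = torch.relu(z.max() - z.min() - 0.8) ** 2   # construction constraint as a penalty
    loss = science_loss + 10.0 * budget_penalty
    loss.backward()
    opt.step()

print("optimized sensor positions:", torch.sort(z.detach()).values)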

Bio: Pietro Vischia is a Ramón y Cajal Senior Researcher at Universidad de Oviedo and ICTEA, where he leads a project aiming at solving constrained optimization problems in high-dimensional parameter spaces using realistic neuron models. He graduated from the University of Padova and received his Ph.D. in Physics from Instituto Superior Técnico (Portugal). After graduating, he was a postdoctoral researcher at Universidad de Oviedo (Spain) and a Chargé de Recherche at Université catholique de Louvain and FNRS (Belgium).

Vischia focuses on the extension of machine learning methods to realistic neurons with spiking networks, and their implementation in neuromorphic hardware devices and quantum systems. He is a member of the CMS Collaboration at CERN.

April 21, 2023

Jonathan Hobbs (Jet Propulsion Laboratory)



Title: Simulation-Based Uncertainty Quantification for Infrared Sounder Atmospheric Retrievals

Abstract: Multiple decades of remote-sensing data have provided indirect observations of numerous atmospheric and surface geophysical quantities of interest with comprehensive spatial coverage. For example, multiple hyperspectral infrared sounder instruments, including the Atmospheric Infrared Sounder (AIRS) and Cross-track Infrared Sounder (CrIS), produce estimates of atmospheric temperature, humidity, and cloud properties that inform weather and climate investigations. These geophysical states are inferred from the satellite observations, or spectra, through an inverse method known as a retrieval. This presentation will provide an overview of the remote sensing observing system and will highlight key sources of uncertainty for the infrared sounder processing pipeline. The pipeline motivates a simulation-based framework for uncertainty quantification (UQ) for the AIRS retrieval. The framework will be demonstrated for near-surface temperature estimates over the continental United States.

Bio: Dr. Jonathan Hobbs is a data scientist at the Jet Propulsion Laboratory, California Institute of Technology. As a member of the Uncertainty Quantification (UQ) and Statistical Analysis group, his research has included developing UQ methodology for atmospheric remote sensing retrievals, including simulation-based approaches for the Orbiting Carbon Observatory-2 and 3 and Atmospheric Infrared Sounder. In addition, he has substantial experience in the development of spatio-temporal statistical methods for geoscience applications, including hydrology, carbon cycle science, weather, and climate. Prior to joining JPL, he received a co-major Ph.D. in statistics and meteorology from Iowa State University in 2014.

Fall 2022

September 9, 2022

Lukas Heinrich (Technical University Munich)



Title: Systematic Uncertainties in Frequentist Analysis at the LHC

Abstract: The precise measurement of the basic building blocks of matter - the particles of the Standard Model of Particle Physics - as well as the search for potential new particles beyond the Standard Model presents a formidable statistical challenge. The complexity and volume of the data collected at the Large Hadron Collider require careful modeling not only of the primary sought-after phenomenon but also a precise assessment of uncertainties related to the modeling of possible backgrounds. This work is complicated by the fact that particle physics at its core does not admit a closed-form likelihood model and so requires likelihood-free approaches, and by the highly distributed nature of large-scale collaborations, which calls for collaborative statistical modeling tools. In this talk I will give a broad overview of how statistical modeling is currently performed at the LHC and discuss ideas that exploit recent advances in Machine Learning to go beyond the existing methodology.



Bio: Lukas Heinrich is a particle physicist and professor for data science in physics at the Technical University of Munich. He is a long-time member of the ATLAS Collaboration at the Large Hadron Collider at CERN, where he is searching for phenomena beyond the Standard Model of Particle Physics and is engaged in computational, statistical and machine-learning methods research. He is one of the main developers of the statistics tool pyhf, which paved the way for a public release of the complex statistical models that underpin the data analyses at the LHC.

October 14, 2022

Benjamin Nachman (Lawrence Berkeley National Laboratory)



Title: Building Robust Deep Learning Methods for High Energy Physics

Abstract: Deep Learning is becoming a widely used tool in High Energy Physics to enhance the search for new elementary particles and forces of nature. However, great care is required to design new methods that are robust in order to perform reliable statistical tests with our data (e.g. preventing false claims of discovery!). In this talk, he will provide examples from high energy physics related to uncertainty/inference-aware deep learning and draw a connection to algorithmic fairness and related topics in the statistics and machine learning literature.

Bio: Ben Nachman is a Staff Scientist in the Physics Division at LBNL where he is the group leader of the cross-cutting Machine Learning for Fundamental Physics group. He was a Churchill Scholar at Cambridge University and then received his Ph.D. in Physics and Ph.D. minor in Statistics from Stanford University. After graduating, he was a Chamberlain Fellow in the Physics Division at Berkeley Lab. Nachman develops, adapts, and deploys machine learning algorithms to enhance data analysis in high energy physics. He is a member of the ATLAS Collaboration at CERN.

October 27, 2022

STAMPS-NSF AI Planning Institute Joint Seminar - Kaze Wong (Flatiron Institute)



Title: Challenges and Opportunities from gravitational waves: data scientists on diet

 

Abstract: The gravitational wave (GW) community has made numerous exciting discoveries in the past 7 years, from the first detection to a catalog of ~80 GW events containing all sorts of surprises such as binary neutron stars and neutron star-black hole mergers. In the coming decade, next-generation facilities such as the third-generation GW detector network and space-based GW observatories will provide many more surprising events. Quite a number of open modelling and data analysis problems in GW remain to be solved in order to unlock the full potential of next-generation detections. Despite the recent rapid development of machine learning and efforts to solve these problems in GW, it seems GW has a number of traits which make applying machine learning to it difficult. In this talk, I will discuss a number of challenges and opportunities in GW, and some insights from GW on how we should apply modern techniques such as machine learning to physical science in general.

Bio: Kaze Wong is a research fellow studying black holes through gravitational waves at the Flatiron Institute. He graduated from the physics and astronomy department of Johns Hopkins University in 2021, and he is the recipient of the 2021 GWIC-Braccini Prize. Kaze's research centers around the intersection between physical science and deep learning. He is particularly interested in building production-grade hybrid methods to take on challenges in physical science.

December 9, 2022

STAMPS-ISSI Joint Seminar - Rebecca Willett (University of Chicago)



Title: Machine Learning for Inverse Problems in Climate Science

Abstract: Machine learning has the potential to transform climate research. This fundamental change cannot be realized through the straightforward application of existing off-the-shelf machine learning tools alone. Rather, we need novel methods for incorporating physical models and constraints into learning systems. In this talk, I will discuss inverse problems central to climate science — data assimilation and simulator model fitting — and how machine learning yields methods with high predictive skill and computational efficiency. First, I will describe a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, our methods leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Second, I will describe learning emulators of high-dimensional climate forecasting models targeting parameter estimation with uncertainty estimation. We assume access to a computationally complex climate simulator that inputs a candidate parameter and outputs a corresponding multichannel time series. Our task is to accurately estimate a range of likely values of the underlying parameters that best fit data. Our framework learns feature embeddings of observed dynamics jointly with an emulator that can replace high-cost simulators for parameter estimation. These methods build upon insights from inverse problems, data assimilation, stochastic filtering, and optimization, highlighting how theory can inform the design of machine learning systems in the natural sciences.
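
The ensemble Kalman analysis step at the heart of the first framework can be written in a few lines. This is the textbook stochastic-EnKF update on a toy linear observation operator, not the auto-differentiable implementation discussed in the talk.

import numpy as np

# Stochastic EnKF analysis step: update an ensemble of states with a new
# observation using sample covariances (no adjoint of the dynamics needed).
rng = np.random.default_rng(8)

n_state, n_obs, n_ens = 40, 10, 30
H = np.zeros((n_obs, n_state))
H[np.arange(n_obs), np.arange(0, n_state, n_state // n_obs)] = 1.0   # observe every 4th state variable
R = 0.1 * np.eye(n_obs)

x_true = np.sin(np.linspace(0, 2 * np.pi, n_state))
y = H @ x_true + rng.multivariate_normal(np.zeros(n_obs), R)

ens = x_true[None, :] + rng.standard_normal((n_ens, n_state))        # forecast ensemble (rows = members)

X = ens - ens.mean(axis=0)                                           # forecast anomalies
P = X.T @ X / (n_ens - 1)                                            # sample forecast covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)                         # Kalman gain

y_perturbed = y + rng.multivariate_normal(np.zeros(n_obs), R, size=n_ens)
ens_a = ens + (y_perturbed - ens @ H.T) @ K.T                        # analysis ensemble

print("forecast RMSE:", np.sqrt(np.mean((ens.mean(0) - x_true) ** 2)))
print("analysis RMSE:", np.sqrt(np.mean((ens_a.mean(0) - x_true) ** 2)))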

Bio: Rebecca Willett is Professor of Statistics and Computer Science & Director of AI at the Data Science Institute at the University of Chicago, with a courtesy appointment at the Toyota Technological Institute at Chicago. She is also the faculty lead of the AI+Science Postdoctoral Fellowship program. Her work in machine learning and signal processing reflects broad and interdisciplinary expertise and perspectives. She is known internationally for her contributions to the mathematical foundations of machine learning, large-scale data science, and computational imaging.

In particular, Prof. Willett studies methods to learn and leverage hidden structure in large-scale datasets; representing data in terms of these structures allows ML methods to produce more accurate predictions when data contain missing entries, are subject to constrained sensing or communication resources, correspond to rare events, or reflect indirect measurements of complex physical phenomena. These challenges are pervasive in science and technology data, and Prof. Willett’s work in this space has had important implications in national security, medical imaging, materials science, astronomy, climate science, and several other fields. She has published nearly two hundred book chapters and scientific articles in top-tier journals and conferences at the intersection of machine learning, signal processing, statistics, mathematics, and optimization. Her group has made contributions both in the mathematical foundations of signal processing and machine learning and in their application to a variety of real-world problems.


Spring/Summer 2022

January 21, 2022

Amanda Lenzi (Argonne National Laboratory)



Title: Can Neural Networks be used for Parameter Estimation?

Abstract: Neural networks have proved successful in various applications in approximating nonlinear maps based on training datasets. Can they also be used to estimate parameters in statistical models when the standard likelihood estimation or Bayesian methods are not (computationally) feasible? In this talk, I will discuss this topic towards the aim of estimating parameters from a model for multivariate extremes, where inference is exceptionally challenging, but simulation from the model is easy and fast. I will demonstrate that in this example, neural networks can provide a competitive alternative to current approaches, with considerable improvements in accuracy and computational time. A key ingredient for this result is to actively use our statistical knowledge about parameters and data to make the problem more palatable for the neural network.
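
The basic recipe (simulate parameter-data pairs from the model, reduce each dataset to summary features, and train a network to regress the parameters) can be sketched on a trivially simple Gaussian model, standing in for the multivariate-extremes model of the talk.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Simulation-based parameter estimation: train a network to map data summaries
# to the parameters that generated them, then apply it to observed data.
rng = np.random.default_rng(9)

def simulate(theta, n=200):
    mu, sigma = theta
    x = rng.normal(mu, sigma, size=n)
    # summary statistics fed to the network (quantiles are cheap and robust)
    return np.quantile(x, [0.1, 0.25, 0.5, 0.75, 0.9])

thetas = np.column_stack([rng.uniform(-5, 5, 20000),      # prior draws of (mu, sigma)
                          rng.uniform(0.5, 3, 20000)])
summaries = np.array([simulate(t) for t in thetas])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
net.fit(summaries, thetas)

x_obs = rng.normal(1.2, 2.0, size=200)                    # "observed" data
s_obs = np.quantile(x_obs, [0.1, 0.25, 0.5, 0.75, 0.9])
print("estimated (mu, sigma):", net.predict(s_obs.reshape(1, -1))[0])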



Bio: Amanda Lenzi is a Postdoctoral Appointee at Argonne National Laboratory. She was a Postdoctoral Fellow at King Abdullah University of Science and Technology (KAUST) before coming to Argonne. She obtained her PhD degree in Statistics from the Technical University of Denmark in 2017 and her BS and MS degrees at the University of Campinas, São Paulo, Brazil. Her main research interests concern statistical modeling, prediction, simulation, and uncertainty quantification of spatiotemporal data from applications relating to energy as well as environmental science. She is also interested in computational methods for large datasets and the use of machine learning to improve the modeling of these complex spatiotemporal processes.

February 18, 2022

Nicholas Wardle (Department of Physics, Imperial College London)



Title: The Discrete Profiling Method: Handling Uncertainties in Background Shapes

Abstract: Model selection is a huge topic in statistics, and in HEP experiments we often don’t know the exact model appropriate for a particular process. Typically HEP experiments rely on using data to directly constrain or choose which (parametric) models are best suited to extract the underlying physics; however, this choice naturally represents a systematic uncertainty in the analysis of the data. While there are several methods to incorporate the uncertainties related to choices of continuous parameter values, the uncertainty associated with the choice of discrete model is less clear. In this presentation, Nicholas will describe a method developed in the context of the search for the Higgs boson at CMS that aims to incorporate the uncertainty related to model selection into the statistical analysis of data: the “discrete profiling method”. Nicholas will discuss various studies on the bias and coverage properties of the method and open extensions where further work is needed.
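
A toy sketch of the envelope idea (binned data with polynomial backgrounds, not the CMS implementation): fit several candidate background functions, penalize each fit by its number of free parameters, and at every value of the parameter of interest keep the lowest penalized curve.

import numpy as np

# Discrete-profiling sketch: chi2 envelope over polynomial background orders,
# each corrected by +1 per free background parameter (a likelihood-ratio-style
# penalty), while scanning the signal yield mu in a fixed mass window.
rng = np.random.default_rng(10)

edges = np.linspace(100, 180, 41)
centers = 0.5 * (edges[:-1] + edges[1:])
truth_bkg = 200 * np.exp(-(centers - 100) / 40.0)
signal_shape = np.exp(-0.5 * ((centers - 125) / 2.0) ** 2)
counts = rng.poisson(truth_bkg + 20 * signal_shape)

def penalized_chi2(mu, order):
    # least-squares polynomial background fit under a fixed signal yield mu
    coef = np.polyfit(centers, counts - mu * signal_shape, order)
    model = np.polyval(coef, centers) + mu * signal_shape
    chi2 = np.sum((counts - model) ** 2 / np.maximum(model, 1.0))
    return chi2 + (order + 1)            # penalty: number of background parameters

mu_grid = np.linspace(0, 60, 61)
envelope = np.array([min(penalized_chi2(mu, k) for k in (1, 2, 3, 4, 5))
                     for mu in mu_grid])
print("best-fit signal yield:", mu_grid[envelope.argmin()])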

Bio: Nicholas did his Ph.D. at Imperial College, where he started working on early W/Z cross-section measurements with electrons at CMS and then moved on to searching for the Higgs boson in the diphoton decay channel; the discovery in that channel formed his thesis in 2013. After that he held a fellowship at CERN, where he spent most of his time on searches for dark matter and H->invisible decays. He moved back to London in 2017 as an STFC fellow at Imperial College, where he is now a lecturer. He mainly focuses on Higgs combinations and interpretations of precision Higgs boson measurements in the search for physics beyond the SM, and teaches postgraduate courses on statistics and machine learning for physicists.

March 18, 2022

Derek Bingham (Department of Statistics and Actuarial Science, Simon Fraser University)



Title: Computer Model Emulation and Uncertainty Quantification Using a Deep Gaussian Process

 

Abstract: Computer models are often used to explore physical systems. Increasingly, there are cases where the model is fast, the code is not readily accessible to scientists, but a large suite of model evaluations is available. In these cases, an “emulator” is used to stand in for the computer model. This work was motivated by a simulator for the chirp mass of binary black hole mergers where no output is observed for large portions of the input space and more than 10^6 simulator evaluations are available. This poses two problems: (i) the need to address the discontinuity when observing no chirp mass; and (ii) performing statistical inference with a large number of simulator evaluations. The traditional approach for emulation is to use a stationary Gaussian process (GP) because it provides a foundation for uncertainty quantification for deterministic systems; here we instead use a deep GP emulator to accommodate these features. We explore the impact of the choices made when setting up the deep GP on posterior inference and apply the proposed approach to the real application.

Bio: Derek is a Professor of Statistics and Actuarial Science at Simon Fraser University. He completed his PhD in Statistics in 1999 with Randy Sitter at SFU on the design and analysis of fractional factorial split-plot experiments. After graduating, he moved to the Department of Statistics at the University of Michigan as an Assistant Professor. In 2003, he joined the Department of Statistics and Actuarial Science at Simon Fraser as the Canada Research Chair in Industrial Statistics.

The focus of his current research is developing statistical methods for combining physical observations with large-scale computer simulators. This includes new methodology for Bayesian computer model calibration, emulation, uncertainty quantification and experimental design. His work is generally motivated by real-world applications. His recent collaborations have been with scientists at USA national laboratories (Argonne National Lab and Los Alamos National Lab) and also USA Department of Energy sponsored projects (Center for Radiative Shock Hydrodynamics; Center for Exascale Radiation Transport).

April 22, 2022

Jakob Runge (Institute of Data Science, German Aerospace Center)


[] []

Title: Causal Inference and Discovery with Perspectives in Earth Sciences

Abstract: The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal inference methods beyond the commonly adopted correlation techniques. Causal inference provides the theory and methods to learn and utilize qualitative knowledge about causal relations that is often available in Earth sciences. In this talk I will present an overview of this exciting and widely applicable framework and illustrate it with some examples from Earth sciences. I will also present recent work on statistically optimal estimators of causal effects.

Bio: Jakob Runge has headed the Causal Inference group at the German Aerospace Center’s Institute of Data Science in Jena since 2017 and has been a guest professor of computer science at TU Berlin since 2021. His group combines innovative data science methods from different fields (graphical models, causal inference, nonlinear dynamics, deep learning) and works closely with experts in the climate sciences and beyond. Jakob studied physics at Humboldt University Berlin and finished his Ph.D. project at the Potsdam Institute for Climate Impact Research in 2014. His studies were funded by the German Academic Scholarship Foundation (Studienstiftung), and his thesis was awarded the Carl-Ramsauer prize by the Berlin Physical Society.

In 2014 he won a $200,000 Fellowship Award in Studying Complex Systems from the James S. McDonnell Foundation and joined the Grantham Institute, Imperial College London, from 2016 to 2017. In 2020 he won an ERC Starting Grant with his interdisciplinary project CausalEarth.

On  he provides Tigramite, a Python module for causal inference on time series. For more details, see: 
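
A minimal example of what a causal discovery run with Tigramite looks like on synthetic data is sketched below. Import paths and argument names follow recent Tigramite releases and may differ between versions, and the data-generating process is made up.

# Sketch of a PCMCI causal-discovery run with Tigramite on synthetic data.
# Import paths follow recent Tigramite releases (older versions expose ParCorr
# directly under tigramite.independence_tests); treat this as an illustration.
import numpy as np
import tigramite.data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

rng = np.random.default_rng(0)
T = 500
data = np.zeros((T, 3))
for t in range(2, T):                       # true links: X0 -> X1 (lag 1), X1 -> X2 (lag 2)
    data[t, 0] = 0.6 * data[t - 1, 0] + rng.normal()
    data[t, 1] = 0.5 * data[t - 1, 0] + rng.normal()
    data[t, 2] = 0.7 * data[t - 2, 1] + rng.normal()

dataframe = pp.DataFrame(data, var_names=["X0", "X1", "X2"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.05)
pcmci.print_significant_links(p_matrix=results["p_matrix"],
                              val_matrix=results["val_matrix"],
                              alpha_level=0.01)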


 

June 16, 2022

ISSI-STAMPS Joint Seminar - Ann Lee (Department of Statistics and Data Science, 麻豆村)

ann_lee-250x300-min.jpg

[] []

Title: Likelihood-Free Frequentist Inference: Confidence Sets with Correct Conditional Coverage

Abstract: Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce confidence sets with correct conditional coverage. In this talk, I will describe our group’s recent and ongoing research on developing scalable and modular procedures for (i) constructing Neyman confidence sets with finite-sample guarantees of nominal coverage, and for (ii) computing diagnostics that estimate conditional coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic, like the likelihood ratio, can be adapted to LF2I to create valid confidence sets and diagnostics, without costly Monte Carlo samples at fixed parameter settings. In my talk, I will discuss where we stand with LF2I and challenges that still remain. (Part of these efforts are joint with Niccolo Dalmasso, Rafael Izbicki, Luca Masserano, Tommaso Dorigo, Mikael Kuusela, and David Zhao. The original LF2I framework is described in  with a recent version in )
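
The flavor of the construction can be conveyed with a deliberately simple toy: a Gaussian mean with a hand-picked test statistic. This is a sketch of the general simulate / learn-critical-values / invert recipe, not the group’s software.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Simulate (theta, x) pairs, define a test statistic, learn its conditional 95%
# quantile as a function of theta with quantile regression, then invert the test.
rng = np.random.default_rng(0)
B = 20000
theta = rng.uniform(-5.0, 5.0, size=B)          # parameter draws from a proposal
x = rng.normal(loc=theta, scale=1.0)            # one "observation" per draw
stat = (x - theta) ** 2                         # simple test statistic lambda(x; theta)

# Learn the critical value (95% quantile of the statistic) as a function of theta.
qr = GradientBoostingRegressor(loss="quantile", alpha=0.95)
qr.fit(theta.reshape(-1, 1), stat)

# Invert the test for an observed x_obs: keep every theta whose statistic falls
# below its estimated critical value.
x_obs = 1.3
grid = np.linspace(-5.0, 5.0, 1001)
cutoffs = qr.predict(grid.reshape(-1, 1))
accepted = grid[(x_obs - grid) ** 2 <= cutoffs]
print("approx. 95% confidence interval:", accepted.min(), accepted.max())

In this toy case the learned cutoff approaches the chi-square quantile 3.84, so the inverted set recovers roughly x_obs ± 1.96, as it should.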

 

Bio: Ann Lee is a professor in the Department of Statistics & Data Science at 麻豆村 (麻豆村), with a joint appointment in the Machine Learning Department. Dr. Lee's interests are in developing statistical methodology for complex data and problems in the physical and environmental sciences. She co-directs the Statistical Methods for the Physical Sciences (STAMPS) research group at 麻豆村, and is senior personnel in the NSF AI Planning Institute for Data-Driven Discovery in Physics at 麻豆村.

Prior to joining 麻豆村 in 2005, Dr. Lee was the J.W. Gibbs Assistant Professor in the Department of Mathematics at Yale University, and before that she served a year as a visiting research associate at the Department of Applied Mathematics at Brown University. She received her Ph.D. degree in Physics at Brown University, and her BSc/MS degree in Engineering Physics at Chalmers University of Technology in Sweden.


 

Fall 2021

September 10, 2021

Doug Nychka (Department of Applied Mathematics and Statistics, Colorado School of Mines)

douglas_nychka-250x300-min.jpg

[] []

Title: Climate models, large spatial datasets, and harnessing deep learning for a statistical computation

Abstract: Numerical simulations of the motion and state of the Earth’s atmosphere and ocean yield large and complex data sets that require statistics for their interpretation. Typically, climate and weather variables take the form of space-time fields, and it is useful to describe their dependence using methods from spatial statistics. Common to these problems is the need to estimate covariance functions over space and time and to account for the fact that the covariance may not be stationary. This talk focuses on a new computational technique for fitting covariance functions using maximum likelihood. Estimating local covariance functions is a useful way to represent spatial dependence but is computationally intensive, because it requires optimizing a local likelihood over many windows of the spatial field. The problem we tackle here thus involves numerous (tens of thousands of) small spatial estimation problems, in contrast to other research that attempts a single, global estimate for a massive spatial data set. In this work we show how a neural network (aka deep learning) model can be trained to give accurate maximum likelihood estimates based on the spatial field or its empirical variogram. Why train a neural network to reproduce a statistical estimate? The advantage is that the neural network model evaluates very efficiently and gives speedups on the order of a factor of a hundred or more. In this way, computations that could take hours are reduced to minutes or tens of seconds, facilitating a more flexible and iterative approach to building spatial statistical models. An example of local covariance modeling is given using the large ensemble experiment created by the National Center for Atmospheric Research.

See: Gerber, Florian, and Douglas Nychka. “Fast covariance parameter estimation of spatial Gaussian process models using neural networks.” Stat 10.1 (2021): e382.
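
A stripped-down 1-D version of the idea conveys the workflow: simulate fields with known covariance parameters, summarize each realization by its empirical variogram, and train a small network to map the summary back to the parameter. Every number below is illustrative, and the network stands in for the paper’s architecture.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 60)                      # 1-D spatial locations
D = np.abs(s[:, None] - s[None, :])                # distance matrix

def simulate(log_range):
    """Draw one Gaussian field with an exponential covariance of given range."""
    C = np.exp(-D / np.exp(log_range))
    L = np.linalg.cholesky(C + 1e-8 * np.eye(len(s)))
    return L @ rng.normal(size=len(s))

def empirical_variogram(z, nbins=15):
    """Binned semivariances used as the summary fed to the network."""
    diffs = 0.5 * (z[:, None] - z[None, :]) ** 2
    bins = np.linspace(0.0, 1.0, nbins + 1)
    idx = np.digitize(D, bins) - 1
    return np.array([diffs[idx == b].mean() for b in range(nbins)])

# Training set: many simulated fields with known log-range parameters.
log_ranges = rng.uniform(np.log(0.02), np.log(0.5), size=3000)
features = np.array([empirical_variogram(simulate(lr)) for lr in log_ranges])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
net.fit(features, log_ranges)

# "Estimate" the range of a new field in a single forward pass (no optimization).
z_new = simulate(np.log(0.1))
print("estimated range:", np.exp(net.predict(empirical_variogram(z_new).reshape(1, -1))[0]))

The speedup in the paper comes from exactly this substitution: once trained, the forward pass replaces a numerical likelihood optimization in each of the many local windows.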



Bio: Douglas Nychka is a statistician and data scientist whose areas of research include the theory, computation and application of curve and surface fitting, with a focus on geophysical and environmental applications. He is currently a Professor in the Department of Applied Mathematics and Statistics at the Colorado School of Mines and Senior Scientist Emeritus at the National Center for Atmospheric Research (NCAR), Boulder, Colorado. Before moving to Mines he directed the Institute for Mathematics Applied to Geosciences at NCAR. His current research focuses on efficient computation of spatial statistics methods for large data sets and on migrating these methods into easy-to-use R packages. He is a Fellow of the American Statistical Association and the Institute for Mathematical Statistics.

October 8, 2021

Yang Chen (Department of Statistics, University of Michigan)

yang_chen-250x300-min.jpg

[] []

Title: Matrix Completion Methods for the Total Electron Content Video Reconstruction

Abstract: Total electron content (TEC) maps can be used to estimate the delay imposed on GPS signals by the ionospheric electron content between a receiver and a satellite. This delay can result in GPS positioning error, so it is important to monitor the TEC maps. The observed TEC maps have large patches of missing data over the oceans and scattered small areas of missingness on land. In this work, we propose several extensions of existing matrix completion algorithms to achieve TEC map reconstruction, accounting for spatial smoothness and temporal consistency while preserving important structures of the TEC maps. We call the proposed method Video Imputation with SoftImpute, Temporal smoothing and Auxiliary data (VISTA). Numerical simulations that mimic patterns of real data are given, and we show that our proposed method achieves better TEC map reconstructions than existing methods in the literature. The proposed computational algorithm is general and can be readily applied to other problems besides TEC map reconstruction. If time allows, I will briefly discuss ongoing efforts to build prediction models for TEC maps.
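
The SoftImpute building block that VISTA starts from can be written in a few lines. The sketch below runs on a random low-rank matrix; the temporal smoothing and auxiliary-data terms that define VISTA are not included.

import numpy as np

# Bare-bones SoftImpute-style loop: iteratively fill missing entries with the current
# low-rank reconstruction obtained by soft-thresholding the singular values.
rng = np.random.default_rng(0)
M = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 120))   # toy low-rank "TEC" matrix
mask = rng.random(M.shape) < 0.6                            # observed entries
X_obs = np.where(mask, M, np.nan)

def soft_impute(X, lam=5.0, n_iter=200):
    filled = np.where(np.isnan(X), 0.0, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                        # soft-threshold singular values
        low_rank = (U * s) @ Vt
        filled = np.where(np.isnan(X), low_rank, X)         # keep observed entries fixed
    return low_rank

X_hat = soft_impute(X_obs)
rmse = np.sqrt(np.mean((X_hat[~mask] - M[~mask]) ** 2))
print("RMSE on held-out entries:", round(rmse, 3))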

Bio: Yang Chen received her Ph.D. (2017) in Statistics from Harvard University and joined the University of Michigan as an Assistant Professor of Statistics and Research Assistant Professor at the Michigan Institute for Data Science (MIDAS). She received her B.A. in Mathematics and Applied Mathematics from the University of Science and Technology of China. Her research interests include computational algorithms in statistical inference and applied statistics in the fields of biology and astronomy.

November 12, 2021

Glen Cowan (Department of Physics, Royal Holloway, University of London)

glen_cowan-250x300-min.jpg

[] []

Title: Errors on Errors: Refining Particle Physics Analyses with the Gamma Variance Model

Abstract: In a statistical analysis in Particle Physics, one faces two distinct challenges: the limited number of particle collisions and imperfections in the model itself, corresponding to “statistical” and “systematic” errors in the result. To combat the modeling uncertainties one includes nuisance parameters, whose best estimates are often treated as Gaussian distributed with given standard deviations. The appropriate values for these standard deviations are, however, often the subject of heated argument, which is to say that the uncertainties themselves are uncertain.

A type of model is presented in which estimates of the systematic variances are modeled as gamma distributed variables. The resulting confidence intervals show interesting and useful properties. For example, when averaging measurements to estimate their mean, the size of the confidence interval increases for decreasing goodness-of-fit, and averages have reduced sensitivity to outliers. The basic properties of the model are presented and several examples relevant for Particle Physics are explored.
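
One common way to write down a model of this type is sketched below. The gamma parametrization shown is an assumption chosen so that r_i acts as a relative “error on the error”; the exact form used in the talk may differ.

% Sketch of an "errors on errors" model (parametrization assumed):
\begin{align*}
  y_i &\sim \mathcal{N}\!\left(\mu,\ \sigma_i^2\right), \\
  v_i &\sim \mathrm{Gamma}\!\left(\alpha_i,\ \beta_i\right), \qquad
       \alpha_i = \frac{1}{4 r_i^2}, \quad \beta_i = \frac{1}{4 r_i^2\,\sigma_i^2},
\end{align*}

where the y_i are the measurements, the v_i are the reported estimates of the variances sigma_i^2, and r_i is (approximately) the relative uncertainty assigned to sigma_i; this choice gives E[v_i] = sigma_i^2. Profiling over the sigma_i^2 then replaces the usual quadratic log-likelihood terms with logarithmic ones, which is what enlarges the interval when the fit is poor and reduces the pull of outliers.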

Bio: Ph.D. in Physics 1988 from University of California, Berkeley, followed by postdoc positions in Munich and Siegen working on electron-positron collisions at LEP (CERN). Research focus on Quantum Chromodynamics (multijet production, measurements of alpha_s, properties of hadronic Z decays). 1998-present, faculty member in Department of Physics, Royal Holloway, University of London. My research in High Energy Physics has involved experiments at the Large Hadron Collider (CERN) on proton-proton collisions, with focus on application and development of statistical methods.

December 3, 2021

Elizabeth Barnes (Department of Atmospheric Science, Colorado State University)

barnes_elizabeth-250x300-min.jpg

[] []

Title: Benefits of saying “I Don’t Know” when analyzing and modeling the climate system with ML

Abstract: The atmosphere is chaotic. This fundamental property of the climate system makes forecasting weather incredibly challenging: we cannot expect weather models to ever provide perfect predictions of the Earth system beyond timescales of approximately 2 weeks. Instead, atmospheric scientists look for specific states of the climate system that lead to more predictable behavior than others. Here, we demonstrate how neural networks can be used not only to leverage these states to make skillful predictions, but moreover to identify the climatic conditions that lead to enhanced predictability. We introduce a novel loss function, termed “abstention loss”, that allows neural networks to identify forecasts of opportunity for regression and classification tasks. The abstention loss works by incorporating uncertainty in the network’s prediction to identify the more confident samples and abstain (say “I don’t know”) on the less confident samples. Once the more confident samples are identified, explainable AI (XAI) methods are then applied to explore the climate states that exhibit more predictable behavior.
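
One simple way such a loss can be set up is sketched below, as a toy regression version with an explicit abstention weight. It is not necessarily the exact loss introduced in the talk.

import numpy as np

def abstention_loss(y_true, y_pred, abstain_prob, penalty=1.0):
    """Mean of (1 - a) * squared error + penalty * a over the batch.

    The network would emit both y_pred and an abstention probability a in [0, 1];
    confident samples keep their full error, abstained samples trade it for a penalty.
    """
    per_sample = (1.0 - abstain_prob) * (y_true - y_pred) ** 2 + penalty * abstain_prob
    return per_sample.mean()

# Toy check: abstaining only pays off on samples whose error exceeds the penalty.
y_true = np.array([0.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.2, 3.0])
print(abstention_loss(y_true, y_pred, abstain_prob=np.array([0.0, 0.0, 0.0])))
print(abstention_loss(y_true, y_pred, abstain_prob=np.array([0.0, 0.0, 1.0])))

Because abstention is only rewarded where the expected error is large, the network is pushed toward flagging forecasts of opportunity rather than abstaining everywhere.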


Bio: Dr. Elizabeth (Libby) Barnes is an associate professor of Atmospheric Science at Colorado State University. She joined the CSU faculty in 2013 after obtaining dual B.S. degrees (Honors) in Physics and Mathematics from the University of Minnesota, obtaining her Ph.D. in Atmospheric Science from the University of Washington, and spending a year as a NOAA Climate & Global Change Fellow at the Lamont-Doherty Earth Observatory. Professor Barnes' research is largely focused on climate variability and change and the data analysis tools used to understand it. Topics of interest include earth system predictability, jet-stream dynamics, Arctic-midlatitude connections, subseasonal-to-decadal (S2D) prediction, and data science methods for earth system research (e.g. machine learning, causal discovery).

She teaches graduate courses on fundamental atmospheric dynamics and on data science and statistical analysis methods. Professor Barnes is involved in a number of research community activities. She is a lead of the US CLIVAR Working Group: Emerging Data Science Tools for Climate Variability and Predictability; a member of the National Academies’ Committee on Earth Science and Applications from Space; a funded member of the NSF AI Institute for Research on Trustworthy AI in Weather, Climate and Coastal Oceanography (AI2ES); and a member of the Steering Committee of the CSU Data Science Research Institute. She recently completed her term as lead of the NOAA MAPP S2S Prediction Task Force (2016-2020).


 

Spring 2021

January 22, 2021

David John Gagne (National Center for Atmospheric Research)

david_j_gagne-250x300-min.jpg

[] []

Title: Machine Learning Emulation across the Earth System

Abstract: Earth system processes can be explicitly modeled to a high degree of complexity and realism. The most complex models are also the most computationally expensive, so in practice they are not used within large weather and climate simulations. Machine learning emulation of these complex models promises to approximate the complex model output at a small fraction of the original computational cost. If the performance is satisfactory, then the computational budget could be steered toward other priorities. The NCAR Analytics and Integrative Machine Learning group is currently working on machine learning emulation problems for microphysics, atmospheric chemistry, and processing holographic observations of rain drops. We will discuss our successes as well as challenges in ensuring robust online performance and incorporating emulators within existing simulations.



Bio: David John Gagne is a Machine Learning Scientist and head of the Analytics and Integrative Machine Learning group at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. His research focuses on developing machine learning systems to improve the prediction and understanding of high impact weather and to enhance weather and climate models. He received his Ph.D. in meteorology from the University of Oklahoma in 2016 and completed an Advanced Study Program postdoctoral fellowship at NCAR in 2018.

He has collaborated with interdisciplinary teams to produce machine learning systems for hail, tornadoes, hurricanes, and renewable energy. In order to educate atmospheric science students and scientists about machine learning, he has led a series of interactive short courses and hackathons.

February 12, 2021

Robert Cousins (Department of Physics and Astronomy, UCLA)

robert_cousins-250x300-min.jpg

[] []

Title: Testing a sharp null hypothesis versus a continuous alternative: Deep issues regarding this everyday problem in high energy physics

Abstract: In high energy physics, it is extremely common to test a well-specified null hypothesis (such as the Standard Model of elementary particle physics) that is nested within an alternative hypothesis with unspecified value(s) of parameter(s) of interest (such as the Standard Model plus a new force of nature with unknown strength). As widely discussed in the context of the Jeffreys-Lindley paradox, two experiments with the same p-value for testing the null hypothesis can have differing results for the Bayesian probability that the null hypothesis is true (and for the Bayes factor), since the latter depends on both the sample size and the width of the prior probability density in the parameter(s) of the sought-for discovery. After a reminder of relevant methods for hypothesis testing and the paradox, I will note that the issues are particularly apparent when there are three well-separated independent scales for the parameter of interest, namely (in increasing order) the small (or negligible) width of the null hypothesis, the width of the measurement resolution, and the width of the prior probability density. After giving examples with this hierarchy, I will quote various statements in the statistics literature and discuss their relevance (or not) to usual practice in high energy physics. Much of the talk will draw on .
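
A small numerical illustration of the point, with entirely hypothetical numbers (a Gaussian measurement of a single parameter; this is the standard textbook setup, not an example from the talk):

import numpy as np
from scipy.stats import norm

def bayes_factor_01(xbar, sigma, n, tau):
    """BF of H0: mu = 0 against H1: mu ~ N(0, tau^2), for xbar ~ N(mu, sigma^2 / n)."""
    se2 = sigma ** 2 / n
    return norm.pdf(xbar, 0.0, np.sqrt(se2)) / norm.pdf(xbar, 0.0, np.sqrt(se2 + tau ** 2))

sigma, tau, z = 1.0, 1.0, 3.0          # measurement resolution, prior width, z-score
for n in (10, 1_000, 100_000):
    xbar = z * sigma / np.sqrt(n)      # keeps the (two-sided) p-value fixed at ~0.0027
    print(f"n={n:>7d}  p-value={2 * norm.sf(z):.4f}  "
          f"BF01={bayes_factor_01(xbar, sigma, n, tau):.2f}")

With the z-score, and hence the p-value, held fixed, the Bayes factor moves from favoring the alternative at small n to favoring the null at large n, which is exactly the tension the talk examines.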

Bio: Robert (Bob) Cousins is Distinguished Professor Emeritus in the Department of Physics and Astronomy at UCLA, where he was on the faculty from 1981 through 2020. He completed his A.B. from Princeton in 1976, obtained his Stanford Ph.D. under Mel Schwartz while collaborating on a kaon experiment at Fermilab, and then had a position at CERN during 1981 before joining UCLA. Throughout his career, he has worked on experiments measuring or searching for rare processes, at Brookhaven National Lab with kaons, at CERN with neutrinos, and since 2000 on the CMS Experiment at CERN's Large Hadron Collider. This has motivated his career-long interest in statistical data analysis.

Cousins has held various high-level leadership positions in his collaborations, and served on a number of ad hoc and standing advisory and review committees for laboratories and funding agencies. Recent such service included the Particle Physics Project Prioritization Panel (P5) in the U.S. (2013-2014), and CERN’s Scientific Policy Committee (2018-2023).

March 12, 2021

Raphaël Huser (Extreme Statistics Research Group, King Abdullah University of Science and Technology)

raphael_huser-250x300-min.jpg

[] []

Title: High-resolution Modeling and Estimation of Extreme Red Sea Surface Temperature Hotspots

Abstract: Modeling, estimation and prediction of spatial extremes is key for risk assessment in a wide range of geo-environmental, geo-physical, and climate science applications. In this talk, we will first introduce state-of-the-art models based on extreme-value theory, and discuss their statistical and computational limitations. We will then discuss an alternative flexible approach for modeling and estimating extreme sea surface temperature (SST) hotspots, i.e., high threshold exceedance regions, for the whole Red Sea, a vital region of high biodiversity. In a nutshell, our proposed model is a semiparametric Bayesian spatial mixed-effects linear model with a flexible mean structure to capture spatially-varying trend and seasonality, while the residual spatial variability is modeled through a Dirichlet process mixture of low-rank spatial Student-t processes to efficiently handle high dimensional data with strong tail dependence. With our model, the bulk of the SST residuals influence tail inference and hotspot estimation only moderately, while our approach can automatically identify spatial extreme events without any arbitrary threshold selection. Posterior inference can be drawn efficiently through Gibbs sampling. Moreover, we will show how hotspots can be estimated from the fitted model, and how to make high-resolution projections until the year 2100, based on the Representative Concentration Pathways 4.5 and 8.5. Our results show that the estimated 95% credible region for joint high threshold exceedances includes large areas covering major endangered coral reefs in the southern Red Sea.

Bio: Raphaël Huser is an Assistant Professor of Statistics at the King Abdullah University of Science and Technology (KAUST), where he leads the Extreme Statistics (extSTAT) research group. He obtained his PhD degree in Statistics in 2013 from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, and he also holds a BS degree in Mathematics and an MS degree in Applied Mathematics from the same institution. His research mainly focuses on the development of novel statistical methodology for the modeling, prediction and assessment of risk related to spatio-temporal extremes arising in a wide range of geo-environmental applications, although he also has interests in other application areas.

April 9, 2021

Patrick Heimbach (Oden Institute for Computational Engineering and Sciences, University of Texas at Austin)

patrick_heimbach-250x300-min.jpg

[] []

Title: Augmenting a sea of data with dynamics: the global ocean parameter and state estimation problem

Abstract: Because of the formidable challenge of observing the full-depth global ocean circulation in its spatial detail and the many time scales of oceanic motions, numerical simulations play an essential role in quantifying patterns of climate variability and change. For the same reason, predictive capabilities are confounded by the high-dimensional space of uncertain inputs required to perform such simulations (initial conditions, model parameters and external forcings). Inverse methods optimally extract and blend information from observations and models. Parameter and state estimation, in particular, enables rigorously calibrated and initialized predictive models to optimally learn from sparse, heterogeneous data while satisfying fundamental equations of motion. A key enabling computational approach is the use of derivative information (adjoints and Hessians) for solving nonlinear least-squares optimization problems. Emerging capabilities are the uncertainty propagation from the observations through the model to key oceanic metrics such as equator-to-pole oceanic mass and heat transport. A related use of the adjoint method is the use of the time-evolving dual state as sensitivity kernel for dynamical attribution studies. I will give examples of the power of (i) property-conserving data assimilation for reconstruction, (ii) adjoint-based dynamical attribution, and (iii) the use of Hessian information for uncertainty quantification and observing system design.
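
In generic 4D-Var-style notation, the nonlinear least-squares problem behind such adjoint-based estimation takes the form below; the symbols are generic rather than specific to the system discussed in the talk.

J(\mathbf{u}) \;=\; \tfrac{1}{2}\sum_{k}
  \bigl[\mathbf{y}_k - H_k\!\bigl(\mathcal{M}_k(\mathbf{u})\bigr)\bigr]^{\mathsf T}
  \mathbf{R}_k^{-1}
  \bigl[\mathbf{y}_k - H_k\!\bigl(\mathcal{M}_k(\mathbf{u})\bigr)\bigr]
  \;+\; \tfrac{1}{2}\,(\mathbf{u}-\mathbf{u}_b)^{\mathsf T}\,\mathbf{B}^{-1}\,(\mathbf{u}-\mathbf{u}_b)

Here u collects the uncertain inputs (initial conditions, parameters, forcings), M_k is the model integrated to observation time k, H_k the observation operator, y_k the observations with error covariance R_k, and u_b, B the prior and its covariance. The adjoint supplies the gradient of J needed by the optimizer, and Hessian information underpins the uncertainty propagation and observing-system design mentioned above.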

 

Bio: Patrick Heimbach is a computational oceanographer at the University of Texas at Austin, with joint appointments in the Jackson School of Geosciences, the Institute for Geophysics, and the Oden Institute for Computational Engineering and Sciences. His research focuses on ocean and ice dynamics and their role in the global climate system. He specializes in the use of inverse methods applied to ocean and ice model parameter and state estimation, uncertainty quantification and observing system design.

Patrick earned his Ph.D. in 1998 from the Max-Planck-Institute for Meteorology and the University of Hamburg, Germany. Among his professional activities, Patrick serves on the National Academy of Sciences’ Ocean Studies Board, the CLIVAR/CliC Northern Ocean Regional Panel, and the US CLIVAR Ocean Uncertainty Quantification working group.


 

May 7, 2021

Daniela Huppenkothen (SRON Netherlands Institute for Space Research)

dhuppenkothen-250x300-min.jpg

[] []

Title: Unravelling the Physics of Black Holes Using Astronomical Time Series

Abstract: Black holes are at the heart of many open questions in astrophysics. They are prime laboratories to study the effects of strong gravity, and are thought to play a significant role in the evolution of the universe. Much of our knowledge of these sources comes from studies of black holes in X-ray binaries, where a black hole exists in a binary system with a star, and is observed through the radiation emitted by stellar material as it falls into the black hole. Of particular interest are their time series, measurements of their brightness as a function of time. Connecting properties of these (often stochastic) time series to physical models of how matter falls into black holes enables probes of fundamental physics, but requires sophisticated statistical methods commonly grouped under the term “spectral timing”. In addition, data analysis is often complicated by systematic biases introduced by the detectors used to gather the data.

In this talk, I will introduce black holes as important astrophysical sources and give an overview of the types of data we observe from them with X-ray telescopes. I will give an overview of spectral timing as an approach to characterizing the information of the physical system contained in these data sets, and present both the state-of-the-art and future directions of time series analysis for black holes. I will also present recent work on mitigating systematic biases in X-ray detectors using simulation-based inference and deep neural networks.

Bio: Daniela Huppenkothen is a staff scientist at the SRON Netherlands Institute for Space Research. Previously, she was Associate Director of the Center for Data-Intensive Research in Astrophysics and Cosmology (DIRAC) at the University of Washington. Before that, she spent time at New York University as a Moore-Sloan Data Science Postdoctoral Fellow, after receiving her PhD at the University of Amsterdam in Astronomy in 2014.

Daniela is interested in leveraging new statistical and computational methods to improve inference within astronomy and space science. Her current research focuses mostly on time series analysis across all parts of astronomy, including asteroids, neutron stars and black holes. She is interested in how we can use machine learning and statistics to mitigate biases introduced into our data by detectors and telescopes. She is lead developer of the open-source software project Stingray, which implements a collection of commonly used time series methods in astronomy. She is interested in finding new ways to teach data science to astronomers (often with candy), and she develops new strategies for facilitating interdisciplinary collaborations in her role as co-organizer of Astro Hack Week.


 

Summer/Fall 2020

July 10, 2020

Adam Sykulski (Department of Mathematics and Statistics, Lancaster University)

adam-sykulski-250x300-min.jpg

[]

Title: Stochastic modeling of the ocean using drifters: The Lagrangian perspective

Abstract: Drifter deployments continue to be a popular observational method for understanding ocean currents and circulation, with numerous recent regional deployments, as well as the continued growth of the Global Drifter Program. Drifter data, however, is highly heterogeneous, prone to measurement error, and captures an array of physical processes that are difficult to disentangle. Moreover, the data is “Lagrangian” in that each drifter moves through space and time, thus posing a unique statistical and physical modelling challenge. In this talk I will start by overviewing some novel techniques for preprocessing and interpolating noisy GPS data using smoothing splines and non-Gaussian error structures. We then examine how the interpolated data can be uniquely visualised and interpreted using time-varying spectral densities. Finally, we highlight some parametric stochastic models which separate physical processes such as diffusivity, inertial oscillations and tides from the background flow.
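
As a toy illustration of the kind of spectral summary involved, the snippet below forms the complex velocity u + iv from synthetic hourly data with an invented inertial peak and looks at its two-sided spectrum, where negative and positive frequencies separate clockwise from counterclockwise motion. None of this is the preprocessing or the stochastic modelling from the talk.

import numpy as np

rng = np.random.default_rng(0)
dt_hours = 1.0
t = np.arange(0, 30 * 24, dt_hours)                   # 30 days of hourly velocities
f_inertial = -1.0 / 17.0                              # cycles/hour, clockwise rotation
u = 0.2 * np.cos(2 * np.pi * f_inertial * t) + 0.05 * rng.normal(size=t.size)
v = 0.2 * np.sin(2 * np.pi * f_inertial * t) + 0.05 * rng.normal(size=t.size)

z = u + 1j * v                                        # complex velocity
spec = np.abs(np.fft.fft(z - z.mean())) ** 2 / z.size # two-sided (rotary-style) spectrum
freqs = np.fft.fftfreq(z.size, d=dt_hours)            # signed frequencies, cycles/hour

peak = freqs[np.argmax(spec)]
print(f"dominant signed frequency: {peak:.4f} cycles/hour (period ~ {1/abs(peak):.1f} h)")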


Bio: Adam is a Lecturer in Data Science at Lancaster University in the UK. Adam’s research interests are in time series analysis and spatial statistics, with a focus on spectral techniques using Fourier transforms. Adam’s main application area is in oceanography, but he also studies problems more broadly across geophysical and medical applications.

August 14, 2020

Tommaso Dorigo (INFN-Padova)

tommaso-dorigo-250x300-min.jpg

[] []

Title: Frequentist Statistics, the Particle Physicists’ Way: How To Claim Discovery or Rule Out Theories

Abstract: Fundamental research in particle physics progresses by investigating the merits of theories that describe matter and its interactions at the smallest distance scales, as well as by looking for new phenomena in high-energy particle collisions. The large datasets today commonly handled by experiments at facilities such as the CERN Large Hadron Collider, together with the well-defined nature of the questions posed to the data, have fostered the development of an arsenal of specialized Frequentist methods for hypothesis testing and parameter estimation, which strive for severity and calibrated coverage, and which enforce type-I error rates below 3 × 10^-7 for discovery claims. In this lecture I will describe the generalities and needs of inference problems at particle physics experiments, and examine the statistical procedures that allow us to rule out or confirm new phenomena.
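
The quoted discovery threshold is the one-sided Gaussian tail probability at five standard deviations, which is easy to check:

from scipy.stats import norm
print(norm.sf(5.0))            # ~2.87e-07, the p-value corresponding to "5 sigma"
print(norm.isf(2.87e-7))       # ~5.0, converting the p-value back to a significance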

Bio: Tommaso Dorigo is an experimental particle physicist who works as a First Researcher at the INFN in Italy. He obtained his Ph.D. in Physics in 1999 with a thesis on data analysis for the CDF experiment at the Fermilab Tevatron. After two years as a post-doctoral fellow at Harvard University, during which he contributed to the upgrade of the muon system of the CDF-II experiment, he has worked as a researcher for INFN in Padova, Italy.

He collaborates with the CMS experiment at the CERN LHC, where he is a member (formerly chair) of the experiment’s Statistics Committee. He is the author of several innovative algorithms and machine learning tools for data analysis in particle physics. From 2014 to 2019 Dorigo was the founder and scientific coordinator of the ETN “AMVA4NewPhysics”, which focused on training Ph.D. students in machine learning applications to physics. His current interests focus on end-to-end optimization of physics experiments and measurements with machine learning. He is also very active in science outreach with a , and in 2016 he published the book “Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab”.

September 11, 2020

Parker Holzer (Department of Statistics & Data Science, Yale University)

parker-holzer-250x300-min.jpg

[] []

Title: Discovering Exoplanets With Hermite-Gaussian Linear Regression

Abstract: One growing focus in modern astronomy is the discovery of exoplanets through the radial velocity (or Doppler) method. This method aims to detect an oscillation in the motion of distant stars, indicating the presence of orbiting planetary companions. Since the radial velocity imposed on a star by a planetary companion is small, however, such a signal is often difficult to detect. By assuming the relative radial velocity is small and using Hermite-Gaussian functions, we show that the problem of detecting the signal of exoplanets can be formulated as simple (weighted) linear regression. We also demonstrate the new Hermite-Gaussian Radial Velocity (HGRV) method on recently collected data for the star 51 Pegasi. In this demonstration, as well as in simulation studies, the HGRV approach is found to outperform the traditional cross-correlation function approach.
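
The linear-regression formulation can be illustrated with a toy single-line example: for a small Doppler shift, observed minus template is approximately minus the shift times the template derivative, and for a Gaussian absorption line that derivative is proportional to the first Hermite-Gaussian function. The numbers below are invented and this is not the actual HGRV estimator.

import numpy as np

rng = np.random.default_rng(0)
lam = np.linspace(5000.0, 5004.0, 400)                    # wavelength grid (Angstrom)
lam0, depth, width = 5002.0, 0.6, 0.3

def template(l):
    """Toy Gaussian absorption line in a normalized spectrum."""
    return 1.0 - depth * np.exp(-0.5 * ((l - lam0) / width) ** 2)

true_dlam = 2e-3                                          # tiny shift to recover
noise = 0.002
observed = template(lam - true_dlam) + noise * rng.normal(size=lam.size)

basis = np.gradient(template(lam), lam)                   # ~ first Hermite-Gaussian shape
residual = observed - template(lam)
w = 1.0 / noise ** 2                                      # (constant) inverse-variance weights

# Weighted least squares with a single regressor: observed - template ~ -dlam * basis
beta = np.sum(w * basis * residual) / np.sum(w * basis * basis)
dlam_hat = -beta
v_hat = 299792.458 * dlam_hat / lam0                      # km/s
print(f"recovered shift: {dlam_hat:.2e} Angstrom (~{v_hat * 1000:.1f} m/s)")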

Bio: I am currently a Ph.D. student in the Department of Statistics & Data Science at Yale University. I did my undergraduate degree at the University of Utah as a double major in Mathematics and Applied Physics. My research primarily centers on applying statistics to astronomy, with a current focus on exoplanet detection. I am married with a 1-year-old son and another son expected in January.

October 9, 2020

Amy Braverman (Jet Propulsion Laboratory, California Institute of Technology)

braverman-300x300-min.jpg

[] []

Title: Post-hoc Uncertainty Quantification for Remote Sensing Observing Systems

Abstract: The ability of spaceborne remote sensing data to address important Earth and climate science problems rests crucially on how well the underlying geophysical quantities can be inferred from these observations. Remote sensing instruments measure parts of the electromagnetic spectrum and use computational algorithms to infer the unobserved true physical states. However, the accompanying uncertainties, if they are provided at all, are usually incomplete. There are many reasons why, including but not limited to unknown physics, computational artifacts and compromises, and unknown uncertainties in the inputs.

In this talk I will describe a practical methodology for uncertainty quantification of physical state estimates derived from remote sensing observing systems. The method we propose combines Monte Carlo simulation experiments with statistical modeling to approximate conditional distributions of unknown true states given point estimates produced by imperfect operational algorithms. Our procedure is carried out post-hoc; that is, after the operational processing step because it is not feasible to redesign and rerun operational code. I demonstrate the procedure using four months of data from NASA’s Orbiting Carbon Observatory-2 mission, and compare our results to those obtained by validation against data from the Total Carbon Column Observing Network where it exists.
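
A toy version of the general simulate-then-model recipe looks like this, with a made-up scalar state and retrieval in place of the OCO-2 system, and deliberately the simplest possible statistical model.

import numpy as np

rng = np.random.default_rng(0)
n = 50_000
truth = rng.normal(400.0, 10.0, size=n)                   # hypothetical true state (e.g. ppm)
estimate = 0.9 * truth + 42.0 + rng.normal(0.0, 2.0, n)   # biased, noisy "operational" estimate

# Model truth | estimate with a linear fit plus empirical residual quantiles.
A = np.column_stack([np.ones(n), estimate])
coef, *_ = np.linalg.lstsq(A, truth, rcond=None)
resid = truth - A @ coef
lo, hi = np.quantile(resid, [0.025, 0.975])

x_new = 400.0                                             # a retrieved value to correct
mean = coef[0] + coef[1] * x_new
print(f"approx. 95% interval for the true state: ({mean + lo:.1f}, {mean + hi:.1f})")

The real methodology has to contend with high-dimensional states, realistic radiative transfer and operational retrieval code, which is where the Monte Carlo experiment design and the statistical modeling described in the talk come in.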

Bio: Amy Braverman is Principal Statistician at the Jet Propulsion Laboratory, California Institute of Technology. She received her Ph.D. in Statistics from UCLA in 1999. Prior to that she earned an M.A. in Mathematics, also from UCLA (1992), and a B.A. in Economics from Swarthmore College in 1982. From 1983 to 1990, she worked in litigation support consulting for two different firms in Los Angeles. Her research interests include massive data set analysis, spatial and spatio-temporal statistics, data fusion, decision making in complex systems, and uncertainty quantification.

October 23, 2020

Collin Politsch (Machine Learning Department, 麻豆村)

collin-politsch-250x300-min.jpg

[] []

Title: Three-dimensional cosmography of the high redshift Universe using intergalactic absorption

Abstract: The Lyman-α forest – a dense series of hydrogen absorptions seen in the spectra of distant quasars – provides a unique observational probe of the redshift z>2 Universe. The density of spectroscopically measured quasars across the sky has recently risen to a level that has enabled secure measurements of large-scale structure in the three-dimensional distribution of intergalactic gas using the inhomogeneous hydrogen absorption patterns imprinted in the densely sampled quasar sightlines. In principle, these modern Lyman-α forest observations can be used to statistically reconstruct three-dimensional density maps of the intergalactic medium over the massive cosmological volumes illuminated by current spectroscopic quasar surveys. However, until now, such maps have been impossible to produce without the development of scalable and statistically rigorous spatial modeling techniques. Using a sample of approximately 160,000 quasar sightlines measured across 25 percent of the sky by the SDSS-III Baryon Oscillation Spectroscopic Survey, here we present a 154 Gpc^3 large-scale structure map of the redshift 1.98≤z≤3.15 intergalactic medium — the largest volume large-scale structure map of the Universe to date — accompanied by rigorous quantification of the statistical uncertainty in the reconstruction.

Bio: Collin is a Postdoctoral Fellow in the Machine Learning Department at 麻豆村. He received his joint Ph.D. in Statistics and Machine Learning from 麻豆村 in the summer of 2020 with his thesis titled "Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe." Prior to that, he received a M.Sc. in Machine Learning from 麻豆村 in 2017 and a B.Sc. in Mathematics from the University of Kansas in 2014. His research interests include applications of statistical machine learning methods to problems in astrophysics, spatio-temporal data analysis, uncertainty quantification, and forecasting COVID-19.

November 13, 2020

Murali Haran (Department of Statistics, Pennsylvania State University)

murali-haran-250x300-min.jpg

[] []

Title: Statistical Methods for Ice Sheet Model Calibration

Abstract: In this talk I will consider the scientifically challenging task of understanding the past and projecting the future dynamics of the Antarctic ice sheet; this ice sheet is of particular interest as its melting may lead to drastic sea level rise. The scientific questions lead to the following statistical and computational question: How do we combine information from noisy observations of an ice sheet with a physical model of the ice sheet to learn about the parameters governing the dynamics of the ice sheet? I will discuss two classes of methods: (i) approaches that perform inference based on an emulator, which is a stochastic approximation of the ice sheet model, and (ii) an inferential approach based on a heavily parallelized sequential Monte Carlo algorithm. I will explain how the choice of method depends on the particulars of the questions we are trying to answer, the data we use, and the complexity of the ice sheet model we work with. This talk is based on joint work with Ben Lee (George Mason U.), Won Chang (U of Cincinnati), Klaus Keller, Rob Fuller, Dave Pollard, and Patrick Applegate (Penn State Geosciences).
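
A toy version of approach (i) can be sketched in a few lines: a made-up scalar “simulator” in place of the ice sheet model, a GP emulator fitted to a handful of runs, and a grid posterior for a single parameter. Nothing here is the authors’ code.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(theta):                        # stand-in for an expensive model run
    return theta ** 3 - theta

design = np.linspace(0.0, 2.0, 12).reshape(-1, 1)           # small ensemble of runs
runs = simulator(design).ravel()
emulator = GaussianProcessRegressor(kernel=RBF(0.5), normalize_y=True).fit(design, runs)

y_obs, obs_sd = simulator(np.array([1.5]))[0] + 0.05, 0.1   # one noisy "observation"
grid = np.linspace(0.0, 2.0, 401).reshape(-1, 1)
mean, std = emulator.predict(grid, return_std=True)

# Gaussian likelihood with emulator uncertainty added to observation error; flat prior.
var = obs_sd ** 2 + std ** 2
log_post = -0.5 * (y_obs - mean) ** 2 / var - 0.5 * np.log(var)
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mode for the parameter:", grid.ravel()[np.argmax(post)])

The key design point, carried over to the real problem, is that the emulator’s own uncertainty enters the likelihood, so parameter regions where the emulator is poorly informed are not over-penalized.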

Bio: Murali Haran is Professor and Head of the Department of Statistics at Penn State University. He has a PhD in Statistics from the University of Minnesota, and a BS in Computer Science (with minors in Statistics, Mathematics and Film Studies) from 麻豆村. His research interests are in Monte Carlo algorithms, spatial models, the statistical analysis of complex computer models, and interdisciplinary research in climate science and infectious diseases.

December 2020

Jenni Evans (Department of Meteorology & Atmospheric Science, Pennsylvania State University)

jenni-evans-250x300-min.jpg

[]

Title: Unscrambling ensemble simulations to improve hurricane forecasts

Abstract: In November 2020, Hurricane Iota made landfall in Nicaragua, 15 miles south of where Hurricane Eta had crossed the coast less than 2 weeks earlier. Like Eta, Iota was a Category 4 hurricane at landfall, with maximum sustained winds near 155 mph. In a situation like Eta or Iota, devastation follows landfall due to a combination of winds, rainfall, flooding and mudslides. The storm’s ultimate impact depends on its track, its intensity and its structure. An accurate hurricane forecast can save countless lives. In the drive to produce accurate hurricane forecasts, meteorologists developed detailed deterministic models and refined them endlessly, but large forecast errors still occurred. Modelers began running permutations of the deterministic models tens, or even hundreds, of times. These ensemble simulations of hurricane evolution provide a measure of the uncertainty in the forecast, but translating them into a forecast can mean that much information is lost. I will discuss how we can synthesize the information in the ensemble objectively, and show that the resulting partition distinguishes between different synoptic situations, preserving information on the sources of the uncertainty in the forecast. Examples will be drawn from two US landfalling hurricanes: Hurricane Sandy (October/November 2012) and Hurricane Harvey (August 2017).
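
The partitioning described above can be illustrated generically with k-means on flattened coordinates of a synthetic two-scenario ensemble; the partitioning approach actually used in the talk may differ.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_members, n_times = 50, 24
t = np.linspace(0.0, 1.0, n_times)

tracks = []
for _ in range(n_members):
    recurving = rng.random() < 0.5                        # two invented synoptic scenarios
    lon = -60.0 - 15.0 * t + (25.0 * t ** 2 if recurving else 0.0) \
          + 0.1 * rng.normal(0, 0.5, n_times).cumsum()
    lat = 15.0 + 12.0 * t + (8.0 * t ** 2 if recurving else 0.0) \
          + 0.1 * rng.normal(0, 0.5, n_times).cumsum()
    tracks.append(np.column_stack([lon, lat]).ravel())
tracks = np.array(tracks)                                 # shape: (members, 2 * n_times)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tracks)
for k in range(2):
    mean_track = tracks[labels == k].mean(axis=0).reshape(n_times, 2)
    print(f"cluster {k}: {np.sum(labels == k)} members, mean final position "
          f"(lon {mean_track[-1, 0]:.1f}, lat {mean_track[-1, 1]:.1f})")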

Bio: Jenni L. Evans is the Director of Penn State’s Institute for Computational and Data Sciences (ICDS), Professor of Meteorology & Atmospheric Science and served as Centennial President of the American Meteorological Society (AMS) in 2019. Evans earned both her undergraduate and doctoral degrees in applied mathematics at Monash University. The Institute for Computational and Data Sciences (ICDS) is a pan-university research institute and is also the home of Penn State’s high performance computing facility. ICDS jointly employs over 30 tenure track faculty and supports researchers across the disciplinary spectrum.

In addition to serving as the AMS Centennial President, Dr. Evans is a Fellow of the American Association for the Advancement of Science and of the American Meteorological Society. She has served on numerous national and international committees and has long been the meteorologist on an interdisciplinary team of scientists and actuaries that advises the State of Florida by auditing catastrophe risk models for hurricanes and flood.

Evans’ research spans tropical climate, climate change, and hurricane lifecycles in the tropics, as well as hurricanes that undergo “extratropical transition” (like Hurricane Sandy in 2012) and sonification, the “music of hurricanes.” She uses high performance computing for simulations of hurricanes, together with machine learning and advanced statistical techniques, to study the formation of hurricanes in the tropics and subtropics, methods for improving hurricane forecasts, the theory of the limiting intensity of hurricanes and how this could change with climate change, and the use of climate models to understand the impacts of climate change on our daily lives.