I was asked to sit on a discussion panel at this meeting on uncertainty quantification (UQ) with exascale computing. I prepared a short statement (below) on future challenges for UQ at exascale, but I would have made a slightly longer one (below that) if there had been time.
Thank you for the invitation to be a panel member. I’m Doug McNeall, I work at the Met Office and at the University of Exeter as a climate scientist.
At the Met Office I sit at the interface between data scientists, model developers and climate analysts. We’re trying to build the best possible simulators of the natural world, and then inform stakeholders and decision makers of the consequences of a changing climate, along with the corresponding uncertainty.
I think an exciting potential of exascale computing is that it offers the large number of model runs that we might use to better build surrogate models that we use for UQ, even while pursuing the traditional weather and climate strategy of increasing resolution to remove uncertainty in parameterisations.
My main challenge is that we might run up against human limits – we need to develop the tools and workflows that make it easy for UQ practitioners to understand what domain experts and model developers need, and vice versa. It has been great to see some of the speakers addressing the development of these tools and workflows directly today.
I build surrogate models – emulators – and the methods I use often sample hundreds of thousands or millions of times from a statistical model, built using a small sample of representative simulator runs. So my first thought when I considered what exascale computing might enable with regard to uncertainty quantification (UQ) was that we might just replace these samples with lots of directly calculated ensemble members, and I could just retire early.
Unfortunately, I spoke to some of my colleagues in climate science, and they informed me I was likely to be working for a good while yet.
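For readers who haven’t met an emulator before, here is a minimal sketch of the workflow described above: run the simulator a handful of times, fit a cheap statistical model to those runs, then sample the statistical model as many times as you like. Everything in it is an illustrative assumption rather than anything we run at the Met Office: `toy_simulator` is a made-up one-parameter stand-in for a climate model, the emulator is a plain zero-mean Gaussian process with a squared-exponential kernel, and the kernel settings are fixed by hand rather than estimated.

```python
import numpy as np

# Hand-picked kernel settings (assumptions; in practice these are estimated).
LENGTH_SCALE = 0.3
PRIOR_VARIANCE = 1.0

def toy_simulator(x):
    """Hypothetical stand-in for an expensive simulator with one input."""
    return np.sin(3.0 * x) + 0.5 * x

def rbf_kernel(a, b):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return PRIOR_VARIANCE * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)

def fit_emulator(x_train, y_train, jitter=1e-8):
    """Fit a zero-mean Gaussian process to a small set of simulator runs."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed via the Cholesky factor for stability.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return x_train, L, alpha

def predict(emulator, x_new):
    """Return the emulator's posterior mean and variance at new inputs."""
    x_train, L, alpha = emulator
    K_s = rbf_kernel(x_new, x_train)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = PRIOR_VARIANCE - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # A small design of "expensive" simulator runs.
    x_train = np.linspace(0.0, 2.0, 10)
    emulator = fit_emulator(x_train, toy_simulator(x_train))

    # Many cheap samples from the emulator in place of simulator runs.
    x_samples = rng.uniform(0.0, 2.0, size=100_000)
    mean, var = predict(emulator, x_samples)

    print("fraction of samples with output above 1.0:", np.mean(mean > 1.0))
    print("largest emulator standard deviation:", np.sqrt(var.max()))
```

The variance is the part that matters for UQ: it tells you where the emulator is guessing, and so where an extra simulator run would be most valuable.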
What are the main challenges, roadblocks and bottlenecks when performing UQ in climate science given exascale resources? How do we get ready to use those resources?
In many ways, things might be similar to now, … but more so.
When allocating computational resources, we have the axes of resolution (both spatial and temporal), model complexity and raw number of ensemble members. The needs of climate scientists – and the stakeholders and decision makers that they inform – suggest occupation of a particular region of the space those axes define. As a UQ practitioner, I would argue that we have spent a considerable amount of those resources on resolution and model complexity, sometimes to the detriment of UQ. Some of our major UQ breakthroughs in large climate modelling projects have used fewer than 20 ensemble members. More recently, we’ve started to use larger ensembles, of a few tens to a few hundred members.
Travelling along the resolution and complexity axes can have benefits for UQ too though. A ~100m resolution would permit convection schemes in climate models, allowing updrafts, and permitting eddies in the ocean. Processes that would otherwise be parameterised (with associated uncertainty) become explicitly resolved, potentially removing the parameterisation uncertainty. Biases that appear because the model is missing an entire process disappear. The temptation to push as far as we can down these axes is understandable, as suddenly our stakeholders can see the systems they care about (watersheds, forests and fire regions, cities and agricultural land) directly appear in the models.
But as my colleague Lizzie Kendon – currently developing the highest resolution climate projections – points out, 100m resolution is not a silver bullet. The push to ever-increasing fidelity can simply expose more processes that need to be modelled. More uncertainties need to be understood, and quantified. At the moment, my colleague suggests that getting to the km scale is seen as a good target, at which point other issues become more important sources of uncertainty.
For a start, there are important uncertainties in the physics of some processes that might act as a barrier to development. Climate models famously over-represent “drizzle” at the expense of the more intense rainfall events that we see in reality. High resolution atmospheric components might have to couple with relatively coarse, heterogeneous and perhaps underdeveloped land surface components. (As the land surface is less important on short timescales, we won’t find ready-made versions coming from weather models either.)
One bottleneck that might prove hard to overcome in climate applications is the necessity of having large volumes of data to input and output. Starting conditions for the model are critical – for example, we might want to run a small domain at high resolution, nested within a global model, and so need initial conditions at the boundary. Our customers might not know exactly what information they want kept as output from a model, or we may discover new parties interested in our climate model output that we cannot yet account for.
A model that runs at lightning speed may offer us solutions to these problems, but at the same time, the sheer volume of model output potentially available might cause its own problems – particularly as users become more adept at using uncertainty information and necessarily large ensembles.
From a personal perspective now, I work at the interface between statisticians, model developers, and people with various interests in the impacts of climate change (stakeholders, decision makers and analysts). What do I see as the tools we need to take the best advantage of exascale computing?
A huge challenge at the moment is finding ways to best inform the modeller how well the model represents reality, and, crucially, what they could do to improve it. The models are complex, and the number of parameters that might be tweaked is very large. Finding ways to cut through the uncertainty and offer the modeller a clear path to improvement is important. This will involve clever statistical and ML algorithms, but also understanding which outputs of the model are most important to get right (understanding the climate system, understanding the impacts of changing the system), and understanding the way humans receive, process, and act on information (understanding the modeller).
We need example workflows from the design of computer experiments through to the information delivery to the stakeholder. Truly multivariate emulators are needed, perhaps through new dimension reduction techniques, able to work with heterogeneous data and simple enough to be at least understood (and possibly applied) by modellers. We need better information and uncertainty summaries to feed back to our modellers.
For UQ at exascale to work well, we need long-term projects, where climate modellers and data scientists work together, where they get the chance to understand each other’s needs, and together develop the tools for extracting the best information from the firehose of data that exascale computing will bring.