Author: Omar Wani. He no longer works at Eawag. Please contact firstname.lastname@example.org for further information.
Jörg Rieckermann (Urban Hydrology group leader in SWW, ex-supervisor during my PhD) and I (Omar Wani, PostDoc in SWW) are taking a good-bye walk along the Chriesbach creek before I leave Eawag. We reflect on challenges for statistically-inclined hydrologists, opportunities related to data-driven models and the collaborative research environment at Eawag.
Omar, you did your PhD at Eawag in an "initial training network" (ITN), in the project "Quantifying Uncertainty in Integrated Catchment Studies", in short QUICS. Before we talk about hydrology, can you tell our readers what is special about an ITN?
In 2014, I joined the collaborative QUICS project, which focused on the proper quantification of uncertainty in hydrologic models, mainly in urban hydrology. ITNs, beyond producing quality research, aim at producing quality researchers. Given the extent of scientific collaboration across countries and continents, it was a great experience. Apart from engaging in scientific research, we also had to draw up a personal development plan that would benefit our research careers after we finished our PhDs. We had to disseminate our research, engage with the partner institutes, and collaborate with industry. The goal was to make scientific research trickle down to society.
It was an excellent opportunity to collaborate and learn. For example, we conducted a workshop at the University of Kashmir and another at the Indian Institute of Technology, Bombay. I was able to go on a three-month research visit to Caltech, USA. That eventually helped me secure one of my next postdocs there.
And regarding your research in this ITN: the quantification of uncertainty in hydrology, why is this relevant?
In environmental sciences generally, and in hydrology particularly, our models - our mathematical descriptions of various phenomena - are approximate. This is because the phenomena we study usually comprise several sub-processes and have spatially variable details which we cannot reasonably incorporate in our descriptions. But, still, we have to make very important decisions based on the predictions of our models. The impact of these phenomena, let's say flooding or combined sewer overflows, is substantial. So what we would like to have is a quantitative understanding of how much we can trust these models. That allows us to hedge our bets; we do not discount any probable outcomes by just focusing on one. And this holistic view is very important to make more rational, risk-based decisions.
Figure 1: Left: Navigating Uncertainty (acrylic on canvas): The painting depicts the flow of a river—capturing the confluence of rainfall-runoff generated from natural and urban landscapes. The graphical overlay represents hydrologic forecasts and the associated uncertainties (Painting by Sourabh Gupta. Source: Wani, 2018). Right: the heavily engineered section of Chriesbach creek at Eawag in autumn 2020.
If you look back - working with all these uncertainties, what was the biggest problem you struggled with during your PhD?
The questions around imperfect models and uncertainty do not belong to water sciences per se. A lot of advanced scientific work has already been done on model uncertainties in the field of statistics (Reichert and Mieleitner 2009; Del Giudice et al. 2016; Albert et al. 2016). As an applied scientist, someone who starts working freshly on this problem, the question is how to avoid reinventing the wheel and doing a bad job at it. I would rather approach people who have been working on techniques and statistical methods in general, seek their advice and then tailor those techniques to our specific problem. But this meant I had to do a balancing act. I was wearing the hat of a water scientist - someone who works with problems related to water - and, at the same time, I had to develop expertise in advanced statistical techniques, before tailoring them to our field.
Yes, I fully agree, but on a technical level, talking about non-convergence of our Markov-Chain samplers, talking about copulas (Wani et al. 2019): What do you think was most challenging?
Very specifically: finding an adequate error model. So far, we as water scientists have been working within the framework of deterministic models. For example, for a certain input rainfall or input pollutant concentration, we expect a certain fixed response from the stream discharge or water quality model. And the way we generally incorporate uncertainty is by developing error models over and above our deterministic models. Typically, we have a model that describes the physics of the system, and then we add some statistical model on top of it. The problem is that we can't have off-the-shelf, one-size-fits-all error models for our engineering applications. It stays a pipe dream - the quest for a certain class of error models that describes structural and observational errors together. And time and again, we have found out that, while devising error models, if you gain somewhere you lose somewhere else. This was also my realization towards the end of my PhD ... we can't compensate for the deficit in our physics-based model by just adding an error model to it. The error model's assumptions also need to be tested [Figure 2]. So there is this meta-uncertainty about your uncertainty analysis. However, that is why the field of statistical hydrology is ripe for some exciting new developments and discoveries.
Figure 2: Flexible error description cannot fix model structure deficits of a process-based rainfall-runoff model (didactical example from Wani et al. (2019) using synthetic data). (top) Model deficits in low flows inducing biased parameters (2) can be avoided by using the flexibility of copula-based likelihood during inference (1).
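As an illustrative aside, the "deterministic model plus error model" setup described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method used in Wani et al. (2019): the linear-reservoir model, the additive AR(1) Gaussian error model, and all names and parameter values (`linear_reservoir`, `ar1_errors`, `predictive_band`, `k`, `sigma`, `phi`) are hypothetical choices made for illustration.

```python
import random

def linear_reservoir(rain, k=0.3, s0=0.0):
    """Toy deterministic model (assumed): outflow is proportional to storage."""
    storage, flows = s0, []
    for r in rain:
        storage += r          # rainfall enters the storage
        q = k * storage       # outflow during this time step
        storage -= q
        flows.append(q)
    return flows

def ar1_errors(n, sigma=0.2, phi=0.8, rng=random):
    """Autocorrelated Gaussian errors with stationary standard deviation sigma."""
    innovation_sd = sigma * (1.0 - phi ** 2) ** 0.5
    e, errors = rng.gauss(0.0, sigma), []
    for _ in range(n):
        errors.append(e)
        e = phi * e + rng.gauss(0.0, innovation_sd)
    return errors

def predictive_band(rain, n_samples=500, lower=0.05, upper=0.95, seed=1):
    """Add sampled errors to the deterministic output and return a 90% band."""
    rng = random.Random(seed)
    det = linear_reservoir(rain)
    samples = [
        [q + e for q, e in zip(det, ar1_errors(len(det), rng=rng))]
        for _ in range(n_samples)
    ]
    band = []
    for t in range(len(det)):
        values = sorted(s[t] for s in samples)
        band.append((values[int(lower * (n_samples - 1))],
                     values[int(upper * (n_samples - 1))]))
    return det, band

rain = [0, 5, 10, 2, 0, 0, 0, 0]   # made-up rainfall series
det, band = predictive_band(rain)
for q, (lo, hi) in zip(det, band):
    print(f"flow={q:5.2f}  90% band=({lo:5.2f}, {hi:5.2f})")
```

At the first time step the deterministic flow is zero, so the lower edge of the band is negative: a simple additive Gaussian error model can assign probability to negative discharge. This is one small instance of the point made in the interview that an error model's assumptions have to be checked against the application.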
So, essentially, the error model cannot fix a bad model?
It cannot fix a bad model, and it cannot always be adequate. You cannot have a single error model that adequately describes the errors in all the kinds of deterministic models that we use in water resources. Besides, we also need to test these error models.
Is there anything which you can recommend to the guys who come after you?
I think if you're talking about statistics, in environmental sciences, and particularly in water sciences, one of the recommendations I can provide is this: collaborate! Talk to the statisticians and talk to the water scientists. If you are interested in tools from advanced statistics, eventually they will be supplied by the statisticians and you would need water scientists to supply you with some of the most interesting and urgent problems. As someone working at the intersection, you might end up doing a good job in bringing eclectic things together and perhaps discover a new solution.
So for me as a sewer researcher, interested in uncertainty, there will always be one who is more clever and more knowledgeable about sewers. And there will always be one who knows more about uncertainty. But if we team up with the specialists - this will be unique.
Right. Because it's not always about being the cleverest in a particular domain. It's about having the vision to solve a problem. If I have to make an analogy, it's like directing a movie. You may not be the best cinematographer out there and you may not be an actor. And perhaps you may not even write the screenplay of your movie. But as a director, you have the vision. You bring the best people together and finally have a product. So, I definitely think that the ability to have that vision will always be desirable.
So, Goethe is dead now?
Goethe is dead. Da Vinci is dead and Aristotle is dead. People who could virtually do all the different things, the polymaths, are dead. Now, it's the age of specialists.
Maybe regarding modern specialists: method-wise, I was thinking that machine learning is a big hype nowadays. What is your very personal opinion on using machine learning in hydrology?
Well, there are reasons why there is a hype surrounding data-driven techniques. Machine learning is one of the terms that represents this suite of techniques. But, like some of my other colleagues, I prefer the label “data-driven”. Over the past decade, these techniques have been shown to beat many previous benchmarks (Kratzert et al. 2018). So naturally, the enthusiasm or “hype” is justified and deserved. But, at the same time, not all our problems are problems of prediction. What we as water scientists are after is not just data-fitting, a mapping between input and output datasets. Regression and classification are relatively small subsets of all the problems that water scientists study. Let’s say, the other problems around design, around climate change, questions on how catchments change due to urban growth and questions around sustainability – all these modeling questions still stay. And if I want to know what happens in extrapolations, where I want to play with a parameter which has a physical meaning, the physics-based models stay. So I do think machine learning is certainly a game changer and a very important addendum. But I don't think we have to pit it against other sub-branches of the field. Their relevance will stay. However, at the same time, while defending hydrologic theory, one should not be reticent towards machine learning or slow in its proper adoption. Otherwise, a lot of good would be lost. We definitely have to embrace these tools and start collaborations with computer scientists, which I guess is already happening more frequently and with greater enthusiasm.
Coming back to the ITN and your fellow PhD colleagues, and maybe from the perspective of a young researcher out there who is thinking about working with us: in comparison to the other institutions you have experienced, what was it like to work at Eawag?
In terms of aquatic sciences, Eawag is a renowned and respected name. There is quite a lot of cool research going on. Hydrology, fish ecology, limnology, systems analysis, process engineering, urban water management and so much more - you name it! That is really special. And I think one of the other things that's really special about Eawag is that, when it comes to scientific problems, you work as equals: the professor, the group leader and the PhD student work as colleagues. The organizational hierarchy is not always important. You can approach and utilize the expertise of your supervisor, you can walk into another department and talk to, let’s say, a physicist, and then you can walk into yet another department to talk to social scientists. That is a different framework than being affiliated only with a university, where you are "just a student".
Great! Maybe really briefly: where are you heading now?
I have been offered a joint Postdoc by UC Berkeley and Caltech. I will continue to work at the intersection of advanced statistical data analysis and hydrology. At Berkeley, I will study large hydrologic data sets, mainly on flow and storage. Among other things, I will use causal inference to investigate patterns and links between different time series. And then at Caltech, I will be working on the proper representation of uncertainty in geomorphological models, with the Mississippi Delta as a case study.
Sounds exciting. Thanks, Omar for the talk and all the best.
Thank you very much, Jörg. Thank you for having me.
Oh, and one last thing! You once shared with me this image, where you were sitting in some street corner in Kashmir behind a small stand? Can you explain what this image is about?
Figure 3: Amateur tea seller beside Lake Dal in Srinagar, Kashmir
Certainly, that was really fun. I had just finished my PhD, I went home to Kashmir and I wanted to do something more hands-on: learn a new skill or perhaps hone an old one. It so happened that I decided to participate in a short “workshop” of sorts, where I learned how to make and sell tea on the streets. My teacher, an ace tea maker, is a street vendor. He sells tea beside Lake Dal in Srinagar. I had approached him earlier and told him I was interested in learning the trade. I asked whether I could help him on one of the days. He replied: “of course, why not?” And then on a mutually agreed day, I ended up making and selling tea as an apprentice. Publishing a good paper takes a year or more. As opposed to that, by selling tea, you can provide a little bit of satisfaction to somebody else almost immediately. So it was a fun detour from my usual engagements and a very rewarding experience.
- Albert, C., Ulzega, S., Stoop, R., 2016. Boosting Bayesian parameter inference of nonlinear stochastic differential equation models by Hamiltonian scale separation. Physical Review E 93(4), 043313. https://arxiv.org/pdf/1509.05305
- Del Giudice, D., Albert, C., Rieckermann, J., Reichert, P., 2016. Describing the catchment-averaged precipitation as a stochastic process improves parameter and input estimation. Water Resour. Res. 52, 3162–3186. https://doi.org/10.1002/2015WR017871
- Kratzert, F., Klotz, D., Brenner, C., Schulz, K., Herrnegger, M., 2018. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrology and Earth System Sciences 22, 6005–6022. https://doi.org/10.5194/hess-22-6005-2018
- Reichert, P., Mieleitner, J., 2009. Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters. Water Resources Research 45. https://doi.org/10.1029/2009WR007814
- Wani, O., Scheidegger, A., Cecinati, F., Espadas, G., Rieckermann, J., 2019. Exploring a copula-based alternative to additive error models—for non-negative and autocorrelated time series in hydrology. Journal of Hydrology 575, 1031–1040. https://doi.org/10.1016/j.jhydrol.2019.06.006
- Wani, O., 2018. Statistical Methods for Better Hydrologic Predictions—Improving Parameter and Uncertainty Estimation (Doctoral Thesis). ETH Zurich. https://doi.org/10.3929/ethz-b-000315756