Predicting ocean organic carbon fluxes using machine learning and earth observations
A. P. Martin (Lead, NOC), B. B. Cael (NOC), S. Henley (U. Edinburgh), S. A. Henson (NOC)
Scientific background and motivation: Atmospheric carbon dioxide (CO2) dissolved in seawater is used by phytoplankton to grow. As phytoplankton are eaten and their carbon passes through the food web, the resulting organic detritus sinks into the ocean interior as ‘marine snow’. Without this ‘biological carbon pump’, atmospheric CO2 concentrations could be 50% higher (Parekh et al., 2006). The depth to which marine snow sinks before it is remineralised back into CO2 is crucial: the deeper this happens, the longer the carbon is kept away from the atmosphere. However, the controls on the magnitude and attenuation of this flux are still uncertain, and a consistent global picture remains elusive. Nevertheless, there is evidence for a major influence of the phytoplankton population, for which there is an increasing number of satellite products. Machine learning (ML) provides a powerful way of combining these data to infer key global carbon fluxes from space.

Aim and objectives of the PhD project: The aim of this project is to apply machine learning to satellite datasets to determine the oceanic flux of marine snow globally.
Objectives:
(1) To determine how accurately the flux of marine snow can be estimated by combining global satellite remote sensing data using machine learning techniques.
(2) To quantify the relative influence of different surface phytoplankton-related parameters in controlling the carbon flux and its attenuation.
(3) To compare these results to output from the existing range of global biogeochemical models to highlight weaknesses and to help develop the next generation of climate models.
Methodology: We know that the flux of marine snow leaving the surface, and its attenuation with depth, vary geographically and are strongly influenced by the local surface phytoplankton community. Several satellite products now provide relevant phytoplankton data: (1) primary production gives the rate at which source material is produced; (2) chlorophyll and particulate carbon products (Stramski et al., 2008; Balch et al., 2005) both estimate the amount of source material; (3) the particle size spectrum (Kostadinov et al., 2009) is linked to the sinking speed of organic material; and (4) phytoplankton functional groups (e.g. Mouw et al., 2017) have been linked to both sinking speed and rate of attenuation. Combining these datasets to estimate carbon fluxes demands the flexibility of ML in handling non-linear systems with unevenly distributed data, both of which are characteristic of ocean ecosystems. For example, random forests have recently been used to estimate organic C fluxes from in situ data (Clements et al., 2021). This has not yet been done using satellite data, despite the step change in understanding it would provide. The student will train a random forest on half of the global dataset of sinking organic carbon fluxes compiled by Mouw et al. (2016). This dataset contains over 15,000 samples, but they are very unevenly distributed. The remaining half will be used to test the model's predictive skill. Different splits of the data, including bootstrapping, will allow the uncertainties on predictions to be quantified. The relative importance of drivers will be determined, e.g. by summing the impurity decreases attributed to each driver across the forest (Gini importance) and by permuting each driver in turn and measuring the loss of predictive skill. Correlated drivers will be addressed using the conditional variable importance method (Strobl et al., 2008). The ML model will be used to construct a 3D global map of fluxes and key drivers.
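The train/test split and permutation-based driver ranking described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the Mouw et al. (2016) compilation), the three driver names are hypothetical, and the small hand-written forest with median splits stands in for whatever random forest library the project would actually use. Importance is measured here on the held-out half, which is one of several possible choices.

```python
import random
import statistics

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, depth, rng, n_try):
    """Grow a regression tree; each split tries n_try randomly chosen drivers."""
    if depth == 0 or len(rows) < 4:
        return statistics.fmean(targets)  # leaf: mean flux of its samples
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        thr = statistics.median(r[f] for r in rows)  # simplified: median split
        left = [i for i, r in enumerate(rows) if r[f] <= thr]
        right = [i for i, r in enumerate(rows) if r[f] > thr]
        if not left or not right:
            continue
        cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, f, thr, left, right)
    if best is None:
        return statistics.fmean(targets)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          depth - 1, rng, n_try)

    return (f, thr, grow(left), grow(right))

def predict_tree(node, row):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(rows, targets, n_trees=30, depth=4, seed=0):
    """Fit each tree to a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], depth, rng, n_try=2))
    return forest

def rmse(forest, rows, targets):
    """Root-mean-square error of the forest-averaged prediction."""
    errors = [(statistics.fmean(predict_tree(t, r) for t in forest) - y) ** 2
              for r, y in zip(rows, targets)]
    return statistics.fmean(errors) ** 0.5

def permutation_importance(forest, rows, targets, feature, rng):
    """Increase in RMSE when one driver's values are shuffled across samples."""
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return rmse(forest, shuffled, targets) - rmse(forest, rows, targets)

# Synthetic stand-in data: driver 0 dominates, driver 1 is weak, driver 2 is noise.
names = ["primary_production", "particle_size_slope", "unrelated_noise"]  # hypothetical
rng = random.Random(42)
rows = [[rng.random() for _ in names] for _ in range(400)]
targets = [5.0 * r[0] + 1.0 * r[1] + rng.gauss(0.0, 0.1) for r in rows]

# Train on one half, test on the other.
order = list(range(len(rows)))
rng.shuffle(order)
train, test = order[:200], order[200:]
forest = fit_forest([rows[i] for i in train], [targets[i] for i in train])
x_test, y_test = [rows[i] for i in test], [targets[i] for i in test]

test_rmse = rmse(forest, x_test, y_test)
baseline = statistics.pstdev(y_test)  # error of always predicting the mean
importances = [permutation_importance(forest, x_test, y_test, f, rng)
               for f in range(len(names))]
print(f"test RMSE {test_rmse:.2f} vs baseline {baseline:.2f}")
for name, imp in zip(names, importances):
    print(f"{name}: {imp:.3f}")
```

Repeating the split with different seeds would give a spread of test errors and importances, which is one simple way to express the uncertainty mentioned above. Note that plain permutation importance can be misleading when drivers are correlated, which is why the conditional method of Strobl et al. (2008) is proposed.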
This map will be compared to output from the CMIP6 (Coupled Model Intercomparison Project, https://pcmdi.llnl.gov/CMIP6/) archive. Each satellite product has a proxy in these models, so the comparison will provide a strong multi-parameter constraint on carbon flux. This approach will highlight where the models fail to correctly capture the controls on organic carbon flux, facilitating development of the next generation of climate prediction models. The student will be part of a growing NOC team applying machine learning to ocean ecology and biogeochemistry. By using existing Earth observation (EO) data in an innovative way, this project will illustrate how considerably greater value can be obtained from EO data that are expensive to collect.
References and DOIs: Balch et al., 2005, 10.1029/2004JC002560; Clements et al., 2021, 10.1002/essoar.10507104.1; Kostadinov et al., 2009, 10.1029/2009JC005303; Lutz et al., 2007, 10.1029/2006JC003706; Mouw et al., 2016, 10.5194/essd-8-531-2016; Mouw et al., 2017, 10.3389/fmars.2017.00041; Parekh et al., 2006, 10.1029/2005GL025098; Stramski et al., 2008, 10.5194/bg-5-171-2008; Strobl et al., 2008, 10.1186/1471-2105-9-307
Please be aware that, due to immigration requirements, this project is only available to applicants who are from the UK or who have settled or pre-settled status in the UK (i.e. home fees students).