
Cosmos-UK Soil Moisture (UKCEH)

The Alan Turing Institute

Context

Purpose

To load and visualise daily hydrometeorological and soil data from the 2013-2019 public COSMOS-UK dataset (Stanley et al., 2021).

Sensor description

Since 2013 the UK Centre for Ecology & Hydrology (UKCEH) has established the world’s most spatially dense national network of cosmic-ray neutron sensors (CRNSs; Zreda et al., 2012) to monitor soil moisture across the UK. The Cosmic-ray Soil Moisture Observing System for the UK (COSMOS-UK) delivers field-scale volumetric water content (VWC) measurements for around 50 sites in near-real time. In addition to measuring field-scale (or local) soil moisture, the network collects a large number of hydrometeorological and soil variables, including VWC measured by point-scale (or site) soil moisture sensors (Evans et al., 2016).

This notebook explores a subset of 4 of the 51 stations available in the public COSMOS-UK dataset (Stanley et al., 2021). These stations were the first sites to prototype COSMOS sensors in the UK (see Evans et al. (2016) for further details). Three are situated in human-intervened areas (grassland and cropland) and one is in woodland.

The video below, available on the UKCEH YouTube channel, summarises the concept of cosmic-ray neutron sensors and how they provide non-invasive soil moisture measurements at field scale.

COSMOS-UK using cosmic-ray neutron sensors to monitor soil moisture. Source: UKCEH.

Highlights

  • Fetch COSMOS-UK dataset files through intake.
  • Inspect the available metadata with information about the sites, their locations and other site-specific attributes.
  • Explore relationships between daily mean soil moisture and potential evapotranspiration derived from the meteorological measurements at the site.
  • Analyse yearly change of daily mean soil moisture observations.
  • Compare local and site soil moisture measurements at daily resolution.

Contributions

Dataset originator/creator

  • UK Centre for Ecology & Hydrology (creator)
  • Natural Environment Research Council (support)

Load libraries

import os
import pandas as pd
import intake
import holoviews as hv
import panel as pn
import matplotlib.pyplot as plt
from bokeh.models.formatters import DatetimeTickFormatter
from datetime import datetime

import hvplot.pandas
import hvplot.xarray  # noqa

import pooch

import warnings
warnings.filterwarnings(action='ignore')

pd.options.display.max_columns = 10
hv.extension('bokeh')
pn.extension()

Set project structure

notebook_folder = './notebook'
# create the working folder if it does not already exist
os.makedirs(notebook_folder, exist_ok=True)

Fetch and load data

Let’s download the sample data. We use pooch to fetch the archive directly from a Zenodo repository and unzip it.

pooch.retrieve(
    url="doi:10.5281/zenodo.6567018/subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip",
    known_hash="md5:3755cb069bc48c5efc081905110e169b",
    processor=pooch.Unzip(extract_dir=os.path.join(notebook_folder,'data')),
    path=".",
)
Downloading data from 'doi:10.5281/zenodo.6567018/subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip' to file '/home/jovyan/97469708ef44493ff5e8878f93e00890-subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip'.
Unzipping contents of '/home/jovyan/97469708ef44493ff5e8878f93e00890-subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip' to '/home/jovyan/./notebook/data'
['/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019_Metadata.csv', '/home/jovyan/./notebook/data/COSMOS-UK_SiteMetadata_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019_Metadata.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_CHIMN_HydroSoil_SH_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_WYTH1_HydroSoil_SH_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_WADDN_HydroSoil_SH_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_SHEEP_HydroSoil_SH_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_SHEEP_HydroSoil_Daily_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_WYTH1_HydroSoil_Daily_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_WADDN_HydroSoil_Daily_2013-2019.csv', '/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_CHIMN_HydroSoil_Daily_2013-2019.csv']
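
The `known_hash` argument lets pooch check the downloaded archive against a published checksum. As a rough illustration of what such a check involves (this is not pooch's internal code), the sketch below computes an MD5 digest in chunks using only the standard library. The helper name `md5_of_file` and the temporary file are hypothetical stand-ins.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # read fixed-size chunks so large archives need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-in for a downloaded archive
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"COSMOS-UK")
    tmp_path = tmp.name

checksum = md5_of_file(tmp_path)
# compare against the digest we expect for the same bytes
matches = checksum == hashlib.md5(b"COSMOS-UK").hexdigest()
os.remove(tmp_path)
print(checksum, matches)
```

For the real archive, the digest would be compared with the value after the `md5:` prefix passed to `known_hash`.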

Load an intake catalog for the downloaded data

# set catalogue location
catalog_file = os.path.join(notebook_folder, 'catalog.yaml')

with open(catalog_file, 'w') as f:
    f.write('''
sources:
  data_siteid:
    driver: intake.source.csv.CSVSource
    parameters:
      stationid:
        description: five letter code for the COSMOS-UK site
        type: str
        default: CHIMN
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_{{stationid}}_HydroSoil_{{resolution}}_2013-2019.csv"
      csv_kwargs:
        na_values: [-9999]
        parse_dates: ['DATE_TIME']

  data_all:
    driver: intake.source.csv.CSVSource
    parameters:
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_*.csv"
      csv_kwargs:
        na_values: [-9999]
        parse_dates: ['DATE_TIME']

  metadata_sites:
    driver: intake.source.csv.CSVSource
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_SiteMetadata_2013-2019.csv"
      csv_kwargs:
        header: 0
        parse_dates: [ 'START_DATE','END_DATE']

  metadata_measurements:
    driver: intake.source.csv.CSVSource
    parameters:
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019_Metadata.csv"
      csv_kwargs:
        header: 0

  location:
    driver: intake_xarray.image.ImageSource
    parameters:
      stationid:
        description: five letter code for the COSMOS-UK site
        type: str
        default: CHIMN
    args:
      urlpath: "https://eip.ceh.ac.uk/hydrodata/cosmos-uk/maps/airphoto/1000px/{{stationid}}.jpg"
    storage_options: {'anon': True}
''')
cat = intake.open_catalog(catalog_file)
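
To make the catalogue entries more concrete, here is a minimal sketch of the two ideas they rely on: filling the `{{ ... }}` placeholders in `urlpath` with parameter values, and passing `csv_kwargs` through to the CSV reader. The `resolve` helper is hypothetical and uses plain string replacement rather than intake's own templating, the value given for `CATALOG_DIR` is an assumption, and the CSV rows are made up.

```python
import io

import pandas as pd

# Template copied from the `data_siteid` catalogue entry
template = (
    "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/"
    "COSMOS-UK_{{stationid}}_HydroSoil_{{resolution}}_2013-2019.csv"
)


def resolve(template, **values):
    """Fill {{ name }} placeholders by plain string replacement (illustrative only)."""
    for name, value in values.items():
        template = template.replace("{{ " + name + " }}", value)
        template = template.replace("{{" + name + "}}", value)
    return template


# CATALOG_DIR is assumed here to be the folder holding catalog.yaml
path = resolve(template, CATALOG_DIR="./notebook", resolution="Daily", stationid="CHIMN")
print(path)

# Made-up rows mimicking the daily file layout; -9999 marks a missing value
csv_text = (
    "DATE_TIME,SITE_ID,COSMOS_VWC\n"
    "2015-01-01,CHIMN,35.2\n"
    "2015-01-02,CHIMN,-9999\n"
)
# same keyword arguments as the csv_kwargs block in the catalogue
df = pd.read_csv(io.StringIO(csv_text), na_values=[-9999], parse_dates=["DATE_TIME"])
print(df)
```

With these keyword arguments the sentinel value becomes `NaN` and `DATE_TIME` is parsed as a datetime column, which is what the later `dropna` and `.dt` calls depend on.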

Load metadata

Here we load COSMOS-UK metadata into memory. The metadata contains multiple columns about the sites, their locations and other site-specific attributes.

metadata = cat.metadata_sites().read()
print(metadata.columns.tolist())
['SITE_NAME', 'SITE_ID', 'START_DATE', 'END_DATE', 'EASTING', 'NORTHING', 'EAST_NORTH_EPSG', 'LATITUDE', 'LONGITUDE', 'LAT_LONG_ESPG', 'ALTITUDE', 'SOIL_TYPE', 'LAND_COVER', 'BULK_DENSITY', 'BULK_DENSITY_SD', 'SOIL_ORGANIC_CARBON', 'SOIL_ORGANIC_CARBON_SD', 'LATTICE_WATER', 'LATTICE_WATER_SD']
metadata

For this example, we will explore a subset of four stations, all of which started in 2013. Only the Wytham Woods station has ceased, on 10th January 2016. This station is situated in broadleaf woodland, which also hosts Environmental Change Network (ECN) and FLUXNET monitoring sites (see further details here). The dataframe below lists each site name, ID and corresponding land cover. CHIMN and WADDN are situated in improved grassland, and SHEEP is in arable and horticulture.

metadata[['SITE_NAME','SITE_ID','LAND_COVER']]

A key feature of COSMOS-UK stations is their ability to monitor field-scale soil moisture. The CRNS VWC value is an average soil moisture measurement (%) across a variable footprint with an estimated radius of up to 200 m and an estimated measurement depth of between approximately 0.1 and 0.8 m. The measurement depth depends on the soil moisture content as well as on lattice water and soil organic matter water equivalent (see Cooper et al., 2021): the greater the actual soil water content, the shallower the penetration depth. Let’s explore the notional footprint of the analysed stations from the CEH COSMOS-UK website.

# set station selector
station_list = metadata.SITE_ID.tolist()

target_station = pn.widgets.Select(name = 'Station', options = station_list)

@pn.depends(target_station.param.value)
def plot_footprint(station):
    location_da = cat.location(stationid=station).to_dask()
    p = location_da.hvplot.rgb(x='x', y='y', bands='channel', data_aspect=1, flip_yaxis=True, xaxis=False, yaxis=None, hover=False)
    return p

plot_stations = pn.Row(
    plot_footprint,
    pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
    width_policy='max', height_policy='max',
)

plot_stations.embed()
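
To give a sense of scale for the footprint shown in the aerial photographs, the short calculation below converts the upper-bound radius quoted above into an area. It assumes an idealised circular footprint, which is a simplification for illustration only.

```python
import math

radius_m = 200.0  # upper bound on the footprint radius quoted in the text
area_m2 = math.pi * radius_m ** 2  # area of an idealised circular footprint
area_ha = area_m2 / 10_000  # 1 hectare = 10,000 square metres
print(f"{area_m2:.0f} m2 = {area_ha:.1f} ha")
```

Under that assumption the CRNS averages over roughly 12.6 hectares at most, far more than the point-scale probes sample.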

Load daily data

Here we load COSMOS-UK daily data into memory. The daily data is the most highly processed level and is derived from the subhourly data. Note that only certain variables are provided at this level.

site_daily_all = cat.data_all(resolution='Daily').read()
print(site_daily_all.columns.tolist())
['DATE_TIME', 'SITE_ID', 'COSMOS_VWC', 'D86_75M', 'SWE', 'SNOW', 'ALBEDO', 'PE']

To help interpret the columns above, the COSMOS-UK dataset includes a separate metadata file for each time resolution. Let’s explore the metadata for daily measurements. The dataframe below gives further details of each variable, including the unit, aggregation and data type. For instance, soil moisture measurements at daily resolution refer to the daily mean derived from CRNSs.

metadata_daily = cat.metadata_measurements(resolution='Daily').read()
metadata_daily

Timeseries

The plot below shows two timeseries, soil moisture and potential evapotranspiration (PE), provided by the daily COSMOS-UK dataset. PE refers to the potential evaporation from soils plus transpiration by plants (so-called evapotranspiration). PE assumes there is always adequate moisture to match the evapotranspiration demand. The daily aggregated data in the plot below illustrate how the two variables relate. Note also that each station has a different time span, with WYTH1 (SITE_ID) having the shortest record.

# set station selector
station_list = metadata.SITE_ID.tolist()

target_station = pn.widgets.Select(name = 'Station', options = station_list)

# set formatter for dates
formatter = DatetimeTickFormatter(months='%b %Y')

@pn.depends(target_station.param.value)
def plot_pe_vwc(station):
    daily_dataset = cat.data_siteid(resolution='Daily', stationid=station).read()
    daily_dataset.dropna(subset = ['COSMOS_VWC','PE'], inplace=True) #remove empty rows
    
    p1=daily_dataset.hvplot(x='DATE_TIME', y=['COSMOS_VWC'], xformatter=formatter, xlabel = 'Date', ylabel = 'Volumetric Water Content (%)', color='blue', title='Soil Moisture (CRNS VWC)', line_width=0.8, fontscale=1.2, padding=0.2)
    p2=daily_dataset.hvplot(x='DATE_TIME', y=['PE'], xformatter=formatter, xlabel = 'Date', ylabel = 'Potential Evapotranspiration (mm)', color='red', title='Potential Evapotranspiration (1 day)', line_width=0.8, fontscale=1.2, padding=0.2)

    return (p1 + p2).cols(1)

plot_scatterplot = pn.Row(
    plot_pe_vwc,
    pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
    width_policy='max', height_policy='max',
)

plot_scatterplot.embed()

Correlation

To explore further the seasonal dynamics of the above variables, let’s generate correlation charts grouped by season. The highest values of PE are in the summer followed by spring, fall and winter. The forest site, WYTH1, has higher soil moisture values than the human-intervened places, CHIMN and WADDN (improved grassland), and SHEEP (arable and horticulture site).

def season(df):
    """Add a season column derived from the month of DATE_TIME
    """
    # meteorological seasons keyed by month number
    seasons = {3: 'spring',  4: 'spring',  5: 'spring',
               6: 'summer',  7: 'summer',  8: 'summer',
               9: 'fall',   10: 'fall',   11: 'fall',
              12: 'winter',  1: 'winter',  2: 'winter'}
    return df.assign(season=df.DATE_TIME.dt.month.map(seasons))

site_daily_all = season(site_daily_all)

# sort rows so that the seasons appear in calendar order
custom_dict = {'winter': 0, 'spring': 1, 'summer': 2, 'fall': 3}
plot_season = site_daily_all.sort_values('season', key=lambda x: x.map(custom_dict)).hvplot.scatter(x='COSMOS_VWC', y='PE',
    row='season', col='SITE_ID', alpha=0.2, ylabel='PE (mm)', xlabel='VWC (%)',
    fontsize = {'title': 15, 'xticks': 9, 'yticks': 9, 'labels':11}, shared_axes=True
)
plot_season
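
The scatter plots give a visual impression only. One possible way to put a number on each panel is a Pearson correlation per site and season, sketched below. The data are synthetic (two made-up sites, `SITE_A` and `SITE_B`, with an invented inverse PE and VWC relationship), so the values say nothing about COSMOS-UK itself; only the column names follow the daily dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", "2016-12-31", freq="D")
doy = dates.dayofyear.to_numpy()

# Synthetic stand-in for `site_daily_all`: PE peaks in summer, VWC falls as PE rises
frames = []
for site in ["SITE_A", "SITE_B"]:
    pe = 1.8 - 1.4 * np.cos(2 * np.pi * doy / 365.25) + rng.normal(0, 0.2, len(dates))
    pe = np.clip(pe, 0, None)  # PE cannot be negative
    vwc = 40 - 6 * pe + rng.normal(0, 2.0, len(dates))
    frames.append(
        pd.DataFrame({"DATE_TIME": dates, "SITE_ID": site, "COSMOS_VWC": vwc, "PE": pe})
    )
synthetic = pd.concat(frames, ignore_index=True)

# Same month-to-season mapping as the notebook's season() helper
seasons = {3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall",
           12: "winter", 1: "winter", 2: "winter"}
synthetic["season"] = synthetic["DATE_TIME"].dt.month.map(seasons)

# Pearson correlation between VWC and PE within each site/season group
corr = (
    synthetic.groupby(["SITE_ID", "season"])[["COSMOS_VWC", "PE"]]
    .apply(lambda g: g["COSMOS_VWC"].corr(g["PE"]))
    .unstack("season")
)
corr = corr[["winter", "spring", "summer", "fall"]]  # calendar order
print(corr.round(2))
```

Applied to the real `site_daily_all` dataframe, rows with missing `COSMOS_VWC` or `PE` would be skipped pairwise by `Series.corr`.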

Heatmap

The heatmap below allows us to discover temporal patterns in the daily means of soil moisture. We observe that 2018 contains the most prolonged run of low VWC values.

plot_heatmap = site_daily_all.hvplot.heatmap(
    x='DATE_TIME',
    y='SITE_ID',
    C='COSMOS_VWC',
    xformatter=formatter,
    title='Time series of CRNS soil moisture',
    cmap='RdYlBu',
    width=600,
    height=300,
    xlabel='',
    ylabel='Site ID',
    fontsize = {'title': 15, 'xticks': 12, 'yticks': 15}
)
plot_heatmap
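
A heatmap observation of this kind can be checked numerically. The sketch below shows one possible approach: smooth the daily series with a rolling mean and compare the lowest smoothed value in each year. The series is synthetic, with a dry spell deliberately injected into 2018, and the 30-day window is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2016-01-01", "2018-12-31", freq="D")
doy = dates.dayofyear.to_numpy()

# Synthetic daily VWC (%) with a seasonal cycle (wet winters, dry summers)
vwc = 35 + 8 * np.cos(2 * np.pi * doy / 365.25) + rng.normal(0, 1.5, len(dates))

# Inject an artificial dry spell into summer 2018
dry_spell = (dates >= "2018-06-01") & (dates <= "2018-08-31")
vwc[dry_spell] -= 10

series = pd.Series(vwc, index=dates, name="COSMOS_VWC")

# 30-day rolling mean to smooth day-to-day noise (window length is an assumption)
rolling = series.rolling(window=30, min_periods=30).mean().dropna()

# Lowest 30-day mean per calendar year, and the date on which that window ends
yearly_min = rolling.groupby(rolling.index.year).min()
yearly_when = rolling.groupby(rolling.index.year).idxmin()
driest_year = int(yearly_min.idxmin())

print(yearly_min.round(1))
print(yearly_when)
print("Driest 30-day period falls in:", driest_year)
```

With the real data, the same steps could be run per `SITE_ID` after setting `DATE_TIME` as the index.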

Load sub-hourly

The subhourly data contains all preprocessed weather and soil variables, except those derived from the CRNS. Let’s explore the columns of the subhourly dataset for one of the stations, SHEEP.

subhourly_dataset = cat.data_siteid(resolution='SH', stationid='SHEEP').read()
print(subhourly_dataset.columns.tolist())
['DATE_TIME', 'SITE_ID', 'LWIN', 'LWOUT', 'SWIN', 'SWOUT', 'RN', 'PRECIP', 'PA', 'TA', 'WS', 'WD', 'Q', 'RH', 'SNOWD_DISTANCE_COR', 'UX', 'UY', 'UZ', 'G1', 'G2', 'TDT1_TSOIL', 'TDT1_VWC', 'TDT2_TSOIL', 'TDT2_VWC', 'TDT3_TSOIL', 'TDT3_VWC', 'TDT4_TSOIL', 'TDT4_VWC', 'TDT5_TSOIL', 'TDT5_VWC', 'TDT6_TSOIL', 'TDT6_VWC', 'TDT7_TSOIL', 'TDT7_VWC', 'TDT8_TSOIL', 'TDT8_VWC', 'TDT9_TSOIL', 'TDT9_VWC', 'TDT10_TSOIL', 'TDT10_VWC', 'STP_TSOIL2', 'STP_TSOIL5', 'STP_TSOIL10', 'STP_TSOIL20', 'STP_TSOIL50']

As with the daily observations, the metadata file for the subhourly resolution lists variable long names, their resolution, units, aggregation details and data types. In this case, most of the variables are measured. For soil moisture, the measurements are provided by time domain transmissometry (TDT) sensors. These sensors provide point measurements of soil moisture at different depths, as is common in in-situ soil moisture sensing.

metadata_subhourly = cat.metadata_measurements(resolution='SH').read()
metadata_subhourly

Comparison of soil moisture probes

To compare CRNS (local) and TDT (site) soil moisture measurements at daily resolution, it is necessary to resample the TDT measurements from subhourly to daily. The cell below defines a function that resamples the TDT data and joins it with the daily CRNS data. The function returns an interactive hvplot of the merged observations for the selected station.

@pn.depends(target_station.param.value)
def site_daily(target_station):
    """Timeseries plot showing the daily mean soil moisture by sensor type"""

    # daily CRNS data, indexed by date
    daily_dataset = cat.data_siteid(resolution='Daily', stationid=target_station).read()
    daily_dataset.index = daily_dataset.DATE_TIME.astype('datetime64[ns]')

    # subhourly data, which holds the point-scale TDT probes
    subhourly_dataset = cat.data_siteid(resolution='SH', stationid=target_station).read()

    # names of the TDT soil moisture columns
    tdt_columns = subhourly_dataset.columns[subhourly_dataset.columns.str.endswith('_VWC')].tolist()

    # aggregate the subhourly TDT measurements to daily means
    daily_aggregate = subhourly_dataset.groupby(subhourly_dataset['DATE_TIME'].dt.date, as_index=True)[tdt_columns].mean()
    daily_aggregate.index = daily_aggregate.index.astype('datetime64[ns]')

    # join on date and keep only the soil moisture columns
    daily_joined = daily_dataset.join(daily_aggregate)
    daily_joined = daily_joined[tdt_columns + ['COSMOS_VWC']]
    daily_joined = daily_joined.reset_index()
    daily_joined.index = daily_joined.DATE_TIME.astype('datetime64[ns]')
    daily_joined.dropna(axis=1, how='all', inplace=True)

    daily_joined_long = pd.melt(daily_joined, id_vars='DATE_TIME',
                     var_name="Sensor", value_name="VWC")

    plot_daily = daily_joined_long.hvplot(x='DATE_TIME', y='VWC', by='Sensor',
                            xformatter=formatter,
                            label='Variation in VWC by sensor type',
                            ylabel='Volumetric Water Content (%)',
                            xlabel='Time', xlim=(datetime(2014,1,1), datetime(2019,12,31)))

    return plot_daily.opts(legend_position='top', **settings_lineplots)

settings_lineplots = dict(padding=0.1, height=400, width=700, fontsize={'title': '120%','labels': '120%', 'ticks': '100%'})

plot_timeseries = pn.Row(
    site_daily,
    pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
    width_policy='max', height_policy='max',
)

plot_timeseries.embed()
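
The visual comparison above can be complemented with simple agreement statistics. The sketch below is a minimal illustration on synthetic data: it resamples a subhourly record to daily means with `resample` (an alternative to grouping by date), joins it to a daily series, and computes the mean difference and root-mean-square difference between the CRNS value and the TDT average. The 30-minute time step, the two probe columns and all values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Synthetic 30-minute record for two hypothetical TDT probes over ten days
times = pd.date_range("2017-06-01", periods=10 * 48, freq="30min")
base = 30 + 3 * np.sin(np.linspace(0, 2 * np.pi, len(times)))
subhourly = pd.DataFrame(
    {
        "DATE_TIME": times,
        "TDT1_VWC": base + rng.normal(0, 0.5, len(times)),
        "TDT2_VWC": base + 2 + rng.normal(0, 0.5, len(times)),
    }
)

# Synthetic daily CRNS series covering the same ten days
days = pd.date_range("2017-06-01", periods=10, freq="D")
daily = pd.DataFrame(
    {"COSMOS_VWC": 31 + 3 * np.sin(np.linspace(0, 2 * np.pi, len(days)))},
    index=days,
)
daily.index.name = "DATE_TIME"

# Resample the subhourly probes to daily means, then join on the date index
tdt_daily = subhourly.set_index("DATE_TIME").resample("D").mean()
joined = daily.join(tdt_daily)

# Average of the point-scale probes, compared against the field-scale value
joined["TDT_MEAN"] = joined[["TDT1_VWC", "TDT2_VWC"]].mean(axis=1)
diff = joined["COSMOS_VWC"] - joined["TDT_MEAN"]
bias = diff.mean()
rmse = np.sqrt((diff ** 2).mean())

print(joined.round(2))
print(f"bias = {bias:.2f} %, RMSD = {rmse:.2f} %")
```

Because the two sensor types sample different volumes and depths, a non-zero difference is not by itself a sign of sensor error.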

We conclude that all sites contain at least two TDT probes, and that their temporal sequences follow a pattern similar to that of the CRNS. The pattern might differ for other stations in the full COSMOS-UK dataset, which can contain more than two TDT probes.

Soils have a complex porous structure, which means moisture can be non-uniformly distributed both horizontally and vertically. Site measurements such as TDT probes, even when placed a few metres apart, measure “extremely local” moisture (a probe can sometimes be trapped in a water pocket, leading to artificially high VWC, or be pressed against a rock and produce artificially low VWC). In contrast, local measurements such as CRNS average over all of this heterogeneity but introduce their own sources of noise (biomass water, surface water, variable depth and horizontal footprint).

Summary

This notebook has demonstrated the use of certain open-source Python packages to explore the 2013-2019 COSMOS-UK dataset:

  • intake to easily fetch and manipulate daily and subhourly data, their metadata and other data types (remote images).
  • hvplot to propose some interactive visualisations of hydrometeorological and soil data.
  • pandas to resample subhourly data and merge them into a daily dataset of soil moisture.

Citing this Notebook

Please see CITATION.cff for the full citation information. The citation file can be exported to APA or BibTeX formats (learn more here).

Additional information

Review: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available here.

Dataset: 2013-2019 COSMOS-UK dataset (further details of the version in Stanley et al. (2021)).

License: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons by Attribution 4.0 license. See further details here.

Contact: If you have a suggestion or want to report an issue with this notebook, feel free to create an issue or send a direct message to environmental.ds.book@gmail.com.

Notebook repository version: v2.0.0
Last tested: 2025-04-21
References
  1. Stanley, S., Antoniou, V., Askquith-Ellis, A., Ball, L. A., Bennett, E. S., Blake, J. R., Boorman, D. B., Brooks, M., Clarke, M., Cooper, H. M., Cowan, N., Cumming, A., Evans, J. G., Farrand, P., Fry, M., Hitt, O. E., Lord, W. D., Morrison, R., Nash, G. V., … Winterbourn, B. (2021). Daily and sub-daily hydrometeorological and soil data (2013-2019) [COSMOS-UK]. NERC Environmental Information Data Centre. 10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185
  2. Zreda, M., Shuttleworth, W. J., Zeng, X., Zweck, C., Desilets, D., Franz, T., & Rosolem, R. (2012). COSMOS: the COsmic-ray Soil Moisture Observing System. Hydrology and Earth System Sciences, 16(11), 4079–4099. 10.5194/hess-16-4079-2012
  3. Evans, J. G., Ward, H. C., Blake, J. R., Hewitt, E. J., Morrison, R., Fry, M., Ball, L. A., Doughty, L. C., Libre, J. W., Hitt, O. E., Rylett, D., Ellis, R. J., Warwick, A. C., Brooks, M., Parkes, M. A., Wright, G. M. H., Singer, A. C., Boorman, D. B., & Jenkins, A. (2016). Soil water content in southern England derived from a cosmic-ray soil moisture observing system – COSMOS-UK. Hydrological Processes, 30(26), 4987–4999. 10.1002/hyp.10929
  4. Cooper, H. M., Bennett, E., Blake, J., Blyth, E., Boorman, D., Cooper, E., Evans, J., Fry, M., Jenkins, A., Morrison, R., Rylett, D., Stanley, S., Szczykulska, M., Trill, E., Antoniou, V., Askquith-Ellis, A., Ball, L., Brooks, M., Clarke, M. A., … Winterbourn, B. (2021). COSMOS-UK: national soil moisture and hydrometeorology data for environmental science research. Earth System Science Data, 13(4), 1737–1757. 10.5194/essd-13-1737-2021