Cosmos-UK Soil Moisture (UKCEH)
Context¶
Purpose¶
To load and visualise daily hydrometeorological and soil data from the 2013-2019 public COSMOS-UK dataset (Stanley et al., 2021).
Sensor description¶
Since 2013 the UK Centre for Ecology & Hydrology (UKCEH) has established the world’s most spatially dense national network of cosmic-ray neutron sensors (CRNSs; Zreda et al., 2012) to monitor soil moisture across the UK. The Cosmic-ray Soil Moisture Observing System for the UK (COSMOS-UK) delivers field-scale measurements of soil volumetric water content (VWC) for around 50 sites in near-real time. In addition to field-scale (or local) soil moisture, the network collects a large number of hydrometeorological and soil variables, including VWC measured by point-scale (or site) soil moisture sensors (Evans et al., 2016).
This notebook explores a subset of 4 of the 51 stations available in the public COSMOS-UK dataset (Stanley et al., 2021). These stations were the first sites used to prototype COSMOS sensors in the UK (see further details in Evans et al., 2016). All of them are situated in human-intervened areas (grassland and cropland) except one, which is in woodland.
The video below, available on the UKCEH YouTube channel, summarises the concept of cosmic-ray neutron sensors and how they provide non-invasive soil moisture measurements at field scale.
COSMOS-UK using cosmic-ray neutron sensors to monitor soil moisture. Source: UKCEH.
Highlights¶
- Fetch COSMOS-UK dataset files through intake.
- Inspect the available metadata with information about the sites, their locations and other site-specific attributes.
- Explore relationships between daily mean soil moisture and potential evapotranspiration derived from the meteorological measurements at the site.
- Analyse year-to-year changes in daily mean soil moisture observations.
- Compare local and site soil moisture measurements at daily resolution.
Contributions¶
Dataset originator/creator¶
- UK Centre for Ecology & Hydrology (creator)
- Natural Environment Research Council (support)
Load libraries¶
import os
import pandas as pd
import intake
import holoviews as hv
import panel as pn
import matplotlib.pyplot as plt
from bokeh.models.formatters import DatetimeTickFormatter
from datetime import datetime
import hvplot.pandas
import hvplot.xarray # noqa
import pooch
import warnings
warnings.filterwarnings(action='ignore')
pd.options.display.max_columns = 10
hv.extension('bokeh')
pn.extension()
Set project structure¶
notebook_folder = './notebook'
if not os.path.exists(notebook_folder):
    os.makedirs(notebook_folder)
Fetch and load data¶
Let’s download the sample data. We use pooch to fetch and unzip them directly from a Zenodo repository.
pooch.retrieve(
url="doi:10.5281/zenodo.6567018/subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip",
known_hash="md5:3755cb069bc48c5efc081905110e169b",
processor=pooch.Unzip(extract_dir=os.path.join(notebook_folder,'data')),
    path=".",
)
Downloading data from 'doi:10.5281/zenodo.6567018/subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip' to file '/home/jovyan/97469708ef44493ff5e8878f93e00890-subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip'.
Unzipping contents of '/home/jovyan/97469708ef44493ff5e8878f93e00890-subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip' to '/home/jovyan/./notebook/data'
['/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019_Metadata.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_SiteMetadata_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019_Metadata.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_CHIMN_HydroSoil_SH_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_WYTH1_HydroSoil_SH_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_WADDN_HydroSoil_SH_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_SH_2013-2019/COSMOS-UK_SHEEP_HydroSoil_SH_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_SHEEP_HydroSoil_Daily_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_WYTH1_HydroSoil_Daily_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_WADDN_HydroSoil_Daily_2013-2019.csv',
'/home/jovyan/./notebook/data/COSMOS-UK_HydroSoil_Daily_2013-2019/COSMOS-UK_CHIMN_HydroSoil_Daily_2013-2019.csv']
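pooch checks the downloaded archive against `known_hash` before unzipping it, so a corrupted or altered download is rejected. The sketch below shows that kind of check using only the standard library. `md5_matches` is a hypothetical helper, not part of pooch's API, and the file it hashes is a throwaway stand-in for the Zenodo archive.

```python
import hashlib
import os
import tempfile


def md5_matches(path, known_hash, chunk_size=65536):
    """Return True if the file's MD5 digest equals known_hash ("md5:<hex>").

    Hypothetical helper for illustration; pooch performs its own check.
    """
    algorithm, _, expected = known_hash.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # read in chunks so a large archive does not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


# usage with a small temporary file standing in for the downloaded zip
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.zip")
    with open(sample, "wb") as f:
        f.write(b"hello")
    good_hash = "md5:" + hashlib.md5(b"hello").hexdigest()
    print(md5_matches(sample, good_hash))
    print(md5_matches(sample, "md5:" + "0" * 32))
```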
Load an intake catalog for the downloaded data
# set catalogue location
catalog_file = os.path.join(notebook_folder, 'catalog.yaml')
with open(catalog_file, 'w') as f:
    f.write('''
sources:
  data_siteid:
    driver: intake.source.csv.CSVSource
    parameters:
      stationid:
        description: five letter code for the COSMOS-UK site
        type: str
        default: CHIMN
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_{{stationid}}_HydroSoil_{{resolution}}_2013-2019.csv"
      csv_kwargs:
        na_values: [-9999]
        parse_dates: ['DATE_TIME']
  data_all:
    driver: intake.source.csv.CSVSource
    parameters:
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_*.csv"
      csv_kwargs:
        na_values: [-9999]
        parse_dates: ['DATE_TIME']
  metadata_sites:
    driver: intake.source.csv.CSVSource
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_SiteMetadata_2013-2019.csv"
      csv_kwargs:
        header: 0
        parse_dates: ['START_DATE', 'END_DATE']
  metadata_measurements:
    driver: intake.source.csv.CSVSource
    parameters:
      resolution:
        description: temporal resolution
        type: str
        default: Daily
        allowed:
          - Daily
          - Hourly
          - SH
    args:
      urlpath: "{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019_Metadata.csv"
      csv_kwargs:
        header: 0
  location:
    driver: intake_xarray.image.ImageSource
    parameters:
      stationid:
        description: five letter code for the COSMOS-UK site
        type: str
        default: CHIMN
    args:
      urlpath: "https://eip.ceh.ac.uk/hydrodata/cosmos-uk/maps/airphoto/1000px/{{stationid}}.jpg"
      storage_options: {'anon': True}
''')
cat = intake.open_catalog(catalog_file)
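The `{{ ... }}` placeholders in each `urlpath` are filled in from the source parameters when a source is opened, with `CATALOG_DIR` referring to the folder that contains `catalog.yaml`. As a rough illustration of which file `cat.data_siteid(resolution='SH', stationid='SHEEP')` points to, the sketch below performs the same substitution with a regular expression. `resolve_urlpath` is a hypothetical helper; intake itself uses Jinja2 templating, which supports far more than plain variable substitution.

```python
import re

# urlpath template copied from the data_siteid entry of the catalog above
URLPATH = ("{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/"
           "COSMOS-UK_{{stationid}}_HydroSoil_{{resolution}}_2013-2019.csv")


def resolve_urlpath(template, **params):
    """Replace {{ name }} placeholders with values from params.

    Simplified stand-in for Jinja2: only plain variable substitution.
    Raises KeyError if a placeholder has no value.
    """
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for template variable {name!r}")
        return str(params[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


print(resolve_urlpath(URLPATH, CATALOG_DIR="./notebook", resolution="SH", stationid="SHEEP"))
```

The printed path matches the SHEEP subhourly file listed in the unzip output above, which is a quick way to see how one catalog entry serves every station and resolution.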
Load metadata¶
Here we load COSMOS-UK metadata into memory. The metadata contains multiple columns about the sites, their locations and other site-specific attributes.
metadata = cat.metadata_sites().read()
print(metadata.columns.tolist())
['SITE_NAME', 'SITE_ID', 'START_DATE', 'END_DATE', 'EASTING', 'NORTHING', 'EAST_NORTH_EPSG', 'LATITUDE', 'LONGITUDE', 'LAT_LONG_ESPG', 'ALTITUDE', 'SOIL_TYPE', 'LAND_COVER', 'BULK_DENSITY', 'BULK_DENSITY_SD', 'SOIL_ORGANIC_CARBON', 'SOIL_ORGANIC_CARBON_SD', 'LATTICE_WATER', 'LATTICE_WATER_SD']
metadata
For this example, we explore a subset of four stations, all of which started recording in 2013. Only the Wytham Woods station (WYTH1) has ceased operating, on 10 January 2016. It is situated in broadleaf woodland, which also hosts Environmental Change Network (ECN) and FLUXNET monitoring sites (see further details here). The dataframe below lists each site's name, ID and land cover: CHIMN and WADDN are located in improved grassland, and SHEEP is in arable and horticulture.
metadata[['SITE_NAME','SITE_ID','LAND_COVER']]
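Grouping the sites by land cover makes the grassland, cropland and woodland split described above explicit. The frame in this sketch is a hypothetical stand-in for two columns of `metadata`; the land cover strings paraphrase the text above and may not match the CSV values exactly.

```python
import pandas as pd

# Hypothetical stand-in for metadata[['SITE_ID', 'LAND_COVER']];
# label strings are assumed, not copied from the COSMOS-UK CSV
sites = pd.DataFrame({
    "SITE_ID": ["CHIMN", "WADDN", "SHEEP", "WYTH1"],
    "LAND_COVER": ["Improved grassland", "Improved grassland",
                   "Arable and horticulture", "Broadleaf woodland"],
})

# list the site IDs that fall under each land cover class
sites_by_cover = sites.groupby("LAND_COVER")["SITE_ID"].apply(list)
print(sites_by_cover)

# boolean filter: the human-intervened (non-woodland) sites
managed = sites.loc[sites["LAND_COVER"] != "Broadleaf woodland", "SITE_ID"].tolist()
print(managed)
```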
A key feature of COSMOS-UK stations is their ability to monitor field-scale soil moisture. The CRNS VWC value is an average soil moisture measurement (%) across an estimated, variable footprint with a radius of up to 200 m and an estimated, variable measurement depth of approximately 0.1 to 0.8 m. The measurement depth depends on the soil moisture content as well as on the lattice water and the soil organic matter water equivalent (see Cooper et al., 2021): the wetter the soil, the shallower the penetration depth. Let’s explore the notional footprint of the analysed stations from the UKCEH COSMOS-UK website.
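The inverse relation between soil wetness and measurement depth can be made concrete with a widely used empirical approximation from the CRNS literature, z* = 5.8 / (ρ_bd/ρ_w · τ + θ + 0.0829), with z* in cm, θ the volumetric water content (m³/m³), τ the lattice water (g/g) and ρ_bd the bulk density (g/cm³). This sketch assumes that approximation and illustrative bulk density and lattice water values; it is not necessarily how COSMOS-UK derives its own depth estimates, and this simple form leaves out the organic matter water equivalent mentioned above. Note that the notebook reports VWC in %, so divide by 100 before using it here.

```python
def effective_depth_cm(vwc, bulk_density=1.4, lattice_water=0.05):
    """Approximate CRNS measurement depth (cm) for VWC given in m3/m3.

    Assumed empirical form: z* = 5.8 / (rho_bd / rho_w * tau + theta + 0.0829),
    with rho_w = 1 g/cm3. Default bulk density (g/cm3) and lattice water (g/g)
    are illustrative values, not COSMOS-UK site metadata.
    """
    return 5.8 / (bulk_density * lattice_water + vwc + 0.0829)


# wetter soil gives a larger denominator and therefore a shallower depth
for vwc in (0.05, 0.20, 0.40):
    print(f"VWC {vwc:.2f} m3/m3: depth about {effective_depth_cm(vwc):.1f} cm")
```

With the site's own `BULK_DENSITY` and `LATTICE_WATER` values from `metadata`, the same function would give a rough, site-specific depth for any daily `COSMOS_VWC / 100`.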
# set station selector
station_list = metadata.SITE_ID.tolist()
target_station = pn.widgets.Select(name='Station', options=station_list)
@pn.depends(target_station.param.value)
def plot_footprint(station):
    # fetch the aerial photo of the station footprint as an xarray image
    location_da = cat.location(stationid=station).to_dask()
    p = location_da.hvplot.rgb(x='x', y='y', bands='channel', data_aspect=1, flip_yaxis=True, xaxis=False, yaxis=None, hover=False)
    return p
plot_stations = pn.Row(
plot_footprint,
pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
width_policy='max', height_policy='max',
)
plot_stations.embed()
Load daily data¶
Here we load COSMOS-UK daily data into memory. The daily data have undergone the most processing and are derived from the subhourly data. Only a subset of variables is provided at this level.
site_daily_all = cat.data_all(resolution='Daily').read()
print(site_daily_all.columns.tolist())
['DATE_TIME', 'SITE_ID', 'COSMOS_VWC', 'D86_75M', 'SWE', 'SNOW', 'ALBEDO', 'PE']
To understand the columns above, note that the COSMOS-UK dataset includes a separate metadata file for each time resolution. Let’s explore the metadata for daily measurements. The dataframe below gives further details of each variable, including its unit, aggregation and data type. For instance, soil moisture at daily resolution is the daily mean derived from the CRNS.
metadata_daily = cat.metadata_measurements(resolution='Daily').read()
metadata_daily
Timeseries¶
The plot below shows two time series from the daily COSMOS-UK dataset: soil moisture and potential evapotranspiration (PE). PE is the potential evaporation from soils plus transpiration by plants (so-called evapotranspiration), assuming there is always enough moisture available to meet the evapotranspiration demand. The daily series of both variables, plotted below, show how soil moisture and PE relate over time. Each station also covers a different time span, with WYTH1 having the shortest record.
# set station selector
station_list = metadata.SITE_ID.tolist()
target_station = pn.widgets.Select(name='Station', options=station_list)
# set formatter for dates
formatter = DatetimeTickFormatter(months='%b %Y')
@pn.depends(target_station.param.value)
def plot_pe_vwc(station):
    daily_dataset = cat.data_siteid(resolution='Daily', stationid=station).read()
    daily_dataset.dropna(subset=['COSMOS_VWC', 'PE'], inplace=True)  # drop days missing either variable
    p1 = daily_dataset.hvplot(x='DATE_TIME', y=['COSMOS_VWC'], xformatter=formatter, xlabel='Date', ylabel='Volumetric Water Content (%)', color='blue', title='Soil Moisture (CRNS VWC)', line_width=0.8, fontscale=1.2, padding=0.2)
    p2 = daily_dataset.hvplot(x='DATE_TIME', y=['PE'], xformatter=formatter, xlabel='Date', ylabel='Potential Evapotranspiration (mm)', color='red', title='Potential Evapotranspiration (1 day)', line_width=0.8, fontscale=1.2, padding=0.2)
    return (p1 + p2).cols(1)
plot_scatterplot = pn.Row(
plot_pe_vwc,
pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
width_policy='max', height_policy='max',
)
plot_scatterplot.embed()
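The differing record lengths can also be checked numerically by taking the first date, the last date and the number of complete days per site. This sketch assumes a frame shaped like `site_daily_all` (columns `DATE_TIME`, `SITE_ID`, `COSMOS_VWC`, `PE`); the site IDs and values are synthetic, not COSMOS-UK data.

```python
import pandas as pd

# Synthetic daily records for two hypothetical sites
daily = pd.DataFrame({
    "DATE_TIME": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03",
                                 "2014-01-01", "2014-01-02"]),
    "SITE_ID": ["SITE_A", "SITE_A", "SITE_A", "SITE_B", "SITE_B"],
    "COSMOS_VWC": [30.1, None, 29.5, 41.0, 40.2],
    "PE": [0.4, 0.5, 0.3, 0.2, None],
})

# keep only days where both variables are present, as the plotting cell does
complete = daily.dropna(subset=["COSMOS_VWC", "PE"])

# first date, last date and number of complete days per site
span = complete.groupby("SITE_ID")["DATE_TIME"].agg(["min", "max", "count"])
print(span)
```

Replacing `daily` with `site_daily_all` would give one row per station, making the shortest record directly comparable across sites.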
Correlation¶
To explore the seasonal dynamics of these variables further, let’s plot VWC against PE for each site, grouped by season. PE is highest in summer, followed by spring, fall and winter. The forest site, WYTH1, has higher soil moisture values than the human-intervened sites: CHIMN and WADDN (improved grassland) and SHEEP (arable and horticulture).
def season(df):
    """Add a 'season' column derived from the month of DATE_TIME
    (meteorological seasons for the Northern Hemisphere)."""
    seasons = {3: 'spring', 4: 'spring', 5: 'spring',
               6: 'summer', 7: 'summer', 8: 'summer',
               9: 'fall', 10: 'fall', 11: 'fall',
               12: 'winter', 1: 'winter', 2: 'winter'}
    return df.assign(season=df.DATE_TIME.dt.month.map(seasons))
site_daily_all = season(site_daily_all)
# sort records by season: winter, spring, summer, fall
custom_dict = {'winter': 0, 'spring': 1, 'summer': 2, 'fall': 3}
plot_season = site_daily_all.sort_values('season', key=lambda x: x.map(custom_dict)).hvplot.scatter(
    x='COSMOS_VWC', y='PE', row='season', col='SITE_ID', alpha=0.2,
    ylabel='PE (mm)', xlabel='VWC (%)',
    fontsize={'title': 15, 'xticks': 9, 'yticks': 9, 'labels': 11}, shared_axes=True
)
plot_season
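The seasonal ranking of PE and the strength of the VWC-PE relationship within each season can also be summarised as numbers rather than read off the scatter plots. This sketch assumes a frame with the same columns as `site_daily_all`; the single hypothetical site and its twelve monthly values are synthetic, chosen only to exercise the code.

```python
import pandas as pd

# month to meteorological season (Northern Hemisphere), as in season() above
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

# Synthetic example: one value per month for a hypothetical site
df = pd.DataFrame({
    "DATE_TIME": pd.date_range("2015-01-01", periods=12, freq="MS"),
    "SITE_ID": "SITE_A",
    "COSMOS_VWC": [45, 46, 42, 38, 33, 28, 24, 22, 27, 35, 41, 44],
    "PE": [0.3, 0.5, 1.0, 1.8, 2.6, 3.2, 3.4, 2.9, 1.9, 1.0, 0.5, 0.3],
})
df["season"] = df["DATE_TIME"].dt.month.map(SEASONS)

# mean PE per season, highest first
pe_by_season = df.groupby("season")["PE"].mean().sort_values(ascending=False)
print(pe_by_season)

# Pearson correlation between VWC and PE within each site/season group
corr = df.groupby(["SITE_ID", "season"])[["COSMOS_VWC", "PE"]].apply(
    lambda g: g["COSMOS_VWC"].corr(g["PE"])
)
print(corr)
```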
Heatmap¶
The heatmap below allows us to discover temporal patterns in the daily mean soil moisture. 2018 shows the longest run of consecutively low VWC values.
plot_heatmap = site_daily_all.hvplot.heatmap(
x='DATE_TIME',
y='SITE_ID',
C='COSMOS_VWC',
xformatter=formatter,
title='Time series of CRNS soil moisture',
cmap='RdYlBu',
width=600,
height=300,
xlabel='',
ylabel='Site ID',
fontsize = {'title': 15, 'xticks': 12, 'yticks': 15}
)
plot_heatmap
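One way to make the run of low values in the heatmap concrete is to count the longest run of consecutive days with VWC below a threshold. In this sketch `longest_run_below` is a hypothetical helper, the 20 % threshold is arbitrary and the series is synthetic. It also assumes a gap-free daily series, because consecutive rows are treated as consecutive days (missing values break a run).

```python
import pandas as pd


def longest_run_below(series, threshold):
    """Length of the longest run of consecutive values below threshold."""
    below = series < threshold
    # a new run starts whenever the below/not-below state changes
    run_id = below.ne(below.shift()).cumsum()
    # summing booleans per run counts the days below threshold in that run
    run_lengths = below.groupby(run_id).sum()
    return int(run_lengths.max()) if below.any() else 0


# Synthetic daily VWC (%) with a short dry spell
dates = pd.date_range("2017-06-01", periods=10, freq="D")
vwc = pd.Series([30, 25, 19, 18, 17, 26, 18, 28, 30, 31], index=dates)
print(longest_run_below(vwc, 20))
```

Applied per year, for example `vwc.groupby(vwc.index.year).apply(lambda s: longest_run_below(s, 20))` on one site's `COSMOS_VWC` series, this gives one run length per year to compare against the heatmap.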
Load sub-hourly¶
The subhourly data contain all the preprocessed weather and soil variables except the CRNS measurements. Let’s explore the columns of the subhourly dataset for one of the stations, SHEEP.
subhourly_dataset = cat.data_siteid(resolution='SH', stationid='SHEEP').read()
print(subhourly_dataset.columns.tolist())
['DATE_TIME', 'SITE_ID', 'LWIN', 'LWOUT', 'SWIN', 'SWOUT', 'RN', 'PRECIP', 'PA', 'TA', 'WS', 'WD', 'Q', 'RH', 'SNOWD_DISTANCE_COR', 'UX', 'UY', 'UZ', 'G1', 'G2', 'TDT1_TSOIL', 'TDT1_VWC', 'TDT2_TSOIL', 'TDT2_VWC', 'TDT3_TSOIL', 'TDT3_VWC', 'TDT4_TSOIL', 'TDT4_VWC', 'TDT5_TSOIL', 'TDT5_VWC', 'TDT6_TSOIL', 'TDT6_VWC', 'TDT7_TSOIL', 'TDT7_VWC', 'TDT8_TSOIL', 'TDT8_VWC', 'TDT9_TSOIL', 'TDT9_VWC', 'TDT10_TSOIL', 'TDT10_VWC', 'STP_TSOIL2', 'STP_TSOIL5', 'STP_TSOIL10', 'STP_TSOIL20', 'STP_TSOIL50']
As with the daily observations, the metadata file for the subhourly resolution gives each variable's long name, resolution, unit, aggregation details and data type. At this resolution, most of the variables are measured rather than derived. Soil moisture is measured by time domain transmissometry (TDT) sensors, which provide point measurements of soil moisture at different depths, as is common in in-situ soil moisture sensing.
metadata_subhourly = cat.metadata_measurements(resolution='SH').read()
metadata_subhourly
Comparison of soil moisture probes¶
To compare CRNS (local) and TDT (site) soil moisture measurements at daily resolution, the TDT measurements must first be resampled from subhourly to daily. The cell below defines a function that resamples the TDT data to daily means, joins them with the daily CRNS data and returns an interactive hvplot of the merged observations for the selected station ID.
settings_lineplots = dict(padding=0.1, height=400, width=700,
                          fontsize={'title': '120%', 'labels': '120%', 'ticks': '100%'})
@pn.depends(target_station.param.value)
def site_daily(station):
    """Timeseries plot showing the daily mean soil moisture by sensor type"""
    # daily CRNS soil moisture, indexed by date
    daily_dataset = cat.data_siteid(resolution='Daily', stationid=station).read()
    daily_dataset = daily_dataset.set_index('DATE_TIME')
    # subhourly TDT soil moisture columns (TDT1_VWC, TDT2_VWC, ...) resampled to daily means
    subhourly_dataset = cat.data_siteid(resolution='SH', stationid=station).read()
    tdt_columns = subhourly_dataset.columns[subhourly_dataset.columns.str.endswith('_VWC')].tolist()
    daily_aggregate = subhourly_dataset.set_index('DATE_TIME')[tdt_columns].resample('D').mean()
    # join TDT and CRNS daily means on the date index; drop probes without any data
    daily_joined = daily_dataset.join(daily_aggregate)[tdt_columns + ['COSMOS_VWC']]
    daily_joined = daily_joined.dropna(axis=1, how='all')
    # long format: one row per (date, sensor) for plotting with by='Sensor'
    daily_joined_long = daily_joined.reset_index().melt(id_vars='DATE_TIME',
                                                        var_name='Sensor', value_name='VWC')
    plot_daily = daily_joined_long.hvplot(x='DATE_TIME', y='VWC', by='Sensor',
                                          xformatter=formatter,
                                          label='Variation in VWC by sensor type',
                                          ylabel='Volumetric Water Content (%)',
                                          xlabel='Time', xlim=(datetime(2014, 1, 1), datetime(2019, 12, 31)))
    return plot_daily.opts(legend_position='top', **settings_lineplots)
plot_timeseries = pn.Row(
site_daily,
pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode="fixed"),
width_policy='max', height_policy='max',
)
plot_timeseries.embed()
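The core pandas steps of `site_daily` (select the `_VWC` columns, resample to daily means, join with the CRNS series, melt to long format) can be tried in isolation. In this sketch the column names mimic COSMOS-UK, but the two days of 30-minute readings and the CRNS values are synthetic.

```python
import pandas as pd

# Synthetic 30-minute TDT readings for two probes over two days (not real data)
times = pd.date_range("2016-05-01", periods=96, freq="30min")
sh = pd.DataFrame({
    "DATE_TIME": times,
    "TDT1_VWC": [30.0] * 48 + [28.0] * 48,
    "TDT2_VWC": [34.0] * 48 + [32.0] * 48,
    "TDT1_TSOIL": [12.0] * 96,  # non-moisture column, excluded by the suffix filter
})
# Synthetic daily CRNS series for the same two days
daily = pd.DataFrame({
    "DATE_TIME": pd.to_datetime(["2016-05-01", "2016-05-02"]),
    "COSMOS_VWC": [31.5, 29.0],
})

# 1. pick soil moisture columns by suffix and average them per calendar day
vwc_cols = [c for c in sh.columns if c.endswith("_VWC")]
tdt_daily = sh.set_index("DATE_TIME")[vwc_cols].resample("D").mean()

# 2. align with the daily CRNS series on the date index
joined = daily.set_index("DATE_TIME").join(tdt_daily)

# 3. long format: one row per (date, sensor), ready for hvplot(by='Sensor')
long = joined.reset_index().melt(id_vars="DATE_TIME", var_name="Sensor", value_name="VWC")
print(long)
```

`resample('D')` bins by calendar day, as grouping on `DATE_TIME.dt.date` would, but it keeps a DatetimeIndex, so no type conversion is needed before the join.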
All four sites contain at least two TDT probes, and their time series follow a pattern similar to the CRNS. The pattern may differ for other stations in the full COSMOS-UK dataset, which can contain more than two TDT probes.
Soils have a complex porous structure, so moisture can be distributed non-uniformly both horizontally and vertically. Site measurements such as those from TDT probes capture “extremely local” moisture, so even probes a few metres apart can disagree: a probe can sit in a water pocket and report artificially high VWC, or be pressed against a stone and report artificially low VWC. In contrast, local measurements such as CRNS average over all of this heterogeneity but introduce their own sources of noise (biomass water, surface water, and a variable depth and horizontal footprint).
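A simple way to put numbers on this heterogeneity is to compare, day by day, the spread between TDT probes with the difference between their mean and the CRNS value. This sketch assumes a frame shaped like `daily_joined` inside `site_daily` (TDT columns plus `COSMOS_VWC`, indexed by date); the values are synthetic, and neither metric separates the individual noise sources listed above.

```python
import pandas as pd

# Synthetic daily means for one hypothetical site (not COSMOS-UK data)
joined = pd.DataFrame(
    {"TDT1_VWC": [30.0, 28.0, 26.0],
     "TDT2_VWC": [36.0, 35.0, 33.0],
     "COSMOS_VWC": [32.0, 31.0, 29.5]},
    index=pd.date_range("2016-05-01", periods=3, freq="D"),
)

tdt = joined.filter(like="TDT")                 # point-scale (site) probes only
spread = tdt.max(axis=1) - tdt.min(axis=1)      # disagreement between probes
bias = tdt.mean(axis=1) - joined["COSMOS_VWC"]  # probe mean minus CRNS (local)
print(pd.DataFrame({"probe_spread": spread, "probe_mean_minus_crns": bias}))
```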
Summary¶
This notebook has demonstrated the use of several open-source Python packages to explore the 2013-2019 COSMOS-UK dataset:
- intake to fetch and manipulate daily and subhourly data, their metadata and other data types (remote images).
- hvplot to create interactive visualisations of hydrometeorological and soil data.
- pandas to resample subhourly data and merge them into a daily dataset of soil moisture.
Citing this Notebook¶
Please see CITATION.cff for the full citation information. The citation file can be exported to APA or BibTeX formats (learn more here).
Additional information¶
Review: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available here.
Dataset: 2013-2019 COSMOS-UK dataset (further details of the version in Stanley et al. (2021)).
License: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons Attribution 4.0 license. See further details here.
Contact: If you have any suggestion or report an issue with this notebook, feel free to create an issue or send a direct message to environmental
Notebook repository version: v2.0.0
Last tested: 2025-04-21
- Stanley, S., Antoniou, V., Askquith-Ellis, A., Ball, L. A., Bennett, E. S., Blake, J. R., Boorman, D. B., Brooks, M., Clarke, M., Cooper, H. M., Cowan, N., Cumming, A., Evans, J. G., Farrand, P., Fry, M., Hitt, O. E., Lord, W. D., Morrison, R., Nash, G. V., … Winterbourn, B. (2021). Daily and sub-daily hydrometeorological and soil data (2013-2019) [COSMOS-UK]. NERC Environmental Information Data Centre. 10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185
- Zreda, M., Shuttleworth, W. J., Zeng, X., Zweck, C., Desilets, D., Franz, T., & Rosolem, R. (2012). COSMOS: the COsmic-ray Soil Moisture Observing System. Hydrology and Earth System Sciences, 16(11), 4079–4099. 10.5194/hess-16-4079-2012
- Evans, J. G., Ward, H. C., Blake, J. R., Hewitt, E. J., Morrison, R., Fry, M., Ball, L. A., Doughty, L. C., Libre, J. W., Hitt, O. E., Rylett, D., Ellis, R. J., Warwick, A. C., Brooks, M., Parkes, M. A., Wright, G. M. H., Singer, A. C., Boorman, D. B., & Jenkins, A. (2016). Soil water content in southern England derived from a cosmic-ray soil moisture observing system – COSMOS-UK. Hydrological Processes, 30(26), 4987–4999. 10.1002/hyp.10929
- Cooper, H. M., Bennett, E., Blake, J., Blyth, E., Boorman, D., Cooper, E., Evans, J., Fry, M., Jenkins, A., Morrison, R., Rylett, D., Stanley, S., Szczykulska, M., Trill, E., Antoniou, V., Askquith-Ellis, A., Ball, L., Brooks, M., Clarke, M. A., … Winterbourn, B. (2021). COSMOS-UK: national soil moisture and hydrometeorology data for environmental science research. Earth System Science Data, 13(4), 1737–1757. 10.5194/essd-13-1737-2021