An inverse error variance weighting of the anomalies of three terrestrial evaporation (ET) products from the WACMOS-ET project based on FLUXNET sites is presented. The three ET models were run daily and at a resolution of 25 km for 2002–2007, and based on common input data when possible. The local weights, derived based on the variance of the difference between the tower ET anomalies and the modelled ET anomalies, were made dynamic by estimating them using a 61-day running window centred on each day. These were then extrapolated from the tower locations to the global landscape by regressing them on the main model inputs and derived ET using a neural network. Over the stations, the weighted scheme usefully decreased the random error component, and the weighted ET correlated better with the tower data than a simple average. The global extrapolation produced weights displaying strong seasonal and geographical patterns, which translated into spatiotemporal differences between the ET weighted and simple average ET products. However, the uncertainty of the weights after the extrapolation remained large. Out-sample prediction tests showed that the tower data set, mostly located at temperate regions, had limitations with respect to the representation of different biome and climate conditions. Therefore, even if the local weighting was successful, the extrapolation to a global scale remains problematic, showing a limited added value over the simple average. Overall, this study suggests that merging tower observations and ET products at the timescales and spatial scales of this study is complicated by the tower spatial representativeness, the products' coarse spatial resolution, the nature of the error in both towers and gridded data sets, and how all these factors impact the weights extrapolation from the tower locations to the global landscape.

The surface latent heat flux governs the interactions between the Earth and
its atmosphere

Point-based measurements of land heat fluxes are typically conducted during
field experiments

To derive global estimates, a central challenge remains: ET does not have a
direct signature that can be remotely detected. As an alternative, satellite
remote sensing observations related to surface temperature, soil moisture, or
vegetation can again be combined with traditional flux formulations

Far from discouraging the use of these ET data sets, the inter-product
differences have been perceived as an opportunity to foster research and find
new means to combine these data sets in an optimal manner. So far, these
efforts have ranged from simply averaging a number of ET products

Aiming to improve the predictive capability for ET, the WAter Cycle
Multi-mission Observation Strategy – ET project (WACMOS-ET,

Analyses of the WACMOS-ET estimates showed substantial differences between
the three model products, both at the point scale

The GLEAM, PT-JPL, and PM-MOD models, and the inputs required to run them
globally at a 0.25

GLEAM is a simple land surface model fully dedicated to deriving evaporation. It distinguishes between direct soil evaporation, transpiration from short and tall vegetation, snow sublimation, open-water evaporation, and interception loss from tall vegetation. Interception loss is independently calculated based on the Gash (1979) analytical model forced by observations of precipitation. The remaining components of evaporation are based upon the formulation by Priestley and Taylor (1972) for potential evaporation, constrained by multiplicative stress factors. For transpiration and soil evaporation, the stress factor is calculated based on the content of water in vegetation (microwave vegetation optical depth) and the root zone (multilayer soil model driven by observations of precipitation and updated through assimilation of microwave surface soil moisture). For regions covered by ice and snow, sublimation is calculated using a Priestley and Taylor equation with specific parameters for ice and supercooled waters. For the fraction of open water at each grid cell, the model assumes potential evaporation.

The recent GLEAM v3 model of

The PT-JPL model by

For this study, optimized vegetation products are used as inputs to the
model. In WACMOS-ET, the leaf area index (LAI) and fraction of absorbed
photosynthetic active radiation (FAPAR) products, derived from the Joint
Research Centre Two-Stream Inversion (JRC-TIP) package

The PM-MOD is based on the Monteith (1965) adaptation of Penman (1948), and
the version applied here follows the implementation of

The WACMOS-ET LAI/FAPAR products are used with PM-MOD as in

The weights in a merging scheme are typically based on an estimation of some
measure of product uncertainty. Here the idea is to estimate the weights
proportionally to the agreement between the variations of each ET product and
the tower measurements. In order to do so, we propose the following merging
scheme:

At each tower location, both the different ET products and the
tower observations are decomposed into a time series of anomalies and a
seasonal climatology as follows:

The product anomalies are weighted as follows:

The merged product is finally calculated by adding the weighted
anomalies to the average of the three products' climatology:

In order to produce a global weighted product, an extrapolation of the
weights from the tower space (i.e. the 84 cells where the towers are
located; see Sect.

A standard multi-layer perceptron with an 11-input first layer, one hidden
layer with 30 neurons and sigmoidal activation functions, and one output
layer with 3 neurons and linear activation functions, is used for the
regression. Inputs to the NN are the GLEAM, PT-JPL, and PM-MOD ET together
with the surface net radiation, the near-surface air temperature, the
relative humidity, the soil moisture, the vegetation optical depth, and the
project LAI and FAPAR (see Sect.

The objective of any NN is to model the general distribution of the data, not
the very specific features of the training data set. The existence of these
specific features is unavoidable, as any training data set is always limited
in terms of being a sample of the true distribution. Modelling the specific
features is often referred to as “over-fitting”. To prevent the latter,
standard techniques such as early stopping are applied

Preventing over-fitting only assures the right NN model complexity for the
conditions sampled in the training data set. In this particular case the
limited spatial coverage of the tower stations suggests a poor sampling of the
global conditions (see Sect.

Note that as tower measurements were masked for rainy intervals (see
Sect.

Agreement with the towers' ET is analysed by calculating the Pearson
correlation coefficient (

Statistics are calculated for the complete study period, or separately for
the boreal winter (DJF), spring (MAM), summer (JJA), and autumn (SON). For
the correlations, statistical significance is tested by calculating 95 %
confidence intervals. For the correlation differences, a Fisher

The GLEAM, PT-JPL, and PM-MOD required that global inputs remain unchanged with
respect to

List of the FLUXNET sites used in this study together with their FLUXNET code (ID), IGBP land cover (LC), and official reference or principal investigator (PI). The CA-NS1-7 refers to seven stations closely located and run by the same group.

The FLUXNET 2015 synthesis data set (

Distribution of tower sites used in the study.

Eddy-covariance measurements are subject to errors, both random and
systematic, and any merging technique using them as reference is likely to be
impacted by those errors. Systematic errors can arise from instrumental
calibration and unmet assumptions about the meteorological conditions, while
random errors are typically related to turbulence sampling errors, the
assumptions of a constant footprint area, and instrumental limitations

The propagation of systematic errors typically results in the lack of energy
balance closure observed at many eddy-covariance sites

Moreover, not all stations completely cover the 2002–2007 period, with 6, 14,
24, 9, and 31 stations reporting 2, 3, 4, 5, and 6 years of data within the
period, respectively. At stations where inter-annual variability is large, the
weights may not be representative of the overall climate conditions at the
tower if only a relatively short number of years exist. Limiting the study to
stations with a relatively large number of years could minimize this
drawback, but it would severely reduce the number of towers, so this
filtering has not been applied. For instance, if we only derive weights for
towers with at least 4 years of data, half of the towers would have been
removed. Notice also that due to the masking of the tower data the 61
consecutive daily estimates required to estimate our temporally varying
weights (see Sect.

Because the substantial mismatch between the size of the model grid cells and
the tower footprint is likely to result in representativeness errors,
ancillary data sets are required to characterize the spatial homogeneity of
the grid cells where the stations are located. Two data sets are considered:
the MODIS Land Cover Type product MCD12Q1 at a native resolution of 500 m,
and the Terra MODIS Vegetation Continuous Fields product MOD44B,
available at a spatial resolution of 250 m. A homogeneity index
(

The multi-annual GLEAM, PT-JPL, and PM-MOD total ET, together with their
absolute and relative differences, are shown in Fig.

Summary of GLEAM, PT-JPL, and PM-MOD annual ET
differences.

Next, the ET estimates of GLEAM, PT-JPL, and PM-MOD are evaluated at the
available tower sites. If we look at the towers' spatial distribution in
Fig.

Normalized histograms of ET and available energy (Ae) from GLEAM, PT-JPL, PM-MOD, and the tower observations. The histograms are calculated with the ET values at the tower locations separated first by season and land cover.

An example of good agreement is the forest group in autumn, with the distributions of both ET and Ae being quite similar for the observed and modelled variables. The crops/grass group in summer also shows reasonable agreement between the GLEAM and PT-JPL ET distributions, but larger differences with PM-MOD and the tower ET. In that case, the tower ET shows a clear bimodal distribution, which cannot be replicated any of the models. This may be due to agricultural management practices being poorly captured by the models (e.g. irrigation), but may also reflect the large heterogeneity of croplands and their (a priori) low representativeness of the larger pixel scale. For the shrubs/savanna group during summer, the four ET distributions are quite different, with the Ae distributions also showing differences. For these cases it is difficult to identify whether tower and model ET differences are due to biases in the surface radiation, or discrepancies in the ET formulations.

A summary of daily weight statistics over all the sites belonging to a given
land cover group is given in Fig.

Box plots of the GLEAM (red), PT-JPL (blue), and PM-MOD
(green) seasonal weights for the three land cover groups. The central
mark of the box plots is the median of the group population, the box
edges are the 25th (

An example of the temporal variability of the weights at three towers is
given in Fig.

Example of GLEAM (red), PT-JPL (blue), and PM-MOD (green) weights at the FR-Pue (top, forest), US-SRM (middle, grassland), and US-Ne1 (bottom, cropland) stations. The thick black line marks the one-third value of the SA-merger weights; the thin black lines mark the 0–1 interval.

Figure

The 2006 time series of the different ET products and the
sites shown in Fig.

The performance of the individual and merged products across the different
stations is summarized in Fig.

Season and land-cover averaged ET correlations and RMSD
of the tower and the different products (

Concerning the RMSDs, they are slightly lower for WA-merger for all seasons
except for winter months. As SA-merger and WA-merger share their climatology
(see Sect.

The local weights at the 84 stations have been extrapolated by the NN as
described in Sect.

Seasonally averaged global weights for GLEAM

The seasonally averaged ET differences between WA-merger and SA-merger,
normalized by the seasonal SA-merger, are plotted in
Fig.

Some geographical structures and seasonal changes are visible in some
regions. For instance, in the sub-Saharan transition zone the differences are
positive in the first half of the year (WA-merger

Seasonally averaged normalized ET differences between SA-merger and WA-merger, expressed as a percentage of the seasonally averaged SA-merger ET. Red (blue) colours indicate positive (negative) differences.

Our inverse error variance weighting is based on the differences between the
model and tower ET anomalies. However, it is expected that part of the
difference between in situ measurements of ET and model estimates respond to
the mismatch in spatial resolution (tower footprint versus model cell). The
RMSD of SA-merger against the towers' ET, normalized by the mean annual tower
ET, is displayed in Fig.

Homogeneity index (

The objective on an inverse error-variance weighting is to find the estimate
that minimizes the variance of the random error

For the three land cover groups the random

Nevertheless, even if optimality in the sense of minimizing the error
variance of the WA-merger cannot be assured, weighting the anomalies should
result in a decrease in the random error. This is shown in
Fig.

The number of stations used in this merging exercise is certainly limited in terms of covering different biomes and climatic conditions. Hence, the ability to represent the full distribution of ET across time, space, and biomes is questionable. This is verified here by out-sampling the NN training data set in two different ways. In the first test all stations are included in the tower data set, i.e. the standard configuration used to produce the global WA-merger. Before training the NN, 15 % of the days at each station are randomly masked from the training data set, and the prediction statistics are derived over this independent subset. In the second test, the station where the prediction will be checked is entirely removed from the training data set, i.e, the weights for that station are derived using a NN that did not include that station in the training phase (i.e. leave-one out cross-validation).

Box plot showing for the three land cover groups the
correlation

A box plot summarizing the correlation and RMSD between the station weights
and the weights predicted by the NN for these two tests is presented in
Fig.

An additional test to check the representativeness of the tower data set is
conducted by globally extrapolating the weights with each of the previous 84
NNs trained without one station, and then checking the variability of the
predicted weights. For the conditions well represented in the training data
set, it is expected that removing one station will only result in slight
changes in the extrapolated weights. However, for regions that are poorly
represented, a slightly different data set is likely to result in
substantially different weights. This is illustrated in
Fig.

Relative annual variability of the global weights extrapolated by 84 different NNs. Smaller (larger) values indicate lower (higher) variability. See the text for details.

The evaluation of ET products is typically conducted by comparing the
estimates to point-scale tower fluxes. Alternatively, water balance
calculations at larger spatial scales – such as catchment scales – where ET
is estimated as the residual of precipitation (

Scatter plots of

The mass balance of a catchment implies that the space and time integration
of

Scatter plots showing the correspondence between

A simple average (SA-merger) and an inverse error variance weighting (WA-merger) of the three global ET products generated during the WACMOS-ET project is presented. During the project, three ET models were forced with common daily inputs at a resolution of 25 km for the period 2002–2007: GLEAM, PT-JPL, and PM-MOD. GLEAM and PT-JPL share a Priestley–Taylor formulation to estimate potential evaporation, while PM-MOD uses a more different modelling approach of potential evaporation based on a Penman–Monteith formulation, but a very similar evaporative stress and radiation partitioning formulation to the one by PT-JPL. In WA-merger, the weights were estimated using the error variance of the individual product anomalies, with the error defined as the difference between tower-based ET anomalies and modelled ET anomalies for non-rainy conditions. Then the final data set was reconstructed by adding the weighted anomalies to the mean seasonal climatology of the products. A similar approach was followed to generate SA-merger, but in this case giving equal weights to the anomalies of all three products. Finally, the potential to extrapolate these locally estimated weights to the global scale based on a neural network approach has been explored. Given the described framework, the intent here is to evaluate the potential of blending these data sets to yield anomalies of ET that better represent those measured by the global network of eddy-covariance towers. We note that capturing anomalies in ET is crucial for applications such as drought monitoring or irrigation planning.

The resulting local weights showed seasonal patterns and negative values at many stations. This was to a large extent related to correlation in the errors of the anomalies of GLEAM and PT-JPL. Nonetheless, seasonal correlations between WA-merger and the tower ET are overall higher than for the individual products and SA-merger. This is mostly attributed to a successful reduction in the random error. Meanwhile, the globally extrapolated weights showed seasonal and regional variability, with these patterns resulting in seasonal differences between the global SA-merger and WA-merger of up to 25 % in a large number of regions. However, the limited global coverage of the tower stations, mostly located in the Northern Hemisphere temperate regions, cast doubts on the ability of the NN prediction scheme to reliably extrapolate the locally estimated weights. This was apparent when the extrapolation was tested over individual stations with the training data set not including the station under study, and when reproducing the global extrapolation of the weights with the training data set missing one station at a time. Both mergers were also compared with the ET inferred from water balance calculations in different catchments across the globe, and similar correlations and RMSDs were obtained, with only slightly better results for the WA-merger over wet basins.

Several limiting factors for the merging exercise are identified, some of which could be informative for other initiatives aiming to blend ET data sets. A longer study period can give access to more in situ data and extend the in situ data set to less represented regions. This would clearly help the global extrapolation of the weights. In addition, the mismatch between the spatial resolution of the towers and the products is still an issue, despite the fact that here other error sources were deemed to be more dominant. The impact of the mismatch in spatial resolution is expected to be minimized as ET data sets move towards finer spatial resolutions. Dependency between the ET products can also have an impact on the merged products. In this study the GLEAM, PT-JPL, and PM-MOD products are derived with common data sets for their shared inputs. While this was motivated by the primary objective of WACMOS-ET of studying algorithm differences, this is can become a drawback when aiming to achieve an optimal merger. In that case a lower inter-dependency is expected to be beneficial.

Overall, our study suggests that an inverse error variance scheme combining
information from tower observations and ET products has the potential to
improve upon the simple mean proposed by several previous efforts

The WACMOS-ET data sets are freely
available upon request.
For instructions on accessing the data, please visit
the project website
(

All authors have been involved in interpreting the results, discussing the findings, and editing the paper. CJ conducted the main analysis and wrote the draft of the paper. BM and DGM provided guidance to run the GLEAM model and expertise on processing and analysing the FLUXNET data. JBF provided guidance on using the PT-JPL model and the adaptations required to run the model with the WACMOS-ET inputs. HEB made accessible the precipitation and run-off data, and provided expertise on the catchment mass balance analysis.

The authors declare that they have no conflict of interest.

This study was funded by the European Space Agency (ESA) and conducted as part of the project WACMOS-ET-Ensemble (ESRIN contract no. 4000117355/16/I-NB). Diego G. Miralles acknowledges support from the European Research Council (ERC) under grant agreement number 715254 (DRY-2-DRY). Joshua B. Fischer contributed to this paper at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The California Institute of Technology and government sponsorship is acknowledged. Support to Joshua B. Fisher was provided by NASA's SUSMAP, THP, and INCA programs, and the ECOSTRESS mission. Kevin P. Tu, from the Department of Ecosystem and Conservation Sciences, University of Montana, is acknowledged by providing guidance for the use of the vegetation products in this study. This work used eddy-covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program (DE-FG02-04ER63917 and DE-FG02-04ER63911)), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia, USCCC. Data and logistical support for the station US-Wrc were provided by the US Forest Service Pacific Northwest Research Station. The FLUXNET eddy-covariance data processing and harmonization were carried out by the ICOS Ecosystem Thematic Center, the AmeriFlux Management Project, and the Fluxdata project of FLUXNET (with the support of CDIAC), as well as the OzFlux, ChinaFlux, and AsiaFlux offices. Edited by: Harrie-Jan Hendricks Franssen Reviewed by: three anonymous referees