OR/15/048 Data availability and analytical methodology

From Earthwise
Bearcock, J M, Smedley, P L and Milne, C J. 2015. Baseline groundwater chemistry: the Corallian of the Vale of Pickering, Yorkshire. British Geological Survey Internal Report, OR/15/048.

Groundwater data sources

Groundwater chemical data were acquired for the area by collection of 24 new groundwater samples, and collation with unique sample site data from the Environment Agency (EA) database (1 additional site). The EA data include analyses of major elements, selected trace elements and field-determined parameters. These data comprise time-series analyses, generally spanning from 1995 to 2006. These time-series data are discussed separately in Section 5.5. Where there was a site monitored by the EA, but not sampled by the BGS, the most recent data were added to the 24 new groundwater samples to expand the spatial coverage and create a data set of 25 sites (see Figure 4). Data for 2006 were selected to be comparable to associated BGS analyses. All the groundwaters in this study were located in zones D or E as defined by Reeves et al. (1978)[1] (See Regional geology and hydrogeology).

Groundwater sampling and analysis

A total of 24 groundwater samples were collected by BGS during September 2006 from commercial abstraction and private boreholes exploiting the Corallian aquifer of the Vale of Pickering. These were almost exclusively from the unconfined aquifer exposed on the periphery of the Vale of Pickering. Few boreholes are available within the confined part of the Corallian aquifer below Jurassic clays. The sample locations are shown in Figure 4.

Samples were mostly collected from continuously pumping boreholes, although in the case of private boreholes this was not always possible. Where practical, the pumps at private sources were switched on at least 10 minutes prior to samples being taken. Efforts were made to sample the groundwater as close to the borehole as possible and with minimum transport through pipes or hoses. Sampling from storage tanks was mostly avoided. At one site this was unavoidable, but the rapid flush rate through the tank meant that the sampled water had spent only a few hours in the tank.

At each site, measurements were made of temperature, specific electrical conductance (SEC), alkalinity (by titration against H2SO4), pH, dissolved oxygen (DO) and redox potential (Eh). The latter three parameters were measured in a flow cell to prevent contact with the atmosphere and parameters were monitored until stable readings were obtained. In a few cases, use of a flow cell was not possible and on-site parameters were measured rapidly in a bucket. In each case a note was made of the sampling conditions.

Groundwater samples were also taken at each site for laboratory analysis. Samples for major- and trace-element analysis were collected in rinsed polyethylene bottles and filtered to <0.2 µm. Filtration was performed using either an in-line reusable filter holder attached to the outflow of the flow cell or a disposable filter and syringe. Those required for cation and trace-element analysis were acidified to 1% (v/v) HNO3 to prevent metal precipitation and minimise sorption onto the container walls. Aliquots of the sample filtered to <0.2 µm were also collected in polyethylene bottles preloaded with potassium persulphate for the determination of total dissolved phosphorus (TDP).

Samples for dissolved organic carbon (DOC) analysis were filtered through a 0.45 µm silver-impregnated filter and collected in glass vials pre-cleaned with chromic acid. Samples for the determination of stable isotopes (18O and 2H in water and 13C in dissolved inorganic carbon) were collected unfiltered in rinsed glass bottles.

Analysis of major cations and sulphate was carried out by inductively-coupled plasma optical emission spectrometry (ICP-OES); Cl, NO3, Br and F were determined by ion chromatography (IC), NH4, NO2 and I by automated colorimetry (AC), stable isotopes by mass spectrometry and a range of trace elements by inductively-coupled plasma mass spectrometry (ICP-MS). With the exception of TDP, analyses were carried out at the BGS laboratories in Wallingford and Keyworth. Total dissolved phosphorus was analysed by sample digestion followed by colorimetry using the molybdenum blue method at the CEH laboratory in Wallingford.

Data handling

The data collected during the sampling campaign and the archive EA data were combined into one set for the purposes of statistical handling and interpretation. For many trace elements, the concentrations were below the detection limits of the analytical techniques used. As the data reported were obtained from more than one laboratory source and by more than one method, the detection limits for any given determinand can vary. Indeed, detection limits can vary from day to day on a single instrument. This produces left-censored data sets that require special statistical analysis methods for calculating summary statistics.

Summary statistics were calculated in the R statistical computing environment using the NADA package. This package is used to perform statistical analysis on censored data and uses the methods described in Helsel (2005)[2]. The methods used to summarise the combined BGS-EA dataset were the Kaplan-Meier (K-M) method and the 'regression on order statistics' (ROS) method. These methods can both be used to summarise multiply-censored data sets (Lee and Helsel, 2005b[3], 2007[4]). The ROS method is particularly useful for small data sets (n<30) where other methods may become inaccurate. It is also particularly useful where the non-detects comprise up to 80% of the data set.
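As a rough illustration of how the K-M approach accommodates several detection limits at once, the Python sketch below estimates the cumulative probability P(X <= x) at each detected concentration. It is not the NADA implementation used for this report: the function name and the example concentrations are hypothetical, and the treatment of ties (a non-detect whose limit equals a detected value is counted as at or below that value) is an assumption.

```python
def km_cdf_left_censored(values, censored):
    """Kaplan-Meier estimate of P(X <= x) at each distinct detected value.

    values   -- measured concentration, or the detection limit for a non-detect
    censored -- True where the result was reported as '<detection limit'

    Returns a list of (x, probability) pairs in ascending order of x.
    """
    obs = list(zip(values, censored))
    if not obs:
        raise ValueError("no observations supplied")

    # Work downwards from the largest detected value, which is the
    # left-censored mirror image of the usual survival-time calculation.
    detected = sorted({v for v, c in obs if not c}, reverse=True)

    cdf = []
    prob_le = 1.0  # P(X <= largest detected value)
    for x in detected:
        cdf.append((x, prob_le))
        # Risk set: every result (detect or non-detect) at or below x.
        # Non-detects with a limit above x do not contribute here.
        at_risk = sum(1 for v, _ in obs if v <= x)
        n_detects = sum(1 for v, c in obs if v == x and not c)
        # After this step prob_le is P(X < x), i.e. P(X <= next lower detect).
        prob_le *= 1.0 - n_detects / at_risk

    cdf.reverse()
    return cdf


# Hypothetical example: two detects (2 and 4) and two non-detects (<1 and <3).
example_values = [1, 2, 3, 4]
example_censored = [True, False, True, False]
for x, p in km_cdf_left_censored(example_values, example_censored):
    print(x, p)
```

Any probability mass remaining below the smallest detected value cannot be resolved by this estimate, which is one reason the method is only recommended when non-detects are in the minority.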

As noted by Lee and Helsel (2005a[5], b[6]), where the data set has greater than 80% non-detects, the estimated summary statistics are very tenuous. They suggest that in such cases the data can only be summarised by presenting minimum and maximum concentrations (Lee and Helsel, 2005a[5]). The summary statistics in this report were calculated using each of the above methods. The most appropriate method for each analyte was taken following the recommendations of Helsel (2005)[2], as given below:

<50% non-detects: K-M method
50% to 80% non-detects: ROS method
>80% non-detects: ranges only quoted
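The selection rule above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and assigning exactly 50% and exactly 80% non-detects to the ROS method is an assumption, since the boundaries are not specified in the text. The calculations for this report were made in R with the NADA package.

```python
def censored_fraction(censored_flags):
    """Fraction of results reported as below a detection limit."""
    flags = list(censored_flags)
    if not flags:
        raise ValueError("no observations supplied")
    return sum(1 for flag in flags if flag) / len(flags)


def choose_summary_method(censored_flags):
    """Pick a summary method from the proportion of non-detects.

    Assumption: the boundary values (exactly 50% and exactly 80%)
    are assigned to ROS.
    """
    fraction = censored_fraction(censored_flags)
    if fraction < 0.5:
        return "K-M"
    if fraction <= 0.8:
        return "ROS"
    return "range only"


# Hypothetical analyte with 25 results, 15 of them non-detects (60%).
flags = [True] * 15 + [False] * 10
print(choose_summary_method(flags))
```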

The summary statistics reported are therefore derived from a range of methods and do not all present the same parameters.

In the Baseline report series the 95th percentile of a data distribution has typically been used as an upper cut-off for outlier compositions. The choice of percentile is somewhat arbitrary and other percentiles have been used within the literature. The 90–95th percentile was used by Lee and Helsel (2005a)[5] and the 97.7th percentile was used by Langmuir (1997)[7]. While using percentiles as an upper limit provides a simple definition of outliers, the method clearly has its limitations. For example, many UK groundwaters are contaminated by nitrate derived from long-term use of nitrogenous fertilisers in agriculture. Nitrate concentrations are therefore variable and the 95th percentile in unconfined aquifers rarely represents a cut-off between natural and anthropogenically-influenced compositions. Likewise, for some elements, data above a given threshold are classed as anomalous when they may in fact represent baseline concentrations. However, the 95th percentile represents a simplification to exclude the upper 5% of the data distribution and has been used in the Baseline report series as one measure for estimating likely upper limits to baseline concentrations. This threshold is unlikely to be exceeded in future samples unless conditions within the aquifer have changed. It should be emphasised that this is not the only factor used when attempting to characterise the baseline groundwater compositions. An understanding of the hydrogeological and geochemical processes, rainfall compositions, land use and residence times, together with temporal variability observed through time-series data, is also taken into consideration.
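For a fully detected data set, the percentile cut-off described above might be computed as in the Python sketch below. The function names and the example values are hypothetical, and linear interpolation between order statistics is only one of several percentile definitions in common use, so results may differ slightly from those of other software. Censored data sets would need the K-M or ROS estimates instead.

```python
def percentile(values, p):
    """p-th percentile (0 to 100) by linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no observations supplied")
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    rank = (len(xs) - 1) * p / 100   # fractional position in the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (rank - lower)


def above_threshold(values, p=95):
    """Return the values lying above the p-th percentile of the same data."""
    cutoff = percentile(values, p)
    return [v for v in values if v > cutoff]


# Hypothetical concentrations for one determinand (arbitrary units).
concentrations = list(range(1, 22))
print(percentile(concentrations, 95))
print(above_threshold(concentrations))
```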

In addition to the statistical analysis, saturation indices were calculated for the newly collected groundwater samples using PHREEQCi and the wateq4f.dat database. Saturation indices will be discussed where appropriate in Section 5. It should be remembered that minerals which are predicted to dissolve or precipitate may not actually do so because of kinetic constraints or indeed absence of the mineral in the case of dissolution (Zhu and Anderson, 2002[8]).
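For reference, a saturation index is conventionally defined as SI = log10(IAP/K), where IAP is the ion activity product of the dissolved species and K is the equilibrium constant of the mineral. The Python sketch below shows only this final step. It does not reproduce the speciation and activity calculations that PHREEQCi performs, and the activities and the log K value for calcite used here are illustrative assumptions rather than values taken from this study or checked against wateq4f.dat.

```python
import math


def saturation_index(ion_activity_product, log_k):
    """SI = log10(IAP) - log10(K) for a mineral dissolution reaction."""
    if ion_activity_product <= 0:
        raise ValueError("ion activity product must be positive")
    return math.log10(ion_activity_product) - log_k


# Illustrative (assumed) values for calcite: CaCO3 = Ca2+ + CO3 2-
activity_ca = 1.0e-3       # hypothetical Ca2+ activity
activity_co3 = 1.0e-5      # hypothetical CO3 2- activity
log_k_calcite = -8.48      # assumed value for 25 degrees C

si = saturation_index(activity_ca * activity_co3, log_k_calcite)
# SI > 0 indicates supersaturation, SI < 0 undersaturation, SI near 0 equilibrium.
print(round(si, 2))
```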


  1. REEVES, M J, PARRY, E L, and RICHARDSON, G. 1978. Preliminary investigation of the groundwater resources of the western part of the Vale of Pickering. Quarterly Journal of Engineering Geology, Vol. 11, 253–262.
  2. HELSEL, D. 2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data. (New York: Wiley & Sons.)
  3. LEE, L, and HELSEL, D. 2005b. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics. Computers & Geosciences, Vol. 31, 1241–1248.
  4. LEE, L, and HELSEL, D. 2007. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing. Computers & Geosciences, Vol. 33, 696–704.
  5. LEE, L, and HELSEL, D. 2005a. Baseline models of trace elements in major aquifers of the United States. Applied Geochemistry, Vol. 20, 1560–1570.
  6. LEE, L, and HELSEL, D. 2005b. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics. Computers & Geosciences, Vol. 31, 1241–1248.
  7. LANGMUIR, D. 1997. Aqueous Environmental Geochemistry. (New Jersey: Prentice-Hall.)
  8. ZHU, C, and ANDERSON, G. 2002. Environmental Applications of Geochemical Modeling. (Cambridge: Cambridge University Press.)