OR/14/022 Summary and recommendations
Barkwith A K A P, Pachocka M, Watson C, Hughes A G. 2014. Couplers for linking environmental models: Scoping study and potential next steps. British Geological Survey Internal Report, OR/14/022.
The benefits of integrated modelling are not limited to better understanding of complex coupled Earth system processes. Adopting integrated modelling technology also brings substantial savings in time and cost, since existing codes can be repurposed and reused in new models.
The couplers described in this report are good exemplars of the types of technology available for model integration. Due to rapid developments in IT, most current technologies allow components to communicate dynamically (Lu 2011). Although a large number of couplers have been developed to date, not all have been equally successful within the scientific community. This can be attributed to specific features of a coupler, or the lack of them, e.g. lack of support for the Windows operating system (MMS, SME), use of less compatible languages (ICMS using MickL, Tarsier using Borland C++), or lack of a GUI and use of declarative statements to describe model structure (SME, NextFRAMES) (Lu 2011).
The OpenMI standard appears to be the most successful and widely accepted within the hydrological community. This is not surprising, as OpenMI was developed specifically for the water resources domain. A particular feature of OpenMI is that it only sets standards based on interfaces; ensuring that these are implemented correctly is sufficient to make a component compliant (Knapen et al., 2009). Its disadvantages include no support for web services (Goodall et al., 2011) and a sequential (pull-driven) communication mechanism, which only allows for single-threaded execution (OATC 2010a, OATC 2010b).
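As a rough illustration of the pull-driven mechanism, the sketch below chains two toy components so that a request to the downstream component triggers requests upstream. It is not the OpenMI API: the `Component` class, its `get_values` method, and the halving calculation are hypothetical names and placeholders chosen for this example.

```python
class Component:
    """Hypothetical linkable component that only advances when asked for values."""

    def __init__(self, name, step, upstream=None, trace=None):
        self.name = name
        self.step = step            # internal time step length
        self.time = 0.0             # current simulation time
        self.value = 0.0            # most recently computed output
        self.upstream = upstream    # component this one pulls its input from
        self.trace = trace if trace is not None else []

    def get_values(self, t):
        """Return the output at time t, stepping forward as far as needed."""
        while self.time < t:
            next_time = self.time + self.step
            # Pull the input for the new time level before computing.
            inflow = self.upstream.get_values(next_time) if self.upstream else 1.0
            self.value = 0.5 * inflow   # placeholder for the model's science
            self.time = next_time
            self.trace.append((self.name, self.time))
        return self.value


trace = []
rainfall = Component("rainfall", step=1.0, trace=trace)
river = Component("river", step=2.0, upstream=rainfall, trace=trace)

# One request at the end of the chain drives the whole composition.
result = river.get_values(2.0)
print(result)
print(trace)
```

Because each request blocks until the upstream component has caught up, the components run one after another in a single thread, which is the limitation noted above.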
CSDMS is suggested by some authors to have a broader hydrologic scope than OpenMI (Peckham 2007). An obvious advantage of CSDMS is its interoperability tool Babel, which, by automatically generating the 'glue code', enables communication between models written in different languages (Peckham 2007). CSDMS is intended to be interoperable with ESMF and OpenMI (Peckham 2007); integration with these different frameworks opens opportunities for cross-domain environmental research.
While CCA is intended for high-performance computing applications, it does not provide "any automatic way for the software to take advantage of multiple processors" (Peckham 2007). ESMF, by contrast, provides a direct path to parallel computation through domain decomposition (Peckham 2007). While ESMF is considered rather intrusive (Lawrence et al., Manuscript), with OASIS "a concurrent multiple executable approach requires minimal modification to the existing component code" (Valcke et al., 2012).
Although certain aspects of the frameworks are similar (for example, the interfaces of CCA, ESMF, OASIS, OMS, OpenMI, and TIME all use initialise, run, finalise, get, and set concepts), the amount of code needed to integrate models varies significantly (Jagers 2010). OMS 3.0 is a lightweight framework that uses a metadata approach to integrate models. The study by Lloyd et al. (2011) showed it to be the least invasive of the frameworks tested (the others being OMS 2.2, ESMF 3.1.1C, ESMF 3.1.1Fortran, OpenMI 1.4 and CCA 0.6.6); for example, OMS 3.0 required the least code to implement the Thornthwaite model (Lloyd et al., 2011, David et al., 2013).
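The shared initialise, run, finalise, get, and set concepts can be sketched as a small abstract interface. This is a minimal illustration under assumed names: `IRFComponent`, `LinearReservoir`, and `couple` are hypothetical and do not reproduce the interface of any of the frameworks listed.

```python
from abc import ABC, abstractmethod


class IRFComponent(ABC):
    """Hypothetical interface built on the initialise/run/finalise/get/set concepts."""

    @abstractmethod
    def initialise(self, config): ...

    @abstractmethod
    def run(self, dt): ...

    @abstractmethod
    def finalise(self): ...

    @abstractmethod
    def get(self, name): ...

    @abstractmethod
    def set(self, name, value): ...


class LinearReservoir(IRFComponent):
    """Toy storage model in which outflow = k * storage."""

    def initialise(self, config):
        self.k = config["k"]
        self.state = {"storage": config["storage"], "inflow": 0.0, "outflow": 0.0}

    def run(self, dt):
        s = self.state
        s["outflow"] = self.k * s["storage"]
        s["storage"] += (s["inflow"] - s["outflow"]) * dt

    def finalise(self):
        self.state = None

    def get(self, name):
        return self.state[name]

    def set(self, name, value):
        self.state[name] = value


def couple(upstream, downstream, steps, dt):
    """Step both models, passing upstream outflow to downstream inflow each step."""
    for _ in range(steps):
        upstream.run(dt)
        downstream.set("inflow", upstream.get("outflow"))
        downstream.run(dt)


upper, lower = LinearReservoir(), LinearReservoir()
upper.initialise({"k": 0.5, "storage": 10.0})
lower.initialise({"k": 0.5, "storage": 0.0})
couple(upper, lower, steps=2, dt=1.0)
upper_storage, lower_storage = upper.get("storage"), lower.get("storage")
upper.finalise()
lower.finalise()
```

The coupling loop only touches the five interface methods, so either reservoir could be swapped for another model that implements them. The frameworks differ mainly in how much extra code is needed around a model to reach this point.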
TIME, like OMS, uses a metadata approach to integrate models. The primary difference is that annotations in TIME are embedded in the source code, while in OMS they are encoded as declarations in external XML files (Lu 2011). An evident advantage of TIME is its GIS functionality; a considerable disadvantage is its lack of support for non-TIME models and for interoperability with other frameworks (Fitch and Bai 2009). However, efforts have been made to overcome this limitation by developing software based on web services, which would enable TIME models to interface with other applications (Fitch and Bai 2009).
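The idea behind a metadata approach can be sketched with in-source annotations: each model declares its inputs and outputs, and a generic runner uses only those declarations to pass data between models. The `component` decorator, the `run_chain` helper, and the two toy models below are hypothetical and are not taken from TIME or OMS; the same declarations could equally be held in an external file.

```python
def component(inputs, outputs):
    """Class decorator that attaches input/output metadata to a model class."""
    def decorate(cls):
        cls.meta_inputs = tuple(inputs)
        cls.meta_outputs = tuple(outputs)
        return cls
    return decorate


@component(inputs=["temperature"], outputs=["pet"])
class Evaporation:
    def execute(self):
        # Toy relationship for illustration only, not a published formula.
        self.pet = max(0.0, 0.25 * self.temperature)


@component(inputs=["rain", "pet"], outputs=["recharge"])
class SoilWater:
    def execute(self):
        self.recharge = max(0.0, self.rain - self.pet)


def run_chain(components, data):
    """Run components in order, moving values by the names in their metadata."""
    data = dict(data)
    for comp in components:
        for name in comp.meta_inputs:
            setattr(comp, name, data[name])
        comp.execute()
        for name in comp.meta_outputs:
            data[name] = getattr(comp, name)
    return data


results = run_chain([Evaporation(), SoilWater()],
                    {"temperature": 10.0, "rain": 5.0})
print(results["recharge"])
```

The model classes contain no framework calls. The runner knows how to connect them only through the declared names, which is what keeps this style of integration lightweight.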
The use of workflows to integrate hydrologic models is still rather limited (Lu 2011). The challenge lies in refactoring existing codes into reusable workflow activities. Deciding on the right granularity and complexity of the individual activities is critical for constructing a good workflow (Cuddy and Fitch 2010). Although this is still not common practice, a few high-profile projects are exploring ways to employ workflows for water resources modelling. Kepler has been suggested as a replacement for the OpenMI Configuration Manager in the two-way coupled system linking hydrology and climate models (Goodall et al., 2013, Saint and Murphy 2010), the rationale being that Kepler is more extensive and versatile than OpenMI (Saint and Murphy 2010). EVO developers are looking into ways to increase customisation by implementing workflow execution such as that provided by Taverna (Elkhatib et al., 2013). Hydrologists' Workbench, which employs Microsoft's TRIDENT, is being developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) to help fulfil the Bureau of Meteorology's legal obligation to produce monthly regional water situation reports based on the output of an integrated data and modelling system (Cuddy and Fitch 2010, CSIRO 2013). The main advantages of using a workflow are the automation of repetitive tasks, and the ability to document model runs and record the workflow sequence as a file, which guarantees repeatability, auditability, and transparency of scientific computations (Lu 2011, CSIRO 2013).
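Recording a workflow sequence as a file can be sketched as follows. This is a minimal illustration rather than the behaviour of Kepler, Taverna, or TRIDENT: the activity names, the JSON layout, and the helper functions are all assumptions made for this example.

```python
import json
import os
import tempfile

# Registry of reusable activities; real activities would wrap model codes.
ACTIVITIES = {
    "scale": lambda value, factor: value * factor,
    "offset": lambda value, amount: value + amount,
}


def run_workflow(steps, value):
    """Apply each step in order and return the result with an audit trail."""
    trail = []
    for step in steps:
        func = ACTIVITIES[step["activity"]]
        value = func(value, **step["params"])
        trail.append({"activity": step["activity"],
                      "params": step["params"],
                      "output": value})
    return value, trail


def save_workflow(steps, path):
    """Write the workflow definition to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(steps, handle, indent=2)


def load_workflow(path):
    """Read a workflow definition back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


steps = [
    {"activity": "scale", "params": {"factor": 2.0}},
    {"activity": "offset", "params": {"amount": 1.5}},
]
first, trail = run_workflow(steps, 4.0)

# Replaying the saved definition should reproduce the original result.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workflow.json")
    save_workflow(steps, path)
    replayed, _ = run_workflow(load_workflow(path), 4.0)

print(first, replayed)
```

The saved file is the record of what was run, and the trail is the record of what each step produced. Together they give the repeatability and auditability described above, provided the activities themselves are deterministic.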
BFG offers a novel approach to model integration, which 'isolates the science that a model performs from the code used to control and couple it with other models' (BFG 2013). When employing BFG, no changes to the component's code are needed, since a wrapper code is generated which enables it to fit within a framework of choice. Furthermore, models integrated using BFG are 'resistant' to the framework’s modifications (Warren et al., 2008). BFG goes beyond a typical coupling technology that imposes architectural requirements on components, hence it allows for models to be easily exchanged.
Employing tight coupling enables "use of the most efficient algorithms to solve complicated numerical problems, for example fully-coupled systems of differential equations" (Goodall et al., 2011). However, an obvious disadvantage of tight coupling is the difficulty of integrating models that do not comply with the framework requirements (Goodall et al., 2011). "In contrast, a loosely-coupled approach requires only the standardisation of interfaces and data exchanges" (Goodall et al., 2011). The advantages of a loosely-coupled, service-oriented approach extend beyond the ability to integrate disparate models. The user does not have to be concerned with the large computing resources or datasets needed. Each model operates in its own hardware environment and the system's functionality can be accessed through web service interfaces (Goodall et al., 2011). With cloud technology, resources are available on demand, which reduces computing equipment and running costs (e.g. electricity and administration) (EVO 2013). Hence, using web services frees users from some of the technological concerns, allowing them to focus on the scientific aspects of their work (EVO 2013). Service-oriented technology, however, does not come without its challenges. The design of such a system needs to consider potential performance, reliability and security issues (Goodall et al., 2011). The primary concern is performance when modelling fully-coupled processes with large data transfers and tasks with long execution times (Goodall et al., 2011). Reliability might be a problem, as remote servers can become temporarily unavailable (Goodall et al., 2011). Additionally, security must be ensured to prevent unauthorised use (Goodall et al., 2011).
All of the described advances in scientific computing technology constitute significant progress toward comprehensive and efficient modelling systems. Such systems are essential for addressing the water resources management challenges that arise from climate change on the one hand, and from increasing and conflicting demands on the other.
One-way file transfer formats
BGS has invested significant resources in developing Information Management (IM) to serve data both internally and externally. The experience built up in this process, as well as the relevant infrastructure, is useful in developing any IEM solution. This experience is based around Oracle databases and the associated standards, which include:
- Catalog Service for the Web (CSW) is one part of the OGC Catalog Service specification, which the OGC describes as follows: “Catalogue services support the ability to publish and search collections of descriptive information (metadata) for data, services, and related information objects. Metadata in catalogues represent resource characteristics that can be queried and presented for evaluation and further processing by both humans and software. Catalogue services are required to support the discovery and binding to registered information resources within an information community.”
- Web Feature Service (WFS) from the OGC provides an interface which allows clients to query and access geographical features across the web.
- Geospatial Data Abstraction Library (GDAL) is, according to gdal.org, “a translator library for raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single abstract data model to the calling application for all supported formats. It also comes with a variety of useful command line utilities for data translation and processing.”
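To give a sense of how a client queries features through WFS, the sketch below builds a key-value GetFeature request using the WFS 2.0 parameter names (`service`, `version`, `request`, `typeNames`, `bbox`, `count`). The endpoint `https://example.org/wfs` and the feature type `bgs:boreholes` are placeholders invented for this example, not real BGS services.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_getfeature_url(base_url, type_name, bbox=None, count=None,
                       version="2.0.0"):
    """Build a WFS GetFeature request URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeNames": type_name,
    }
    if bbox is not None:
        # Bounding box as comma-separated corner coordinates.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if count is not None:
        # Upper limit on the number of features returned.
        params["count"] = str(count)
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and feature type, for illustration only.
url = wfs_getfeature_url("https://example.org/wfs", "bgs:boreholes",
                         bbox=(-1.5, 52.5, -1.0, 53.0), count=10)
query = parse_qs(urlsplit(url).query)
print(url)
```

Parameter names differ between WFS versions (for example, version 1.1.0 uses `typeName` and `maxFeatures`), so the names should be checked against the version a given server supports.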
Alongside these standards, BGS has adopted OpenMI 1.4 as a model linking standard. However, whilst this version is not designed to exchange static data, the revised OpenMI 2.0 is, and it offers promise for linking with static datasets.
The climate community has adopted a number of standards for its data. These include Gridded Binary (GRIB), Network Common Data Form (netCDF) and the Hierarchical Data Format (HDF). All are intended for use with modern atmospheric datasets, which encompass information about the atmosphere, sea, and ocean and are used for both modelled and observed data. These standards are supplemented by a recently conceived Climate and Forecast (CF) standard, which aims to distinguish quantities (description, units, prior processing, etc.) and to locate data in space and time as a function of other independent variables, such as a coordinate system.
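The way CF-style metadata describes a quantity and locates it in time can be sketched without any netCDF library. The example below assumes the common CF time encoding of the form `<unit> since <reference date>` on a standard calendar with no time zone; the `decode_time` helper and the dictionary layout are hypothetical, while `precipitation_flux` with units `kg m-2 s-1` is an entry from the CF standard name table.

```python
from datetime import datetime, timedelta

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_time(values, units):
    """Convert numeric offsets to datetimes using a '<unit> since <date>' string."""
    unit, _, reference = units.partition(" since ")
    origin = datetime.fromisoformat(reference.strip())
    scale = SECONDS_PER_UNIT[unit.strip()]
    return [origin + timedelta(seconds=value * scale) for value in values]


# Coordinate variable: offsets are meaningless without the units attribute.
time_axis = {"units": "days since 2000-01-01 00:00:00", "values": [0, 1.5, 31]}

# Data variable: attributes identify the quantity and its units.
rainfall = {
    "standard_name": "precipitation_flux",
    "units": "kg m-2 s-1",
    "dimensions": ("time",),
}

timestamps = decode_time(time_axis["values"], time_axis["units"])
for stamp in timestamps:
    print(stamp.isoformat())
```

A full implementation would also need to handle the non-standard calendars and time zones that CF permits, which this sketch deliberately ignores.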
Couplers and workflow engines
Given the range of coupler technologies open to the BGS and others, it is easy to spend a long time reading the literature on the theory behind each and attempting to evaluate the relative value of one over another. It would be prudent to identify a shortlist of candidate technologies for hands-on evaluation, the aim being to assess model performance over a range of desirable model coupling features.
Given the experience the BGS has with the OpenMI 1.4 standard and ongoing efforts to implement a composition in OpenMI 2.0, this report recommends that the latter be shortlisted for inclusion in the coupler evaluation process.
CSDMS provides an alternative approach to OpenMI in that the philosophy behind the technology is more closely related to the use of High Performance Computing, something of which the BGS has relatively little experience. One of the key similarities between CSDMS and OpenMI is the use of the Initialise, Run, Finalise (IRF) principle, which raises the potential for code re-use across both technologies and allows modellers to select the best coupling option for the job without extensive refactoring. The CSDMS technology is therefore recommended for the shortlist, although it is unclear whether this should be CSDMS1.0 or CSDMS2.0; the latter was launched in 2013, but relatively little information was found during this investigation about real-world applications of the technology.
Given that the remit of the study was limited to coupling technology within the hydrological and atmospheric sciences, it is worth extending the scope beyond these communities. A recently developed approach that shows promise is OASIS-LMF (Loss Modelling Framework), which aims to provide a methodology for risk assessment in the insurance and reinsurance industry. It is suggested that the trial composition be tested using OASIS-LMF.
Finally, this report recommends the evaluation of one or more ‘workflow engines’. The Trident project is open source and appears to have been used successfully by CSIRO to develop the ‘Hydrologists Workbench’; however, there does not seem to be a particularly strong community of users outside of CSIRO. It seems unlikely that this is a solution the BGS should spend much time evaluating unless a strong contact can be established with members of the CSIRO team involved in the ‘Hydrologists Workbench’.
Another workflow option is the Kepler project. It appears to have an active community that has produced numerous peer-reviewed publications on topics such as environmental sensor networks, climate change and species distribution. Because of the time spent reviewing other options, these papers were not studied in depth (see https://kepler-project.org/publications?tags=keplerworkflow).
It may also be possible to use existing workflow tools within the BGS such as FME (http://www.safe.com/fme/fme-technology/). It is recommended that in-house experts in FME (e.g. Tony Myers) should be consulted on the capabilities of the system to see if this approach is worth taking any further.
This activity should be linked with the TSB-AHRC funded project Confluence. This project, led by HR Wallingford and undertaken in conjunction with Nottingham University, aims to assess the use of the Pyxis workflow tool. The project will involve including BGS models in Pyxis and assessing how this improves the management of the overall workflow.
The shortlisted coupler technologies should be evaluated by directly comparing the scientific accuracy, ease of use and feature richness when applied to a single linked model. The exact nature of the linked model scenario should be designed in consultation with geologists, mathematical modellers and senior staff within the BGS Environmental Modelling Directorate to ensure the scenario being considered is consistent with current and anticipated future challenges.
Once a scenario has been defined, it will be possible to identify the key resources and components required to answer the question, i.e. which datasets, models and conversion functions are required. The following diagrams show a possible scenario to use in our proposed bench test: the entities in Figure 4 are generic, whereas Figure 5 provides a real-world context containing BGS examples.
Data file formats
To ensure that the compositions described use appropriate standards, it is necessary to define a set of internationally recognised standards to use. Given the reliance on data from BGS corporate databases, the standards used for the Geological Object Store should be adopted. These include CSW and GDAL. Alongside these, the use of netCDF and CF for large datasets should be investigated.
Finally, the use of WFS for data transfer between dynamic models should be included within one part of the composition.
- LU, B. 2011. Development of A Hydrologic Community Modeling System Using A Workflow Engine. PhD thesis, Drexel University.
- KNAPEN, M J R, VERWEIJ, P, WIEN, J E and HUMMEL, S. 2009. OpenMI — The universal glue for integrated modelling? 18th World IMACS/MODSIM Congress. Cairns, Australia.
- GOODALL, J L, ROBINSON, B F and CASTRONOVA, A M. 2011. Modeling water resource systems using a service-oriented computing paradigm. Environmental Modelling and Software 26, 573–582.
- OATC 2010a. OpenMI Document Series: The OpenMI 'in a Nutshell' for the OpenMI (Version 2.0). The OpenMI Association Technical Committee. In: MOORE, R. (ed.).
- OATC 2010b. OpenMI Document Series: OpenMI Standard 2 Specification for the OpenMI (Version 2.0). The OpenMI Association Technical Committee. In: MOORE, R. (ed.).
- PECKHAM, S D. Evaluation of Model Coupling Frameworks for Use by the Community Surface Dynamics Modelling System (CSDMS). American Geophysical Union Fall Meeting 2007.
- LAWRENCE, B N, BALAJI, V, CARTER, M, DELUCA, C, EASTERBROOK, S, FORD, R, HUGHES, A & HARDING, R. Manuscript. Bridging Communities: Technical Concerns for Integrating Environmental Models.
- VALCKE, S, BALAJI, V, CRAIG, A, DELUCA, C, DUNLAP, R, FORD, R W, JACOB, R, LARSON, J, O'KUINGHTTONS, R, RILEY, G D and VERTENSTEIN, M. 2012. Coupling technologies for Earth System Modelling. Geoscientific Model Development, 5, 1589–1596.
- JAGERS, H R A. Linking Data, Models and Tools: An Overview. International Congress on Environmental Modelling and Software Modelling for Environment's Sake, Fifth Biennial Meeting 2010 Ottawa, Canada.
- LLOYD, W, DAVID, O, ASCOUGH II, J C, ROJAS, K W, CARLSON, J R, LEAVESLEY, G H, KRAUSE, P, GREEN, T R and AHUJA, L R. 2011. Environmental modeling framework invasiveness: Analysis and implications. Environmental Modelling & Software 26, 1240–1250.
- DAVID, O, ASCOUGH II, J C, LLOYD, W, GREEN, T R, ROJAS, K W, LEAVESLEY, G H and AHUJA, L R. 2013. A software engineering perspective on environmental modeling framework design: The Object Modeling System. Environmental Modelling and Software, 39 201–213.
- FITCH, P & BAI, Q F. 2009. A standards based web service interface for hydrological models. 18th World Imacs Congress and Modsim09 International Congress on Modelling and Simulation: Interfacing Modelling and Simulation with Mathematical and Computational Sciences, 873–879.
- CUDDY, S M and FITCH, P. 2010. Hydrologists Workbench — a hydrological domain workflow toolkit. In: SWAYNE, D A, YANG, W, VOINOV, A A, RIZZOLI, A and FILATOVA, T. (eds.) International Congress on Environmental Modelling and Software, Modelling for Environment's Sake, Fifth Biennial Meeting. Ottawa, Canada: International Environmental Modelling and Software Society (iEMSs).
- GOODALL, J L, SAINT, K D, ERCAN, M B, BRILEY, L J, MURPHY, S, YOU, H, DELUCA, C and ROOD, R B. 2013. Coupling climate and hydrological models: Interoperability through Web Services. Environmental Modelling and Software, 46 250–259.
- SAINT, K & MURPHY, S. End-to-End Workflows for Coupled Climate and Hydrological Modeling. International Congress on Environmental Modelling and Software, Modelling for Environment's Sake, Fifth Biennial Meeting 2010 Ottawa, Canada.
- ELKHATIB, Y, BLAIR, G S and SURAJBALI, B. Experiences of Using a Hybrid Cloud to Construct an Environmental Virtual Observatory. 3rd International Workshop on Cloud Data and Platforms 2013 Prague Czech Republic.
- CSIRO 2013. The Workbench (TWB) Website [Online]. Last revised 8 February 2013. [cited 14 November 2013]. Available: https://wiki.csiro.au/pages/viewpage.action?pageId=305136129.
- BFG 2013. Bespoke Framework Generation Website. [Online]. Centre for Novel Computing University of Manchester. [cited 14 November 2013]. Available: http://cnc.cs.man.ac.uk/projects/bfg.php.
- WARREN, R, DE LA NAVA SANTOS, S, ARNELL, N W, BANE, M, BARKER, T, BARTON, C, FORD, R, FÜSSEL, H M, HANKIN, R K S, KLEIN, R, LINSTEAD, C, KOHLER, J, MITCHELL, T D, OSBORN, T J, PAN, H, RAPER, S C B, RILEY, G, SCHELLNHÜBER, H J, WINNE, S and ANDERSON, D. 2008. Development and illustrative outputs of the Community Integrated Assessment System (CIAS), a multi-institutional modular integrated assessment approach for modelling climate change. Environmental Modelling and Software 23, 1215–1216.
- EVO 2013. Environmental Virtual Observatory Website [Online]. [cited 14 November 2013]. Available: http://www.evo-uk.org/.