OR/14/061 Summary of progress made

Watson, C, Baker, G, and Nayembil, M. 2014. Open geoscience data models: end of project report. British Geological Survey Internal Report, OR/14/061.

How it unfolded

Work started on 1 May 2011. It had been agreed that, because of the central role the borehole database plays in the BGS, this should be the first design released. A platform-independent data model was produced in ER/Studio (by Embarcadero) by reverse engineering the BGS database and editing the design in accordance with a number of reviews held between the developers and the data architect. Once the data model was agreed, ER/Studio was used to automatically generate technical specification documentation and implementation scripts for a variety of relational database platforms. The team were very comfortable performing these data modelling steps and the technical outputs were produced relatively quickly. Writing the documentation to make the data model more openly accessible to a wider audience came less naturally and took longer to conclude. A number of BGS staff carried out internal reviews of the documentation and a final draft was circulated amongst project partners with a complimentary copy of a Microsoft Access implementation of the database.
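As an illustration of the idea of generating implementation scripts for several platforms from one platform-independent model, the following is a minimal sketch only. It does not reproduce ER/Studio's output or the BGS borehole design: the table name, column names and type mappings are all hypothetical and chosen purely for demonstration.

```python
import sqlite3

# Hypothetical logical model: entity and attribute names are illustrative
# only and are NOT taken from the BGS borehole design.
# Each attribute is (name, logical type, is primary key).
LOGICAL_MODEL = {
    "borehole": [
        ("borehole_id", "integer", True),
        ("name", "text", False),
        ("drilled_depth_m", "decimal", False),
    ],
}

# Assumed mapping from logical types to physical types for each platform.
TYPE_MAP = {
    "sqlite": {"integer": "INTEGER", "text": "TEXT", "decimal": "REAL"},
    "mysql": {"integer": "INT", "text": "VARCHAR(255)", "decimal": "DECIMAL(10,2)"},
    "postgresql": {"integer": "INTEGER", "text": "TEXT", "decimal": "NUMERIC(10,2)"},
}


def generate_ddl(model, platform):
    """Render CREATE TABLE statements for one target platform."""
    types = TYPE_MAP[platform]
    statements = []
    for table, columns in model.items():
        lines = []
        for name, logical_type, is_key in columns:
            suffix = " PRIMARY KEY" if is_key else ""
            lines.append(f"    {name} {types[logical_type]}{suffix}")
        statements.append(f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);")
    return "\n\n".join(statements)


if __name__ == "__main__":
    # Print one script per platform from the same logical model.
    for platform in TYPE_MAP:
        print(f"-- {platform}")
        print(generate_ddl(LOGICAL_MODEL, platform))

    # Confirm that the SQLite script actually executes.
    conn = sqlite3.connect(":memory:")
    conn.executescript(generate_ddl(LOGICAL_MODEL, "sqlite"))
    conn.close()
```

The point of the sketch is that the logical model is written once and only the type mapping changes per platform, which is why additional implementations can be produced on request at little extra cost.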

The first website was a relatively simple set of pages designed to describe the project objectives and provide access to the newly created downloadable database design package. Feedback was encouraging and comments helped the team to refine the way subsequent documents would be produced.

OpenGeoscience Data Models website.

In parallel with the technical work, community engagement took place through presentations at events such as the CGI GeoSciML committee meeting in Edinburgh, and arrangements were made for technical visits to the Minerals Resource Authority (MRA), Papua New Guinea, and the Nigerian Geological Survey Agency (NGSA).

Security concerns started to emerge by the end of the first six months and it became apparent that the technical visits were going to be more difficult to arrange than initially thought. Nigeria and Papua New Guinea had been project partners since the project planning phase, but alternative plans had to be made in light of civil disorder. In the first instance, attempts were made to arrange meetings in alternative locations. This was not ideal: informal conversations were held at conferences attended by project partners, but true requirements gathering and implementation activities needed to be performed in situ.

Over the subsequent months attention switched to development and promotion of the downloadable designs, and take-up increased from an average of two or three visitors a day to five or six. The most popular format for data model implementations was Microsoft Access, apparently because users wanted a quick and simple way to look at the database without executing scripts. This allowed us to streamline download production for future data models: only Microsoft Access implementations were produced by default, with other implementations provided on demand.

Making use of existing communities and their communication channels led to significant leaps in visitor numbers, as shown in the following graph. The effect on download numbers after the project was promoted in the GeoSciML community newsletter and the social networking site LinkedIn (through groups such as GEOinformatics and Geocomputing Professionals) was particularly noticeable in these relatively early days.

Download statistics for late 2011.

There was some debate regarding the level of intellectual property that should be retained by the BGS/NERC and whether potential users of the outputs should register prior to downloading content. It was decided that the designs would be made available totally free and that they could be used for any subsequent purpose, commercial or otherwise. In order to maximise take-up of the downloadable design packages, it was decided that registration would not be required, even though this prevented the tracking of users.

The second data model to be released was designed for Geochemistry data. It was promoted through LinkedIn, where in particular the creation of a discussion in the International Association of Geochemistry group led to a notable spike in downloads. A number of questions and suggestions were submitted via the LinkedIn platform and the development team were also contacted directly by UK-based commercial laboratories wishing to use the design in a bench testing exercise. In order to satisfy this requirement the scripts for a MySQL implementation were produced.

High level Geochemistry data model diagram.

Through link ups with other BGS projects, travel costs were shared for attendance at a number of international conferences. A presentation on ‘The need for an Open Exchange of Geoscience Data Models’ was given by the project coordinator, Carl Watson, as an invited speaker at the International Geological Congress (IGC) in Brisbane, 2012. A number of contacts were made at the IGC, which led to the website and contact details being published in the newsletter for the Coordinating Committee for Geoscience Programmes in East and Southeast Asia (CCOP). Informal meetings were held with, amongst others, groundwater experts at Queensland University of Technology and project partners MRA of Papua New Guinea. Another notable spike in downloads was observed in the following week.

Although Papua New Guinea could not be visited for safety reasons (LL1: R1), contact was maintained regularly by email, and their new database designs were reviewed and recommendations made remotely.

EarthDataModels.org was launched in November 2012. Up to this point the online presence for the project had been limited to LinkedIn and the BGS OpenGeoscience website, so a community website for sharing data models from the BGS and other organisations helped to establish the project as an initiative independent of the BGS brand (R4). The site was initially populated mainly by BGS designs; contacts were then made with a number of organisations, which led to links with USGS, GEUS and a few others, and the range of data models was significantly increased.

Community website: EarthDataModels.org.

The community website included an online forum which initially attracted a reasonable number of users, but activity was very limited: most people preferred to contact each other directly via email or phone calls rather than post an enquiry publicly.

Another problem with the forum was spam and the amount of time required to manage users and filter out unwanted posts (LL2). In the end the decision was made to close the forum down and focus resources on maintaining direct contacts and producing more data models.

The next significant event took place in January 2013 with two team members attending the Colloquium of African Geology (CAG24), in Addis Ababa, Ethiopia. A presentation was given and a booth was set up for the week from which demonstrations of data models were given and a survey was conducted to assess which subjects were of most value to conference delegates.

Data Model demos and questionnaire sessions at CAG24.

The delegates indicated that they wanted geophysics and geochemistry models as their top priorities, with groundwater models coming third. For full details see: http://www.earthdatamodels.org/designs/CAG/surveyResults.html

Over the subsequent months the Lexicon of Named Rocks database design package was released along with links to one of the most popular controlled vocabularies on the Open Geoscience website.

Ad hoc data modelling enquiries were received on a regular basis and dealt with by the team. These included contacts from a GSO in Bhutan who were analysing options for digitising their spatial data holdings, the Swedish Geological Survey asking for designs to help with the redesign of their borehole database, and a South African software developer interested in producing a Laboratory Information Management System (LIMS) based upon the Geochemistry design published by the project.

Results for one of the questions asked of CAG24 delegates.

IGS (International Geoscience Services) Ltd, a UK-based company providing geoscientific and geodata services to the global market, started the IGS Geodata project during the first year of the OpenGeoscience Data Models project. IGS Geodata is a dedicated effort to create a geodata management and enrichment system. The project focuses on regions in which the mineral exploration industry has considerable potential to grow but which lack a central data repository, so geodata has to be acquired separately and then combined.

From an email dated 06/11/2014, Sławomir Wójcik said:

“Members of the Geodata project team has met Mr Carl Watson as he offered help and advice regarding his previous experiences in a similar field. He described BGS efforts in this area in detail and guided the team to find out more about BGS Linked Data project. A fair amount of time has been spent investigating BGS achievements and the Earth Data Models database designs. We followed the advice of Mr. Watson to take a closer look at OpenGeoScience data models and standards proposed by CGI. Mr. Watson talked about GeoSciML, an industry standard language that is used to describe geodata, being an extension of well-known XML standard.
As a result of Mr Watson's involvement, Geodata project benefited by adopting several industry standards we would have to spend a considerable amount of time and effort to find. A suitable, well tested ontology provided by CGI has been selected and adapted to Geodata project needs. GeoSciML is also considered as a part of data interchangeability module in the future. BGS Linked Data website and endpoint also served as an inspiration for how to open the databases in a succinct and efficient way.”

Malawi was the destination for a ten-day technical visit in June 2013. The project coordinator hosted and attended a number of meetings to investigate data modelling capacity within departments responsible for spatial data management, identify potential collaboration opportunities and promote the free-to-use data models published at www.earthdatamodels.org. The visit was primarily based in Lilongwe and Blantyre, with appointments at institutions such as the Ministry of Health, Ministry of Agriculture and Surveys Department.

There are a large number of projects run by government, commercial organisations and Non-Government Organisations (NGOs) in the country that involve the capture and storage of spatial data, and methods vary from organisation to organisation. There is a high regard for GIS expertise in the country and plenty of well trained and keen practitioners. There is also a significant number of researchers and decision makers who are interested in finding out what spatial datasets exist within the country. Unfortunately there appear to be a large number of datasets that could have national significance but remain hidden on individual laptops, often held in spreadsheets and GIS files that are well understood by the authors but poorly documented, raising the risk of data being lost or misunderstood by future potential users.

The June visit led to a subsequent visit in February 2014 which was more technical in nature and was very much focussed on supporting the work being carried out by the Department of Surveys.

One shining example of how spatial data management could be improved in Malawi, and perhaps other low income countries, is http://www.MASDAP.mw. This system provides an online portal for the upload, search and download of spatial datasets. Its development is funded by the World Bank programme for Disaster Risk Management.

Malawi spatial data portal: MASDAP.

The Department of Surveys are in the process of creating a National Spatial Data Centre for the country. They will host and administer the MASDAP system and are developing national standards for the capture and storage of spatial data.

The MASDAP system is a powerful tool for capturing and storing spatial data; however, it is only possible to capture metadata when a user is uploading a dataset. Many data owners are willing to tell others what data they possess but strongly resist ‘giving it away’. We came up with a plan to encourage the capture and use of spatial metadata through a series of changes to the existing system and by arranging data management and data modelling training in the UK and Malawi. The BGS data model for spatial metadata and information workflows were presented and used to inform discussions on future developments.
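The distinction between describing a dataset and handing it over can be sketched as a metadata record in which the link to the data itself is optional. This is a hypothetical illustration only: it is not the MASDAP interface or the BGS spatial metadata model, and every field name, organisation, contact address and coordinate below is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MetadataRecord:
    """Hypothetical metadata-only catalogue entry (all fields are assumptions)."""
    title: str
    owner: str
    contact: str
    # Bounding box as (min_lon, min_lat, max_lon, max_lat); values are illustrative.
    bbox: Tuple[float, float, float, float]
    keywords: List[str]
    # None means the owner has described the dataset but not shared the data.
    data_url: Optional[str] = None


def search(records, keyword):
    """Return records whose keywords include the given term (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in records if keyword in (k.lower() for k in r.keywords)]


# Invented example entries: one metadata-only, one with downloadable data.
catalogue = [
    MetadataRecord("Example soil survey", "Example Department", "data@example.org",
                   (33.0, -17.0, 36.0, -9.0), ["soil", "agriculture"]),
    MetadataRecord("Example borehole locations", "Example Agency", "gis@example.org",
                   (33.0, -17.0, 36.0, -9.0), ["groundwater", "boreholes"],
                   data_url="https://example.org/boreholes.zip"),
]

for record in search(catalogue, "soil"):
    availability = record.data_url or f"on request from {record.contact}"
    print(f"{record.title}: {availability}")
```

Under this arrangement a data owner can register what they hold, and who to contact about it, without uploading anything, which addresses the reluctance to 'give data away' while still letting others discover that the dataset exists.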

Over the two visits we met a lot of welcoming people, in particular Alice Gwedeza at the Surveys Department and Allan Chilimba at the Ministry of Agriculture who treated us with great enthusiasm and hospitality.

Email dated 06/03/14 from Alice Gwedeza:

“I am writing this with great pleasure to thank you for the enthusiasm that you have for assisting Malawi to come up with a better database infrastructure specifically in developing the metadata for Malawi. Further I want to thank you for your efforts in not just seeing something but to come and discuss with us in order to map the way forward. Honestly, as a mapping/GIS profession I am dying to see a well organised, up to date and comprehensive spatial data Infrastructure to avoid duplication of efforts whereby people collect data and do not share as a result the data for the same area can be collected several times by different users.
We are very eager to work with assistance from the British Geological Survey. Sometimes here we have the knowledge but we are frustrated by lack of resources, as a result a lot of resources are wasted. With the metadata in place we will be assisted in knowing which data sets exist.
We are looking forward to work with you to achieve the production of metadata.”

In order to cement the relationships with the Malawi partners a successful bid was submitted to the Commonwealth Scholarship scheme to fund a member of the Department of Surveys to travel to the UK for data modelling and information management training.

The final months of the project involved significant effort to forge new connections with European GSOs and others who would be interested in collaborating on data modelling and data management issues after the initial knowledge exchange project ceased. In particular, links with the Geological Survey of Denmark and Greenland (GEUS) through the COST Sub-Urban network workshops led to follow-up knowledge exchange meetings between the BGS data architect and the GEUS head of information systems, and new collaborative opportunities continue to be discussed. A presentation was made to the commercial and public sector GSPEC community in Glasgow in 2014, to promote centralised and well controlled data management to support geotechnical engineering activities. These efforts have led to several members of the project team being invited to take part in new international projects within the field of geoscience data modelling.