This Model - the GA Supermodel - is Geoscience Australia's overarching data model that provides integration logic for data elements in many of GA’s specialised domains. It also provides logic for Persistent Identifier (PID) patterning. It is based on the general-purpose Supermodel Model.

Figure 1. Example of data instances, Sample AU279 & Site 17939, with Data Domains shown using diagram elements from the key in Figure 2

1. Metadata

IRI

https://linked.data.gov.au/def/ga-supermodel

Title

Geoscience Australia Supermodel Specification

Description

This Model - the GA Supermodel - is Geoscience Australia's overarching data model that provides integration logic for data elements in many of GA’s specialised domains. It also provides logic for Persistent Identifier (PID) patterning.

Created

2021-12-14

Modified

2022-05-23

Issued

0000-00-00

Creator

Geoscience Australia

Publisher

Geoscience Australia

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Machine-readable form

supermodel.ttl

2. Preamble

2.1. Abstract

This Model - the GA Supermodel - is Geoscience Australia's overarching data model that provides integration logic for data elements in many of GA’s specialised domains. It also provides logic for Persistent Identifier (PID) patterning. It is based on the general-purpose Supermodel Model.

2.2. Namespaces

This model is built on a "baseline" of Semantic Web models which use a variety of namespaces. Prefixes for these namespaces, used throughout this document, are listed below.

Table 1. Namespaces
Prefix | Namespace | Description
super: | https://linked.data.gov.au/def/supermodel/ | the Supermodel meta-model
dcterms: | http://purl.org/dc/terms/ | Dublin Core Terms vocabulary namespace
ex: | http://example.com/ | Generic examples namespace
owl: | http://www.w3.org/2002/07/owl# | Web Ontology Language ontology namespace
rdfs: | http://www.w3.org/2000/01/rdf-schema# | RDF Schema ontology namespace
sosa: | http://www.w3.org/ns/sosa/ | Sensor, Observation, Sample, and Actuator ontology namespace
skos: | http://www.w3.org/2004/02/skos/core# | Simple Knowledge Organization System (SKOS) ontology namespace
time: | http://www.w3.org/2006/time# | Time Ontology in OWL namespace
void: | http://rdfs.org/ns/void# | Vocabulary of Interlinked Data (VoID) ontology namespace
xsd: | http://www.w3.org/2001/XMLSchema# | XML Schema Definitions ontology namespace

2.3. Terms & Definitions

The following terms appear in this document and, when they do, the definitions in this section apply to them. This section’s content is also presented online in a formal vocabulary at:

Term IRI Definition Source

Central Class

td:central-class

Central Classes are the generic data classes at the centre of Data Domains with high-level relationships between them defined in this supermodel.

These classes are taken from general standards - usually well-known international standards - and Geoscience Australia specialises and extends them to make specific, custom classes for its needs.

Supermodel model

Component Data Model

td:component-data-model

A data model for a particular component of a Supermodel. The Component Data Model may have been designed for the particular Supermodel that uses it, but it may also pre-exist and simply be indicated for use within the Supermodel.

A Supermodel will always need to provide mappings from classes within a Component Data Model to other Supermodel elements for interoperability.

Supermodel model

Data Domain

td:data-domain

High-level conceptual areas within which Geoscience Australia has data.

These Data Domains are not themed scientifically - 'geology', 'hydrogeology', etc. - but are instead based on parts of the Observations & Measurements [ISO19156] standard, realised in Semantic Web form in the SOSA Ontology, part of the Semantic Sensor Network Ontology [SSN].

Current Data Domains are shown in Figure 5.

Supermodel model

Knowledge Graph

td:knowledge-graph

A Knowledge Graph is a dataset that uses a graph data structure - nodes and edges - with strongly-defined elements.

Common use, e.g. https://en.wikipedia.org/wiki/Knowledge_graph

Linked Data

td:linked-data

A set of technologies and conventions defined by the World Wide Web Consortium that aim to present data in both human- and machine-readable form over the Internet.

Linked Data is strongly-defined with each element having either a local definition or a link to an available definition on the Internet.

Linked Data is graph-based in nature, that is, it consists of nodes and edges that can forever be linked to further concepts with defined relationships.

https://www.w3.org/standards/semanticweb/data

Semantic Web

td:semantic-web

The World Wide Web Consortium's vision of an Internet-based web of Linked Data.

Semantic Web is used to refer to something more than just the technologies and conventions of Linked Data; the term also encompasses a specific set of interoperable data models - often called ontologies - published by the W3C, other standards bodies and some well-known companies.

The 'semantic' refers to the strongly-defined nature of the elements in the Semantic Web: the meaning of Semantic Web data is as precisely defined as any data can be.

https://www.w3.org/standards/semanticweb/

2.4. Conventions

All model diagrams use elements introduced in Figure 2. These elements are defined in the RDF, RDFS and OWL ontologies; see [OWL] for more details.

All code snippets in this document, used to show formal and machine-readable versions of concepts, are expressed using the Turtle RDF syntax [TTL].

3. Introduction

This model describes a set of Data Domains with Central Classes that are associated using Linked Data principles. Specialisations of the Central Classes are made to cater for particular data needs.

Altogether, these things form a Knowledge Graph of data for Geoscience Australia that participates in the wider, international, Semantic Web.

This model is predicated on an assumption that GA is a data aggregation organisation and that data cataloguing is therefore its major concern. At the centre of this model, then, is a Data Domain of Data Cataloguing, the main elements of which are taken from the Data Catalog Vocabulary [DCAT]. The things that GA’s data are mostly about are spatial things, hence a Data Domain of Spatiality, for which GeoSPARQL [GEO] is core. GA generates information about spatial things via observations and various forms of sampling, hence a Data Domain of Sampling. For this domain, the Sensor, Observation, Sample, and Actuator (SOSA) ontology [SSN] is mainly used, as it focuses on observations, how they produce results and what those results are about. GA’s data is categorised in various ways, and for this the Data Domain of Theming is indicated. Within it, taxonomy representation using [SKOS] is paramount. Finally, all organisations relate their data and processes to those who are responsible for them, so the final Data Domain is Organisations & People, which is modelled using a number of models such as [DCTERMS], [PROV] and [SDO].

These models are also all Semantic Web models and they have been selected for their easy interoperability.

All elements of this model are modelled using the Web Ontology Language [OWL] and specialisations of it, such as the Simple Knowledge Organization System [SKOS], which is used for modelling taxonomies of concepts. In addition to the textual and image descriptions of the model given in the next section, a machine-readable version of this model is available (see Metadata).

3.1. Uses

This model should be used to understand the broad relationships between any data elements within GA. It can also be used to inform policy that is based on GA’s overall data structure, for example persistent identifier (PID) policy, for which there is a dedicated Annex, Annex B: PID Policy.

4. Model

This model, the GA Supermodel itself, is a profile - a specialised reuse - of multiple, well-known Semantic Web models. It is organised into a series of Levels which serve different purposes. All elements of the model are defined only once, and the various Levels simply present views of the model at different levels of abstraction to serve their various intended purposes.

4.1. Level 0: Model Background

This view of the model is a background one which describes the underpinning modelling mechanics that it uses. The object modelling used is based on the Web Ontology Language [OWL] and its own underlying use of RDF & RDFS [1]. The Provenance Ontology [PROV] is used to model real-world causal dependencies - provenance.

4.1.1. Diagram Key

The figure below is a key for the elements in all of the model diagrams in this document.

key
Figure 2. Diagram elements key

4.1.2. Object Modelling

The elements from the above subsection are shown in relation to one another in the figure below.

level0 owl
Figure 3. OWL objects and their relations

The elements shown above are identified with prefixed IRIs that correspond to entries in the Namespace Table. A short explanation of the diagram key elements is:

  • owl:Class - represents any conceptual class of objects. Classes are expected to contain individuals - instances of the class - and the class, as a whole, may have relations to other classes

  • owl:NamedIndividual - an individual of an owl:Class. For example, for the class ships, an individual might be Titanic

  • rdf:Property - a relationship between classes, individuals, or any objects and Literals

  • rdfs:subClassOf - an rdf:Property indicating that the domain (from object) is a subclass of the range (to object). An example is the class student which is a subclass of person: all students are clearly persons but not vice versa

  • rdf:type - the property that relates an owl:NamedIndividual to the owl:Class that it’s a member of

  • Literal - a simple literal data property, e.g. the string "Nicholas", or the number 42. Specific literal types are usually indicated when used

The remaining diagrams in this document use extensions to this basic model, for example Figure 4 uses colour-coded specialised forms of owl:Class (subclasses of it) and the relations in Figure 6 are specialised forms of rdf:Property.
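The object modelling elements listed above can be sketched in Turtle as follows. This is an illustrative sketch only: every ex: name is hypothetical, and the rdf: prefix, which is not listed in Table 1, is assumed to be bound to the standard RDF namespace.

```turtle
@prefix ex:   <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Two classes, one a subclass of the other (hypothetical names)
ex:Person a owl:Class .

ex:Student
    a owl:Class ;
    rdfs:subClassOf ex:Person ;    # every student is a person, not vice versa
.

# A property relating individuals to a Literal
ex:givenName a rdf:Property .

# An individual; "a" is Turtle shorthand for rdf:type
ex:nicholas
    a owl:NamedIndividual , ex:Student ;
    ex:givenName "Nicholas"^^xsd:string ;
.
```

Because ex:Student is a subclass of ex:Person, a reasoner that applies RDFS semantics could also infer that ex:nicholas is an ex:Person.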

4.1.3. Provenance

General provenance/lineage information about anything - a rock sample, a dataset, a term in a vocabulary etc. - is described using the Provenance Ontology [PROV] which views everything in the world as being of one or more types in Figure 4.

level0 prov
Figure 4. PROV main classes and main relations

According to PROV, all things are either a:

  • prov:Entity - a physical, digital, conceptual, or other kind of thing with some fixed aspects

  • prov:Agent - something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity

  • prov:Activity - something that occurs over a period of time and acts upon or with entities

While not often front of mind for objects in any Data Domain, provenance relations always apply, for example: a sosa:Sample within the Sampling domain is a prov:Entity and will necessarily have been created via a sosa:Sampling, which is a prov:Activity. Another example: an sdo:Person related to a dcat:Dataset via the property dcterms:creator in the Data Cataloguing domain is a specialised form of a prov:Agent related to a prov:Entity via prov:wasAttributedTo.
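The two examples in the paragraph above might be written out as below, with each object described once in its Data Domain's terms and once in PROV terms. This is a sketch under assumptions: the ex: instances are hypothetical, and the dcat:, prov: and sdo: prefixes, which are not listed in Table 1, are assumed to be bound to the namespaces shown.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix sdo:     <https://schema.org/> .
@prefix sosa:    <http://www.w3.org/ns/sosa/> .

# Sampling domain view: a sample produced by an act of sampling
ex:sample-1
    a sosa:Sample ;
    sosa:isResultOf ex:sampling-1 ;
.
ex:sampling-1 a sosa:Sampling .

# Provenance view of the same two objects
ex:sample-1
    a prov:Entity ;
    prov:wasGeneratedBy ex:sampling-1 ;
.
ex:sampling-1 a prov:Activity .

# Data Cataloguing domain view: a dataset and its creator
ex:dataset-1
    a dcat:Dataset ;
    dcterms:creator ex:person-1 ;
.
ex:person-1 a sdo:Person .

# Provenance view of the same two objects
ex:dataset-1
    a prov:Entity ;
    prov:wasAttributedTo ex:person-1 ;
.
ex:person-1 a prov:Agent .
```

In practice only the domain view would normally be stated; the provenance view is what the domain terms imply.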

4.2. Level 1: Data Domains

The top-level view of the GA Supermodel, which assumes the Level 0 background mechanics, shows a set of 5 Data Domains, which are:

  • Data Cataloguing

  • Organisations & People

  • Sampling

  • Spatiality

  • Theming

These are shown in Figure 5 below.

data domains
Figure 5. Top-level view of the GA Supermodel showing Data Domains

These Data Domains are defined formally in a simple SKOS vocabulary within this model’s set of machine-readable resources. The vocabulary may be accessed directly at https://linked.data.gov.au/def/supermodel/data-domains.
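A fragment of such a vocabulary might look like the sketch below. It is not a copy of the published vocabulary: the labels are taken from this document's Data Domain names, super:sampling is the only concept IRI that appears in this document, and super:spatiality is a hypothetical IRI formed on the same pattern.

```turtle
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix super: <https://linked.data.gov.au/def/supermodel/> .

# The vocabulary itself, using the IRI given above
<https://linked.data.gov.au/def/supermodel/data-domains>
    a skos:ConceptScheme ;
    skos:prefLabel "Data Domains"@en ;
    skos:hasTopConcept
        super:sampling ,
        super:spatiality ;    # hypothetical IRI
.

super:sampling
    a skos:Concept ;
    skos:prefLabel "Sampling"@en ;
    skos:inScheme <https://linked.data.gov.au/def/supermodel/data-domains> ;
.

# Hypothetical IRI, patterned on super:sampling
super:spatiality
    a skos:Concept ;
    skos:prefLabel "Spatiality"@en ;
    skos:inScheme <https://linked.data.gov.au/def/supermodel/data-domains> ;
.
```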

Elements at all other levels of detail in this model are classified according to these Data Domains by use of the dcat:theme property, for example, the class sosa:Sample is within the Sampling Data Domain, so it is defined as follows:

sosa:Sample
    a owl:Class ;
    dcat:theme super:sampling ;
    ...
.
Note
This supermodel’s origins are in GA’s geology and spatial data work and thus other areas of GA’s responsibility may not be adequately provided for by the Data Domains. These current Data Domains are at a conceptual level above particular science domains and shouldn’t be extended to cater for science areas such as Offshore Petroleum Resources or responsibility areas such as Community Safety. It may still be necessary for domains to be added to accommodate other GA work in some way, but extensions should occur as little as possible.

4.3. Level 2: Central Classes

The next level of detail after the Data Domains introduces the Central Classes. Here the most significant, general class per Data Domain is indicated, along with the main relationships between each of them. Figure 6 shows this.

central classes
Figure 6. Next level view of the GA Supermodel showing Central Classes

The Central Classes of each of the Data Domains are well-used classes from well-known models. For example, the Central Class of Organisations & People is [PROV]'s Agent class, which is one of the three main classes of thing in PROV and is used every time PROV is used to represent causal agents. PROV is used extensively to indicate how things - data, resources, systems - come to be.

A list of the Data Domains' Central Classes, with their definitions as given by their defining systems and the defining systems themselves, is given in Table 2 below.

Table 2. Data Domains, their Central Classes and those Classes' definitions and origins
Data Domain | Central Class | Definition | Defined By
Data Cataloguing | dcat:Dataset | A collection of data that is listed in the catalog. | Data Catalog Vocabulary [DCAT]
Sampling | sosa:Sample | A Sample is the result from an act of Sampling. Feature which is intended to be representative of a FeatureOfInterest on which Observations may be made. Physical samples are sometimes known as 'specimens'. | Sensor, Observation, Sample, and Actuator Ontology, within [SSN]
Spatiality | geo:Feature | A discrete spatial phenomenon in a universe of discourse | GeoSPARQL Ontology [GEO]
Theming | skos:Concept | An idea or notion; a unit of thought | Simple Knowledge Organization System ontology [SKOS]
Organisations & People | prov:Agent | An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity | PROV-O: The PROV Ontology [PROV]

The definitions of the main relations between Central Classes are given in Table 3 below.

Table 3. Central Class main relations their definitions and origins
Central Class Definition Defined By

dcat:Dataset

A collection of data that is listed in the catalog.

Data Catalog Vocabulary [DCAT]

sosa:Sample

A Sample is the result from an act of Sampling.

Feature which is intended to be representative of a FeatureOfInterest on which Observations may be made.

Physical samples are sometimes known as 'specimens'.

Sensor, Observation, Sample, and Actuator Ontology, within [SSN]

geo:Feature

A discrete spatial phenomenon in a universe of discourse

GeoSPARQL Ontology [GEO]

skos:Concept

An idea or notion; a unit of thought

Simple Knowledge Organization System ontology [SKOS]

prov:Agent

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity

PROV-O: The PROV Ontology [PROV]

4.4. Level 3: Domain Main Classes

At this level, the main classes within each Data Domain are identified and related to one another. In each Data Domain there is a well-known model used for the majority of the classes and relations. These well-known models are indicated to ensure that they can be followed if extensions to this level’s modelling need to be made.

4.4.1. Data Cataloguing

This subsection details the main elements of the Data Cataloguing Data Domain.

dd data cataloguing
Figure 7. Domain Main Classes for Data Cataloguing

This Data Domain’s main classes are essentially the DCAT2 data model [DCAT] with a slight profiling: dcterms:hasPart should be used to indicate elements within catalogues (e.g. dcat:Dataset and other things within a dcat:Catalog) rather than the specialised properties of dcat:dataset because GA expects to catalogue many types of things and the type of the thing should be given by the thing, not the catalogue property used to indicate it.
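The profiling described above could be sketched as follows. The ex: instances and titles are hypothetical, and the dcat: prefix, which is not listed in Table 1, is assumed to be bound to the DCAT namespace.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

ex:catalogue-1
    a dcat:Catalog ;
    dcterms:title "Example catalogue" ;
    # One generic property for every kind of catalogued item,
    # rather than type-specific properties such as dcat:dataset
    dcterms:hasPart
        ex:dataset-1 ,
        ex:vocabulary-1 ;
.

# Each catalogued item states its own type
ex:dataset-1
    a dcat:Dataset ;
    dcterms:title "Example dataset" ;
.

ex:vocabulary-1
    a skos:ConceptScheme ;
    skos:prefLabel "Example vocabulary"@en ;
.
```

With this pattern a single query over dcterms:hasPart finds every catalogued item, and the item's rdf:type distinguishes datasets from vocabularies and other things.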

4.4.2. Organisations & People

This subsection details the main elements of the Organisations & People Data Domain.

dd orgs people
Figure 8. Domain Main Classes for Organisations & People

This Data Domain’s main classes are centred on [PROV]'s prov:Agent class, but specific types of agent - person & organisation - are defined using schema.org [SDO], the general-purpose ontology provisioned by Google, Microsoft & Yahoo for the description of web page data.

schema.org objects and properties are also used to define agents in the VocPub profile [VOCPUB] and are understood by ontology documentation tools such as pyLODE [2] which is used by GA.
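A minimal sketch of agents modelled this way is given below. All ex: instances and names are hypothetical; the prov: and sdo: prefixes, not listed in Table 1, are assumed to be bound to the namespaces shown, and sdo:memberOf is only one of several schema.org properties that could relate a person to an organisation.

```turtle
@prefix ex:   <http://example.com/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdo:  <https://schema.org/> .

ex:org-1
    a sdo:Organization , prov:Agent ;
    sdo:name "Example Organisation" ;
.

ex:person-1
    a sdo:Person , prov:Agent ;
    sdo:name "Example Person" ;
    sdo:memberOf ex:org-1 ;              # schema.org relates agents to one another
.

ex:dataset-1
    a prov:Entity ;
    prov:wasAttributedTo ex:person-1 ;   # PROV links data to agents
.
```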

4.4.3. Sampling

This subsection details the main elements of the Sampling Data Domain.

dd sampling
Figure 9. Domain Main Classes for Sampling

Most of this Data Domain’s main classes are taken directly from the Sensor, Observation, Sample, and Actuator Ontology (SOSA) which is part of the Semantic Sensor Networks Ontology [SSN] with only the tern:Site class taken from another model, the TERN Ontology [TERN], which is just a specialisation of SOSA anyway. The TERN Ontology is the domain model of the Australian Biodiversity Information Standard (ABIS) [ABIS] with which GA sampling data is intended to be compatible.

In addition to samples & sampling, this domain can cater for general observations of things, e.g., observations of:

  • chemicals in rock samples, determined in a lab

  • images of the earth

  • classification of stratigraphic units

Where sosa:Sampling activities result in sosa:Sample objects, sosa:Observation activities result in sosa:Result objects. The observation/result pair is a more generic form of the sampling/sample pair.

SOSA also has a sosa:Platform class - something that hosts sensors and other equipment - so GA field sites that contain notes on equipment are a combination of a tern:Site and a sosa:Platform.
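The sampling/sample and observation/result pairs, together with a site that is also a platform, might be sketched as below. The ex: instances are hypothetical and the tern: namespace shown is an assumption, since it is not listed in Table 1.

```turtle
@prefix ex:   <http://example.com/> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix tern: <https://w3id.org/tern/ontologies/tern/> .

# A field site that also hosts equipment
ex:site-1
    a tern:Site , sosa:Platform ;
    sosa:hosts ex:sensor-1 ;
.
ex:sensor-1 a sosa:Sensor .

# A sampling activity at the site results in a sample
ex:sampling-1
    a sosa:Sampling ;
    sosa:hasFeatureOfInterest ex:site-1 ;
    sosa:hasResult ex:sample-1 ;
.
ex:sample-1
    a sosa:Sample ;
    sosa:isSampleOf ex:site-1 ;
.

# An observation made on the sample results in a result
ex:observation-1
    a sosa:Observation ;
    sosa:hasFeatureOfInterest ex:sample-1 ;
    sosa:hasResult ex:result-1 ;
.
ex:result-1 a sosa:Result .
```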

4.4.4. Spatiality

This subsection details the main elements of the Spatiality Data Domain.

dd spatiality
Figure 10. Domain Main Classes for Spatiality

This Data Domain’s main classes are taken directly from GeoSPARQL 1.1 [GEO] which is used extensively in GA already. GeoSPARQL’s main purposes are to relate things (geo:Feature) to their spatial projections - their geometries - and to relate things to one another - topological relations between features, such as within, touches, disjoint etc.

Particular datasets tend to implement specialised types of things (usually referred to as Feature Types) and sometimes specialised relations between things, e.g. a specialised hydrological catchment feature type might relate to another by being upstream of it. This is as per modelling in the Geofabric [3].
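The two purposes of GeoSPARQL described above, and a dataset-specific feature type and relation, could be sketched as follows. Everything in the ex: namespace is hypothetical (including ex:Catchment and ex:upstreamOf, which are not taken from the Geofabric), the coordinates are illustrative, and the geo: and rdf: prefixes, not listed in Table 1, are assumed to be bound to the namespaces shown.

```turtle
@prefix ex:   <http://example.com/> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A feature related to its geometry and, topologically, to another feature
ex:site-1
    a geo:Feature ;
    geo:hasGeometry [
        a geo:Geometry ;
        geo:asWKT "POINT(149.1 -35.3)"^^geo:wktLiteral ;   # illustrative coordinates
    ] ;
    geo:sfWithin ex:region-1 ;
.
ex:region-1 a geo:Feature .

# A specialised Feature Type and a specialised relation (hypothetical)
ex:Catchment
    a owl:Class ;
    rdfs:subClassOf geo:Feature ;
.
ex:upstreamOf a rdf:Property .

ex:catchment-1
    a ex:Catchment ;
    ex:upstreamOf ex:catchment-2 ;
.
ex:catchment-2 a ex:Catchment .
```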

4.4.5. Theming

This subsection details the main elements of the Theming Data Domain.

dd theming
Figure 11. Domain Main Classes for Theming

This Data Domain’s main classes are taken from [SKOS] and their expected/required properties and relations are formally defined in VocPub, a "vocabulary publication profile of SKOS" [VOCPUB]. VocPub just mandates certain vocabulary metadata and relations between elements in vocabularies. Conformance of vocabularies to VocPub is also easily testable using the profile’s validator and online tooling that supports it [4].
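A small vocabulary of the kind this domain caters for might look like the sketch below. The vocabulary, its concepts and its dates are hypothetical, and the metadata shown is only an assumption about the sort of properties VocPub asks for; the profile itself [VOCPUB] is the authority on what is required.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Vocabulary-level metadata (hypothetical values)
ex:rock-types
    a skos:ConceptScheme ;
    skos:prefLabel "Example rock types"@en ;
    skos:definition "An illustrative vocabulary of rock types."@en ;
    dcterms:created "2022-01-01"^^xsd:date ;
    dcterms:modified "2022-01-01"^^xsd:date ;
    dcterms:creator <http://linked.data.gov.au/org/ga> ;
    dcterms:publisher <http://linked.data.gov.au/org/ga> ;
    skos:hasTopConcept ex:igneous ;
.

# Concepts, each labelled, defined and placed in the hierarchy
ex:igneous
    a skos:Concept ;
    skos:prefLabel "igneous rock"@en ;
    skos:definition "Rock formed by the cooling of molten material."@en ;
    skos:inScheme ex:rock-types ;
    skos:topConceptOf ex:rock-types ;
    skos:narrower ex:granite ;
.

ex:granite
    a skos:Concept ;
    skos:prefLabel "granite"@en ;
    skos:definition "A coarse-grained intrusive igneous rock."@en ;
    skos:inScheme ex:rock-types ;
    skos:broader ex:igneous ;
.
```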

4.5. Data Domains Details

The Data Domains described above are implemented using multiple models and other resources. The following subsections describe the Domains' details and link to resources within them, such as Component Models.

4.5.1. Data Cataloguing Domain

Models
Model Role Notes

Data Catalog Vocabulary [DCAT]

Domain main model

Supplies main modelling structures for this domain

Examples

To do

4.5.2. Organisations & People Domain

Models
Model Role Notes

The PROV Ontology [PROV] & schema.org [SDO]

Domain main models

Supplies main modelling structures for this domain. prov:Agent is the main class and sdo:Person & sdo:Organization are the two schema.org classes differentiating People & Organisations. PROV properties are used to link data to Agents; schema.org properties used to label Agents and relate them to one another

Examples

To do

4.5.3. Sampling Domain

Models
Model Role Notes

Semantic Sensor Network ontology [SSN]

Domain main model

Supplies main modelling structures for this domain

TERN Ontology [TERN]

Supporting Model

Based on the Semantic Sensor Network Ontology [SSN], this model is used to characterise sites and field sampling:

  • supplies the tern:Site class which is the parent class of gas:Mine, gas:FieldSite etc.

  • supplies tern:SiteVisit which is the parent class of gas:Survey

GA Sampling Profile

A Component Data Model defined for this Supermodel.

Caters for detailed Material Sample properties; Specialised Sites modelling

A profile of several ontologies and a Semantic Web implementation of the Sampling Features Schema described in clauses 9-11 of [ISO19156].

Directly inherits from:

SAM Lite ontology

Background Model

A Semantic Web implementation of the Sampling Features Schema described in clauses 9-11 of [ISO19156] that was used for GA samples modelling 2015 - 2022. This ontology is no longer in direct use but is the foundation for the GA Samples Ontology

Loop3D GSO

The Loop3D initiative's GeoScience Ontology

Alignment target

Incomplete

The Loop3D GSO is a detailed geology model and attempts are being made to align GA Samples data with it.

Examples: Sample

An example representation of a Sample, http://pid.geoscience.gov.au/sample/AU128, according to one of the models listed above, is given below.

Representations of AU128 according to all of the models listed above are delivered by the GA Samples API online at:

See also complete static data examples for AU128:

Example data according to the [GAS]:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix role: <http://def.isotc211.org/.../CI_RoleCode/> .
@prefix sample: <http://pid.geoscience.gov.au/sample/> .
@prefix samples: <http://pid.geoscience.gov.au/samples> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
# The gas:, sdo: and tern: prefixes must also be declared before this data can be parsed

sample:AU128
    a tern:MaterialSample , geo:Feature ;
    rdfs:label "Sample igsn:AU128" ;
    # the IGSN, given as a typed literal
    dcterms:identifier "https://igsn.org/AU128"^^gas:IGSN ;
    sosa:isSampleOf <http://pid.geoscience.gov.au/site/17594> ;
    gas:samplingLocation [
        a geo:Geometry ;
        geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(137.6250691792 -34.0411726571)"^^geo:wktLiteral ;
    ] ;
    gas:currentLocation [
        a dcterms:Location ;
        dcterms:description "GA Services building" ;
    ] ;
    gas:samplingMethod <http://pid.geoscience.gov.au/def/voc/ga/igsncode/Rock> ;
    # PROV-O's terms for qualified attribution are prov:qualifiedAttribution and prov:agent
    prov:qualifiedAttribution [
        prov:agent [
            a sdo:Person ;
            sdo:name "Raymond, O.L." ;
        ] ;
        prov:hadRole role:originator ;
    ] ,
    [
        prov:agent <http://linked.data.gov.au/org/ga> ;
        prov:hadRole role:custodian ;
    ] ;
.

samples: rdfs:member <http://pid.geoscience.gov.au/sample/> .

In the example data above, the sample AU128 is identified using the Persistent IRI http://pid.geoscience.gov.au/sample/AU128 which is shortened to sample:AU128 using a prefix.

It is declared to be of certain classes:

  • tern:MaterialSample - a physical sample of something

  • geo:Feature - a geospatial Feature

It has a basic label, "Sample igsn:AU128", and an alternate identifier, the International GeoSample Number (IGSN) “https://igsn.org/AU128”.

It is a sample of Site 17594. A sampling location, given as a geometry, and a current location are provided, as is a sampling method (clearly incorrect here!). It is also indicated as being a member of a Feature Collection, http://pid.geoscience.gov.au/sample/ - the list of all GA’s Samples.

Finally, two Agents are indicated as having roles in relation to this sample:

  • a person "Raymond, O.L." - the sample’s originator

    • a more specialised role for this might be defined shortly

  • <http://linked.data.gov.au/org/ga> (Geoscience Australia) - the sample’s custodian

Examples: Survey

Surveys are modelled according to the GA Samples Profile, inheriting from the TERN Ontology, as per Figure 12.

surveys
Figure 12. Survey modelling in the GA Samples Profile [GAS]

4.5.4. Spatiality Domain

Models
Model Role Notes

GeoSPARQL 1.1 [GEO]

Domain main model

Supplies main modelling structures for this domain

Examples

To do

4.5.5. Theming Domain

Models
Model Role Notes

Simple Knowledge Organization System (SKOS) ontology [SKOS]

Domain main model

Supplies main modelling structures for this domain

Examples

To do

dd sites platforms
Figure 13. Domain Main Classes for Sites and Platforms

Surveys are a subclass of Samplings, and where the survey took place at a site, are related to the site through a tern:SiteVisit. Soft typing of surveys is done through the dcterms:type property, and vocabularies for types and methods of surveys are defined in the [xx] and [yy] vocabularies

dd surveys
Figure 14. Domain Main Classes for Surveys

Additional classes in this domain cater for special forms of Sites and Samples. See the Sampling Domain detailed section below.

Annex A: Profile Requirements

Requirements for this specification are organised as per the classes within the GA Supermodel. See the Supermodel main documentation for more information.

A.1 Material Sample Class

tern:MaterialSample

The TERN Ontology defines a tern:MaterialSample subclass of SOSA's Sample class which is used by GA for physical samples.

ID | Title | Rule | Notes
req:ms.1 | Material Sample IGSN | Each Material Sample MUST indicate an IGSN with a dcterms:identifier property with an object of datatype gas:igsnIri |
req:ms.2 | Material Sample Sample ID | Each Material Sample MUST indicate a GA Sample ID with a dcterms:identifier property with an object of datatype gas:sampleid |
req:ms.z | Material Sample State | Each Material Sample MAY indicate the Australian State it was acquired within with a geo:sfWithin property and, if it does, it MUST indicate the state with its IRI taken from the ASGS2021 dataset |
req:ms.y | Material Sample Date Acquired | Each Material Sample MUST indicate the date at which it was acquired with the gas:dateAcquired property with a literal object of datatype xsd:date |
req:ms.x | Material Sample Sampling Location | Each Material Sample MUST indicate exactly one location at which it was acquired with the gas:samplingLocation property with a geo:Geometry type object |
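A Material Sample that meets the requirements above might look like the sketch below. Every value is hypothetical, including the identifiers, date, coordinates and state IRI. The gas: namespace is not given in this document, so a placeholder is used, and the geo: and tern: namespaces shown are assumptions.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix gas:     <http://example.com/gas/> .    # placeholder, not the real gas: namespace
@prefix geo:     <http://www.opengis.net/ont/geosparql#> .
@prefix tern:    <https://w3id.org/tern/ontologies/tern/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

ex:sample-1
    a tern:MaterialSample ;
    # req:ms.1 - IGSN as a typed identifier
    dcterms:identifier "https://igsn.org/EXAMPLE1"^^gas:igsnIri ;
    # req:ms.2 - GA Sample ID as a typed identifier
    dcterms:identifier "0000001"^^gas:sampleid ;
    # req:ms.z - optional; the state IRI here stands in for one from ASGS2021
    geo:sfWithin ex:state-1 ;
    # req:ms.y - date acquired as an xsd:date literal
    gas:dateAcquired "2022-01-01"^^xsd:date ;
    # req:ms.x - exactly one sampling location, typed as a geo:Geometry
    gas:samplingLocation [
        a geo:Geometry ;
        geo:asWKT "POINT(149.1 -35.3)"^^geo:wktLiteral ;
    ] ;
.
```

The two dcterms:identifier values are told apart by their datatypes, which is why both requirements can use the same property.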

Annex B: PID Policy

  • sources of principles

    • previous GA practice

    • AGLDWG

    • international practice

  • own principles

    • as flat as possible

      • reject old /def/voc/ga/…​

      • classes only

References


1. RDF: https://www.w3.org/RDF/, RDFS: https://www.w3.org/TR/rdf-schema/. These references generally need not be followed as descriptions of the use of OWL will cover their relevant concepts.
2. https://pypi.org/project/pyLODE/
3. https://linked.data.gov.au/dataset/geofabric
4. The validator itself is online at https://w3id.org/profile/vocpub/validator and is pre-loaded into GA’s vocabulary servers e.g. https://vocabs.ga.gov.au. It can also be selected for online validation use at https://rdftools.surroundaustralia.com