Principles

This project was established to advance methods and technologies that give government new capabilities for using data about government structure and functions. The core idea is that descriptions (datasets) of government structure and functions (functions also being a type of structure) can be presented in ways that allow for interoperability (associations, cross-walking), and that the set of datasets established here can act as a spine: a set of government structure/function reference datasets for other data to associate with, see below.


§ Spine

There are a series of datasets in use by the Federal government to describe current and past government structure/function. These datasets are related - they are all about related real-world concepts - and share many similar structures and specific objects, but they are neither identical, nor deliberately mapped to one another, nor even managed by a single authority. For example, we have the Australian Government Organisations Register (AGOR) database, managed by the Department of Finance, and also the Commonwealth Record Series (CRS) database, managed by the National Archives of Australia (NAA). Both describe government structure but they are not made to work together. This is mostly due to different foci - AGOR is about government now, CRS about government in the past - and partly due to having different host agencies. Nevertheless, they can, and optimally should, be linked up for cross-querying to support government research into structural change over time, among other things.

A data spine, as it is conceived of in this project and the Location Index, is a collection of datasets and methods for data presentation that act as an anchor point for other data. Spines are created around a theme and LongSpine's spine centers on government structure and functions.

LongSpine then (re)presents existing datasets about government structure/function in interoperable, machine-readable formats which allow for the best technical access to them. It also presents, for the first time, mapping datasets that allow for cross-walking of these structure/function datasets. These Linksets (specialised types of datasets that link other datasets together, see below) are managed independently from the original datasets allowing for independent governance of mappings.


Figure P1: The core spine created by LongSpine (in grey). Items within the spine are linked to one another and further links between spine items may be assumed by joining multiple links together. Here, Govt Structure Database X may be crosswalked with Govt Functions Database Z by following links made via Govt Structure Database Y. Things outside the spine, here Other Data, may be crosswalked to Yet Other Data, also outside the spine, if both external datasets are associated with parts of the spine.
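The crosswalking described in Figure P1 can be sketched in a few lines of Python. This is a minimal illustration only, not how LongSpine is implemented: the item identifiers, the link tables and the `crosswalk` helper are all hypothetical, and real links would be held as RDF rather than as dictionaries.

```python
# Hypothetical links between three spine datasets (X, Y, Z).
# All identifiers below are invented for illustration.
links_x_to_y = {"X:dept-1": "Y:agency-7"}
links_y_to_z = {"Y:agency-7": "Z:function-3"}

def crosswalk(item, *link_tables):
    """Follow an item through each link table in turn.

    Returns None as soon as one step has no link for the current item.
    """
    current = item
    for table in link_tables:
        current = table.get(current)
        if current is None:
            return None
    return current

# X is crosswalked with Z by joining the X->Y and Y->Z links together.
result = crosswalk("X:dept-1", links_x_to_y, links_y_to_z)

# Two datasets outside the spine that each point at spine items.
other_data = {"record-a": "Y:agency-7"}
yet_other_data = {"report-b": "Y:agency-7", "report-c": "Y:agency-9"}

# External items are associated when they share a spine item.
shared = sorted(
    (a, b)
    for a, spine_a in other_data.items()
    for b, spine_b in yet_other_data.items()
    if spine_a == spine_b
)
```

Here `result` holds the Dataset Z item reached from the Dataset X item, and `shared` pairs up the external items that are anchored to the same spine item.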

CACHE: the cache of LongSpine's elements, as described on the Cache Page, is a convenient way of accessing the spine's content but not the only way or the point of truth. See that page for more details.

§ Datasets and Linksets

LongSpine contains a series of Datasets - databases and collections of information - about government structure, such as the Commonwealth Record Series (CRS) database. It also contains a series of vocabularies of government functions, such as the Administrative Arrangement Orders. While these vocabularies differ in size and structure from the government structural datasets, we will refer to them as Datasets too, because Datasets and vocabularies are all managed in a similar way: as a single, homogeneous dataset with few connections to other Datasets. For example, the National Archives of Australia manages the CRS database as a single, stand-alone Dataset that does not depend on anything that NAA doesn't also manage.

LongSpine also contains a series of datasets that join other datasets and we call these Linksets. Linksets are an interesting sort of dataset since they potentially cross jurisdictions - from one agency's Dataset to another. It would be possible to create multiple, different, joins between Datasets with different methods or with different levels of authority. For this reason, Linksets are published independently from individual Datasets. An example of a cross-jurisdictional Linkset is the AGIFT/COFOG-A Linkset which joins the AGIFT and COFOG-A vocabularies (remember, these are Datasets!) published by the NAA and the Australian Bureau of Statistics respectively. Figure P2 below shows a conceptual view of a Linkset.

The formal definitions of Dataset and Linkset, as used by LongSpine, are taken directly from the Location Index (LocI) project's definition given in its top-level ontology:


Figure P2: An example Linkset ("A/B") between Datasets A & B. The Linkset contains both dataset-level metadata (who published it, when, what methods were used to generate it) and a series of links that actually join the target Datasets. Here Dataset A element A2 links to Dataset B element B1, and A4 to B3.

Since Linksets are presented separately from the Datasets they join, multiple Linksets can be published that join the same Datasets. If two methods, X & Y, were used to join Datasets A & B in different ways, you could publish two Linksets, as per Figure P3.


Figure P3: Two Linksets, A/B 1 & A/B 2 joining Datasets A & B created using different methods.

Due to the way graph database systems such as the LongSpine cache work, users can choose which elements are used to answer the queries they pose. This means that a user could query a system containing the elements in Figure P3 and decide which of the two Linksets is to be used to make joins. This allows any/all joining methods to be implemented but then to be used only where appropriate.
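As a rough sketch of that choice, the Python below models two Linksets between Datasets A & B and lets the caller pick which one a join uses. The `Linkset` class, the `join` function and the links in "A/B 2" are assumptions made for illustration; only the A2 to B1 and A4 to B3 links come from Figure P2. In a graph database the same selection would typically be made inside the query itself.

```python
from dataclasses import dataclass, field

@dataclass
class Linkset:
    """A set of links between two Datasets, with its own metadata."""
    name: str
    method: str
    links: set = field(default_factory=set)  # (item in A, item in B) pairs

# Two hypothetical Linksets joining Datasets A and B, as in Figure P3.
ab1 = Linkset("A/B 1", "method X", {("A2", "B1"), ("A4", "B3")})
ab2 = Linkset("A/B 2", "method Y", {("A2", "B2")})

def join(item, linkset):
    """Return the Dataset B items linked to `item` by the chosen Linkset only."""
    return sorted(b for a, b in linkset.links if a == item)

via_method_x = join("A2", ab1)
via_method_y = join("A2", ab2)
```

The same Dataset A item resolves to different Dataset B items depending on which Linkset the user selects, and neither Dataset has to change for a new Linkset to be published.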

§ Identifiers

The primary purpose of LongSpine, and all similar spines, is to identify a set of things within a domain that can be used as references for other information within that domain. To achieve this, the particular identifiers used for LongSpine items need to be considered carefully.

With multiple Datasets and Linksets constituting LongSpine's spine, it's important to be able to unambiguously identify them as a whole as well as the smaller elements within them which people will ultimately use as reference items. Datasets such as the Australian Government Organisations Register (AGOR) are identified with dataset identifiers (AGOR: http://linked.data.gov.au/dataset/agor), vocabularies such as COFOG-A and LongSpine's part models such as the CRS Ontology get definitional item identifiers (COFOG-A: http://linked.data.gov.au/def/cofog-a, CRS Ont: http://linked.data.gov.au/def/crs).

Items within datasets, such as AGOR's representation of the Department of Prime Minister & Cabinet, are identified with a derivative of the identifier of the dataset that contains them, as are things within vocabularies & ontologies. For instance, the AAO Ontology's definition of a Matter is http://linked.data.gov.au/def/aao#Matter, which is formed from the AAO ontology's identifier, http://linked.data.gov.au/def/aao, plus the specific part identifying the item, Matter.

URI Identifiers

It's also a very useful thing to be able to discover more information about items identified. This is why we are using Uniform Resource Identifiers (URIs) - essentially web addresses - as universally unique and resolvable (clickable) identifiers for all items in LongSpine. This is in contrast to, say, just using local database Primary Keys or data codes or even UUIDs for item identifiers. Use of URIs for identifiers is part of established Linked Data practice, see section below.

More examples of LongSpine identifiers are:

URI Identifier origins

Most of the URI identifiers used in LongSpine are based on the linked.data.gov.au web domain that was established to provide long-term stable web identifiers for Australian Government data. See the Australian Government Linked Data Working Group's governance web page for more information. Some other domains in use are agency-specific, such as data.naa.gov.au - the National Archives of Australia's data subdomain.

Identifier parts

Since most of the identifiers in use in LongSpine are based on Australian Government Linked Data Working Group recommendations, they mostly follow a pattern of:

http://linked.data.gov.au/ + TYPE + / + COLLECTION_ID + / [+ CLASS_ID + /] + ITEM_ID

The first part, http://linked.data.gov.au/, is the Linked Data WG's persistent domain. The second, the TYPE, is either def, for definitional items such as vocabularies or ontologies, or dataset, for Datasets and Linksets. The COLLECTION_ID identifies the definitional item or Dataset, e.g. agor for the AGOR dataset or crs for the CRS Ontology. The CLASS_ID is optional - it's only present in Datasets & Linksets and is Dataset/Linkset-specific (whatever the creator of the Dataset/Linkset chooses to indicate the class of individual items within it), e.g. org within the AGOR dataset to indicate an Organisation. The last part, the ITEM_ID, identifies a definitional item, a vocabulary concept or a particular dataset item, and its form depends on the definitional item or Dataset.

Wherever possible, previously existing, well-known, item IDs for things have been used. For example, in the original ABS, non-Semantic Web, publication of COFOG-A, the concept of Financial and fiscal affairs has the code 0112 so the URI identifier for this item is:

http://linked.data.gov.au/ + TYPE + / + COLLECTION_ID + / + ITEM_ID
==
http://linked.data.gov.au/def/cofog-a/0112

The test identifier for Department of Northern Australia, Central Office within the CRS Dataset is based on the CRS Database's code of CA 1889 and is:

http://test.linked.data.gov.au/dataset/crs/commonwealthAgency/1889
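The identifier pattern can be expressed as a small helper. This is a sketch only: `make_uri` and its parameter names are hypothetical and not part of any LongSpine tooling, and it assumes that a slash separates the optional CLASS_ID from the ITEM_ID, as the CRS example above shows.

```python
BASE = "http://linked.data.gov.au/"

def make_uri(type_, collection_id, item_id, class_id=None, base=BASE):
    """Assemble an identifier from the parts of the pattern.

    type_         -- "def" or "dataset"
    collection_id -- e.g. "cofog-a" or "crs"
    item_id       -- the item's own code within the collection
    class_id      -- optional, only used for Dataset/Linkset items
    """
    parts = [type_, collection_id]
    if class_id is not None:
        parts.append(class_id)
    parts.append(item_id)
    return base + "/".join(parts)

# The two worked examples from the text above.
cofog_uri = make_uri("def", "cofog-a", "0112")
crs_uri = make_uri(
    "dataset", "crs", "1889",
    class_id="commonwealthAgency",
    base="http://test.linked.data.gov.au/",
)
```

Both calls reproduce the identifiers given in the text, which is a quick way to check that a new identifier follows the same pattern.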

§ Semantic Relations

Semantic Web data models use typed relationships between objects. So, while we can say that a Thing may be partOf another (larger) Thing, we can specialise this relationship and perhaps say that a particular CRS CommonwealthOrganisation is a subOrganizationOf a CommonwealthAgency. Not only is there a very rich set of mechanics within the Semantic Web's methods to make specialised relationships, this project utilises many specialised relations already defined within the area of organisations and functions, such as the Organization Ontology's subOrganizationOf.


Figure P4: Specialisation of part/whole relations for firstly an Organisations context and secondly the more specialised CRS context

This sort of specialisation of relationships, as well as the Semantic Web's ability to specialise objects (an Organisation is a specialised type of Thing), allows projects like LongSpine to implement specialised models of things and yet also generalised models that allow for interoperability across specialised elements. For example, where we have Government Entity in AGOR and Commonwealth Agency in the CRS, we can deal with each as is - with all the properties expected of them in their original datasets - but also deal with either as a specialisation of an Organisation and thus, at a certain level of abstraction, interoperate across the datasets on that basis.
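The effect of class specialisation on interoperability can be sketched as follows. The class names with `agor:` and `crs:` prefixes and the item names are placeholders chosen for this example rather than the actual terms in the LongSpine models, and a real system would leave this reasoning to an RDF store rather than a Python dictionary.

```python
# Each class points to the more general class it specialises.
# Prefixed names are illustrative stand-ins for the real model terms.
SUBCLASS_OF = {
    "agor:GovernmentEntity": "org:Organization",
    "crs:CommonwealthAgency": "org:Organization",
    "org:Organization": "Thing",
}

def is_a(cls, ancestor):
    """True if `cls` is `ancestor` or a (transitive) specialisation of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

# Hypothetical items drawn from different datasets.
items = {
    "item-1": "agor:GovernmentEntity",
    "item-2": "crs:CommonwealthAgency",
    "item-3": "Thing",
}

# At the Organisation level of abstraction, items from both datasets qualify.
organisations = sorted(i for i, c in items.items() if is_a(c, "org:Organization"))
```

Each item keeps its specialised class, yet a query phrased against the general Organisation class still finds items from both datasets.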

§ Linked Data

Where the Semantic Web is a conceptual way of relating data, Linked Data is a set of mechanics for actually implementing Semantic Web data over the Internet. Linked Data relies on data being modelled according to models such as the Semantic Web models LongSpine uses and for elements to be identified with URIs, again as LongSpine does.

LongSpine uses Linked Data mechanics to allow the data and models that make it up to be presented as distributed elements, accessed over the Internet. This allows different agencies to publish point-of-truth data models and have them work together. For example, the Portfolio Budget Statement (PBS) data can be delivered by the Department of Finance using the system of their choice and yet still be technically, and instantly, interoperable with the CRS data from the National Archives of Australia delivered through a different channel.

Systems such as this LongSpine DB can cache information from disparate sources for ease of use but caches such as this are not to be regarded as points of truth for the data - they are not the spine, only a temporary utilisation of it. The spine exists as the collective whole of the distributed datasets and models.

Modelled Interoperability

The elements of the spine are able to interoperate - to be crosswalked and used together - due to their adherence to a common set of Semantic Web models. See the Models Page for more information.

Technical Interoperability

LongSpine data can be technically aggregated for use since, as per Linked Data norms (see general information on Linked Data, such as Wikipedia's article) all of the information in it is delivered as Resource Description Framework (RDF) data. For example, to pull all of the parts of LongSpine into a single cache for querying, one need only collect the items represented in RDF and load them into an RDF-capable database. See the Cache Page for more information.
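Because RDF is a set of subject/predicate/object statements, aggregation amounts to taking the union of the statements from each source. The sketch below shows this with plain Python and a deliberately tiny parser that only understands N-Triples lines made entirely of IRIs. The `example.org` identifiers are invented, and in practice one would load the data into an RDF-capable database or library rather than parse it by hand.

```python
import re

# Matches one N-Triples statement made only of IRIs: <s> <p> <o> .
# Literals and blank nodes are deliberately not handled in this sketch.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")

def parse(ntriples_text):
    """Return the set of (subject, predicate, object) triples in the text."""
    triples = set()
    for line in ntriples_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            triples.add(match.groups())
    return triples

# Two hypothetical sources, e.g. a Dataset and a Linkset from different agencies.
source_a = "<http://example.org/a/1> <http://example.org/p/linksTo> <http://example.org/b/1> ."
source_b = """<http://example.org/b/1> <http://example.org/p/type> <http://example.org/c/Agency> .
<http://example.org/a/1> <http://example.org/p/linksTo> <http://example.org/b/1> ."""

# The cache is simply the union; a statement repeated across sources is stored once.
cache = parse(source_a) | parse(source_b)
```

No schema alignment step is needed before merging, which is what makes a cache cheap to rebuild from the distributed points of truth.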

Item home pages

As per Linked Data norms again, all LongSpine items (Datasets, Vocabs etc.) have home or landing pages on the internet in human-readable HTML. This is so that any item, when its identifier is followed, can be simply seen and understood. Some examples are:

§ Time

Time (temporality) is one of the conceptual pillars of the LongSpine project. The organisations and functions that make up the core data of the spine are related structurally - Function X is performed by Organisation Y - and also temporally - Function X was performed by Organisation Y at time z or Organisation A was the precursor of Organisation B. A sophisticated handling of time by the spine will allow users of it to integrate datasets with time-based queries.

To facilitate a powerful handling of time, elements within Datasets and Linksets within LongSpine are associated with temporal objects (instants and ranges) to indicate their real-world temporal natures. This is in accordance with the way the Time Ontology in OWL models time. Some examples:

The first point above just indicates how an information object - the CRS record about Paul Keating - represents a relationship that was in effect for a certain time. The second point indicates how particular time intervals can be named - here the time period in which Administrative Arrangement Order #80 was in effect - and other objects associated with them.

By naming important time intervals and instants, we can calculate things using them without having to actually check specific days. We can, for instance, ask "Which were all of the government agencies responsible for agriculture during the Hawke Prime Ministership?". Figure P5 below shows some of the relations between objects we can have as a result of using the Time Ontology in OWL.


Figure P5: Possible relations using the Time Ontology in OWL. A. An object can be associated with a time interval to show it has real-world temporality. B. A relationship between objects can do likewise. C. Objects can have temporal relations between them such as before and after. D. & E. Further temporal relations are intervalMeets (Y starts where X ends) and intervalDuring.
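The interval relations in Figure P5 can be sketched with ordinary dates. The function names mirror the Time Ontology in OWL relations, but the code is an illustrative approximation, not an implementation of the ontology. The named interval, the organisations, all of the dates and the `intersects` helper are hypothetical.

```python
from datetime import date
from typing import NamedTuple

class Interval(NamedTuple):
    start: date
    end: date

def before(x, y):
    """x ends before y starts."""
    return x.end < y.start

def meets(x, y):
    """y starts exactly where x ends."""
    return x.end == y.start

def during(x, y):
    """x lies strictly inside y."""
    return y.start < x.start and x.end < y.end

def intersects(x, y):
    """x and y share some time (a general test, not a single named relation)."""
    return x.start < y.end and y.start < x.end

# A hypothetical named interval, standing in for something like a ministry's term.
named_term = Interval(date(2000, 1, 1), date(2004, 1, 1))

# Hypothetical intervals during which each organisation held a function.
responsibility = {
    "Organisation A": Interval(date(1998, 1, 1), date(2001, 6, 30)),
    "Organisation B": Interval(date(2001, 6, 30), date(2006, 1, 1)),
    "Organisation C": Interval(date(2005, 1, 1), date(2007, 1, 1)),
}

# "Which organisations held the function at some point during the named term?"
responsible = sorted(
    org for org, held in responsibility.items() if intersects(held, named_term)
)
```

Because the term is a named interval, the query refers to it by name and never needs the user to supply its start and end dates.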