Data

LongSpine currently contains the following types of data:

  • Vocabularies
    • SKOS vocabularies of government functions
  • Datasets
    • government structural units and government functions, described not in SKOS but in other, more detailed models
  • Linksets
    • published crosswalks Datasets → Datasets & Vocab → Vocab
  • Vocab/Dataset Joins
    • where and how Vocabs join Datasets

§ Vocabularies

Figure D1: The LongSpine's Vocabularies of government functions

LongSpine's vocabularies are all existing national and international vocabularies of government functions. For the LongSpine project, all these vocabularies were presented in Linked Data (RDF) formats according to the well-known SKOS vocabulary model. Some of them (AGIFT & COFOG) existed in RDF/SKOS form before LongSpine. The table below lists the vocabularies, describes them and gives their identifying namespace / persistent ID (PID) URIs. The vocabularies are stored in the LongSpine cache in Named Graphs (schema) that are identified by these same PIDs. See the Cache Page for more details about their storage.

Table D1: LongSpine Vocabularies

| Vocabulary | Description | Namespace / PID | Release Status |
|---|---|---|---|
| Australian Government Interactive Functions Thesaurus (AGIFT) | A vocabulary of government functions created by the NAA and released online in SKOS before the LongSpine project | https://data.naa.gov.au/def/agift | In stable release by NAA as Linked Data at its namespace location. |
| Classification of the Functions of Government (COFOG) | An international government functions classification vocabulary issued by the UN's statistical office. | http://linked.data.gov.au/def/cofog | Draft: in a CSIRO code repository, online at http://test.linked.data.gov.au/def/cofog. Note: this vocabulary has previously been published as Linked Data by the UN and also by CSIRO, but not according to SKOS in a way that LongSpine could easily use, so the version linked to here is modified for effective use by LongSpine. |
| Classification of the Functions of Government - Australia (COFOG-A) | An Australian version of COFOG (above) issued by the Australian Bureau of Statistics | http://linked.data.gov.au/def/cofog-a | Draft: in a CSIRO code repository, online at http://test.linked.data.gov.au/def/cofog-a |
| Commonwealth Record Series Thesaurus (CRS-Th) | The CRS Thesaurus is a government functions thesaurus used within the NAA's CRS Database (see Component Datasets above). | http://linked.data.gov.au/def/crs-th | Alpha: released at its namespace location online, but currently redirecting to a code repository while awaiting online delivery as Linked Data |
| Government Purpose Classification (GPC) | The Australian Bureau of Statistics' legacy government functions thesaurus, now mostly superseded by COFOG-A | http://linked.data.gov.au/def/gpc | Draft: in a CSIRO code repository, online at http://test.linked.data.gov.au/def/gpc |
| Local Government Purpose Classification (LGPC) | The Australian Bureau of Statistics' legacy local government functions thesaurus. An extension to GPC (above). | http://linked.data.gov.au/def/lgpc | Draft: in a CSIRO code repository, online at http://test.linked.data.gov.au/def/lgpc |
| Records Disposal Authority Classes Vocabulary | A vocabulary of just the 'Classes' within Records Disposal Authorities (see the component dataset listed above) | http://linked.data.gov.au/def/rda-voc | Draft: in a CSIRO code repository, online at http://test.linked.data.gov.au/def/rda-voc |

Note that the vocabularies are also listed on the Models Page since they are both models (of concepts) and data. The Datasets that are not considered vocabularies are also about government structure or function, but they contain content that could not be presented using the SKOS vocabulary model.
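Because each vocabulary sits in a Named Graph identified by its PID, one way to look inside a single vocabulary is a SPARQL query scoped to that graph. The following is a minimal sketch under stated assumptions: it only builds a query string and does not contact the LongSpine cache, whose endpoint is not given here. The function name `concepts_query` and the dictionary `VOCAB_PIDS` are hypothetical names introduced for illustration; the URIs are those listed in Table D1.

```python
# Sketch only: builds a SPARQL query string; it does not contact any endpoint.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Namespace / PID URIs copied from Table D1. Per the text above, each PID
# also identifies the Named Graph holding that vocabulary in the cache.
VOCAB_PIDS = {
    "AGIFT": "https://data.naa.gov.au/def/agift",
    "COFOG": "http://linked.data.gov.au/def/cofog",
    "COFOG-A": "http://linked.data.gov.au/def/cofog-a",
}


def concepts_query(vocab_key: str, limit: int = 10) -> str:
    """Return a SPARQL query listing concepts and preferred labels
    from the Named Graph of one vocabulary (hypothetical helper)."""
    graph_uri = VOCAB_PIDS[vocab_key]
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?concept ?label\n"
        "WHERE {\n"
        f"  GRAPH <{graph_uri}> {{\n"
        "    ?concept a skos:Concept ;\n"
        "             skos:prefLabel ?label .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = concepts_query("AGIFT")
print(query)
```

Scoping the pattern with `GRAPH <...>` keeps results from one vocabulary separate from the others, which matters when several vocabularies share one store.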

§ Datasets

Figure D2: The LongSpine's Datasets

LongSpine's Datasets, sourced from either the National Archives of Australia (NAA) or the Department of Finance (Finance), are listed below. The links go to static dumps of the Datasets, temporarily stored by CSIRO as RDF files with documentation in online code repositories. Note that the Datasets are described by component data models that are detailed on the Models Page.

  1. Administrative Arrangements Orders (AAO) - Finance/NAA
  2. Australian Government Organisations Register (AGOR) - Finance
  3. Commonwealth Record Series (CRS) - NAA
  4. Portfolio Budget Statements (PBS) - Finance

Table D2: LongSpine Datasets

| Dataset | Description | Namespace / PID | Release Status |
|---|---|---|---|
| Administrative Arrangements Orders (AAO) | Content of the AAO documents, delivered as RDF | http://test.linked.data.gov.au/dataset/aaos | Draft dataset stored here: aao-dataset (DRAFT) |
| Australian Government Organisations Register (AGOR) | Content of the publicly available portions of AGOR derived from http://directory.gov.au | http://linked.data.gov.au/dataset/agor | Not ready yet |
| Commonwealth Record Series (CRS) | Some demonstration (test) data from the publicly available access to the CRS database via the NAA's website records search | http://linked.data.gov.au/dataset/crs-test | Test dataset in a code repository by CSIRO online at http://test.linked.data.gov.au/dataset/crs-test |
| Portfolio Budget Statements (PBS) | Data extracted from the textual Portfolio Budget Statements for the last 5-10 years. | http://linked.data.gov.au/dataset/pbs | Not ready yet |

These Datasets are all Linked Data representations (see Principles) of existing Datasets, some of which are decades old. LongSpine created Semantic Web (OWL) models for the content of each of these Datasets (see Models) and then exported parts of the Datasets according to those models into static files. These files have been ingested into the LongSpine cache (see Cache).

§ Linksets

Vocabulary Linksets

Figure D3: The LongSpine's vocabularies and the Linksets that crosswalk them

The Linksets implemented by the LongSpine project to crosswalk vocabularies of government functions to each other are shown above, and are listed and linked to below.

Currently no Linkset joins the RDA Classes Vocab to any other functions vocabulary. However, there are other ways to match RDA functions to others (see Vocab/Dataset joins below), and additional RDA Classes Vocab → Vocab X Linksets could be made using methods similar to those used for the other Linksets.

The links go to the Linksets stored by CSIRO as RDF files and accompanying documentation (e.g. methods used to create them) in online code repositories.

Table D3: LongSpine Vocabulary Linksets

| Linkset | Description | Namespace / PID | Release Status |
|---|---|---|---|
| AGIFT → CRS Thesaurus | SKOS mappings from AGIFT to CRS Th | http://test.linked.data.gov.au/dataset/agiftcrsth | Draft: CSIRO code repository: agiftcrsth-linkset |
| AGIFT → COFOG-A | SKOS mappings from AGIFT to COFOG-A | http://test.linked.data.gov.au/dataset/agiftcofoga | Draft: CSIRO code repository: agiftcofoga-linkset |
| COFOG → COFOG-A | SKOS mappings from COFOG to COFOG-A | http://test.linked.data.gov.au/dataset/cofogcofoga | Draft: CSIRO code repository: cofogcofoga-linkset |
| LGPC → COFOG | SKOS mappings from LGPC to COFOG | http://test.linked.data.gov.au/dataset/lgpccofog | Draft: CSIRO code repository: lgpccofog-linkset |
| LGPC → GPC | SKOS mappings from LGPC to GPC | http://test.linked.data.gov.au/dataset/lgpcgpc | Draft: CSIRO code repository: lgpcgpc-linkset |

These Linksets were all initially published by CSIRO. Most of the Linksets' mappings were made by CSIRO staff; for two Linksets, however, the mappings (or correspondences, as they are often called) had already been published in non-RDF form by the ABS.
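To make the idea of a vocabulary crosswalk concrete, a Linkset can be read as a list of mapping triples that is indexed by source concept and then looked up. This is a minimal sketch, assuming the Linksets use the standard SKOS mapping properties (such as `skos:exactMatch` and `skos:closeMatch`); which properties each published Linkset actually uses is not stated here. The concept URIs, the list `AGIFT_COFOGA_LINKSET`, and the helpers `build_crosswalk` and `crosswalk` are all hypothetical and invented for illustration.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical mapping triples: these concept URIs are invented for
# illustration and are NOT taken from the published Linksets.
AGIFT_COFOGA_LINKSET = [
    ("https://data.naa.gov.au/def/agift/example-1", SKOS + "exactMatch",
     "http://linked.data.gov.au/def/cofog-a/example-a"),
    ("https://data.naa.gov.au/def/agift/example-1", SKOS + "closeMatch",
     "http://linked.data.gov.au/def/cofog-a/example-b"),
    ("https://data.naa.gov.au/def/agift/example-2", SKOS + "broadMatch",
     "http://linked.data.gov.au/def/cofog-a/example-a"),
]


def build_crosswalk(linkset):
    """Index mapping triples by source concept:
    {source: [(relation, target), ...]}."""
    index = defaultdict(list)
    for source, predicate, target in linkset:
        relation = predicate.rsplit("#", 1)[-1]  # e.g. "exactMatch"
        index[source].append((relation, target))
    return dict(index)


def crosswalk(index, concept, relations=("exactMatch", "closeMatch")):
    """Return targets mapped from `concept` by any of the given relations."""
    return [target for rel, target in index.get(concept, []) if rel in relations]


index = build_crosswalk(AGIFT_COFOGA_LINKSET)
matches = crosswalk(index, "https://data.naa.gov.au/def/agift/example-1")
print(matches)
```

Filtering by relation lets a consumer choose how strict the crosswalk should be, for example accepting only exact matches when aggregating statistics.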

Dataset Linksets

Linksets to crosswalk Datasets of government structural units to each other have been proposed by the LongSpine project but not completed. So far, only placeholder code repositories exist for them:

  1. PBS → AGOR
  2. PBS → CRS

It is envisaged that, in addition to using text matching and human review matching to make Dataset Linksets, as was done for Vocabulary Linksets, other methods could also be used to create government structural unit Dataset/Dataset crosswalks, such as:

  • using multi-classified instance data to build up Linksets statistically
    • where data such as government records are dual tagged with attribution to more than one Dataset
  • using Dataset A → Functions Vocab X & Dataset B → Functions Vocab X
    • Linksets created by leveraging transitive links between Datasets & Vocabularies
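The two envisaged methods above can be sketched as follows. This is only an illustration under stated assumptions, not a description of any implemented LongSpine method: all identifiers (`pbs:unit-1`, `agor:org-1`, `vocab:health` and so on), the `min_count` threshold, and the function names `cooccurrence_links` and `transitive_links` are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_links(records, min_count=2):
    """Statistical method: count how often a unit from Dataset A and a unit
    from Dataset B are both tagged on the same record, and keep pairs seen
    at least `min_count` times."""
    counts = Counter()
    for tags_a, tags_b in records:
        counts.update(product(tags_a, tags_b))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def transitive_links(a_to_vocab, b_to_vocab):
    """Transitive method: propose an A-B link wherever a unit in Dataset A
    and a unit in Dataset B map to the same functions-vocabulary concept."""
    by_concept = {}
    for unit_b, concept in b_to_vocab:
        by_concept.setdefault(concept, []).append(unit_b)
    return [
        (unit_a, unit_b, concept)
        for unit_a, concept in a_to_vocab
        for unit_b in by_concept.get(concept, [])
    ]


# Hypothetical dual-tagged records: (Dataset A tags, Dataset B tags).
records = [
    ({"pbs:unit-1"}, {"agor:org-1"}),
    ({"pbs:unit-1"}, {"agor:org-1", "agor:org-2"}),
    ({"pbs:unit-2"}, {"agor:org-2"}),
]
stat_links = cooccurrence_links(records)

# Hypothetical Dataset -> Functions Vocab X mappings.
a_to_vocab = [("pbs:unit-1", "vocab:health"), ("pbs:unit-2", "vocab:defence")]
b_to_vocab = [("agor:org-1", "vocab:health"), ("agor:org-3", "vocab:health")]
trans_links = transitive_links(a_to_vocab, b_to_vocab)

print(stat_links)
print(trans_links)
```

Links produced either way are candidates only; as with the Vocabulary Linksets, human review would still be needed before publishing them.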