Data standards

Data standards describe the expected meaning and acceptable representation of data for use within a defined context. Consistency of meaning is vital for information sharing among users of the data.

Take the case of Information Systems in the government. Organizations across the government have developed various Information Systems to perform different tasks. These systems serve the internal needs of their organizations, but they also house data that could be useful to many other organizations if shared. Historically, however, these Information Systems were developed with no plans to interoperate with other Information Systems (interoperability being the ability of Information Systems to exchange information and to use the information that has been exchanged). Each system therefore adopted independent approaches, including independent naming conventions for data fields, which are one of the common ways in which 'things' are defined or denoted in Information Systems.

For example, even if every Information System means the same thing by the term 'address', it could still be denoted as 'add' in one system, as 'adr' in another and as 'addr' in yet another. If these systems attempt to share or exchange data, they cannot do so unless a common reference vocabulary is available to all of them, establishing what 'address' means and how it is represented (say, its format). Based on this vocabulary, each Information System maps its native label to the common one ('add' to 'address', 'adr' to 'address' and 'addr' to 'address', respectively), and shares that mapping with the systems it intends to exchange data with before the data is actually shared.

The LIFe provides this common reference vocabulary under its 'Data standards' section. Note that it does not force anyone to use 'address' as the name of the field intended to denote an address. However, if data exchange is expected to happen, two things are prerequisites. First, the meaning must be consistent across all systems; second, a 'mapping' connecting native labels (such as 'add', 'adr' or 'addr') to the labels from the common reference vocabulary (such as 'address') must also be specified, so that other Information Systems can understand and reciprocate (please see the image above).
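The mapping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the LIFe itself: the vocabulary entries, the system names (`system_a`, `system_b`, `system_c`), their native labels and the helper functions `to_common` and `from_common` are all hypothetical.

```python
# Hypothetical common reference vocabulary: standard label -> meaning.
COMMON_VOCABULARY = {
    "address": "The place where a person lives",
    "name": "Full name of a person",
}

# Hypothetical per-system mappings: native label -> common label.
# Each system publishes its mapping before any data is exchanged.
SYSTEM_MAPPINGS = {
    "system_a": {"add": "address", "nm": "name"},
    "system_b": {"adr": "address", "name": "name"},
    "system_c": {"addr": "address", "full_name": "name"},
}


def to_common(system: str, record: dict) -> dict:
    """Rename a record's native field labels to common-vocabulary labels."""
    mapping = SYSTEM_MAPPINGS[system]
    common = {}
    for native_label, value in record.items():
        label = mapping.get(native_label)
        # Refuse to exchange a field whose meaning has not been agreed on.
        if label is None or label not in COMMON_VOCABULARY:
            raise KeyError(f"{system}: no mapping for field {native_label!r}")
        common[label] = value
    return common


def from_common(system: str, record: dict) -> dict:
    """Rename common-vocabulary labels to a receiving system's native labels."""
    reverse = {common: native for native, common in SYSTEM_MAPPINGS[system].items()}
    return {reverse[label]: value for label, value in record.items()}
```

With these mappings, a record from `system_a` such as `{"add": "12 Main Road"}` becomes `{"address": "12 Main Road"}` in the common vocabulary, and `system_c` reads it back as `{"addr": "12 Main Road"}`. Each system maintains a single mapping to the common vocabulary rather than a separate mapping for every partner system.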

Had it been a manual system (paper-based, for example), it would have been sufficient to say that the place where a person lives is to be called an address (meaning) and is spelled 'address' (representation). However, since we are dealing with information systems, there are other areas of semantics which, if specified and shared, can further ease data exchange. These may include context, conditions and validity.

In the LIFe, the data standards include the common reference vocabulary as well as these other areas of semantics, to ensure that the terms exchanged between Information Systems and their components are unambiguously defined and represented in the appropriate context.
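A single data standard entry covering meaning, representation, context, conditions and validity might be modelled as below. This is a hypothetical sketch, not the LIFe's actual schema: the `DataElement` fields, the `check` function and the `POSTAL_CODE` example (with its assumed six-digit format and start date) are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical shape of one entry in a data standard."""
    name: str                        # label from the common reference vocabulary
    definition: str                  # meaning
    pattern: str                     # representation: regex the value must fully match
    context: str                     # where the definition applies
    mandatory: bool                  # condition: must a value be present?
    valid_from: date                 # validity: date the standard comes into force
    valid_to: Optional[date] = None  # None means the standard is still in force


def check(element: DataElement, value: Optional[str], on: date) -> List[str]:
    """Return the problems found with a value; an empty list means it conforms."""
    problems = []
    # Validity: the standard must be in force on the given date.
    if on < element.valid_from or (element.valid_to is not None and on > element.valid_to):
        problems.append(f"{element.name}: standard not in force on {on.isoformat()}")
    # Condition: a missing value is only a problem if the element is mandatory.
    if value is None:
        if element.mandatory:
            problems.append(f"{element.name}: mandatory value missing")
        return problems
    # Representation: the value must match the agreed format.
    if re.fullmatch(element.pattern, value) is None:
        problems.append(f"{element.name}: {value!r} does not match {element.pattern}")
    return problems


# Illustrative element only; the format and dates are assumptions.
POSTAL_CODE = DataElement(
    name="postal_code",
    definition="Code identifying the postal delivery area of an address",
    pattern=r"\d{6}",
    context="Residential addresses within the country",
    mandatory=True,
    valid_from=date(2020, 1, 1),
)
```

Returning a list of problems rather than raising on the first one lets a receiving system report every way in which an exchanged value departs from the standard.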

While the data standards are currently aggregated by specific Information Domains, keeping in view the 'generality' of data elements, attempts will simultaneously be made to create ad-hoc data standards as and when required. This will facilitate the submission of proposals for data standards (candidate data standards) by project teams developing information systems in government, or by working groups of specific Information Domains (to be constituted).

The repository of approved data standards is available online at The Registry of Data Standards.