3.7 Databases and XML
In Table 1, it was easy to see which pieces of data belonged to which fields, where the records began and ended, and so on. The tabular layout enabled us to see at a glance the salient features. If you wanted to find a particular name in a table, you ran your eye down the 'name' field. It is a different matter for a computer. How does a computer 'know' which pieces of data belong to which field? How does it look in the right places? The data on the hard drive or in the RAM is not even arranged in a tabular way.
For a human user, a tabular layout gives structure to the data. When data is structured, it is clear where each piece of data begins and ends, and which record and field it belongs to. For a computer to work with data in a similar way, the data must be structured in a way that the computer (or rather, the program) can interpret. Different database systems have different ways of recording where data begins and ends, and which field it belongs to. Often additional data is incorporated into the database for this kind of 'housekeeping', but it is hidden from the user. The program uses this extra data to encode the structure of the data. Word-processor files similarly contain data that is hidden from the user, for instance instructions to display some pieces of text in bold and others in italic.
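As a minimal sketch of how a program can use hidden 'housekeeping' characters to recover structure, the Python below assumes a hypothetical storage format in which a line break ends each record and a comma ends each field, with double quotes protecting any comma that is part of the content. The names and towns are invented for illustration, and real database systems use their own, usually far more elaborate, internal formats.

```python
import csv
import io

# Hypothetical stored data. The line breaks mark where each record ends,
# the commas mark where each field ends, and the double quotes protect a
# comma that belongs to the content rather than to the structure.
RAW = 'name,town\n"Smith, Anne",Anytown\nBill Jones,Otherville\n'


def read_records(raw: str) -> list[dict[str, str]]:
    """Use the housekeeping characters to split raw text into records and fields."""
    reader = csv.DictReader(io.StringIO(raw))  # first line supplies the field names
    return [dict(row) for row in reader]


for record in read_records(RAW):
    print(record["name"], "lives in", record["town"])
```

Without the quotes, the program would have no way of telling that "Smith, Anne" is one field rather than two: the structure lives entirely in these extra characters, not in any visible layout.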
Different database programs demarcate the structure of data in different ways, and this has proved to be a major problem in the e-government projects of many countries. Consider, for instance, accessing a government portal in order to use a particular service. You might have to log on, supplying a username and password. Behind the scenes, verification processes will check these and either allow you to proceed or not. You might then move on to other parts of the system to check, for example, your entitlement to benefits or your tax liability.
Although you may have entered the e-government website via a single portal, behind the scenes the data required for these activities will typically be held in several different proprietary database systems. This is the result of a long history of piecemeal implementation of databases in central and local government. Typically there will be no common standard for coding the data fields in these databases. For example, in one system addresses might have fields with names such as House Number, Streetname, Town, City, Postcode and so on. Another system might have Address1, Address2, Address3 instead. This is an example of the 'legacy problem'. In many cases it is too expensive to replace these diverse systems with new, integrated systems operating to common standards. Somehow the older systems have to be incorporated into the newer e-government systems and have to be able to work together with them. A vital tool for enabling these diverse systems to work together has been XML, or Extensible Markup Language, which I will briefly discuss.
The idea of marking up goes back to pre-computer printing technology. The (human) printer would be supplied with a typescript of the document to be printed. The document would be 'marked up' with handwritten tags or labels (Figure 2).
The tags were coded instructions for particular fonts and sizes, and an accompanying sheet explained what they represented. The meaning of the tags would change from typescript to typescript; the tags did not have fixed meanings that applied to all typescripts the printer would work on.
Tagging is a way of keeping appearance and content separate. In Figure 2, the typescript is the content, and the appearance (how the text should look on the page) is embodied in the tagging. XML uses this idea of tagging to indicate the structure of a document's content. As with the print mark-up, XML tags have no fixed meaning, so any particular XML document needs an accompanying definition of what its tags represent. This is usually done in a schema. Although XML resembles HTML to some extent, the need for a schema in connection with XML documents is a crucial difference. HTML has no schema: the meanings of its tags are set down in a standard.
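To illustrate the idea that a schema, not the tags themselves, says what a document may contain, here is a deliberately simplified Python sketch. Python's standard library has no validator for the W3C XML Schema language, so this uses a hypothetical dictionary (`TOY_SCHEMA`) as a toy stand-in for a real schema; the element names and address are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy 'schema': for each root element, the child elements it
# must contain, in order. This is NOT the W3C XML Schema language.
TOY_SCHEMA = {
    "address": ["street", "town", "postcode"],
}


def conforms(xml_text: str, schema: dict[str, list[str]]) -> bool:
    """Return True if the root element has exactly the children the schema lists."""
    root = ET.fromstring(xml_text)
    expected = schema.get(root.tag)
    if expected is None:
        return False  # the schema gives this tag no meaning at all
    return [child.tag for child in root] == expected


doc = ("<address><street>12 Station Road</street>"
       "<town>Anytown</town><postcode>AB1 2CD</postcode></address>")
print(conforms(doc, TOY_SCHEMA))
```

The parser happily reads any well-formed tags; it is only by checking against the (toy) schema that the program can say whether `<address>` is being used as intended.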
One difference between XML coding and the old style of print mark-up is that XML tags are embedded within the content of the document itself, rather than placed in a reserved part of the document. (In the case of the print mark-up, the reserved area is the margins.)
Activity 14 (exploratory)
What safeguard is needed if tags are embedded in the text itself?
The most important safeguard is that the tag should not be interpreted as part of the content of the document. This is usually done by surrounding the tags with special characters (or symbols). XML uses angle brackets, < and >, as HTML does. Notice that in the print mark-up, the tags are encircled as a further aid to keeping them separate from the content.
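The safeguard also works in the other direction: because < marks the start of a tag, content that happens to contain < (or &, which XML also reserves) must be written as an entity reference such as &lt; or &amp;. The Python sketch below uses the standard library's `xml.sax.saxutils.escape` to do this; the sentence being stored is invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Content that happens to contain characters XML reserves for mark-up.
content = "Profit < 5% & falling"

# Embedding it unchanged confuses the parser: '<' looks like the start of a tag.
try:
    ET.fromstring(f"<note>{content}</note>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

# escape() replaces &, < and > with entity references, so the parser
# treats them as ordinary text rather than as mark-up.
safe = escape(content)
element = f"<note>{safe}</note>"

# Parsing the escaped version gives back the original content exactly.
round_trip = ET.fromstring(element).text
print(element)
```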
As an example of XML in practice, Figure 3 shows a small StarOffice spreadsheet. (StarOffice uses XML coding in all its files.)
Below is a small part of the XML file for this table. Don't worry about trying to understand it. I have picked out some of the data items from Figure 3 in bold. Notice how little of this extract is content, and this extract is just a small part of the entire file.
<table:table-cell table:value-type="float" table:value="21"><text:p>21</text:p></table:table-cell>
<table:table-cell table:value-type="date" table:date-value="1953-05-01"><text:p>01/05/53</text:p></table:table-cell>
<table:table-cell table:value-type="float" table:value="75"><text:p>75</text:p></table:table-cell>
<table:table-cell table:value-type="date" table:date-value="1964-07-25"><text:p>25/07/64</text:p></table:table-cell>
From what you have learned about HTML, you will recognise the use of symbols such as <, > and / to distinguish the parts of the file that deal with layout from the parts that deal with content. HTML and XML have both evolved from an earlier mark-up language called SGML (Standard Generalized Markup Language), devised for use with print documents.
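A program reading the spreadsheet extract relies entirely on these tags. As a sketch, the Python below parses the four cells with the standard library's `xml.etree.ElementTree`. Two assumptions are needed to make the fragment parse on its own: the cells are wrapped in a single `table:table-row` element, and the `table` and `text` prefixes are bound to the OpenDocument namespace URIs, which may differ from those the original StarOffice file declared.

```python
import xml.etree.ElementTree as ET

# Assumption: OpenDocument namespace URIs; the StarOffice file behind
# Figure 3 may have declared different ones.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# The four cells from the extract, wrapped in one row element that
# declares the namespaces so the fragment is well-formed on its own.
ROW = f"""<table:table-row xmlns:table="{NS['table']}" xmlns:text="{NS['text']}">
<table:table-cell table:value-type="float" table:value="21"><text:p>21</text:p></table:table-cell>
<table:table-cell table:value-type="date" table:date-value="1953-05-01"><text:p>01/05/53</text:p></table:table-cell>
<table:table-cell table:value-type="float" table:value="75"><text:p>75</text:p></table:table-cell>
<table:table-cell table:value-type="date" table:date-value="1964-07-25"><text:p>25/07/64</text:p></table:table-cell>
</table:table-row>"""


def cell_values(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (type, stored value, displayed text) for each cell in the row."""
    root = ET.fromstring(xml_text)
    results = []
    for cell in root.findall("table:table-cell", NS):
        # Namespaced attributes are looked up as '{namespace-uri}local-name'.
        kind = cell.get(f"{{{NS['table']}}}value-type")
        if kind == "float":
            value = cell.get(f"{{{NS['table']}}}value")
        else:
            value = cell.get(f"{{{NS['table']}}}date-value")
        shown = cell.find("text:p", NS).text
        results.append((kind, value, shown))
    return results


for row in cell_values(ROW):
    print(row)
```

Notice that each date is stored twice: once in an unambiguous machine-readable form (the `date-value` attribute) and once as the text displayed to the user. The tags are what let the program tell the two apart.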
Many proprietary IT systems use coding to keep information about appearance and content separate. Two things make XML different from proprietary equivalents:
It is an open standard, which means that it is not owned by any particular company.
It is extremely adaptable to new ways of distributing and presenting information.
These two factors make XML invaluable as a common language for exchanging structured data. However, XML by itself does not solve the legacy problem. Various types of middleware are also needed to translate legacy data into XML, and to translate in the opposite direction. (Middleware is a general name for software that enables separate systems to work together.) Middleware is specific to particular systems, so solving the legacy problem also involves obtaining the appropriate middleware. This might mean buying it, but in some cases it means creating it specially.
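The sketch below shows, in miniature, what such middleware does. It assumes two hypothetical legacy systems whose address field names are modelled on the two naming schemes mentioned earlier in this section, and it assumes that System B's Address2 holds the town and Address3 the postcode. Both are translated into one common XML format, and the XML is translated back into System B's format. The element names in the XML are invented for illustration and do not come from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy records describing the same address.
SYSTEM_A = {"House Number": "12", "Streetname": "Station Road",
            "Town": "Anytown", "Postcode": "AB1 2CD"}
SYSTEM_B = {"Address1": "12 Station Road", "Address2": "Anytown",
            "Address3": "AB1 2CD"}


def from_system_a(rec: dict[str, str]) -> dict[str, str]:
    """Map System A's field names onto the common fields."""
    return {"line": f"{rec['House Number']} {rec['Streetname']}",
            "town": rec["Town"], "postcode": rec["Postcode"]}


def from_system_b(rec: dict[str, str]) -> dict[str, str]:
    """Map System B's field names onto the common fields (assumed layout)."""
    return {"line": rec["Address1"], "town": rec["Address2"],
            "postcode": rec["Address3"]}


def to_xml(common: dict[str, str]) -> str:
    """Serialise the common fields as XML."""
    address = ET.Element("address")
    for tag in ("line", "town", "postcode"):
        ET.SubElement(address, tag).text = common[tag]
    return ET.tostring(address, encoding="unicode")


def xml_to_system_b(xml_text: str) -> dict[str, str]:
    """The opposite direction: read the common XML back into System B's fields."""
    root = ET.fromstring(xml_text)
    return {"Address1": root.findtext("line"),
            "Address2": root.findtext("town"),
            "Address3": root.findtext("postcode")}


print(to_xml(from_system_a(SYSTEM_A)))
print(to_xml(from_system_b(SYSTEM_B)))
```

Both legacy records produce identical XML, so any system that understands the common format can use data from either. The `from_system_a` and `from_system_b` functions are the system-specific part, which is why middleware has to be bought or written for each legacy system.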
XML is widely used where different systems need to operate together, and not just legacy systems. For instance, through the adoption of standard schemas for data exchange, banks can swap information easily among themselves, even though their information systems are very different. Many other types of business use XML to transmit information in standardised ways. XML has also been influential in the growth of web services. Web services are self-contained, reusable programs that perform a particular task, such as authentication of identity, currency conversion or shipping processing. Because each one is a self-contained unit, it can be incorporated directly as a component of a more complex online service.
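Taking currency conversion as the example, here is a minimal sketch of a service whose only interface is XML in and XML out. Everything specific is hypothetical: the element names (`conversionRequest`, `conversionResult`), the currency pair and the fixed exchange rate. Real web services also wrap such messages in a transport and messaging protocol (for example HTTP and SOAP), which this sketch omits entirely.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical fixed rate; a real service would look this up.
RATES = {("GBP", "EUR"): Decimal("1.15")}


def convert_service(request_xml: str) -> str:
    """A self-contained 'service': XML request in, XML response out."""
    req = ET.fromstring(request_xml)
    source = req.findtext("from")
    target = req.findtext("to")
    amount = Decimal(req.findtext("amount"))
    # Decimal avoids binary floating-point rounding in money amounts.
    result = (amount * RATES[(source, target)]).quantize(Decimal("0.01"))

    resp = ET.Element("conversionResult")
    ET.SubElement(resp, "currency").text = target
    ET.SubElement(resp, "amount").text = str(result)
    return ET.tostring(resp, encoding="unicode")


request = ("<conversionRequest><from>GBP</from><to>EUR</to>"
           "<amount>100.00</amount></conversionRequest>")
print(convert_service(request))
```

The caller never sees how the service is implemented. A program written in any language, on any platform, can use it as long as it can produce the request and read the response, which is the point the next paragraph makes about XML's openness.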
These services need to be able to work on many different platforms, that is, in many different computing environments, with many different programming languages. The open nature of the XML standard has allowed this to happen. XML is also increasingly used 'behind the scenes' in word-processing documents, spreadsheets, databases and online documents and forms.
The impact of XML on information exchange and online provision of services has been enormous and will almost certainly continue to grow. Bisson (2005) wrote:
Tomorrow's XML will also be much more visible in the foreground of computing. Computer desktops will become canvases for active documents that mix XML data and formatting information – and include links to web services. Your online tax return will be a document that looks like the paper forms the Inland Revenue sends, but it will be able to work with online calculation services, before delivering XML data directly into the Inland Revenue systems (and automatically transferring your refund into your bank account).
The development of XML has been very timely for the e-government project because it has allowed incompatible systems to work together. However, XML is only part of a much bigger picture. In the UK, the government has set up the e-GIF (e-Government Interoperability Framework) initiative, a set of compulsory standards for the UK public sector. These standards define the way that data should be structured and accessed. For instance, e-GIF specifies the use of web browsers for viewing data, the use of XML for integrating data, the use of internet and web protocols, and so on. Systems that are 'e-GIF compliant' (i.e. that conform to the e-GIF specifications) should be able to communicate with one another. A European e-GIF system is under development at the time of writing.
The increasing ease of data transfer is practical and convenient. But is it always in the best interests of the people whose data is held? People who have investigated the data held about them by, for instance, credit card companies have often been surprised at the amount of data held, and have questioned whether much of it was relevant to a credit card company's business. In the context of e-government, a particular concern is the ease of transferring data from one government department to another, so that, for instance, a person's medical or tax records might be viewable by other departments. Another concern is that personal information could be made available to private companies. As work formerly done by government departments is contracted out to private companies, those companies often need access to the information held in government databases.