3 Databases
3.1 Tables and flat databases
Databases lie at the heart of many e-government systems, and at the heart of many other IT systems. The local government websites you looked at in Activity 6, for instance, almost certainly used databases a great deal, as do the majority of central government sites. Away from e-government, the websites for Amazon or eBay, for example, use huge databases.
Constructing a database of any complexity requires careful thought about the way information is organised in any particular context. A database can therefore be regarded as the outcome of an analysis of the structure and use of information. The significance of this remark might not be clear at the moment, but should become clearer as you work through this section. Careful analysis of the structure and use of information is a vital part of all large-scale projects such as e-government.
For large, complex information systems, relational databases, which I will describe shortly, are generally used, although relational databases can be used for simpler projects too. It is not the size of a project that dictates the need for relational databases, but its complexity. More specifically, the complexity of the relationships between information in the database is what dictates the need for relational databases. I hope to give you a flavour of what I mean by this in the short example in the following pages.
Relational databases are contrasted with simpler databases known as flat databases. Some people, however, use the term 'database' to mean only relational databases, and flat databases are sometimes simply called tables.
To understand what relational databases are and why we use them, we need to appreciate some of the shortcomings of the simpler, flat database. Table 1 is an example of a flat database. It contains information about some fictional students and the local authority evening classes they are studying. As you will recall, in a table like this a column can be described as a field, and each row is a record.
Table 1 Students and evening classes
Name | Address | Postcode | Course1 | Course1_Sessions | Course2 | Course2_Sessions | Course3 | Course3_Sessions |
---|---|---|---|---|---|---|---|---|
Lodhi, Mona | 22 The Grove, Newport | AB12 3CD | Yoga | 10 | Holiday Spanish | 15 | ||
Jones, Bob | 2 High Street, Stratford | AB6 4PQ | IT for all | 5 | Oil painting | 15 | ||
Cherry, Colin | 59 Acacia Avenue, Brompton | AB2 12ZY | Digital photography | 10 | Sewing | 15 | ||
Cherry, Colin | 13 The Limes, Leighton | XY6 7LR | Creative writing | 20 | ||||
Edwards, Delia | 40 Eldon Court, Hampton | XY12 4TK | Digital photography | 10 | Oil painting | 15 | Ballroom dancing | 30 |
Roberts, Albert | 18 Mount Pleasant, Greenhill | AB12 7UB | Drawing | 15 | ||||
Singh, Sara | 7 Marina View, Sutton | ABM 8WQ | Woodwork | 15 | ||||
Chang, Patrick | 21 Green Lane, Newport | AB12 9TU | Everyday maths | 20 | ||||
Evans, Mary | 13 The Limes, Leighton | AB6 7LR | Oil painting | 15 |
This table is quite short, but you could imagine it extending downwards to cover many more students.
Table 1 allows for up to three courses per student, and the fields for them are Course1, Course2 and Course3. Each course has an associated number of sessions, which is the number of classroom sessions in the course. This is shown in the fields Course1_Sessions, Course2_Sessions, Course3_Sessions. (I have avoided spaces in the field names because many database management systems do not permit them, but this is irrelevant to the principles I am outlining here.)
Table 1 could be set up as a spreadsheet. You can think of a spreadsheet as a flat database with a selection of built-in, simple database functions, such as searching and the ability to sort the data in various ways and to perform calculations on it (for example counting the number of students enrolled on each course). A spreadsheet also allows you to display the data as charts, graphs, lists and so on.
Something that databases and spreadsheets share is the concept of a data type. Typical data types are 'text' (that is, alphabetic characters), 'number' and 'date'. Usually each field needs to be specified as holding a particular type of data. For example, in Table 1 the fields for course sessions would be defined as having the data type 'number'. Defining these fields as having numerical data allows arithmetical operations to be performed on the data, such as adding the number of course sessions in each record to find the total number of sessions a student has enrolled for. Similarly, using the data type 'date' allows data to be sorted chronologically. So, for instance, if there were a field with the starting dates of the courses, the records could easily be sorted into date order. A further benefit of defining a data type for fields is that it can help prevent the wrong type of data being entered. If a field is defined as containing text data (for instance), the program can perform checks on any data entered to make sure it is the right kind.
Table 1 is a very simple table. Much more data would need to be recorded in a real example, for instance course fees, classroom allocated, course leader's name, course code, start date, and so on. However, Table 1 has other problems which are not due simply to its lack of typical data. These other problems arise because the organisation of the data has not been considered carefully enough. I will come back to this point shortly.
Much of the usefulness of databases as a way of holding data arises from the fact that the data is organised, and can be interrogated in various ways. By 'interrogated' I mean that a question can be framed whose answer can be drawn from the database. For instance, a question might be: 'How many people with a postcode beginning “AB6” are enrolled on courses with 10 sessions or more?' For a human reader of Table 1, that question is easy to answer, although tedious if the table is big. Getting a computer to answer the question is not so straightforward, because the processes a computer would have to go through are quite complicated. In essence, a small computer program needs to be written to work through the data and to apply appropriate tests. This is done by using what is referred to as a query language. A query language is not quite like a natural language, such as English or French, but has some of the features of a natural language mixed with some mathematical features. A very common query language is SQL, or structured query language. You will see an example of a query using SQL later.