3.3 Entities and attributes
A well-designed relational database overcomes the problems outlined in Section 3.2 by using two or more tables, rather than a single table, such as Table 1. This means that the data has to be divided in some way between the tables. The construction of tables is done according to several rules. I will just look briefly at one rule to give you an idea of the underlying principles.
The rule I shall look at states that there must be one table per entity. An entity is an item for which we want to store information. It can be tangible – for example a person or an object – or intangible – for example an event. (Typical events might be a sale, registration or renewal.) An entity can be a concept – for example a bank account.
In Table 1 the most obvious entity is 'student'. Any particular student, such as Colin Cherry, is an instance of this entity. Entities are the things we store data about in tables. Each piece of information we store is an attribute of that entity. An attribute is descriptive information about an entity. For the 'student' entity, attributes in Table 1 are name, address and postcode. In a properly constructed table there is one field per attribute.
You might argue that 'Course1', 'Course2' and 'Course3' are also attributes of 'student', and in some contexts that would be reasonable. In this case, though, there is descriptive information about courses, namely the number of sessions they run for. Courses therefore have attributes. In this context, then, 'course' is another entity. Whether something is regarded as an entity or not is dependent on the context.
Activity 7 (exploratory)
Does 'postcode' have any attributes in Table 1 (see screen 6)?
No. There are no attributes of 'postcode'. There is no descriptive information, or label or other qualifying information about postcodes.
I said above that in a properly constructed relational database we have one table per entity. Table 2 is the table for the student entity and Table 3 is the one for the course entity. I have added a few more fields to the tables to make them more realistic. For instance, Table 3 has a column for 'Fee' and a column for 'Course_code'.
Table 2 Student entity table
|1||Lodhi||Mona||22 The Grove, Newport||AB12 3CD|
|2||Jones||Bob||2 High Street, Stratford||AB6 4PQ|
|3||Cherry||ColinI||59 Acacia Avenue, Brompton||AB2 12ZY|
|4||Cherry||Colin||13 The Limes, Leighton||XY6 7LR|
|5||Edwards||Delia||40 Eldon Court, Hampton||XY12 4TK|
|6||Roberts||Albert||18 Mount Pleasant, Greenhill||AB12 7UB|
|7||Singh||Sara||7 Marina View, Sutton||ABM 8WQ|
|8||Chang||Patrick||21 Green Lane, Newport||AB12 9TU|
|9||Evans||Mary||13 The Limes, Leighton||AB6 7LR|
Table 3 Course entity table
|E08||IT for all||5||£10|
The next step is to link the records in these two tables, which I will do shortly. Linking the records in separate tables requires the use of keys. A key uniquely identifies each record of a table. Table 1 did not have any keys. ('Name' cannot be a key because, as Colin Cherry shows, names might not be unique.) In Table 2 I have introduced a new column named 'Student_index'. Each entry in this new column uniquely identifies a particular record. The 'Student_index' in fact consists of just the sequential numbering of the records. This is the simplest way to create a key when there is no other key readily available.
The ultimate intention is to link students to courses through the keys. By associating a particular student with a particular course we have a way of representing an enrolment. We need therefore to think about a suitable key for Table 3.
Activity 8 (self-assessment)
In Table 3 I have not numbered the records as a key, as I did in Table 2, because I would like to use the course code as a key. How valid is the course code as a key under the following circumstances?
Courses are presented only once, and never repeated.
Courses are run only once per year, and repeated the following year.
Some courses are run twice or more per year.
If the course is never repeated, then we can safely use the course code as a key, because the code identifies something unique.
If courses are repeated annually there is a problem with using the course code as a key because it no longer represents something unique. For instance, students enrolling for Yoga in successive years are enrolling for different instances of the course. Each instance of the course would need its own record in the database. However, if the database is made afresh each year, and just covers one year's worth of data, then the problem does not arise and the course code can be used as a key.
If courses are run twice or more per year, there is a problem with using the course code as a key for the same reasons as in (2). That is, each presentation of the course is a new instance (assuming the database has a lifetime of a year or more), and so the code does not represent something unique.
In the following I am going to assume that the course code can be used as a key in Table 3; that is, the conditions in (1) or (2) of Activity 8 apply, with the proviso in the case of (2) that the database holds only a year's worth of data.
Splitting Table 1 into Tables 2 and 3 has lost the relationship between the entities 'student' and 'course'. We can no longer tell who is enrolled for what. The essence of the relational approach to database design is to relate entities to each other by relating their keys. In this example we would do this by constructing a joining table to capture the relationship between entities. Table 4 is my joining table. Student keys are on the left, and the keys from the courses table are on the right. To keep things simple, I am using the course code as the key although, as you saw in Activity 8, this is acceptable only under certain conditions.
I have annotated the table with names so that you can identify the records; but the annotations are not part of the table.
|Joining table for students and courses|
You will notice in Table 4 that student keys can appear more than once on the left, and courses can appear more than once on the right. This follows from the fact that the relationship between the student entity and the course entity is many-to-many. That is to say, one student can enrol for many courses, and each course can be taken by many students.
The entity tables and joining table taken together constitute the relational database. A real example might have many entity tables and many joining tables. It might also have no joining tables – for reasons I will come back to.
Notice how the new relational database solves the problems I identified with Table 1. If someone enrols for several courses, we just add more records to Table 4, as required. Each record in Table 4, therefore, represents the enrolment of a student on a course. We do not need to have empty fields available to accommodate additional enrolments by a student, as we did in Table 1. Also, if there is an administrative change to a course, such as the number of sessions, we need make only a single change, in Table 3. (You will recall that in Table 1, the repetition of the same piece of data in several places meant that administrative changes would require hunting out every occurrence of the data and modifying it.)
The process of organising data efficiently into tables, so that unnecessary repetition is avoided and so that each table represents a single entity and its attributes, is known as normalisation. The process we have gone through with Tables 1, 2, 3 and 4 is a very simple example of part of the normalisation process.
it contains data about more than one entity;
it attempts to capture the relationship between entities.
It is therefore mixing different functions, and this is why it is unsatisfactory.
Activity 9: (self-assessment)
Table 5 is a flat database relating to some regular committee meetings. What are the entities and their associated attributes?
Table 5 Committee information
|Committee||Meeting time||Member||Telephone number|
|Planning||Every Monday a.m.||Jones||1239|
|Recreation||Every Friday p.m.||Jones||1239|
|Education||Final Wednesday of month, p.m.||Patel||4728|
One entity is 'Committee'. The associated attribute is 'Meeting time'.
Another entity is 'Member'. The associated attribute is 'Telephone number'.
I mentioned that some relational databases might have no joining tables. The following activity is designed to show how this might come about.
Activity 10: (exploratory)
What would happen to the joining table, Table 4, if this particular local authority allowed people to enrol for only one course at a time? Write the first few rows of a new joining table on the assumption that each student takes only their 'Course1' as shown in Table 1.
Table 6: Modified joining table: one course per student
The relationship between the keys in Table 6 could be captured by a simple amendment to the student table, which would remove the need for a joining table. Table 7 shows the first few rows of the modified student table.
Table 7: Modified student entity table
|1||Lodhi||Mona||22 The Grove, Newport||AB12 3CD||E01|
|2||Jones||Bob||2 High Street, Stratford||AB6 4PQ||E08|
|3||Cherry||Colin||59 Acacia Avenue, Brompton||AB2 12ZY||E02|
|4||Cherry||Colin||13 The Limes, Leighton||XY6 7LR||E05|
|5||Edwards||Delia||40 Eldon Court, Hampton||XY12 4TK||E02|
The additional key in Table 7 (the course code) is known as a foreign key. Using a foreign key has enabled us to dispense with the joining table, but only because the relationship between students and courses is no longer many-to-many. When the relationship between entities is many-to-many, a joining table is needed.
Activity 11 (self-assessment)
Suppose Table 5 is to be normalised into separate tables for the 'Committee' entity and the 'Member' entity.
Why is a joining table needed?
Construct the joining table, using Committee as one key and Member as the other.
A joining table is required because the relationship between 'Committee' and 'Member' is many-to-many. (A committee has several members, and each member can be on more than one committee.)
See Table 8.
Table 8 Joining table for Activity 11
This is as far as we shall go in constructing relational databases. The important point to appreciate from this brief introduction to the topic is that even a fairly simple example has called for a careful analysis of the nature of the information and how it is related. Many large-scale information systems succeed or fail on the quality of the analysis, and the quality of the design that follows from the analysis.
Some of the processes of database construction can be done automatically. For instance, if you construct a table like Table 1 in Microsoft Access, the program itself can make a reasonable attempt at creating entity tables and joining tables for you. However, for large-scale information systems, such as those involved in e-government, there is no substitute for a careful study of the nature of the information, its relationships, and its uses.