IS-A relations in Relational Databases

Share on Google+0Share on Facebook0Tweet about this on Twitter0Email this to someone

In database (DB) design, IS-A relations is probably the most ignored technique. Even when the classification of entities is obvious, many choose to create one table with as many fields as to cover all cases, which eventually ends up being a sparse, space-wasting table. Others, on the other hand, prefer to implement one independent table per final entity (subclass). Those tables are totally irrelevant between them, and you finally end up shifting the problem of relationship definition to a higher level in the application model.

Even worse, the debate for the preferable implementation often comes down to the dillema: “a sparse table or many independent ones?” thus completely eliminating the possibility of an IS-A relation. Especially when it comes to “small” projects, where the efficiency and maintainability overhead is ignorable, these two options seem automatic, as opposed to the “complexity” of implementing a more structured schema.

In order to better understand the implementation of IS-A relations, as well as to compare them with other techniques, let’s imagine an e-shop’s DB. Among other things, it stores information for several types of people: employees, customers, suppliers and other contacts. It is obvious that all these people have a fullname, and probably a phone number. Employees have a salary and belong to a department, suppliers are related to a company and customers have a credit amount. Aditinally, employers and customers have usernames and passwords, so as to be authenticated. Of course, in a real-life scenario, there would be many more fields defined for each of these entities, but let’s keep it simple.

What is not so obvious, but is certainly implied, is the fact that these entity sets often overlap. For example, there is no reason why an employee or a supplier shouldn’t also be a customer. Moreover, a number of people could be neither of those — just “other contacts”.

Now, using some sample data, let’s compare the three methods mentioned above: the sparse table, the independent tables and the IS-A approach.

1. A wide, sparse table

I will start with the sparse table, not only because it provides a good visual representation of the example, but also because it could be a fairly good point to start a normalization process. So here it is:

people
id fullname phone salary department company credit username password
3465 John Doe 555-12345 1345.67 R&D jdoe **************
9674 Jane Doess 980.55 Sales 630.43 jane **************
5789 Michael Smith 555-98736 Widgets Co. 176.00 msmith **************
1287 Samantha Blue 555-03456 354.99 sammyblue **************
9824 Alice Red
7289 Ben Purple 555-98765 Acme Ltd.
1485 Vincent Maroon 555-37108

The reason why we call it a “sparse table” is very clear now. As the table grows, more and more space will be wasted. But that’s not really the biggest issue.

All data being in the same database object (a table), makes it almost impossible to apply the necessary constraints at the DB layer. For example, since the department column cannot be defined as “NOT NULL”, what if a person has a salary but the corresponding department cell is null? Or, how can we enforce uniqueness to the username column when it accepts nulls?

2. Many independent tables

We will implement one table for each entity: employee, customer, supplier and other contact. Each table contains as many fields as the attributes of each individual entity, plus a numeric primary key. Of course, some of them have attributes that uniquely define the tuple and are candidate primary keys, but let’s do it this way for uniformity between the methods.

employee
id fullname phone salary department username password
4589 John Doe 555-12345 1345.67 R&D jdoe **************
7856 Jane Doess 980.55 Sales jane **************
customer
id fullname phone credit username password
1085 Jane Doess 630.43 doess **************
3718 Michael Smith 555-98736 176.00 msmith **************
3376 Samantha Blue 555-03456 354.99 sammyblue **************
supplier
id fullname phone company
9836 Michael Smith 555-98736 Widgets Co.
3376 Ben Purple 555-98765 Acme Ltd.
other_contact
id fullname phone
1234 Alice Red
4589 Vincent Maroon 555-37108

The inability to apply constraints is now partially eliminated. A username can now be unique, but only within the same table, not throughout the system. Also, we can guarantee that the salary as well as the department column will always be filled.

We have a space-wasting problem too, though, due to the overlapping of the entity sets. Jane and Michael appear in two tables each, carrying their fullname and phone with them. Moreover, because Jane is an employee and a customer, her username and password appear in both corresponding tables. This introduces the need for synchronization of data in many different places which can be a tricky task. Or, concerning the account info, the compromise that Jane would have to have a different account for each subsystem (as she does, in the example).

Finally, in order to retrieve a contiguous list of any of the common data, we would have to execute UNION queries. For example, to retrieve a global name/phone list, a UNION between all four tables is required.

3. The IS-A approach

To implement an IS-A relation, we must first define all the classes, from the root class all the way down to the final ones. It is obvious that the root class is the person with the fullname and phone attributes. The persons who can be authenticated will be called “users” with attributes username and password, who are further classified as employees(salary,department) and customers(credit). Apart from users, there are suppliers(company) and other contacts with no extra attributes. To sum it all up:

Let person(fullname,phone). A user(username,password) is a person. An employee(salary,department) is a user. A customer(credit) is a user. A supplier(company) is a person.

Each of the following tables only contains the fields that are specific to the corresponding (sub)class. The id field is a primary key and, at the same time, a foreign key to the immediate superclass (with the exception of person, which has no superclass). For example, the user.id is the user table’s primary key and a foreign key to the person.id column. This way, both a class’s connection to its superclass and the uniqueness of each instance in its class are guaranteed.

person
id fullname phone
3465 John Doe 555-12345
9674 Jane Doess
5789 Michael Smith 555-98736
1287 Samantha Blue 555-03456
9824 Alice Red
7289 Ben Purple 555-98765
1485 Vincent Maroon 555-37108
user
id username password
3465 jdoe **************
9674 jane **************
5789 msmith **************
1287 sammyblue **************
customer
id credit
9674 630.43
5789 176.00
1287 354.99
employee
id salary department
3465 1345.67 R&D
9674 980.55 Sales
supplier
id company
5789 Widgets Co.
7289 Acme Ltd.

Note that there is no need to implement a table for “other contacts”, since they have no extra attributes.

The waste of space is now zero. We can apply any constraints necessary. Every instance can belong to any of the classes, but only once in each, provided that there is a record of it in the immediate superclass. We can easily extract cumulative data for any level in the hierarchy. No synchronization is necessary and each user has a single account throughout the system.

Conclusions

IS-A relations align with the concept of a self-sustained DB, regardless of the applications that use it. They help in distinguishing the subsystems and in system maintainability, because if a further expansion of the hierarchy is required, higher levels are not affected. Moreover, if a class structure needs to be changed, only the corresponding table is affected. All this, using the existing RDBMS functionality.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
Share on Google+0Share on Facebook0Tweet about this on Twitter0Email this to someone

Submit comment

Allowed HTML tags: <a href="http://google.com">google</a> <strong>bold</strong> <em>emphasized</em> <code>code</code> <blockquote>
quote
</blockquote>