IS-A relations in Relational Databases

In database design, IS-A relations is probably the most ignored technique. Even when the classification of entities is obvious, many choose to create one table with as many fields as to cover all cases, which eventually ends up being a sparse, space-wasting table. Others, on the other hand, prefer to implement one independent table per final entity (subclass). Those tables are totally irrelevant between them, and you finally end up shifting the problem of relationship definition to a higher level in the application model.

Even worse, the debate for the preferable implementation often comes down to the dillema: “a sparse table or many independent ones?” thus completely eliminating the possibility of an IS-A relation. Especially when it comes to “small” projects, where the efficiency and maintainability overhead is ignorable, these two options seem automatic, as opposed to the “complexity” of implementing a more structured schema.

In order to better understand the implementation of IS-A relations, as well as to compare them with other techniques, let’s imagine an e-shop’s DB. Among other things, it stores information for several types of people: employees, customers, suppliers and other contacts. It is obvious that all these people have a fullname, and probably a phone number. Employees have a salary and belong to a department, suppliers are related to a company and customers have a credit amount. Aditinally, employers and customers have usernames and passwords, so as to be authenticated. Of course, in a real-life scenario, there would be many more fields defined for each of these entities, but let’s keep it simple.

What is not so obvious, but is certainly implied, is the fact that these entity sets often overlap. For example, there is no reason why an employee or a supplier shouldn’t also be a customer. Moreover, a number of people could be neither of those — just “other contacts”.

Now, using some sample data, let’s compare the three methods mentioned above: the sparse table, the independent tables and the IS-A approach.

1. A wide, sparse table

I will start with the sparse table, not only because it provides a good visual representation of the example, but also because it could be a fairly good point to start a normalization process. So here it is:

*people*
id	fullname	phone	salary	department	company	credit	username	password
3465	John Doe	555-12345	1345.67	R&D			jdoe	**************
9674	Jane Doess		980.55	Sales		630.43	jane	**************
5789	Michael Smith	555-98736			Widgets Co.	176.00	msmith	**************
1287	Samantha Blue	555-03456				354.99	sammyblue	**************
9824	Alice Red
7289	Ben Purple	555-98765			Acme Ltd.
1485	Vincent Maroon	555-37108

The reason why we call it a “sparse table” is very clear now. As the table grows, more and more space will be wasted. But that’s not really the biggest issue.

All data being in the same database object (a table), makes it almost impossible to apply the necessary constraints at the DB layer. For example, since the department column cannot be defined as “NOT NULL”, what if a person has a salary but the corresponding department cell is null? Or, how can we enforce uniqueness to the username column when it accepts nulls?

2. Many independent tables

We will implement one table for each entity: employee, customer, supplier and other contact. Each table contains as many fields as the attributes of each individual entity, plus a numeric primary key. Of course, some of them have attributes that uniquely define the tuple and are candidate primary keys, but let’s do it this way for uniformity between the methods.

*employee*
id	fullname	phone	salary	department	username	password
4589	John Doe	555-12345	1345.67	R&D	jdoe	**************
7856	Jane Doess		980.55	Sales	jane	**************

*customer*
id	fullname	phone	credit	username	password
1085	Jane Doess		630.43	doess	**************
3718	Michael Smith	555-98736	176.00	msmith	**************
3376	Samantha Blue	555-03456	354.99	sammyblue	**************

*supplier*
id	fullname	phone	company
9836	Michael Smith	555-98736	Widgets Co.
3376	Ben Purple	555-98765	Acme Ltd.

*other_contact*
id	fullname	phone
1234	Alice Red
4589	Vincent Maroon	555-37108

The inability to apply constraints is now partially eliminated. A username can now be unique, but only within the same table, not throughout the system. Also, we can guarantee that the salary as well as the department column will always be filled.

We have a space-wasting problem too, though, due to the overlapping of the entity sets. Jane and Michael appear in two tables each, carrying their fullname and phone with them. Moreover, because Jane is an employee and a customer, her username and password appear in both corresponding tables. This introduces the need for synchronization of data in many different places which can be a tricky task. Or, concerning the account info, the compromise that Jane would have to have a different account for each subsystem (as she does, in the example).

Finally, in order to retrieve a contiguous list of any of the common data, we would have to execute UNION queries. For example, to retrieve a global name/phone list, a UNION between all four tables is required.

3. The IS-A approach

To implement an IS-A relation, we must first define all the classes, from the root class all the way down to the final ones. It is obvious that the root class is the person with the fullname and phone attributes. The persons who can be authenticated will be called “users” with attributes username and password, who are further classified as employees(salary,department) and customers(credit). Apart from users, there are suppliers(company) and other contacts with no extra attributes. To sum it all up:

Let person(fullname,phone). A user(username,password) is a person. An employee(salary,department) is a user. A customer(credit) is a user. A supplier(company) is a person.

Each of the following tables only contains the fields that are specific to the corresponding (sub)class. The id field is a primary key and, at the same time, a foreign key to the immediate superclass (with the exception of person, which has no superclass). For example, the user.id is the user table’s primary key and a foreign key to the person.id column. This way, both a class’s connection to its superclass and the uniqueness of each instance in its class are guaranteed.

person
id	fullname	phone
3465	John Doe	555-12345
9674	Jane Doess
5789	Michael Smith	555-98736
1287	Samantha Blue	555-03456
9824	Alice Red
7289	Ben Purple	555-98765
1485	Vincent Maroon	555-37108

user
id	username	password
3465	jdoe	**************
9674	jane	**************
5789	msmith	**************
1287	sammyblue	**************

*customer*
id	credit
9674	630.43
5789	176.00
1287	354.99

*employee*
id	salary	department
3465	1345.67	R&D
9674	980.55	Sales

supplier
id	company
5789	Widgets Co.
7289	Acme Ltd.

Note that there is no need to implement a table for “other contacts”, since they have no extra attributes.

The waste of space is now zero. We can apply any constraints necessary. Every instance can belong to any of the classes, but only once in each, provided that there is a record of it in the immediate superclass. We can easily extract cumulative data for any level in the hierarchy. No synchronization is necessary and each user has a single account throughout the system.

Conclusions

IS-A relations align with the concept of a self-sustained DB, regardless of the applications that use it. They help in distinguishing the subsystems and in system maintainability, because if a further expansion of the hierarchy is required, higher levels are not affected. Moreover, if a class structure needs to be changed, only the corresponding table is affected. All this, using the existing RDBMS functionality.

1. A wide, sparse table

2. Many independent tables

3. The IS-A approach

Conclusions

Leave a Reply Cancel reply