In database (DB) design, IS-A relations is probably the most ignored technique. Even when the classification of entities is obvious, many choose to create one table with as many fields as to cover all cases, which eventually ends up being a sparse, space-wasting table. Others, on the other hand, prefer to implement one independent table per final entity (subclass). Those tables are totally irrelevant between them, and you finally end up shifting the problem of relationship definition to a higher level in the application model.
Even worse, the debate for the preferable implementation often comes down to the dillema: “a sparse table or many independent ones?” thus completely eliminating the possibility of an IS-A relation. Especially when it comes to “small” projects, where the efficiency and maintainability overhead is ignorable, these two options seem automatic, as opposed to the “complexity” of implementing a more structured schema.
In order to better understand the implementation of IS-A relations, as well as to compare them with other techniques, let’s imagine an e-shop’s DB. Among other things, it stores information for several types of people: employees, customers, suppliers and other contacts. It is obvious that all these people have a fullname, and probably a phone number. Employees have a salary and belong to a department, suppliers are related to a company and customers have a credit amount. Aditinally, employers and customers have usernames and passwords, so as to be authenticated. Of course, in a real-life scenario, there would be many more fields defined for each of these entities, but let’s keep it simple.
What is not so obvious, but is certainly implied, is the fact that these entity sets often overlap. For example, there is no reason why an employee or a supplier shouldn’t also be a customer. Moreover, a number of people could be neither of those — just “other contacts”.
Now, using some sample data, let’s compare the three methods mentioned above: the sparse table, the independent tables and the IS-A approach.
1. A wide, sparse table
I will start with the sparse table, not only because it provides a good visual representation of the example, but also because it could be a fairly good point to start a normalization process. So here it is:
|5789||Michael Smith||555-98736||Widgets Co.||176.00||msmith||**************|
|7289||Ben Purple||555-98765||Acme Ltd.|
The reason why we call it a “sparse table” is very clear now. As the table grows, more and more space will be wasted. But that’s not really the biggest issue.
All data being in the same database object (a table), makes it almost impossible to apply the necessary constraints at the DB layer. For example, since the department column cannot be defined as “NOT NULL”, what if a person has a salary but the corresponding department cell is null? Or, how can we enforce uniqueness to the username column when it accepts nulls?
2. Many independent tables
We will implement one table for each entity: employee, customer, supplier and other contact. Each table contains as many fields as the attributes of each individual entity, plus a numeric primary key. Of course, some of them have attributes that uniquely define the tuple and are candidate primary keys, but let’s do it this way for uniformity between the methods.
|9836||Michael Smith||555-98736||Widgets Co.|
|3376||Ben Purple||555-98765||Acme Ltd.|
The inability to apply constraints is now partially eliminated. A username can now be unique, but only within the same table, not throughout the system. Also, we can guarantee that the salary as well as the department column will always be filled.
We have a space-wasting problem too, though, due to the overlapping of the entity sets. Jane and Michael appear in two tables each, carrying their fullname and phone with them. Moreover, because Jane is an employee and a customer, her username and password appear in both corresponding tables. This introduces the need for synchronization of data in many different places which can be a tricky task. Or, concerning the account info, the compromise that Jane would have to have a different account for each subsystem (as she does, in the example).
Finally, in order to retrieve a contiguous list of any of the common data, we would have to execute UNION queries. For example, to retrieve a global name/phone list, a UNION between all four tables is required.
3. The IS-A approach
To implement an IS-A relation, we must first define all the classes, from the root class all the way down to the final ones. It is obvious that the root class is the person with the fullname and phone attributes. The persons who can be authenticated will be called “users” with attributes username and password, who are further classified as employees(salary,department) and customers(credit). Apart from users, there are suppliers(company) and other contacts with no extra attributes. To sum it all up:
Let person(fullname,phone). A user(username,password) is a person. An employee(salary,department) is a user. A customer(credit) is a user. A supplier(company) is a person.
Each of the following tables only contains the fields that are specific to the corresponding (sub)class. The id field is a primary key and, at the same time, a foreign key to the immediate superclass (with the exception of person, which has no superclass). For example, the user.id is the user table’s primary key and a foreign key to the person.id column. This way, both a class’s connection to its superclass and the uniqueness of each instance in its class are guaranteed.
Note that there is no need to implement a table for “other contacts”, since they have no extra attributes.
The waste of space is now zero. We can apply any constraints necessary. Every instance can belong to any of the classes, but only once in each, provided that there is a record of it in the immediate superclass. We can easily extract cumulative data for any level in the hierarchy. No synchronization is necessary and each user has a single account throughout the system.
IS-A relations align with the concept of a self-sustained DB, regardless of the applications that use it. They help in distinguishing the subsystems and in system maintainability, because if a further expansion of the hierarchy is required, higher levels are not affected. Moreover, if a class structure needs to be changed, only the corresponding table is affected. All this, using the existing RDBMS functionality.