On a fundamental level, learning a software application involves understanding where its data resides.
You most likely don’t know where the server that holds your database resides – it could be in a closet down the hall, in your data center, or somewhere in the cloud. But one thing that is available for you to learn is the logical model of the database. That is a map of its tables and how they relate to each other. In the synonym rich world of IT, tables are called “entities,” so this map is called an entity-relationship (or ER) diagram.
Why Should Business Users Care About Data Modeling?
If you are a business user, you may have never seen an ER diagram. That’s okay, in fact, it’s probably more information that you need. You can learn a lot just by getting a brief overview of your application’s database design from a developer or data architect. It’s worth asking them for a short walkthrough. When you understand where your data is stored on database tables, you then you can start to understand:
- what actual information the database holds
- which are the most important fields, the ones you are always going to hear about
- what information the database doesn’t have, which will require connecting to, or programmatically retrieving it from, another database
Knowing this basic information can help you as a business user clarify and engage in more technical discussions. It can help you formulate what information you need to report on, and suggest questions to ask of your data.
In fact, answering these questions first is pretty much what a developer who is new to a database does to become familiar with it. You start by identifying tables, the data fields they contain, and their relationships with each other.
Here’s one application, that ancestor of SaaS applications, Salesforce. It’s a pretty good example for us to use, because it’s familiar to many business people, and because it’s easy to find ER diagrams and record/field information online. Let’s see some examples.
Here’s an example of a menu bar in Salesforce:
It displays some of the most commonly used entities in the application: Accounts, Opportunities, Contacts. Each of these translate into a table on the database: the Accounts table, the Opportunities table and the Contacts table.
And here they are, on an ER diagram for Salesforce:
The type of connecting lines from the Accounts record tells us that an Account can have many Contacts and/or Opportunities connected to it (or none at all.) Seems super simple, right? Only it’s not. For one, there are lots of other entities that interact with Accounts, Opportunities and Contacts – they’re just not shown on this diagram.
Also not shown on this particular diagram, but still important, is what data is actually contained in these tables. For example, in Salesforce an Opportunity record represents a possible sale. In itself, an Opportunity is not a sale. Unfortunately (at least, for a salesperson) there can be many stages before an opportunity reaches the desired stage of “Closed – Won.” Perhaps a person fills out a website form, then talks to a sales rep the next day, then attends a demo the following Tuesday.
Salesforce lets us track this kind of activity by assigning stages to the opportunity. When you need to add additional fields to an Excel spreadsheet, you can just add some more columns at the end. But we can’t just keep appending information about these stages on the Opportunity record. A database enforces rules to ensure its data remains consistent. One of those rules (it’s called First Normal Form) is that you can’t have repeating groups of data on a record. So if you have lots of the same kind of data belonging to an entity, like different events that happen on the journey to become a customer, you have to break them out into a separate table.
That’s exactly what Salesforce does, by creating an Opportunity History table that contains a record for each time there is a change to an Opportunity’s stage. Each of these belongs to a single Opportunity record, which can be identified by the Opportunity ID field, the primary key on the Opportunity record.
Here’s another example: if you have a Customer record, can that customer have more than one mailing address? Perhaps a customer will only ever have a billing address. If so, that information is perhaps best held on the Customer record. But what if (as you may have noticed on your Amazon account) a customer can have dozens of mailing addresses? If this is the case, you will need a dedicated table to hold those addresses.
When we model data, we make sure that the layout of our database conforms to both our business rules and database’s own rules. For example, an Account in Salesforce can have a Parent Account. Perhaps the Account is itself a small division of a larger organization in the system. A Parent Account can be connected to many other Accounts in SF, but an Account can never have more than one Parent. In addition to describing tables and keys in a database, a database schema also contains rules that prevent scenarios like this from occurring.
Here’s another database rule: to prevent redundancy, database fields should not exist as copies on multiple tables. For example, you should hold the Account billing address in one place (in Salesforce, on the Account record.) If you store the billing address on some other record, it is possible that a change to the address on one table will not get applied to versions that exist elsewhere. Rules like this prevent incorrect data from being written to the database. They maintain the integrity of the database as well, by preventing deletion of critical fields like primary keys.
When you are developing a new database system, there is a lot of thought that has to go into it. What are all the possible variations that could occur in the data? How should we group related data fields into tables? Your team has to spend some time thinking about what characteristics (“attributes”) a table (“entity”) has, and how these relate to other tables. Fortunately for data architects, considering all the possible permutations and complications of the data is an interesting intellectual exercise.
Of course, if you’re a business user, you have work to do. There’s payroll to run, or statements to print or quarterly reports to prepare. It may seem like understanding the intricacies of a database is not the best use of your time. But if you are looking to advance your career by becoming a subject matter expert, or if you have just joined an organization and feel lost when confronted with your ERP application, it’s worth a few hours of your time. Ask a data architect, business analyst or database administrator familiar with your application to give you a basic introduction your database. It will provide you with some context to help when you need to build reports, or when new data sources are added, or even on those rare occasions when a production problem occurs.