page updated on November 02, 2015

Denormalize Data at Business Boundaries

If you're writing software for any sort of business, you spend a lot of time thinking about how to manage data. Where should it be stored? What's the most effective model? What kind of logging, auditing, and backing up is necessary? Who controls the data? How is it kept up to date?

The answers to all of these questions depend on how the data is used. Who's it for? What do they want? What do they need? How do they think about the data? A marketing company concerned with the daily clicks of hundreds of millions of users can afford to lose a few clicks here and there as long as the firehose of data is still accurate, while an investment company making money off of every trade needs to keep every piece of data as safe as possible.

It's easy to believe that a Social Network for Marmot Breeders handles its data differently from a Futures Market for Celebrities Writing Embarrassing Tweets. Yet even within both of these businesses, that data is captured, explored, analyzed, retained, and modified in multiple—sometimes incompatible—ways, and that's important to remember.

Data Design by "Don't Repeat Yourself"

Imagine a company focused on selling a product to large, institutional customers. The sales cycle is not short; you don't push a button to order dog treats and have them delivered straight to your door. It can take weeks or months for the salespeople to take a customer from a lead to a Closed Won Opportunity. Along the way, there's a lot of data to track: contact information, potential contract details, available contract details, shipping and billing information, and more.

Through the entire commercial relationship with the customer, much of this data has to flow through other systems. If there's a manufacturing or configuration process, those systems need to know specific customer information and specific order details. Undoubtedly the billing and payments and financial systems need to know financial data about individual orders—and if customers are set up to continue to order new products and services after the initial sales process, there's likely an inside sales group or online ordering system which can create new orders without resetting everything back to the lead tracking state.

... and, of course, all of these salespeople must be paid commissions.

If you were to design a data model for this company, you might be tempted to represent a customer as a single set of data relevant to all of the unique business units within your company. A lead must eventually produce the data useful for billing within your AR group if you're to make any money from the sale, of course.

Yet does it make sense to find a singular representation for this data across the four or five unique business contexts?

Business Contexts are Data Boundaries

Think about the ideal representation of data in these business contexts from the point of view of the business process dominating that context. For example, the sales process thinks in terms of opportunities. What's the state of the opportunity? Is it a mere lead? Is it a cold lead? Is it a qualified lead? Are you negotiating broad terms or figuring out contract specifics? Are you going into a Closed Won opportunity to update the point of contact for contract terms?

The transition of opportunities between all of those states dominates the sales process, and it probably should dominate the data modeling within that context. Opportunities mean very little, however, to a manufacturing, purchasing, or Accounts Receivable process. Accounts Receivable needs to understand contract terms, of course, but it ultimately concerns itself with producing an invoice sent to the customer. Some of the data that makes sense at the opportunity level (point of contact, preferred currency, contract details) makes sense here as well, but the state transitions which are so important to CRM software are meaningless here; everything dealt with at this point has already reached the Closed Won state, so you can forget that the state even exists.
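To make the sales side concrete, here is a minimal sketch of an opportunity modeled around its state transitions. The stage names, the transition table, and the `Opportunity` fields are all assumptions made up for illustration; a real sales process would define its own.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    LEAD = "lead"
    QUALIFIED = "qualified"
    NEGOTIATING = "negotiating"
    CLOSED_WON = "closed_won"
    CLOSED_LOST = "closed_lost"


# Hypothetical transition table: which stages may follow which.
ALLOWED = {
    Stage.LEAD: {Stage.QUALIFIED, Stage.CLOSED_LOST},
    Stage.QUALIFIED: {Stage.NEGOTIATING, Stage.CLOSED_LOST},
    Stage.NEGOTIATING: {Stage.CLOSED_WON, Stage.CLOSED_LOST},
    Stage.CLOSED_WON: set(),
    Stage.CLOSED_LOST: set(),
}


@dataclass
class Opportunity:
    customer_name: str
    point_of_contact: str
    stage: Stage = Stage.LEAD

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage, refusing transitions the table does not allow."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.name} to {new_stage.name}"
            )
        self.stage = new_stage


opp = Opportunity("Acme Institutional", "Pat")
opp.advance(Stage.QUALIFIED)
opp.advance(Stage.NEGOTIATING)
opp.advance(Stage.CLOSED_WON)
```

Nothing in this model is about invoices, and that is the point: the stage machinery belongs to the sales context alone.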

Perhaps you can come up with an ideal data representation which takes into account all of these various data models, but then you're tying all of your apps and services into a single data monolith—and you're likely to find that the flavor of the first system you build will forever haunt every other system you provide, such that everyone else will have to work around the assumptions you've made.

If, on the other hand, you accept that a singular business purpose defines each operational context (tracking opportunities, providing a ledger of financial data for planning and analysis, calculating accurate sales commissions), you will find multiple distinct representations of subsets of this data.
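As a rough sketch of what "multiple distinct representations" could look like, the following derives an Accounts Receivable record and a commission record from the same closed deal. Every class name, field, and the flat commission rate here is hypothetical, chosen only to show each context keeping the subset of data it cares about.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ClosedDeal:
    """What the sales context knows once an opportunity is Closed Won."""
    customer_name: str
    point_of_contact: str
    preferred_currency: str
    contract_total: Decimal
    salesperson: str


@dataclass(frozen=True)
class BillingAccount:
    """Accounts Receivable view: who to invoice and for how much."""
    customer_name: str
    point_of_contact: str
    currency: str
    amount_due: Decimal


@dataclass(frozen=True)
class CommissionEntry:
    """Commission view: who sold what. No contact or invoicing details."""
    salesperson: str
    amount: Decimal


def to_billing(deal: ClosedDeal) -> BillingAccount:
    # Copy only the fields billing needs; the salesperson is dropped.
    return BillingAccount(
        customer_name=deal.customer_name,
        point_of_contact=deal.point_of_contact,
        currency=deal.preferred_currency,
        amount_due=deal.contract_total,
    )


def to_commission(deal: ClosedDeal, rate: Decimal) -> CommissionEntry:
    # A flat rate is an assumption; real commission plans are more involved.
    return CommissionEntry(deal.salesperson, deal.contract_total * rate)


deal = ClosedDeal("Acme Institutional", "Pat", "USD", Decimal("100000"), "Sam")
account = to_billing(deal)
entry = to_commission(deal, Decimal("0.05"))
```

The same customer name appears in two places, which is the denormalization the title refers to. Each context can now change its own shape without negotiating with the others.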

Certainly this introduces another technical issue, that of synchronizing data between these operational contexts, but this reflects what already happens between the humans using your systems. Understanding the non-software processes which allow your salespeople to call up your manufacturing leads and tell them to fire up the assembly lines will help you determine how and when to transfer exactly which data between systems.
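One way to picture that transfer is as an explicit handoff at the moment a deal closes, the software equivalent of the salesperson's phone call. The sketch below is a toy, in-process stand-in: the `Handoff` class, the `"closed_won"` event name, and the payload fields are all invented for illustration, and a real system might use a message queue or a scheduled export instead.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Handoff:
    """Toy stand-in for whatever carries data between contexts."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        # Each subscriber receives the payload and keeps what it needs.
        for handler in self._handlers[event]:
            handler(payload)


manufacturing_queue = []
invoices_to_send = []


def start_production(deal: dict) -> None:
    # Manufacturing cares about what to build, not what it costs.
    manufacturing_queue.append(
        {"customer": deal["customer"], "order_lines": deal["order_lines"]}
    )


def open_invoice(deal: dict) -> None:
    # Accounts Receivable cares about who pays and how much.
    invoices_to_send.append(
        {
            "customer": deal["customer"],
            "currency": deal["currency"],
            "total": deal["total"],
        }
    )


handoff = Handoff()
handoff.subscribe("closed_won", start_production)
handoff.subscribe("closed_won", open_invoice)

handoff.publish(
    "closed_won",
    {
        "customer": "Acme Institutional",
        "currency": "USD",
        "total": "100000.00",
        "order_lines": ["widget x 40"],
    },
)
```

Each handler copies its own subset at the boundary, so neither downstream context ever sees an opportunity or its stages.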

(Understanding how to update data in multiple contexts when it changes—a customer management representative hears that her point of contact at a customer has changed, for example—is another process your business is already solving manually, but automating that process is the subject of another article.)