After working with many customers over the past several years, I believe that today’s data warehouses are going through a similar journey of maturity. In other words, data warehouses are still developing and consequently still fall short of expectations.
Think about the demands and expectations now coming from our users. The primary purposes of a data warehouse are no longer just long-term data storage and archiving. The data warehouse must also support operational, transactional, and analytical needs, with results returned in real time.
For data warehouses to continue maturing, we as the data guardians must collectively establish three key expectations: eliminate data redundancy, unify the enterprise data model, and ultimately consume from a single version of the truth. Let’s look at these three requirements in more detail.
Eliminate data redundancy
Most data flows in a data warehouse are built with the best of intentions and the utmost concern for efficiency and best practice. Yet the final outcome often disappoints. Does this scenario sound familiar? A data warehouse developer receives the initial requirements for a new data mart: load the data, code the business logic, and make the data available for reporting. The request is straightforward and low risk. After a few weeks of development, the project completes and the customer is very happy.
The trouble starts with the follow-up request. “Can you quickly add the XYZ dataset?” the end user asks. The developer soon discovers that the deliverable requires changing the data flow, updating previous logic, and returning more fields in the result. When the change is finally ready, performance suffers under the new functionality. The developer must then add some combination of indexes, aggregates, and materialization techniques to deliver a favorable end-user experience. Every time this cycle repeats, the data warehouse bloats further in size, complexity, and maintenance cost.
With the speed and power of SAP HANA in-memory technology, the scenario above could play out with an alternative ending. Because accessing data in memory is significantly faster than accessing data on disk, duplicating data to improve performance is no longer necessary. The customer table now exists only once. The sales order table now exists only once. Every data element has to exist only once. This simplification of the data model and its related data flows leads to several key benefits. To name a few: overnight batch jobs become fewer and shorter, development lead times shrink, and the hardware stack simplifies.
Unify the enterprise data model
I often tell customers to consider their Enterprise Data Model as a collection of logical building blocks (views) which combine to form new definitions of logic. The bottom layer of views should be the most generic in definition and the top layer should be the most specific. As an example, the lowest level might contain a view for customers and another view for sales orders. These two views can combine to enrich the sales order view with customer data. The pattern continues until every relevant object in the enterprise is represented by one or more views, all based on a single copy of the data.
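To make the layering concrete, here is a minimal sketch of the building-block idea. It rests on several assumptions: SQLite (through Python's built-in sqlite3 module) stands in for the database, the views are plain SQL views rather than SAP HANA calculation views, and every table, view, and column name is hypothetical.

```python
import sqlite3

# SQLite is only a stand-in here; these are plain SQL views, not SAP HANA
# calculation views. All table, view, and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    INSERT INTO customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'APJ');
    INSERT INTO sales_order VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

    -- Bottom layer: generic views, one per business object.
    CREATE VIEW v_customer AS
        SELECT customer_id, name, region FROM customer;
    CREATE VIEW v_sales_order AS
        SELECT order_id, customer_id, amount FROM sales_order;

    -- Middle layer: sales orders enriched with customer data.
    CREATE VIEW v_sales_order_enriched AS
        SELECT o.order_id, o.amount, c.name AS customer_name, c.region
        FROM v_sales_order AS o
        JOIN v_customer AS c ON c.customer_id = o.customer_id;

    -- Top layer: a more specific definition built on the layer below.
    CREATE VIEW v_revenue_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM v_sales_order_enriched
        GROUP BY region;
""")

# Each row is a (region, revenue) pair, so the cursor converts directly to a dict.
revenue_by_region = dict(
    conn.execute("SELECT region, revenue FROM v_revenue_by_region ORDER BY region")
)
print(revenue_by_region)
```

Only the two base tables hold data. Every view above them is a definition, so a new sales order inserted into sales_order shows up in all three layers without any reload.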
With SAP HANA, calculation views offer distinct advantages for use in the view layer and are the best choice to build the definitions of your enterprise data warehouse model. First, the developer can choose to implement logic with a graphical interface as well as with scripted code. One example of scripting is the use of custom functions to return scalar or tabular results. Second, a calculation view can be queried just like a table. Just as one can “SELECT FROM TABLE”, one can also “SELECT FROM CALCULATION_VIEW.” This open access means that any person, process, or system that can submit a SELECT statement to SAP HANA can directly access the logic in a calculation view. Finally, calculation views can utilize data sitting in SAP HANA as well as data in other systems. A landscape design may have sales orders in two different systems with two different metadata definitions. Using calculation views, the two datasets can be virtually harmonized and share the same “look and feel” as if the data were from one database. To the end user, the data appears to come from a single source.
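The harmonization idea can be sketched in a few lines. This is an illustration under stated assumptions, not the SAP HANA implementation: SQLite stands in for the database, both "systems" are simply two tables in the same in-memory database, the harmonized object is a plain SQL view rather than a calculation view, and all names (sys_a_orders, sys_b_sales, v_sales_order, fetch_sales_orders) are hypothetical.

```python
import sqlite3

# Assumption: two tables in one SQLite database stand in for two source systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for system A: amounts in whole currency units.
    CREATE TABLE sys_a_orders (order_no TEXT, cust TEXT, net_amount REAL, order_date TEXT);
    -- Stand-in for system B: different column names, amounts stored in cents.
    CREATE TABLE sys_b_sales (doc_id TEXT, customer_name TEXT, amount_cents INTEGER, created_on TEXT);

    INSERT INTO sys_a_orders VALUES ('A-1', 'Acme', 250.0, '2024-01-15');
    INSERT INTO sys_b_sales VALUES ('B-7', 'Globex', 7500, '2024-01-16');

    -- One harmonized definition over both sources; no data is copied.
    CREATE VIEW v_sales_order AS
        SELECT order_no AS order_id, cust AS customer_name,
               net_amount AS amount, order_date, 'A' AS source_system
        FROM sys_a_orders
        UNION ALL
        SELECT doc_id, customer_name,
               amount_cents / 100.0, created_on, 'B'
        FROM sys_b_sales;
""")


def fetch_sales_orders(connection):
    """Return the harmonized sales orders as a list of dicts, ordered by order_id."""
    cursor = connection.execute(
        "SELECT order_id, customer_name, amount, order_date, source_system "
        "FROM v_sales_order ORDER BY order_id"
    )
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


orders = fetch_sales_orders(conn)
for order in orders:
    print(order)
```

The consumer sees one set of column names and one unit for amounts. The renaming and the cents-to-currency conversion live in the view definition, so neither source table has to change.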
Consider this thought:
There is inherent simplicity, agility, and opportunity for innovation in an Enterprise Data Model lacking duplication or ambiguity.
Consume from a single version of the truth
Have you ever attended a meeting where two colleagues presented conflicting results using “the same dataset”? Have you ever found yourself in another country, unable to order food or to purchase items from a shop? Did your new dog not sit when you gave the command? The common thread in these three scenarios is the lack of a common understanding. The meaning of words, phrases, and facts for one person (or dog) might carry a slightly or significantly different meaning for another. Unity in knowledge and understanding must exist before progress can occur. Within a data warehouse context, I refer to this unity as a single version of the truth.
Every person, process, or system should be able to produce the same data results, every time. Factors like device, client, or interface should no longer influence the results. If there is a view called sales orders, then any person, process, or system accessing the data warehouse should always get the same sales order data. The great thing here is that the object definitions via views are no longer single purpose. The same view for sales order analytics now sources outbound interfaces, custom application data sources, or even advanced analytic engines like predictive, text, spatial, or graph. Thus, the definition of a single version of the truth: one object with one business definition serving all use cases.
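As a small illustration of one object serving several use cases, the sketch below lets a report and an outbound interface read from the same view. The assumptions: SQLite stands in for the database, the business rule (cancelled orders are excluded) is invented for the example, and the names v_sales_order, report_total, and export_csv are hypothetical.

```python
import csv
import io
import sqlite3

# Assumption: SQLite stand-in; the "exclude cancelled orders" rule is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    INSERT INTO sales_order VALUES (1, 'OPEN', 100.0), (2, 'CANCELLED', 40.0), (3, 'SHIPPED', 60.0);

    -- The single business definition of a sales order.
    CREATE VIEW v_sales_order AS
        SELECT order_id, amount FROM sales_order WHERE status <> 'CANCELLED';
""")


def report_total(connection):
    """Analytics consumer: total sales order amount."""
    return connection.execute("SELECT SUM(amount) FROM v_sales_order").fetchone()[0]


def export_csv(connection):
    """Outbound-interface consumer: the same orders serialized as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["order_id", "amount"])
    writer.writerows(
        connection.execute("SELECT order_id, amount FROM v_sales_order ORDER BY order_id")
    )
    return buffer.getvalue()


print(report_total(conn))
print(export_csv(conn))
```

Because neither consumer repeats the filter, the report and the export cannot drift apart. If the business definition changes, it changes in the view and both consumers pick it up.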
I would also like to point out that a big benefit of establishing a single version of the truth is the formation of the Innovation Cycle. The cycle begins with the consumption of data from the enterprise data model and a single copy of the data. As the business validates the data, trust grows, and with it the willingness to use the data. This trust often leads to the search for more insight. Why did an event occur? What chance exists that the event will occur again in the future? Insight leads to action as well as positive business outcomes. Positive outcomes lead to requests for new additions and/or enhancements to the data warehouse. The cycle then repeats itself, further enhancing the organization’s single version of the truth.
The necessity for a real-time enterprise data warehouse grows more critical every day. Does your current data warehouse strategy eliminate data redundancy, establish an enterprise data model, and build out a single version of the truth? If not, I highly recommend reaching out to your SAP account executive as well as reviewing the links below to learn more about SAP HANA. It may be time to expect better from your data warehouse.
SAP HANA SQL Data Warehousing: https://www.sap.com/community/topics/sql-data-warehousing.html
SAP HANA: https://www.sap.com/products/hana.html
Start today building your data warehouse in the cloud: https://cloudplatform.sap.com/hana