Posted by: Marc Breissinger
In How Data Abstraction Works Part 1, I outlined the challenges organizations face today as they deal with the diversity of cloud, big data, data warehouse, enterprise, and external data sources. In it I made a case for data abstraction in general, and argued that data virtualization in particular is a superior way to implement it.
In this blog, I will call upon the work of one of the best architects I know, Mike Tinius, to describe a reference architecture that Mike developed and that you can use as a guide when abstracting data using Composite’s data virtualization platform or others.
Data Abstraction Reference Architecture
The diagram above captures the various layers in this reference architecture, including:
- Data Consumers – Client applications need to retrieve data in formats and protocols they understand. Composite delivers data to consumers using the most popular standards, such as SOAP, REST, and JDBC. (A sketch of a hypothetical REST endpoint follows this list.)
- Application Layer – The “Application Layer” maps the Business Layer into the format each consuming application expects. This might mean formatting data as XML for Web services, or creating views with alias names that match the way consumers are used to seeing their data.
- Business Layer – The “Business Layer” is predicated on the idea that the business has a standard, or canonical, way of describing key business entities such as customers and products. In the financial industry, for example, one often accesses information by financial instrument and issuer, among many other entities. Typically, a data modeler works with business experts and data providers to define a set of “logical” or “canonical” views that represent these business entities. These views are reusable components that can and should be used across business lines by multiple consumers.
- Physical Layer – The “Physical Layer” provides access to underlying data sources and performs the physical-to-logical mapping through two kinds of components: physical metadata and formatting views.
  - The “Physical Metadata” is imported from the physical data sources and used as a way to onboard the metadata the data abstraction layer requires to perform its mapping functions. This is an “as-is” layer; entity names and attributes are never changed here.
  - The “Formatting Views” map the physical metadata into the data virtualization layer by aliasing physical names to logical names. They perform a one-to-one mapping between each physical source attribute and its corresponding “logical/canonical” attribute name, and can additionally handle simple tasks such as value formatting, data type casting, derived columns, and light data quality mapping. This layer serves as a buffer between the physical sources and the logical Business Layer views, so caching may be introduced at this level if and when it makes sense, and views can be rebound to different physical sources during deployment. Naming conventions are very important and are introduced in this layer. (A minimal code sketch after this list shows how the physical, formatting, business, and application views stack.)
- Data Sources – The data sources are the physical information assets that exist within and outside an organization. These assets may be databases, packaged applications such as SAP, Web services, Excel spreadsheets, and more.
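
To make the layering concrete, here is a minimal sketch of the view stack described above, using SQLite as a stand-in for a physical source and plain SQL views for each layer. Composite implements these layers with its own view and publishing facilities; every table, view, and column name below is hypothetical.

```python
# A minimal sketch of the layered view stack; all names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

# Data Source / Physical Metadata: an "as-is" source table whose cryptic
# names are imported unchanged.
conn.execute("""
    CREATE TABLE src_cust_mstr (
        cst_id   INTEGER,
        cst_nm   TEXT,
        crt_dt   TEXT      -- stored as text in the source
    )""")
conn.execute("INSERT INTO src_cust_mstr VALUES (1, 'acme corp', '2011-03-15')")

# Formatting View: one-to-one aliasing of physical names to logical names,
# plus light formatting such as casting and value cleanup.
conn.execute("""
    CREATE VIEW fmt_customer AS
    SELECT cst_id        AS customer_id,
           UPPER(cst_nm) AS customer_name,
           DATE(crt_dt)  AS created_date
    FROM src_cust_mstr""")

# Business Layer: a canonical, reusable "Customer" view. A real one would
# typically join several formatting views drawn from multiple sources.
conn.execute("""
    CREATE VIEW biz_customer AS
    SELECT customer_id, customer_name, created_date
    FROM fmt_customer""")

# Application Layer: reshape the canonical view into the names one
# particular consumer expects to see.
conn.execute("""
    CREATE VIEW app_crm_customer AS
    SELECT customer_id   AS CustomerID,
           customer_name AS CustomerName
    FROM biz_customer""")

# Data Consumer: queries only the application-layer view; the physical
# source can be rebound underneath without touching this query.
print(conn.execute("SELECT * FROM app_crm_customer").fetchall())
```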
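
And on the consumer side, here is a rough illustration of how an application-layer view might be published over REST. Composite generates SOAP, REST, and JDBC endpoints natively; this hypothetical Flask sketch only shows the shape of the contract a REST consumer would see, and assumes the view stack above was created in a file-backed database (here, abstraction_demo.db) rather than in memory.

```python
# Hypothetical REST publication of the application-layer view.
from flask import Flask, jsonify
import sqlite3

app = Flask(__name__)

@app.route("/customers/<int:customer_id>")
def get_customer(customer_id):
    # Assumes app_crm_customer was created in this database file,
    # as in the previous sketch.
    conn = sqlite3.connect("abstraction_demo.db")
    row = conn.execute(
        "SELECT CustomerID, CustomerName FROM app_crm_customer "
        "WHERE CustomerID = ?", (customer_id,)).fetchone()
    conn.close()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(CustomerID=row[0], CustomerName=row[1])

if __name__ == "__main__":
    app.run(port=8080)
```

A JDBC or ODBC consumer would simply see the same application-layer view as an ordinary table, which is the point: consumers bind to the abstraction, not to the source.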
How Have You Implemented Data Abstraction?
I hope you can benefit from Mike’s great work. Mike and I would be pleased to answer any questions you might have.
And if you have any data abstraction best practices you can share, we would love to learn about them as well.