These terms have been around for quite some time, but growing business and IT demands keep changing how they are used. In this post, the first in the Data Integration series, let's try to understand what the terms mean, along with the implementation approaches, use cases, trends, and important considerations.
Definitions:
Data Replication: This is the process of sharing information so as to ensure consistency between redundant resources, to improve reliability, fault-tolerance, or accessibility (Source: Wikipedia). The definition is generic, but it covers most of the important aspects of data integration. If we add direction to it, I believe it becomes complete. By direction I mean that data usually flows from a source to a destination/target, i.e. one way only. We will talk more about direction under Data Synchronization. One caution: like other technology areas, this one is evolving fast, so you may come across several versions of the definition.
Data Synchronization: This is the process of establishing consistency among data from a source to a target data storage and vice versa, and the continuous harmonization of the data over time (Source: Wikipedia). It is an extension of Data Replication and can also be called Advanced Data Replication. To understand it better, picture data replication running from resource A to B and from B to A, so that both data sources A and B stay synchronized.
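To make the A-to-B and B-to-A picture concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Record` type, the `replicate` and `synchronize` functions, and the "newest timestamp wins" rule are all hypothetical choices made for this example, not how any particular replication tool works. Deletes are ignored, and on a timestamp tie the target keeps its existing row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str
    updated_at: int  # logical timestamp (assumption): larger means newer


def replicate(source: dict, target: dict) -> None:
    """One-way replication: copy rows that are missing or newer in source."""
    for key, rec in source.items():
        current = target.get(key)
        if current is None or rec.updated_at > current.updated_at:
            target[key] = rec


def synchronize(a: dict, b: dict) -> None:
    """Two-way synchronization: replicate A to B, then B to A."""
    replicate(a, b)
    replicate(b, a)


# Two stores that have drifted apart.
store_a = {"cust-1": Record("Alice", 1), "cust-2": Record("Bob", 5)}
store_b = {"cust-2": Record("Robert", 3), "cust-3": Record("Carol", 2)}

synchronize(store_a, store_b)
# Both stores now hold the same three customers, and "cust-2" carries
# the newer value from store A.
```

Calling `replicate` alone gives plain replication in one direction. Running it in both directions is what turns it into synchronization, and it is also where conflicts appear: both sides had a different "cust-2", so some rule has to pick a winner.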
Implementation Approaches: There are various ways to implement data replication. I feel they fall under the following two categories:
- Direct approach: Reading the database tables directly, using SQL or tools.
- Indirect approach: Using the logs created by the RDBMS in real time (Change Data Capture).
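The two approaches can be sketched roughly as follows, using Python's built-in `sqlite3` module. Treat this as an illustration only: the `customer` table, the `updated_at` watermark column, and the hand-written `change_log` list are assumptions made for the example. A real CDC tool reads the change events from the RDBMS transaction log rather than from a Python list.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return conn


def replicate_direct(src, dst, last_sync: int) -> int:
    """Direct approach: query the source table for rows changed since last_sync."""
    rows = src.execute(
        "SELECT id, name, updated_at FROM customer WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO customer (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    # Return the new watermark to use for the next run.
    return max((r[2] for r in rows), default=last_sync)


def apply_change_log(dst, changes) -> None:
    """Indirect approach: replay captured change events on the target,
    without querying the source tables at all."""
    for op, row in changes:
        if op == "delete":
            dst.execute("DELETE FROM customer WHERE id = ?", (row["id"],))
        else:  # "insert" or "update"
            dst.execute(
                "INSERT OR REPLACE INTO customer (id, name, updated_at) VALUES (?, ?, ?)",
                (row["id"], row["name"], row["updated_at"]),
            )


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", 10), (2, "Bob", 20)],
)

# Direct: pull everything changed since watermark 0.
watermark = replicate_direct(source, target, last_sync=0)

# Indirect: apply change events that a CDC process would have captured.
change_log = [
    ("update", {"id": 1, "name": "Alicia", "updated_at": 30}),
    ("delete", {"id": 2}),
]
apply_change_log(target, change_log)

final_rows = target.execute(
    "SELECT id, name, updated_at FROM customer ORDER BY id"
).fetchall()
```

One difference shows up even in this small sketch: the direct query puts load on the source tables and cannot see deleted rows, while the change log carries deletes explicitly and leaves the source tables alone.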
Use cases: Data replication is used for both DW/BI and non-DW/BI purposes. As the name suggests, the first is for feeding the data warehouse and for BI (operational reporting, dashboards, etc.). The non-DW/BI use cases fall under the application integration category, e.g. a 360-degree view of customer data. It is also widely used for other entities such as product and reference data.
Trends:
External drivers such as increasing competition and a shrinking economy have pushed organizations to plan for faster decision making, cost reduction, and higher availability. So we can see initiatives coming from IT as well as from the business to meet the company's strategic vision.
As per a TDWI survey, 17% are using real-time DW functionality and more than 90% are committed to using it in the coming 2-3 years. So one of the biggest trends is real-time data warehousing and analytics. Many BI/DW use cases are affected by this, among them operational BI, on-demand management dashboards, and alerts and notifications.
For non-BI/DW, the biggest trend is a single view of data: customer data integration, product data, reference data, and some other data entities. To keep implementations non-intrusive and scalable, organizations are opting for CDC (Change Data Capture) methodologies. This is where Data Synchronization comes into the picture, because multiple applications and architectures have to be accommodated.
Points to take care of:
- Understand the 3Vs** of the data requirements (Volume, Variety, and Velocity). Volume is about the size of the data. Variety is about the data type (structured, unstructured, etc.), and Velocity is about frequency (batch, real time, etc.).
- Choose a replication tool with advanced capabilities so that it can scale: handling heterogeneous sources, conflicts that arise during Data Synchronization, alerts and notifications, etc.
- Document the interface specification (source and target) clearly and keep it updated. This is mandatory, as even a small change in a source can create havoc.
- Real-time integration brings technical challenges and complexity, but it also gives business analysts plenty of opportunities to see dimensions of the data that were not visible before.
- Using the services of a data integration specialist is highly recommended.
References:
- TDWI webinar 'Data replication for DWH and BI' by Philip Russom
- Wikipedia, for definitions
Legends
** The 3Vs feature in Philip Russom's Big Data Analytics research for TDWI (the terms themselves are usually credited to Doug Laney); I think these terms can also be applied to any integration technique.