About the client: One of the leading payment solutions providers in India.
Business Challenges: The client had undertaken an initiative to implement a Cloudera-based data lake and was facing significant challenges in acquiring data from its mainframe applications running on z/OS, as well as from dozens of Oracle-based satellite applications. Because z/OS is proprietary, its data was not easy to access. The client needed a sophisticated solution to extract data in real time, not only from the mainframes but also from the various satellite applications, and to store it in Cloudera. The client also wanted to leverage native Hadoop capabilities for business intelligence reporting and predictive analytics.

Solution: IBM CDC was used to capture changed data from the mainframes as well as from the Oracle databases. This data was fed to a Kafka cluster and then posted to Cloudera. IBM BigIntegrate jobs processed the data on Cloudera to generate the relevant output for business intelligence and analytics. As part of the engagement, Governance Catalog was also configured to track technical assets and provide end-to-end lineage.

Benefits:
  1. Real-time Visibility: Real-time replication of data from various source systems to the Cloudera data lake
  2. End-to-End Automation: Leveraged native Hadoop capabilities through IBM products to transform data and speed up the processing of terabytes of data
  3. Streamlined MIS: Enabled day-to-day operational reporting
  4. Enabled Analytics: Supported downstream analytics users in building models with near-real-time data
Technology: IBM CDC, IBM BigIntegrate (DataStage, QualityStage, Governance Catalog)