Solution: IBM CDC was used to extract changed data from mainframes and from Oracle RDBMS. The data was then fed to a Kafka cluster and delivered to Cloudera. IBM BigIntegrate jobs processed the data on Cloudera to generate output for Business Intelligence and Analytics. As part of the engagement, a governance catalog was also configured to track technical assets and provide end-to-end lineage.
- Real-time Visibility: Real-time replication of data from various source systems to Cloudera (Data Lake)
- End-to-End Automation: Leveraged native Hadoop capabilities through IBM products to transform data and speed up processing of terabytes of data
- Streamlined MIS: Enabled day-to-day operational reporting
- Enabled Analytics: Supported downstream analytics users in building models with near real-time data
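
To illustrate the replication flow described in the solution, here is a minimal sketch of how a stream of change events might be applied to a target table so that it mirrors the source. It is not the IBM CDC or Kafka API: the `ChangeEvent` shape, the operation codes `I`/`U`/`D`, and the `apply_changes` function are hypothetical names chosen for illustration, and an in-memory dictionary stands in for the data lake table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; real CDC message formats differ."""
    op: str                               # "I" = insert, "U" = update, "D" = delete
    key: str                              # primary key of the source row
    row: Optional[Dict[str, Any]] = None  # full row image for inserts and updates


def apply_changes(target: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply change events in order so that `target` mirrors the source table."""
    for ev in events:
        if ev.op in ("I", "U"):
            if ev.row is None:
                raise ValueError(f"event for key {ev.key!r} has no row image")
            # Upsert: an insert and an update both store the latest row image.
            target[ev.key] = dict(ev.row)
        elif ev.op == "D":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation {ev.op!r}")
    return target


if __name__ == "__main__":
    # Assumed sample events, as they might arrive from a message topic.
    events = [
        ChangeEvent("I", "1001", {"name": "Alice", "balance": 50}),
        ChangeEvent("I", "1002", {"name": "Bob", "balance": 75}),
        ChangeEvent("U", "1001", {"name": "Alice", "balance": 20}),
        ChangeEvent("D", "1002"),
    ]
    print(apply_changes({}, events))
```

Treating inserts and updates as the same upsert, and deletes of missing keys as no-ops, keeps the apply step idempotent, so replaying the same events after a consumer restart leaves the target unchanged.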