Leveraging the Power of Data Lakes
Who is Workhuman?
Workhuman is a cloud-based provider of human capital management solutions. The company's social recognition platform helps employees recognise and reward one another while reducing employee turnover and improving engagement. Workhuman helps the world’s leading brands, like LinkedIn and Eaton, build cultures that leverage the power of human connection.
Impact Spark Made:
- 92% reduction in the interval between data updates, from 24 hours to 2 hours
- 77% reduction in the average data ingestion time
Value Spark Delivered:
- Deployed Spark's testing framework tool, which reduced testing time from 6 days to 6 minutes.
- Designed and developed new features for the data platform.
- Designed and developed the data pipeline that provides Workhuman with an incremental data capture system.
What Workhuman needed:
Data Platform Pains
Workhuman has several products and integrates with many third-party applications and platforms. To monitor and understand how these products are deployed and used, Workhuman runs a data platform that is available to internal and external users. The data platform is primarily sourced from Oracle, but several problems were identified in the ingestion process.
- Update Frequency: Users need data updates at a frequency of their choosing, but the current ingestion process is not designed to support frequent updates. Each time data is ingested, the pipeline launches a full load that retrieves all the data from the source system. This process takes at least six hours, so real-time updates are not achievable in this environment.
- Error Recovery Process: If an error occurs during ingestion, the full load restarts from the beginning, and users must wait at least another six hours to obtain the data.
- Testing Strategy: Testing the data platform is manual, but an automated approach can reduce redundancies and streamline the process.
- Governance: Data governance is a fundamental component of any data platform, but it is a need that has not yet been explored.
We start our engagements with a Discovery. This is the understanding phase: understanding where Workhuman is, where they need to be, and how they get there. During the discovery phase, which consisted of several workshops, we partnered with business and engineering stakeholders to understand their pain points and goals related to ingesting data from Oracle. Building on this input, we set an objective to increase the number of data sources that can be ingested and to build a more robust data platform.
Oracle Ingestion Process
Consulting: Spark identified options for resolving the pain points, considered their impact on other systems, and weighed these factors to make recommendations to Workhuman stakeholders.
Spark also consulted with key stakeholders to understand their testing strategies and licensed its test framework tool to Workhuman.
Outcome & Results
Ingesting new sources into the data lake normally takes a fair amount of effort. To reduce it, the pipeline implementation follows an infrastructure-as-code approach and is highly parameterised, so adding new tables to the data platform requires little development or manual intervention.
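To illustrate what a parameterised pipeline can look like, here is a minimal sketch in which each table is described by a small configuration record and the ingestion jobs are generated from those records. The case study does not publish the actual configuration format, so `TableConfig`, `build_extract_query`, `build_jobs`, and every schema, table, and column name below are hypothetical; only the 2-hour schedule comes from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """Hypothetical per-table parameters; not Workhuman's real format."""
    schema: str
    table: str
    watermark_column: str      # column used to detect changed rows (assumed)
    schedule_hours: int = 2    # update interval reported in the case study


def build_extract_query(cfg: TableConfig) -> str:
    """Build an extraction query with an Oracle-style named bind variable."""
    return (
        f"SELECT * FROM {cfg.schema}.{cfg.table} "
        f"WHERE {cfg.watermark_column} > :last_watermark"
    )


def build_jobs(configs):
    """Turn a list of table configs into job definitions."""
    return [
        {
            "name": f"ingest_{c.schema}_{c.table}".lower(),
            "query": build_extract_query(c),
            "every_hours": c.schedule_hours,
        }
        for c in configs
    ]


# Adding a table to the platform means adding one entry here.
TABLES = [
    TableConfig("HR", "AWARDS", "UPDATED_AT"),
    TableConfig("HR", "NOMINATIONS", "MODIFIED_ON", schedule_hours=4),
]
```

Calling `build_jobs(TABLES)` yields one job definition per entry, so onboarding a table under this design is a configuration change and needs no new pipeline code.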
Spark designed and developed the data pipeline that provides Workhuman with an incremental data capture system, reducing the interval between updates from 24 hours to 2 hours. This allows end users to receive more frequently updated reports.
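The difference between a full load and an incremental capture can be sketched as follows. The case study does not describe the mechanism Spark built, so this example assumes one common technique, a high-watermark column, and uses SQLite in place of Oracle so that it runs anywhere. The `recognitions` table, its columns, and `incremental_load` are all hypothetical.

```python
import sqlite3

# Hypothetical schema, used for both the source and the target in this sketch.
DDL = (
    "CREATE TABLE IF NOT EXISTS recognitions "
    "(id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"
)


def incremental_load(source, target, last_watermark):
    """Copy only rows changed since last_watermark.

    Returns (new_watermark, rows_copied). A full load would instead
    read every row on every run, regardless of what changed.
    """
    rows = source.execute(
        "SELECT id, payload, updated_at FROM recognitions "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Upsert so that updated rows overwrite their earlier versions.
    target.executemany(
        "INSERT OR REPLACE INTO recognitions (id, payload, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    new_watermark = rows[-1][2] if rows else last_watermark
    return new_watermark, len(rows)
```

Each run stores the returned watermark and passes it to the next run, so the work per run is proportional to the number of changed rows. A failed run can be retried from the last stored watermark. Note that a watermark column alone does not capture deleted rows; handling those would need an additional mechanism.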
The new pipeline also significantly reduced ingestion time, cutting the average by 77%.