The new data infrastructure delivers advanced capabilities in data product delivery and data quality assurance.
On June 22nd, Cleeng announced the successful release of its next-generation data governance process. The goal of this project was to reform our ETL (extract, transform, load) process by reducing the number of technologies and applications it relies on. This simplification was made possible by adopting two new tools: AWS Athena and dbt.
This project benefits Cleeng clients by achieving the following outcomes:
- Reduced time to provide new metrics
- Reduced time to deployment
- Increased data quality
What is dbt?
dbt (data build tool) is a modern analytics tool designed to address the problems that have emerged as the data footprint of businesses has grown rapidly in recent years.
Its core idea is a mature analytics workflow built on techniques that software engineers already use: modularity, version control, quality assurance, and documentation. By following these practices, we can now deploy analytics code quickly and collaboratively, with that code tested and monitored to a high standard.
Image: The dbt self-generating table documentation
How has dbt improved our ETL process?
With this new tool, all data transformations are defined and maintained in one place.
Instead of being spread across two separate Spark projects, all necessary aggregations and data engineering operations now live within dbt. Every ChurnIQ team member took part in this transformation, which helped the whole team build a shared understanding of the new data engineering processes.
Image: Automated mapping of table dependencies
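To give a feel for what a dependency map like this represents: dbt works out the build order from the references each model makes to other models. The short Python sketch below is not dbt's implementation. It only illustrates the idea with the standard library, and every model name in it (such as `raw_subscriptions` or `churn_metrics`) is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key depends on the models listed in its set.
MODEL_DEPENDENCIES = {
    "raw_subscriptions": set(),
    "raw_payments": set(),
    "stg_subscriptions": {"raw_subscriptions"},
    "stg_payments": {"raw_payments"},
    "churn_metrics": {"stg_subscriptions", "stg_payments"},
}


def build_order(dependencies):
    """Return model names ordered so each model comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(model, dependencies):
    """Return every model that directly or indirectly depends on `model`."""
    affected = set()
    frontier = [model]
    while frontier:
        current = frontier.pop()
        for name, parents in dependencies.items():
            if current in parents and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(build_order(MODEL_DEPENDENCIES))
    print(downstream_of("raw_payments", MODEL_DEPENDENCIES))
```

The second helper shows why such a map is useful in practice: before changing one table, a developer can see which downstream tables would be affected.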
Previously, the ChurnIQ team relied on a PostgreSQL database as its data source. In this project, we also wanted to reduce data loading time.
AWS Athena turned out to be a great fit for this problem. It gives us more flexibility and, most importantly, fast data delivery. Adding new features is now easier to control, and bugs can be detected sooner because each developer can test their own changes in Looker.
As a result, we gain better control of the whole process.
Image: Data schema quality testing
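dbt ships with generic tests such as `unique`, `not_null`, and `accepted_values`, which are declared against a column and report the rows that violate the rule. As a rough illustration of what those checks assert, here is a minimal Python sketch. It mirrors the semantics only and is not how dbt runs them (dbt compiles tests to SQL); the sample rows and column names are hypothetical.

```python
from collections import Counter


def not_null(rows, column):
    """Return the rows in which `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Return the non-null values of `column` that occur more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sorted(value for value, seen in counts.items() if seen > 1)


def accepted_values(rows, column, allowed):
    """Return the rows whose non-null `column` value is not in `allowed`."""
    return [
        row
        for row in rows
        if row.get(column) is not None and row[column] not in allowed
    ]


# Hypothetical sample data with one deliberate problem per check.
SUBSCRIPTIONS = [
    {"subscription_id": "s1", "status": "active"},
    {"subscription_id": "s2", "status": "cancelled"},
    {"subscription_id": "s2", "status": "paused"},
    {"subscription_id": None, "status": "active"},
]

if __name__ == "__main__":
    print(not_null(SUBSCRIPTIONS, "subscription_id"))
    print(unique(SUBSCRIPTIONS, "subscription_id"))
    print(accepted_values(SUBSCRIPTIONS, "status", {"active", "cancelled"}))
```

Each check returns the offending rows or values, so an empty result means the check passed.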
With this newfound control, developers can focus on data transformation instead of configuring the EMR cluster. A more predictable process also means we can better assess the complexity of our tasks and make more accurate predictions about delivery time.
What dbt means for OTT streaming services
The development of ChurnIQ 2.0 and ChurnIQ Segments reflects the need for OTT service providers to use purpose-built analytics platforms to grow their premium audiences. Implementing dbt and AWS Athena allows Cleeng to ensure even higher quality in its current analytics platform, and it also provides an engine for faster yet more stable development of new data features.
By minimizing potential ‘points of failure’ in the ETL process and simplifying the process as a whole, we can respond more efficiently both to customer needs and to vulnerabilities within the data pipeline.
This gives Cleeng a platform not just for faster product development, but for improvement across every dimension of the data experience for our clients.
Want to reap the benefits of Cleeng's streamlined data infrastructure?