Data engineering with industrial efficiency

We work differently from other data teams. While most teams use artisanal methods built around a data warehouse, we build “industrial” workflows and achieve a different level of development productivity and cost efficiency.

The most data-mature companies have taken automated data processing to an industrial level, where data innovation happens at a different scale and speed. In industrialised data processing, work has shifted from the data itself to automated, resilient workflows that process the data. The transition is similar to the evolution of software engineering from the fourth-generation language (4GL) era of low-code programming tools and manual deployments to modern, highly automated DevOps processes with continuous deployment pipelines and container orchestration.
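To make the contrast concrete, the sketch below shows what a single step in an automated, workflow-centric pipeline can look like. It uses the open-source Luigi orchestrator purely as an illustration; the task names and paths are hypothetical, and any orchestrator with dependency tracking and idempotent, dated outputs would serve the same purpose.

```python
import datetime
import luigi


class RawEvents(luigi.ExternalTask):
    """A raw, immutable daily dataset delivered by an upstream system (hypothetical path)."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/raw/events/{self.date:%Y-%m-%d}.jsonl")


class CleanEvents(luigi.Task):
    """One automated workflow step: read the raw daily dataset, write a cleaned copy.

    The orchestrator only runs the task if its output does not already exist,
    which makes the flow safe to retry and backfill.
    """
    date = luigi.DateParameter(default=datetime.date.today())

    def requires(self):
        return RawEvents(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/clean/events/{self.date:%Y-%m-%d}.jsonl")

    def run(self):
        with self.input().open("r") as src, self.output().open("w") as dst:
            for line in src:
                if line.strip():  # drop empty lines; real cleaning logic goes here
                    dst.write(line)


if __name__ == "__main__":
    # Scheduling, retries, and monitoring are handled by the orchestrator rather
    # than by hand; in practice a scheduler would trigger this once per day.
    luigi.build([CleanEvents()], local_scheduler=True)
```

The work then lies in the workflow code and its automation, not in handling individual datasets: new daily outputs appear without manual intervention, and failed runs are simply re-executed.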

We participated in building the first and most successful large-scale industrial data platform in Scandinavia a decade ago. Based on those principles and modern cloud components, we built the first lightweight industrial data platform, and have applied this architecture multiple times to match data leaders in productivity and cost efficiency.

Productivity key performance indicators (KPIs) differ between artisanal and industrial methods by orders of magnitude. In a typical enterprise data platform based on a data warehouse, the number of data flows is counted in the 10s or 100s, producing 100s or 1000s of datasets per day, whereas the most mature industrialised environments have 1000s or 10000s of flows producing millions or even billions of datasets per day.

We have cracked the code of replicating the productivity and cost efficiency of the technology leaders in small, lightweight environments. When engaging with Scling, you can expect us to achieve the following productivity and operations metrics, unless we are constrained by external factors:

These numbers are sustained over time and based on measurements from code repositories and data lakes. Normalised per developer, they match the numbers of the data leaders, without requiring huge platform investments. They represent more than 10x improvements over teams we have observed building data flows on data warehouses, lakehouses, or stream processing, and we likewise observe those teams incurring more than 10 times higher cloud costs for operating data flows.
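As a hint of how such measurements can be taken without special tooling, here is a minimal sketch that counts completed datasets per day by scanning a date-partitioned data lake. The directory layout and the _SUCCESS marker convention are assumptions for the sake of illustration, not a description of any particular environment.

```python
from collections import Counter
from pathlib import Path


def datasets_per_day(lake_root: str) -> Counter:
    """Count completed datasets per day in a data lake.

    Assumes a hypothetical layout <lake_root>/<flow>/<YYYY-MM-DD>/_SUCCESS,
    where each _SUCCESS marker denotes one finished, immutable dataset.
    """
    counts: Counter = Counter()
    for marker in Path(lake_root).glob("*/*/_SUCCESS"):
        date_partition = marker.parent.name  # e.g. "2024-03-01"
        counts[date_partition] += 1
    return counts


if __name__ == "__main__":
    for day, count in sorted(datasets_per_day("data/lake").items()):
        print(day, count)
```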

The efficiency differences may seem large, but the annual State of DevOps Report is based on systematic measurements at scale and reports 100-1000x differences between leaders and followers on software engineering productivity KPIs. Similar spans are to be expected in data engineering.

Scling is not the lowest-cost option per person or per hour, but we have never encountered a data team outside the tech giants that is more cost-effective. This means more value for you and, ultimately, a lower total cost of ownership for your data solutions.