Data factories

Scling’s primary business is data-factory-as-a-service - refining data to extract potential value for our clients, with an “industrial” level of efficiency that is normally reserved for a tiny set of highly technical companies. Today, almost all companies use artisanal methods to refine data, centered around databases, data warehouses, or lakehouses. While modern data warehouses are powerful tools that augment human users, the processes remain primarily driven by humans, and are therefore limited by human capacity to oversee and steer them at scale.

A tiny set of data-mature, highly technical companies have taken automated data processing to an industrial level, where data value extraction and innovation happen at a completely different scale and speed. Scling’s staff has built “industrial” data platforms and driven data productivity improvements in such environments, and Scling offers these capabilities to a wider range of companies. We build and run data factories.

The data divide

Our services are based on ways of working that match those of the world’s technically leading companies. This can be understood by observing the so-called data divide, which describes the gap between the few companies that create value from data on an industrial level and the rest. The divide between artisanal and industrial data processing can be observed and quantified in two ways:

  1. The latency from idea to data artifact in production. For most companies, taking a data innovation idea to a production flow that produces data artifacts takes weeks or months. The small fraction of companies that have mastered industrialised data processing are able to turn innovation ideas into data pipelines that evaluate those ideas on production data within hours or days.

  2. The number of artifacts produced by these data pipelines. Leading companies have thousands of pipelines producing millions, if not billions, of valuable data artifacts. The majority of companies still use artisanal processes based on database, data warehouse, or lakehouse technology, and operate dozens of crafted data flows, producing data artifacts at a rate of hundreds or thousands per day.

The difference in latency and operational efficiency translates into a correspondingly large difference in the ability to innovate with data and AI. The well-known companies regarded as disruptive have all mastered data processing at an industrial level, but few others have. Scling has figured out how to reach the industrial level with limited investments in technology, using pragmatic ways of working based on our experience in highly industrialised data teams. Providing our clients with this capability is our main offering.

We have never seen traditional forms of client collaboration (selling data tools or consultancy services) come close to the level of productivity and effective innovation that leading companies achieve. Hence, we are forced to innovate in terms of collaboration and compensation models in order to establish the ways of working that we know enable effective data innovation. In our collaborations, we are driven by delivering valuable data artifacts over building complex technical solutions. Our compensation model ensures that the goals of our customers and Scling are aligned - we succeed when our customers get tangible value from their data. We get paid for things that run in production, not for the hours we put in to develop them.

Collaboration model

A partnership with Scling is an agile contract and an iterative journey together. In order to build complex data features, it is essential to start simple and learn in small steps along the way. Scling therefore offers a business model where customers subscribe to a value stream of development deliverables, which incrementally improve their data flows.

The starting point of the journey depends on your current situation and level of data maturity. For example, the journey might start with us jointly forming an inventory of the data available within your company, and holding workshops to determine the use cases with the highest business value potential. Before we start ingesting data sources, we decide together on the most suitable use cases for analytics or data-driven features. We break the use cases down into a backlog of development and integration deliverables. Scling curates the backlog and suggests the best path forward, but as a customer, you are in control of the priorities in the backlog.

Deliverables in a customer backlog are defined such that each deliverable is small but provides tangible business value. That value can come in different forms, for example new data-driven functionality, improved data quality, reduced risk, or ensured compliance.

Working incrementally is critical for data success. We therefore work closely with you on your journey to data maturity, and only build flows that deliver value, rather than proposing large projects or comprehensive self-service platforms. Different customers have different priorities and needs, many of which are unknown when the journey starts. Adapting along the way is necessary, and cutting away unnecessary complexity is essential for the data journey to yield a return on investment.

The data-factory-as-a-service model

Scling builds and operates data factories for our customers, including the data flows that run in the factories. From a customer perspective, we provide a data refinement process: raw data is ingested at one end, and refined, valuable data artifacts are emitted at the other. Ingested data is stored in the platform and remains available for long-term use by applications that benefit from larger data volumes.
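
As a minimal sketch of that shape - the file names, formats, and refinement step below are invented for illustration and say nothing about Scling’s actual platform - a flow reads ingested raw data and emits a refined artifact:

```python
import csv
import json
from pathlib import Path

# Hypothetical illustration of the ingest -> refine -> emit shape of a data
# factory flow. Paths, columns, and the aggregation are invented for this
# sketch; they do not describe Scling's actual platform.

RAW = Path("raw/orders-2024-01-15.csv")                   # ingested raw data
REFINED = Path("refined/daily-revenue-2024-01-15.json")   # emitted artifact

def refine(raw_path: Path, refined_path: Path) -> None:
    """Aggregate raw order rows into a refined daily-revenue artifact."""
    revenue_per_market: dict[str, float] = {}
    with raw_path.open(newline="") as f:
        for row in csv.DictReader(f):  # expects columns: market, amount
            market = row["market"]
            revenue_per_market[market] = (
                revenue_per_market.get(market, 0.0) + float(row["amount"])
            )
    refined_path.parent.mkdir(parents=True, exist_ok=True)
    refined_path.write_text(json.dumps(revenue_per_market, indent=2))

if __name__ == "__main__":
    refine(RAW, REFINED)
```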

Scling’s processing engine is built on open-source and standard cloud components in order to avoid lock-in, in case a customer should decide to take over operations. While we believe that customers benefit most from letting us take care of operations, you can also view an engagement with us as a quick way to get data flows into production, with the option to eventually take over development and operations in-house.

When engaging in a partnership with Scling, you subscribe to a flow of valuable development deliverables - a value stream. You also subscribe to the service of operating and hosting the technical data pipelines that produce the corresponding refined data artifacts. Each development deliverable is a development step that makes a data flow provide more business value to you as a customer; what qualifies as a deliverable is described below.

When working towards a larger goal, a deliverable is the smallest incremental improvement that increases the business value of a data flow. A deliverable always delivers some concrete business value; hence, an internal technical change is not a deliverable. Likewise, changing an algorithm parameter is too small an effort to provide value, whereas conducting an experiment to determine the appropriate parameter value is large enough to be considered a deliverable.

Many deliverables come in the form of technical functionality improvements to automated data pipelines, but they can also be other efforts, such as risk reduction, ensuring compliance, raising availability levels, or one-off efforts such as analytical investigation of rare events.

Integration modes

Scling offers three different modes of integration and collaboration, depending on clients’ long-term goals. Clients may mix these modes and use different modes for different use cases. The business model and pricing are the same for all three modes.

Business-oriented integration

For companies that have a primary focus outside IT, but nevertheless have valuable data, Scling offers business-oriented integration, adapted primarily for customer employees outside the IT domain, e.g. sales staff or domain experts. Data typically resides in third-party systems, such as customer relationship management (CRM) systems, product lifecycle management (PLM) systems, sales support systems, document stores, email servers, etc. Scling obtains client data by integrating with these systems, or by creating data ingestion mechanisms integrated into business processes, e.g. spreadsheets or email. We can also collect data from externally facing web services.

Refined data results are exposed to clients through business-oriented interfaces, requiring no technical effort on the client side, e.g. web dashboards, forecasting in spreadsheets, daily sales reports in a document store, or emailed suggestions of upselling opportunities before customer meetings.
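
To illustrate the no-effort end of such a flow - the recipient, mail host, and artifact format below are hypothetical - a refined data artifact can be rendered as a plain emailed report:

```python
import json
import smtplib
from email.message import EmailMessage
from pathlib import Path

# Hypothetical sketch of a business-oriented delivery: a refined artifact is
# rendered as a plain-language email. Artifact path, addresses, and the SMTP
# host are invented for illustration.

def send_daily_report(artifact: Path, recipient: str) -> None:
    revenue = json.loads(artifact.read_text())  # e.g. {"SE": 12345.0, ...}
    lines = [f"{market}: {amount:,.2f}" for market, amount in sorted(revenue.items())]
    msg = EmailMessage()
    msg["Subject"] = "Daily revenue per market"
    msg["From"] = "reports@example.com"
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    with smtplib.SMTP("mail.example.com") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    send_daily_report(Path("refined/daily-revenue-2024-01-15.json"), "sales@example.com")
```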

Technical integration

In technical integration mode, the client company internally operates IT services that store valuable data, while Scling builds and operates the data refinement pipelines. We collaborate to decide the most suitable technical integration for each data source, e.g. file uploads, database dumps, or web sockets. Refined data products, such as search or recommendation indexes, are exposed to client services via technical interfaces, e.g. file uploads or internal microservices.
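
As a sketch of what a file-based contract might look like from the client side - the directory layout, table, and formats are invented, and the actual mechanism is agreed per data source:

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical file-based integration contract, seen from the client side.
# Directories, the table, and the file formats are invented for illustration.

DROP_DIR = Path("exchange/ingest")      # client uploads raw dumps here
RESULT_DIR = Path("exchange/refined")   # refined artifacts are delivered here

def export_nightly_dump(db_path: str) -> None:
    """Client side: dump a table for ingestion by the data factory."""
    DROP_DIR.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT user_id, item_id, rating FROM ratings")
    with (DROP_DIR / "ratings-dump.jsonl").open("w") as f:
        for user_id, item_id, rating in rows:
            f.write(json.dumps({"user": user_id, "item": item_id, "rating": rating}) + "\n")
    con.close()

def load_recommendation_index() -> dict:
    """Client side: consume the refined recommendation index."""
    return json.loads((RESULT_DIR / "recommendations.json").read_text())
```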

Hybrid teams

Scling also offers a collaboration model where Scling staff and client staff form joint teams, potentially co-located, that collaborate on a daily basis. Data delivery works as in the technical integration mode above, but client staff use Scling’s internal “Orion” platform and write data processing code together with Scling. There is mutual learning: client staff learn data engineering and DataOps, while Scling staff learn client domain concepts.
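
Purely for illustration, this is the kind of small pipeline step client staff might write in a hybrid team - shown here with the open-source Luigi workflow engine; whether Orion resembles Luigi is an assumption, and all names are invented:

```python
import datetime
import json

import luigi  # open-source workflow engine, used here only for illustration

class DailyRevenue(luigi.Task):
    """One pipeline step: refine raw orders into a daily revenue artifact."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"refined/revenue-{self.date}.json")

    def run(self):
        revenue = {}
        with open(f"raw/orders-{self.date}.jsonl") as f:
            for line in f:
                order = json.loads(line)
                revenue[order["market"]] = (
                    revenue.get(order["market"], 0.0) + order["amount"]
                )
        with self.output().open("w") as out:
            json.dump(revenue, out)

if __name__ == "__main__":
    luigi.build([DailyRevenue(date=datetime.date(2024, 1, 15))], local_scheduler=True)
```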

Hybrid team mode is appropriate for clients that seek to eventually graduate and operate their data platform without Scling. It is also appropriate for domains that are complex enough to require tight cooperation between domain experts and data engineers or data scientists, e.g. industrial manufacturing processes.

Pricing model

Scling uses a value-based pricing model, where remuneration is based on the development of data flows or other functionality taken to production. There is a fixed price for the steps involved in ingesting a data source, processing it, and making it available in refined form. There are also fixed prices for different types of improvements to a flow, e.g. new functionality, quality monitoring or improvements, combinations with other data sources, etc.

In addition to the development value stream, customers also subscribe to the service of operating and hosting the technical data pipelines that produce the corresponding refined data artifacts. The operations price is proportional to the number of functionality-adding deliverables that have been implemented. Needs and data flows change, and whenever a client no longer needs a flow or decides to use a new version of a flow, the operational subscription cost for the decommissioned flow is removed.
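
As a worked illustration of how the two subscription components combine - all numbers below are invented; actual item prices are in the deliverable menu mentioned at the end of this section:

```python
# Hypothetical pricing arithmetic; both prices are invented for illustration.
DELIVERABLE_PRICE = 10_000  # fixed price per development deliverable
OPERATIONS_FEE = 500        # monthly operations fee per deliverable in production

def monthly_cost(new_deliverables: int, active_deliverables: int) -> int:
    """Development value stream plus operations subscription for one month."""
    development = new_deliverables * DELIVERABLE_PRICE
    # Operations cost is proportional to the functionality-adding deliverables
    # still in production; decommissioned flows no longer count.
    operations = active_deliverables * OPERATIONS_FEE
    return development + operations

print(monthly_cost(new_deliverables=2, active_deliverables=15))  # -> 27500
```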

Clients’ data is processed by technology that is naturally scalable. Hence, we can handle large data volumes, but size induces increased cost in complexity, storage, compute, and operations. Data ingestion above 1 GB per day is therefore subject to additional operational cost.

Innovating to go beyond the traditional time-based business model is challenging, and we are iterating on our menu of deliverables as we learn. For a copy of the current list including item pricing, please contact us.