We need four roles to be fulfilled:
These roles can be held by a single person or shared among several.
In addition, we need access to your data, and we need regular, direct contact with both the data owners and the stakeholders, i.e. those who benefit from the data processing.
No, you own your data. We only process it on your behalf and for your benefit. In this respect, our relationship is similar to that between a cloud provider and its customers.
With a major cloud provider, within the EU.
Scling pays the cloud providers and other suppliers that we use. Their cost is included in the pricing for operations. The pricing includes the cost of storing a single copy of ingested data for 10 years.
Most pipelines do not require a strict SLA, and customers should not pay for one they do not need. So by default, the SLA is “best effort” with email support during working hours. For pipelines with higher requirements, raising the SLA level one step is a deliverable, with a small increase in operational cost. An SLA level beyond “best effort” requires a customer engagement of at least 6 deliverables per month.
No, we work together to break down complex features into small deliverables - as small as they can get while still providing some value to you as a customer. The agile workshop “Elephant Carpaccio” is a good exercise for learning to break down features in this way. For example, if you want a recommendation API, it might be split into a handful of deliverables, e.g.:
As you can see, each deliverable is small, and that is important. In order to build valuable data products, each step should be evaluated in order to determine the next step. In many cases, well-tuned simple solutions work as well as complex algorithms. We should only use new shiny things that are expensive to build and operate where they really matter, and always benchmark them against simple alternatives, or combine both.
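As an illustration of benchmarking against a simple alternative, here is a minimal sketch of a popularity baseline for a recommender, scored with hit rate on held-out interactions. It is not a description of our production code: the function names, the toy data, and the choice of hit rate as the metric are all assumptions made for this example.

```python
from collections import Counter


def popularity_baseline(train, k):
    """Return the k most frequently seen items in the training interactions."""
    counts = Counter(item for _, item in train)
    return [item for item, _ in counts.most_common(k)]


def hit_rate(recommend, test, k):
    """Fraction of held-out (user, item) pairs whose item is in the top-k list."""
    if not test:
        return 0.0
    hits = sum(1 for user, item in test if item in recommend(user)[:k])
    return hits / len(test)


# Toy (user, item) interactions; real data would come from the pipeline.
train = [("u1", "a"), ("u2", "a"), ("u3", "b"),
         ("u1", "b"), ("u2", "c"), ("u3", "a")]
test = [("u1", "c"), ("u2", "b"), ("u3", "d"), ("u4", "a")]

# The baseline recommends the same popular items to every user.
top_items = popularity_baseline(train, k=2)
baseline_score = hit_rate(lambda user: top_items, test, k=2)
print(f"popularity baseline hit rate: {baseline_score:.2f}")
```

Any more advanced model that exposes the same `recommend(user)` callable can be scored with the same `hit_rate` function, which makes it easy to see whether the added complexity pays for itself.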
Our value proposition is that we are proficient with data engineering, have built data platforms many times, for many years, have the appropriate tooling, and can use our knowledge and machinery to be more efficient and take your data features to production quicker. Some deliverables will seem easy for us, since we apply our tools and patterns that are well known to us, but might take others more time to figure out. In such cases, Scling profits. Other deliverables will require more work, but would have been riskier and taken a long time for companies with less experience and without adequate tooling. Building recommendation systems or respecting the right to be forgotten in a data lake are such examples. In those cases, you profit. Over time, we share the profit from our partnership.
Domain expertise is crucial for success. For some customers, such as media or retail, the domain is comprehensible to laypeople. In those cases, knowledge transfer through meetings and documents is sufficient. In other cases, e.g. manufacturing, learning the domain takes time, and customers may have valuable algorithms to contribute. In such cases, subject-matter experts from the customer embed with us, and we develop the solutions together. This requires the customer to invest work time, but that time is also an intensive course in practical data engineering for customer staff, so the benefit is mutual.
Your data is not shared. We share reusable code among our customers. That is one of the benefits for our customers - shared development and maintenance costs. The shared code is typically technical or generic, and not specific to your applications. For common domains, such as web and retail, we share reusable domain-specific code and definitions between customers. We do not share corporate secrets, and if you want a particular innovation not to be shared, we can comply.
Yes, we can handle the technical arrangements. If your data is covered by the GDPR, you cannot sell it, only lease it out. In that case, we can arrange for user requests for deletion or withdrawn consent to be passed on to the lessee.
We have more than a decade of experience with secure cloud environments, and we apply cloud security best practices, e.g. hardware-based multifactor authentication for personal credentials and asset management with infrastructure as code. We use standard practices for developing applications based on open source software, i.e. take security precautions that do not significantly hinder the development process or add excessive complexity. All security has a cost, and for some types of security hardening, there is a tradeoff. The right level depends on the sensitivity of data, and should be chosen by each customer. For example, we do not want customers that ingest publicly available data to pay the cost of strict manual security procedures. For other customers, manual change validation, strict open source dependency lockdown, additional protection layers, and external penetration testing might be justified.
We are happy to be transparent with our processes, as well as apply stricter security procedures when needed. Security hardening would be one form of development deliverable, and we can provide a suitable backlog of hardening deliverables based on threat modelling.
We handle ingested data in compliance with the GDPR, including minimising access, applying anonymisation and pseudonymisation where possible, limiting data retention, respecting consent, providing user data extracts, and respecting the right to be forgotten. Adding technical compliance solutions is one form of development deliverable.
As a customer, you have the relationship with end users, and are therefore the data controller. You must implement additional procedures in order to be compliant, e.g. receiving deletion requests and passing them to us. In GDPR terminology, we are a data processor. We have implemented technical GDPR compliance solutions for multiple companies, and are happy to work together to ensure that your data handling overall is GDPR-compliant. We do not provide legal advice, however.
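To make pseudonymisation and deletion requests concrete, here is a minimal sketch, not our actual implementation. It assumes user identifiers are replaced by a keyed hash (HMAC-SHA256) at ingestion, and that deletion requests arrive as a list of original user ids. The record layout, the function names, and the hard-coded secret are hypothetical; in practice the key would live in a secret store.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret: bytes) -> str:
    """Replace a user id with a keyed hash that is stable for a given secret."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_deletion_requests(records, deleted_user_ids, secret):
    """Drop all records belonging to users who asked to be forgotten."""
    forgotten = {pseudonymise(uid, secret) for uid in deleted_user_ids}
    return [r for r in records if r["user"] not in forgotten]


# Hypothetical key for illustration only; never hard-code a real secret.
secret = b"example-secret-not-for-production"

raw_events = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/cart"},
    {"user": "alice", "page": "/checkout"},
]

# Stored data only ever contains the pseudonym, not the original id.
stored = [{**e, "user": pseudonymise(e["user"], secret)} for e in raw_events]

# The data controller passes on a deletion request for "alice".
remaining = apply_deletion_requests(stored, ["alice"], secret)
print(len(stored), "records before,", len(remaining), "after deletion")
```

Because the hash is keyed, the same user maps to the same pseudonym across datasets, so records can still be joined and a deletion request can be matched without storing the original identifier.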
We can run in an environment that provides a Kubernetes cluster, scalable storage, a relational database, and sufficiently secure access control. The pricing will differ from that of the fully hosted solution, however, and will depend on whether you supply infrastructure, and what procedures are required. If you want us to run in a particular location where there are no suitable cloud providers, and also want us to take care of the infrastructure for you, we will team up with suitable partners to operate the underlying infrastructure that we need.
If you decide to leave, you can take over the operations of developed pipelines. You get a copy of the data processing code, as well as any libraries and operational configuration necessary to run the pipelines. The platform is built on open source technology and cloud services available on any of the major clouds, in order not to lock our customers into proprietary technology. In order to put our money where our mouth is, we can offer to port data flows at a predetermined fixed price per flow once you have established an adequate destination environment.
If you leave, you do not get access to our internal operational tools or monitoring tools that are not required to execute the pipelines. We have operation automation tools that generate code and configuration, e.g. for Kubernetes. When you leave, you will get the generated files, which you then maintain and change when you want to change your pipelines.
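To give a feel for what "generated files" can mean, here is a small hypothetical sketch of a generator that turns a compact pipeline description into a Kubernetes CronJob manifest. It does not show our internal tooling: the `render_cronjob` function, the pipeline fields, and the image name are invented for this example. Only the manifest structure follows the standard Kubernetes `batch/v1` CronJob format.

```python
import json


def render_cronjob(name, image, schedule, args):
    """Build a Kubernetes batch/v1 CronJob manifest as a plain dictionary."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Do not start a new run while the previous one is still active.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "OnFailure",
                "containers": [
                    {"name": name, "image": image, "args": list(args)},
                ],
            }}}},
        },
    }


# Hypothetical pipeline description; all values are placeholders.
pipeline = {
    "name": "daily-order-report",
    "image": "registry.example.com/pipelines/order-report:1.0",
    "schedule": "0 3 * * *",
    "args": ["--date", "yesterday"],
}

manifest = render_cronjob(**pipeline)

# Kubernetes accepts JSON manifests as well as YAML.
manifest_text = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_text)
```

After a handover, you would own and edit the output of such a generator (here, the JSON manifest) directly, rather than the compact description that produced it.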
We avoid using commercial providers that would make it difficult for our customers to leave our service. We do use a major cloud provider, but restrict ourselves to cloud services that are available from all three major cloud providers. All services that we rely on also have open source equivalents for customers that wish to move to an on-premise installation. We use no other commercial services for production-critical components.
You need to use a major cloud service, or install the following components in an on-premise datacenter:
For development and operations, you will also need:
There are no specific requirements on these systems, so you can connect your data platform to the corresponding systems that you are already operating. If you are not operating such systems, any IT consultancy firm can do it for you.
For the reasons explained above, we will not take dependencies on commercial vendors, except for the major cloud providers.