Questions and answers.
Do you sell our data or use it for your own purposes?
No. You own your data, and we only process it on your behalf and for your benefit. In this respect, our relationship is similar to that between a cloud provider and its customers.
Where is my data stored?
With a major cloud provider, within the EU.
Who pays the cloud bill?
Scling pays the cloud providers and other suppliers that we use. This is included in the pricing for operations, which also covers the cost of storing a single copy of ingested data for 10 years.
What is your SLA or availability?
Most pipelines do not require a strict SLA, and customers should not pay for one. So by default, the SLA is best effort with email support during working hours. For pipelines with higher requirements, raising the SLA level one step is a deliverable, with the standard increase in operational cost. SLA levels beyond best effort require a customer engagement of at least 6 deliverables per month.
I’d like a complex feature. Is that one deliverable?
No, we sit together and break down complex features into small deliverables - as small as they can get while still providing some value to you as a customer. The agile workshop “Elephant Carpaccio” is a good exercise for learning to break down complex features into small deliverables. For example, if you want a recommendation API, it might be split into a handful of deliverables, e.g.:
Minimum viable product: Emit daily file with recommendations of most popular items based on sales source. This provides some minimal business value, since it can be compared with real sales for evaluation.
Combine sales with a demographic data source, and recommend popular items based on country.
Serve recommendations in an unauthenticated API.
Add basic API authentication.
Recommend popular items based on age and country.
Add user history, avoid recommending previously bought items.
Make recommendations individual, with basic collaborative filtering.
As you can see, each deliverable is small, and that is important. In order to build valuable data products, each step should be evaluated in order to determine the next step. In many cases, well-tuned simple solutions work as well as complex algorithms. We should only use shiny new things that are expensive to build and operate where they really matter, and always benchmark them against simple alternatives, or combine both.
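To illustrate how small the first two deliverables in the list above can be, here is a minimal sketch in Python. It is not Scling's implementation; the `Sale` record, its fields, and the function names are hypothetical and chosen only for illustration. It assumes sales arrive as simple in-memory records, whereas a real pipeline would read them from a sales source and emit a daily file.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class Sale(NamedTuple):
    """Hypothetical sales record; a real source would have more fields."""
    item_id: str
    country: str


def most_popular(sales: Iterable[Sale], top_n: int = 3) -> list[str]:
    """Deliverable 1: recommend the overall best-selling items."""
    counts = Counter(sale.item_id for sale in sales)
    return [item for item, _ in counts.most_common(top_n)]


def most_popular_by_country(
    sales: Iterable[Sale], top_n: int = 3
) -> dict[str, list[str]]:
    """Deliverable 2: recommend the best-selling items within each country."""
    per_country: dict[str, Counter] = defaultdict(Counter)
    for sale in sales:
        per_country[sale.country][sale.item_id] += 1
    return {
        country: [item for item, _ in counts.most_common(top_n)]
        for country, counts in per_country.items()
    }
```

Each function can be evaluated against real sales before deciding whether the next, more complex step is worth building.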
Some deliverables are so easy for you - why do you charge the same amount for all?
Our value proposition is that we are proficient with data engineering, have built data platforms many times, for many years, have the appropriate tooling, and can use our knowledge and machinery to be more efficient and take your data features to production quicker. Some deliverables will seem easy for us, since we apply tools and patterns that are well known to us, but might take others more time to figure out. In such cases, Scling profits. Other deliverables will require more work, but would have been risky and taken a long time for companies with less experience and without adequate tooling. Respecting the right to be forgotten in a data lake is one such example. In those cases, you profit. Over time, we share the profit from our partnership.
How can you process our data without our expertise?
Domain expertise is crucial for success. For some customers, such as media or retail, the domain is comprehensible to laymen. In those cases, knowledge transfer through meetings and documents is sufficient. In other cases, e.g. manufacturing, learning the domain takes time, and customers may have valuable algorithms to contribute. In such cases, subject matter experts from the customer embed with us, and we develop the solutions together. It requires the customer to spend work time, but that time is also an intensive course in practical data engineering for customer staff, so the benefit is mutual.
Will code or data be shared with your other customers?
Your data is not shared. We share reusable code among our customers. That is one of the benefits for our customers - shared development and maintenance costs. The shared code is typically technical or generic, and not specific to your applications. For common domains, such as web and retail, we share reusable domain-specific code and definitions between customers. We do not share corporate secrets, and if you want a particular innovation not to be shared, we can comply.
I’d like to sell my data, can you help me?
Yes, we can handle the technical arrangements. If your data is covered by the GDPR, you cannot sell it, only lease it out. In that case, we can arrange for user requests for deletion or withdrawn consent to be passed on to the lessee.
What happens if you fumble and delete my data?
We of course take precautions and build our systems to make this unlikely. But we do not keep redundant copies of data by default, since customers that do not need high data durability should not pay for it. Therefore, by default, we assume that you will be able to assist us with data copies in case of an operational data loss. Should that be needed, your deliverables to us will be counted as credits for extra deliverables. :-)
For customers that desire storage redundancy for data, we set up emergency backups to a separate storage system, e.g. a different cloud provider, and run regular restoration tests and disaster recovery drills.
How do I leave Scling? Can I take over data pipeline operations?
If you decide to leave, you can take over the operations of developed pipelines. You get a copy of the data processing code, as well as any libraries and operational configuration necessary to run the pipelines. The platform is built on open source technology and cloud services available on any of the major clouds. You can run the pipelines in any environment that provides a Kubernetes cluster, a relational database, and a scalable storage service, such as a cloud object store or a Hadoop cluster. For stream processing, a Kafka cluster is also required.
You do not get access to our internal operational tools or monitoring tools that are not required to execute the pipelines. Hence, you will need to manually edit the pipeline Kubernetes configurations when you want to modify the pipelines. You will also need to connect operational metrics to a monitoring system, such as Prometheus.
Is my data secure?
We have more than a decade of experience with secure cloud environments, and we apply standard cloud security best practices, e.g. hardware-based multi-factor authentication for personal credentials and asset management with infrastructure as code. We use standard practices for developing applications based on open source software, i.e. we take security precautions that do not significantly hinder the development process or add excessive complexity. All security has a cost, and for some types of security hardening, there is a tradeoff. The right level depends on the sensitivity of the data, and should be chosen by each customer. For example, we do not want customers that ingest publicly available data to pay the cost of strict manual security procedures. For other customers, manual change validation, strict open source dependency lockdown, additional protection layers, and external penetration testing might be justified.
We are happy to be transparent with our processes, as well as apply stricter security procedures when needed. Security hardening would be one form of development deliverable, and we can provide a suitable backlog of hardening deliverables based on threat modelling.
Is the data processing compliant with GDPR?
We handle ingested data in compliance with the GDPR, including minimising access, applying anonymisation and pseudonymisation where possible, limiting data retention, respecting consent, providing user data extracts, and respecting the right to be forgotten. Adding technical compliance solutions is one form of development deliverable.
As a customer, you have the relationship with end users, and are therefore the data controller. You must implement additional procedures in order to be compliant, e.g. receive deletion requests and pass them to us. In GDPR terminology, we are a data processor.
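As an illustration of what pseudonymisation and a passed-on deletion request can look like in code, here is a minimal Python sketch. It is not a description of Scling's actual mechanisms. It assumes a keyed hash (HMAC-SHA256) as the pseudonymisation technique, and the function names, the `user` field, and the record layout are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Replace a user identifier with a keyed hash (assumed technique).

    The same user_id and key always give the same token, so records can
    still be joined, but the token cannot be reversed without the key.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def forget_user(records: list[dict], user_id: str, secret_key: bytes) -> list[dict]:
    """Drop all records belonging to a user who has requested deletion.

    Assumes each record stores the pseudonymised identifier under the
    hypothetical field name "user".
    """
    token = pseudonymise(user_id, secret_key)
    return [record for record in records if record["user"] != token]
```

In this sketch, the data controller receives the deletion request and passes the user identifier to the processor, which recomputes the token and removes the matching records. Note that pseudonymised data still counts as personal data under the GDPR, so this reduces exposure but does not replace the other measures listed above.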
Can you run your data platform in my data center? Or in my home country?
We can run in any environment that provides a Kubernetes cluster, scalable storage, a relational database, and sufficiently secure access control. The pricing will differ from the fully hosted solution, however, and will depend on whether you supply the infrastructure and on what procedures are required. If you want us to run in a particular location where there are no suitable cloud providers, and also want us to take care of the infrastructure, we will team up with suitable partners to operate the underlying infrastructure that we need.