Introduction
This FAQ provides clear and concise answers to the most common questions about MPC and Roseman Labs. Use this as your go-to resource. For any further inquiries, don’t hesitate to contact your Roseman Labs representative or email us at support@rosemanlabs.com.
Explore the answers to the following questions below:
- What is Roseman Labs?
- What is Multi-Party Computation?
- What type of computations can be done on data in Roseman Labs?
- How can undesired queries be prevented?
- How can you avoid statistical disclosure?
- Who sees the results of an analysis?
- Where does the data I upload reside?
- Are there cryptographic keys involved and who holds these?
- Who controls the data?
- How secure is Roseman Labs?
- How can input data quality be ensured if it is not visible?
- Can I train machine learning models in Roseman Labs?
- What are the optimal browser requirements and screen sizes for using Roseman Labs?
- How can I explore the data with Roseman Labs?
- Who should approve my analysis?
- How do I analyze the data?
- What is the difference between analyzing anonymous and encrypted data?
What is Roseman Labs?
Roseman Labs is a Deep-Tech company from the Netherlands, founded in 2020, with the mission to protect privacy and enable collaboration on sensitive data to address global challenges.
Our software uses patented multi-party computation to enable organizations to encrypt, link, and analyze multiple data sets quickly and securely, ensuring the privacy and commercial sensitivity of the data without exposing the underlying information.
Our technology’s performance has earned us awards like the CES Innovation Award in Cybersecurity & Privacy and the 2024 Dutch Privacy Award. We also hold ISO 27001 and NEN7510 certifications, ensuring top-tier information security.
For more information on some of the challenges we solve, see here.
What is Multi-Party Computation?
Multi-Party Computation (MPC) is a privacy-enhancing technology that enables multiple parties to analyze data together without revealing the actual data to each other. It allows for calculations like averages and comparisons across different datasets while keeping the individual records in the data private.
At its core, MPC encrypts and divides data into "secret shares" at the source. These shares, which reveal nothing on their own, are then distributed across three servers that work together as an engine. The servers perform calculations on the secret shares, and only the final result is decrypted and shared with the analyst, provided the analysis has been approved.
This ensures that sensitive data remains protected throughout the analysis process, providing a secure way to gain insights from multiple data sources without compromising privacy.
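As a simple illustration of the idea behind secret sharing, consider additive shares of a number: each share looks random on its own, yet the servers can still compute on them together. This is a toy example, not the production protocol used by our engine.

```python
# Illustrative only: a toy additive secret-sharing scheme, not Roseman Labs'
# production protocol. It shows why individual shares reveal nothing.
import secrets

PRIME = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split a value into n additive shares that sum to the value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Only the combination of all shares reveals the original value."""
    return sum(shares) % PRIME

# Each server can add its shares locally; the sum of the results equals the
# sum of the inputs, without any server seeing the inputs themselves.
a_shares, b_shares = share(25), share(17)
sum_shares = [(x + y) % PRIME for x, y in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 42
```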
What type of computations can be done on data in Roseman Labs?
Many common types of data analysis can be performed in the Roseman Labs engine, including more complex calculations such as logistic and linear regression, as well as k-nearest neighbors. Our engine translates the Python code you write so that it can be executed without revealing the underlying data, thanks to the expertise of Roseman Labs’ cryptography team.
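As a rough sketch of what such an analysis can look like with crandas (assuming the pandas-like interface described here; the exact function names, session setup, and the way results are revealed are covered in the documentation):

```python
# Minimal sketch of a crandas-style analysis, assuming the pandas-like API
# described above; exact function names and session setup may differ.
import crandas as cd

# Upload a small table; the values are secret-shared before leaving the client.
patients = cd.DataFrame({
    "age": [34, 51, 29, 62],
    "treated": [1, 0, 1, 1],
})

# Computations run on the secret shares inside the engine.
adults = patients[patients["age"] > 40]
mean_age = adults["age"].mean()

# Only the final, approved result is revealed to the analyst.
print(mean_age)
```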
For the full functionality of our Python package crandas, please refer to our documentation.
How can undesired queries be prevented?
Undesired queries are prevented through a robust script approval process. When an environment is set up, a set of approvers is invited and confirmed before the environment can be used. These approvers review and approve any analyses submitted by analysts before they are executed on sensitive data. They carefully examine the Python scripts to validate that they only disclose information in line with the agreements made as part of the collaboration. In addition, they verify that risk metrics have been implemented properly to prevent undesired disclosure of individual records. This safeguards the integrity and confidentiality of the data.
How can you avoid statistical disclosure?
We offer a guide to assist you in assessing and limiting unwanted disclosure of sensitive information from the source data. It shows you how to recognize parts of the code that could be risky. Where disclosure risks cannot be validated by inspecting the code, several metrics have been implemented in crandas to prevent disclosure at runtime. A script may only be executed after inspection and explicit approval by the approvers.
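As a simplified illustration of such a runtime safeguard (a hypothetical threshold check, not the actual crandas implementation), an aggregate could only be released when enough records contribute to it:

```python
# Hypothetical illustration of an aggregation threshold, not a crandas API:
# only release a group statistic when enough records contribute to it.
MIN_GROUP_SIZE = 10  # example threshold agreed in the collaboration

def safe_mean(values: list[float], min_group_size: int = MIN_GROUP_SIZE):
    """Return the mean only if the group is large enough to publish."""
    if len(values) < min_group_size:
        return None  # suppress the result instead of risking disclosure
    return sum(values) / len(values)

print(safe_mean([4.2, 5.1, 3.9]))   # None: group too small
print(safe_mean([4.2] * 12))        # 4.2: safe to release
```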
Who sees the results of an analysis?
Analysts execute queries on the sensitive data within the environment once those queries have been approved. After approval and execution, the output can be shared with the relevant parties. Although the analysts run the queries, the results become accessible to others as specified only after the review and approval process, ensuring both security and compliance.
Where does the data I upload reside?
The unencrypted data always remains in your own system and is never uploaded in raw form to Roseman Labs. Before any interaction with our platform, the data is encrypted directly on the client side, meaning the encryption process happens entirely on your local device. This ensures that sensitive information is fully protected before it is sent to our privacy engine.
Once encrypted, the data is divided into "secret shares," which are then distributed across multiple nodes. These secret shares, held by the privacy engine, are cryptographically secured and cannot reveal any information about the underlying data on their own. As a result, neither Roseman Labs nor your collaboration partners can view your data—full ownership and control remain with you at all times.
Are there cryptographic keys involved and who holds these?
Yes. Keys are involved in script signing and in the communication between the engine and the Python client. You download your keys through the Roseman Labs platform, and they are used to authenticate your actions as a user. The keys serve different purposes depending on your role: an approver’s key is used to sign off on the analyses that will be executed on sensitive data, while an analyst’s key is used to verify that the analyst who requested approval is the one who actually executes the analysis.
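Conceptually, this works like an ordinary digital signature: a script signed with your private key can be checked by the engine against your public key. The sketch below illustrates that principle with the third-party cryptography package; it is not Roseman Labs’ actual key format or signing flow.

```python
# Conceptual illustration of signing and verification with a personal key,
# using the third-party `cryptography` package; this is NOT Roseman Labs'
# actual key format or signing flow.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()   # in practice, a per-user key
public_key = private_key.public_key()

script = b"result = patients['age'].mean()"
signature = private_key.sign(script)         # approver/analyst signs the script

# The engine can verify the signature against the user's public key;
# verify() raises InvalidSignature if the script was altered.
public_key.verify(signature, script)
```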
Who controls the data?
The data owner retains full control over the data. The owner either approves computations before they are performed on the data or reviews them retrospectively. If the owner no longer wants to participate in the partnership, they simply withdraw their approval for any query. Because the data is never copied elsewhere, it cannot be misused or retained after the data owner has withdrawn approval.
How secure is Roseman Labs?
The engine operates on three servers in different clouds (OVHcloud, Fuga, and Scaleway), each controlled by different cloud admins. Due to the Multi-Party Computation protocol used, a majority of these parties would need to collude to reveal the underlying sensitive data. It is mathematically proven that no data can be revealed as long as the parties operating the servers do not collude. Additionally, all data is encrypted on the client side before it interacts with our platform, adding an extra layer of security from the very beginning. Security is further enhanced through our script approval process, ensuring that only approved analyses are executed on the sensitive data.
How can input data quality be ensured if it is not visible?
As part of our data request functionality, Roseman Labs includes robust data validation features. When a data request is created, the user can define validation rules such as specific column names, data types, and constraints like minimum and maximum values, required string values, or whether a column may be left empty. These validation checks are performed client-side, ensuring that the data meets the specified schema and rules before it is uploaded to our platform. This means the participant’s data is thoroughly validated on their device, maintaining privacy and ensuring quality without exposing the data.
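As an illustration of what such client-side validation amounts to (a hypothetical sketch, not the actual data request API), a schema of column names, types, and value ranges can be checked on the participant’s device before anything is uploaded:

```python
# Hypothetical sketch of client-side validation before upload; the actual
# data-request rules in the Roseman Labs platform may be defined differently.
import pandas as pd

schema = {
    "age":    {"dtype": "int64", "min": 0, "max": 120, "required": True},
    "smoker": {"dtype": "int64", "min": 0, "max": 1,   "required": False},
}

def validate(df: pd.DataFrame, schema: dict) -> list[str]:
    """Check column names, types, and value ranges on the participant's device."""
    errors = []
    for column, rules in schema.items():
        if column not in df.columns:
            if rules["required"]:
                errors.append(f"missing required column: {column}")
            continue
        if str(df[column].dtype) != rules["dtype"]:
            errors.append(f"{column}: expected dtype {rules['dtype']}")
        if df[column].min() < rules["min"] or df[column].max() > rules["max"]:
            errors.append(f"{column}: values outside [{rules['min']}, {rules['max']}]")
    return errors

print(validate(pd.DataFrame({"age": [34, 51], "smoker": [0, 1]}), schema))  # []
```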
Can I train machine learning models in Roseman Labs?
Yes, machine learning models can be trained in Roseman Labs. We support various Machine Learning functions, including binomial logistic regression, multinomial logistic regression, ordinal logistic regression, linear regression, and k-nearest neighbors. Our platform ensures secure and efficient model training on sensitive data, with comprehensive documentation available to guide you through the process.
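As a sketch of what model training can look like (assuming the scikit-learn-style interface described here; the module path and function names are illustrative only, see the crandas documentation for the exact API):

```python
# Sketch of training a model on secret-shared data, assuming the
# scikit-learn-style interface described above; the import path and method
# names are assumptions, not the confirmed crandas API.
import crandas as cd
from crandas.crlearn.logistic_regression import LogisticRegression  # assumed path

table = cd.DataFrame({
    "age":     [34, 51, 29, 62, 45],
    "smoker":  [1, 0, 1, 1, 0],
    "disease": [0, 1, 0, 1, 1],
})

X = table[["age", "smoker"]]
y = table[["disease"]]

model = LogisticRegression()   # binomial logistic regression
model.fit(X, y)                # training runs on the secret shares
```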
What are the optimal browser requirements and screen sizes for using Roseman Labs?
Currently, Roseman Labs is optimized for desktop use. Our platform is compatible with all common browsers, including the latest versions of Safari, Chrome, Brave, Edge, and Firefox.
How can I explore the data with Roseman Labs?
Given that direct access to the sensitive data is restricted, you can explore the data by first working with dummy data that mirrors the structure of the actual dataset, including the same data types and columns. While this approach won't allow you to analyze real distributions or assess data completeness directly, it enables you to develop and validate your scripts in a secure environment. Once your scripts are thoroughly tested, they can be submitted for approval within the platform. Upon approval, the scripts can be executed on the sensitive data.
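For example, you could generate dummy data with the same columns and data types as the real table, develop your script locally, and then submit that same script for approval:

```python
# Sketch of developing a script against dummy data that mirrors the real
# table's structure (same columns and types) before submitting it for approval.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
dummy = pd.DataFrame({
    "age":      rng.integers(18, 90, size=100),               # same dtype as the real column
    "postcode": rng.integers(1000, 9999, size=100).astype(str),
    "treated":  rng.integers(0, 2, size=100),
})

# Develop and test the analysis logic locally on the dummy table...
result = dummy.groupby("treated")["age"].mean()

# ...then submit the same script so it can run on the secret-shared table once approved.
```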
Who should approve my analysis?
Approvers should be selected based on their comfort and proficiency in reading Python code, as they approve analyses based on the description provided by the analyst and the submitted Python script. Additionally, they should be trustworthy and knowledgeable about the envisioned analysis results, as outlined in the collaboration agreement.
How do I analyze the data?
We have a Python package called crandas, which combines the functionalities of pandas and scikit-learn, allowing you to easily analyze data using multi-party computation. Additionally, we offer custom support to assist you with scripting and analysis. Our team is available to help you write and implement your analysis, ensuring it meets the necessary standards and complies with all security protocols.
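As a brief sketch of linking and aggregating two secret-shared tables (assuming crandas mirrors the pandas merge interface, as described above; exact call signatures may differ):

```python
# Sketch of linking two secret-shared tables, assuming crandas mirrors the
# pandas merge interface as described; exact call signatures may differ.
import crandas as cd

hospital = cd.DataFrame({"patient_id": [1, 2, 3], "diagnosis_code": [10, 20, 10]})
insurer  = cd.DataFrame({"patient_id": [2, 3, 4], "claim_eur": [500, 1200, 300]})

# The join happens on the secret shares; neither party sees the other's rows.
linked = cd.merge(hospital, insurer, on="patient_id")

# Only the approved aggregate is revealed, never the linked records themselves.
print(linked["claim_eur"].sum())
```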
What is the difference between analyzing anonymous and encrypted data?
When analyzing anonymous data, the information is typically stripped of identifiers, but reidentification is often still possible, especially with advanced techniques or in cases where data is not sufficiently aggregated. On the other hand, data in Roseman Labs is not anonymous by default but is protected using Multi-Party Computation. This involves encrypting the data into secret shares, which provides a much stronger form of protection compared to traditional anonymization. With encrypted data in MPC, you can securely link data across different datasets, a capability that is usually not possible with fully anonymous data.
Moreover, while our script approval process is robust and designed to prevent statistical disclosure, ensuring that the output data is truly anonymous depends on how the analyst applies aggregation thresholds and the checks made by the approver. To support this process, we offer a Python guide for approvers to help identify any potential disclosure risks in the analysis scripts.