I want to understand the product better

This article explains the background knowledge you need to use the product: the components that make up the Virtual Data Lake and how they interact. Each section points to guides on how to use the component it describes.

Virtual Data Lake software components

The Virtual Data Lake consists of three components: an encryption engine (the MPC servers), which is responsible for the actual encryption and processing of the data, and two client modules that interact with it (the web portal and crandas). Each component is explained in detail below.

 


Figure 1: Roseman Labs software overview

MPC servers

A computation under MPC relies on a set of MPC servers. To perform an analysis on your data, you first need to encrypt (secret-share) the data and distribute it over these servers. The servers also verify that the scripts submitted for execution have been approved, and they execute those scripts on the data they hold. You can find a detailed explanation of the theory behind MPC and secret sharing here.

The MPC servers can be hosted either by Roseman Labs or on-premises by yourself. When Roseman Labs hosts the environment, it adheres to strict segregation of duties within the organization, and each server is hosted at a different cloud provider to protect the encrypted data.
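To make the idea of secret sharing more concrete, here is a minimal sketch of additive secret sharing in plain Python. It is an illustration only and not necessarily the scheme the product uses: the number of servers, the modulus, the example values and the function names `share` and `reconstruct` are all assumptions made for this example. It shows how values can be split into shares, how each server can compute on the shares it holds, and how only the combined result is revealed.

```python
import secrets

# Assumed parameters for illustration only; not taken from the product.
PRIME = 2**61 - 1   # modulus used for all arithmetic on shares
NUM_SERVERS = 3     # hypothetical number of MPC servers


def share(value, n=NUM_SERVERS, modulus=PRIME):
    """Split `value` into n random-looking shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n - 1)]
    # The last share is chosen so that all shares together add up to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=PRIME):
    """Combine all shares to recover the shared value."""
    return sum(shares) % modulus


# Hypothetical sensitive input values.
salaries = [52000, 61000, 48000]

# Secret-share every value; server i receives the i-th share of each value.
per_value_shares = [share(v) for v in salaries]
server_inputs = list(zip(*per_value_shares))

# Each server adds up the shares it holds, without seeing any actual salary.
server_sums = [sum(held) % PRIME for held in server_inputs]

# Only the combination of all server results reveals the total.
total = reconstruct(server_sums)
print(total)  # 161000
```

In this sketch, a single server's shares are just random numbers; the total becomes visible only once the results of all servers are combined.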

Web portal

The web portal is an online interface that runs in the browser. You can use it to assign roles and the associated rights to the individuals involved in the execution of an analysis. It also allows you to upload and delete data sources, request approval for analysis scripts, and approve or reject those scripts.

You can find a detailed explanation of all the roles in the web portal here. For more information about the web portal and how to use it, please check the web portal documentation in the knowledge base.

Crandas

To write analysis scripts and interact with the MPC servers, Roseman Labs developed a Python package called crandas. It uses roughly the same syntax as the popular Python package pandas, but delegates the computations to the MPC servers so that they are performed obliviously, that is, without the servers seeing the underlying data. More information about the similarities and differences between crandas and pandas can be found here.

Before you run an analysis on your data, you first need to design it. Roseman Labs therefore provides two types of environments: one for designing an analysis and one for actually executing it. For more information, please check this page in the knowledge base.

For detailed information about crandas, including how to install and use it, please see the crandas documentation portal.

 

Figure 2: Data flow

 
