Understanding the script approval flow

In this article, we explain how the design and authorized environments interact to form the approval flow.

Design environment

An analyst is responsible for designing the script that will be executed. This is done in the design environment using crandas. The design environment can be any analytics platform in which Python packages can be installed and Python scripts can be executed, such as Jupyter Notebook, Spyder or Databricks. The design environment is connected to the engine, in which script signing is disabled. It can either run locally on the analyst's own device, or be provided by Roseman Labs and accessed via a remote connection.

Jupyter notebook running a Python script using crandas in the design environment

In the design environment, the script can be thoroughly tested using dummy data before it is applied to production data. This environment is also used to record the script that is eventually uploaded to the platform for approval.

When recording a script, the analyst uses the "script.record()" functionality of crandas to transcribe the steps that make up the analysis. These steps are then automatically stored in a .json file, which is uploaded to the authorized platform.

Script recording in design environment
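To make the idea of recording concrete, the following is a minimal conceptual sketch in plain Python. It does not use crandas and is not how crandas implements recording: the ToyRecorder class, its step() method and the operation names are hypothetical, chosen only to show how an ordered list of analysis steps could be transcribed into a .json document that approvers can inspect.

```python
import json


class ToyRecorder:
    """Hypothetical stand-in for a script recorder (NOT the crandas API)."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def step(self, operation, **params):
        # Append each operation with its parameters, preserving call order.
        self.steps.append({"operation": operation, "params": params})

    def to_json(self):
        # Serialize deterministically so the same steps always give the same text.
        return json.dumps(
            {"name": self.name, "steps": self.steps}, sort_keys=True, indent=2
        )


# Record a small, made-up analysis (handle and column names are placeholders).
recorder = ToyRecorder("mean income by region")
recorder.step("get_table", handle="TABLE_HANDLE")
recorder.step("filter", column="age", at_least=18)
recorder.step("mean", column="income")

script_json = recorder.to_json()
recorded = json.loads(script_json)
print(script_json)
```

The point of the sketch is that the recorded file captures both the operations and their order, which is what the approvers later review and sign.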

To test a script, the analyst needs data that is syntactically similar (same column names and same data types) to the actual source data that is processed in production. This is called "dummy data". Dummy data can be provided as a CSV file or as a DataFrame that the analyst creates, in combination with 'dummy_for'.
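As an illustration of what "syntactically similar" means, here is a small plain-Python sketch that compares dummy data against an expected schema. It is independent of crandas: the functions schema_of and schema_mismatches, the column names and the use of Python type names as data types are all assumptions made for this example.

```python
def schema_of(columns):
    """Map each column name to the Python type name of its values."""
    schema = {}
    for name, values in columns.items():
        types = {type(v).__name__ for v in values}
        if len(types) != 1:
            raise ValueError(
                f"column {name!r} must hold values of exactly one type, "
                f"found {sorted(types)}"
            )
        schema[name] = types.pop()
    return schema


def schema_mismatches(expected, dummy_columns):
    """Return a list of differences between dummy data and the expected schema."""
    actual = schema_of(dummy_columns)
    problems = []
    for name, type_name in expected.items():
        if name not in actual:
            problems.append(f"missing column {name!r}")
        elif actual[name] != type_name:
            problems.append(
                f"column {name!r} is {actual[name]}, expected {type_name}"
            )
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column {name!r}")
    return problems


# Hypothetical production schema: column names and data types only, no real data.
expected = {"age": "int", "income": "float", "region": "str"}

good_dummy = {"age": [31, 45], "income": [2500.0, 3100.5], "region": ["north", "south"]}
bad_dummy = {"age": ["31", "45"], "income": [2500.0, 3100.5]}

print(schema_mismatches(expected, good_dummy))
print(schema_mismatches(expected, bad_dummy))
```

A check like this only needs the column names and types of the production data, never the sensitive values themselves.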

Note that data processed in the design environment is not encrypted. Therefore, only dummy data should be used in the design environment, never sensitive data.

Authorized environment

The authorized environment is used to execute scripts that were created in the design environment and have been approved. Similar to the design environment, the authorized environment is a local setup of crandas. However, in contrast to the design environment, the authorized environment has script signing (authorized mode) enabled. The authorized environment is where the data providers upload their sensitive data via the platform.

In the authorized environment, a script can only be executed once it has been signed by all approvers in the platform using their private keys. This generates a ".approved" file, which can be used to execute the script on real data in the authorized environment. Given a valid approval file, the script can then be executed on real data, with the steps executed in exactly the same order as they were recorded. As such, only approved scripts can make use of the data that was uploaded by the data providers.

Running an approved script on production data
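The two conditions described above (every approver must sign, and steps must run in the recorded order) can be sketched in plain Python as follows. This is a conceptual model only and not the platform's actual mechanism: HMAC with a per-approver secret is used here as a stand-in for real private-key signatures, and the names script_digest, sign, is_approved and ApprovedRun are hypothetical.

```python
import hashlib
import hmac
import json


def script_digest(steps):
    """Fingerprint of the recorded steps; any change to the steps changes it."""
    return hashlib.sha256(json.dumps(steps, sort_keys=True).encode()).hexdigest()


def sign(secret_key, digest):
    # ASSUMPTION: HMAC stands in for a real private-key signature scheme.
    return hmac.new(secret_key, digest.encode(), hashlib.sha256).hexdigest()


def is_approved(steps, approver_keys, signatures):
    """True only if every approver has a valid signature over these exact steps."""
    digest = script_digest(steps)
    return all(
        name in signatures
        and hmac.compare_digest(signatures[name], sign(key, digest))
        for name, key in approver_keys.items()
    )


class ApprovedRun:
    """Allows steps to run only in exactly the order in which they were recorded."""

    def __init__(self, steps):
        self._remaining = list(steps)

    def execute(self, step):
        if not self._remaining or self._remaining[0] != step:
            raise PermissionError(f"step {step!r} does not match the approved script")
        self._remaining.pop(0)


steps = ["get_table", "filter", "mean"]
keys = {"alice": b"alice-key", "bob": b"bob-key"}  # made-up approvers
digest = script_digest(steps)
signatures = {name: sign(key, digest) for name, key in keys.items()}

print(is_approved(steps, keys, signatures))                        # all approvers signed
print(is_approved(steps, keys, {"alice": signatures["alice"]}))    # one signature missing
print(is_approved(steps + ["export"], keys, signatures))           # script was changed

run = ApprovedRun(steps)
run.execute("get_table")
try:
    run.execute("mean")  # skips "filter", so it is rejected
    out_of_order_rejected = False
except PermissionError:
    out_of_order_rejected = True
print(out_of_order_rejected)
```

In this model, changing the script after approval or running its steps in a different order both cause execution to be refused.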

Below is a video that walks you through accessing datasets using the design script, requesting approval and executing the final analysis on production data.

 

For more information on using crandas, please refer to our documentation.