Understanding the script approval flow

In this article, we explain how the design and authorized environments interact to form the approval flow.

Design environment

An analyst is responsible for designing the script that will be executed. This is done in crandas, using a design environment. Crandas can run on any analytics platform where Python packages can be installed and Python scripts can be executed, such as Jupyter Notebook, Spyder or Databricks. It generally runs locally on the analyst's own device. This Python analytics platform is connected to the design environment: a platform and engine pair in which script signing is disabled.


Jupyter notebook running python script using crandas in design environment

In the design environment, the script can be thoroughly tested using dummy data before it is applied to production data. This environment is also used to record the script that is eventually uploaded to the authorized platform for approval.

When recording a script, the analyst uses the "script.record()" functionality of crandas to transcribe the steps that make up the analysis. These steps are then automatically stored in a .json file, which is uploaded to the authorized platform.


Script recording in design environment
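
The idea of transcribing analysis steps into a .json file can be illustrated with a small toy model. The sketch below is not the crandas API: the class ToyRecorder, its methods, and the step names are all hypothetical and exist only to show how an ordered list of steps might be captured and serialized.

```python
import json


class ToyRecorder:
    """Hypothetical stand-in for a script recorder; not the crandas API."""

    def __init__(self):
        self.steps = []

    def record(self, operation, **params):
        # Append one analysis step, preserving the order in which it was issued.
        self.steps.append({"operation": operation, "params": params})

    def to_json(self):
        # Serialize deterministically so the same steps always yield the same text.
        return json.dumps({"steps": self.steps}, sort_keys=True, indent=2)


recorder = ToyRecorder()
recorder.record("get_table", handle="<table handle>")  # placeholder handle
recorder.record("filter", column="age", at_least=18)   # made-up step
recorder.record("mean", column="income")               # made-up step

recording = recorder.to_json()
print(recording)
```

In this toy model the recording is kept as a string; writing it to a .json file would be the natural next step before uploading it for approval.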

To test a script, the analyst needs data that is structurally similar (same column names and same data types) to the actual source data that is processed in production. This is called "dummy data". Dummy data can take the form of a CSV file or a DataFrame that you create, in combination with using 'dummy_for'.
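
As a minimal sketch of what "same column names and same data types" can look like in practice, the example below generates a small dummy CSV from a schema using only the Python standard library. The schema, the column names, and the helper functions are assumptions made for illustration; they do not come from crandas or from any real dataset.

```python
import csv
import io
import random

# Hypothetical schema of the production table: column name -> Python type.
SCHEMA = {"patient_id": int, "age": int, "diagnosis": str}


def make_dummy_rows(schema, n_rows, seed=0):
    """Generate random rows whose columns and types match the schema."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, col_type in schema.items():
            if col_type is int:
                row[name] = rng.randint(0, 100)
            elif col_type is float:
                row[name] = rng.random()
            else:
                row[name] = "dummy_" + str(rng.randint(0, 9))
        rows.append(row)
    return rows


def to_csv(schema, rows):
    """Write the rows to CSV text with the schema's column order as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schema))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dummy_rows = make_dummy_rows(SCHEMA, n_rows=5)
dummy_csv = to_csv(SCHEMA, dummy_rows)
print(dummy_csv)
```

Because the values are random, nothing in the resulting file reveals anything about the real data; only the structure is shared.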

Note that data that is processed in the design environment is not encrypted. Therefore, one should only work with dummy data here and never use sensitive data in the design environment.

Deployment overview

Authorized environment

The authorized environment is used to actually execute scripts that were created in the design environment and have been approved. Like the design environment, the authorized environment is connected to a local setup of crandas. In contrast to the design environment, however, the authorized environment has script signing/authorized mode enabled. The authorized environment is where the data providers upload their sensitive data via the platform.

In the authorized environment, a script can only be executed once it has been signed by all approvers in the platform using their private keys. This generates an "APPROVED" file, which is used to execute the script on real data in the authorized environment. Given a valid approval file, the script can then be executed on real data, with its steps run in exactly the same order in which they were recorded. As such, only approved scripts can make use of the data that was uploaded by the data providers.


Running an approved script on production data
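
The two conditions described above, approval by every approver and execution in the recorded order, can be sketched as a toy check. This is not how crandas or the platform implements approval: HMAC with shared keys is used here only as a stand-in for private-key signatures, and the approver names, keys, and steps are hypothetical.

```python
import hashlib
import hmac
import json


def digest(steps):
    # Hash a canonical JSON encoding so any change to a step changes the digest.
    encoded = json.dumps(steps, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sign(steps, key):
    # HMAC is a stand-in; a real deployment would use private-key signatures.
    return hmac.new(key, digest(steps).encode("utf-8"), hashlib.sha256).hexdigest()


def is_approved(steps, signatures, approver_keys):
    # Every approver must have a valid signature over exactly these steps.
    return all(
        name in signatures
        and hmac.compare_digest(signatures[name], sign(steps, key))
        for name, key in approver_keys.items()
    )


recorded = [
    {"operation": "get_table", "params": {"handle": "<table handle>"}},
    {"operation": "mean", "params": {"column": "income"}},
]
approver_keys = {"alice": b"key-a", "bob": b"key-b"}  # hypothetical approvers
signatures = {name: sign(recorded, key) for name, key in approver_keys.items()}

reordered = list(reversed(recorded))

print(is_approved(recorded, signatures, approver_keys))   # same steps, same order
print(is_approved(reordered, signatures, approver_keys))  # same steps, other order
```

Since the signatures cover the ordered list of steps, running the same steps in a different order no longer matches the approval, and an approval that lacks any one approver's signature is rejected as well.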

Below is a video that walks you through accessing datasets using the design script, requesting approval and executing the final analysis on production data.

 

For more information on using crandas, please refer to our documentation.