How do I create an analysis?

    Once other participants in the collaboration have uploaded datasets to the engine, you can begin your analysis. This article walks you through the steps to start using that data as quickly as possible. Let's get started!


    1. Navigate to the "Data" tab on the left sidebar. Identify which dataset you would like to access and click the "Eye" button on the right of the row. 


     

    2. Once you have identified the dataset that you wish to use, copy its handle.
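    A copied handle can easily pick up stray whitespace or lose a character. The sketch below is a small, hypothetical helper (`clean_handle` is not part of crandas) that tidies a pasted handle and checks its shape. It assumes handles are 64 hexadecimal characters, which is only inferred from the example handle used in this article.

```python
import re

# Assumption: handles are 64 hexadecimal characters, as in this article's example.
HANDLE_PATTERN = re.compile(r"[0-9A-Fa-f]{64}")


def clean_handle(raw: str) -> str:
    """Strip surrounding whitespace and quotes, then check the handle's shape."""
    handle = raw.strip().strip("'\"")
    if not HANDLE_PATTERN.fullmatch(handle):
        raise ValueError(f"Unexpected handle format: {handle!r}")
    return handle


# Example: a handle pasted with extra whitespace around it.
handle = clean_handle(
    "  DE36C412502576F60293635BCC4080C498FD5E1C09E81064549C9783C9EC7C9A \n"
)
print(handle)
```

    If the check fails, copy the handle again from the "Data" tab rather than editing it by hand.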

     

    3. Next, head to the crandas design environment and begin designing a script using either a dummy data CSV or a dummy DataFrame. Both options are shown below. The dummy data must be defined before calling 'script.record()', because it stands in for the real data and must therefore have the same schema as the real data (same column names, data types, etc.).

    import crandas as cd

    # Option 1 - Create a Dummy Dataframe with the same columns as the original data
    dummy_data1 = cd.DataFrame({'Fruit':['apple','pear','banana','apple'],
                                'Prijs':[2,3,4,2]}, dummy_for='DE36C412502576F60293635BCC4080C498FD5E1C09E81064549C9783C9EC7C9A')

    # Option 2 - upload a csv of dummy data to use
    dummy_data2 = cd.read_csv('PATH/TO/dummy_data.csv', dummy_for='DE36C412502576F60293635BCC4080C498FD5E1C09E81064549C9783C9EC7C9A') 

    You only need to do one of these options, not both. 
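    If you choose Option 2 and do not yet have a dummy CSV, one way to create it is with Python's standard `csv` module. This is a minimal sketch: the file name `dummy_data.csv` is a placeholder, and the rows simply mirror the dummy DataFrame from Option 1. Replace the columns with those of your real dataset.

```python
import csv

# Column names and values mirror the dummy DataFrame in Option 1.
rows = [
    {"Fruit": "apple", "Prijs": 2},
    {"Fruit": "pear", "Prijs": 3},
    {"Fruit": "banana", "Prijs": 4},
    {"Fruit": "apple", "Prijs": 2},
]

# 'dummy_data.csv' is a placeholder name; save the file wherever suits you.
with open("dummy_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Fruit", "Prijs"])
    writer.writeheader()
    writer.writerows(rows)

# Read the header back to confirm the column names match the real schema.
with open("dummy_data.csv", newline="", encoding="utf-8") as f:
    header = next(csv.reader(f))
print(header)
```

    The path you pass to `cd.read_csv` should then point at the file written here.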

     

    4. Call 'script.record()' and add any analysis that you wish to run on the data (please refer to the crandas documentation). When executed, this example prints a mean of 2.75, computed on the dummy dataset.

    script = cd.script.record()

    table = cd.get_table('DE36C412502576F60293635BCC4080C498FD5E1C09E81064549C9783C9EC7C9A') 

    # Add any other analyses here - the below is an example
    print(table['Prijs'].mean())

    script = cd.script.save('analysis.json')

    Make sure that you replace the handle with the handle of the actual dataset that you wish to access, and that the column names and data types of your dummy data are identical to those of the real dataset.
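    To sanity-check the value printed during recording, you can compute the same mean locally with plain Python. This sketch does not use crandas; it only repeats the arithmetic on the dummy 'Prijs' values from Option 1.

```python
from statistics import fmean

# The dummy 'Prijs' column from the example DataFrame.
dummy_prijs = [2, 3, 4, 2]

# (2 + 3 + 4 + 2) / 4 = 11 / 4
expected_mean = fmean(dummy_prijs)
print(expected_mean)  # 2.75
```

    If the recorded script prints a different value, the dummy data it used is probably not the data you intended.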

     

    5. The `script = cd.script.save('analysis.json')` call at the end of the script produces the 'analysis.json' file, which you can then download.

     

    6. You can then right-click on the 'analysis.json' file and click 'download'.

    For this example, we used the dummy DataFrame to record the script, so the output shown during recording is computed on the dummy dataset we created.

     

    7. Go back to the platform and create a new analysis for approval, then upload the '.json' file that you just downloaded.

    8. Once the approver has approved your script, you can download the resulting '.approved' file and execute it on the real data.

     

    9. Finally, head to the crandas authorized environment and upload both the '.approved' file and your private key. This applies if you are working in a hosted Jupyter environment; if you are working locally, simply reference the locations where the key and the '.approved' file are stored on your system.

    Then adjust the script to remove the dummy data, reference your analyst key and load your '.approved' file as shown below. When executing on the real data, the mean is 3.75 rather than 2.75, which shows that the production data was used this time.

    import crandas as cd
    cd.base.session.analyst_key = '/home/jovyan/work/private_key_internal_dev.sk'

    script = cd.script.load('analysis.approved')

    table = cd.get_table('DE36C412502576F60293635BCC4080C498FD5E1C09E81064549C9783C9EC7C9A') 

    # Add any other analyses here - the below is an example
    print(table['Prijs'].mean())

    script.close()
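    Before running the approved script above, it can help to confirm that the private key and the '.approved' file are where the script expects them. The sketch below uses only the standard library; `missing_files` is a hypothetical helper, and the paths are the ones from this example, so adjust them to your own environment.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the given paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Paths taken from the example above; replace them with your own locations.
required = [
    "/home/jovyan/work/private_key_internal_dev.sk",
    "analysis.approved",
]

missing = missing_files(*required)
if missing:
    print("Missing before running the approved script:", missing)
else:
    print("All required files found.")
```

    If anything is reported missing, upload the file again (hosted environment) or correct the path (local setup) before executing the script.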

    Hopefully this has explained how you can access a dataset using crandas!


    Tip: You can copy the handle easily by clicking the handle associated with the dataset you want to access through crandas. 



    If you want to learn more about crandas - click here to visit our documentation portal. 


    Thank you for taking the time to read this article. If you have feedback or would like more information on specific topics, leave your comments below or reach out to support@rosemanlabs.com