MS1: How should I manage data?

Includes data protection, versioning and labelling.

Short version

When dealing with data, check and, where possible, ensure that the FAIR principles (Findable, Accessible, Interoperable, Reusable) are supported and in place (see FAIRsharing).

A model user typically deals with data at two stages: the model building stage and the model usage stage.

 

Model building stage:

  1. Contribute your domain expertise and coordinate efforts with data scientists, architects and other stakeholders.

  2. Ensure that you are included in decision-making processes concerning data and model usage.

  3. Prepare data for model training by curating and labelling it where needed.
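
As an illustration of the curation step, here is a minimal sketch of a label quality check. It assumes the labelled data is a list of (text, label) pairs and that the set of allowed labels is known in advance; the function name `curation_report` and the record layout are hypothetical and not part of any particular tool.

```python
from collections import Counter


def curation_report(records, allowed_labels):
    """Summarise common labelling problems in a list of (text, label) pairs.

    Hypothetical helper: the record layout and report keys are assumptions.
    """
    # Positions of records that have no label at all.
    missing = [i for i, (_, label) in enumerate(records) if label in (None, "")]

    # Positions of records whose label is not in the agreed label set.
    unknown = [
        i for i, (_, label) in enumerate(records)
        if label not in (None, "") and label not in allowed_labels
    ]

    # Inputs that appear more than once with different labels.
    labels_by_text = {}
    for text, label in records:
        if label not in (None, ""):
            labels_by_text.setdefault(text, set()).add(label)
    conflicting = sorted(t for t, ls in labels_by_text.items() if len(ls) > 1)

    # Class balance over the valid labels only.
    counts = Counter(label for _, label in records if label in allowed_labels)

    return {
        "missing": missing,
        "unknown": unknown,
        "conflicting": conflicting,
        "counts": dict(counts),
    }


if __name__ == "__main__":
    sample = [("a", "cat"), ("b", None), ("a", "dog"), ("c", "bird"), ("d", "cat")]
    print(curation_report(sample, allowed_labels={"cat", "dog"}))
```

A report like this gives the domain expert a concrete list of records to review before the data is handed over for training.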

 

Model usage stage:

  1. Before using the model, make sure that you know what data type and format it expects and how the data must be preprocessed.

  2. Good practice when using a machine learning model includes recording the data preprocessing details, the data and model versions, and the performance results (e.g. accuracy and run time). This helps to ensure the replicability of the results and makes it possible to compare the performance of different models.

  3. As a model user, ensure that your data is protected. For example, if a model monitoring process or monitoring software is in place, check what information it logs and shares.

  4. Human in the loop (HITL)?
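
The record keeping described in item 2 of the model usage stage could look roughly like the sketch below. It assumes the input data is available as bytes, that a SHA-256 hash is an acceptable stand-in for a data version, and that records are appended to a JSON Lines file; the function names, field names and file layout are hypothetical choices, not a prescribed format.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact version of the data."""
    return hashlib.sha256(data).hexdigest()


def make_run_record(data: bytes, model_version, preprocessing, metrics):
    """Collect the details needed to replicate and compare a model run.

    Hypothetical structure: the field names are assumptions for illustration.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data_sha256": fingerprint(data),
        "model_version": model_version,
        "preprocessing": preprocessing,  # e.g. an ordered list of step names
        "metrics": metrics,              # e.g. accuracy and run time
    }


def append_record(path, record):
    """Append one record as a single JSON line so the history is never overwritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    record = make_run_record(
        data=b"feature_a,feature_b\n1,2\n",
        model_version="1.2.0",
        preprocessing=["drop_missing", "standardise"],
        metrics={"accuracy": 0.91, "runtime_s": 3.4},
    )
    append_record("runs.jsonl", record)
```

Because the record contains only a hash of the data rather than the data itself, it can usually be shared without exposing the underlying content, although whether that is sufficient protection depends on the data in question.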