MS1: How should I manage data?

Includes data protection, versioning, labelling

Short version

When dealing with data, check that the FAIR principles (Findable, Accessible, Interoperable, Reusable) are supported and, where possible, ensure that they are in place (FAIRsharing | Home).

A model user typically deals with data at two stages: the model building stage and the model usage stage.

 

Model building stage:

  1. Provide your domain area expertise and coordinate efforts with data scientists, architects and other stakeholders.

  2. Make sure you are included in decision-making processes concerning data and model usage.

  3. Prepare data for model training by curating and labelling it where needed.

 

Model usage stage:

  1. Before using the model, make sure that you know what data type and format it expects and how the data must be preprocessed.

  2. Good practice in using a machine learning model includes storing the data preprocessing details, the data and model versions, and the performance results (e.g. accuracy and run-time). This helps to make results replicable and makes it possible to compare the performance of different models.

  3. As a model user, ensure that your data is protected. For example, if model monitoring processes or software are in place, check what information they log and share.

  4. Human in the loop (HITL)?
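
Step 1 of the model usage stage (knowing what data type and format the model expects) can be checked automatically before any data is sent to the model. The following is a minimal sketch only: the field names and types in EXPECTED_FIELDS are hypothetical, and in practice they would come from the documentation of the model you are using.

```python
# Hypothetical input specification, for illustration only.
# A real model's documentation defines the actual fields and types.
EXPECTED_FIELDS = {"sample_id": str, "concentration": float, "replicate": int}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    # Check that every expected field is present and has the expected type.
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Flag fields the model does not expect, so they are not passed on silently.
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a user see everything that is wrong with a record in one pass.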

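The record keeping described in step 2 of the model usage stage (preprocessing details, data and model versions, performance results) might look like the sketch below. It is an assumption-laden illustration, not a prescribed format: the function names, the record fields and the choice of a JSON Lines log file are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_run_record(input_bytes: bytes, data_version: str, model_version: str,
                     preprocessing: list[str], metrics: dict) -> dict:
    """Collect the details needed to replicate and compare one model run.

    All field names are illustrative assumptions, not a standard schema.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        # A content hash identifies the exact input without storing the data itself.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "data_version": data_version,
        "model_version": model_version,
        "preprocessing": preprocessing,  # ordered list of steps applied
        "metrics": metrics,              # e.g. accuracy and run-time
    }


def append_run_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line to a log file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
```

A possible usage, with made-up values:

```python
record = build_run_record(
    input_bytes=b"sample_id,concentration\nS1,2.5\n",
    data_version="2024-01",
    model_version="1.0.0",
    preprocessing=["drop rows with missing values", "scale to unit variance"],
    metrics={"accuracy": 0.9, "run_time_s": 12.4},
)
append_run_record("model_runs.jsonl", record)
```

Storing a hash rather than the raw input also fits step 3: the log can show which data was used without exposing the data itself.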