Multi-label dataset train/test split
Context
My girlfriend’s master’s thesis was on the Perception of Anthropomorphic Traits in Cars. She wanted to create a questionnaire to test her hypothesis.
She had acquired a dataset of images of cars, and after selecting the images
that could be used in the questionnaire, she created a spreadsheet with the file name
and the features for each car, such as the size of the grille, the shape of the
headlights, etc.
She needed to select 10 images for the questionnaire, and those images had to
be representative of the different classes of the various labels, e.g. Bumper
Shape: upturned lower edge-straight upper edge or Headlights Position: only
upper.
Approach
To tackle this, I used scikit-multilearn to split the stimuli into train and test sets in which all labels were represented, and used one of the two sets for the questionnaire.
You can find the notebook with the code here.