June 22, 2026

Day 20 — Individual work

By Leona Francis

Reflection

Today was a difficult step: we began building our machine learning workflow independently, without starter notebooks or guided code. I connected cloud storage to my notebook and saved my work to the deliverables folder so that a runtime issue would not cost me any progress. Because correct file paths keep the code tidy, I defined the image directory, the metadata path, and the save folders at the start. After loading the master metadata file and inspecting it, I spent quite some time cleaning up missing values and dropping every sample whose ID did not match a file on disk. My demographic assignment required filtering for specific merged codes at both ends of the scale; after resetting the index, this gave me a cleaned and verified dataset. To make the validation of my results statistically sound, I ran two stratified splits on the target column, dividing the samples into 70% training, 15% validation, and 15% test subsets. I then exported each subset to its own file to preserve the splits and prevent accidental reshuffling in later experimental runs. Finally, I laid the foundation for the data pipeline by using built-in utilities to encode the string labels and by setting the standard image target dimensions and batch size. Although I ran out of time after the data pipeline parameter setup, this rigorous data preparation phase gave me essential hands-on experience with the fundamentals of computer vision research.
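For anyone following along, here is a minimal sketch of how a 70/15/15 split can come out of two stratified splits. It is not my actual notebook code. It uses only the Python standard library, and the names `stratified_split`, `image_id`, `label`, `group_a`, and `group_b` are placeholders I made up for illustration, as is the toy metadata.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_key, first_fraction, seed):
    """Split rows into two lists so each label keeps the same share in both."""
    # Group the rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    first, second = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * first_fraction)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


# Hypothetical metadata: 200 samples, 100 per class (placeholder labels).
rows = [
    {"image_id": f"img_{i:04d}", "label": "group_a" if i % 2 == 0 else "group_b"}
    for i in range(200)
]

# Split 1: 70% training, 30% held out.
train, held_out = stratified_split(rows, "label", 0.70, seed=42)
# Split 2: divide the held-out 30% in half, giving 15% validation and 15% test.
val, test = stratified_split(held_out, "label", 0.50, seed=42)

for name, subset in [("train", train), ("val", val), ("test", test)]:
    print(name, len(subset), dict(Counter(r["label"] for r in subset)))
# Subset sizes for this toy data: train 140, val 30, test 30.
```

The second split uses 0.50 rather than 0.15 because it operates on the held-out 30%, not on the full dataset. Libraries such as scikit-learn offer the same idea through the `stratify` argument of `train_test_split`; the hand-written helper here just keeps the sketch dependency-free.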