AI solution in Screening Mammography Breast Cancer Detection
International organization in the medical domain
2 data scientists with expertise in machine learning
Python, PostgreSQL, PyTorch, Joblib, dicomsdl, AWS
About the project
Our task was to detect breast cancer in mammograms obtained during regular screening.
The core ML task was binary classification with highly imbalanced classes: the training set contained more than 53,000 negative samples and only 1,158 positive samples (with cancer).
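To make the imbalance concrete, the short calculation below derives the positive rate and the negative-to-positive ratio from the counts quoted above. It is only a sketch: "53,000+" is rounded down to 53,000, which is an assumption, and the variable names are illustrative.

```python
# Class counts from the text; "53,000+" is rounded down to 53,000 (assumption).
n_negative = 53_000
n_positive = 1_158

total = n_negative + n_positive
positive_rate = n_positive / total    # share of cancer cases in the training set
pos_weight = n_negative / n_positive  # negatives per positive

print(f"positive rate: {positive_rate:.2%}")               # positive rate: 2.14%
print(f"negative-to-positive ratio: {pos_weight:.1f}")     # negative-to-positive ratio: 45.8
```

A ratio of this kind is one common choice for the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, although the text does not say whether the project used a weighted loss.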
According to the World Health Organization, breast cancer is the most common cancer worldwide, with 2.3 million new diagnoses and 685,000 deaths in 2020 alone. However, breast cancer mortality in high-income countries has decreased by 40% since the 1980s due to regular mammography screening. Early detection and treatment are crucial in reducing cancer fatalities, and machine learning can help streamline the evaluation of screening mammograms by radiologists.
From a machine learning perspective, the task was challenging from the start because of the small number of positive samples.
Despite these difficulties, we achieved reasonably good results after building a solid training pipeline that included positive-class balancing, scaling, model selection, and post-processing.
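One common way to balance the positive class is to draw training samples with a probability inversely proportional to their class frequency. The sketch below illustrates that idea on toy labels. The project's actual balancing method is not described, so the helper `balanced_sample_weights` and the data are hypothetical.

```python
import random
from collections import Counter

def balanced_sample_weights(labels):
    """Weight each sample by the inverse frequency of its class."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

# Toy labels with a heavy imbalance (hypothetical data, 2% positives).
labels = [1] * 20 + [0] * 980
weights = balanced_sample_weights(labels)

# Draw indices with replacement; both classes now carry equal total weight.
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=weights, k=10_000)
positive_share = sum(labels[i] for i in drawn) / len(drawn)
print(f"positive share after resampling: {positive_share:.2f}")
```

In a PyTorch training loop, per-sample weights like these can be passed to `torch.utils.data.WeightedRandomSampler` so that each batch contains positives and negatives in roughly equal proportion.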
The final solution was an ensemble based on a voting strategy, with the final score averaged across the votes. It consisted of four straightforward steps: converting the DICOM files to PNG, running inference with three models using test-time augmentation (TTA), combining the ensemble outputs by averaging probabilities or voting, and thresholding.
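The DICOM-to-PNG step typically involves rescaling raw detector intensities to 8-bit values. The sketch below shows only that rescaling, in plain Python on nested lists. Reading the DICOM file and writing the PNG are left out, and the function name `to_uint8`, the min-max scaling, and the sample values are assumptions rather than the project's actual preprocessing. The inversion reflects the DICOM convention that `MONOCHROME1` images display low values as bright.

```python
def to_uint8(pixels, photometric="MONOCHROME2"):
    """Min-max scale a 2D list of raw intensities to the 0..255 range.

    MONOCHROME1 images store low values as bright, so they are inverted
    to match the MONOCHROME2 convention.
    """
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on constant images
    out = [[round(255 * (v - lo) / span) for v in row] for row in pixels]
    if photometric == "MONOCHROME1":
        out = [[255 - v for v in row] for row in out]
    return out

raw = [[0, 1024], [2048, 4095]]  # hypothetical 12-bit pixel values
img8 = to_uint8(raw)
img8_inverted = to_uint8(raw, photometric="MONOCHROME1")
print(img8)           # [[0, 64], [128, 255]]
print(img8_inverted)  # [[255, 191], [127, 0]]
```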
We've successfully implemented a pipeline with the following steps:
converting the DICOM files to PNG
running inference with the three models using TTA
averaging the ensemble probabilities or voting, and thresholding
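The inference steps can be sketched as follows. This is a minimal illustration, not the project's code: the three "models" are stand-in callables that return a cancer probability, the TTA set (identity plus horizontal flip) and the 0.5 threshold are assumptions, and all function names are hypothetical.

```python
from statistics import mean

def hflip(image):
    """Horizontally flip a 2D list image."""
    return [row[::-1] for row in image]

# Assumed TTA set: the original view plus a horizontal flip.
TTA_TRANSFORMS = [lambda img: img, hflip]

def predict_with_tta(model, image):
    """Average one model's probability over all TTA views."""
    return mean(model(t(image)) for t in TTA_TRANSFORMS)

def ensemble_predict(models, image, threshold=0.5, strategy="mean"):
    """Combine per-model TTA scores by mean probability or majority vote."""
    scores = [predict_with_tta(m, image) for m in models]
    if strategy == "mean":
        return mean(scores) >= threshold
    votes = sum(s >= threshold for s in scores)  # one vote per model
    return votes * 2 > len(models)

# Stand-in models (hypothetical); real ones would be trained networks.
model_a = lambda img: img[0][0]
model_b = lambda img: 0.7
model_c = lambda img: 0.25 * img[0][1]
models = [model_a, model_b, model_c]

image = [[0.4, 0.8]]  # toy one-row "image"
by_mean = ensemble_predict(models, image, strategy="mean")
by_vote = ensemble_predict(models, image, strategy="vote")
print(by_mean, by_vote)  # False True
```

With these toy scores (0.6, 0.7, and 0.15 after TTA), the mean probability falls below the threshold while two of the three models vote positive, which shows how the two combination strategies can disagree on the same case.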