Detecting Brain Tumors using Artificial Intelligence
Deep Learning
Problem
Over 700,000 Americans are living with a brain tumor today, and studies estimate that more than 84,000 people will be diagnosed with a primary brain tumor in 2021 [1]. Diagnosing a brain tumor usually begins with a Magnetic Resonance Imaging (MRI) scan. The results are then reviewed by a neurologist to determine whether a tumor is present.
Using artificial intelligence to detect a brain tumor from an MRI scan would save money and, most importantly, time. Not only that, it could also reduce human error in tumor detection. With today’s ever-growing population, it is imperative that doctors use technology to read brain scans in a timely fashion. In this article, we will build a Convolutional Neural Network (CNN) model to classify MRI brain scans.
Analysis
Dataset
For the task at hand, we chose the Brain MRI Images for Brain Tumor Detection dataset from Kaggle. This dataset contains 155 images of MRI scans with brain tumors and 98 without any tumor. The brain tumor scans are in the folder labelled “yes” and the healthy brain images are in the “no” folder.
Below are sample scans showing a brain without a tumor (left) and a brain with a tumor (right).
Data Preprocessing
All images are grayscale, so when imported the R, G, and B channels hold identical values and we can use any one channel for our model. However, the images do not all have the same dimensions, so they must be resized or padded before being passed into the neural network. Additionally, we normalize the images by scaling pixel values from the 0–255 range down to the 0–1 range.
Features and Labels
The input images will be used to train the model to classify MRI scans as having a tumor or not. The images are the features for the algorithms, and a label of 0 or 1 marks each scan as no tumor or tumor respectively.
We defined a preprocess_data function that takes in the path where the images are stored and a desired shape to resize the images to. After reading, resizing and scaling the images, it stores them in an array X (features) and their corresponding labels in an array y (labels).
import os
import cv2
import numpy as np

def preprocess_data(path, img_size):
    '''Reads in images classified into folders, resizes and scales them.
    Returns those processed images as features and their associated labels.

    Arguments:
        path (str) - path to classified image folders
        img_size (tuple) - tuple containing resized image height and width

    Returns:
        X (array) - features (brain scan images)
        y (array) - feature labels (0 - no tumor, 1 - tumor)
    '''
    unsuccessful_files = {}
    X = []
    y = []
    for folder_name in os.listdir(path):
        # scans in the 'no' folder are healthy (label 0), the rest have tumors (label 1)
        if folder_name == 'no':
            label = 0
        else:
            label = 1
        folder_path = os.path.join(path, folder_name)
        for fname in os.listdir(folder_path):
            fpath = os.path.join(folder_path, fname)
            try:
                img = cv2.imread(fpath)
                img = cv2.resize(img, img_size)
                img = img / 255.0
                X.append(img)
                y.append(label)
            except Exception as e:
                unsuccessful_files[fname] = e
    if unsuccessful_files:
        print('Error processing the following files:\n')
        for index, key in enumerate(unsuccessful_files, 1):
            print(f'{index}. {key} - {unsuccessful_files[key]}')
    else:
        print('Successfully processed all images.')
    X = np.array(X)
    y = np.array(y)
    return X, y
Algorithms
The objective of this project is to build a Convolutional Neural Network (CNN) model that accurately classifies MRI brain scans as having a tumor or not. As a benchmark, we will compare our neural network model performance to a Support Vector Machine image classifier. The contributor to this dataset on Kaggle does not specify the data source and hence we will not be able to compare our model performance to a published benchmark.
Train Test Split
We use scikit-learn’s train_test_split to split the data into training (75%), validation (12.5%) and test (12.5%) sets.
# split data into train, validation and test sets
from sklearn.model_selection import train_test_split

X_train, X_test_val, y_train, y_test_val = train_test_split(X, y, test_size=0.25, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_test_val, y_test_val, test_size=0.5, random_state=42)
Visualizing the first 9 images in our training set along with their labels:
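The plotting code isn’t shown in the original post; below is a minimal matplotlib sketch of how such a grid could be produced, assuming the X_train and y_train arrays from the split above.

import matplotlib.pyplot as plt

# a minimal sketch: display the first 9 training images with their labels
plt.figure(figsize=(8, 8))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_train[i])
    plt.title('tumor' if y_train[i] == 1 else 'no tumor')
    plt.axis('off')
plt.show()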
Implementation
Convolutional Neural Network
- Build the Model
We build a sequential model consisting of three convolution blocks (16, 32 and 64 filters, respectively), each with a max pooling layer. These layers act as the feature extractors. Their output is passed through a dropout layer, flattened, and then fed into a fully connected layer with 128 units and ReLU activation. Finally, a dense layer with sigmoid activation outputs the probability that a tumor is present.
# create the model
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

model = Sequential([
    layers.Input((img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
- Compile the Model
We then compile the model using the Adam optimizer, binary cross-entropy loss, and accuracy as the metric.
# compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
- Train the Model
We train with a batch size of 32 for up to 20 epochs. We add an early stopping callback to prevent overfitting; the dropout layer in our sequential model should help guard against overfitting as well.
# train the model
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=4)

history = model.fit(X_train,
                    y_train,
                    batch_size=32,
                    validation_data=(X_val, y_val),
                    epochs=20,
                    callbacks=[early_stop])
- Test the Model
We test the model by making predictions on our test set. Note that the model outputs a probability between 0 and 1, so we round this value to obtain a class label for each prediction.
# make predictions on the test set
y_pred = model.predict(X_test)
y_pred = np.squeeze(y_pred).round().astype(int)

# classification report
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, y_pred))
Looking at the classification report, our model gave us 75% accuracy and 75% weighted average recall. The performance is not the greatest, mainly because we do not have enough samples: our dataset has only 253 images in total.
- Data Augmentation
To improve our model performance, we use data augmentation to increase the sample size. We start by defining a data augmentation layer that applies random flips, rotations and zooms to an image.
# define augmentation layer
data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    layers.experimental.preprocessing.RandomRotation(0.2),
    layers.experimental.preprocessing.RandomZoom(0.1)
])
We then define an augment_image function that takes in an input image and returns a list of a specified number of augmented images.
def augment_image(image, n_augmented_images):
    '''Returns a list of augmented images for the given input image.

    Arguments:
        image (array) - input image
        n_augmented_images (int) - number of augmented images to return

    Returns:
        images (list) - list of augmented images
    '''
    # add a batch dimension so the augmentation layer can process the image
    image = tf.expand_dims(image, 0)
    images = []
    for i in range(n_augmented_images):
        augmented_image = data_augmentation(image)
        images.append(np.array(augmented_image[0]))
    return images
We generate 12 augmented images for each image in our dataset, resulting in a total of 3289 image samples. That’s a lot more than we had in our original dataset. Visualizing the first 9 images in our training set, we can see a blend of the original and augmented images.
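The assembly loop isn’t shown in the post; here is a sketch of how the augmented dataset could be built with augment_image (the X_aug and y_aug names are assumptions):

# sketch: build the augmented dataset from the originals
X_aug, y_aug = list(X), list(y)
for image, label in zip(X, y):
    for augmented in augment_image(image, 12):  # 12 augmented copies per image
        X_aug.append(augmented)
        y_aug.append(label)
X_aug = np.array(X_aug)  # 253 + 253 * 12 = 3289 samples
y_aug = np.array(y_aug)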
After training and testing the model on the augmented dataset, we get an accuracy of 86% and a weighted average recall of 86%. That’s roughly a 15% relative improvement in model performance.
- Hyperparameter Tuning
We try to improve upon the base CNN model by evaluating a range of values in the hyperparameter space against the validation set. For this, we use the Keras Hyperband tuner to find the best hyperparameters.
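The tuner code isn’t included in the post; the sketch below shows how a Hyperband search over the layer sizes might look with the keras-tuner package. The search ranges and hyperparameter names are assumptions, not the exact values used.

import keras_tuner as kt

def build_model(hp):
    # search over the number of filters/units in each layer (ranges assumed)
    model = Sequential([
        layers.Input((img_height, img_width, 3)),
        layers.Conv2D(hp.Int('conv_1', 16, 64, step=16), 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(hp.Int('conv_2', 32, 128, step=32), 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(hp.Int('dense', 64, 256, step=64), activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    return model

tuner = kt.Hyperband(build_model, objective='val_accuracy', max_epochs=20)
tuner.search(X_train, y_train, validation_data=(X_val, y_val))
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]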
After building, training and testing the model with the best hyperparameters, we obtained 91% accuracy and 91% weighted average recall, a ~6% improvement over our previous model.
Support Vector Classifier
We will compare our model performance to a Support Vector Machine (SVM) classifier as a benchmark, again using the augmented dataset. The shape of the features needs a small tweak before we can use an SVM: each image of size (128, 128, 3) must be flattened into a one-dimensional vector before fitting.
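A minimal sketch of that reshaping, reusing the X_train and X_test arrays from the split above (reassigned in place so the SVM code below can use the same names):

# flatten each (128, 128, 3) image into a single 1-D feature vector
X_train = X_train.reshape(len(X_train), -1)  # shape: (n_samples, 49152)
X_test = X_test.reshape(len(X_test), -1)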
Once that is done, we create an SVM classifier and train it.
from sklearn import svm

# create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# fit the classifier on the training set
clf.fit(X_train, y_train)
Testing the SVM classifier on our test set resulted in an accuracy and weighted average recall of 79%.
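The evaluation code isn’t shown in the post; a short sketch, assuming the same classification_report import used earlier:

# evaluate the SVM benchmark on the test set
y_pred_svm = clf.predict(X_test)
print(classification_report(y_test, y_pred_svm))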
Results
To summarize our work, we started with the original dataset and trained a CNN model to classify brain scans with and without tumors. We quickly realised that the model performance wasn’t the greatest. This was attributed to the small sample size of the original dataset. We then used image augmentation to generate augmented images for each image in the original dataset resulting in 3289 samples as compared to the 253 images we had in our original dataset.
We then optimized our model performance further using hyperparameter tuning to find the best hyperparameters.
Finally, we built and trained a SVM classifier as a benchmark. Since we did not use grid search to tune the SVM classifier’s hyperparameters, we will compare its performance to our CNN model prior to hyperparameter tuning. Note that data augmentation was used in both cases.
Although we did get a good accuracy and recall of 91% with our tuned CNN model, there’s room for improvement. Here are a few suggestions to improve performance:
- Crop augmented images to remove artifacts we don’t want. We can use OpenCV to find the largest contour and crop away the rest (see the sketch after this list).
- Use a better dataset with more brain scan images so that we don’t have to rely heavily on data augmentation.
- Extend the hyperparameter search: we only tuned the number of units in each layer, so the number of epochs, batch size and number of convolution layers could be optimized as well.
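As a sketch of the first suggestion, here is one way the largest-contour crop could be done with OpenCV. The threshold value and blur settings are assumptions, and the function expects an 8-bit image as read by cv2.imread, before normalization.

def crop_largest_contour(img):
    '''Sketch: crop an MRI scan to the bounding box of its largest contour.'''
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)  # threshold assumed
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return img  # nothing found; return the image unchanged
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return img[y:y + h, x:x + w]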
All in all, this was a fun project and I hope you enjoyed walking through the workflow with me. If you have any suggestions on other ways to improve model performance do let me know. The code for this work can be found here.
References
- Porter KR, McCarthy BJ, Freels S, Kim Y, Davis FG. Prevalence estimates for primary brain tumors in the United States by age, gender, behavior, and histology. Neuro-Oncology 12(6):520–527, 2010.
There’s a ‘can’ in Cancer because we CAN beat it!