The Bird Watch App is a deep learning computer vision system that uses a convolutional neural network (CNN), a type of deep learning model that works exceptionally well on computer vision tasks. BirdWatch was developed using Keras and TensorFlow, with Flask for the web application.
The Bird Watch model is currently built on top of the InceptionV3 deep learning model. InceptionV3 is an evolution of GoogLeNet, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014; InceptionV3 itself achieves a top-5 error rate of just 3.5% on ImageNet (see more on InceptionV3 here: https://arxiv.org/abs/1512.00567). With InceptionV3 as the base, a convolutional model was built and trained using transfer learning and fine-tuning techniques.
The system currently has a top-1 error rate of about 15% and a top-5 error rate of about 8% (i.e., 85% top-1 and 92% top-5 accuracy). Ongoing work aims to improve these figures.
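To make the two metrics concrete: a top-k prediction counts as correct if the true class is anywhere among the model's k highest-scoring classes. A minimal NumPy sketch (the probabilities and labels below are toy values, not BirdWatch outputs):

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores per row
    hits = [label in row for label, row in zip(labels, top_k)]
    return float(np.mean(hits))

# Toy predictions over 4 classes for 3 samples (illustrative values only)
probs = np.array([
    [0.10, 0.60, 0.20, 0.10],  # true class 1: top-1 hit
    [0.30, 0.30, 0.20, 0.20],  # true class 0: top-1 miss, top-3 hit
    [0.05, 0.05, 0.10, 0.80],  # true class 2: top-1 miss, top-3 hit
])
labels = [1, 0, 2]

print(top_k_accuracy(probs, labels, 1))  # 1/3: only the first sample is a top-1 hit
print(top_k_accuracy(probs, labels, 3))  # 1.0: every true label is in the top 3
```

This is why the top-5 error (8%) is lower than the top-1 error (15%): top-5 gives the model five guesses instead of one.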
One of the most challenging aspects of the Bird Watch project is gathering proper training data. The dataset currently contains around 20,000 labeled images of various bird species – and is growing. Both the training images and the input images are processed at 398×398 pixel resolution, which experimentation showed to give the best accuracy so far. The model was trained and then fine-tuned over many iterations, with model checkpointing and early stopping used to capture the best-performing training configuration. Data augmentation was also applied to the training dataset to reduce overfitting.
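In Keras, the augmentation, checkpointing, and early stopping described above can be wired up roughly as follows. The specific augmentation ranges, filenames, and patience values here are illustrative assumptions, not the project's actual settings:

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation: random rotations, shifts, zooms, and flips create
# varied views of each training image, which helps reduce overfitting.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Model checkpointing keeps the best weights seen so far; early stopping
# ends training once the validation loss stops improving.
callbacks = [
    ModelCheckpoint("birdwatch_best.h5", monitor="val_accuracy",
                    save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=5,
                  restore_best_weights=True),
]

# The generator yields augmented batches at the 398x398 training resolution,
# e.g. via train_datagen.flow(...) or flow_from_directory(...), and the
# callbacks list is passed to model.fit(..., callbacks=callbacks).
x = np.random.rand(2, 398, 398, 3)  # dummy images, stand-ins for real data
batch = next(train_datagen.flow(x, batch_size=2))
```

Because the augmented images are generated on the fly, the 20,000-image dataset is never duplicated on disk.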
The majority of the model training was done on a single machine with an Intel Core i7 8700 and an Nvidia RTX 2070, running Windows 10 and 64-bit Python 3.7.
The main code gist for the model training is shown below:
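A sketch of what such a training gist could look like, using the transfer learning and fine-tuning approach described above. The class count, head architecture, learning rates, and number of unfrozen layers are illustrative assumptions, not the project's actual values:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

NUM_CLASSES = 200  # placeholder; the real species count may differ

# 1. InceptionV3 base pre-trained on ImageNet, without its classification head,
#    at the 398x398 input resolution used by the project.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(398, 398, 3))

# 2. A new classification head for the bird-species classes.
x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

# 3. Transfer learning: freeze the pre-trained base and train only the head.
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_generator, validation_data=val_generator, ...)

# 4. Fine-tuning: unfreeze the top of the base and retrain at a low
#    learning rate so the pre-trained features are only gently adjusted.
for layer in base.layers[-50:]:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_generator, validation_data=val_generator, ...)
```

The two-phase structure matters: training the randomly initialized head first prevents large early gradients from destroying the pre-trained InceptionV3 features during fine-tuning.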
The full code of the project – including the code for training, inference, and the Flask web app – can be accessed from the GitHub repository of the project.