Bird Watch: How it Works

Bird Watch is a Deep Learning Computer Vision application, developed using Keras and TensorFlow, with Flask for the web application.

What is Deep Learning?

Deep Learning is a subset of Machine Learning that focuses on algorithms inspired by our understanding of how the brain works to acquire knowledge.

Intelligent Machines: The idea that machines can one day be as intelligent as humans.

Artificial Intelligence: The formal research field for which the goal is to build intelligent machines.

Machine Learning: A subset of Artificial Intelligence which aims at providing machines the ability to learn without explicit programming.

Deep Learning: A subset of Machine Learning in which Hierarchical Feature Learning (inspired by how our brains work) is used to obtain knowledge.

Read more about Deep Learning and its capabilities here: Codes of Interest: What is Deep Learning?


The history of deep learning. (Image source: Codes of Interest: What is Deep Learning?)

Bird Watch is a Deep Learning Computer Vision system that uses a Convolutional Neural Network (a type of deep learning model that works exceptionally well for vision tasks).

The Bird Watch model is currently built on top of the InceptionV3 deep learning model. InceptionV3 is an evolution of GoogLeNet (Inception v1), which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014. The InceptionV3 paper reports a top-5 error rate of just 3.5% on the ILSVRC 2012 validation set, using an ensemble of four models with multi-crop evaluation (see more on InceptionV3 here: https://arxiv.org/abs/1512.00567). With InceptionV3 as the base, a convolutional model was built and trained using transfer learning and fine-tuning techniques.
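
To make the transfer learning and fine-tuning idea concrete, here is a minimal Keras sketch. It is not the project's actual training code: the class count, the size of the dense head, the dropout rate, the optimizers and the number of unfrozen layers are all assumptions chosen for illustration. Only the InceptionV3 base, the GlobalAveragePooling2D layer and the 398x398 input size come from this article.

```python
import tensorflow as tf

IMAGE_SIZE = 398   # input resolution stated in this article
NUM_CLASSES = 10   # hypothetical; the real number of species is not given


def build_model(num_classes=NUM_CLASSES, weights="imagenet"):
    """Transfer learning: a frozen InceptionV3 base with a new classification head."""
    base = tf.keras.applications.InceptionV3(
        weights=weights,              # ImageNet weights by default
        include_top=False,            # drop the original 1000-class head
        input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
    )
    base.trainable = False            # train only the new head at first

    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dense(512, activation="relu")(x)   # assumed head size
    x = tf.keras.layers.Dropout(0.5)(x)                    # assumed dropout rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=5)],
    )
    return model, base


def start_fine_tuning(model, base, trainable_layers=50):
    """Fine-tuning: unfreeze the last few base layers and use a small learning rate."""
    base.trainable = True
    for layer in base.layers[:-trainable_layers]:
        layer.trainable = False
    # Recompile so the changed trainable flags take effect.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=5)],
    )
    return model
```

In this sketch the head is trained first with the base frozen, and `start_fine_tuning` is called afterwards so that the pretrained features are adjusted only slightly.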

The system currently has a top-1 error rate of about 15% and a top-5 error rate of about 8% (about 85% and 92% accuracy, respectively). Our ongoing work aims to improve these accuracy values.
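
For readers unfamiliar with the terms, a top-k error counts a prediction as wrong only when the true label is not among the k highest-scoring classes. The short pure-Python sketch below shows the calculation on made-up scores; the function name and the sample data are illustrative and not part of the Bird Watch code.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Three samples, three classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_error(scores, labels, k=1)   # 2 of 3 samples are missed
top2 = top_k_error(scores, labels, k=2)   # 1 of 3 samples is missed
```

Accuracy is simply one minus the error, which is how a 15% top-1 error corresponds to about 85% accuracy.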

One of the most challenging aspects of the Bird Watch project is gathering proper training data for the system. Our dataset currently contains around 20,000 labeled images of various bird species, and it is growing. Both the training images and the input images are processed at a resolution of 398x398 pixels, which was selected after experimentation because it gives the best accuracy so far. The model was trained and then fine-tuned over many iterations, with model checkpointing and early stopping, in order to find the best performing training configuration. Data augmentation was also applied to the training dataset to reduce overfitting.
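
As a rough illustration of the checkpointing, early stopping and data augmentation described above, the sketch below sets them up with standard Keras components. The specific augmentations, the monitored metric, the patience value and the checkpoint file name are assumptions made for the example, not the project's actual settings.

```python
import tensorflow as tf


def build_augmentation():
    """Random transformations applied to training images only (assumed choices)."""
    return tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.05),   # up to about +/-18 degrees
        tf.keras.layers.RandomZoom(0.1),
    ])


def build_callbacks(checkpoint_path="bird_watch_best.h5", patience=5):
    """Keep the best model on disk and stop once validation loss stops improving."""
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        checkpoint_path,          # hypothetical file name
        monitor="val_loss",
        save_best_only=True,
    )
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=patience,        # epochs to wait without improvement
        restore_best_weights=True,
    )
    return [checkpoint, early_stop]
```

The callbacks would typically be passed to training with something like `model.fit(train_data, validation_data=val_data, epochs=100, callbacks=build_callbacks())`, where `train_data` and `val_data` stand in for the project's own datasets.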

The majority of the model training was done on a single machine with an Intel Core i7 8800 and an Nvidia RTX 2070, running Windows 10 and 64-bit Python 3.6.

The main code gist for the model training is shown below, and the full code of the project - including the code for training, inference and the Flask webapp - can be accessed from the GitHub repository of the project:

The current model training history graph is shown below:

The model training graph of the current version of Bird Watch

Because we use model checkpointing and early stopping in Keras, training stops once it plateaus, which currently happens at around 30 epochs. Even so, training takes more than 10 hours on our current machine configuration.

The full model structure of Bird Watch is shown below:

Apart from the head layers of the network (the dense layers below the GlobalAveragePooling2D layer), the rest of the architecture currently comes from the InceptionV3 network.

We are constantly experimenting with the model structure in order to increase the accuracy.

The Deep Learning model structure of Bird Watch

The model structure image is generated using Netron, a visualizer and viewer for neural network, deep learning and machine learning models: https://github.com/lutzroeder/netron