As technology advances, we get new and better ways of dealing with everyday problems. One example is the use of high-resolution cameras in traffic. They can be placed at congestion points to provide traffic reports, so commuters know which places to avoid on their way to work.
We can also monitor traffic flow to inform decisions about future road development, or use the cameras to react quickly to accidents and similar incidents. These tasks are a lot easier when the cameras have high resolution. But with that improvement, the picture is often so clear that we can read the license plates of passing cars.
Our task was to detect and conceal such readable license plates.
Why conceal license plates?
Apart from running from the police, it is hard to think of a good reason to conceal a license plate number. That may once have been the case, but lately there has been a lot of concern about the privacy of individuals' personal data.
The main driver of this is the EU's General Data Protection Regulation (GDPR), which governs how companies have to manage personal data. GDPR covers any data that can be used to identify a person, even indirectly. License plate numbers count as personal data under GDPR, because in principle you can identify the owner of a car if you know its license plate number.
This could be exploited to track someone's whereabouts, so we need an efficient way to conceal readable license plates. To conceal them, we must first be able to detect them, which is a complex task given the continuous flow of cars passing the cameras.
Because of its complexity and tediousness, this task is not suitable for humans, but it is a good fit for an automated system. Such a system needs to be fast, precise, and general enough to work the same way on all cameras.
How do we build such a system? The answer is Deep Learning. Thanks to easier ways of collecting huge amounts of data, hardware that can perform fast computations, and advances in AI research, Deep Learning is today the de facto state-of-the-art approach to Computer Vision.
Overview of the system
The system is made up of three major parts:
- Object detection
- Landmark detection
- REST API
The first part is object detection: we want to detect all license plates in a frame (one frame can contain multiple cars). The output is a bounding box for each detected license plate, which we later use for concealment. We have around 13,000 labeled images for this task.
We could build a model from scratch, but there are existing models already trained on similar tasks that we can reuse. For this we use the TensorFlow Object Detection API, because it makes it easy to construct, train, and deploy object detection models. The API also offers a number of pre-trained models, so we can choose the one best suited to our needs. These models are not trained to detect license plates, but we can fine-tune them on our dataset for our specific task (transfer learning).
As mentioned, our system must be both fast and precise, so we face a trade-off between speed and accuracy. Among the available models, Faster R-CNN is one of the most accurate, while SSD MobileNet is one of the fastest (source). After experimenting with both, we chose SSD MobileNet.
Although Faster R-CNN has excellent accuracy, it is very slow. Cameras send frames every few seconds, and we must process each image and detect all license plates within a short time for the system to keep up. With high-end hardware this is achievable, but also very expensive. It also does not scale: if we add more cameras, the cost increases rapidly.
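To make the scalability argument concrete, here is a back-of-the-envelope capacity estimate. It assumes each worker processes frames one at a time, and every latency and camera count in it is a hypothetical placeholder, not a measurement from our system.

```python
def cameras_per_worker(frame_interval_ms: int, latency_ms: int) -> int:
    """Cameras one sequential worker can keep up with, if every camera sends
    one frame per `frame_interval_ms` and each frame takes `latency_ms`."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return frame_interval_ms // latency_ms


def workers_needed(num_cameras: int, frame_interval_ms: int, latency_ms: int) -> int:
    """Number of sequential workers (e.g. machines) needed for all cameras."""
    capacity = cameras_per_worker(frame_interval_ms, latency_ms)
    if capacity == 0:
        raise ValueError("a single frame takes longer than the frame interval")
    return -(-num_cameras // capacity)  # integer ceiling division


# Hypothetical numbers for illustration only.
NUM_CAMERAS = 100
FRAME_INTERVAL_MS = 3000   # "every few seconds"
HEAVY_MODEL_MS = 1500      # placeholder for a slow, accurate detector
LIGHT_MODEL_MS = 100       # placeholder for a fast, lighter detector

for name, latency in [("heavy", HEAVY_MODEL_MS), ("light", LIGHT_MODEL_MS)]:
    print(f"{name} model: {workers_needed(NUM_CAMERAS, FRAME_INTERVAL_MS, latency)} workers")
```

With these placeholder numbers the heavy model needs 50 workers for 100 cameras and the light one needs 4. In both cases the hardware grows linearly with the number of cameras; the per-frame latency sets how steep that growth is.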
SSD MobileNet is a lot faster than Faster R-CNN, but it has lower accuracy. Is there a way to improve the accuracy of our system while keeping SSD MobileNet as the object detection model, without losing too much speed? That is the task of the next part of our system.
The task of the landmark detection model is to give us finer, more precise detections of license plates. The idea is simple: for each detection in a frame, we send the image region inside the bounding box to the landmark detection model, which finds the corners of that specific license plate.
We built this model from scratch in Keras with TensorFlow as the backend. It is a Convolutional Neural Network (CNN) that takes a 100x100 image of the part of the car containing the license plate as input and outputs the x, y coordinates of all 4 corners of the plate.
The model was trained on 20,000 license plate images annotated with the positions of the plate corners. By detecting corners, we get a much better approximation of the license plate's location and, thus, improved precision.
The model is not complex by CNN standards: a simple architecture with 2 convolutional layers, followed by 2 fully connected (dense) layers and an output layer, with about 3 million trainable parameters. Its predictions are therefore very fast (3 ms per license plate on a 2.4 GHz Intel Core i7) and have only a small effect on overall detection speed, which is precisely what we want.
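The post gives only the layer types and the approximate parameter count, so the sketch below counts trainable parameters for one hypothetical configuration: RGB input, 3x3 kernels with "same" padding, 32 and 64 filters each followed by 2x2 max pooling, two 64-unit dense layers, and an 8-value output (4 corners times x, y). These sizes are our assumptions, not the authors' architecture. The point it illustrates is that almost all parameters of such a network sit in the first dense layer.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    # Each filter has kernel*kernel*in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(in_features: int, units: int) -> int:
    # Full weight matrix plus one bias per unit.
    return (in_features + 1) * units


def corner_model_params(size=100, channels=3, filters=(32, 64), dense=(64, 64), outputs=8):
    """Per-layer parameter counts for a hypothetical 2-conv / 2-dense corner regressor.

    Assumes 3x3 'same' convolutions (spatial size unchanged), each followed by
    2x2 max pooling (halves the size, rounding down, no parameters).
    """
    layers = []
    side, in_ch = size, channels
    for i, f in enumerate(filters, start=1):
        layers.append((f"conv{i}", conv2d_params(3, in_ch, f)))
        in_ch = f
        side //= 2  # max pooling
    features = side * side * in_ch  # flatten
    for i, units in enumerate(dense, start=1):
        layers.append((f"dense{i}", dense_params(features, units)))
        features = units
    layers.append(("output", dense_params(features, outputs)))  # 4 corners x (x, y)
    return layers


layers = corner_model_params()
for name, n in layers:
    print(f"{name:>7}: {n:>9,}")
print(f"  total: {sum(n for _, n in layers):>9,}")
```

Under these assumptions the total is 2,584,136 parameters, the same order of magnitude as the roughly 3 million reported above, and the first dense layer (2,560,064) accounts for about 99% of it. Convolutions are cheap in parameters; flattening a 25x25x64 feature map into a dense layer is not.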
This model gives us another big advantage. Standard object detection models such as SSD output axis-aligned rectangles. Because of the angle at which the camera views the traffic, license plates in the frames are not proper rectangles. With corner detection we get a much finer detection: a quadrilateral that follows the actual outline of each detected plate.
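To see why corners beat boxes, we can compare how many pixels each shape covers. The sketch below rasterizes a convex quadrilateral with a standard same-side (cross-product) point-in-polygon test and compares it with the axis-aligned bounding rectangle of the same corners. The corner coordinates are made up to mimic a plate seen at an angle.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inside_convex_quad(px: float, py: float, quad: List[Point]) -> bool:
    """True if (px, py) lies inside or on a convex quadrilateral.

    Same-side test: the point must be on the same side of every edge.
    Corners must be listed in order around the shape (either direction).
    """
    side = 0
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross == 0:
            continue  # on this edge's line; the other edges decide
        s = 1 if cross > 0 else -1
        if side == 0:
            side = s
        elif s != side:
            return False
    return True


def quad_mask(width: int, height: int, quad: List[Point]) -> List[List[bool]]:
    # Test each pixel centre (x + 0.5, y + 0.5).
    return [[inside_convex_quad(x + 0.5, y + 0.5, quad) for x in range(width)]
            for y in range(height)]


def bbox_mask(width: int, height: int, quad: List[Point]) -> List[List[bool]]:
    # Axis-aligned rectangle around the same corners, like a detector's box.
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return [[min(xs) <= x + 0.5 <= max(xs) and min(ys) <= y + 0.5 <= max(ys)
             for x in range(width)] for y in range(height)]


PLATE = [(10, 20), (90, 10), (90, 30), (10, 40)]  # skewed plate in a 100x50 image
quad_px = sum(map(sum, quad_mask(100, 50, PLATE)))
box_px = sum(map(sum, bbox_mask(100, 50, PLATE)))
print(quad_px, box_px)  # 1600 2400
```

For this skewed plate the rectangle covers 50% more pixels than the plate itself, so blurring the rectangle would also blur part of the car, while the quadrilateral hugs the plate.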
Note that we expand the bounding box of each detected license plate before passing it on, because a larger context gives the landmark detection model more information and helps it generalize better.
Another reason is the lower accuracy of SSD MobileNet mentioned earlier: it sometimes returns a bounding box that does not cover the whole license plate.
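The expansion step, and the reverse mapping of predicted corners back into the frame, can be sketched as follows. The post does not say how much the box is expanded or how the corners are encoded, so the 20% margin per side and the convention that the model predicts pixel coordinates on its 100x100 input are hypothetical choices for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in frame pixels


def expand_box(box: Box, frame_w: int, frame_h: int, margin: float = 0.2) -> Box:
    """Grow a detection box by `margin` of its width/height on each side,
    clipped to the frame. margin=0.2 is a hypothetical value."""
    x_min, y_min, x_max, y_max = box
    dx = round((x_max - x_min) * margin)
    dy = round((y_max - y_min) * margin)
    return (max(0, x_min - dx), max(0, y_min - dy),
            min(frame_w, x_max + dx), min(frame_h, y_max + dy))


def corners_to_frame(corners: List[Tuple[float, float]], crop: Box,
                     input_size: int = 100) -> List[Tuple[float, float]]:
    """Map corners predicted on the resized input_size x input_size crop
    back to frame coordinates (undo the resize, then the crop offset)."""
    x_min, y_min, x_max, y_max = crop
    sx = (x_max - x_min) / input_size
    sy = (y_max - y_min) / input_size
    return [(x_min + x * sx, y_min + y * sy) for x, y in corners]


crop = expand_box((1000, 600, 1100, 640), frame_w=1920, frame_h=1080)
print(crop)  # (980, 592, 1120, 648)

# Made-up model output on the 100x100 input, in corner order.
predicted = [(10, 20), (90, 15), (90, 80), (10, 85)]
print(corners_to_frame(predicted, crop))
```

The crop itself would be resized to 100x100 before it goes to the landmark model; because the expanded box is clipped to the frame, plates near the image border still yield a valid crop.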
The purpose of the REST API is to coordinate the cameras and the deep learning detection models. The app takes requests (video streams, frame by frame) from clients (the traffic cameras) and delegates them to the modules described above. After detection and blurring, the frame is sent back to the client. We got an additional significant boost in computational speed by building TensorFlow from source.
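A minimal sketch of this request/response loop, using only Python's standard library. The `/conceal` path, the raw-bytes request body, and the `conceal_plates` function are hypothetical; the post does not describe its API, and a real service would decode the image and call the detection and landmark models where the stub is.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def conceal_plates(frame_bytes: bytes) -> bytes:
    """Hypothetical stub: decode the frame, detect plates, predict corners,
    blur and re-encode. Here it simply returns the input unchanged."""
    return frame_bytes


class ConcealHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/conceal":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        frame = self.rfile.read(length)           # one frame per request
        result = conceal_plates(frame)            # delegate to the models
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)                  # send the frame back

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    # Port 0 lets the OS pick a free port; one thread per request.
    server = ThreadingHTTPServer(("127.0.0.1", port), ConcealHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_frame(url: str, frame: bytes) -> bytes:
    # What a camera-side client might do with each frame.
    req = urllib.request.Request(url, data=frame, method="POST",
                                 headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read()


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/conceal"
    print(post_frame(url, b"fake-jpeg-bytes"))  # b'fake-jpeg-bytes' (the stub echoes its input)
    server.shutdown()
    server.server_close()
```

A threaded server lets frames from several cameras be handled concurrently; how much that helps depends on whether the models release the GIL or run in separate processes.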
Summary & results
To summarize, this is how the whole system works:
- A camera records the road and every few seconds sends a frame to our system's web interface
- Each frame is preprocessed and then sent to the object detection model, which detects all license plates in the frame
- An expanded bounding box of each detected license plate is sent to the landmark detection model, which predicts the 4 corners of that license plate
- For each detected license plate in the frame, we blur the region enclosed by its corners
- The frame with blurred license plates is returned to the client.
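The steps above can be sketched end to end on a toy grayscale frame stored as a list of lists. The two model calls are stand-ins (`detect_plates` and `predict_corners` are hypothetical names that return fixed values), the 2-pixel expansion margin is arbitrary, and the blur is a simple 3x3 box blur applied only to pixels inside the corner quadrilateral; the post does not say which blur it uses, and a production system would use an image library for this.

```python
from typing import List, Tuple

Frame = List[List[int]]           # grayscale, frame[y][x] in 0..255
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max)
Quad = List[Tuple[float, float]]  # 4 corners, in order, frame coordinates


def detect_plates(frame: Frame) -> List[Box]:
    # Hypothetical stand-in for the SSD MobileNet detector.
    return [(4, 4, 12, 8)]


def predict_corners(frame: Frame, crop: Box) -> Quad:
    # Hypothetical stand-in for the landmark model (already in frame coordinates).
    return [(3, 4), (13, 3), (13, 8), (3, 9)]


def expand_box(box: Box, w: int, h: int, margin: int = 2) -> Box:
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin), min(w, x1 + margin), min(h, y1 + margin))


def inside_quad(px: float, py: float, quad: Quad) -> bool:
    # Convex same-side test: all non-zero edge cross products share one sign.
    signs = set()
    for i in range(4):
        (x1, y1), (x2, y2) = quad[i], quad[(i + 1) % 4]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross:
            signs.add(cross > 0)
    return len(signs) <= 1


def blur_quad(frame: Frame, quad: Quad, radius: int = 1) -> Frame:
    """Box-blur only the pixels whose centres lie inside `quad`.
    Reads from the input frame and writes to a copy."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if inside_quad(x + 0.5, y + 0.5, quad):
                vals = [frame[j][i]
                        for j in range(max(0, y - radius), min(h, y + radius + 1))
                        for i in range(max(0, x - radius), min(w, x + radius + 1))]
                out[y][x] = sum(vals) // len(vals)
    return out


def process_frame(frame: Frame) -> Frame:
    h, w = len(frame), len(frame[0])
    for box in detect_plates(frame):          # object detection
        crop = expand_box(box, w, h)          # expanded box for the landmark model
        quad = predict_corners(frame, crop)   # 4 corners of this plate
        frame = blur_quad(frame, quad)        # blur the region between the corners
    return frame                              # returned to the client


# A 16x12 checkerboard stands in for high-contrast plate characters.
toy = [[255 if (x + y) % 2 == 0 else 0 for x in range(16)] for y in range(12)]
blurred = process_frame(toy)
print(blurred[6][8])  # 141: inside the plate, the checkerboard is averaged away
```

Pixels outside the quadrilateral are left untouched, and the input frame is not modified, so the caller can decide whether to keep the original.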
In this blog post we have shown that it is possible to build a license plate concealment system that is both fast and precise. Furthermore, the system is general enough to work equally well on all cameras. All this is made possible by modern deep learning techniques.
Need help with your Machine Learning projects? Get in touch, we'd love to help with your challenge.