
How to detect an object in real time using keras-yolo3?



Another post starts with you beautiful people! Over the past few months I have been working on a complex object detection and recognition problem. My client is a leading company in the winery industry, and they had an existing system built on VGG19 and keras-retinanet to help with their sales forecasting. The problem with that system was its inaccuracy: it missed most of the wine bottles and brands, and it did not return results in real time. You can imagine how such a poor model can affect the business!

To solve these issues I tried many things: changing hyperparameters, enlarging the dataset, and trying different Keras applications, but none of them gave a satisfactory result. Maybe I was not doing it right, but I had put a lot of time and effort into it. Then, while doing R&D, I read this fantastic blog and came to know about a state-of-the-art, real-time object detection system: YOLO. You Only Look Once, or YOLO, is an object detection model whose reference implementation runs on Darknet, a deep learning framework written in C. You can read more about it on its official site.

Instead of writing the code from scratch, I found two GitHub repositories that provide third-party Keras implementations of YOLO version 3-

  1. experiencor/keras-yolo3 
  2. qqwweee/keras-yolo3
You can follow either of the above links and run the code to see how it works. In this post I will share how I tested it on my system. To test this model on unseen pictures, follow the steps below-

A. Prepare your virtual environment- The first step in your object detection and recognition journey is to install all the required libraries. I recommend creating a virtual environment and installing the libraries there instead of in the base environment. This protects your base installation if anything gets corrupted during setup. To create and activate the virtual environment, open the Anaconda prompt with admin rights and run the following two commands one by one-
conda create --name myNewEnv python=3.7.3
activate myNewEnv
Here myNewEnv is the name of my virtual environment and 3.7.3 is my Python version. Replace them with your own name and version. Once the environment is activated, it's time to install the required libraries. Before installing them, make sure you have Visual Studio 2017 with the C++ extension installed. If not, please install it and add the C++ extension, otherwise you will run into avoidable build issues. Here is my list, which you can also use-
pip install keras==2.2.4
pip install tensorflow-gpu==1.14.0
pip install scikit-learn
conda install anaconda-client
conda install -c conda-forge/label/cf201901 opencv
pip install keras-retinanet
conda install shapely
conda install -c conda-forge imgaug
conda install -c conda-forge google-cloud-vision
conda install -c pjamesjoyce imutils
conda install -c anaconda flask
conda install -c conda-forge/label/cf201901 flask-restful
pip install tqdm
pip install boto3
pip install matplotlib
pip install seaborn
pip install xlrd
pip install pytesseract
pip install apscheduler
In the above list you can see that I have mixed conda and pip. That is because I faced many issues while installing opencv and keras-retinanet on my Windows machine, and this combination is what finally worked after a lot of effort. So I recommend running the commands exactly as listed. This is a one-time setup, and the list covers most of the image processing libraries you will need. Once everything has installed successfully, you can proceed with the next step.
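Before moving on, it can help to confirm that the core libraries are actually visible to the interpreter inside the new environment. The snippet below is only a minimal sketch: `find_missing` and the `REQUIRED` list are hypothetical names introduced for illustration, and the list uses import names, which differ from some package names (opencv is imported as `cv2`, scikit-learn as `sklearn`).

```python
import importlib.util

# Import names for some of the libraries installed above (illustrative list).
# The import name can differ from the package name:
# opencv -> cv2, scikit-learn -> sklearn.
REQUIRED = ["keras", "tensorflow", "sklearn", "cv2", "matplotlib", "tqdm"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("Missing:", missing if missing else "none")
```

If anything is reported as missing, re-run the matching install command from the list before continuing.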

B. Download pre-trained weights- The second step is to download the pre-trained weights from This Link. It is a file of about 235 MB named yolov3.weights.

C. Define keras model- Our next step is to define a Keras model that matches the downloaded weights. This means the Keras model must have the right number of layers, of the right types and in the right order, to match the YOLO weights. This is the genuinely complex part, but the GitHub repositories I shared earlier already contain functions written for this task. So just copy those, and don't forget to give a Star to the original authors on GitHub. Here is a screenshot of the code snippet-

[Screenshots of the code: "Create block of layers" and "Create the model"]

D. Load Model Weights- Our next step is to load the downloaded weights. We cannot read that file directly in Keras, because the weights are stored in Darknet's own binary format while our model is defined in Keras. To read and parse the file we can use the following class-
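To give an idea of what such a reader has to do, here is a minimal sketch using only the standard library. It is not the class from the repository. It assumes the usual Darknet layout as I understand it: three 32-bit integers (major, minor, revision), then an "images seen" counter that is 64-bit in newer files, followed by a flat run of float32 values. The names `read_darknet_header` and `read_floats` are hypothetical, and the demo bytes are a fake in-memory file, not real YOLO weights.

```python
import io
import struct


def read_darknet_header(stream):
    """Read the version header of a Darknet .weights file.

    Returns (major, minor, revision, seen) and leaves the stream
    positioned at the first float32 weight value.
    """
    major, minor, revision = struct.unpack("<3i", stream.read(12))
    # Newer files (version >= 0.2) store the 'images seen' counter as 64-bit.
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        (seen,) = struct.unpack("<q", stream.read(8))
    else:
        (seen,) = struct.unpack("<i", stream.read(4))
    return major, minor, revision, seen


def read_floats(stream, count):
    """Read `count` little-endian float32 values from the stream."""
    return list(struct.unpack("<%df" % count, stream.read(4 * count)))


# Fake file built in memory: header (arbitrary values) plus two weights.
fake = io.BytesIO(struct.pack("<3iq2f", 0, 2, 0, 32013312, 0.5, -1.25))
print(read_darknet_header(fake))
print(read_floats(fake, 2))
```

A real reader then hands these floats out layer by layer, in the same order in which the convolution and batch-normalization layers were defined in step C, which is why the model definition has to match the weights exactly.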


We can then use the above functions and classes to build the model, load the weights, and save the result in Keras format like below-

That's it. We have successfully completed the complex part, and we can now use this model like any other Keras model. When you run the above code, you will see output like the following in the console-

Once the script has finished, 'model.h5' will be saved in your current working directory.

E. Test the model-
Like any other Keras application, this model requires its input image in a defined shape. This model expects an input image of 416 x 416 pixels. You can use the following code snippet to test the model-

Once you run the above code, it will display the output as a list of NumPy arrays-
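The shapes of those arrays are easier to read once you know where they come from. The sketch below assumes the standard YOLOv3 weights trained on COCO (80 classes, 3 anchor boxes per scale, strides of 32, 16 and 8); `yolo_output_shapes` is a hypothetical helper name, not a function from either repository.

```python
def yolo_output_shapes(net_size=416, num_classes=80, anchors_per_scale=3,
                       strides=(32, 16, 8)):
    """Return the expected shape of each YOLOv3 output array for a batch of one.

    Each grid cell predicts, per anchor: 4 box values, 1 objectness score
    and one score per class.
    """
    channels = anchors_per_scale * (5 + num_classes)
    return [(1, net_size // s, net_size // s, channels) for s in strides]


for shape in yolo_output_shapes():
    print(shape)
```

Under those assumptions you should see a coarse grid for large objects, a medium grid, and a fine grid for small objects, each with the same number of channels.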

F. Decode the output- At this point the output tells us nothing useful, because it is just raw NumPy arrays. To understand it we need to decode it, which here means turning it into bounding boxes around our objects. The experiencor GitHub repository has a function called decode_netout(), which takes each NumPy array from our output one by one and decodes the bounding boxes and class predictions-
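To make the decoding less of a black box, here is a simplified sketch of the arithmetic for a single grid cell and a single anchor, written in plain Python. It follows the usual YOLOv3 equations (sigmoid on the centre offsets and scores, exponential on the width and height) and is not the repository function itself; `decode_cell` and its argument names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, row, col, grid_h, grid_w, anchor_w, anchor_h, net_w, net_h):
    """Decode one prediction vector [tx, ty, tw, th, to, class_0, class_1, ...].

    Returns box corners as fractions of the network input, plus scores.
    """
    tx, ty, tw, th, to = raw[:5]
    # Box centre: offset inside the cell, expressed relative to the whole grid.
    x = (col + sigmoid(tx)) / grid_w
    y = (row + sigmoid(ty)) / grid_h
    # Box size: the anchor scaled by exp(t), relative to the network input.
    w = anchor_w * math.exp(tw) / net_w
    h = anchor_h * math.exp(th) / net_h
    objectness = sigmoid(to)
    # Class confidence = objectness * class probability.
    class_scores = [objectness * sigmoid(c) for c in raw[5:]]
    return {
        "xmin": x - w / 2, "ymin": y - h / 2,
        "xmax": x + w / 2, "ymax": y + h / 2,
        "objectness": objectness, "class_scores": class_scores,
    }


# Centre cell of a 13 x 13 grid, illustrative 116 x 90 anchor, two classes.
print(decode_cell([0.0] * 7, 6, 6, 13, 13, 116, 90, 416, 416))
```

The real function does the same thing for every cell and every anchor in an array, and drops predictions whose objectness is below a threshold.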

Applying the above function returns the bounding boxes, but their coordinates are relative to the 416 x 416 network input rather than to the original image, so they need to be stretched back to the shape of the original image. For this the experiencor script provides the correct_yolo_boxes() function-
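The idea can be sketched in a few lines. This version assumes the image was resized straight to the network input without any padding, so the fractional coordinates only need to be multiplied by the original width and height; if the image was padded to keep its aspect ratio, the padding offset has to be removed first. `scale_box_to_image` is a hypothetical name.

```python
def scale_box_to_image(box, image_w, image_h):
    """Convert (xmin, ymin, xmax, ymax) fractions into pixel coordinates.

    Assumes a plain resize to the network input (no letterbox padding).
    """
    xmin, ymin, xmax, ymax = box
    return (int(xmin * image_w), int(ymin * image_h),
            int(xmax * image_w), int(ymax * image_h))


# A box covering the middle of a 640 x 480 photo.
print(scale_box_to_image((0.25, 0.5, 0.75, 1.0), 640, 480))
```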

Now we have correctly scaled bounding boxes, but many of them may overlap because several boxes can describe the same object. To deal with this, the experiencor script provides a do_nms() function, which takes the list of bounding boxes and an overlap threshold and performs non-maximal suppression-
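Non-maximal suppression is easy to try on its own. The sketch below is a simplified, single-class greedy version in plain Python: it keeps the highest-scoring box and discards any other box that overlaps a kept one too much. As far as I can tell the repository function works class by class and zeroes the suppressed scores rather than dropping boxes, so treat this only as an illustration; `iou` and `nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

A lower threshold removes more overlapping boxes; a higher one keeps boxes that sit closer together, which matters when objects such as bottles on a shelf are tightly packed.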

Next, we need to keep only those bounding boxes that strongly indicate the presence of an object. To do this we loop over all the boxes and check the class predictions against a threshold, which also lets us attach the class label to each box. The following code snippet does this-
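The filtering step can be sketched as follows. This is a minimal illustration with hypothetical names (`label_detections`, and a detection represented as a box paired with its list of class scores); the threshold of 0.6 is only an example value.

```python
def label_detections(detections, labels, threshold=0.6):
    """Keep (box, label, score) for every class score above the threshold.

    detections: list of (box, class_scores) pairs, where class_scores is
    in the same order as `labels`.
    """
    results = []
    for box, class_scores in detections:
        for label, score in zip(labels, class_scores):
            if score > threshold:
                # A box is reported once per class that passes the threshold.
                results.append((box, label, score))
    return results


labels = ["person", "bottle"]
detections = [((0, 0, 1, 1), [0.1, 0.9]), ((2, 2, 3, 3), [0.2, 0.3])]
print(label_detections(detections, labels))
```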

You can test this function with a list of class labels. In our case the list contains the names of the various objects the model knows, like below-

Our next step is to draw the bounding boxes around the detected objects. That can be done using the function below-

Once you put all the above functions together, or simply run the script provided in the GitHub repo, you will see bounding boxes and the names of the detected objects drawn on your image, like the result I get for my input image-

For my image, the model with pre-trained weights shows amazing results: it correctly detects the bottle. Try it yourself on different images with different objects and see how well this model works. I used the same repositories to do R&D for my work, and after a lot of practice and trials I was able to use this model successfully with my custom dataset. For now, try the steps I have shown you, read the code many times, change its configurable values and analyze the effect. Till then Go chase your dreams, have an awesome day, make every second count and see you later in my next post.

Comments

  1. can i run opencv and darknet at the same time?

    1. Can you explain 'running' here? I have trained my dataset on YOLO, and for loading the test image and creating bounding boxes I am using OpenCV without any problem.

  2. Thanks for sharing this blog.This article gives lot of information.
  3. Is it really a best way to detect object in real time.?? BTW I will check it manually because it seems workable. Regards: mstweaks
