
How to train a custom dataset and make inference with Official YOLOv7 on Kaggle?

Another post starts with you beautiful people! I hope you have already learned the state-of-the-art object detection technique, Darknet's YOLOv4, from my previous posts. If you are new to my blog then I recommend you go through this link once to learn about the end-to-end implementation of YOLOv4. In this post we are going to learn the successor of the official YOLOv4: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors 💥. Since the official implementation of YOLOv7 is not in the Darknet framework, in this post we are going to use the PyTorch framework to train a custom dataset and then make inferences with the same framework. But don't worry if you are new to PyTorch 👲; we are not going to reinvent the wheel. As practitioners, our goal should be to learn how to use it.

For our learning, we will use a real-world dataset and problem: detecting starfish in underwater images. This challenge was held on Kaggle. It is a perfect use case for our learning because, unlike many other tutorials that use ready-made public datasets, here we have to prepare the dataset ourselves before training, which is an important job in its own right. So first let's understand the given dataset format-

Goal: To predict the presence and position of crown-of-thorns starfish in sequences of underwater images taken at various times and locations around the Great Barrier Reef.

Dataset: train/ - Folder containing training set photos of the form video_{video_id}/{video_frame_number}.jpg. [train/test].csv - contains the following metadata for the images: video_id, video_frame, sequence, sequence_frame, image_id, and annotations in COCO format.

Dataset Preparation: Let's take a look at the dataset with our Python code:
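A minimal sketch of this step (the input path below is my assumption of the usual Kaggle location for this competition; adjust it if yours differs)-

```python
import pandas as pd

# Assumed Kaggle input path for the "TensorFlow - Great Barrier Reef" data
ROOT_DIR = '/kaggle/input/tensorflow-great-barrier-reef'

df = pd.read_csv(f'{ROOT_DIR}/train.csv')
print(df.shape)
df.head()
```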


Let's verify whether all the images have annotations or not. We will later remove the images without annotations from our training dataset; otherwise they will affect our model accuracy-
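A small sketch of this check; empty annotations are stored as the string '[]' in train.csv-

```python
import ast

# Count the bounding boxes per image; '[]' parses to an empty list
df['num_bbox'] = df['annotations'].apply(lambda x: len(ast.literal_eval(x)))
annotated = (df['num_bbox'] > 0).sum()
print(f'{annotated} of {len(df)} images are annotated '
      f'({100 * annotated / len(df):.2f}%)')
```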


So we have only 20.93% annotated images in the given dataset. Let's remove the metadata of un-annotated images from our dataframe-
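Something along these lines does the filtering-

```python
# Keep only the rows that have at least one bounding box
df = df[df['num_bbox'] > 0].reset_index(drop=True)
print(len(df), 'annotated images remain')
```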

Also, let's drop the un-annotated images by copying only the annotated ones, using joblib's Parallel function for a speedup. We need to copy the data to Kaggle's output directory since the input directory doesn't have write access-
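A sketch of the parallel copy (the destination folder name is my choice)-

```python
import os
import shutil
from joblib import Parallel, delayed

IMAGE_DIR = '/kaggle/working/images'  # hypothetical destination folder
os.makedirs(IMAGE_DIR, exist_ok=True)

def copy_image(row):
    # Source layout follows the competition: train_images/video_{id}/{frame}.jpg
    src = f'{ROOT_DIR}/train_images/video_{row.video_id}/{row.video_frame}.jpg'
    shutil.copy(src, f'{IMAGE_DIR}/{row.image_id}.jpg')

_ = Parallel(n_jobs=-1, backend='threading')(
    delayed(copy_image)(row) for row in df.itertuples())
```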


So we now have a total of 4919 annotated images to train/validate on. Our next step is to visualize the images and see what the annotations look like. For this purpose, we don't need to write the code from scratch; instead we will use a bounding box utility from GitHub 👼. Don't forget to give a star to this GitHub repo 👍. Let's import the helpful bounding box creation functions from the repo as below-
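If you prefer not to pull the repo, here is a minimal stand-in for its parsing helper (the function name is illustrative, not the repo's exact API)-

```python
import ast
import numpy as np

def get_bbox(annots):
    """Parse the annotations string into an (N, 4) array of COCO boxes
    [x_min, y_min, width, height]."""
    annots = ast.literal_eval(annots) if isinstance(annots, str) else annots
    return np.array([[a['x'], a['y'], a['width'], a['height']] for a in annots])
```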


Next, let's fetch the bounding box coordinates using the above function-
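Applied to the dataframe, it looks like this-

```python
# Store the parsed boxes in a new 'bboxes' column
df['bboxes'] = df['annotations'].apply(get_bbox)
df[['image_id', 'num_bbox', 'bboxes']].head()
```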

Let's check the newly created column in our dataframe-

Now we have the bounding box coordinates in COCO format as well. Since the given dataset has images of a fixed size (1280, 720), it's better to add another two columns, width and height, to our dataframe as below-
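Since the resolution is constant, two constant columns are enough-

```python
# Every frame in this dataset is 1280x720
df['width'] = 1280
df['height'] = 720
```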


Next, following the best practice of training dataset preparation, we will apply cross-validation to the dataset as below-
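A minimal sketch, assuming we group the folds by the sequence column so that frames from the same video sequence never leak between train and validation (the exact scheme in my notebook may differ)-

```python
from sklearn.model_selection import GroupKFold

kf = GroupKFold(n_splits=5)
df['fold'] = -1
for fold, (_, val_idx) in enumerate(kf.split(df, groups=df['sequence'])):
    df.loc[val_idx, 'fold'] = fold

# Fold sizes will differ because whole sequences stay together
print(df.groupby('fold').size())
```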


As you can see, the number of samples is not the same in each fold, so for training we should experiment with each fold; but for this post, I will use the first fold-
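Selecting the first fold is a simple mask-

```python
# Fold 0 becomes validation; everything else is training data
train_df = df[df['fold'] != 0].reset_index(drop=True)
valid_df = df[df['fold'] == 0].reset_index(drop=True)
print(len(train_df), 'train /', len(valid_df), 'valid images')
```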



Prepare YOLO format dataset: Now that we have cleaned and finalized the training dataset, the next step is to prepare it in the format YOLO requires. For this step, we will export the labels into .txt files. Each .txt file will contain one row per object, where each row has the following format: class_index x_center y_center width height. Here the coordinates must be normalized to [0, 1] and class indices start at zero. All these required conversions can be done using the below code-
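A sketch of the conversion and label writing (paths are my choices; class index 0 is our single starfish class)-

```python
import os

LABEL_DIR = '/kaggle/working/labels'  # hypothetical label folder
os.makedirs(LABEL_DIR, exist_ok=True)

def coco2yolo(bboxes, img_w=1280, img_h=720):
    """COCO [x_min, y_min, w, h] -> normalized YOLO [x_center, y_center, w, h]."""
    bboxes = bboxes.astype(float).copy()
    bboxes[:, 0] = (bboxes[:, 0] + bboxes[:, 2] / 2) / img_w
    bboxes[:, 1] = (bboxes[:, 1] + bboxes[:, 3] / 2) / img_h
    bboxes[:, 2] /= img_w
    bboxes[:, 3] /= img_h
    return bboxes.clip(0, 1)

for row in df.itertuples():
    yolo_boxes = coco2yolo(row.bboxes, row.width, row.height)
    lines = [' '.join(['0'] + [f'{v:.6f}' for v in box]) for box in yolo_boxes]
    with open(f'{LABEL_DIR}/{row.image_id}.txt', 'w') as f:
        f.write('\n'.join(lines))
```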


Let's sanity-check the converted bounding box coordinates for our first fold's training data-
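For a quick sanity check we can print one generated label file-

```python
# Inspect the YOLO label of the first training image
print(open(f"{LABEL_DIR}/{train_df['image_id'][0]}.txt").read())
```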


Now we can visualize the training dataset images with their annotations using the matplotlib library as below-
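For example-

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

sample = df.sample(2, random_state=42)
fig, axes = plt.subplots(1, 2, figsize=(16, 5))
for ax, row in zip(axes, sample.itertuples()):
    ax.imshow(plt.imread(f'{IMAGE_DIR}/{row.image_id}.jpg'))
    for x, y, w, h in row.bboxes:  # COCO boxes drawn as red rectangles
        ax.add_patch(patches.Rectangle((x, y), w, h,
                                       fill=False, edgecolor='red', lw=2))
    ax.axis('off')
plt.show()
```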


And the sample annotated images look like the below-



After seeing the above few samples, we can imagine how complex a problem this is to solve with computer vision compared to publicly available datasets. Here the target class is very small and the images are blurred. Let's see how YOLOv7 is going to detect these starfish 👀. But for now, let's create the image list files and the dataset configuration file required by YOLO-
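A sketch of the list files plus the dataset YAML (file locations and the 'cots' class name are my choices)-

```python
import yaml

# One image path per line, train and validation separately
with open('/kaggle/working/train.txt', 'w') as f:
    f.write('\n'.join(f'{IMAGE_DIR}/{row.image_id}.jpg'
                      for row in train_df.itertuples()))
with open('/kaggle/working/val.txt', 'w') as f:
    f.write('\n'.join(f'{IMAGE_DIR}/{row.image_id}.jpg'
                      for row in valid_df.itertuples()))

data = dict(train='/kaggle/working/train.txt',
            val='/kaggle/working/val.txt',
            nc=1,             # one class to detect
            names=['cots'])   # crown-of-thorns starfish
with open('/kaggle/working/cots.yaml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False)
```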


Here 'nc' means the number of classes to detect, which is 1 in our case, and 'names' holds the target class name, which I defined as cots, but you can give it any name.

YOLOv7 training:
Till now we have prepared the dataset to start the training. For custom training, YOLOv7 provides pre-trained weights in different sizes-


Here I am going to use the largest one: yolov7-e6e_training.pt. To use this pre-trained weight for our training, we also need its configuration file (yolov7-e6e.yaml), which we can get from this official config files link, and the hyperparameter file (hyp.scratch.custom.yaml), which we can get from this custom hyperparameter file link. First we will create a new hyperparameter configuration file in Kaggle and copy the content of hyp.scratch.custom.yaml into it, as below-
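In a Kaggle notebook the %%writefile cell magic does the job; the first few values below are what I believe the repo's file contains, and the rest should be pasted in verbatim-

```python
%%writefile /kaggle/working/hyp.scratch.custom.yaml
lr0: 0.01            # initial learning rate
lrf: 0.1             # final learning rate fraction for the scheduler
momentum: 0.937      # SGD momentum
weight_decay: 0.0005 # optimizer weight decay
# ...paste the remaining hyperparameters from hyp.scratch.custom.yaml verbatim
```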


We will do the same to copy the content of the yolov7-e6e.yaml file into a new file in Kaggle-
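The mechanism is the same; only the nc value changes-

```python
%%writefile /kaggle/working/yolov7-e6e.yaml
nc: 1  # number of classes, changed from the default 80
# ...paste the rest of the official cfg/training/yolov7-e6e.yaml verbatim
# (depth/width multiples, anchors, backbone and head stay unchanged)
```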


Notice that in the above content, I have changed 'nc: 80' to 'nc: 1' since we have only one class to predict. All other parameters remain the same. Next, we will clone the official YOLOv7 repo into Kaggle's output directory and install the dependencies as below-
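Something like the below-

```python
# Clone the official repo into Kaggle's writable area and install deps
!git clone https://github.com/WongKinYiu/yolov7.git /kaggle/working/yolov7
%cd /kaggle/working/yolov7
!pip install -r requirements.txt
```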


Once all dependencies are installed, we will download the pre-trained weights as below-
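The e6e training checkpoint lives in the repo's v0.1 release (URL assumed from the official release page)-

```python
# Download the pre-trained aux-head checkpoint used for fine-tuning
!wget -q https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e_training.pt
```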


Now, we are ready to start the training with the below command-
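A sketch of the command, using the files created earlier in this post (train_aux.py is the script the repo provides for the aux-head W6/E6/D6/E6E models)-

```python
!python train_aux.py --workers 8 --device 0 --epochs 26 --batch-size 3 \
    --img 1280 1280 \
    --data /kaggle/working/cots.yaml \
    --cfg /kaggle/working/yolov7-e6e.yaml \
    --weights yolov7-e6e_training.pt \
    --hyp /kaggle/working/hyp.scratch.custom.yaml \
    --name yolov7-e6e-cots
```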


Here I have set the number of workers to 8, which you can change according to your system; device 0 means single-GPU training; the default number of epochs is 300, but due to the limited training time in Kaggle I have used 26 epochs; and I have used a batch size of 3 to avoid out-of-memory issues, though ideally it should be 16. The remaining parameters are the pre-trained weight file path and the config file paths. Once the training is completed you can find the best trained weight file (best.pt) under the /yolov7/runs/train/ path as below-


YOLOv7 Inference: Since our training is completed, it's time to make inferences on the test images and prepare a submission file as required by the competition. The first step for inference is to load our trained model. For this purpose, we can use the torch.hub.load() function as below-
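A sketch, using the run name from the training command above (your runs/train sub-folder may differ)-

```python
import torch

model = torch.hub.load(
    '/kaggle/working/yolov7',  # local clone with its hubconf.py
    'custom',
    '/kaggle/working/yolov7/runs/train/yolov7-e6e-cots/weights/best.pt',
    source='local')
model.conf = 0.25  # confidence threshold (a reasonable default)
model.iou = 0.45   # NMS IoU threshold
```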


In this function, the first parameter is the YOLOv7 repo path, the second parameter is 'custom' since we are using a custom-trained model, the third parameter is the best weight file path, and the fourth parameter is 'local' since we are loading the model from our local file system; the remaining configurations we set on the loaded model are self-explanatory.

After loading the model, the next step is to make predictions. As required by this competition, we need to predict the bounding box coordinates in COCO format along with a confidence score for each detection, which can be achieved as below-
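A minimal sketch, assuming the hub model returns boxes as [x_min, y_min, x_max, y_max, conf, class], which we convert to COCO [x, y, width, height]-

```python
def predict_image(img):
    results = model(img, size=1280)
    boxes = results.xyxy[0].cpu().numpy()  # x1, y1, x2, y2, conf, cls
    return [(conf, x1, y1, x2 - x1, y2 - y1)   # COCO: x, y, w, h
            for x1, y1, x2, y2, conf, _ in boxes]
```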


Here, for the bounding box conversion, I have used helper code from this awesome repo 🙏. Now we can make the required submission as below-
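A sketch of the submission loop, assuming the competition's greatbarrierreef API and its 'confidence x y width height' annotation string format-

```python
import greatbarrierreef

env = greatbarrierreef.make_env()
iter_test = env.iter_test()  # yields (image array, sample prediction df)

for img, pred_df in iter_test:
    preds = predict_image(img)
    # Space-joined 'conf x y w h' per detection, as the competition expects
    pred_df['annotations'] = ' '.join(
        f'{conf:.3f} {x:.0f} {y:.0f} {w:.0f} {h:.0f}'
        for conf, x, y, w, h in preds)
    env.predict(pred_df)
```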



Here, we have iterated over the iter_test variable since the test images were not given as a whole; instead the organizer shared an API. For each test image we made predictions using our custom-trained YOLOv7 model and formatted them in the required format. Let's see one test image with its prediction-


It looks good 👌, but how can we measure the goodness of our model? That's why I used this Kaggle dataset: so that I could submit the predictions to the challenge and see my private score. I did exactly that and got the following score with my trained model-


Next, I checked another submission of mine that used YOLOv5 and found the below score-


As we can see, with the default configuration of YOLOv7, my model achieved a 0.602 private score compared to 0.588 achieved by YOLOv5, which indicates that the latest official YOLOv7 release is indeed faster and more accurate than other object detectors 💪. We can achieve an even better result with hyperparameter tuning, different augmentations, etc.

For complete code reference please follow my training script here and the inference script here. So no need to rest, guys 💣. Just fork my training notebook, do experiments, make submissions, and see how much you can score on the leaderboard 💨. This post and the shared notebooks will be helpful in your journey to learn and use a state-of-the-art object detector. In my next post I will again bring you something amazing to learn. Till then 👉 go chase your dreams, have an awesome day, make every second count, and see you later in my next post 😇








