
Pothole and Speed Limit Detection with YOLOv8 on Raspberry Pi: A GStreamer & NNStreamer Integration

Category: Computer Vision | Posted on 2024-10-14 12:34:58



Introduction

In recent years, Raspberry Pi has emerged as a formidable platform for deploying real-time object detection solutions, thanks to its affordability, compact size, and versatility. This project investigates the capabilities of Raspberry Pi in detecting critical objects, specifically speed signs and potholes. By leveraging a custom-trained YOLOv8 model, we integrated GStreamer and NNStreamer to implement an efficient real-time object detection system on this compact device. This blog outlines our process of converting the YOLOv8 model to TensorFlow Lite and utilizing GStreamer pipelines for live camera feed processing. By combining the high accuracy of YOLOv8 with the lightweight inference capabilities of NNStreamer, we successfully achieved real-time detection of speed signs and potholes on the Raspberry Pi. Our approach not only demonstrates the potential of Raspberry Pi in enhancing smart traffic systems and road safety but also paves the way for innovative applications in autonomous vehicles.

Converting YOLOv8 Model to TensorFlow Lite (TFLite)

The first step in deploying our custom YOLOv8 model on the Raspberry Pi involved converting it to TensorFlow Lite (TFLite) format. TFLite is lightweight and optimized for edge devices, making it well suited to the Raspberry Pi. This conversion was necessary to utilize NNStreamer, a framework that efficiently handles real-time neural network inference.

After training our YOLOv8 model, we loaded it with the best weights and exported it directly to TFLite format. A minimal sketch of the export using the Ultralytics Python API is shown below (the checkpoint path is a placeholder; half=True requests the float16 variant used later in the pipeline):
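from ultralytics import YOLO

# Load the trained model from its best checkpoint (placeholder path;
# substitute the weights from your own training run).
model = YOLO("runs/detect/train/weights/best.pt")

# Export to TensorFlow Lite. half=True requests float16 weights,
# matching the best_float16.tflite file referenced in the pipeline below.
model.export(format="tflite", half=True, imgsz=640)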

Introduction to Raspberry Pi

Before we deployed our YOLOv8 model and utilized NNStreamer for real-time inference, we set up our Raspberry Pi. The Raspberry Pi, a small, affordable single-board computer, has gained immense popularity for its versatility and ability to perform a wide range of computing tasks. It is an excellent platform for IoT projects, robotics, and machine learning applications, making it an ideal choice for our object detection project. In our case, we used Ubuntu as the operating system to maximize compatibility with NNStreamer.

NNStreamer

NNStreamer is a versatile framework for processing and streaming data with neural networks, built on top of the widely used GStreamer multimedia framework. It facilitates the integration of deep learning models into applications, enabling efficient data handling and real-time inference. By providing a simple, effective way to build neural network pipelines, NNStreamer allows developers to leverage machine learning seamlessly.

In our project, we used NNStreamer to deploy our converted YOLOv8 model for real-time object detection, focusing on identifying critical elements such as speed signs and potholes. Its ability to streamline the processing of neural networks made it an essential tool for our Raspberry Pi-based application.

Installing NNStreamer

After successfully setting up Ubuntu on the Raspberry Pi, the next step was to install NNStreamer. We opened the terminal on our Raspberry Pi and executed the commands below, which follow the standard NNStreamer PPA setup (the TensorFlow Lite sub-plugin package name is our assumption and may differ by release):
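# Add the official NNStreamer PPA and refresh the package list
sudo add-apt-repository ppa:nnstreamer/ppa
sudo apt-get update

# Install NNStreamer and the TensorFlow Lite sub-plugin
# (sub-plugin package name assumed; verify with `apt search nnstreamer`)
sudo apt-get install nnstreamer nnstreamer-tensorflow2-lite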

These commands added the NNStreamer repository to our system, updated our package list, and installed the necessary packages. With NNStreamer installed, we were ready to integrate our YOLOv8 model for real-time object detection, leveraging the power of GStreamer to process and stream data effectively.

Creating the label.txt File

To facilitate object detection in our pipeline, we created a label.txt file containing the labels corresponding to the objects our YOLOv8 model was trained to recognize. In our case, we focused on detecting speed signs and potholes. This file served as a reference for the model during inference, enabling it to correctly identify and classify detected objects.

Each line of the file contains one label that the model can detect. In our case, the labels included "pothole" as well as speed limit signs represented by the numbers 20, 30, 60, 80, and 100, as listed below. We ensured that this file was saved in a location accessible to our NNStreamer pipeline so that it could use the labels during inference.
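A sketch of the file's contents (the exact line order shown is our assumption; it must match the class indices used during training, or detections will be mislabeled):

pothole
20
30
60
80
100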

Building the GStreamer Pipeline for Object Detection

Now that we had created our label.txt file, we constructed a GStreamer pipeline to leverage our trained YOLOv8 model for object detection. The following command set up the entire process:
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! \
  video/x-raw,width=640,height=640,format=RGB,pixel-aspect-ratio=1/1,framerate=30/1 ! \
  tee name=t \
  t. ! queue ! tensor_converter ! \
  other/tensors,num_tensors=1,types=uint8,format=static,dimensions=3:640:640:1 ! \
  tensor_transform mode=arithmetic option=typecast:float32,add:0.0,div:255.0 ! \
  queue leaky=2 max-size-buffers=2 ! \
  tensor_filter framework=tensorflow2-lite model=/path/to/our/best_float16.tflite custom=Delegate:XNNPACK,NumThreads:4 latency=1 ! \
  capsfilter caps="other/tensors,num_tensors=1,types=float32,format=static,dimensions=8400:10:1" ! \
  tensor_transform mode=transpose option=1:0:2:3 ! \
  tensor_decoder mode=bounding_boxes option1=yolov8 option2=/path/to/our/label.txt option3=0 option4=640:640 option5=640:640 ! \
  video/x-raw,width=640,height=640,format=RGBA ! mix.sink_0 \
  t. ! queue ! mix.sink_1 \
  compositor name=mix sink_0::zorder=2 sink_1::zorder=1 ! videoconvert ! autovideosink

Pipeline Breakdown

1. Video Source: The pipeline kicked off with the v4l2src element, which captured video from a connected camera (either a webcam or a Raspberry Pi Camera Module). This was the first step in acquiring the real-time video stream that the object detection model processed.

2. Format Conversion and Scaling: Once the video stream was captured, the videoconvert and videoscale elements handled format conversion and scaling. The video was converted to RGB format and resized to 640x640 pixels, matching the input resolution typically required by YOLOv8 models. The frame rate was set to 30 frames per second, ensuring smooth real-time processing.

3. Fixed Input Shape (640x640): YOLOv8 models generally work with a fixed input shape of 640x640 pixels. This input size is critical for processing video frames correctly, as YOLO models are optimized for specific input resolutions. Unless we specifically changed the model's configuration during training, the input shape remained constant, making it easy to integrate into the pipeline without modification.

4. Tensor Conversion: The next stage involved tensor_converter, which transformed the video stream into a tensor. This step is crucial for feeding the image data into the neural network; the tensor format must match what the YOLOv8 model loaded later in the pipeline expects.

5. Normalization: The tensor_transform element casts the data to float32 and scales the raw pixel values from 0-255 down to 0-1, the range YOLOv8 models typically require. This ensures the input tensor is correctly prepared for the neural network.
6. Tensor Filtering: At this point, the tensor_filter element loaded our custom YOLOv8 TensorFlow Lite model for inference. This step supports optimizations such as the XNNPACK delegate, which speeds up inference on the Raspberry Pi and helps maintain real-time processing even with larger models or more complex detection tasks.

7. Variable Output Shape in CapsFilter: One of the key elements in the pipeline is the capsfilter, which specifies the expected output shape of the model. For YOLOv8 models, the output shape depends on how the custom model was trained. In this example, the output shape is set to 8400:10:1, where:

i. 8400: the number of bounding box predictions.
ii. 10: the values per prediction; here, 4 bounding-box coordinates plus confidence scores for our 6 classes.
iii. 1: the batch size.

Since custom YOLOv8 models differ, the output shape varies with the number of classes the model is trained to detect (4 + number of classes values per prediction). We need to adjust the output shape in the capsfilter to match our model's specific configuration; failing to do so can lead to incorrect detection results or processing errors. A quick way to verify the shape is shown after this breakdown.

8. Bounding Box Decoding: After inference, the tensor_decoder decodes the model's output, generating bounding boxes for detected objects. The decoder uses the label.txt file to map detections to their corresponding class labels, such as potholes and speed signs, turning the raw model output into meaningful detection results.

9. Overlaying Results: Finally, the detection results are overlaid on the original video stream by the compositor element. The bounding boxes and class labels are displayed over the live feed, which is rendered on-screen by autovideosink.

Together, these stages enable real-time detection of speed signs and potholes, processing the video feed through the YOLOv8 model effectively on the Raspberry Pi.
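To verify the output shape for a given model before wiring it into the pipeline, the TensorFlow Lite interpreter can report the input and output tensor shapes. A minimal sketch, assuming the tensorflow Python package is installed and the model path matches yours:

import tensorflow as tf

# Load the exported model (placeholder path).
interpreter = tf.lite.Interpreter(model_path="/path/to/our/best_float16.tflite")
interpreter.allocate_tensors()

# A YOLOv8 model exported at 640x640 typically reports input [1, 640, 640, 3]
# and output [1, 10, 8400]. NNStreamer writes dimensions innermost-first,
# so an output of [1, 10, 8400] becomes dimensions=8400:10:1 in the capsfilter.
print("input:", interpreter.get_input_details()[0]["shape"])
print("output:", interpreter.get_output_details()[0]["shape"])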

Autostart for the Object Detection Pipeline on Raspberry Pi (Ubuntu)

Once our object detection pipeline was up and running, we wanted it to start automatically when our Raspberry Pi booted up. This was done easily on the Raspberry Pi running Ubuntu by using the Startup Applications feature.
Final Step: Autostart the Pipeline
1. Open Startup Applications: On our Raspberry Pi running Ubuntu, we clicked the Applications menu and searched for Startup Applications. This tool allows programs to be configured to run automatically upon login.

2. Add a New Startup Program: Once the Startup Applications window opened, we clicked the Add button to create a new startup entry.

3. Set the Pipeline Command: In the Name field, we gave the startup task a meaningful name like "Object Detection Pipeline." In the Command field, we entered the full GStreamer pipeline command that we used for object detection (a wrapper script, sketched after these steps, keeps this field manageable).

4. Save the Startup Entry: After entering the command, we clicked Save. The next time we booted the Raspberry Pi, the command would run automatically, starting the object detection pipeline without manual intervention.

With this final step, our object detection pipeline would start whenever our Raspberry Pi was powered on, allowing for a smooth, hands-free setup. This was particularly useful for deployments where the Raspberry Pi needed to function as an edge device for real-time detection without manual setup after every restart.
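Because the full gst-launch-1.0 command is long, one practical option is to wrap it in a small shell script and point the Command field at that script. A hypothetical sketch (the script path and the short pipeline shown are placeholders; substitute the full detection pipeline from the section above):

#!/bin/bash
# /home/ubuntu/start_detection.sh (hypothetical path).
# Make it executable with: chmod +x /home/ubuntu/start_detection.sh
# Replace the simple preview pipeline below with the full detection pipeline.
exec gst-launch-1.0 v4l2src ! videoconvert ! autovideosink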

Real-World Detection: Test Results

During the field test, the system processed live video feeds and detected potholes effectively. The combination of the Raspberry Pi's portability and NNStreamer's efficient streaming capabilities allowed for seamless integration and deployment in real-world scenarios. As illustrated in the image below, the model successfully identified a pothole, with the bounding box clearly outlining the detected area. This real-time detection capability was crucial for timely reporting and maintenance, enhancing traffic management and safety measures on our roads.



Figure: Detection of a pothole during the field test. The bounding box effectively highlights the detected pothole, showcasing the model's reliability in real-world scenarios.

Conclusion

In this project, we effectively integrated YOLOv8, Raspberry Pi, and NNStreamer to develop a real-time object detection system targeting potholes and speed limit signs. YOLOv8 provided state-of-the-art accuracy and speed, enabling swift identification of critical road conditions. Utilizing the Raspberry Pi as a compact computing platform allowed for practical deployment in diverse environments, ensuring accessibility and scalability. NNStreamer optimized our data processing pipeline, facilitating seamless integration of live video streams with machine learning models while maintaining high accuracy. Our work demonstrates the potential of these technologies to address pressing infrastructure challenges, contributing to smarter, safer roadways through the application of deep learning and edge computing.


For inquiries regarding the development of a Computer Vision solution, please contact us at info@inthings.tech today.

