Welcome to the OCR, Image Detection & Object Detection Using Python course. This course is part of the Computer Vision series and focuses on various applications of Computer Vision, such as Image Recognition, Object Detection, Object Recognition, and Optical Character Recognition.
Significance of Computer Vision
Image Recognition, Object Detection, Object Recognition, and Optical Character Recognition are among the most widely used Computer Vision techniques. They let a computer recognize and classify either a whole image or several objects within a single image, and report each predicted class together with a confidence score, usually expressed as a percentage. Optical Character Recognition additionally recognizes text in images and converts it to machine-readable formats such as plain text or documents.
Applications of Object Detection and Recognition
Object Detection and Object Recognition are used in everything from simple everyday applications to complex systems such as self-driving cars.
This course serves as a quick introduction for individuals interested in delving into Optical Character Recognition, Image Recognition, and Object Detection using Python, without having to grapple with the complexities and mathematics associated with typical Deep Learning processes.
The course will cover the following topics:
- An introductory theory session about Optical Character Recognition technology
- Preparing the computer for Python coding by downloading and installing the Anaconda package and verifying the installation
Introduction to Python Programming
Many of you may not have a background in Python-based programming. The upcoming sessions and examples will provide you with the fundamental Python programming skills necessary for the course. The topics covered include Python assignment, flow-control, functions, and data structures.
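As a taste of those four building blocks, here is a minimal sketch that filters a hand-written list of detections by confidence. The sample data and the function name `filter_detections` are invented for illustration and are not taken from the course material.

```python
# Assignment: bind names to values (the sample data below is invented).
threshold = 0.5
detections = [
    {"label": "dog", "confidence": 0.91},
    {"label": "cat", "confidence": 0.42},
    {"label": "car", "confidence": 0.77},
]


def filter_detections(items, min_confidence):
    """Function: return the labels whose confidence meets the threshold."""
    kept = []  # data structure: a list that grows as we loop
    for item in items:  # flow control: visit every detection
        if item["confidence"] >= min_confidence:  # flow control: branch
            kept.append(item["label"])
    return kept


print(filter_detections(detections, threshold))  # prints ['dog', 'car']
```

The same pattern, a list of dictionaries filtered inside a loop, comes up again whenever a detector returns several candidate objects with confidence values.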
Installing Dependencies for Optical Character Recognition (OCR)
We will then install the dependencies and libraries required for Optical Character Recognition. OCR will be handled by the Tesseract library: we will first install Tesseract itself and then its Python bindings. We will also install OpenCV, an open-source computer vision library with Python bindings, and Pillow, the maintained fork of the Python Imaging Library (PIL). Next, we will introduce the steps involved in Optical Character Recognition and then code and implement the OCR program. The program will be tested on sample images for character recognition.
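The OCR step described above can be sketched roughly as follows. This is an illustration under assumptions, not the course's own program: it assumes the `pytesseract` wrapper is the Python binding in use, the helper names `clean_ocr_text` and `read_text_from_image` are hypothetical, and any image path passed in is a placeholder.

```python
def clean_ocr_text(raw_text):
    """Strip surrounding whitespace and drop blank lines from raw OCR output."""
    lines = [line.strip() for line in raw_text.splitlines()]
    return "\n".join(line for line in lines if line)


def read_text_from_image(image_path):
    """Run Tesseract on one image file and return the cleaned text.

    The imports live inside the function so this file still loads on a
    machine where Pillow or pytesseract has not been installed yet.
    """
    from PIL import Image  # Pillow
    import pytesseract  # wrapper; also needs the Tesseract program itself

    with Image.open(image_path) as image:
        raw_text = pytesseract.image_to_string(image)
    return clean_ocr_text(raw_text)


# Hypothetical usage once the dependencies are installed:
#     print(read_text_from_image("sample.png"))
```

Keeping the clean-up step separate from the Tesseract call makes it possible to try the text handling without any image at hand.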
Introduction to Convolutional Neural Networks (CNN) for Image Recognition
Following this, we will introduce Convolutional Neural Networks (CNN) for Image Recognition. The focus will be on classifying a full image based on the primary object within it.
Installing Keras Library for Image Recognition
The next step involves installing the Keras library, which we will use for image recognition through its built-in, pre-trained models. The Keras documentation provides the base Python code. We will begin with the popular VGGNet architecture, starting with an introductory session on how it is structured. We will then use the pre-trained VGG16 model included in Keras for image recognition and classification, and test its predictions on sample images. Later, we will explore the deeper VGG19 model, also included in Keras, for the same task.
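A rough sketch of that workflow is shown below, assuming Keras is installed as part of TensorFlow. Import paths differ between Keras versions, so treat the ones here as an assumption; `format_predictions` and `classify_image` are hypothetical helper names rather than anything defined by the course.

```python
def format_predictions(decoded, top=3):
    """Turn (class_id, label, score) tuples into readable 'label: xx.x%' lines."""
    return [f"{label}: {score * 100:.1f}%" for _, label, score in decoded[:top]]


def classify_image(image_path):
    """Classify one image with the ImageNet-trained VGG16 bundled with Keras."""
    # Imported lazily so the file loads even without TensorFlow installed.
    import numpy as np
    from tensorflow.keras.applications.vgg16 import (
        VGG16,
        decode_predictions,
        preprocess_input,
    )
    from tensorflow.keras.preprocessing import image

    model = VGG16(weights="imagenet")  # weights are downloaded on first use
    img = image.load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(image.img_to_array(img), axis=0)  # add batch axis
    predictions = model.predict(preprocess_input(batch))
    return format_predictions(decode_predictions(predictions, top=3)[0])


# Hypothetical usage with a placeholder file name:
#     for line in classify_image("sample.jpg"):
#         print(line)
```

Swapping in VGG19 should mostly be a matter of importing from the `vgg19` module instead, since both models expect 224x224 inputs.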
Trying Pre-trained Models in Keras
We will start by experimenting with the ResNet pre-trained model included in the Keras library. In the code, we will incorporate the model and test its predictions using a few sample images. Following this, we will move on to the Inception pre-trained model, similarly including it in the code and evaluating its predictions with sample images. Subsequently, we will explore the Xception pre-trained model, integrating it into the code and assessing its predictions with sample images.
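One practical detail when switching between these models is that they do not all expect the same input size. The sketch below records the default ImageNet input sizes. It assumes that "ResNet" and "Inception" refer to the `ResNet50` and `InceptionV3` variants shipped with Keras, which the course outline does not state; `input_size_for` and `load_pretrained` are hypothetical helpers.

```python
# Default ImageNet input sizes (width, height) for the Keras application models.
# ResNet50 and InceptionV3 are assumed variants; the outline only says
# "ResNet" and "Inception".
MODEL_INPUT_SIZES = {
    "VGG16": (224, 224),
    "VGG19": (224, 224),
    "ResNet50": (224, 224),
    "InceptionV3": (299, 299),
    "Xception": (299, 299),
}


def input_size_for(model_name):
    """Look up the input size a model expects, or raise for unknown names."""
    if model_name not in MODEL_INPUT_SIZES:
        raise ValueError(f"Unsupported model: {model_name}")
    return MODEL_INPUT_SIZES[model_name]


def load_pretrained(model_name):
    """Build the named model with ImageNet weights (downloaded on first use)."""
    input_size_for(model_name)  # validate the name before the heavy import
    from tensorflow.keras import applications

    return getattr(applications, model_name)(weights="imagenet")


print(input_size_for("Xception"))  # prints (299, 299)
```

Each model family also ships its own `preprocess_input` function, so the preprocessing import has to change together with the model.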
Image Recognition vs. Object Recognition
The aforementioned pre-trained models are designed for image recognition, capable of labeling and classifying complete images based on the primary object within them. We will then shift our focus to object recognition, which allows for the detection and labeling of multiple objects within a single image.
Introduction to MobileNet-SSD Pre-trained Model
Our foray into object recognition will commence with an overview of the MobileNet-SSD Pre-trained Model, a single shot detector adept at identifying multiple objects within a scene. Additionally, we will briefly discuss the dataset used to train this model.
Implementing MobileNet-SSD Pre-trained Model
Following the introduction, we will implement the MobileNet-SSD Pre-trained Model in our code, obtaining predictions and bounding box coordinates for each detected object. We will then outline the detected objects in the image with bounding boxes and include the corresponding labels and confidence values.
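Before a box can be drawn, its coordinates usually have to be converted to pixels. The sketch below assumes the detector reports each box as `(x1, y1, x2, y2)` fractions of the image width and height, which is common for SSD-style detectors but is not confirmed by the course outline. `scale_box` and `make_caption` are hypothetical helper names.

```python
def scale_box(normalized_box, image_width, image_height):
    """Convert (x1, y1, x2, y2) fractions into integer pixels inside the image."""
    x1, y1, x2, y2 = normalized_box

    def to_pixel(value, size):
        # Clamp so a box that spills past the edge still fits the image.
        return max(0, min(size - 1, int(value * size)))

    return (
        to_pixel(x1, image_width),
        to_pixel(y1, image_height),
        to_pixel(x2, image_width),
        to_pixel(y2, image_height),
    )


def make_caption(label, confidence):
    """Build the text drawn next to a box, e.g. 'person: 87.5%'."""
    return f"{label}: {confidence * 100:.1f}%"


print(scale_box((0.25, 0.5, 0.75, 1.0), 400, 200))  # prints (100, 100, 300, 199)
print(make_caption("person", 0.875))  # prints person: 87.5%
```

With OpenCV, the resulting corners can be passed to `cv2.rectangle` and the caption to `cv2.putText` to produce the annotated image.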
Object Detection from Live Video
Our next endeavor involves object detection from a live video, where we will stream real-time video from the computer's webcam and attempt to detect objects. We will draw rectangles around each detected object in the live video, alongside their labels and confidence values.
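The frame-by-frame loop behind live detection can be sketched without any camera attached. The code below assumes a capture object whose `read()` method returns an `(ok, frame)` pair, which is how `cv2.VideoCapture.read()` behaves; `process_frames` and `FakeCapture` are hypothetical names used only for this illustration.

```python
def process_frames(capture, handle_frame, max_frames=None):
    """Read frames until the source runs dry or max_frames is reached.

    `capture` is any object whose read() returns (ok, frame).
    Returns the number of frames that were handled.
    """
    processed = 0
    while max_frames is None or processed < max_frames:
        ok, frame = capture.read()
        if not ok:  # camera disconnected or end of the video
            break
        handle_frame(frame)  # e.g. run the detector and draw rectangles
        processed += 1
    return processed


class FakeCapture:
    """Stand-in for a camera so the loop can be tried without hardware."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


seen = []
count = process_frames(FakeCapture(["frame-1", "frame-2", "frame-3"]), seen.append)
print(count, seen)  # prints 3 ['frame-1', 'frame-2', 'frame-3']
```

With OpenCV installed, `cv2.VideoCapture(0)` typically opens the default webcam, and passing a file path instead opens a saved video, so the same loop can serve both cases. Remember to call `release()` on the capture when finished.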
Object Detection from Pre-saved Video
In the subsequent session, we will proceed with object detection from a pre-saved video, streaming the video from our folder and endeavoring to detect objects within it. Similar to the live video detection, we will outline the detected objects in the video with rectangles, including their labels and confidence values.
Implementing the Mask-RCNN Pre-trained Model for Object Detection
Later, we will proceed with the implementation of the Mask-RCNN Pre-trained Model. Unlike the previous model, which only provided a bounding box around the object, the Mask-RCNN allows us to obtain both the box coordinates and the mask outlining the exact shape of the detected object. We will provide an introduction to this model and delve into its specifics.
Object Detection and Masking Implementation
Obtaining Predictions and Bounding Box Coordinates
Initially, we will incorporate the Mask-RCNN Pre-trained Model into our code. The first step involves acquiring predictions and bounding box coordinates for each detected object. Subsequently, we will draw the bounding box around the objects in the image and display the label alongside the confidence value.
Utilizing Object Masks
Following this, we will receive the mask for each predicted object. We will process this data and utilize it to overlay translucent, multi-colored masks onto each detected object. Additionally, we will display the label along with the confidence value.
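The translucent effect comes from alpha blending: each pixel under the mask is mixed with the mask color instead of being replaced by it. The sketch below shows the idea on plain Python lists so it runs without extra libraries. The function names are hypothetical, and a real implementation would more likely apply the same formula to NumPy arrays.

```python
def blend_pixel(pixel, color, alpha):
    """Mix `color` into `pixel`; alpha=0 keeps the pixel, alpha=1 replaces it."""
    return tuple(round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color))


def overlay_mask(image, mask, color, alpha=0.5):
    """Return a copy of `image` with `color` blended in wherever `mask` is true.

    `image` is a list of rows of (r, g, b) tuples and `mask` is a list of
    rows of booleans with the same shape.
    """
    return [
        [
            blend_pixel(pixel, color, alpha) if inside else pixel
            for pixel, inside in zip(image_row, mask_row)
        ]
        for image_row, mask_row in zip(image, mask)
    ]


image = [[(100, 100, 100), (200, 200, 200)]]
mask = [[True, False]]  # only the first pixel belongs to the object
print(overlay_mask(image, mask, color=(0, 200, 0), alpha=0.5))
# prints [[(50, 150, 50), (200, 200, 200)]]
```

Giving each detected object its own color and reusing the same `alpha` produces the multi-colored overlay described above.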
Real-time Object Detection
Our next endeavor involves implementing object detection from a live video using Mask-RCNN. We will stream real-time video from the computer's webcam and attempt to detect objects within it. We will then overlay a mask on each detected object in the live video, accompanied by its label and confidence value.
Object Detection from Pre-saved Video Using Mask-RCNN
As with the previous model, we will proceed with object detection from a pre-saved video using Mask-RCNN. The video will be streamed from our folder, and we will attempt to detect objects within it. We will draw colored masks over the detected objects, along with their labels and confidence scores.
Considerations for Object Detection Models
While Mask-RCNN offers high accuracy and supports a large number of classes, it is slow on low-power, CPU-only computers. MobileNet-SSD, on the other hand, is faster but less accurate and supports fewer classes. To strike a balance between speed and accuracy, we will explore Object Detection and Recognition using the YOLO pre-trained model. The upcoming session gives a detailed overview of the YOLO model, followed by an implementation of YOLO object detection on a single image.
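To give a feel for what YOLO post-processing involves, the sketch below decodes a single raw prediction. It assumes the Darknet-style layout `[center_x, center_y, width, height, objectness, class scores...]` with box values given as fractions of the image size. The course outline does not say which YOLO version or loader is used, so this layout is an assumption, and `decode_yolo_row` is a hypothetical helper.

```python
def decode_yolo_row(row, image_width, image_height):
    """Decode one prediction row into (class_id, confidence, box).

    Assumed layout: [cx, cy, w, h, objectness, score_0, score_1, ...],
    with cx, cy, w, h expressed as fractions of the image size.
    The returned box is (left, top, width, height) in pixels.
    """
    cx, cy, w, h = row[:4]
    class_scores = row[5:]
    # The predicted class is the one with the highest score.
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    confidence = class_scores[class_id]

    box_width = int(w * image_width)
    box_height = int(h * image_height)
    left = int(cx * image_width - box_width / 2)  # center -> top-left corner
    top = int(cy * image_height - box_height / 2)
    return class_id, confidence, (left, top, box_width, box_height)


row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]  # invented sample values
print(decode_yolo_row(row, image_width=400, image_height=200))
# prints (1, 0.8, (150, 50, 100, 100))
```

A full pipeline would also discard low-confidence rows and merge overlapping boxes before drawing anything.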
Application of YOLO Model
Building upon this, we will apply the YOLO model for object detection in real-time webcam videos to evaluate its performance. Subsequently, we will utilize it for object recognition in pre-saved video files.
Enhancing Speed with Tiny YOLO
To further enhance the speed of processed frames, we will employ Tiny YOLO, a lightweight version of the YOLO model. Initially, we will use Tiny YOLO for the pre-saved video and analyze its accuracy and speed. Following this, we will apply the same methodology to a real-time video from the webcam to compare its performance to the standard YOLO model.
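Comparing the speed of two models comes down to measuring frames per second over the same input. Here is a minimal sketch of such a measurement, where `detect` stands for whichever detector is being timed. Both function names are hypothetical, and the lambda in the example is a dummy workload rather than a real model.

```python
import time


def frames_per_second(frame_count, elapsed_seconds):
    """Convert a frame count and a duration in seconds into an FPS figure."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frame_count / elapsed_seconds


def time_detector(detect, frames):
    """Run `detect` on every frame and return the measured frames per second."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return frames_per_second(len(frames), elapsed)


print(frames_per_second(30, 2.0))  # prints 15.0
fps = time_detector(lambda frame: sum(range(1000)), list(range(50)))
print(f"dummy detector: {fps:.0f} FPS")  # value depends on the machine
```

Running the same measurement once with the standard model and once with the lightweight one, on the same frames, gives a like-for-like speed comparison.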
These topics constitute the current content of this course. Upon completion of the course, participants will receive a certificate, adding value to their portfolio.
That concludes the overview for now. We look forward to seeing you in the classroom. Happy learning and have a great time!
Who this course is for
Beginners, or anyone who wants to get started with Python-based OCR, Image Recognition, and Object Recognition.
- Awesome course to understand basic of OCR AND DETECTION OF IMAGE, I will suggest to follow this course ~ Jaswant A
- Every topic was very well explained audio-visually with relevant details. I would highly recommend this course to all those who are thinking of object detection & work on Yolo modal. Thank you Mr Abhilash Sir for this exceptionally informative course which includes a lot of knowledge regarding object detection ~ Pramod G
- Good and Clear Course with Professional AI and Computer Vision Instructor! ~ E Shapira
- Yes, I am satisfied, the course accomplished all my expectations ~ Federico M
- Good understanding of concepts and great and calm instructor ~ Ashish V
What you'll learn
Optical Character Recognition with Tesseract Library, Image Recognition using Keras, Object Recognition using MobileNet SSD, Mask R-CNN, YOLO, Tiny YOLO.
Requirements
A reasonably capable computer (preferably running Windows) and the enthusiasm to dive into the world of OCR, Image Recognition, and Object Recognition using Python.