DAY 75-100 DAYS MLCODE: YOLO Object Detection

My Tech World

January 24, 2019 100-Days-Of-ML-Code blog 2

In the previous few blogs, we discussed object detection using the ImageAI library and the TensorFlow Object Detection library. In this blog, we’ll discuss YOLO object detection.

We’ll use the YOLO object detector to detect the objects in an image.

Object Detection:

There are three main families of object detection techniques:

  • R-CNN and its variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN
  • SSD (Single Shot Detector)
  • YOLO (You Only Look Once)

R-CNN

R-CNN was one of the first deep-learning-based object detection techniques; its paper, Rich feature hierarchies for accurate object detection and semantic segmentation, first appeared in 2013. It was a two-stage detector: the first stage proposed candidate bounding boxes using selective search, and the second stage classified those boxes.

But R-CNN was very slow, and in 2015 a new paper entitled Fast R-CNN appeared. Fast R-CNN was faster and more accurate, but it still relied on an external selective search step to propose bounding boxes.

Later, a follow-up paper, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, appeared. It removed the selective search requirement and introduced a Region Proposal Network (RPN), which is fully convolutional and predicts object bounding boxes together with “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object). The outputs of the RPN are then passed into the R-CNN component for final classification and labeling.

The R-CNN family was now more accurate, but it is still not as fast as some of the newer techniques: it was slow, obtaining only about 5 FPS on a GPU.

One-Stage detector

The R-CNN family was good at detecting objects and their classes, but it was slow. To improve the speed of deep-learning-based object detectors, both Single Shot Detectors (SSDs) and YOLO use a one-stage detector strategy.

In this type of detector, object detection is treated as a regression problem. The detector takes an input image and predicts the bounding box coordinates and the corresponding class label probabilities in a single pass. These detectors are fast but typically less accurate than two-stage detectors such as R-CNN.

YOLO:

You Only Look Once: Unified, Real-Time Object Detection was first published in 2015 and reached about 45 FPS on a GPU, compared with about 5 FPS for the R-CNN family. YOLO has gone through several versions over the years, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), capable of detecting over 9,000 object categories, and YOLOv3: An Incremental Improvement (2018). YOLOv3 is larger than the previous models and also more accurate.

YOLO Architecture:

The original YOLO detection network has 24 convolutional layers followed by 2 fully connected layers. Alternating 1 × 1 convolutional layers reduce the feature space from the preceding layers. The convolutional layers are pre-trained on the ImageNet classification task at half the resolution (224 × 224 input image), and the resolution is then doubled for detection.

YOLO Architecture (Source: YOLO paper)
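To make the network's output more concrete, here is a small sketch of the size of the prediction tensor. The values S = 7 (grid size), B = 2 (boxes per cell) and C = 20 (classes) are the settings reported in the YOLO paper for PASCAL VOC; they are not stated in this post, and the function name is only illustrative.

```python
def yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Return the (S, S, B*5 + C) shape of the original YOLO prediction tensor."""
    # each box predicts x, y, w, h and one confidence score (5 values)
    depth = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, depth)


shape = yolo_v1_output_shape()
total = shape[0] * shape[1] * shape[2]
print(shape, total)  # (7, 7, 30) 1470
```

So, under these assumptions, the last fully connected layer produces 1,470 numbers that are reshaped into a 7 × 7 × 30 grid of predictions.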

Now let’s develop a small program to detect objects in an image using YOLO.

First, download the pre-trained YOLOv3 model from the Darknet team website. Download the YOLOv3-416 weights and config file, and download the COCO class names using this link.

!wget "https://pjreddie.com/media/files/yolov3.weights"
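Re-running a download cell in a notebook can leave duplicate copies of the same file. As a minimal sketch of one way to avoid that, the helper below downloads only when the target file is missing. The helper name and its `fetch` parameter are hypothetical; only the weights URL comes from this post.

```python
import os
import urllib.request

WEIGHTS_URL = "https://pjreddie.com/media/files/yolov3.weights"


def download_if_missing(url, path, fetch=urllib.request.urlretrieve):
    """Download url to path unless a file already exists there.

    Returns True if a download happened, False if the file was reused.
    `fetch` is injectable so the logic can be tried without network access.
    """
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True


# Example (commented out because the weights file is large):
# download_if_missing(WEIGHTS_URL, "yolov3.weights")
```

Calling the helper twice with the same path performs the download once and reuses the existing file the second time.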

Now we have three files: yolov3.cfg, yolov3.weights and coco.names. Load the COCO class labels which we have downloaded:

# imports used by the snippets in this post
import os
import time
import cv2
import matplotlib.pyplot as plt
import numpy as np

# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([os.getcwd(), "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
    dtype="uint8")
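The class ID that the network predicts is just an index into this list of labels. The sketch below shows the idea on a tiny stand-in file instead of the real coco.names; the helper name `load_labels` is hypothetical. With the real file, the list should contain the 80 COCO class names.

```python
import os
import tempfile


def load_labels(path):
    """Read one class name per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# demonstrate on a tiny stand-in file rather than the real coco.names
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.names")
    with open(demo_path, "w") as f:
        f.write("person\nbicycle\ncar\n")
    demo_labels = load_labels(demo_path)

print(demo_labels)     # ['person', 'bicycle', 'car']
print(demo_labels[2])  # car
```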

Now load the pre-trained models:

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([os.getcwd(), "yolov3.weights"])
configPath = os.path.sep.join([os.getcwd(), "yolov3.cfg"])

# load our YOLO object detector trained on COCO dataset (80 classes)
print("[INFO] loading YOLO from disk...")
detector = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

Now it’s time to get the output layers of the detector and perform a forward pass:

# load our input image and grab its spatial dimensions
# (note: the variable `Weight` holds the image width in pixels)
img = cv2.imread("image1.jpg")
(Height, Weight) = img.shape[:2]

# determine only the *output* layer names that we need from YOLO;
# flatten() copes with OpenCV versions that return the indices
# either as an Nx1 array or as a flat array
ln = detector.getLayerNames()
ln = [ln[i - 1] for i in np.array(detector.getUnconnectedOutLayers()).flatten()]

# construct a blob from the input image and then perform a forward
# pass of the YOLO object detector, giving us our bounding boxes and
# associated probabilities
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
    swapRB=True, crop=False)
detector.setInput(blob)
start = time.time()
layerOutputs = detector.forward(ln)
end = time.time()

# show timing information on YOLO
print("[INFO] YOLO took {:.6f} seconds".format(end - start))

Output of the above code is: [INFO] YOLO took 1.523733 seconds
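To put this timing next to the frame rates quoted earlier, the arithmetic can be sketched as below. The helper names are illustrative. The post does not say which hardware produced the 1.52 seconds, and a single timed pass may include one-off start-up cost, so treat the comparison as rough.

```python
def fps_from_seconds(seconds_per_frame):
    """Frames per second for a given per-frame time."""
    return 1.0 / seconds_per_frame


def ms_per_frame(fps):
    """Per-frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


measured = fps_from_seconds(1.523733)
print("measured: {:.2f} FPS".format(measured))                 # measured: 0.66 FPS
print("45 FPS  : {:.1f} ms per frame".format(ms_per_frame(45)))  # 45 FPS  : 22.2 ms per frame
print("5 FPS   : {:.1f} ms per frame".format(ms_per_frame(5)))   # 5 FPS   : 200.0 ms per frame
```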

The code below filters out weak detections and scales the bounding box coordinates so that we can display them properly on our original image.

# initialize our lists of detected bounding boxes, confidences, and
# class IDs
boxes = []
confidences = []
classIDs = []

# minimum probability used to filter weak detections
# (not shown in the original post; 0.5 is an assumed value)
confThreshold = 0.5

# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability) of
        # the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]

        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > confThreshold:
            # scale the bounding box coordinates back relative to the
            # size of the image, keeping in mind that YOLO actually
            # returns the center (x, y)-coordinates of the bounding
            # box followed by the box's width and height
            box = detection[0:4] * np.array([Weight, Height, Weight, Height])
            (centerX, centerY, width, height) = box.astype("int")

            # use the center (x, y)-coordinates to derive the top
            # and left corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))

            # update our list of bounding box coordinates, confidences,
            # and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)
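The handling of a single detection row can also be written as a small pure-Python sketch, which makes the arithmetic easy to check by hand. The function name and the example row are made up for illustration; the row layout (box values first, class scores from index 5) follows the loop above.

```python
def decode_detection(detection, img_w, img_h, conf_threshold=0.5):
    """Turn one YOLO output row into ([x, y, w, h], class_id, confidence).

    `detection` is [cx, cy, w, h, objectness, score_0, score_1, ...] with the
    box values normalised to the 0..1 range. Returns None for weak detections.
    """
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None

    # scale the normalised box back to pixel units
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    width = int(detection[2] * img_w)
    height = int(detection[3] * img_h)

    # convert from centre coordinates to the top-left corner
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, width, height], class_id, float(confidence)


# made-up row with three classes, for a 640x480 image
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, 640, 480))  # ([240, 120, 160, 240], 1, 0.8)
```

Here the box is centred at (320, 240) and is 160 × 240 pixels, so its top-left corner lands at (240, 120).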

Now apply non-maxima suppression, which suppresses significantly overlapping bounding boxes and keeps only the most confident ones.

# apply non-maxima suppression to suppress weak, overlapping bounding
# boxes (nmsThreshold is not shown in the post; 0.3 is an assumed value)
nmsThreshold = 0.3
idxs = cv2.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
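To show what non-maxima suppression does, here is a minimal greedy sketch in plain Python. It is not OpenCV's implementation, which may differ in details; the function names and the demo boxes are made up for illustration. Boxes use the same [x, y, w, h] layout as above.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of the boxes kept by greedy non-maxima suppression."""
    # consider only confident boxes, highest score first
    order = [i for i in range(len(boxes)) if scores[i] > score_threshold]
    order.sort(key=lambda i: scores[i], reverse=True)

    keep = []
    for i in order:
        # keep a box only if it does not overlap a kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


demo_boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
demo_scores = [0.9, 0.8, 0.7]
print(greedy_nms(demo_boxes, demo_scores, 0.5, 0.3))  # [0, 2]
```

The first two demo boxes overlap heavily (IoU of about 0.68), so only the higher-scoring one survives, while the third box is far away and is kept.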

Now draw the boxes and print the class predictions on the input image.

# ensure at least one detection exists
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in np.array(idxs).flatten():
        # extract the bounding box coordinates
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])

        # draw a bounding box rectangle and label on the image
        color = [int(c) for c in COLORS[classIDs[i]]]
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
        cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
            0.5, color, 2)

# show the output image (OpenCV stores pixels as BGR, while matplotlib
# expects RGB, so convert before displaying)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis("off")

Output:

Output of Object Detector

In conclusion, YOLOv3 is the fastest of the object detection approaches discussed here, while the R-CNN family tends to be the most accurate. You can find the entire code here.

 

2 Responses

  1. Satyaki Mukherjee says:

    In the Google Colab notebook Day 75.ipynb, while loading the YOLO object detector trained on the COCO dataset, I am getting an error.

    Here is the code snippet

    # load our YOLO object detector trained on COCO dataset (80 classes)
    print("[INFO] loading YOLO from disk...")
    detector = cv2.dnn.readNetFromDarknet('/content/yolov3 (1).cfg', '/content/yolov3.weights')
    #detector = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

    Here is the output

    [INFO] loading YOLO from disk...
    ---------------------------------------------------------------------------
    error Traceback (most recent call last)
    in ()
    1 print("[INFO] loading YOLO from disk...")
    ----> 2 detector = cv2.dnn.readNetFromDarknet('/content/yolov3 (1).cfg', '/content/yolov3.weights')
    3 #detector = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

    error: OpenCV(3.4.3) /io/opencv/modules/dnn/src/darknet/darknet_importer.cpp:207: error: (-212:Parsing error) Failed to parse NetParameter file: /content/yolov3 (1).cfg in function 'readNetFromDarknet'

    Please help me with this issue

    • Hi Satyaki,

      This error is saying that you do not have yolov3 (1).cfg. When I ran the notebook the first time, this file was downloaded as yolov3.cfg. But during testing I ran it one more time, which produced a new file named yolov3 (1).cfg.
      I just changed it back to the original name and it’s working fine. I added code to download the image directly from the internet as image1.jpg. If you want to use your local image, please upload it using the cell provided for loading a file from the local system. Hope this helps. Thanks.
      Best Regards,
      Pavan
