DAY 80-100 DAYS MLCODE: Live Object Detection and Segmentation

In the previous blog, we discussed object detection and segmentation using Mask R-CNN for video. In this blog, we’ll implement object detection and segmentation on a live video feed using Mask R-CNN.

As in the previous blog, download the required dependencies:

!git clone https://github.com/waleedka/coco
!pip install -U setuptools
!pip install -U wheel
!make install -C coco/PythonAPI

Now clone the Mask_RCNN repo from GitHub.

!git clone https://github.com/matterport/Mask_RCNN

Change to the ./Mask_RCNN directory and download the pre-trained weights from GitHub:

!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

Configurations

We’ll be using a model trained on the MS-COCO dataset. The configurations of this model are in the CocoConfig class in coco.py.

For inference, modify the configuration a bit to fit the task. To do so, subclass the CocoConfig class and override the attributes you need to change.

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()

The output of the above cell will look like this:
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
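
The override pattern behind InferenceConfig can be tried out without the Mask_RCNN code base. The sketch below uses a stand-in BaseConfig class (hypothetical, not the real mrcnn config class) and assumes, as the comment in the cell above says, that the batch size is derived as GPU_COUNT * IMAGES_PER_GPU.

```python
class BaseConfig:
    """Stand-in for a library config class (hypothetical, not mrcnn's own)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    def __init__(self):
        # Derived value, computed from whatever the (sub)class defines.
        self.BATCH_SIZE = self.GPU_COUNT * self.IMAGES_PER_GPU


class InferenceConfig(BaseConfig):
    # Override only the attributes that differ for single-image inference.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


config = InferenceConfig()
print(config.BATCH_SIZE)
```

Because the derived value is computed in the constructor, overriding the two class attributes is enough; nothing else in the subclass needs to change.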

Class Names

Download labels.txt and use it to build the class_name list. We can download the COCO labels like below:

!wget "https://raw.githubusercontent.com/nightrome/cocostuff/master/labels.txt"

import os

# Load the COCO class labels that the class_name list is built from
labelsPath = os.path.join(os.getcwd(), "labels.txt")
with open(labelsPath) as f:
    LABELS = f.read().strip().split("\n")

Create a class_name list which we’ll pass to the model for predictions.

class_name = []
for data in LABELS:
    try:
        # Each line looks like "id: name"; keep only the name
        head, tail = data.split(":", 1)
        class_name.append(tail.strip())
    except ValueError:
        print(f"Error for : {data}")
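
The parsing step above can be checked on its own before pointing it at the downloaded file. This is a minimal sketch, assuming every line of labels.txt has the form "id: name"; the parse_labels helper and the sample text are illustrative and not part of the original notebook.

```python
def parse_labels(text):
    """Turn lines like '1: person' into a list of names, in file order."""
    names = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so names containing ':' survive.
        _, sep, name = line.partition(":")
        if not sep:
            raise ValueError(f"no ':' separator in line: {line!r}")
        names.append(name.strip())
    return names


# Illustrative sample, assumed to mirror the "id: name" layout of labels.txt.
sample = "0: unlabeled\n1: person\n2: bicycle\n"
print(parse_labels(sample))
```

The position of each name in the list is what gets looked up with a predicted class ID, so it is worth confirming that the order in labels.txt matches the IDs the COCO-trained model emits.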

Initialize the variables before reading frames from the webcam:

import os
import cv2

execution_path = os.getcwd()
cap = cv2.VideoCapture(0)  # 0 selects the first attached webcam
filename = os.path.join(execution_path, "output.avi")  # where the output video is stored
codec = cv2.VideoWriter_fourcc("M", "J", "P", "G")  # fourcc stands for four-character code
framerate = 10
# Read the capture's default resolution, which is system dependent.
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

VideoFileOutput = cv2.VideoWriter(filename, codec, framerate, (frame_width, frame_height))
frames = []
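
If the webcam fails to open, the reported width and height typically come back as 0, and a writer created with that size is unlikely to produce a usable file. The sketch below guards against that case. The frame_size helper, its fallback value, and the FakeCapture class are all hypothetical; FakeCapture only imitates the get method of cv2.VideoCapture so the example runs without a camera.

```python
class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, used only for this demo."""

    def __init__(self, width, height):
        # 3 and 4 are the numeric values of cv2.CAP_PROP_FRAME_WIDTH / HEIGHT.
        self._props = {3: width, 4: height}

    def get(self, prop_id):
        return float(self._props.get(prop_id, 0.0))


def frame_size(cap, fallback=(640, 480)):
    """Return (width, height) from a capture-like object, or the fallback."""
    width = int(cap.get(3))
    height = int(cap.get(4))
    if width <= 0 or height <= 0:
        return fallback  # the capture reported no usable size
    return (width, height)


print(frame_size(FakeCapture(1280, 720)))
print(frame_size(FakeCapture(0, 0)))
```

With a real capture, the same function can be called as frame_size(cap) and its result passed to cv2.VideoWriter.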

Create Model and Load Trained Weights

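import mrcnn.model as modellib  # run from the Mask_RCNN directory
# Assumed locations: a logs folder and the weights file downloaded above
MODEL_DIR = os.path.join(os.getcwd(), "logs")
COCO_MODEL_PATH = os.path.join(os.getcwd(), "mask_rcnn_coco.h5")

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)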

Prediction – Object detection and segmentation

Now read each frame and pass it to the model to run the object detection algorithm:

while True:
    # Read the next frame from the webcam
    (found, img) = cap.read()

    # If no frame was grabbed, the stream has ended
    if not found:
        break

    # The model expects a batch of BATCH_SIZE images, here a single frame
    frames.append(img)
    predict = model.detect(frames, verbose=0)
    if len(frames) != 1:
        print("Error in frames")
        print(f"Frames size is : {len(frames)}")
    for i, item in enumerate(zip(frames, predict)):
        frame = item[0]
        r = item[1]
        # display_instances is expected to draw the detections and return the frame
        frame = display_instances(
            frame, r["rois"], r["masks"], r["class_ids"], class_name, r["scores"]
        )

        # Write the annotated frame to disk and show it
        VideoFileOutput.write(frame)
        cv2.imshow("live_detection", frame)

    # Clear the frames list to start the next batch
    frames = []
    if cv2.waitKey(25) & 0xFF == ord("q"):
        break

# Release the camera, the video writer, and the display window
print("[INFO] cleaning up...")
cv2.destroyAllWindows()
cap.release()
VideoFileOutput.release()
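
The loop above boils down to four steps per frame: read, detect, draw, write. The sketch below factors those steps into callables so the control flow can be exercised without a webcam or a model. The run_live function and the lambdas in the demo are hypothetical placeholders, not part of OpenCV or Mask_RCNN.

```python
def run_live(read_frame, detect, draw, write, max_frames=None):
    """Read -> detect -> draw -> write until the source runs dry.

    All four callables are placeholders supplied by the caller.
    """
    processed = 0
    while max_frames is None or processed < max_frames:
        found, frame = read_frame()
        if not found:
            break  # camera closed or stream ended
        # The model is configured for a batch of one, so wrap the frame in a list.
        result = detect([frame])[0]
        write(draw(frame, result))
        processed += 1
    return processed


# Demo with fake frames (plain strings) standing in for images.
source = iter([(True, "f1"), (True, "f2"), (False, None)])
written = []
count = run_live(
    read_frame=lambda: next(source),
    detect=lambda batch: [{"rois": []} for _ in batch],
    draw=lambda frame, result: frame.upper(),
    write=written.append,
)
print(count, written)
```

In the real loop, read_frame would be cap.read, detect would wrap model.detect, draw would wrap display_instances, and write would be VideoFileOutput.write.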

Output

The webcam output looked like the attached file below. It was slow because I was running on my local CPU instead of a GPU.

Live Object detection and segmentation
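
To put a number on how slow the feed is, one option is to count processed frames against elapsed time. This is a minimal sketch; the FpsMeter class is hypothetical and the demo uses a fake clock so the result is deterministic.

```python
import time


class FpsMeter:
    """Average frames per second since the meter was created."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()
        self._frames = 0

    def tick(self):
        """Call once per processed frame."""
        self._frames += 1

    def fps(self):
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0


# Fake clock: the meter starts at t=0.0 and is queried at t=4.0 seconds.
fake_times = iter([0.0, 4.0])
meter = FpsMeter(clock=lambda: next(fake_times))
for _ in range(2):
    meter.tick()
rate = meter.fps()
print(rate)
```

In the detection loop, calling meter.tick() after each written frame and printing meter.fps() during cleanup would give the average throughput for the session.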

In conclusion, object detection and segmentation using Mask R-CNN worked as expected. You can find the entire code here. Please note that I ran this code on my local PC because it needs access to my webcam.

#100DaysofMLCode #Mask R-CNN #Object Segmentation #ObjectDetection
