MediaPipe is an open-source, cross-platform framework from Google that offers ready-to-use deep learning models for computer vision on different devices. In this tutorial, we will use MediaPipe to get pose landmarks for a person and plot them on the image. We will also get segmentation results and extract the person using MediaPipe tools. Install MediaPipe using pip in a command prompt/shell.

pip install mediapipe

After installation, we will use the MediaPipe pose estimation model. Here is the list of landmark points it returns for an image.

Image From Mediapipe:

MediaPipe offers 3 models of different complexity, where a higher-complexity model takes more time to process an image but gives more accurate results. MediaPipe also provides very good segmentation results for a person image. So, let's try MediaPipe and compare results for the different options.

We will use OpenCV (opencv-python) to read images, feed them to MediaPipe, and write the results back to storage.

import cv2
import mediapipe as mp

Now we need to initialize a MediaPipe pose estimation model, and we will also use the MediaPipe drawing utils to easily draw the detected points on the image. We can provide different options while creating a pose model object.

  • static_image_mode If we have only a single image as input, we set it to true. For video input, we set it to false so that the model tracks landmarks from previous frames, which improves performance.
  • model_complexity Can be 0, 1 or 2, where 2 is the largest and most accurate model
  • enable_segmentation If set to true, the model also predicts a segmentation mask for the given image
  • min_detection_confidence Minimum confidence for a detection to be considered successful; can be adjusted according to requirements and the input image

First, we try the largest model (complexity 2) with no segmentation output.

# utils for drawing on image
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# mediapipe pose model
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(
    static_image_mode=True,
    model_complexity=2,
    enable_segmentation=False,
    min_detection_confidence=0.5)

Now we read an image from disk using OpenCV and provide it as input to MediaPipe. OpenCV reads images as BGR, so we convert the image to RGB before passing it to the model.

image = cv2.imread("ronaldo.jpg")
#convert image to RGB (just for input to model)
image_input = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# get results using mediapipe
results = pose.process(image_input)

Now, we can check whether there are any results and draw them using the MediaPipe drawing utils.

if not results.pose_landmarks:
    print("no results found")
else:
    # draw landmarks and connections on the original BGR image
    mp_drawing.draw_landmarks(
        image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

# write image to storage
cv2.imwrite("./ronaldo-processed.jpg", image)

This draws the landmarks on the image and writes it back to storage.
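Note that the landmarks in results.pose_landmarks.landmark come back with normalized x and y in [0, 1], so if you need pixel coordinates you scale by the image size. A small helper (the name to_pixel is ours, not part of MediaPipe):

```python
def to_pixel(x_norm, y_norm, width, height):
    """Convert mediapipe's normalized landmark coordinates to pixels."""
    return int(x_norm * width), int(y_norm * height)

# e.g. a landmark at normalized (0.5, 0.25) on a 640x480 image
print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
```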

Now we can run the other models for comparison and see how their results differ. Here is a comparison of the other two models (complexity 0 and 1, respectively).


We can also enable segmentation by setting enable_segmentation to true.

Pose Segmentation

Initialize the model with segmentation enabled and get results from the model again.

pose = mp_pose.Pose(
    static_image_mode=True, model_complexity=2, enable_segmentation=True)

# get both mask and landmarks for input image
results = pose.process(image_input)

Now we can use NumPy to apply the segmentation mask to the image.

import numpy as np

BG_COLOR = (192, 192, 192) # gray

# keep pixels where the mask is confident, replace the rest with gray
condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
bg_image = np.zeros(image.shape, dtype=np.uint8)
bg_image[:] = BG_COLOR
image = np.where(condition, image, bg_image)
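To see exactly what this np.where compositing does, here is a tiny synthetic example with a made-up 2x2 "image" and mask (values are for illustration only):

```python
import numpy as np

BG_COLOR = (192, 192, 192)  # gray

# fake 2x2 RGB image; only the top-left pixel is "person" in the mask
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[0.9, 0.0], [0.0, 0.0]], dtype=np.float32)

# same logic as above: stack the mask to 3 channels and threshold it
condition = np.stack((mask,) * 3, axis=-1) > 0.1
bg_image = np.full(image.shape, BG_COLOR, dtype=np.uint8)
out = np.where(condition, image, bg_image)

print(out[0, 0])  # original pixel [0 1 2] survives
print(out[1, 1])  # replaced by background gray [192 192 192]
```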


Now, if you also want to draw landmarks on this image, you can reuse the earlier drawing code.

For videos, this works the same way; we just set static_image_mode to false for better performance, since the model can then track landmarks across frames. For more details, check the official MediaPipe documentation.