Using MediaPipe in TouchDesigner

MediaPipe is a powerful library for performing computer vision tasks such as pose detection, hand tracking, and more. In this post, we will explore how to integrate MediaPipe with TouchDesigner for real-time pose detection and segmentation.

1. Installing Python

macOS

Download Python

  1. Go to the official Python website: https://www.python.org/downloads/.
  2. Download the latest macOS installer (for example, python-3.x.y-macos11.pkg).

Run the Installer

  1. Double-click the .pkg file.
  2. Follow the on-screen prompts and accept the defaults unless you have specific needs.
  3. Python typically installs to a location like /Library/Frameworks/Python.framework/Versions/3.x/.

Add Python to Your System Path

Often, the Python installer adds a symbolic link to /usr/local/bin/python3. You can verify your installation by opening Terminal (Applications > Utilities > Terminal) and typing:

python3 --version

This should print the version of Python you installed.

Installing Python Packages on macOS

To make sure you install packages to the correct Python environment (the one TouchDesigner will use), run:

/usr/local/bin/python3 -m pip install --upgrade pip
/usr/local/bin/python3 -m pip install numpy opencv-python mediapipe

This ensures that your packages install into /usr/local/lib/python3.x/site-packages (or a similar folder).

Locating Site-Packages

After installing packages, you can check where your site-packages reside. Run:

/usr/local/bin/python3 -m site

The output lists possible site-packages paths. Use the one that matches the Python installation above when you configure TouchDesigner.
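If you prefer to query this from Python rather than read the `-m site` output, the standard-library site module exposes the same paths. A minimal sketch:

```python
import site
import sys

# Print the interpreter location and every site-packages directory it
# searches; copy the matching path into TouchDesigner's preferences.
print("Interpreter:", sys.executable)
for path in site.getsitepackages():
    print("site-packages:", path)
```

Run it with the exact interpreter you used for pip (for example, /usr/local/bin/python3) so the printed paths correspond to the packages you just installed.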

Windows

Download Python

  1. Go to https://www.python.org/downloads/.
  2. Download the latest Windows installer.

Run the Installer

  1. Double-click the .exe file.
  2. Tick “Add Python to PATH.”
  3. Click “Install Now” and follow any prompts.

Verify Installation

Open Command Prompt (Win+R, type cmd, press Enter). Type:

python --version

This should print your installed Python version.

Install Required Packages

In Command Prompt, type:

pip install numpy opencv-python mediapipe

If you have multiple Python versions or if pip is not recognised, try:

python -m pip install numpy opencv-python mediapipe

This installs everything into your user’s Python site-packages folder (often located in C:\Users\YourUsername\AppData\Local\Programs\Python\Python3x\Lib\site-packages).
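To confirm that all three packages are visible to the interpreter you just used, you can check for them without importing them (importing mediapipe can take a few seconds). A small sketch using the standard-library importlib:

```python
import importlib.util

# Report whether each required package can be found by this interpreter.
# A MISSING entry usually means pip installed into a different Python.
for name in ("numpy", "cv2", "mediapipe"):
    spec = importlib.util.find_spec(name)
    print(f"{name:10s} {'OK' if spec is not None else 'MISSING'}")
```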

2. Connecting Python to TouchDesigner

Once Python is installed and you have confirmed that the required packages are installed, you need to tell TouchDesigner where it can find Python modules.

Open TouchDesigner Preferences

  1. In TouchDesigner, go to Edit > Preferences > Python 64-bit Module Path.
  2. Enter the path to your Python installation’s site-packages folder.

Examples:

  • macOS:

    /usr/local/lib/python3.x/site-packages
    
  • Windows:

    C:\Users\YourUsername\AppData\Local\Programs\Python\Python3x\Lib\site-packages
    
  3. Restart TouchDesigner to ensure the path is recognised.
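You can verify the setting from inside TouchDesigner itself: open the Textport (Alt+T) and inspect sys.path. The site-packages folder you entered should appear in the list (the path below is a placeholder for whatever you configured):

```python
import sys

# Placeholder path - substitute the folder you entered in Preferences.
# If it is missing from sys.path, recheck the preference and restart
# TouchDesigner.
expected = "/usr/local/lib/python3.x/site-packages"
print("Configured path found:", expected in sys.path)
for p in sys.path:
    print("  ", p)
```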

3. Essential Dependencies

Your Python environment needs several key libraries to support MediaPipe integration. If you have not already installed them, run:

pip install numpy opencv-python mediapipe

  • numpy: A library for numerical operations.
  • opencv-python: A Python binding for the OpenCV (Open Source Computer Vision) library.
  • mediapipe: A framework by Google for building multimodal (e.g., image and video) machine learning pipelines.

If you are on macOS and using a specific Python path, remember to specify the full path to python3 when you run pip install.

4. Implementation in TouchDesigner

Below is the full Python script that implements MediaPipe Pose detection with segmentation. Copy and paste this code into a TouchDesigner Script node.

import numpy as np
import cv2 as cv
import mediapipe as mp

# Initialise MediaPipe Pose with segmentation enabled
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    enable_segmentation=True,  # Enable segmentation
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

def onSetupParameters(scriptOp):
    pass

def onCook(scriptOp):
    """
    1) Acquire webcam feed from 'null1'
    2) Run Pose detection with segmentation
    3) Output an RGBA image where the person is solid white, background transparent
    """
    try:
        # 1) Get input TOP
        inputTOP = op('null1')
        if not inputTOP:
            print("No TOP named 'null1' found.")
            return

        # 2) Fetch image data
        img = inputTOP.numpyArray()
        if img is None or img.size == 0:
            print("No valid image data from 'null1'.")
            return

        # Determine if the input image is already in the expected format
        if img.dtype != np.float32 and img.dtype != np.float64:
            img = img.astype(np.float32) / 255.0  # Normalise if necessary

        # Convert input to RGB uint8 for MediaPipe
        if len(img.shape) == 3:
            if img.shape[2] == 4:
                # RGBA -> RGB
                rgb_img = (img[:, :, :3] * 255).astype(np.uint8)
            elif img.shape[2] == 3:
                # Already RGB
                rgb_img = (img * 255).astype(np.uint8)
            else:
                print(f"Unexpected number of channels: {img.shape[2]}")
                return
        elif len(img.shape) == 2:
            # Greyscale -> RGB
            gray_8u = (img * 255).astype(np.uint8)
            rgb_img = cv.cvtColor(gray_8u, cv.COLOR_GRAY2RGB)
        else:
            print(f"Unexpected image shape: {img.shape}")
            return

        height, width = rgb_img.shape[:2]

        # 3) Run MediaPipe Pose (includes segmentation)
        results = pose.process(rgb_img)

        # Prepare an RGBA output: fully transparent
        out_img = np.zeros((height, width, 4), dtype=np.uint8)  # Use uint8 for compatibility

        # If segmentation_mask is available, fill the person region white
        if results.segmentation_mask is not None:
            seg_mask = results.segmentation_mask
            # seg_mask is normalised [0..1], ensure it's the same size as the input
            if seg_mask.shape != (height, width):
                seg_mask_resized = cv.resize(seg_mask, (width, height))
            else:
                seg_mask_resized = seg_mask

            # Choose a threshold (0.5) for binary mask
            mask = seg_mask_resized > 0.5

            # Set the pixels where mask == True to white with alpha=255
            out_img[mask] = [255, 255, 255, 255]

        # 4) Output the RGBA image
        scriptOp.copyNumpyArray(out_img)

    except Exception as e:
        print(f"Error in onCook: {e}")

def onPulse(par):
    pass

def onDestroy():
    # Release MediaPipe resources if necessary
    pose.close()

# Note: TouchDesigner invokes the callbacks defined in this DAT itself,
# so no manual registration of onDestroy() is needed.

How the Script Works

  1. Import Libraries: We import numpy, cv2, and mediapipe.
  2. Initialise MediaPipe Pose: We create an instance of mp.solutions.pose.Pose with segmentation enabled.
  3. Fetch Input Image: The script reads image data from the null1 operator.
  4. Convert to Proper Format: It converts the input image to an RGB array (uint8), because MediaPipe needs an 8-bit RGB format.
  5. Pose Detection and Segmentation: MediaPipe processes the frame and generates a segmentation mask.
  6. Generate RGBA Output: We create an output image where pixels belonging to a detected person are white (255,255,255) with an alpha channel of 255. The rest of the image is transparent (alpha = 0).
  7. Clean Up: We close the MediaPipe Pose instance in onDestroy().
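The hard 0.5 threshold in step 6 produces crisp but sometimes jagged edges. An alternative is to use the segmentation mask directly as the alpha channel, which gives feathered edges. The mask_to_rgba helper below is a sketch of our own, not part of the script above:

```python
import numpy as np

def mask_to_rgba(seg_mask, soft=True, threshold=0.5):
    """Turn a float [0..1] segmentation mask (H, W) into an RGBA uint8
    image: white where the person is, transparent elsewhere."""
    h, w = seg_mask.shape
    out = np.zeros((h, w, 4), dtype=np.uint8)
    out[..., :3] = 255  # white RGB; visibility is controlled by alpha
    if soft:
        # Soft matte: the mask itself becomes the alpha channel
        out[..., 3] = (np.clip(seg_mask, 0.0, 1.0) * 255).astype(np.uint8)
    else:
        # Hard matte: same behaviour as the threshold in the script
        out[..., 3] = np.where(seg_mask > threshold, 255, 0).astype(np.uint8)
    return out
```

In onCook you would replace the thresholding block with out_img = mask_to_rgba(seg_mask_resized) to get the soft version.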

5. Final Setup and Testing

Add a Video Device In

  1. In TouchDesigner, add a Video Device In TOP. This node will capture your webcam feed.

Add a Null

Connect the Video Device In to a null node, and name the null node “null1”.

Create a Script Node

  1. Add a Script TOP.
  2. Open its docked callbacks DAT (for example, script1_callbacks) and replace its contents with the Python script above.

Connect the Script Node

The Script TOP needs no wired input, because the code fetches the image from null1 directly via op('null1').

Run TouchDesigner

You should see the Script node output an RGBA image. Wherever the person is detected, the output should be solid white on top of a transparent background. You can tweak the detection threshold in mask = seg_mask_resized > 0.5 to refine your segmentation.