Visual Tracking API

The Visual Tracking API is used as an intermediary between Auterion Mission Control and a computer vision app that is running the tracking logic on the video stream and producing a tracking result

Related C++ Header in the Auterion SDK: <auterion_sdk/tracking/tracking_interface.hpp>

The Auterion SDK offers a programming interface for subscribing to incoming tracking requests issued by the user and camera images. In return it facilitates sending out tracking updates in the form of "tracking results" and optionally modified camera frames which can be used to draw annotations into the video feed displayed to the user. The Tracking Interface does not provide a tracking algorithm itself.

Inputs of the Tracking Interface:

  • Live camera feed as a series of images

  • User actions in Auterion Mission Control, such as the "track" action with pixel coordinates

Outputs of the Tracking Interface:

  • Tracking result, which includes the center coordinates as well as a bounding box of the tracked object

  • Image frames that are visualized in Auterion Mission Control. These frames may be altered by the app performing the tracking to visualize certain results or draw an information overlay.

Tracking Interface API Documentation

To instantiate the Tracking Interface in Auterion SDK use the following C++ class:

auterion::TrackingInterface tracking_interface(sdk);

The constructor of TrackingInterface requires an instance of the Auterion SDK as an argument.

The screenshot above illustrates a simulated multicopter actively tracking a tree in a virtual 3D environment. The Tracking Interface is used inside of an AuterionOS app (see App Framework) on the simulated drone; the Tracking Interface enables the app to receive tracking requests from AMC, e.g. object selection using the blue "Track" tool in the sidebar on the left, and images from the camera. The app uses the interface to provide a visual feedback to AMC, including the location of the tracked object in the frame (green crosshair and rectangle).

Subscribing to Tracking Requests

The TrackingInterface provides several subscription methods to register callbacks for the following tracking requests:

  • subscribeTrackingRectangle(callback): Register a callback function that will be called with the center and size of the tracking rectangle in pixels.

  • subscribeTrackingPoint(callback): Register a callback function that will be called with the tracking point in pixels.

  • subscribeTrackingZoomChange(callback): Register a callback function that will be called with the change of zoom level.

  • subscribeTrackingOff(callback): Register a callback function that will be called when tracking is disabled.

The expected callback signatures for the subscription methods are illustrated here:

Tracking Request: Rectangular Area Tracking

tracking_interface.subscribeTrackingRectangle([this](const cv::Point& center, const cv::Size& size) {
    std::cout << "Tracking rectangle: Center(" << center.x << ", " << center.y << "), Size(" 
              << size.width << "x" << size.height << ")" << std::endl;

Tracking Request: Point Tracking

tracking_interface.subscribeTrackingPoint([this](const cv::Point& point) {
    std::cout << "Tracking point at: (" << point.x << ", " << point.y << ")" << std::endl;

Tracking Request: Zoom Change Events

tracking_interface.subscribeTrackingZoomChange([this](const int change) {
    std::cout << "Zoom level changed by: " << change << std::endl;

Tracking Request: Tracking Off Events

tracking_interface.subscribeTrackingOff([this]() {
    std::cout << "Tracking has been turned off." << std::endl;

Subscribing to Camera Data

Similarly, one can register callbacks to incoming camera images and camera information containing calibration data.

Camera Images

tracking_interface.subscribeImage([this](const auterion::Image& image_in) {
    const cv::Mat image_cv = image_in.asOpenCVMat();
    std::cout << "Received image with dimensions: " << image_cv.cols << "x" << image_cv.rows << std::endl;

Camera Information

tracking_interface.subscribeCameraInfo([this](const sensor_msgs::msg::CameraInfo::SharedPtr& camera_info) {
    std::cout << "Width: " << camera_info->width
              << ", Height: " << camera_info->height << std::endl;
    std::cout << "Camera Matrix (K): [" 
              << camera_info->k[0] << ", " << camera_info->k[1] << ", ...]" << std::endl;
    if (!camera_info->d.empty()) {
            std::cout << "Distortion Coefficients: [" 
                      << camera_info->d[0] << ", " << camera_info->d[1] << ", ...]" << std::endl;

Publishing Tracking Updates

Once you are subscribed to an image stream and receive a tracking request, you can send the result of your tracking algorithm for every new image by populating a TrackingResult struct:

cv::Point object_center(320, 240);    // Center of object pixels coords
cv::Size object_size(50, 50);         // Size of object bounding box in pixels
float confidence = 0.85f;             // Tracking confidence

auterion::TrackingResult result(object_center, object_size, confidence);

Then send an update containing your tracking result via the Tracking Interface:


This will make the tracking result available to other apps.

Publishing Annotated Images

To publish annotated images, such as those containing the location of the tracked object within the frame, and share them with a ground control station, one can do the following:

// Retrieve incoming image encoding
auterion::ImageHeader::Encoding encoding = image_in.header().encoding;

// Convert to cv::Mat for drawing
const cv::Mat image_cv = image_in.asOpenCVMat();
cv::Mat image_out = image_cv.clone();
... // optionally draw onto image_out

// Publish annotated image
tracking_interface.updateImage(image_out, encoding);

Whether annotating the image or not, the vision tracking app should call updateImage as otherwise the ground station application won't receive image frames and the user will not be able to see a video stream.

Publishing Zoomed-In Images

In some cases, it may be useful to publish a zoomed-in or cropped version of the original image. To do this one must simply provide information about which region of the original image is displayed in the zoomed image as shown below:

// Define upper-left image quadrant
cv::Rect displayed_region(0, 0, image_cv.cols / 2, image_cv.rows / 2);

// Crop the image
cv::Mat cropped_image = image_cv(displayed_region);

// Resize the cropped image back to the original image size
cv::Mat image_out;
cv::resize(cropped_image, image_out, image_cv.size());

// Update using zoomed image
_tracking_interface.updateImage(image_out, encoding, displayed_region);

The output image given to updateImage should have consistent dimensions across consecutive frames. If the original image is cropped, it must be resized to match the original image dimensions.

Rendering the Tracking Result on the Ground Station

One can additionally provide the tracking result to updateImage to allow the result to be rendered by the ground station:

// Update using zoomed image, and share the tracking result with the GCS
_tracking_interface.updateImage(image_out, encoding, displayed_region, result);

The displayed_region argument is optional and can be set to std::nullopt if the displayed region corresponds to the full image.

If using Auterion Mission Control, sending the tracking result this way will directly display the bounding box of the tracked result in the video feed, as illustrated in the image below.

Last updated