## Overview

In addition to the data, we provide a benchmark suite covering different aspects of semantic scene understanding at different levels of granularity. To ensure unbiased evaluation of these tasks, we follow the common best practice of evaluating test set results on a server, which allows us to keep the test set labels private.

Test set evaluation is performed using CodaLab competitions. For each task, we set up a competition that handles the submissions and scores them using the non-public labels of the test set sequences. See the individual competition websites for further details on the participation process. Here, we only provide a short task description and the leaderboards.

Important: We will not approve CodaLab accounts registered with email addresses from free email providers, e.g., gmail.com, qq.com, web.de, etc. Only university or company email addresses will get access. If you need to use a freemail account, contact us.

## Semantic Segmentation

See our competition website for more information on the competition and submission process.

Tasks. In semantic segmentation of point clouds, we want to infer the label of each three-dimensional point. Therefore, the input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor.

We evaluate two settings for this task: methods using a single scan as input and methods using multiple scans as input. With single scan evaluation, we do not distinguish between moving and non-moving objects, i.e., the moving and non-moving variants of a class are mapped to a single class. With multiple scan evaluation, we distinguish between moving and non-moving objects, which makes the task harder, since the method has to decide whether something is dynamic.
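The single scan mapping described above can be sketched as follows. This is a minimal illustration only: the class names in `MOVING_TO_STATIC` are hypothetical placeholders, and the benchmark's actual label set and numeric IDs are not reproduced here.

```python
# Hypothetical class names for illustration; the benchmark's real label
# definitions (names and IDs) may differ.
MOVING_TO_STATIC = {
    "moving-car": "car",
    "moving-person": "person",
}


def to_single_scan_labels(labels):
    """Collapse each moving class onto its non-moving counterpart.

    Labels without an entry in MOVING_TO_STATIC are returned unchanged.
    """
    return [MOVING_TO_STATIC.get(label, label) for label in labels]


multi_scan_labels = ["car", "moving-car", "road", "moving-person"]
single_scan_labels = to_single_scan_labels(multi_scan_labels)
print(single_scan_labels)
```

In the multiple scan setting, the labels would be left as they are, so a method is additionally scored on whether it separates the moving from the non-moving variant.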

Metric. We use the mean Jaccard index, also known as mean intersection-over-union (mIoU), over all classes, i.e., $$\frac{1}{C} \sum^C_{c=1} \frac{TP_c}{TP_c + FP_c + FN_c},$$ where $TP_c$, $FP_c$, and $FN_c$ correspond to the number of true positive, false positive, and false negative predictions for class $c$, and $C$ is the number of classes.
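As a rough sketch of how the formula can be evaluated, the following function counts $TP_c$, $FP_c$, and $FN_c$ per class from two label lists and averages the per-class IoU. The function name `mean_iou` and the convention that a class missing from both lists contributes an IoU of 0 are assumptions made for this example, not the behavior of the evaluation server.

```python
def mean_iou(ground_truth, predictions, num_classes):
    """Mean intersection-over-union for integer labels in [0, num_classes)."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes

    for gt, pred in zip(ground_truth, predictions):
        if gt == pred:
            tp[gt] += 1
        else:
            fp[pred] += 1  # predicted class received a wrong point
            fn[gt] += 1    # true class missed one of its points

    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        # Assumption: classes that never occur contribute an IoU of 0.
        ious.append(tp[c] / denom if denom else 0.0)

    return sum(ious) / num_classes


gt = [0, 0, 1, 1, 2]
pred = [0, 1, 1, 1, 2]
print(mean_iou(gt, pred, num_classes=3))
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.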

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)

To avoid confusion between the numbers reported in the paper and post-publication results, we report here the numbers from the paper. Please contact us if we missed an updated version with different numbers.

## Panoptic Segmentation

See our competition website for more information on the competition and submission process.

Tasks. In panoptic segmentation of point clouds, we want to infer the label of each three-dimensional point and, for the so-called thing classes, the instance it belongs to. Therefore, the input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor.

Metric. We use the panoptic quality (PQ) proposed by Kirillov et al., defined by $$\frac{1}{C} \sum^C_{c=1} \frac{ \sum_{(\mathcal{S}, \hat{\mathcal{S}}) \in \text{TP}_c} \text{IoU}(\mathcal{S}, \hat{\mathcal{S}})}{|\text{TP}_c| + \frac{1}{2}|\text{FP}_c| + \frac{1}{2}|\text{FN}_c|},$$ where $\text{TP}_c$, $\text{FP}_c$, and $\text{FN}_c$ correspond to the sets of true positive, false positive, and false negative matches for class $c$, and $C$ is the number of classes. A match between segments is a true positive if their IoU (intersection-over-union) is larger than 0.5. To account for segments of stuff classes that have multiple connected components, Porzi et al. proposed a modified metric PQ$^\dagger$ that uses just the IoU for stuff classes without distinguishing between different segments.
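The PQ definition can be illustrated with a small sketch in which every segment is a set of point indices. The data layout (a dictionary from class name to a list of segments), the function names, and the choice to average over the classes that appear in either input are assumptions made for this example. Because a true positive requires an IoU above 0.5 and segments of one class do not overlap, each ground truth segment can match at most one predicted segment, so a simple greedy search is sufficient here.

```python
def segment_iou(a, b):
    """IoU of two segments given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt_segments, pred_segments):
    """PQ averaged over all classes present in either input.

    Both arguments map a class name to a list of segments (sets of indices).
    """
    classes = sorted(set(gt_segments) | set(pred_segments))
    per_class = []

    for c in classes:
        gts = gt_segments.get(c, [])
        preds = pred_segments.get(c, [])
        matched = set()   # indices of predictions already used as a match
        iou_sum = 0.0

        for gt in gts:
            for j, pred in enumerate(preds):
                if j in matched:
                    continue
                value = segment_iou(gt, pred)
                if value > 0.5:  # true positive match
                    matched.add(j)
                    iou_sum += value
                    break

        tp = len(matched)
        fp = len(preds) - tp   # unmatched predictions
        fn = len(gts) - tp     # unmatched ground truth segments
        denom = tp + 0.5 * fp + 0.5 * fn
        per_class.append(iou_sum / denom if denom else 0.0)

    return sum(per_class) / len(per_class) if per_class else 0.0


gt = {"car": [{0, 1, 2, 3}, {4, 5}], "road": [{6, 7, 8, 9}]}
pred = {"car": [{0, 1, 2}, {10, 11}], "road": [{6, 7, 8, 9}]}
print(panoptic_quality(gt, pred))
```

In this toy example the class `car` has one match with IoU 0.75, one false positive, and one false negative, giving 0.75 / 2 = 0.375, while `road` is matched perfectly, so the average is 0.6875.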

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)

## 4D Panoptic Segmentation

See our competition website for more information on the competition and submission process.

Tasks. In 4D panoptic segmentation of point cloud sequences, one has to provide instance IDs and semantic labels for each point of the test sequences 11-21. The instance ID needs to be uniquely assigned to each instance in both space and time. Therefore, the input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam which depends on the properties of the surface that was hit. Each method should then output a label and instance for each point in a sequence of scans. The class-wise instance ID only counts for so-called thing classes that are "instantiable" and will be ignored for the stuff classes.

Metric. We use the LiDAR Segmentation and Tracking Quality (LSTQ) metric proposed by Aygün et al., which generalizes panoptic quality to the tracking scenario. The metric decouples the semantic quality and the association quality of the tracks. Please see the paper cited below for more information.
Important: In this metric, instances must have unique instance IDs, since we account for changes in the class in the evaluation of the association quality. Therefore, identical instance IDs used for different classes are treated as a single (tracked) object in the metric!
As the classes other-structure and other-object either have only a few points or are too diverse, with a high intra-class variation, we decided not to include these classes in the evaluation. Thus, we use 25 instead of 28 classes, ignoring outlier, other-structure, and other-object during training and inference.
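Since instance IDs shared between classes are merged into one tracked object, it can be useful to check predictions for such IDs before submitting. The following is a minimal sketch of such a check; the function name, the flat per-point lists, and the use of `0` as the "no instance" ID are assumptions made for this example and are not part of the official tooling.

```python
from collections import defaultdict


def find_shared_instance_ids(semantic_labels, instance_ids, no_instance_id=0):
    """Return instance IDs that occur together with more than one class.

    Assumption: points without an instance carry `no_instance_id`.
    """
    classes_per_instance = defaultdict(set)
    for semantic, instance in zip(semantic_labels, instance_ids):
        if instance != no_instance_id:
            classes_per_instance[instance].add(semantic)

    return {
        instance: sorted(classes)
        for instance, classes in classes_per_instance.items()
        if len(classes) > 1
    }


semantic = ["car", "car", "person", "road"]
instances = [1, 1, 1, 0]
print(find_shared_instance_ids(semantic, instances))
```

An empty result means that every instance ID is tied to exactly one class. A non-empty result lists the IDs that the metric would treat as a single tracked object across the reported classes.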

@inproceedings{ayguen2021cvpr,
author = {M. Ayg\"un and A. Osep and M. Weber and M. Maximov and C. Stachniss and J. Behley and L. Leal-Taixe},
title = {{4D Panoptic Segmentation}},
booktitle = {In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = 2021
}

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)

## Moving Object Segmentation

See our competition website for more information on the competition and submission process.

Tasks. In moving object segmentation of point cloud sequences, one has to provide motion labels for each point of the test sequences 11-21. Therefore, the input to all evaluated methods is a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor. Here, we only distinguish between static and moving object classes.

Metric. To assess the labeling performance, we rely on the commonly applied Jaccard Index or intersection-over-union (mIoU) metric over moving and non-moving parts of the environment. We map all moving-x classes of the original SemanticKITTI semantic segmentation benchmark to a single moving object class.
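The class mapping and the resulting IoU values can be sketched as follows. The example assumes string labels in which moving classes carry the prefix `moving-`, and it reports the IoU separately for the moving and the static class. The function names and the label strings are hypothetical and only serve this illustration.

```python
def to_motion_label(semantic_label):
    """Map any moving-x class to "moving" and everything else to "static"."""
    return "moving" if semantic_label.startswith("moving-") else "static"


def motion_ious(ground_truth, predictions):
    """IoU of the "moving" and the "static" class for per-point labels."""
    result = {}
    for cls in ("moving", "static"):
        pairs = list(zip(ground_truth, predictions))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        denom = tp + fp + fn
        result[cls] = tp / denom if denom else 0.0
    return result


semantic_gt = ["moving-car", "car", "road", "moving-person"]
motion_gt = [to_motion_label(label) for label in semantic_gt]
motion_pred = ["moving", "moving", "static", "static"]
print(motion_ious(motion_gt, motion_pred))
```

In this toy example both classes have one true positive, one false positive, and one false negative, so each IoU is 1/3.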

@article{chen2021ral,
title={{Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data}},
author={X. Chen and S. Li and B. Mersch and L. Wiesmann and J. Gall and J. Behley and C. Stachniss},
year={2021},
journal={IEEE Robotics and Automation Letters (RA-L)},
doi = {10.1109/LRA.2021.3093567}
}

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)

## Semantic Scene Completion

See our competition website for more information on the competition and submission process.

Task. In semantic scene completion, we are interested in predicting the complete scene inside a certain volume from a single initial scan. More specifically, we use as input a voxel grid, where each voxel is marked as empty or occupied, depending on whether or not it contains a laser measurement. For semantic scene completion, one needs to predict whether a voxel is occupied and its semantic label in the completed scene.
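To make the input representation concrete, the sketch below marks a voxel as occupied as soon as at least one laser measurement falls into it. The voxel size, grid origin, and grid shape are hypothetical values chosen for the example and do not reflect the benchmark's actual volume definition.

```python
import math


def occupancy_grid(points, voxel_size, origin, grid_shape):
    """Return the set of (i, j, k) voxel indices containing at least one point.

    Points outside the grid volume are ignored; every voxel that is not in the
    returned set counts as empty.
    """
    occupied = set()
    for point in points:
        index = tuple(
            math.floor((value - start) / voxel_size)
            for value, start in zip(point, origin)
        )
        if all(0 <= i < n for i, n in zip(index, grid_shape)):
            occupied.add(index)
    return occupied


# Hypothetical scan with four points; the last one lies outside the volume.
scan = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.05), (0.5, 0.1, 0.1), (5.0, 0.0, 0.0)]
grid = occupancy_grid(scan, voxel_size=0.2, origin=(0.0, 0.0, 0.0), grid_shape=(4, 4, 4))
print(sorted(grid))
```

The first two points share one voxel, so the example yields two occupied voxels. A set of indices is used instead of a dense array only to keep the sketch free of dependencies.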

Metric. For evaluation, we follow the evaluation protocol of Song et al. For the task of scene completion, which only classifies a voxel as being occupied or empty, i.e., ignoring the semantic label, we compute the Intersection-over-Union (IoU). For the task of semantic scene completion, we compute the mIoU over the same 19 classes that were used for the single scan semantic segmentation task.
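The difference between the two scores can be sketched on a flattened voxel grid in which each entry holds a class label and `0` denotes an empty voxel. The use of `0` for empty voxels, the function names, and the toy class IDs are assumptions made for this example. Any additional handling in the actual protocol, such as voxels excluded from evaluation, is not modeled.

```python
def completion_iou(ground_truth, predictions, empty=0):
    """IoU of the occupied class, ignoring which semantic label is assigned."""
    tp = fp = fn = 0
    for gt, pred in zip(ground_truth, predictions):
        gt_occupied = gt != empty
        pred_occupied = pred != empty
        if gt_occupied and pred_occupied:
            tp += 1
        elif pred_occupied:
            fp += 1
        elif gt_occupied:
            fn += 1
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def semantic_completion_miou(ground_truth, predictions, classes):
    """Mean IoU over the given semantic classes (the empty label is excluded)."""
    pairs = list(zip(ground_truth, predictions))
    ious = []
    for c in classes:
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return sum(ious) / len(ious) if ious else 0.0


gt_voxels = [0, 1, 1, 2, 0, 2]
pred_voxels = [0, 1, 2, 2, 1, 0]
print(completion_iou(gt_voxels, pred_voxels))
print(semantic_completion_miou(gt_voxels, pred_voxels, classes=[1, 2]))
```

In this toy example three of the five voxels that are occupied in either grid agree, so the completion IoU is 0.6, while confusing the two semantic classes lowers the semantic mIoU to 1/3.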

Leaderboard. The following leaderboard contains only published approaches for which we can provide at least an arXiv link. (Last updated: .)