R3ST: A Synthetic 3D Dataset with Realistic Trajectories

Simone Teglia◼︎, Claudia Melis Tonti◼︎, Francesco Pro◼︎, Leonardo Russo◼︎, Andrea Alfarano▲, Matteo Pentassuglia★, Irene Amerini◼︎
◼︎ Sapienza University of Rome, ▲ INSAIT, Sofia University, ★ EURECOM, Biot, France
In: Proceedings of Computer Analysis of Images and Patterns. CAIP 2025. Lecture Notes in Computer Science, vol 15622. Springer, Cham

Abstract

Datasets are essential for training and evaluating the computer vision models used for traffic analysis and road safety. Existing real datasets capture authentic road-object behavior in real-world scenarios; however, they typically lack precise ground-truth annotations. Synthetic datasets, in contrast, allow a large number of frames to be annotated at no additional cost or time. Their general drawback is unrealistic vehicle motion, since trajectories are generated by AI models or rule-based systems. In this work, we introduce R3ST (Realistic 3D Synthetic Trajectories), a synthetic dataset that overcomes this limitation by generating a synthetic 3D environment and integrating real-world trajectories derived from SinD, a bird's-eye-view dataset recorded from drone footage. R3ST closes the gap between synthetic data and realistic trajectories, advancing research in trajectory forecasting for road vehicles by offering both accurate multimodal ground-truth annotations and authentic human-driven vehicle trajectories.

[Figure: comparison table of R3ST and existing real and synthetic datasets]

Real Trajectories in a Synthetic Environment

R3ST was generated by rendering virtual intersections created in Blender. Unlike typical synthetic datasets, where vehicle motion is dictated by AI-driven or rule-based algorithms, R3ST incorporates real-world vehicle trajectories derived from two of the four scenarios in SinD, a bird's-eye-view dataset with precise vehicle-position annotations extracted from real drone footage.
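As a concrete illustration of the idea, the minimal sketch below keyframes a Blender object along a recorded bird's-eye-view track using Blender's built-in bpy Python API. The CSV layout, column names, object name, and frame mapping are assumptions for illustration only; the actual R3ST pipeline may differ.

# Minimal sketch: animate a Blender object along a recorded trajectory.
# Assumes a CSV with columns frame, x, y (a simplified SinD-like layout)
# and a scene object named "car_0"; run inside Blender's Python console.
import csv
import bpy

FPS_SCALE = 1  # assumed 1:1 mapping between recording frames and render frames

car = bpy.data.objects["car_0"]
with open("/path/to/trajectory.csv") as f:
    for row in csv.DictReader(f):
        frame = int(row["frame"]) * FPS_SCALE
        # SinD positions are 2D (bird's-eye view); keep the car on the ground plane.
        car.location = (float(row["x"]), float(row["y"]), 0.0)
        car.keyframe_insert(data_path="location", frame=frame)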

[Figure: clustered vehicle trajectories at an R3ST intersection]

The figure visualizes clustered vehicle trajectories at an R3ST intersection. Each colored trajectory represents a cluster of similar paths, while the shaded regions indicate the variance within each cluster.
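This page does not detail how the clusters were computed, but one common recipe is to resample each trajectory to a fixed number of points and run k-means on the flattened coordinates, as the scikit-learn sketch below illustrates. Treat it as one plausible approach, not the exact procedure behind the figure.

# Illustrative trajectory clustering: resample each path to a fixed length,
# then cluster the flattened (x, y) sequences with k-means.
# A generic recipe, not necessarily the procedure used for the figure.
import numpy as np
from sklearn.cluster import KMeans

def resample(traj: np.ndarray, n_points: int = 32) -> np.ndarray:
    """Linearly resample an (N, 2) trajectory to (n_points, 2)."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.stack([np.interp(t_new, t_old, traj[:, d]) for d in range(2)], axis=1)

def cluster_trajectories(trajs: list[np.ndarray], k: int = 6) -> np.ndarray:
    """Return one cluster label per trajectory."""
    features = np.stack([resample(t).ravel() for t in trajs])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)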

Multimodal Annotations

To enhance the usability of R3ST, we leveraged Vision Blender to compute additional multimodal annotations that can be used in a range of computer vision applications. Additionally, we derive the 3D bounding box of each object directly from the Blender world environment and project it onto the image plane to obtain the corresponding 2D bounding box.
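The projection step can be made concrete with a standard pinhole camera model: transform the eight box corners from world to camera coordinates, project them with the intrinsic matrix, and take the min/max of the resulting pixel coordinates. The sketch below assumes a generic 3x3 intrinsic matrix K and a 4x4 world-to-camera extrinsic rather than Blender's internals; inside Blender, bpy_extras.object_utils.world_to_camera_view offers an equivalent built-in.

# Minimal sketch: derive a 2D bounding box from eight 3D corner points.
# K is an assumed 3x3 camera intrinsic matrix and world_to_cam an assumed
# 4x4 extrinsic; in practice both come from the render setup.
import numpy as np

def project_box(corners_world: np.ndarray, K: np.ndarray,
                world_to_cam: np.ndarray) -> tuple[float, float, float, float]:
    """corners_world: (8, 3) box corners -> (x_min, y_min, x_max, y_max) in pixels."""
    # Homogeneous world coordinates -> camera coordinates.
    homo = np.hstack([corners_world, np.ones((8, 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    # Pinhole projection; divide by depth (assumes all corners lie in front of the camera).
    pix = (K @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]
    return pix[:, 0].min(), pix[:, 1].min(), pix[:, 0].max(), pix[:, 1].max()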

[Figure: qualitative instance segmentation and monocular depth estimation results]

Qualitative results for instance segmentation (top two rows) and monocular depth estimation (bottom two rows). The first column shows RGB frames, the second ground-truth annotations, and the third and fourth model predictions. Instance segmentation results are obtained with YOLO-Seg and the SAM2 online demo, while monocular depth estimation is performed with AnyDepth and Pixelformer Large pre-trained on KITTI.
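For a comparable qualitative check on R3ST frames, the snippet below runs an off-the-shelf YOLO segmentation checkpoint through the ultralytics package; the checkpoint name and frame path are placeholders, and the depth models shown in the figure would be run analogously from their own repositories.

# Illustrative inference with a pretrained YOLO segmentation checkpoint;
# the checkpoint name and frame path below are placeholders.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8x-seg.pt")           # any pretrained *-seg checkpoint works
results = model("r3st_frame_0001.png")   # path to a rendered R3ST frame
overlay = results[0].plot()              # BGR image with masks and boxes drawn
cv2.imwrite("prediction.png", overlay)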

BibTeX

@InProceedings{10.1007/978-3-032-05060-1_30,
  author="Teglia, Simone
    and Melis Tonti, Claudia
    and Pro, Francesco
    and Russo, Leonardo
    and Alfarano, Andrea
    and Pentassuglia, Matteo
    and Amerini, Irene",
  editor="Castrill{\'o}n-Santana, Modesto
    and Travieso-Gonz{\'a}lez, Carlos M.
    and Deniz Suarez, Oscar
    and Freire-Obreg{\'o}n, David
    and Hern{\'a}ndez-Sosa, Daniel
    and Lorenzo-Navarro, Javier
    and Santana, Oliverio J.",
  title="R3ST: A Synthetic 3D Dataset with Realistic Trajectories",
  booktitle="Computer Analysis of Images and Patterns",
  year="2026",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="351--360",
  abstract="Datasets are essential to train and evaluate computer vision models used for traffic analysis and to enhance road safety. Existing real datasets fit real-world scenarios, capturing authentic road object behaviors, however, they typically lack precise ground-truth annotations. In contrast, synthetic datasets play a crucial role, allowing for the annotation of a large number of frames without additional costs or extra time. However, a general drawback of synthetic datasets is the lack of realistic vehicle motion, since trajectories are generated using AI models or rule-based systems. In this work, we introduce R3ST (Realistic 3D Synthetic Trajectories), a synthetic dataset that overcomes this limitation by generating a synthetic 3D environment and integrating real-world trajectories derived from SinD, a bird's-eye-view dataset recorded from drone footage. The proposed dataset closes the gap between synthetic data and realistic trajectories, advancing the research in trajectory forecasting of road vehicles, offering both accurate multimodal ground-truth annotations and authentic human-driven vehicle trajectories. We publicly release our dataset here (https://r3st-website.vercel.app/).",
  isbn="978-3-032-05060-1"
}