Towards Practical Human Motion Prediction with LiDAR Point Clouds

Xiao Han¹, Yiming Ren¹, Yichen Yao¹, Yujing Sun², Yuexin Ma¹

¹ShanghaiTech University, ²The University of Hong Kong

Visualization of our motion prediction performance. The top figure compares our LiDAR-HMP with a two-stage method on the LIPD test set. The bottom figure highlights LiDAR-HMP's practicality in real-world deployment, unaffected by lighting conditions; markers 1, 2, and 3 indicate the current moment and the predicted poses for 0.4s and 1.0s into the future, respectively. With online-captured LiDAR point clouds, our method achieves promising real-time predictions, which is significant for real-world applications.

Demo video of LiDAR-HMP

Abstract

Human motion prediction is crucial for human-centric multimedia understanding and interaction. Current methods typically rely on ground-truth human poses as observed input, which is impractical for real-world scenarios where only raw visual sensor data is available. To deploy these methods in practice, a preliminary pose-estimation stage is required. However, such two-stage approaches often suffer performance degradation due to error accumulation. Moreover, reducing raw visual data to sparse keypoint representations significantly diminishes information density, resulting in the loss of fine-grained features. In this paper, we propose LiDAR-HMP, the first single-LiDAR-based 3D human motion prediction approach, which takes the raw LiDAR point cloud as input and forecasts future 3D human poses directly. Building upon our novel structure-aware body feature descriptor, LiDAR-HMP adaptively maps the observed motion manifold to future poses and effectively models the spatial-temporal correlations of human motions to further refine the prediction results. Extensive experiments show that our method achieves state-of-the-art performance on two public benchmarks and demonstrates remarkable robustness and efficacy in real-world deployments.

Method

The pipeline of our LiDAR-HMP. First, we extract the structure-aware body feature descriptor from the observed LiDAR point cloud frames. Then, we adaptively predict human motion with learnable queries to obtain initial predictions and explicitly model the spatial-temporal correlations among them to refine the predicted motions. Finally, we decode joint-wise and point-wise results for auxiliary supervision.
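The staged data flow described above (per-frame descriptor extraction, query-based initial prediction, spatial-temporal refinement, joint-wise decoding) can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: every module body is a placeholder (random projections, mean pooling, simple attention and smoothing), and all function names, dimensions, and frame counts here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T_OBS, T_FUT = 8, 25       # observed / predicted frame counts (illustrative)
N_PTS, J, D = 256, 24, 64  # points per frame, joints, feature dimension

def extract_body_descriptor(points):
    """Stand-in for the structure-aware body feature descriptor:
    map one point-cloud frame (N_PTS, 3) to a global feature (D,)."""
    W = rng.standard_normal((3, D)) * 0.01
    return points.mean(axis=0) @ W

def predict_initial_motion(obs_feats, queries):
    """Attend learnable future-frame queries over observed features to
    produce initial future pose features (simplified dot-product attention)."""
    attn = queries @ obs_feats.T                        # (T_FUT, T_OBS)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ obs_feats                             # (T_FUT, D)

def refine_motion(feats):
    """Stand-in for spatial-temporal refinement: smooth along time."""
    out = feats.copy()
    out[1:-1] = (feats[:-2] + feats[1:-1] + feats[2:]) / 3.0
    return out

def decode_joints(feats):
    """Decode joint-wise 3D coordinates from refined pose features."""
    W = rng.standard_normal((D, J * 3)) * 0.01
    return (feats @ W).reshape(-1, J, 3)                # (T_FUT, J, 3)

# Run the sketch end-to-end on synthetic point clouds.
frames = rng.standard_normal((T_OBS, N_PTS, 3))
obs_feats = np.stack([extract_body_descriptor(f) for f in frames])
queries = rng.standard_normal((T_FUT, D)) * 0.1         # learnable in practice
initial = predict_initial_motion(obs_feats, queries)
future_poses = decode_joints(refine_motion(initial))
print(future_poses.shape)  # (25, 24, 3): T_FUT frames of J 3D joints
```

In the actual model these placeholders would be learned networks, and a point-wise decoding head would run alongside the joint-wise one for auxiliary supervision, as noted above.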

Quantitative comparisons on Short-term Motion Prediction

Quantitative comparisons on Long-term Motion Prediction

BibTeX

@article{han2024towards,
  title={Towards Practical Human Motion Prediction with LiDAR Point Clouds},
  author={Han, Xiao and Ren, Yiming and Yao, Yichen and Sun, Yujing and Ma, Yuexin},
  journal={arXiv preprint arXiv:2408.08202},
  year={2024}
}