LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

ShanghaiTech University
Figure 1. We propose a novel single-LiDAR-based approach for 3D HPS in large-scale scenarios that is not restricted to fixed studios, particular lighting conditions, or wearable devices. Our method predicts the full set of human SMPL parameters (pose, shape, translation) from consecutive LiDAR point clouds and performs well on challenging poses and under occlusion.

Abstract

For human-centric large-scale scenes, fine-grained modeling for 3D human global pose and shape is significant for scene understanding and can benefit many real-world applications. In this paper, we present LiveHPS, a novel single-LiDAR-based approach for scene-level Human Pose and Shape estimation without any limitation of light conditions and wearable devices. In particular, we design a distillation mechanism to mitigate the distribution-varying effect of LiDAR point clouds and exploit the temporal-spatial geometric and dynamic information existing in consecutive frames to solve the occlusion and noise disturbance. LiveHPS, with its efficient configuration and high-quality output, is well-suited for real-world applications. Moreover, we propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses, shapes and translations. It consists of multi-modal and multi-view acquisition data from calibrated and synchronized LiDARs, cameras, and IMUs. Extensive experiments on our new dataset and other public datasets demonstrate the SOTA performance and robustness of our approach.

Method

The pipeline of LiveHPS. Taking sequential LiDAR point clouds as input, LiveHPS uses three modules to obtain human SMPL parameters: a point-based body tracker that distills pose-prior information, a consecutive pose optimizer that refines the pose using joint-wise features, and a multi-head SMPL solver that regresses the parameters of the human model.

Dataset

The capture systems of FreeMotion. In (a), we use a dense-camera capture system with LiDARs for accurate pose and shape capture. In (b), we set LiDARs and cameras at three views to capture human motions.

Dataset Structure

Dataset file structure

FreeMotion_Indoor
|——LiDAR_info
|   |——FM_Indoor_train.pkl
|   |   |——pc_x (Point cloud data in view x)
|   |   |——T_x (Ground truth of translation in view x)
|   |   |——shape (Ground truth of shape)
|   |   |——gt (Ground truth of SMPL local pose)
|   |   |——motion_id
|   |——FM_Indoor_test.pkl
|   |——...
|——Camera_info
|   |——camera18_train.pkl
|   |   |——shape (Ground truth of shape)
|   |   |——body_pose (Ground truth of body pose)
|   |   |——transl_cam (Ground truth of translation)
|   |   |——K (Camera calibration matrix)
|   |   |——root_pose_cam (Ground truth of global rotation)
|   |   |——motion_id
|   |   |——images
|   |   |——bbox
|   |   |——kp2d
|   |——camera18_test.pkl
|   |——...
|——images.tar.gz (Image data)
|——livehps.t7 (Pretrained model)
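
The camera files list a calibration matrix K next to a translation transl_cam and 2D keypoints kp2d. The sketch below illustrates how such fields are commonly related through a pinhole projection. It assumes that K is a 3x3 intrinsic matrix, that transl_cam is expressed in the camera coordinate frame, and that lens distortion can be ignored; none of this is confirmed by the dataset description. The function name project_points and all numeric values are hypothetical.

```python
import numpy as np


def project_points(points_cam, K):
    """Project N x 3 camera-frame points to N x 2 pixel coordinates.

    Assumes an ideal pinhole camera with a 3x3 intrinsic matrix K
    and no lens distortion (an assumption, not a dataset guarantee).
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    uvw = points_cam @ K.T            # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # divide by depth


# Hypothetical intrinsics and translations, for illustration only.
K_demo = np.array([[1000.0, 0.0, 960.0],
                   [0.0, 1000.0, 540.0],
                   [0.0, 0.0, 1.0]])
transl_demo = np.array([[0.0, 0.0, 5.0],
                        [1.0, -0.5, 5.0]])

pixels = project_points(transl_demo, K_demo)
print(pixels)
```

If the assumptions hold for the real data, projecting transl_cam this way should place the body root inside the corresponding bbox, which gives a quick sanity check of the calibration.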

Specification

  1. Point clouds are stored in LiDAR_info. Use np.load(file_path, allow_pickle=True) to load each .pkl file.
  2. Point cloud data are provided from three views captured simultaneously.
  3. All data are recorded at 10 fps.
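
A minimal loading sketch based on the specification above. It assumes that each .pkl file unpickles to a dict-like object keyed by the field names shown in the file structure; the concrete keys pc_1 and T_1, the array shapes, and the helper names load_lidar_split and summarize are hypothetical. A small stand-in file is written first so that the example runs without the dataset.

```python
import os
import pickle
import tempfile

import numpy as np


def load_lidar_split(file_path):
    """Load one LiDAR_info split with the call given in the specification."""
    return np.load(file_path, allow_pickle=True)


def summarize(data):
    """Map each key to its array shape, or to its type name if it has none."""
    summary = {}
    for key, value in data.items():
        shape = getattr(value, "shape", None)
        summary[key] = shape if shape is not None else type(value).__name__
    return summary


# Stand-in data: key names and shapes are assumptions, not the real layout.
dummy = {
    "pc_1": np.zeros((4, 256, 3), dtype=np.float32),  # 4 frames, 256 points
    "T_1": np.zeros((4, 3), dtype=np.float32),        # per-frame translation
    "shape": np.zeros((4, 10), dtype=np.float32),     # SMPL shape parameters
    "gt": np.zeros((4, 72), dtype=np.float32),        # SMPL local pose
    "motion_id": np.arange(4),
}
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "FM_Indoor_demo.pkl")
with open(demo_path, "wb") as f:
    pickle.dump(dummy, f)

loaded = load_lidar_split(demo_path)
info = summarize(loaded)
print(info)
```

For the real files, replace demo_path with the path to FM_Indoor_train.pkl and inspect the printed keys before relying on any particular name or shape. Note that allow_pickle=True executes pickled code, so it should only be used on files from a trusted source.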

Quantitative comparisons

The results of LiveHPS on real-time-captured scenes

BibTeX

@inproceedings{ren2024livehps,
  title={LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment},
  author={Ren, Yiming and Han, Xiao and Zhao, Chengfeng and Wang, Jingya and Xu, Lan and Yu, Jingyi and Ma, Yuexin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1281--1291},
  year={2024}
}