Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc. In this paper, we present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife, which is collected in diverse daily-life scenarios with rich and fine-grained annotations. Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc., and we also provide benchmarks for these tasks to facilitate related research. In addition, we design novel modules for LiDAR-based segmentation and action recognition, which are more applicable for large-scale human-centric scenarios and achieve state-of-the-art performance.
Figure 3. The architecture of our segmentation method. In particular, the HHOI module extracts the correlations among different persons as well as the human-object relationships, which benefit both point-wise and instance-wise classification.
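The paper defines the HHOI module precisely; as a rough illustration only, the relation modeling described in the caption could be wired as below, with instance features (persons and objects) attending to one another before classification. All names (`HHOIRelation`, `inst_feats`, the dimensions) are hypothetical placeholders, not the released implementation.

```python
import torch
import torch.nn as nn

class HHOIRelation(nn.Module):
    """Hypothetical sketch: relation modeling over scene instances."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Self-attention lets every instance (person or object) aggregate
        # context from all others, capturing human-human and human-object ties.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, inst_feats: torch.Tensor) -> torch.Tensor:
        # inst_feats: (B, N_instances, dim) pooled per-instance features.
        rel, _ = self.attn(inst_feats, inst_feats, inst_feats)
        return self.norm(inst_feats + rel)  # residual: ego + relational context

feats = torch.randn(2, 8, 256)       # 2 scenes, 8 instances each (toy input)
refined = HHOIRelation()(feats)      # (2, 8, 256) relation-enhanced features
```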
Figure 5. Pipeline of our method for human-centric action recognition. We first utilize a 3D detector to obtain a set of bounding boxes of persons. Then, for each person, we extract multi-resolution features and fuse them into a hierarchical feature F_HF. Next, we leverage the relationships with neighbors to enhance the ego-feature and obtain a comprehensive feature F_IE for the final action classification.
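To make the caption concrete, here is a minimal sketch (assumptions, not the released code) of the per-person stages: multi-resolution features are fused into F_HF, neighbor attention produces the interaction-enhanced F_IE, and a linear head predicts the action. `ActionHead`, `num_actions`, and the feature dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ActionHead(nn.Module):
    """Hypothetical sketch of the fusion + interaction-enhancement stages."""
    def __init__(self, dim: int = 256, num_actions: int = 10):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)  # hierarchical fusion -> F_HF
        self.neighbor_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.cls = nn.Linear(dim, num_actions)

    def forward(self, multi_res: list[torch.Tensor]) -> torch.Tensor:
        # multi_res: three (B, N_persons, dim) tensors at different resolutions.
        f_hf = self.fuse(torch.cat(multi_res, dim=-1))  # (B, N, dim) fused feature
        ctx, _ = self.neighbor_attn(f_hf, f_hf, f_hf)   # context from neighbors
        f_ie = f_hf + ctx                               # interaction-enhanced feature
        return self.cls(f_ie)                           # per-person action logits

head = ActionHead()
logits = head([torch.randn(2, 6, 256) for _ in range(3)])  # (2, 6, 10)
```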
```
HCL_Full
|── 09-23-13-44-53-1
|   |── bin (LiDAR)
|   |   |── 1663912046.036171264.bin
|   |   |── 1663912046.135965440.bin
|   |   |── ...
|   |── imu_csv (IMU)
|   |   |── 1663912046015329873.csv
|   |   |── 1663912046025090565.csv
|   |   |── ...
|   |── img_blur (Camera)
|       |── cam1
|       |   |── 1663912046.036171264.jpg
|       |   |── 1663912046.135965440.jpg
|       |   |── ...
|       |── cam2
|       |── cam6
|── 09-23-13-44-53-2
|── ...
```
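Before loading individual files, a sequence can be traversed directly from this layout. The sketch below (paths assumed as in the tree above) pairs each LiDAR frame with the cam1 image that shares its timestamped filename:

```python
from pathlib import Path

seq = Path("HCL_Full/09-23-13-44-53-1")  # one capture sequence
for bin_file in sorted((seq / "bin").glob("*.bin")):
    # LiDAR and camera files share the same timestamp-based stem.
    img_file = seq / "img_blur" / "cam1" / (bin_file.stem + ".jpg")
    if img_file.exists():
        print(bin_file.name, "<->", img_file.name)
```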
Use `np.fromfile(file_path, dtype=np.float32).reshape(-1, 5)` to load the file. Columns 0-4 represent x, y, z, reflectivity, and timestamp (t), respectively.

Per-sequence annotations are stored in JSON files, e.g. `09-23-13-44-53-1.json`:

```
{
    "data": "09-23-13-44-53-1",                // corresponding data folder
    "frames_number": 44,                       // number of frames
    "frame": [
        {
            "frameId": 0,                      // frame id
            "timestamp": 1663912047.0359857,   // timestamp
            "pc_name": "09-23-13-44-53-1/bin/1663912047.035985664.bin",  // point cloud path
            "instance_number": 5,              // number of instances
            "instance": [
                {
                    "id": "a5f9185a-5719-4414-9ce5-1ba9316d7050",  // unique uuid
                    "number": 1,               // global id
                    "category": "person",      // category
                    "action": "moving boxes,walking",  // action labels
                    "pointCount": 358,         // number of points
                    "seg_points": [119855, 119856, ...],  // indices of the instance's points
                    "occlusion": 0,            // occlusion level (0-1)
                    "position": {
                        "x": 8.642838478088379,
                        "y": -1.170599341392517,
                        "z": -0.4981747269630432
                    },                         // bbox position
                    "rotation": 1.1941385296061557,  // bbox rotation (yaw)
                    "boundingbox3d": {
                        "x": 0.7874413728713989,
                        "y": 0.4814544916152954,
                        "z": 1.5814100503921509
                    }                          // bbox dimensions
                },
                {...},
                ...
            ]
        },
        {...},
        ...
    ]
}
```
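Putting the two formats together, the sketch below loads one frame's point cloud and uses `seg_points` to recover each annotated instance. The `HCL_Full/` prefix for `pc_name` is an assumption about where the data root sits relative to the annotation files:

```python
import json
import numpy as np

with open("09-23-13-44-53-1.json") as f:
    anno = json.load(f)

frame = anno["frame"][0]
# Columns: x, y, z, reflectivity, timestamp.
points = np.fromfile("HCL_Full/" + frame["pc_name"],
                     dtype=np.float32).reshape(-1, 5)

for inst in frame["instance"]:
    inst_points = points[inst["seg_points"]]  # (pointCount, 5) instance subset
    print(inst["category"], inst["action"], inst_points.shape)
```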
`HCL_split.json` defines the train/test split:

```
{
    "train": [
        "10-01-18-42-05-2.json",
        "10-01-18-55-50-2.json",
        ...
    ],
    "test": [
        "10-03-16-35-25-1.json",
        "10-03-16-35-25-2.json",
        ...
    ]
}
```
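A few lines (file location assumed) suffice to read the split and gather the per-sequence annotation files for training and evaluation:

```python
import json

with open("HCL_split.json") as f:
    split = json.load(f)

train_files = split["train"]  # e.g. ["10-01-18-42-05-2.json", ...]
test_files = split["test"]
print(len(train_files), "train sequences,", len(test_files), "test sequences")
```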
If you find our work useful, please cite:

```
@article{xu2023human,
title={Human-centric Scene Understanding for 3D Large-scale Scenarios},
author={Xu, Yiteng and Cong, Peishan and Yao, Yichen and Chen, Runnan and Hou, Yuenan and Zhu, Xinge and He, Xuming and Yu, Jingyi and Ma, Yuexin},
journal={arXiv preprint arXiv:2307.14392},
year={2023}
}
```