Abstract
Critical care for newborns admitted to the neonatal intensive care unit (NICU) requires continuous monitoring of vital signs. Using an RGB-D camera placed above the patient and a pressure-sensitive mat (PSM) below, we are presented with three data sources that capture different aspects of the scene: RGB video, depth video, and pressure data. Using these modalities in conjunction with one another can reveal scene information that cannot be derived from any one data source alone. For this purpose, we demonstrate several registration techniques to align or otherwise combine the data from multiple sources, and we quantify the error of each technique.
Up to six hours of RGB video, depth video, and PSM recordings from 33 neonatal patients were collected from the NICU of the Children’s Hospital of Eastern Ontario. The RGB-D cameras were placed above the patient’s bed (incubator, crib, or overhead warmer), often at an unspecified angle.
Registration between video and PSM requires manual detection of distinct landmarks within the patient scene. These landmarks must be identifiable in both the video and PSM streams. To this end, a high-visibility instrument (a coloured pen) was used to apply pressure at distinct locations on the PSM. The pen tip is clearly visible in the video and appears in the PSM data as an area of focused, high contact pressure. During patient data collection, the experimenter was instructed to briefly press on four different areas of the PSM with the pen in order to produce landmarks detectable in both streams. The landmarks can then be used to compute a geometric transformation between the video imaging plane and the PSM surface.
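As an illustration of this step, the sketch below estimates a homography between the PSM surface and the video imaging plane from the four pen-press landmarks using OpenCV, and reports the per-landmark reprojection error. The coordinates and variable names are placeholders rather than values from the study; note that with exactly four correspondences the homography fits the landmarks exactly, so in practice the error measure is most informative when extra or held-out presses are available.

    # Minimal sketch of landmark-based video-PSM registration.
    # Assumes OpenCV and NumPy; the landmark coordinates below are illustrative placeholders.
    import numpy as np
    import cv2

    # Pixel coordinates (x, y) of the pen tip in the RGB frame for the four presses.
    video_pts = np.array([[412, 233], [598, 241], [605, 420], [418, 412]], dtype=np.float32)
    # Corresponding PSM cell coordinates (column, row) of the pressure peaks.
    psm_pts = np.array([[5, 4], [26, 4], [26, 30], [5, 30]], dtype=np.float32)

    # Estimate the homography mapping PSM coordinates onto the video imaging plane.
    H, _ = cv2.findHomography(psm_pts, video_pts)

    def psm_to_video(points, H):
        """Map PSM locations into video pixel coordinates using the homography H."""
        pts = np.asarray(points, dtype=np.float32).reshape(-1, 1, 2)
        return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

    # Quantify registration error as the distance (in pixels) between the observed
    # landmarks and their re-projected positions.
    errors = np.linalg.norm(psm_to_video(psm_pts, H) - video_pts, axis=1)
    print("per-landmark reprojection error (px):", errors)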
The RGB-D cameras were not placed in such a way that the resulting scene is a direct overhead view, nor were the camera placements consistent between beds and patients. Therefore, a transformation of the depth data that adjusts for the camera's angle of projection is required before the data can be used meaningfully. The required rotation matrix can be calculated from three points selected manually on the image that lie on a common reference plane. The depth image array is deprojected into a point cloud, the rotation is applied, and the transformed point cloud is projected back into a 2D image array. Cross-sections of the scene can then be taken to slice and segment a subject from its background, allowing occluding objects in the scene to be excluded from the region of interest being studied.
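A minimal sketch of this deprojection and rotation step is given below, assuming a standard pinhole camera model with known intrinsics (fx, fy, cx, cy) and using only NumPy. The intrinsics, the depth data, and the three selected points are placeholders, not values from the recordings.

    # Minimal sketch of the depth deprojection and rotation step.
    # Assumes a pinhole model with intrinsics fx, fy, cx, cy; all values are placeholders.
    import numpy as np

    def deproject(depth, fx, fy, cx, cy):
        """Deproject a depth image (metres) into an N x 3 point cloud in camera coordinates."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    def rotation_from_plane(p1, p2, p3):
        """Rotation aligning the plane through points p1, p2, p3 with the camera's z-axis."""
        n = np.cross(p2 - p1, p3 - p1)
        n /= np.linalg.norm(n)
        z = np.array([0.0, 0.0, 1.0])
        v = np.cross(n, z)
        c = float(np.dot(n, z))
        if np.isclose(c, -1.0):              # plane normal points directly away from the camera axis
            return np.diag([1.0, -1.0, -1.0])
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        return np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues-style rotation of n onto z

    # depth: H x W array of depth values from the RGB-D camera (placeholder data here).
    depth = np.random.uniform(0.5, 1.5, size=(480, 640))
    fx = fy = 600.0
    cx, cy = 320.0, 240.0
    cloud = deproject(depth, fx, fy, cx, cy)

    # p1, p2, p3: the three manually selected points, already deprojected to 3-D.
    p1, p2, p3 = cloud[1000], cloud[50000], cloud[200000]
    R = rotation_from_plane(p1, p2, p3)
    rotated = cloud @ R.T

    # A cross-section keeps only points within a depth band along the new axis,
    # excluding occluding objects from the region of interest before re-projection to 2-D.
    section = rotated[(rotated[:, 2] > 0.6) & (rotated[:, 2] < 1.0)]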
From the video data, fusion of color and depth video can help overcome the lighting variations and occlusions often encountered in the NICU. In fact, incubator patients are often kept in complete darkness to promote their growth, exposed to phototherapy lighting to treat jaundice, or fully covered by bedding for heat retention. In such cases, it can be difficult to visualize the patient from the color data, while the depth data is more robust to these vision-based challenges. Multiple RGB-D fusion schemes are explored to identify bed occupancy, ongoing interventions, and patient coverage using the VGG-16 deep neural network [1]. The RGB and depth data are fused directly as a 3-channel or 4-channel RGB-D image (image fusion), or separately at different layers in the early, middle, or late stages of the network (network fusion).
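As one example, a minimal sketch of the 4-channel image-fusion variant is shown below using PyTorch and torchvision. The number of output classes and the way the first convolution is adapted for the extra depth channel are illustrative choices, not necessarily the configuration used in the study.

    # Minimal sketch of 4-channel RGB-D image fusion into VGG-16.
    # Assumes PyTorch/torchvision; class count and initialization choices are placeholders.
    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 2  # e.g. bed occupied vs. empty (placeholder task)

    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

    # Replace the first convolution so the network accepts a 4-channel RGB-D tensor.
    old_conv = model.features[0]                        # Conv2d(3, 64, kernel_size=3, padding=1)
    new_conv = nn.Conv2d(4, old_conv.out_channels,
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding)
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight        # reuse the pretrained RGB filters
        new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # init depth channel
        new_conv.bias.copy_(old_conv.bias)
    model.features[0] = new_conv

    # Replace the final classifier layer for the task-specific label set.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

    # Image fusion: stack RGB (3 x H x W) and depth (1 x H x W) along the channel axis.
    rgb = torch.rand(1, 3, 224, 224)
    depth = torch.rand(1, 1, 224, 224)
    logits = model(torch.cat([rgb, depth], dim=1))

The network-fusion variants would instead pass the RGB and depth streams through separate convolutional branches and merge their feature maps at an early, middle, or late layer of the network.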
Full Presentation
The poster for this exhibit was presented at the CASCONxEVOKE 2021 conference for advanced studies in computer science and software engineering, sponsored by IBM Canada. Because the conference was held online, the poster was presented in video format; you can watch it below:
Copyright Information
© 2021 IBM.