![]() ![]() The models’ objective is to use point clouds (preprocessed from RGB-D images) and estimate oriented 3D bounding boxes as well as semantic classes of objects. These methods go beyond existing methods in that they fully account for available depth information without increasing compute cost in a prohibitive manner. In this blogpost, we explore Meta (Facebook)’s 3DETR and its predecessor Votenet which present a clever approach to recognizing objects in a 3D point cloud of a scene (see, and for the research articles). 3D data allows a rich spatial representation of the sensor’s surroundings and has applications in robotics, smart home devices, driverless cars, medical imaging and many other industries. Some smartphones now feature Lidar sensors (acronym for “light detection and ranging”, sometimes called “laser scanning”) while other cameras use RGB-D cameras (an RGB-D image is the combination of a standard RGB image with its associated “depth map”) like Kinect or Intel RealSense. With 3D data becoming more and more widely available that time may very well be now. The world is not flat, however, and adding a third dimension promises to not only increase performance but also make possible entirely new applications. ![]() ![]() Most computer vision applications today work with ‘flat’ two-dimensional images like the ones you find in this medium blogpost, to great success. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |