TransRVNet: LiDAR Semantic Segmentation With Transformer

Effective and efficient 3D semantic segmentation of large-scale LiDAR point clouds is a fundamental problem in autonomous driving. In this paper, the authors present the Transformer-Range-View Network (TransRVNet), a novel and powerful projection-based CNN-Transformer architecture for inferring point-wise semantics. First, a Multi Residual Channel Interaction Attention Module (MRCIAM) is introduced to capture channel-level multi-scale features and to model intra-channel and inter-channel correlations with an attention mechanism. Then, in the encoder stage, a well-designed Residual Context Aggregation Module (RCAM), comprising a residual dilated convolution structure and a context aggregation module, fuses information from different receptive fields while reducing the impact of missing points. Finally, a Balanced Non-square-Transformer Module (BNTM) is employed as the fundamental component of the decoder; its non-square shifted-window strategy captures local feature dependencies for more discriminative feature learning. Extensive qualitative and quantitative experiments on the challenging SemanticKITTI and SemanticPOSS benchmarks verify the effectiveness of the proposed technique, and TransRVNet outperforms most existing state-of-the-art approaches. The source code and trained model are available at https://github.com/huixiancheng/TransRVNet.
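
The abstract does not include implementation details, but the "non-square shifted window" idea in the BNTM can be illustrated with a minimal PyTorch sketch. The window sizes, shift amounts, feature map shape, and helper name below are illustrative assumptions, not the authors' code; the sketch only shows how a wide range-view feature map might be partitioned into non-square attention windows, with a half-window shift (as in Swin-style attention) to exchange information across window borders.

```python
# Minimal sketch of non-square shifted-window partitioning (assumed detail,
# not the authors' implementation from the TransRVNet repository).
import torch

def window_partition(x: torch.Tensor, win_h: int, win_w: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-square win_h x win_w windows,
    returning (num_windows * B, win_h * win_w, C) token groups for attention."""
    B, H, W, C = x.shape
    x = x.view(B, H // win_h, win_h, W // win_w, win_w, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win_h * win_w, C)

# A wide window (here 2 x 16, an assumption) suits the flat, panoramic shape
# of a range image such as the 64 x 2048 SemanticKITTI projection.
feat = torch.randn(1, 64, 2048, 96)            # (B, H, W, C) range-view features
windows = window_partition(feat, 2, 16)        # (4096, 32, 96) attention groups

# Shifting the map by half a window before partitioning lets successive
# blocks attend across the borders of the previous block's windows.
shifted = torch.roll(feat, shifts=(-1, -8), dims=(1, 2))
shifted_windows = window_partition(shifted, 2, 16)
print(windows.shape, shifted_windows.shape)
```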

Language

  • English

Filing Info

  • Accession Number: 01894601
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Sep 26 2023 9:08AM