EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation

Current scene segmentation methods suffer from cumbersome model structures and high computational complexity, which impede their application to real-world scenarios that require real-time processing. This paper proposes a novel Efficient Pyramid Representation Network (EPRNet), which sets a new record in segmentation accuracy, model compactness, and inference efficiency. Unlike existing methods, which rely on transfer learning from shallow image classification backbones that encode pixel features with limited receptive fields, EPRNet distributes multi-scale representations throughout the feature encoding flow to quickly enlarge and enrich receptive fields. Specifically, the authors introduce an extremely lightweight and efficient Multi-scale Processing Unit (MPU) that encodes multi-scale features through parallel convolutions with different kernel sizes. By combining the MPU with residual learning, they propose a core Pyramid Representation Module (PRM) that acquires and aggregates region-based contexts in both shallow and deep layers. In this way, EPRNet encodes discriminative and comprehensive representations of multi-scale objects within a compact structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate its superiority: without any extra or coarsely labeled data, EPRNet achieves 73.9% mIoU on the Cityscapes test set with only 0.9 million parameters, running at 42 FPS.
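The parallel-convolution idea behind the MPU, and the PRM's residual wrapping of it, can be illustrated with a minimal sketch. The abstract does not specify kernel sizes, dimensionality, or how branch outputs are aggregated, so the 1-D signals, 3/5-tap kernels, and element-wise summation below are all assumptions for illustration, not the paper's actual design (which operates on 2-D feature maps).

```python
# Hypothetical sketch of an MPU-style block: parallel convolutions of
# different kernel sizes, aggregated, then wrapped in a residual connection.
# Pure-Python 1-D version for illustration only.

def conv1d(signal, kernel):
    """'Same'-padded 1-D convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

def mpu(signal, kernels):
    """Run parallel convolution branches with different kernel sizes and
    aggregate them by element-wise summation (aggregation is an assumption)."""
    branches = [conv1d(signal, k) for k in kernels]
    return [sum(vals) for vals in zip(*branches)]

def prm_block(signal, kernels):
    """PRM-style block sketch: multi-scale MPU output plus a residual path."""
    return [x + y for x, y in zip(signal, mpu(signal, kernels))]

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    small = [0.25, 0.5, 0.25]          # 3-tap kernel: local context
    large = [0.1, 0.2, 0.4, 0.2, 0.1]  # 5-tap kernel: wider receptive field
    print(prm_block(x, [small, large]))
```

Because every branch is "same"-padded, all branches produce outputs of the input's length, which makes the element-wise aggregation and the residual addition shape-compatible.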

Language

  • English

Filing Info

  • Accession Number: 01860114
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Sep 30 2022 2:27PM