Developing in-vehicular noise robust children ASR system using Tandem-NN-based acoustic modelling

Processing children's speech is challenging because of data scarcity and input feature vectors that are poorly suited to modelling. The accuracy of the modelling phase depends on the extracted input features. In this paper, posterior probabilities over a phone set are estimated by a discriminatively trained neural-network pre-processor. This Neural Network (NN) classifier is first trained on the original speech, and context-independent phone posterior probabilities are then estimated with the Tandem-NN system. The output vectors are used as the default features for Deep Neural Network-Hidden Markov Model (DNN-HMM) models. The performance of the system trained on the original data is improved by extending the training set through data augmentation. To assess the robustness of the augmented-speech system, various in-vehicle data are investigated, and the system is found to outperform the other systems. Finally, the authors combine all augmented data to overcome data scarcity and enhance system performance, which gives a relative improvement of 23.77% over the baseline system.
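
The tandem idea summarised in the abstract can be illustrated with a minimal sketch. It assumes the classic tandem recipe (log phone posteriors, mean-normalised per dimension, appended to the base acoustic features); the record does not confirm that the authors' pipeline follows these exact steps. The function names `softmax` and `tandem_features`, the toy values, and the feature dimensions are hypothetical. The decorrelation step (for example PCA) that is commonly applied before the HMM back end is omitted for brevity.

```python
import math


def softmax(logits):
    """Convert one frame of NN output activations into phone posteriors."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tandem_features(posteriors, base_feats, eps=1e-10):
    """Append mean-normalised log posteriors to the base features, frame by frame.

    posteriors: list of frames, each a list of phone posterior probabilities
    base_feats: list of frames, each a list of base acoustic features
    (Hypothetical helper; not the authors' implementation.)
    """
    if len(posteriors) != len(base_feats):
        raise ValueError("posteriors and base_feats must have the same number of frames")

    # Log compression makes the sharply peaked posteriors less skewed.
    log_post = [[math.log(p + eps) for p in frame] for frame in posteriors]

    # Per-dimension mean over all frames, used for mean normalisation.
    n_frames = len(log_post)
    n_phones = len(log_post[0])
    means = [sum(frame[k] for frame in log_post) / n_frames for k in range(n_phones)]

    # Concatenate base features with the normalised log posteriors.
    return [
        list(base) + [lp - mu for lp, mu in zip(frame, means)]
        for base, frame in zip(base_feats, log_post)
    ]


# Toy data (assumed values): 2 frames, 3 phone classes, 2 base features per frame.
nn_outputs = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
posteriors = [softmax(frame) for frame in nn_outputs]
base_feats = [[0.2, -0.4], [0.1, 0.3]]
features = tandem_features(posteriors, base_feats)
```

Each resulting frame holds the base features followed by one value per phone class; in a full system these vectors would then be passed to the acoustic model in place of, or alongside, the original features.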

Language

  • English

Filing Info

  • Accession Number: 01784234
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Oct 6 2021 9:40AM