Audio Tagging with Noisy Labels and Minimal Supervision

This is a task of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. It aims to develop a high-performing audio tagging system using a small amount of manually labeled data and a large quantity of noisily labeled data. We took part in this competition and won 2nd place. To achieve state-of-the-art performance, we mainly adopted the following strategies:

  1. We used mixup and SpecAugment for data augmentation.
  2. We designed a sigmoid-softmax activation structure to deal with sparse multi-label classification.
  3. We proposed a staged training strategy to learn from noisy data.
  4. We applied a post-processing method that normalizes the output scores for each sound class.
  5. We adopted an ensemble method that averages the outputs of models trained with multiple neural networks and acoustic features.
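
To make strategies 1, 4, and 5 more concrete, here is a minimal NumPy sketch. It is not the competition code. The function names, default parameters, the single-band masking, and the choice of per-class standardization as the normalization are all assumptions made for illustration. The sigmoid-softmax structure and the staged training strategy are not sketched because their details are not given here.

```python
import numpy as np


def mixup(x, y, alpha=0.4, rng=None):
    """Blend every example in a batch with a randomly chosen partner.

    x: features with the batch on axis 0; y: label vectors, shape (batch, classes).
    alpha is a hypothetical default for the Beta distribution parameter.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing weight in [0, 1]
    perm = rng.permutation(len(x))    # partner index for each example
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]


def spec_mask(spec, max_freq=8, max_time=16, rng=None):
    """SpecAugment-style masking: zero one frequency band and one time band.

    spec: 2-D array of shape (freq_bins, frames). Band widths are drawn
    uniformly up to the (assumed) maxima, capped by the array size.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, min(max_freq, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, min(max_time, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


def normalize_per_class(scores, eps=1e-8):
    """Standardize each class column across clips (one possible normalization).

    scores: array of shape (clips, classes).
    """
    return (scores - scores.mean(axis=0)) / (scores.std(axis=0) + eps)


def ensemble_average(score_list):
    """Average score matrices of identical shape produced by several models."""
    return np.mean(np.stack(score_list), axis=0)
```

In use, `mixup` and `spec_mask` would be applied to training batches, while `normalize_per_class` and `ensemble_average` would be applied to the clip-by-class score matrices that the trained models produce at test time.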
Yuhan Shen
沈宇寒

PhD student

My research interests include machine learning, computer vision and natural language processing.