Audio Tagging with Noisy Labels and Minimal Supervision
This is a task in the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. Its goal is to build a high-performing audio tagging system from a small amount of manually labeled data and a large quantity of data with noisy labels. We took part in this competition and won 2nd place. To achieve state-of-the-art performance, we mainly adopted the following strategies:
- We used mixup and SpecAugment for data augmentation.
- We designed a sigmoid-softmax activation structure to handle sparse multi-label classification, in which each clip carries only a few of the possible labels.
- We proposed a staged training strategy to learn from noisy data.
- We applied a post-processing method that normalizes the output scores for each sound class.
- We adopted an ensemble method that averages the outputs of models trained with multiple neural networks and acoustic features.
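Mixup and SpecAugment are standard augmentation techniques, but this README does not give the settings used. The NumPy sketch below shows one common form of each: mixup draws a weight λ from a Beta(α, α) distribution and blends both the spectrograms and the label vectors of randomly paired examples, while the SpecAugment part applies only frequency and time masking (time warping is omitted). The function names `mixup` and `spec_augment`, α = 0.4, and the mask counts and widths are assumptions for illustration, not the values used in the competition system.

```python
import numpy as np


def mixup(x, y, alpha=0.4, rng=None):
    """Blend each example with a random partner from the same batch.

    x: (batch, n_mels, n_frames) log-mel spectrograms
    y: (batch, n_classes) multi-hot label vectors
    alpha: Beta(alpha, alpha) parameter; 0.4 is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # one mixing weight for the batch
    perm = rng.permutation(len(x))          # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam


def spec_augment(spec, n_freq_masks=2, max_f=8, n_time_masks=2, max_t=20, rng=None):
    """Apply frequency and time masking (time warping omitted).

    spec: (n_mels, n_frames) spectrogram; a masked copy is returned.
    Masked cells are filled with the spectrogram's mean value.
    Mask counts and maximum widths are assumed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    fill = spec.mean()
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        f = rng.integers(0, min(max_f, n_mels) + 1)     # band width in mel bins
        f0 = rng.integers(0, n_mels - f + 1)            # band start
        out[f0:f0 + f, :] = fill
    for _ in range(n_time_masks):
        t = rng.integers(0, min(max_t, n_frames) + 1)   # span length in frames
        t0 = rng.integers(0, n_frames - t + 1)          # span start
        out[:, t0:t0 + t] = fill
    return out


# Example on dummy data: a batch of 4 spectrograms with one-hot labels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 128))
y = np.eye(80)[[0, 3, 3, 7]]
x_mix, y_mix, lam = mixup(x, y, rng=rng)
x_aug = np.stack([spec_augment(s, rng=rng) for s in x_mix])
```

Because the label vectors are mixed with the same λ as the inputs, a mixed clip is trained toward a soft target that reflects how much of each source clip it contains.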
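The README does not say which normalization the post-processing step uses. As a hypothetical illustration only, the sketch below standardizes each class's scores across all evaluated clips (a per-class z-score). The function name `normalize_per_class` and the choice of z-score normalization are assumptions.

```python
import numpy as np


def normalize_per_class(scores, eps=1e-8):
    """Standardize each class column across clips (assumed z-score scheme).

    scores: (n_clips, n_classes) predicted scores.
    After normalization every class has zero mean and unit standard
    deviation over the clips, so a class whose raw scores are
    systematically high or low no longer dominates the ranking of
    classes within a clip.
    """
    mean = scores.mean(axis=0, keepdims=True)
    std = scores.std(axis=0, keepdims=True)
    return (scores - mean) / (std + eps)


# Class 0 has uniformly high raw scores; class 1 has uniformly low ones.
scores = np.array([[0.9, 0.1],
                   [0.8, 0.3],
                   [0.7, 0.2]])
normed = normalize_per_class(scores)
```

In this toy example the second clip ranks class 0 first on raw scores, but after normalization class 1 comes first, because its score of 0.3 is unusually high for that class. The ordering of clips within any single class is unchanged, since the transform is monotonic per class.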
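The ensemble step averages the predictions of models built from different networks and acoustic features. A minimal sketch, assuming each model outputs an `(n_clips, n_classes)` score matrix on a comparable scale: the function name `ensemble_average` and the optional per-model weights are hypothetical additions, and uniform averaging is the default.

```python
import numpy as np


def ensemble_average(predictions, weights=None):
    """Average score matrices from several models.

    predictions: list of (n_clips, n_classes) arrays, e.g. one per
        combination of network architecture and acoustic feature.
    weights: optional per-model weights (hypothetical); uniform if None.
    """
    stacked = np.stack(predictions, axis=0)   # (n_models, n_clips, n_classes)
    if weights is None:
        return stacked.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    if w.shape != (stacked.shape[0],):
        raise ValueError("expected one weight per model")
    # Contract the model axis with the normalized weights.
    return np.tensordot(w / w.sum(), stacked, axes=1)


# Two models scoring two clips over two classes.
a = np.array([[0.2, 0.8], [0.6, 0.4]])
b = np.array([[0.4, 0.6], [0.2, 0.8]])
avg = ensemble_average([a, b])
```

Averaging outputs rather than weights is what allows architectures and input features to differ freely between ensemble members.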