Audio Tagging with Noisy Labels and Minimal Supervision

Last updated on Jan 14, 2025

This is a task from the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. The goal is to build a high-performing audio tagging system from a small amount of manually labeled data and a large quantity of noisily labeled data. We took part in this competition and placed 2nd. To achieve state-of-the-art performance, we mainly adopted the following strategies:

We used mixup and SpecAugment for data augmentation.
We designed a sigmoid-softmax activation structure to deal with sparse multi-label classification.
We proposed a staged training strategy to learn from noisy data.
We applied a post-processing method that normalizes the output scores for each sound class.
We adopted an ensemble method that averages models trained with multiple neural networks and acoustic features.
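
To make the first strategy more concrete, here is a minimal NumPy sketch of mixup and SpecAugment-style masking. It is an illustration under stated assumptions, not the code used in our submission: the function names `mixup` and `spec_mask`, the `alpha` value, the mask widths, and the array shapes are all hypothetical, and the time-warping step of SpecAugment is left out.

```python
import numpy as np


def mixup(x, y, alpha=1.0, rng=None):
    """Blend each example in a batch with a randomly chosen partner.

    x: batch of features, shape (batch, ...). y: label vectors, shape (batch, classes).
    alpha is an assumed Beta-distribution parameter, not a value from the challenge entry.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(x))        # partner index for every example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix


def spec_mask(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Zero out one random frequency band and one random time band.

    spec: a single spectrogram, shape (freq_bins, time_frames).
    The maximum widths are illustrative defaults only.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()                     # leave the input untouched
    n_freq, n_time = out.shape

    # Frequency mask: width f, starting at bin f0.
    f = min(int(rng.integers(0, max_freq_width + 1)), n_freq)
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: width t, starting at frame t0.
    t = min(int(rng.integers(0, max_time_width + 1)), n_time)
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out
```

In a training loop, masking would typically be applied to each spectrogram first and mixup to the resulting batch, so that the mixed labels stay aligned with the mixed inputs.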

Audio

Yuhan Shen

沈宇寒

PhD student

My research interests include machine learning, computer vision and natural language processing.
