Publications

Learning to Segment Referred Objects from Narrated Egocentric Videos. CVPR (oral), 2024.

Exploring the Role of Audio in Video Captioning. CVPR MULA Workshop, 2024.

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection. Interspeech (oral), 2019.

Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection. Interspeech, 2019.
