Publications

Understanding Multi-Task Activities from Single-Task Videos. CVPR (to appear), 2025.

Learning to Segment Referred Objects from Narrated Egocentric Videos. CVPR (oral), 2024.

PDF Poster Slides Supplementary

Exploring the Role of Audio in Video Captioning. CVPR MULA Workshop, 2024.

PDF Slides Supplementary

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection. Interspeech (oral), 2019.

PDF DOI

Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection. Interspeech, 2019.

PDF DOI