
Yuhan Shen

沈宇寒

Applied Scientist

Amazon

Biography

Yuhan Shen is an Applied Scientist at Amazon AGI in Boston, MA. He received his PhD in Computer Science from the Khoury College of Computer Sciences at Northeastern University, where his research spanned weakly-supervised and unsupervised machine learning, computer vision, and multimodal learning. His doctoral work, supervised by Professor Ehsan Elhamifar and co-advised by Professor Lu Wang, focused on egocentric video understanding, procedural learning, and action segmentation.

During his PhD, he deepened his expertise in video understanding through research internships at Facebook AI Research (FAIR) with Dr. Effrosyni Mavroudi and Dr. Lorenzo Torresani, and at TikTok with Dr. Heng Wang.

Before Northeastern, he earned his bachelor’s degree from the Department of Electronic Engineering at Tsinghua University in China in 2018 and worked as a research assistant in the Speech and Audio Technology Lab under the guidance of Professor Wei-Qiang Zhang.

Interests

  • Video Understanding
  • Multimodal Learning
  • Computer Vision
  • Audio and Speech

Education

  • PhD in Computer Science, 2019 - 2025

    Northeastern University

  • BEng in Electronic Engineering, 2014 - 2018

    Tsinghua University

Experience


Applied Scientist

Amazon

Jul 2025 – Present
Boston, MA

Research Scientist Intern

Meta AI

May 2023 – Aug 2023
New York
Worked on narration-based video object segmentation with Dr. Effrosyni Mavroudi and Dr. Lorenzo Torresani at FAIR.

Research Intern

ByteDance/TikTok

May 2022 – Aug 2022
Remote
Worked on multi-modal video captioning with Dr. Heng Wang, Dr. Linjie Yang, Dr. Longyin Wen, and Dr. Haichao Yu on the Intelligent Creation – Vision and Graphics team.

Graduate Research Assistant

Northeastern University

Sep 2019 – Jul 2025
Boston, MA
Research projects include:

  • Unsupervised procedure learning via visual and language instructions
  • Semi-weakly supervised learning from instructional videos
  • Streaming video action segmentation
  • AI/AR task assistant for procedural guidance

Research Assistant

Tsinghua University

Jul 2018 – Jul 2019
Beijing, China
Research projects include:

  • Sound event detection and audio tagging
  • Keyword search from speech
  • Query-by-example spoken term detection

Publications


(* indicates equal contribution)

Learning to Segment Referred Objects from Narrated Egocentric Videos. CVPR (oral), 2024.


Exploring the Role of Audio in Video Captioning. CVPR MULA Workshop, 2024.


Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection. Interspeech (oral), 2019.


Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection. Interspeech, 2019.


Projects

Semi-Weakly Supervised Learning of Complex Actions

Performed action segmentation using a small number of weakly-labeled videos and a large number of unlabeled videos.

Multi-modal Procedure Learning from Instructional Videos

Summarized and localized the key steps of instructional videos from both visual and language data.

Audio Tagging with Noisy Labels and Minimal Supervision

Classified multi-label audio clips using a small amount of manually-labeled data and a large quantity of noisy-labeled data.

Research on Sound Event Detection

Achieved state-of-the-art performance on rare sound event detection and weakly-labeled sound event detection.

Contact

  • 55 Pier 4 Blvd., Boston, MA, 02110, United States