Hello! I am currently a direct Ph.D. student (2020-) at the Image and Visual Representation Lab, part of the School of Computer and Communication Sciences at EPFL, Switzerland. Under the guidance of Prof. Sabine Süsstrunk and Dr. Tong Zhang, I work on 3D vision, neural rendering, and diffusion models.
Prior to my Ph.D., I earned my bachelor's degree from Zhejiang University, Hangzhou, where I was honored to receive the Chu Kochen Award. Those four years were some of the happiest of my life.
I'm passionate about recent advances in generative models (diffusion models) and large language models (ChatGPT), and I'm always eager to discuss these topics. If you're a master's student at EPFL and share similar interests, I'm currently seeking a research assistant to work with me. Some of my available master's projects are listed on the lab's website here, and don't hesitate to reach out if you have any questions or want to chat!
Google Scholar
DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration
ArXiv'23, arXiv, Project Page, Code (Coming soon)
Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds. We propose formulating PCR as a denoising diffusion probabilistic process, mapping noisy transformations to the ground truth.
VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
CVPR'23, arXiv, Project Page, Code
We introduce VolRecon, a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features aggregated from multi-view features with volume features interpolated from a coarse global feature volume.
Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion
Ying Nian Wu
We propose a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene.
BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks
CVPR'20, arXiv, Dataset
We introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. To create the dataset, we apply a 3D reconstruction pipeline to recover high-quality textured meshes from images of well-selected scenes. Then, we render these mesh models to color images and depth maps. To introduce the ambient lighting information during training, the rendered color images are further blended with the input images to generate the training input.
Selected Projects and Activities
International Computer Vision Summer School 2023
Università di Catania
The school aims to provide a stimulating opportunity for young researchers and Ph.D. students. The participants will benefit from direct interaction and discussions with world leaders in Computer Vision. Participants will also have the possibility to present the results of their research, and to interact with their scientific peers, in a friendly and constructive environment.
Multimodal Fake Media Detection: AI Singapore Trusted Media Challenge
AI Singapore 2022, EPFL News
Peter Grönquist and I took part in this challenge and won the 100,000 USD prize (incl. grant). We designed machine learning models to detect three types of fakeness, i.e., fake faces (DeepFakes), manipulated audio, and mis-synchronization (lip-sync), and used engineering tricks to make them fast.
Thanks to Jon Barron for the awesome template.