Yufan Ren
Hello! I am currently a direct Ph.D. student (2020-) at the Image and Visual Representation Lab, part of the School of Computer and Communication Sciences at EPFL, Switzerland. Under the guidance of Prof. Sabine Süsstrunk and Dr. Tong Zhang, I work on 3D vision, neural rendering, and diffusion models.
During my Ph.D., I did two internships: one at NVIDIA Zurich on learning-based robot perception under Dr. Alexander Millane, and another at Meta London under Dr. Filippos Kokkinos.
Before my Ph.D. studies, I earned my bachelor's degree from Zhejiang University, Hangzhou, where I was honored to receive the Chu Kochen Award.
I am actively seeking opportunities for industry jobs and postdoctoral positions in 2025.
Email / Google Scholar / LinkedIn / GitHub
Text-Guided Latent Diffusion Image Editing
Yufan Ren, Zicong Jiang, Tong Zhang, Zheng Dang, Søren Otto Forchhammer, Sabine Süsstrunk
ArXiv'24, Project Page
We analyze failure cases of text-guided image editing and introduce a simple yet effective approach that enables selective optimization of specific frequency bands within spatially localized regions.
DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration
Zhi Chen*, Yufan Ren*, Tong Zhang, Zheng Dang, Wenbing Tao, Sabine Süsstrunk, Mathieu Salzmann
ArXiv'23, Project Page
Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds. We propose formulating PCR as a denoising diffusion probabilistic process, mapping noisy transformations to the ground truth.
VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
Yufan Ren*, Fangjinhua Wang*, Tong Zhang, Marc Pollefeys, Sabine Süsstrunk
CVPR'23, arXiv, Project Page, Code
We introduce VolRecon, a novel generalizable implicit reconstruction method based on the Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features aggregated from multi-view features with volume features interpolated from a coarse global feature volume.
Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion
Ruiqi Gao, Jianwen Xie, Siyuan Huang, Yufan Ren, Song-Chun Zhu, Ying Nian Wu
AAAI'22, arXiv
We propose a representational model for image pairs, such as consecutive video frames, that are related by local pixel displacements, in the hope that the model may shed light on motion perception in the primary visual cortex (V1). The model couples two components: (1) vector representations of the local contents of images and (2) matrix representations of the local pixel displacements caused by relative motion between the agent and the objects in the 3D scene.
BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks
Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, Long Quan
CVPR'20, arXiv, Dataset
We introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. To create the dataset, we apply a 3D reconstruction pipeline to recover high-quality textured meshes from images of well-selected scenes. Then, we render these mesh models to color images and depth maps. To introduce the ambient lighting information during training, the rendered color images are further blended with the input images to generate the training input.
Selected Projects and Activities
International Computer Vision Summer School 2023
Università di Catania
The school aims to provide a stimulating opportunity for young researchers and Ph.D. students. The participants will benefit from direct interaction and discussions with world leaders in Computer Vision. Participants will also have the possibility to present the results of their research, and to interact with their scientific peers, in a friendly and constructive environment.
Multimodal Fake Media Detection: AI Singapore Trusted Media Challenge
AI Singapore 2022, EPFL News
Peter Grönquist and I took part in this challenge and won the 100,000 USD prize (incl. grant). We designed machine learning models to detect three types of fakeness, i.e., fake faces (DeepFakes), manipulated audio, and mis-synchronization (lip-sync), and used engineering tricks to make them fast.
Thanks to Jon Barron for the awesome website template.