Enhanced Monocular 3D Object Reconstruction

Adam Deryło, Małgorzata Gwiazda, Emin Sadikhov

February 24, 2025

Splatter Image, a recent approach for monocular 3D object reconstruction, achieves high efficiency using Gaussian splatting while maintaining state-of-the-art performance. In this work, we propose three enhancements to this method: (i) incorporating semantic embeddings from pre-trained vision-language models to provide richer contextual understanding, (ii) integrating monocular depth estimation to improve geometric accuracy, and (iii) augmenting the training loss with Total Variation and Edge terms to refine reconstruction details. Our experiments show that semantic conditioning, particularly with DINO embeddings, significantly improves view consistency and generalization. Depth information further improves reconstruction quality by constraining the solution space, whereas the loss modifications do not bring substantial improvements. Code is available at https://github.com/splatter-works/splatter-image.
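
To give a rough idea of what the loss terms in contribution (iii) might look like, the following is a minimal sketch, not the authors' implementation. It assumes images are NumPy arrays of shape (H, W) or (H, W, C), uses an anisotropic Total Variation penalty, and defines the edge term as an L1 distance between finite-difference gradients. The function names `total_variation` and `edge_loss` are hypothetical, and the formulation used in the actual code may differ (for example, it may use Sobel filters or different weighting).

```python
import numpy as np


def total_variation(img):
    """Anisotropic TV: mean absolute difference between neighbouring pixels.

    Assumption: this is one common TV formulation, chosen for illustration.
    """
    dh = np.abs(img[1:, :] - img[:-1, :])  # differences between vertical neighbours
    dw = np.abs(img[:, 1:] - img[:, :-1])  # differences between horizontal neighbours
    return dh.mean() + dw.mean()


def edge_loss(pred, target):
    """L1 distance between finite-difference gradients of two images.

    Assumption: edges are approximated by first-order differences.
    """
    gy = np.abs((pred[1:, :] - pred[:-1, :]) - (target[1:, :] - target[:-1, :]))
    gx = np.abs((pred[:, 1:] - pred[:, :-1]) - (target[:, 1:] - target[:, :-1]))
    return gy.mean() + gx.mean()
```

In a training loop, terms like these would typically be added to the main reconstruction loss with small weights. Those weights are hyperparameters that the text above does not specify.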