https://github.com/mhamilton723/FeatUp
https://arxiv.org/pdf/2403.10516

- The paper introduces FeatUp, a framework to enhance the resolution of vision model features by leveraging multiview consistency of low-resolution signals, resulting in significant improvements across various computer vision tasks.
- FeatUp significantly improves the spatial resolution of deep features without altering their semantics, making them suitable as drop-in replacements for existing features to enhance performance on various tasks.
- Method: FeatUp learns an upsampling network trained with a multiview consistency loss over low-resolution signals, an idea inspired by 3D reconstruction frameworks such as NeRF. The paper reports improvements on tasks such as semantic segmentation and depth prediction.
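To make the multiview consistency idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It rests on toy assumptions: the "views" are small circular shifts, and a fixed average pool stands in for both the backbone and the downsampler (the paper learns these components). All function names are hypothetical.

```python
import numpy as np

def avg_pool(x, k):
    """Downsample an (H, W, C) array by averaging non-overlapping k x k blocks."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def shift(x, dy, dx):
    """Toy 'view' transform (assumption): circular shift along the spatial axes."""
    return np.roll(x, shift=(dy, dx), axis=(0, 1))

def multiview_consistency_loss(hr_feats, lr_views, shifts, k):
    """Mean squared error between each observed low-res view and the
    high-res features pushed through the same shift and then downsampled."""
    losses = []
    for (dy, dx), lr_obs in zip(shifts, lr_views):
        lr_pred = avg_pool(shift(hr_feats, dy, dx), k)
        losses.append(np.mean((lr_pred - lr_obs) ** 2))
    return float(np.mean(losses))

# Toy data: a "true" high-res feature map and its low-res views.
rng = np.random.default_rng(0)
k = 4
hr_true = rng.normal(size=(16, 16, 8))
shifts = [(0, 0), (1, 2), (3, 1)]
lr_views = [avg_pool(shift(hr_true, dy, dx), k) for dy, dx in shifts]

# The true high-res map is consistent with every view.
loss_true = multiview_consistency_loss(hr_true, lr_views, shifts, k)

# Nearest-neighbor upsampling of a single view matches only that view.
naive_hr = np.repeat(np.repeat(lr_views[0], k, axis=0), k, axis=1)
loss_naive = multiview_consistency_loss(naive_hr, lr_views, shifts, k)

print(f"true: {loss_true:.6f}  naive: {loss_naive:.6f}")
```

The point of the toy is that a naive upsampling of one low-resolution map cannot explain the shifted views, while a correct high-resolution map explains all of them. Minimizing this kind of loss is what pushes the upsampled features toward real sub-patch detail.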
Is it possible to use it in our pipeline?
In particular, it could improve the quality of feature maps when objects are small, blurry, or overlapping.
Here is the result on one of our samples, a fast-moving ball:

With the upsampled features, the quality of the feature map improved.
There is no code for training on custom datasets yet, but the corresponding author told me they intend to release it in the next few days. I will watch the repo and retrain on our data as soon as it is available.