"Few-Shot Object Detection by Second-Order Pooling" by Shan

"Few-Shot Object Detection by Second-Order Pooling" by Shan Zhang, Dawei Luo et al.


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
In this paper, we tackle a challenging problem of Few-shot Object Detection rather than recognition. We propose Power Normalizing Second-order Detector consisting of the Encoding Network (EN), the Multi-scale Feature Fusion (MFF), Second-order Pooling (SOP) with Power Normalization (PN), the Hyper Attention Region Proposal Network (HARPN) and Similarity Network (SN). EN takes support image crops and a query image per episode to produce covolutional feature maps across several layers while MFF combines them into multi-scale feature maps. SOP aggregates them per support image while PN detects the presence of visual feature instead of counting its frequency of occurrence. HARPN cross-correlates the PN pooled support features against the query feature map to match regions and produce query region proposals that are then aggregated with SOP/PN. Finally, support and query second-order descriptors are passed to SN. Our approach performs well because: (i) HARPN leverages SOP/PN for cross-correlation of detected rather than counted support features with query features which improves region proposals, (ii) SOP/PN capture second-order statistics per region proposal and factor out spatial locations, and (iii) PN limits the complexity of the space of functions over which HARPN and SN learn. These properties lead to the state of the art on the PASCAL VOC 2007/12, MS COCO and the FSOD datasets.

Related Keywords

, Similarity Network , Hyper Attention Region Proposal Network , Few Shot Object Detection , Power Normalizing Second Order Detector , Encoding Network , Multi Scale Feature Fusion , Second Order Pooling , Power Normalization ,

© 2025 Vimarsana