Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention

Date: 2021-10-20 12:45:28

This is a paper on zero-shot learning published at CVPR.

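Contributions of the paper

① The paper targets fine-grained classification.

② It proposes a dense attribute-based attention that, for each attribute, localizes the most relevant image regions and produces an attribute feature.

③ The attribute semantic vector of the a-th attribute is the average of the GloVe representations of the words in that attribute (using a GloVe model trained on Wikipedia articles). The attribute features are aligned with the attribute semantic vectors to obtain a vector of attribute scores, instead of directly aligning the class semantic vector with global features, so that finer details are captured.

④ In particular, an attention over attributes adjusts the attribute scores to better capture the discriminative power of each attribute, so the model can handle classes that differ in only a few attributes.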
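The word-vector averaging in ③ can be sketched as follows. This is a minimal illustration, not the authors' code: `toy_glove` is a hypothetical 4-dimensional stand-in for real GloVe embeddings, and the helper name `attribute_semantic_vector` is an assumption made for this example.

```python
import numpy as np

# Toy word vectors standing in for real GloVe embeddings
# (hypothetical values, for illustration only).
toy_glove = {
    "red":  np.array([1.0, 0.0, 0.0, 0.0]),
    "wing": np.array([0.0, 1.0, 0.0, 0.0]),
    "long": np.array([0.0, 0.0, 1.0, 0.0]),
    "beak": np.array([0.0, 0.0, 0.0, 1.0]),
}

def attribute_semantic_vector(attribute_name, embeddings):
    """Average the word vectors of the words that make up an attribute name."""
    words = attribute_name.lower().split()
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise KeyError(f"no known words in attribute {attribute_name!r}")
    return np.mean(vectors, axis=0)

# One semantic vector per attribute, stacked into an (A, k) matrix.
attributes = ["red wing", "long beak"]
V = np.stack([attribute_semantic_vector(a, toy_glove) for a in attributes])
```

With real embeddings, `toy_glove` would be replaced by a lookup table loaded from a pretrained GloVe file; words missing from the vocabulary are simply skipped here, which is one possible design choice among several.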

Overall framework diagram

Model details

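① The input image is divided into R regions, and a feature is extracted from each region to give R region features. The attention module g(·) proposed in the paper then computes an attention feature for each attribute from these region features, where α_r is the proportion with which the r-th region is selected. This is the attribute-based spatial attention module.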

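② The semantic vector of a class c described by A attributes is a vector with A entries (z_1, ..., z_A), where z_a is the score indicating how strongly class c has the a-th attribute.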

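③ The attention features obtained above are aligned with the attribute semantic vectors to decide whether each attribute is present in the image. This yields a vector of attribute scores e_i, whose entries indicate whether each attribute appears in the image. The similarity between this vector and the class semantic vector is maximized, which gives the score s_i for assigning the image to the i-th class.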

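④ The problem with the class score in Eq. (5) is that every attribute contributes to it. In fine-grained classification most attributes are shared between classes, and only a few attributes matter. An attention over attributes is therefore used to adjust the contribution of each attribute.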

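⑤ A cross-entropy loss minimizes the distance between the model prediction and the ground-truth label (Eq. 8). To counter the bias towards seen classes, a self-calibration loss adjusts the probability of unseen classes to compensate (Eq. 10). Eq. (10) has a problem, however: training images always come from seen classes, yet Eq. (10) lowers the probability of seen classes and raises that of unseen classes, which is not the desired effect. The improved version (Eq. 11) keeps the probability on unseen classes nonzero during training while still keeping it low.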
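Steps ① to ⑤ can be sketched in NumPy as below. This is only a rough illustration under stated assumptions, not the paper's implementation: the bilinear weights `W_alpha`, `W_e`, `W_beta`, the sigmoid gate used for the attention over attributes, and the calibration term `-log(total unseen probability)` with weight `lam` are all choices made for this example. In particular, the loss here is not claimed to match Eq. (10) or Eq. (11), whose exact forms are not given in these notes.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(F, V, Z, W_alpha, W_e, W_beta):
    """F: (R, d) region features; V: (A, k) attribute semantic vectors;
    Z: (C, A) class semantic vectors; W_*: (k, d) bilinear weights (assumed).
    Returns class scores (C,), attribute scores (A,), spatial attention (A, R)."""
    # Step 1: for each attribute, softmax attention over the R regions.
    alpha = softmax(V @ W_alpha @ F.T, axis=1)        # (A, R), each row sums to 1
    H = alpha @ F                                     # (A, d) attribute features
    # Step 3: compatibility of each attribute feature with its semantic vector.
    e = np.sum((V @ W_e) * H, axis=1)                 # (A,) attribute scores
    # Step 4: gate in (0, 1) that rescales each attribute's contribution
    # (a sigmoid gate is an assumption of this sketch).
    beta = sigmoid(np.sum((V @ W_beta) * H, axis=1))  # (A,)
    # Class score: similarity between gated attribute scores and class vectors.
    s = Z @ (beta * e)                                # (C,)
    return s, e, alpha

def loss(s, label, unseen_mask, lam=0.1):
    """Cross-entropy on the true (seen) label plus a simple calibration term."""
    p = softmax(s)
    ce = -np.log(p[label])
    # Smaller when unseen classes receive more probability mass; a small
    # weight lam keeps that mass nonzero without letting it dominate.
    cal = -np.log(p[unseen_mask].sum())
    return ce + lam * cal

# Random toy inputs (hypothetical sizes): 49 regions, 5 attributes, 4 classes.
rng = np.random.default_rng(0)
R, d, A, k, C = 49, 16, 5, 8, 4
F = rng.normal(size=(R, d))
V = rng.normal(size=(A, k))
Z = rng.uniform(size=(C, A))
W_alpha, W_e, W_beta = (0.1 * rng.normal(size=(k, d)) for _ in range(3))
unseen_mask = np.array([False, False, True, True])  # last two classes are unseen

s, e, alpha = forward(F, V, Z, W_alpha, W_e, W_beta)
total = loss(s, label=0, unseen_mask=unseen_mask)
```

Each row of `alpha` is a distribution over regions, so `alpha[a, r]` plays the role of α_r for attribute a. The weight `lam` controls the trade-off described in ⑤: a larger value shifts more probability to unseen classes at the cost of the cross-entropy term.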