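Design and Implementation of an Attention-Driven Deep Learning System for Multi-Label Image Classification
Abstract With the rapid development of multimedia technology and the spread of the Internet, the volume of multi-label image data keeps growing, and multi-label image classification has become an important branch of computer vision. Unlike single-label classification, multi-label images contain complex and diverse categories and an unknown number of labels per image, so addressing these difficulties is central to multi-label image classification research.
Multi-label classification has to address two main problems: strengthening the mapping between labels and image regions, and modeling the co-occurrence relationships among labels. With both in mind, this thesis designs an end-to-end model architecture based on an attention mechanism and a graph convolutional network. The architecture consists of two parts. The first is an attention module, which strengthens the association between semantic regions and specific labels and generates label category representations based on the image content. The second is a dynamic graph convolution module, which learns the correlations among the perceived categories. The final multi-label classification relies on both modules. Specifically, the attention module first converts the basic feature map of the image into content-based label category representations. These representations are then fed into the graph convolutional network module, which consists of a static graph and a dynamic graph that propagate and update features in turn. Finally, the dynamic graph convolutional network captures the deep correlations among the content-aware label category representations. Experiments on the Microsoft COCO 2014 dataset show that the method effectively improves the overall performance of multi-label image classification. Comparisons with other methods indicate that the method designed in this thesis classifies multi-label images more accurately.
Based on the model designed in this thesis, a multi-label image classification system was implemented with PyQt. The system offers simple user interaction and has two main parts: model training and image classification. System testing shows that the model classifies multi-label images well, which supports the practicality of the method designed in this thesis.
Keywords: multi-label image classification; attention mechanism; graph convolutional network; feature extraction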
Design and Implementation of a Multi-label Image Classification System Based on Deep Learning Driven by an Attention Mechanism
Abstract With the rapid development of multimedia technology and the spread of the Internet, the scale of multi-label image data keeps expanding, and multi-label image classification has gradually become an important research direction in computer vision. Unlike the traditional single-label classification task, multi-label images contain more complex and variable semantic relations, which makes multi-label classification relatively difficult. Multi-label classification therefore calls for methods that are both fast and accurate.
Multi-label classification has to address two main problems: how to strengthen the mapping between labels and image regions, and how to model the co-occurrence relationships among labels. With both in mind, this thesis designs an end-to-end attention-driven dynamic graph convolution network composed of two modules. The first is the semantic attention module, which locates semantic regions and generates a content-aware category representation for each image. The second is the dynamic graph convolution module, which learns the correlations among the perceived categories. The final multi-label classification relies on both modules. Specifically, the semantic attention module first decomposes the convolutional feature map into multiple content-aware category representations. These representations are then fed into the dynamic GCN module, which propagates features through two joint graphs: a static graph and a dynamic graph. Finally, the dynamic GCN captures the subtle dependencies among these content-aware category representations and outputs the classification. Experiments on the Microsoft COCO 2014 dataset show that the method effectively improves the overall performance of multi-label image classification. Comparisons with other methods indicate that the method designed in this thesis classifies multi-label images more accurately.
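The data flow described above can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation. The softmax attention over spatial positions, the dot-product rule for building the dynamic adjacency matrix, the LeakyReLU activations, the 0.5 decision threshold, the random weights, and all names (`category_attention`, `forward`, `a_static`, `w_dynamic`, and so on) are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def elementwise(fn, a):
    return [[fn(x) for x in row] for row in a]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def category_attention(features, w_att):
    """features: N x D, w_att: D x C. Returns C x N weights; each row sums to 1."""
    scores = transpose(matmul(features, w_att))
    return [softmax(row) for row in scores]


def forward(features, params):
    # Attention module: one content-aware representation per category (C x D).
    att = category_attention(features, params["w_att"])
    v = matmul(att, features)
    # Static graph: a fixed C x C adjacency shared by all images.
    h = elementwise(leaky_relu, matmul(matmul(params["a_static"], v), params["w_static"]))
    # Dynamic graph: adjacency recomputed from this image's own features
    # (assumed here to be a sigmoid of scaled pairwise dot products).
    dim = len(h[0])
    sims = matmul(h, transpose(h))
    a_dyn = [[sigmoid(s / math.sqrt(dim)) for s in row] for row in sims]
    z = elementwise(leaky_relu, matmul(matmul(a_dyn, h), params["w_dynamic"]))
    # One independent sigmoid score per category (multi-label output).
    logits = [sum(x * w for x, w in zip(row, w_row)) for row, w_row in zip(z, params["w_cls"])]
    return att, [sigmoid(l) for l in logits]


rng = random.Random(0)
N, D, C = 6, 4, 3  # spatial positions, feature channels, label categories
params = {
    "w_att": rand_matrix(D, C, rng),
    "a_static": rand_matrix(C, C, rng),
    "w_static": rand_matrix(D, D, rng),
    "w_dynamic": rand_matrix(D, D, rng),
    "w_cls": rand_matrix(C, D, rng),
}
features = rand_matrix(N, D, rng, scale=1.0)  # stand-in for a flattened CNN feature map
attention, probs = forward(features, params)
predicted = [c for c, p in enumerate(probs) if p > 0.5]
print(probs, predicted)
```

Each category receives its own sigmoid score, so any number of labels can be predicted for one image, which is the point of difference from single-label softmax classification. In a trained model the weights would be learned rather than sampled at random.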
Based on the multi-label classification algorithm in this thesis, a multi-label classification system is designed and implemented with PyQt. The system includes a model training module and a multi-label image prediction and classification module. User interaction is simple and the multi-label image classification results are good, which supports the practicality of the method designed in this thesis.
Keywords: Multi-label image classification; Attention mechanism; Graph convolution network; Feature extraction
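Contents
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.2.1 Multi-Label Classification Methods Adapted from Single-Label Image Classification
1.2.2 Multi-Label Classification Methods Based on Label Semantic Correlation and Label-Region Correlation
1.3 Main Work of This Thesis
1.4 Organization of This Thesis
Chapter 2 Related Theory and Network Models
2.1 The Concept of Multi-Label Image Classification
2.2 Network Models for Feature Extraction
2.2.1 Convolutional Neural Network Models
2.2.2 The VGG Network Model
2.2.3 The ResNet Residual Network Model
2.2.4 Attention Mechanism
2.2.5 Graph Convolutional Neural Networks
Chapter 3 A Multi-Label Image Classification Model with Joint Attention and Dynamic Graph Convolution
3.1 Network Structure of the Model
3.1.1 Overall Network Structure
3.1.2 Obtaining Basic Image Features with a Deep Neural Network
3.1.3 Obtaining Salient Image Features with the Attention Mechanism
3.1.4 Learning the Separator of Intrinsic Image Correlations
3.2 Training the Multi-Label Image Classification Model
3.3 Multi-Label Classification with the Trained Model
3.4 Experiments and Analysis
3.4.1 Experimental Environment
3.4.2 Evaluation Datasets and Metrics
3.4.3 Experimental Parameter Settings
3.4.4 Experimental Results and Analysis on the COCO2014 Dataset
Chapter 4 Analysis and Design of the Multi-Label Image Classification System
4.1 Requirements Analysis
4.2 Feasibility Analysis
4.3 Overall Framework Design
4.4 Overall Functional Module Design
Chapter 5 Implementation of the Multi-Label Image Classification System
5.1 Overview of the Development Environment
5.2 System Interface Implementation
5.3 System Testing
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Acknowledgements
References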