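Design and Implementation of an Intelligent Reply System Based on an Image Question Answering Algorithm
Abstract: Intelligent reply systems have been an active research area in recent years. Most existing systems can only respond to text questions and are helpless when a question concerns an image. This paper therefore implements an image question answering algorithm and designs and implements an intelligent reply system app. The algorithm consists of three parts: question processing, image processing, and answer generation. A convolutional neural network extracts the image features, the question is converted into word vectors, and the word vectors and image features are fed together into a long short-term memory network, whose output is the answer. The algorithm is suited to simple questions such as "What is" and "How many". It reaches an accuracy of 52.84% on the VQA (v1) dataset. Building on the image question answering algorithm and the Turing Robot web interface, this paper designs and implements an intelligent reply system. The system consists of a server and an Android client; the server is implemented with the Python web.py framework. The system supports image question answering, casual chat, open-domain question answering, weather queries, joke telling, and storytelling.
Keywords: image question answering; intelligent reply system; convolutional neural network; long short-term memory network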
Design and Implementation of an Intelligent Reply System Based on an Image Question Answering Algorithm
Abstract: Chat robots have been an active research area in recent years. Most existing chat robots can only handle text questions and cannot respond to questions about images. We therefore implement a visual question answering (VQA) algorithm based on the paper "Visual Question Answering", and then design and implement a chat robot application. The algorithm is built on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The CNN extracts the image features, and the natural-language question is converted into word vectors. The word vectors of the question and the image feature vector are then fed into the LSTM, whose output is the answer to the question. To train the model, we use pretrained COCO image features and the GloVe word vectors published by Stanford. The algorithm can answer simple questions such as "What is" and "How many", and reaches an accuracy of 52.84% on the VQA (v1) dataset. When the model is applied, however, the image features must be extracted by ourselves, so we implement a VGG-16 network for feature extraction. Based on this algorithm and the Turing Robot web interface, we design and implement a chat robot consisting of a server and an Android client. The server is implemented with the Python web.py framework. We also use Baidu Translation to translate Chinese questions into English, so that the VQA model can answer questions written in Chinese. The server receives the messages sent by the client and returns the appropriate response. The client sends messages to the server and receives its replies. The Android app can also take photos and browse the system gallery.
The chat robot supports image question answering, casual chat, open-domain question answering, weather queries, joke telling, storytelling, and other functions.
Keywords: Visual Question Answering; Chat Robot; CNN; LSTM
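Contents
1 Introduction
1.1 Research Background and Significance
1.2 Survey of Question Answering Systems
1.2.1 Intelligent Reply System Software
1.2.2 Research Status of Question Answering Systems in China and Abroad
1.2.3 Survey of Image Question Answering
1.3 Main Existing Problems
1.4 Research Content
1.5 Technical Approach
1.6 Organization of the Paper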
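2 Related Theory
2.1 Question Answering Systems
2.1.1 General Processing Flow of a Question Answering System
2.1.2 Basic Problems in Question Answering Research
2.1.3 Main Methods of Question Answering Systems
2.2 Intelligent Reply Systems
2.2.1 Retrieval-Based Intelligent Reply Systems
2.2.2 Structure of Generation-Based Intelligent Reply Systems
2.3 Image Question Answering
2.3.1 Image Question Answering Algorithm Framework
2.3.2 Convolutional Neural Networks
2.3.3 Long Short-Term Memory Networks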
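3 System Design
3.1 Overall Functional Design
3.2 Image Question Answering Algorithm Framework of This Paper
3.3 Dataset
3.4 Dataset Preprocessing
3.5 Setting Up the Keras Framework
3.5.1 Introduction to Keras
3.5.2 Setting Up the Keras Platform
3.5.3 Keras Usage Example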
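4 System Implementation
4.1 Implementation of the Image Question Answering Algorithm
4.1.1 Image Feature Extraction
4.1.2 Embedding Layer
4.1.3 Building the Model
4.2 Applying the Image Question Answering Model
4.3 Algorithm Testing
4.4 Server Design and Implementation
4.4.1 Development Environment
4.4.2 Server Functional Design
4.4.3 Introduction to and Installation of web.py
4.4.4 Introduction to and Use of the Turing Robot API
4.4.5 Server Function Implementation
4.5 Client Design and Implementation
4.5.1 Development Environment
4.5.2 Client Functional Design
4.5.3 Main Client Class Diagrams
4.5.4 Use Case Diagram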
4.5.5 Client Sequence Diagram
4.5.6 Interface Design
4.5.7 Key Function Implementation
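5 System Testing
5.1 Interface Testing
5.2 Image Function Testing
5.3 Common Phrase Sending Test
5.4 Chat Function Testing
6 Conclusion
References
Acknowledgements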