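Contents
Abstract (Chinese)
Abstract (English)
Preface
Chapter 1 Overview
1.1 Introduction
1.2 Main Tasks and Objectives
1.3 Organization of This Thesis
Chapter 2 Technical Background
2.1 Knowledge Graphs
2.2 Text Sentiment Analysis
Chapter 3 Data Preparation and Algorithm Design
3.1 Data Cleaning
3.2 Chinese Word Segmentation
3.3 Stop-Word Handling
3.4 Design of Knowledge Graph Extraction
3.5 Chapter Summary
Chapter 4 Experimental Design, Analysis, and Evaluation
4.1 Experimental Design
4.2 Analysis of Experimental Results
4.3 Experimental Evaluation
4.4 Chapter Summary
Chapter 5 Conclusions and Future Work
5.1 Summary of the Project
5.2 Future Work
References
Acknowledgments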
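Abstract (translated from the Chinese)
Since Google launched its Knowledge Graph product, the concept of the knowledge graph has attracted growing attention from both academia and industry. How to construct a knowledge graph from web page data of uneven quality has become an active research topic.
The rapid development of Internet technology has led to rapid growth in the number of Internet users, and more and more people are keen to express their views on online forums. Against this background, complaint forums have attracted wide attention from governments and individuals, who hope to extract valuable information from the posts on these forums.
This thesis first introduces the research background and the main task and objective of the project: constructing a knowledge graph from complaint texts. It then introduces the relevant technical background, namely knowledge graphs and text sentiment analysis. Next, it describes the data processing and the algorithm design for constructing the knowledge graph: Beautiful Soup is used for text extraction, the jieba tool for word segmentation, and a pipeline approach for entity and relation extraction. It then describes the experimental procedure in detail and presents the experimental results and their evaluation. Finally, it summarizes the work and discusses future directions.
Keywords: knowledge graph; sentiment analysis; named entity recognition; relation extraction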
Abstract
Since Google introduced its Knowledge Graph product, the concept of the knowledge graph has attracted growing attention from both academia and industry. How to build a knowledge graph from web page data of uneven quality has become an active research topic.
The rapid development of Internet technology has led to a rapid increase in the number of Internet users, and more and more people are keen to express their views on online forums. Against this background, complaint forums have attracted wide attention from governments and individuals, who hope to extract valuable information from the posts on these forums.
This thesis first introduces the research background and the main task and objective of the project: constructing a knowledge graph from complaint texts. It then introduces the relevant technical background, namely knowledge graphs and text sentiment analysis. Next, it describes the data processing and the algorithm design for constructing the knowledge graph: Beautiful Soup is used for text extraction, the jieba tool for word segmentation, and a pipeline approach for entity and relation extraction. It then describes the experimental procedure in detail and presents the experimental results and their evaluation. Finally, it summarizes the work and discusses future directions.
Keywords: knowledge graph; sentiment analysis; named entity recognition; relation extraction