Abstract
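With the arrival of the big-data era, people have gradually moved from an era of information scarcity to an era of information overload. Academic resources now grow by hundreds of millions of items every year, which gives users (researchers) a valuable wealth of material for their academic work. At the same time, this sheer volume makes it an urgent problem to help users find the resources they are interested in, and to do so efficiently. Academic resource recommender systems are one of the most effective ways to solve this problem. Accordingly, this thesis develops an academic resource recommender system based on a knowledge graph.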
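Traditional recommendation methods fall into two categories: collaborative filtering and content-based recommendation. The knowledge-graph-based academic resource recommender system adopted in this thesis builds on these traditional methods and uses a knowledge graph as an auxiliary tool. The triples of the knowledge graph make the recommendations more accurate, more explainable, and more diverse.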
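The following is a minimal Python sketch of how triples can support an explainable recommendation. It is an illustration under stated assumptions, not the system's actual schema: the entity names, the relation names `written_by` and `published_in`, and the helpers `build_index` and `related_papers` are all hypothetical. Given a paper, it finds other papers that share an author or a venue with it and returns the connecting triple path as a human-readable explanation.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the DBLP data.
TRIPLES = [
    ("paper:P1", "written_by", "author:Alice"),
    ("paper:P2", "written_by", "author:Alice"),
    ("paper:P2", "published_in", "venue:KDD"),
    ("paper:P3", "published_in", "venue:KDD"),
    ("paper:P4", "written_by", "author:Bob"),
]


def build_index(triples):
    """Map each tail entity (author, venue) to the (relation, paper) pairs pointing at it."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[tail].append((relation, head))
    return index


def related_papers(paper, triples):
    """Return papers sharing an entity with `paper`, each with its explanation paths."""
    index = build_index(triples)
    explanations = defaultdict(list)
    for head, relation, tail in triples:
        if head != paper:
            continue
        # Walk one hop out to the shared entity, then one hop back to other papers.
        for other_relation, other in index[tail]:
            if other != paper:
                explanations[other].append(
                    f"{paper} -{relation}-> {tail} <-{other_relation}- {other}"
                )
    return dict(explanations)


if __name__ == "__main__":
    for candidate, paths in related_papers("paper:P2", TRIPLES).items():
        print(candidate, paths)
```

The path string is what makes the result explainable: a user is told not only that P1 is recommended but that it shares the author Alice with a paper they already know.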
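Based on the raw data provided by the DBLP academic resource website, this thesis completes the following main work:
(1) Parsed the raw data and preprocessed it, including data cleaning;
(2) Imported the data into the Neo4j graph database and displayed the knowledge graph through Neo4j;
(3) Implemented paper recommendation with the ItemCF and UserCF algorithms and the Cypher query language;
(4) Implemented front-end/back-end interaction with the Python library py2neo;
(5) Implemented the graphical user interface with the Python library tkinter.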
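The two collaborative-filtering algorithms named in item (3) can be sketched in a few lines of Python. The abstract does not state which similarity measure or data the system uses, so this sketch assumes binary implicit feedback (the set of papers each user has read or saved) and cosine similarity. The toy data and the function names `item_cf` and `user_cf` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical implicit-feedback data: user -> set of papers the user has read/saved.
INTERACTIONS = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p4"},
    "u4": {"p3", "p4"},
}


def cosine(a, b):
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def item_cf(interactions, user, top_n=3):
    """ItemCF: score unseen papers by their similarity to the papers the user already has."""
    users_of = defaultdict(set)  # paper -> users who interacted with it
    for u, papers in interactions.items():
        for p in papers:
            users_of[p].add(u)
    seen = interactions[user]
    scores = defaultdict(float)
    for candidate in users_of:
        if candidate in seen:
            continue
        for p in seen:
            scores[candidate] += cosine(users_of[candidate], users_of[p])
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, s in ranked[:top_n] if s > 0]


def user_cf(interactions, user, k=2, top_n=3):
    """UserCF: recommend papers read by the k users most similar to `user`."""
    seen = interactions[user]
    neighbours = sorted(
        ((cosine(seen, papers), u) for u, papers in interactions.items() if u != user),
        key=lambda t: (-t[0], t[1]),
    )[:k]
    scores = defaultdict(float)
    for sim, u in neighbours:
        for p in interactions[u] - seen:  # only papers the target user has not seen
            scores[p] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, s in ranked[:top_n] if s > 0]


if __name__ == "__main__":
    print("ItemCF for u2:", item_cf(INTERACTIONS, "u2"))
    print("UserCF for u2:", user_cf(INTERACTIONS, "u2"))
```

The design difference is where similarity is computed: ItemCF compares papers by the users who share them, which suits a catalogue that changes more slowly than the user base, while UserCF compares users by the papers they share and pools the reading lists of the closest neighbours.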
Testing shows that the prototype system implemented in this thesis performs all of the above functions well.
Keywords: recommender system, knowledge graph, Neo4j, ItemCF algorithm, UserCF algorithm
ABSTRACT
With the advent of the big-data era, people have gradually moved from an era of information scarcity to an era of information overload. Academic resources now grow by hundreds of millions of items every year, which gives users (i.e., researchers) a valuable wealth of material for their academic work. However, this sheer volume also makes it an urgent problem to help users find the resources they are interested in and to obtain them efficiently. Recommender systems for academic resources are one of the most effective ways to solve this problem. Accordingly, this thesis designs and implements a knowledge-graph-based recommender system for academic resources.
Traditional recommender systems fall into two categories: collaborative filtering and content-based recommendation. The knowledge-graph-based academic recommender system adopted in this thesis builds on these traditional methods and uses a knowledge graph as an auxiliary tool. The triples of the knowledge graph make the recommendations more accurate, more explainable, and more diverse.
Based on the raw data provided by the DBLP academic resource website, this thesis completes the following main work:
(1) Parses the raw data and preprocesses it, including data cleaning.
(2) Imports the data into the Neo4j graph database and displays the knowledge graph through Neo4j.
(3) Applies the ItemCF and UserCF recommendation algorithms and the Cypher query language to recommend papers.
(4) Uses the Python library py2neo for front-end/back-end interaction.
(5) Uses the Python library Tkinter to build the graphical user interface.
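DBLP distributes its data as a large XML dump in which each publication is an element such as `<article>` or `<inproceedings>` with `<author>`, `<title>`, `<year>` and `<journal>` or `<booktitle>` children. The sketch below streams such records with `xml.etree.ElementTree.iterparse`, as in item (1). The cleaning rules are assumptions for illustration, not the thesis's documented steps: drop records without a title or author, drop duplicate `key`s, and trim whitespace and the trailing period on titles. The sample XML is hand-written. The real dump also uses character entities declared in an external DTD, which ElementTree does not resolve on its own, so parsing the real file would need extra handling.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written sample shaped like DBLP records (not real DBLP data).
SAMPLE = b"""<dblp>
<article key="journals/x/A1"><author>Alice</author><author> Bob </author>
<title>Graph Recommenders.</title><year>2020</year><journal>J. X</journal></article>
<inproceedings key="conf/y/B2"><author>Carol</author>
<title>Item CF at Scale</title><year>2019</year><booktitle>Y</booktitle></inproceedings>
<article key="journals/x/A1"><author>Alice</author>
<title>Duplicate Record</title><year>2020</year><journal>J. X</journal></article>
<article key="journals/x/A3"><author>Dave</author><year>2018</year></article>
</dblp>"""

RECORD_TAGS = {"article", "inproceedings"}  # publication types handled by this sketch


def parse_records(stream):
    """Stream-parse publication records and apply simple (assumed) cleaning rules."""
    seen_keys = set()
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        key = elem.get("key")
        title_el = elem.find("title")
        # itertext() also collects text inside nested markup such as <i>.
        title = "".join(title_el.itertext()).strip().rstrip(".") if title_el is not None else ""
        authors = [a.text.strip() for a in elem.findall("author") if a.text and a.text.strip()]
        if not title or not authors or key in seen_keys:
            elem.clear()
            continue  # drop incomplete records and duplicate keys
        seen_keys.add(key)
        year_text = (elem.findtext("year") or "").strip()
        yield {
            "key": key,
            "title": title,
            "authors": authors,
            "year": int(year_text) if year_text.isdigit() else None,
            "venue": elem.findtext("journal") or elem.findtext("booktitle") or "",
        }
        elem.clear()  # release the finished record to keep memory flat on large files


if __name__ == "__main__":
    for record in parse_records(io.BytesIO(SAMPLE)):
        print(record)
```

Streaming with `iterparse` and clearing each finished record is the usual way to keep memory use bounded on a multi-gigabyte file, since the whole tree is never held at once.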
Tests of the prototype system implemented in this thesis show that it performs all of the above functions well.
Key Words: Recommender System, Knowledge Graph, Neo4j, ItemCF, UserCF
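Contents
Abstract
ABSTRACT
1 Introduction
1.1 Research Background and Significance
1.2 Related Work in China and Abroad
1.3 Research Content of This Thesis
1.4 Organization of This Thesis
2 Related Theories and Technologies
2.1 Data Acquisition
2.2 Introduction to the Neo4j Database
2.3 The Python Library py2neo for Front-End/Back-End Interaction
2.4 The Python Library Tkinter for the Front-End Interface
2.5 Knowledge Graphs
2.6 Recommendation Algorithms
2.7 Chapter Summary
3 System Analysis
3.1 Overview of System Tasks
3.2 Runtime Environment
3.3 Development Environment
3.3.1 Programming Language
3.3.2 Database
3.4 Data Sources
3.5 Functional Requirements
3.6 Non-Functional Requirements
3.7 Chapter Summary
4 System Design
4.1 System Architecture Design
4.2 Database Design
4.2.1 Conceptual Design
4.2.2 Logical Design
4.2.3 Physical Design
4.3 System Functional Structure
4.3.1 Data Acquisition Module
4.3.2 Data Preprocessing Module
4.3.3 User Login Module
4.3.4 User Function Module
4.3.5 Knowledge Graph Construction Module
4.3.6 Paper Query Module
4.3.7 Paper Recommendation Module
4.4 Chapter Summary
5 System Implementation
5.1 DBLP Data Acquisition
5.2 Data Preprocessing
5.3 Knowledge Graph Construction and Display
5.4 User Interface Development
5.5 Implementation of User Functions
5.6 Implementation of Paper Query
5.7 Implementation of Paper Recommendation
5.7.1 User-Side Recommendation
5.7.2 Author-Side Recommendation
5.8 Chapter Summary
6 Conclusion
6.1 Summary of This Thesis
6.2 Future Work
References
Acknowledgements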