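Abstract (Chinese)
Movies now make up an important share of online resources. As the number of movies available online keeps growing, a personalized movie recommendation system is urgently needed. This thesis therefore aims to recommend to each user the movies that best match that user's interests.
The system comprises a front-end movie display interface, a movie rating module, the implementation of the recommendation algorithms, and the back-end database design. The recommendation algorithm is the core of the whole system. The system combines web crawler technology with collaborative filtering, the best-known and most widely used family of algorithms in the recommendation field. It applies two collaborative filtering algorithms that produce two different sets of recommendations: one is user-based collaborative filtering and the other is item-based collaborative filtering. Users can consult both result sets to choose suitable movies more sensibly. The system adopts the improved ItemCF-IUF and UserCF-IIF algorithms, which refine the computation of item similarity and user similarity, respectively. Finally, the system is evaluated by computing the precision, recall, and popularity of the two algorithms, and their respective strengths and weaknesses are compared. Experiments show that the improved algorithms give better recommendations and higher precision than the original collaborative filtering algorithms. The system is implemented with Python, HTML5, jQuery, CSS3, and MySQL database programming. It is built on Django, a heavyweight web framework, which connects the front end and back end of the system. A user must first register with a username, password, and email address before logging in to the recommendation system.
The thesis first describes the research status and significance of recommendation systems, then introduces the related recommendation algorithms with a focus on collaborative filtering, and examines the technologies needed to implement the system. It then presents the implementation of the whole recommendation system and closes with a review and summary of the project.
Keywords: crawler technology; movie recommendation system; collaborative filtering; neighborhood-based recommendation; personalized service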
Abstract
Movies are now an important part of online resources. As the number of movies available online keeps growing, designing a personalized movie recommendation system has become urgent. This thesis therefore aims to recommend to each user movies that closely match that user's interests.
The system includes a front-end movie display interface, a movie rating module, the implementation of the recommendation algorithms, and the back-end database design. Implementing the recommendation algorithm is the core of the whole system. The system uses web crawler technology together with collaborative filtering, which is the best-known and most widely used family of algorithms in the recommendation field. It applies two collaborative filtering algorithms to produce two different sets of recommendations: one is user-based collaborative filtering and the other is item-based collaborative filtering. Users can compare the two result sets to choose a suitable movie more sensibly. The system adopts the improved ItemCF-IUF and UserCF-IIF algorithms, which refine the computation of item similarity and user similarity, respectively. The system is evaluated by computing the precision, recall, and popularity of the two algorithms, and the advantages and disadvantages of each are compared. Experiments show that the improved algorithms give better recommendations and higher precision than the original collaborative filtering algorithms. The system is implemented with Python, HTML5, jQuery, CSS3, and MySQL database programming. It is built on Django, a heavyweight web framework, which connects the front end and back end of the system. Users must first register with a username, password, and email address before logging in to the recommendation system.
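To illustrate the item-based variant named above, the following is a minimal sketch of ItemCF with an inverse user frequency (IUF) penalty, in which each co-rating by user u is weighted by 1/log(1 + |N(u)|) so that very active users contribute less to item similarity. It follows the commonly cited formulation and may differ from the exact improvement used in this system; the function names `item_similarity_iuf` and `recommend`, the parameters `k` and `n`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarity_iuf(user_items):
    """Item-item similarity with an inverse-user-frequency (IUF) penalty.

    user_items maps a user id to the set of movie ids that user rated.
    """
    co_counts = defaultdict(lambda: defaultdict(float))  # weighted co-occurrence
    item_users = defaultdict(int)  # number of users who rated each item

    for items in user_items.values():
        if not items:
            continue  # avoids log(1) == 0 in the denominator below
        # Very active users add less weight to every pair they co-rate.
        weight = 1.0 / math.log(1.0 + len(items))
        for item in items:
            item_users[item] += 1
        for i, j in combinations(items, 2):
            co_counts[i][j] += weight
            co_counts[j][i] += weight

    # Normalise by the geometric mean of the two items' user counts.
    return {
        i: {
            j: score / math.sqrt(item_users[i] * item_users[j])
            for j, score in related.items()
        }
        for i, related in co_counts.items()
    }


def recommend(user, user_items, similarity, k=10, n=5):
    """Return up to n unseen items, scored via the k nearest neighbours
    of each item the user has already rated."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        neighbours = sorted(
            similarity.get(item, {}).items(),
            key=lambda pair: pair[1],
            reverse=True,
        )[:k]
        for other, sim in neighbours:
            if other not in seen:
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:n]]


# Toy data (hypothetical): user -> set of rated movie ids.
ratings = {
    "alice": {"A", "B"},
    "bob": {"A", "B", "C"},
    "carol": {"B", "C"},
    "dave": {"A"},
}
sim = item_similarity_iuf(ratings)
print(recommend("dave", ratings, sim))
```

UserCF-IIF is the mirror image under the same assumptions: similarity is computed between users, and each co-rated item i is weighted by 1/log(1 + |N(i)|) so that very popular movies contribute less.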
This thesis first describes the research status and significance of recommendation systems, then introduces the related recommendation algorithms with a focus on collaborative filtering, and examines the technologies needed to implement the system. It then presents the implementation of the whole recommendation system and finally reviews and summarizes the project.
Keywords: crawler technology; movie recommendation system; collaborative filtering; neighborhood-based recommendation; personalized service
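The abstract evaluates the two algorithms by precision, recall, and popularity. A minimal sketch of how these three measures are often computed for top-N recommendation lists is given here. It assumes popularity is the mean of log(1 + number of training users who rated the item) over all recommended items, which is one common definition and not necessarily the one used in this thesis; the function name `evaluate` and its argument names are hypothetical.

```python
import math


def evaluate(recommendations, test_items, train_items):
    """Return (precision, recall, popularity) for top-N recommendation lists.

    recommendations: user -> list of recommended movie ids
    test_items:      user -> set of movie ids held out for that user
    train_items:     user -> set of movie ids in the training split
    """
    # Popularity of an item = number of training users who rated it.
    item_popularity = {}
    for items in train_items.values():
        for item in items:
            item_popularity[item] = item_popularity.get(item, 0) + 1

    hits = recommended = relevant = 0
    popularity_sum = 0.0
    # Only users who received a recommendation list are counted.
    for user, recs in recommendations.items():
        held_out = test_items.get(user, set())
        hits += sum(1 for item in recs if item in held_out)
        recommended += len(recs)
        relevant += len(held_out)
        popularity_sum += sum(
            math.log(1 + item_popularity.get(item, 0)) for item in recs
        )

    precision = hits / recommended if recommended else 0.0
    recall = hits / relevant if relevant else 0.0
    popularity = popularity_sum / recommended if recommended else 0.0
    return precision, recall, popularity
```

Under this definition, a lower average popularity at similar precision and recall suggests that an algorithm surfaces less mainstream movies.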
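Contents
Abstract (Chinese)
Abstract
1 Introduction
1.1 Background and Significance of the Topic
1.2 Research Status at Home and Abroad
1.3 Research on Recommendation Algorithms
1.3.1 Collaborative Filtering Algorithms
1.3.2 Content-Based Recommendation Algorithms
1.3.4 Tag-Based Recommendation Algorithms
1.4 Research Objectives and Content
2 Related Technologies
2.1 Technologies for System Implementation
2.2 The Python Language
2.3 The Django Framework
2.4 The MySQL Database
3 System Analysis
3.1 Requirements Analysis
3.2 Feasibility Analysis
3.2.1 Social Feasibility Analysis
3.2.2 Technical Feasibility Analysis
3.3 User Functional Requirements
4 System Design
4.1 Overall System Architecture
4.2 Model Design of the Movie Crawler
4.3 Database Overview and Design
4.3.1 Experimental Dataset
4.3.2 Logical Structure Design of the Database
4.4.3 System E-R Diagram
4.4.4 System Data Table Design
5 System Implementation
5.1 Dataset Processing Module
5.2 Registration and Login Module
5.3 Movie Classification Module
5.4 User Rating Feedback Module
5.5 User Rating History Module
5.6 Recommendation Algorithm Module
5.7 Recommendation Display Module
6 System Testing
6.1 Testing Methods
6.2 System Functional Testing
7 Summary and Outlook
7.1 Summary
7.2 Limitations and Future Work
References
Acknowledgements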