Product Recommendation Based on Collaborative Filtering on Hadoop
Abstract: Single-machine recommender systems can no longer meet the computation and storage demands of massive data, and distributed recommender systems have become a research focus in recent years.
For the topic of a large-scale distributed e-commerce recommender system based on Hadoop, this thesis surveys the literature and analyzes the state of research in China and abroad and the problems it faces. It discusses the recommendation algorithms commonly used in e-commerce recommender systems and introduces the workflow and principles of HDFS and MapReduce, the two core technologies of the Hadoop platform. To address the many problems of traditional e-commerce recommender systems, the thesis presents an e-commerce recommender system built on Hadoop. The system inherits Hadoop's scalability and elasticity, so its computing and storage capacity can be scaled up or down quickly and easily as business needs change. Algorithms for data preprocessing, item-based collaborative filtering, and the similarity computation of a hybrid recommendation algorithm are designed to run on the Hadoop platform, so that the recommendation computation is carried out in a distributed manner.
Keywords: e-commerce; recommender system; collaborative filtering; Hadoop
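Contents
Abstract
1 Introduction
1.1 Research Background and Significance
1.1.1 Information Overload and Personalized Services
1.2 Current State of Recommender Systems
1.3 Main Work and Organization of the Thesis
2 Introduction to Hadoop
2.1 Introduction to Hadoop
2.2 Introduction to the Hadoop Ecosystem
2.3 HDFS and MapReduce in Detail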
2.3.1 HDFS
2.3.2 MapReduce
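3 Design of the Product Recommendation System
4 Algorithm Implementation and Results
4.1 Data Splitting
4.2 User Vectors
4.3 Co-occurrence Matrix
4.4 Multiplying the User Matrix by the Co-occurrence Matrix
4.5 Summing by Product ID
4.6 Removing Products the User Has Already Purchased
4.7 Verification
5 Experiments
5.1 Dataset Description
5.2 Experimental Platform
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
References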