Abstract
NetEase Cloud Music is a music application developed by NetEase (China) with a very large user base. One of its distinguishing features, and a major draw for users, is its relatively amiable community of listeners: many users leave resonant comments and short stories under classic songs. This content encourages users to follow one another and strengthens user stickiness. The number of comments on a song has also become a reference indicator of its popularity. Drawing on users' listening histories and a powerful recommendation system, NetEase Cloud Music also provides a daily playlist feature whose recommended songs are likely to match the user's taste.
In addition, NetEase Cloud Music offers some social features. Each user has a personal profile, and this information can, to some extent, serve as individual characteristics for classification and analysis. For example, analyzing users' ages reveals the age distribution of the user base, and analyzing users' locations reveals its geographic distribution. Combined with data visualization techniques, these analyses make the characteristics and distribution of the user base clear to present and observe.
Many relationships exist across the user base, since users are free to follow one another. These follow relationships can be represented as a large directed graph in which each user is a vertex and each follow from one user to another is an edge. This thesis collects information on a subset of users and analyzes the geographic distribution of users' individual characteristics, of follow relationships between users, and of indirect connections between users (computed with the Floyd algorithm). The results are then visualized so that they are intuitive and clear.
Keywords: Data Visualization, Web Crawler, NetEase Cloud Music, D3.js
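Table of Contents
Chapter 1 Introduction
1.1 Significance of the Research
1.2 Organization of the Thesis
Chapter 2 Overview of Data Visualization
2.1 Overview
2.2 Data Visualization with D3
2.3 Map-Based Data Visualization in D3
Chapter 3 Data Crawling
3.1 Principles
3.2 Calling the Official Back-End API
3.3 Anti-Crawler Strategies
3.4 Proxy Program
3.5 Node.js's Event-Loop-Based Asynchronous, Non-Blocking Concurrency Mechanism
3.6 Concurrency Locks
3.7 Exception Handling
Chapter 4 Data Structure
4.1 The JSON Format and JSON Files
4.2 Structure of User
4.3 Structure of Follow
Chapter 5 Data Storage
5.1 Choosing a Database
5.2 Setting Up the Database
5.3 Managing the Database
5.4 Connecting to the Database
5.5 Optimizing the Database
Chapter 6 Data Analysis and Visualization
6.1 Overall Distribution of Users
6.2 Distribution of Users Across Provinces
6.3 Comparative Analysis of Mutual Following Among Users
6.4 Computing Distances Between Users with the Floyd Algorithm
Chapter 7 Conclusion
References
Acknowledgments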