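Abstract
Information matters at all times and in all places. With the rapid development of the World Wide Web, the volume of information has grown explosively, at an exponential rate. When traditional information processing is extended to the Internet, information scattered across individual websites usually has to be downloaded locally before it can be processed further. Existing methods, however, do not scale to large volumes of data, and one problem they cannot avoid when searching online is the difficulty of distinguishing and selecting among an overwhelming amount of information. A web crawler saves a great deal of time and is therefore an excellent means of obtaining the desired information more efficiently and accurately. Custom rules make it possible to mine the relevant information from a specific website and to filter it for more accurate results.
This design focuses on mining and analyzing takeaway beverage data. After decades of development in China's takeaway market, the scale of the industry keeps expanding. This sustained growth relies on the big data technology of today's leading takeaway platforms: supported by big data analysis, operations optimization, and machine learning, a platform handles every step online, from order placement through to offline delivery. A takeaway platform is a platform of fine-grained data. Each data point is a link in overall quality and also reflects a current problem, so improving quality by adjusting targets is no longer difficult.
The web crawler is written mainly in the Python scripting language. The graphical interface is built with the Tkinter library and is simple to operate: clicking a button triggers the corresponding function. Neither an SQL nor a NoSQL database is used for storage; the crawl results are written directly to an xlsx file, which simplifies data reading and visualization. The pandas library reads the xlsx file, and the matplotlib library draws scatter plots or histograms from the data, so the analysis results are easy to inspect. This project is the design and implementation of a takeaway beverage data analysis system. Finally, users can log in to the system to view the visualized beverage data analysis.
Keywords: web crawler; Python; data mining; data analysis; takeaway beverages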
ABSTRACT
Information matters at all times and in all places. With the rapid development of the World Wide Web, the volume of information has grown explosively, at an exponential rate. When traditional information processing is extended to the Internet, information scattered across individual websites usually has to be downloaded locally before it can be processed further. Existing methods, however, do not scale to large volumes of data, and one problem they cannot avoid when searching online is the difficulty of distinguishing and selecting among an overwhelming amount of information. A web crawler saves a great deal of time and is therefore an excellent means of obtaining the desired information more efficiently and accurately. Custom rules make it possible to mine the relevant information from a specific website and to filter it for more accurate results.
This design focuses on mining and analyzing takeaway beverage data. After decades of development in China's takeaway market, the scale of the industry keeps expanding. This sustained growth relies on the big data technology of today's leading takeaway platforms: supported by big data analysis, operations optimization, and machine learning, a platform handles every step online, from order placement through to offline delivery. A takeaway platform is a platform of fine-grained data. Each data point is a link in overall quality and also reflects a current problem, so improving quality by adjusting targets is no longer difficult.
The web crawler is written mainly in the Python scripting language. The graphical interface is built with the Tkinter library and is simple to operate: clicking a button triggers the corresponding function. Neither an SQL nor a NoSQL database is used for storage; the crawl results are written directly to an xlsx file, which simplifies data reading and visualization. The pandas library reads the xlsx file, and the matplotlib library draws scatter plots or histograms from the data, so the analysis results are easy to inspect. This project is the design and implementation of a takeaway beverage data analysis system. Finally, users can log in to the system to view the visualized beverage data analysis.
Key words: web crawler; Python; data mining; data analysis; takeaway drinks
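Contents
Chapter 1 Introduction
1.1 Background of the Topic
1.1.1 Research Status in China and Abroad
1.1.2 Necessity of the Research
1.2 Research Content
Chapter 2 Introduction to the Development Software Platform
2.1 Software Platform
2.2 Development Languages
2.2.1 Python
2.2.2 html+css+js
Chapter 3 Overall Design of the Web Crawler
3.1 System Composition
3.2 Working Principle
Chapter 4 Modular Design
4.1 Introduction to the Tkinter GUI Module
4.1.1 Brief Overview of the GUI Module
4.1.2 Interaction Between the GUI Module and Other Modules
4.2 Crawler Module
4.2.1 Description and Use of the requests Library
4.2.2 Description and Use of the bs4 Library
4.2.3 Description and Use of the json Module
4.2.4 Workflow of the Crawler Module
4.3 Data Analysis Module
4.4 Anti-Crawler Module
Chapter 5 Experimental Conclusions and Future Prospects
5.1 Main Implementation Code of the Web Crawler
5.2 The xlsx File
Acknowledgements
References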