Abstract
Python offers many packages for web scraping and data analysis. BeautifulSoup can quickly and efficiently extract data from web pages, Pandas makes it convenient to clean and analyze data, and Matplotlib turns the results of an analysis into charts. Using Python's full-featured standard library together with the third-party libraries requests and BeautifulSoup and regular expressions, this paper reads the movie information stored in the file film.csv, cleans and organizes the data, and uses the bar function to plot each film's weekly average box office, defined here as the average weekly box office over all cities that appear in the file. The Y axis shows box office revenue in units of 10,000 yuan, and the X axis shows the film title.
The results are presented graphically with the Matplotlib plotting library and then analyzed to draw conclusions. This work lays a foundation for developing students' data processing and visual analysis skills.
Keywords: Python; crawler; crawling; movie; data
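The read, clean, average, and plot steps summarized in the abstract can be sketched as follows. This is only a minimal illustration, not the paper's implementation: the layout of film.csv is not given here, so the column names `film`, `city`, and `weekly_box_office` and the sample rows are assumptions, and the averaging uses the standard library `csv` module rather than Pandas to stay dependency-light.

```python
import csv
import io
from collections import defaultdict


def weekly_average_by_film(csv_file, film_col="film", revenue_col="weekly_box_office"):
    """Return {film title: mean weekly box office} over all city rows.

    The column names are hypothetical. Rows with an empty film title or a
    non-numeric revenue value are dropped as a simple cleaning step.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        name = (row.get(film_col) or "").strip()
        raw = (row.get(revenue_col) or "").strip().replace(",", "")
        if not name:
            continue  # skip rows without a film title
        try:
            value = float(raw)
        except ValueError:
            continue  # skip rows whose revenue is missing or malformed
        totals[name] += value
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


def plot_weekly_average(averages, out_path="weekly_average.png"):
    """Draw a bar chart: X axis = film title, Y axis = revenue (10,000 yuan)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    names = list(averages)
    fig, ax = plt.subplots()
    ax.bar(names, [averages[n] for n in names])
    ax.set_xlabel("Film")
    ax.set_ylabel("Weekly average box office (10,000 yuan)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


# Made-up sample data standing in for film.csv (assumed layout).
sample = io.StringIO(
    "film,city,weekly_box_office\n"
    "Film A,Beijing,120\n"
    "Film A,Shanghai,80\n"
    "Film B,Beijing,50\n"
    "Film B,Shanghai,\n"
    " ,Beijing,10\n"
)
averages = weekly_average_by_film(sample)
```

With a real file, `weekly_average_by_film(open("film.csv", newline="", encoding="utf-8"))` followed by `plot_weekly_average(averages)` would produce the chart, provided the file's encoding and column names match these assumptions. Matplotlib's default font lacks Chinese glyphs, so Chinese film titles would likely need a CJK-capable font to be configured before they render on the X axis.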