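Hadoop-Based Analysis of Campus Card Consumption Data
Summary
At present, most schools analyze campus card consumption data by building a data warehouse on top of the campus card database standard and then running OLAP (online analytical processing) and data mining on it. Analyzing students' consumption data in this way gives insight into their study and living habits. However, a single year of campus card use already produces roughly … × 10,000 records, and the data files are very large. Data at this scale places heavy demands on the complexity of the mining algorithms and on computer performance, which greatly increases the difficulty of managing the system and the cost of using it.
A data volume this large cannot be processed on a single computer; a distributed architecture is required. Hadoop is a software framework for the distributed processing of large amounts of data. It processes data in a reliable, efficient, and scalable way; it works in parallel, which speeds up processing; and its cost is relatively low, so anyone can use it and can easily develop and run applications that process massive data sets on Hadoop.
Keywords: Hadoop; campus card; student consumption; data analysis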
Abstract
Today, most schools analyze campus card consumption data by building a data warehouse system on the basis of the campus card database standard, and then performing OLAP (Online Analytical Processing) and data mining on top of it. By analyzing student consumption data, they study students' learning and daily life. However, one year of campus card use alone generates around … million records. The data files are too large, and data of this size demands highly complex mining algorithms and high computer performance, which greatly increases the difficulty of system management and the cost of use.
Such a massive amount of data cannot be handled by a single computer, so a distributed architecture must be used. Hadoop is a software framework capable of distributed processing of large amounts of data. Hadoop processes data in a reliable, efficient, and scalable manner; it works in parallel, which makes processing faster; and its cost is relatively low, so anyone can use it and can easily develop and run applications on Hadoop that process massive amounts of data.
Keywords: Hadoop; campus card; student consumption; data analysis
Contents