Abstract
In recent years, big data has spread across one field after another and driven development and change in each of them, and education is no exception. The rapid growth of educational informatization has accumulated large volumes of data, among which examination scores are the most important. How to mine and analyze these data with sound methods, and in doing so improve the way students learn, the way teachers teach, and the way education policy is made, has therefore drawn wide attention from educators.
At present, most research on student score analysis is based either on association rule algorithms or on decision tree algorithms. Association rules can uncover relationships between courses, but they do not account for the effect of individual factors, such as those relating to the student or the teacher, on a student's scores, so their results are often biased when applied to individuals. Decision tree algorithms can predict an individual student's scores, but they do not account for the relationships among that student's courses, so their predictions are not very accurate. Combining the two methods so that each compensates for the other's weakness would allow more effective mining and analysis.
To address these shortcomings, this thesis proposes an efficient algorithm that combines association rules with decision trees. It considers both the relationships among a student's courses and individual factors relating to students and teachers, with the aim of improving the accuracy of student score analysis.
First, a data warehouse with student scores as its subject is designed from the course scores of students in the Information Management program of the School of Information and Security Engineering at our university, providing reliable data support for the subsequent analysis. Second, an association rule algorithm is used to mine the relationships between courses and to generate new attributes for building the decision tree. Finally, the new attributes and the original attributes are used to build a decision tree based on the information gain ratio, which is then used to analyze and predict student scores.
In summary, applying the proposed combined algorithm to student score analysis is feasible. Its analysis and prediction results are more comprehensive, its predictions of individual course scores are more accurate, and it has some practical value.
Keywords: data mining; association rules; decision tree; score analysis; data warehouse
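Contents
1. Introduction
(1) Background of the topic
(2) Research status in China and abroad
(3) Main research content and significance
(4) Structure of the thesis
2. Application and design of a data warehouse for score analysis
(1) Data warehouses
(2) Overview of the student score data warehouse
(3) Conceptual model design of the student score data warehouse
(4) Logical model design of the student score data warehouse
(5) Physical model design of the student score data warehouse
(6) Data loading for the student score data warehouse
3. Overview of data mining techniques
(1) Data mining
(2) Association rules
(3) Decision tree algorithms
(4) Other data mining algorithms
4. Application of the combined association rule and decision tree algorithm to student score analysis
(1) Overview of the combined association rule and decision tree algorithm
(2) Application of association rules to student score analysis
(3) Application of the combined association rule and decision tree algorithm to student course score analysis
5. Conclusion and outlook
Main references
Appendix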