Judgment of Effective Network Comments Based on a Bayesian Classifier
【Abstract】 As computers and the Internet have become widespread, people obtain information faster and exchange it more easily, and the online social networks that have risen alongside them are disrupting traditional markets. Among these networks, microblogging has become increasingly influential. In China, almost every Internet user has a microblog account, and traffic on this scale has produced large numbers of bloggers and commenters and, with them, an explosion of comment data. Bayesian classification is an important family of algorithms in machine learning and data mining, and naive Bayes is its most basic and simplest member. Naive Bayes is stable, simple, and efficient, and it rests on a solid theoretical foundation. Its classification quality depends largely on how the model is constructed and on the characteristics and quantity of the data to be classified. Its basic assumption is that the features of the data are mutually independent. Real-world data rarely satisfy this independence assumption, which considerably limits naive Bayes in practical applications. After studying Bayesian classifiers, and inspired by several ideas from that work, I construct a classifier for effective online comments. The classifier sorts comments according to a set of their features, and I use its final results to evaluate the model's performance and efficiency.
【Key Words】 Bayesian classifier, naive Bayes, effective comment classification model, Bayesian algorithm, machine learning
List of Figures
Figure 2.1 Structure of naive Bayes
Figure 4.1 Relationship among the multinomial, Bernoulli, and mixed models
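List of Tables
Table 4.1 Comparison of traditional naive Bayes and mixed-model naive Bayes
Table 4.2 Comparison of traditional naive Bayes and Laplace-smoothed naive Bayes
Table 4.3 Comparison of mixed-model naive Bayes and Laplace-smoothed naive Bayes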