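Research on a Taobao User Classification Model Based on the Bayes Classifier
【Abstract】 As the Internet and computers have come into widespread use, people obtain information ever faster and exchange it ever more conveniently, and the e-commerce that has risen alongside them is challenging traditional markets. As the e-commerce market gradually expands, competition grows increasingly fierce, giving rise to vicious competition and fraudulent harassment that undermine a fair competitive environment, for example the emergence of spam alt accounts on Taobao and the flood of Wangwang advertisements. The influx of spam alt accounts is so large that manual handling alone is no longer feasible, so we need tools or methods that help us better detect, filter, and manage the Taobao users who control these accounts. After studying the Bayes classifier, I drew inspiration from scholars' spam email classification models and, by adapting them, built a Taobao user classification model; a classifier was then trained on a large amount of data with the help of programming software. The model judges whether an account is a spam alt account by analyzing feature parameters such as combinations of certain characters and words in the member name and Alipay account, registration behavior, email suffix, and the abnormal behavior of spam alt accounts. Finally, the model's results are evaluated by recall and accuracy to assess its performance and efficiency.
【Key Words】 Bayes classifier, Naive Bayes, user classification model, recall, accuracy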
Research on a Taobao User Classification Model
Based on the Bayes Classifier
【Abstract】 With the wide application and popularization of the Internet and computers, people obtain information faster and communicate more conveniently, and the rise of e-commerce is challenging the traditional market. As the e-commerce market gradually expands, competition becomes increasingly fierce, which has led to vicious competition and fraudulent harassment that disrupt a fair competitive environment, for example the emergence of spam alt accounts on Taobao and the flood of Wangwang advertisements. The influx of a large number of spam alt accounts makes it impossible to rely on manual handling alone. We need tools or methods that help us better identify, filter, and manage the Taobao users who control these accounts. After studying the Bayes classifier, I was inspired by scholars' models for spam email classification. By improving on these models, a Taobao user classification model is established, and with the help of programming software a classifier is trained on a large amount of data. By analyzing feature parameters such as combinations of certain words in the member name and Alipay account, registration behavior, email suffix, and abnormal account behavior, we judge whether an account is a spam alt. The final result of the model is evaluated by recall rate and accuracy rate to assess the performance and efficiency of the model.
【Key Words】 Bayes classifier, Naive Bayes, User classification model, Recall rate, Accuracy
List of Tables
Table 4-1 Variable selection
Table 4-2 Feature selection results for the Taobao user classification model data
Table 4-3 Samples