Abstract: This paper addresses a current difficulty in automatic page login on the Internet. Some login pages require a verification code, which lengthens automatic login, and some verification codes are hard to recognize. To address this, the paper proposes a verification code recognition method that combines Python with a convolutional neural network (CNN). First, a sample set of more than 3,000 verification code images is preprocessed in three steps: grayscale conversion, binarization, and denoising. A convolutional neural network is then designed with three pooling layers and one fully connected layer, trained on the sample set, and used to predict ten randomly selected samples.
Keywords: verification code; Python; binarization; convolutional neural network
Contents
1 Introduction
2 Image Preprocessing
2.1 Grayscale Conversion
2.2 Binarization
2.3 Denoising by Connected-Component Area
3 Sample Training Using CNN
3.1 Overview of Convolutional Neural Networks
3.2 Model Training
3.2.1 Encoding Image Information
3.2.2 Defining Functions
3.2.3 Generating the Sample Set
3.2.4 Defining the Convolutional Neural Network Structure
3.2.5 Training the Sample Set
3.3 Model Prediction
4 Conclusion
References