摘 要
随着互联网技术的迅猛发展,微博等社交媒体平台已成为人们获取信息、表达意见和分享生活不可或缺的一部分。然而,谣言在这些平台上的迅速蔓延对个人、社会乃至国家均构成了潜在的负面影响。因此,如何有效识别并控制谣言的传播成为了一个亟待解决的关键问题。本课题聚焦于基于卷积神经网络(CNN)和门控循环单元网络(GRU)的微博谣言识别研究,旨在通过先进的机器学习和深度神经网络模型,探索一种高效且准确的谣言识别方法。
在导师的精心指导以及学院相关科研项目的支持下,本课题充分利用了国内外在谣言检测领域的最新研究成果和丰富的开源数据资源。我们基于来源于新浪微博的140,664条标注谣言数据集进行了深入的建模分析。首先,通过细致提取谣言数据中的文本信息和时间信息,我们构建了门控循环单元网络(GRU)模型,实现了基于序列特征的微博谣言检测算法,该模型在测试集上的准确率达到了89.36%。为了进一步提升检测效果,我们创新性地引入了卷积神经网络(CNN)来提取文本的深层特征,并构建了CNN-GRU复合网络模型,该模型在测试集上的准确率提升至92.43%。
此外,本课题还进行了早期谣言检测实验,验证了所提两种模型的有效性和实用性。我们不仅深入分析了谣言事件与非谣言事件在传播过程中的差异,还优化了序列划分方式,分别对源微博文本与转发评论信息进行特征提取,从而显著提升了谣言检测的准确率与性能。实验结果表明,所提出的模型在中文微博数据集上展现出了卓越的性能,具有较高的实际应用价值。
关键词:微博谣言识别;卷积神经网络(CNN);门控循环单元网络(GRU);Python;数据分析
Abstract
With the rapid development of Internet technology, social media platforms such as Weibo have become an indispensable channel through which people obtain information, express opinions, and share their lives. However, the rapid spread of rumors on these platforms can harm individuals, society, and even the nation. Effectively identifying rumors and controlling their spread has therefore become an urgent problem. This study focuses on Weibo rumor identification based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) network, aiming to develop an efficient and accurate rumor identification method using machine learning and deep neural network models.
Under the careful guidance of the supervisor and with the support of related research projects at the college, this study draws on the latest research on rumor detection in China and abroad and on the wealth of available open-source data. We carried out modeling and analysis on a labeled rumor dataset of 140,664 records collected from Sina Weibo. First, by extracting the text and time information from the rumor data, we built a gated recurrent unit (GRU) model and implemented a Weibo rumor detection algorithm based on sequence features; this model reached an accuracy of 89.36% on the test set. To further improve detection, we introduced a convolutional neural network (CNN) to extract deep features from the text and built a composite CNN-GRU model, which raised the test-set accuracy to 92.43%.
In addition, we conducted early rumor detection experiments to verify the effectiveness and practicality of the two proposed models. We analyzed how rumor and non-rumor events differ during propagation, refined the way sequences are divided, and extracted features separately from the source post text and from the repost and comment information, which significantly improved rumor detection accuracy. The experimental results show that the proposed models perform well on the Chinese Weibo dataset and have high practical value.
Keywords: Weibo rumor identification; convolutional neural network (CNN); gated recurrent unit (GRU) network; Python; data analysis