
A Chinese Word Segmentation Method Based on the Viterbi Algorithm

【Abstract】: This paper addresses automatic Chinese word segmentation, covering coarse segmentation, unknown-word recognition, segmentation disambiguation, and part-of-speech tagging. First, in the preprocessing stage, the shortest-path method is combined with statistical segmentation: an N-shortest-path method based on a statistical model produces the N best coarse segmentation results. Next, dictionary-based segmentation is combined with hidden Markov model (HMM) segmentation to further refine the result, and a temporary dictionary is used to continually expand the base dictionary. The HMM is improved in two main ways: an improved Viterbi algorithm solves the sequence (decoding) problem, and an improved Baum-Welch algorithm solves the parameter-estimation problem. For the recognition of unknown (unregistered) words, a grammar rule library is introduced, which reduces the statistical method's requirements on corpus size and improves recognition accuracy. For named entity recognition, a matching-monitoring mechanism based on the theory of multiple active agents is added to further improve recognition efficiency, yielding a Chinese named entity recognition method based on multiple active agents. Finally, a grammar-based Chinese word segmentation system is designed and implemented. Analysis of the experimental results shows that the system recognizes unknown words and resolves ambiguity well, and performs well overall as a segmentation system. The method has also been widely applied in related fields.

【Key Words】: Chinese word segmentation; N-shortest-path method; hidden Markov model
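The coarse-segmentation step can be illustrated with a simplified N-shortest-path search over a word lattice, where nodes are character positions and each edge covers one candidate word. This is a minimal sketch, not the author's implementation: the word costs (read them as negative log probabilities), the fallback cost for single characters missing from the dictionary, and the function name `n_shortest_segmentations` are all hypothetical. It also keeps exactly the n cheapest complete paths. Some formulations of the N-shortest-path method instead keep every path whose length ranks among the N smallest distinct lengths.

```python
import heapq
from typing import Dict, List, Tuple

Path = Tuple[float, List[str]]  # (total cost, words along the path)


def n_shortest_segmentations(text: str, costs: Dict[str, float], n: int,
                             char_cost: float = 10.0) -> List[Path]:
    """Return up to n lowest-cost segmentations of `text` (hypothetical helper).

    Nodes are character positions 0..len(text). An edge i -> j exists when
    text[i:j] is in `costs` or is a single character (fallback cost `char_cost`).
    Because the lattice is a DAG ordered by position, keeping the n cheapest
    partial paths at each node is enough to recover the n cheapest full paths.
    """
    max_len = max((len(w) for w in costs), default=1)
    best: List[List[Path]] = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for j in range(1, len(text) + 1):
        candidates: List[Path] = []
        for i in range(max(0, j - max_len), j):
            word = text[i:j]
            if word in costs:
                edge = costs[word]
            elif j - i == 1:
                edge = char_cost  # unknown single character: always allowed
            else:
                continue
            for cost, words in best[i]:
                candidates.append((cost + edge, words + [word]))
        # Keep only the n cheapest paths that end at position j
        best[j] = heapq.nsmallest(n, candidates, key=lambda p: p[0])
    return best[len(text)]


if __name__ == "__main__":
    toy_costs = {"研究": 2.0, "研究生": 3.0, "生命": 2.0, "起源": 2.0}  # hypothetical costs
    for cost, words in n_shortest_segmentations("研究生命起源", toy_costs, n=2):
        print(cost, "/".join(words))
    # 6.0 研究/生命/起源
    # 15.0 研究生/命/起源
```

With this toy dictionary, the two cheapest coarse segmentations of 研究生命起源 are 研究/生命/起源 and 研究生/命/起源. Passing both candidates forward, instead of committing to one, leaves the later HMM stage room to resolve the ambiguity.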
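The abstract says the HMM's sequence (decoding) problem is solved with an improved Viterbi algorithm, but it does not describe the improvements. The sketch below therefore shows only the standard log-space Viterbi algorithm over a character-tagging HMM. The B/M/E/S state set (begin, middle, and end of a multi-character word, plus single-character word), the penalty `unk` for unseen characters, and all toy probabilities are assumptions made for illustration.

```python
import math
from typing import Dict, List

STATES = ("B", "M", "E", "S")  # assumed tag set: Begin, Middle, End, Single
NEG_INF = float("-inf")


def viterbi(chars: str, start: Dict[str, float], trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]], unk: float = -20.0) -> List[str]:
    """Most probable tag sequence for `chars`; all parameters are log-probabilities."""
    if not chars:
        return []
    # score[t][s]: best log-probability of any tag path ending in state s at position t
    score = [{s: start.get(s, NEG_INF) + emit.get(s, {}).get(chars[0], unk) for s in STATES}]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: best predecessor of s at t
    for t in range(1, len(chars)):
        score.append({})
        back.append({})
        for s in STATES:
            prev, best = max(
                ((p, score[t - 1][p] + trans.get(p, {}).get(s, NEG_INF)) for p in STATES),
                key=lambda pair: pair[1])
            score[t][s] = best + emit.get(s, {}).get(chars[t], unk)
            back[t][s] = prev
    # A complete sentence must end at a word boundary (E or S)
    tags = [max(("E", "S"), key=lambda s: score[-1][s])]
    for t in range(len(chars) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Cut the character string after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical toy parameters (not estimated from any corpus)
log = math.log
TOY_START = {"B": log(0.6), "S": log(0.4)}
TOY_TRANS = {
    "B": {"M": log(0.3), "E": log(0.7)},
    "M": {"M": log(0.3), "E": log(0.7)},
    "E": {"B": log(0.5), "S": log(0.5)},
    "S": {"B": log(0.5), "S": log(0.5)},
}
TOY_EMIT = {
    "B": {"生": log(0.5), "起": log(0.5)},
    "E": {"命": log(0.5), "源": log(0.5)},
    "S": {"生": log(0.2), "命": log(0.2)},
}

if __name__ == "__main__":
    tags = viterbi("生命起源", TOY_START, TOY_TRANS, TOY_EMIT)
    print(tags, tags_to_words("生命起源", tags))
    # ['B', 'E', 'B', 'E'] ['生命', '起源']
```

Working in log space turns the products of probabilities into sums, which avoids numeric underflow on long sentences. Disallowed transitions such as B to S are simply absent from `TOY_TRANS` and are treated as impossible (negative infinity).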