TeX File Processing Based on pyparsing
【Abstract】 TeX is a typesetting system widely used in mathematics, physics, and computer science, and Python is an interpreted programming language. pyparsing is a parsing library written in pure Python and is easy to use. This thesis uses pyparsing to parse TeX files, focusing on mathematical formulas. We first derive a context-free grammar for TeX mathematical formulas from a set of mathematical TeX documents, and then implement a parser for that grammar with pyparsing. Through syntax analysis, the parser extracts the main content of a TeX file. The analysis combines logic with experience: logic refers to the fixed grammar of TeX files, while experience refers to case studies of individual TeX files, from which we summarize authors' writing habits and propose simplifications. Because parsing a TeX file in full is difficult, the work concentrates on mathematical expressions, for which a dedicated parsing grammar is designed with pyparsing. The parse result represents the logical structure of the mathematical formulas in a TeX file, which is valuable for academic research.
【Key Words】 context-free grammar, pyparsing module, TeX file parsing, regular expression
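Contents
1 Introduction
1.1 Significance of the Research
1.2 Research Status
1.2.1 Current State of Python Development
1.2.2 Current State of TeX File Processing
1.3 Research Content
1.3.1 Key Points and Difficulties
1.3.2 Key Problems to Be Solved
1.3.3 Research Methods and Measures
1.3.4 Results
2 Theoretical Foundations
2.1 Python Features and Basic Syntax
2.1.1 Code Blocks and Indentation
2.1.2 Control Statements
2.1.3 Data Types and Operations
2.1.4 Functions
2.1.5 Miscellaneous
2.2 Basic Principles of Formal Grammars
2.3 Basic Content of Mathematical TeX Files
2.4 Grammar of Mathematical Formulas
3 Implementation with pyparsing
3.1 Design Principles of the pyparsing Parser
3.2 Experiments
Conclusion
References
Appendix
Acknowledgements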