WeHack BBS
[笔记] 用ANTLR可以根据语法直接从源码生成S表达式形式的语法树 - 可打印的版本

+- WeHack BBS (https://bbs.wehack.space)
+-- 版块: 计算机技术 (https://bbs.wehack.space/forum-5.html)
+--- 版块: 程序设计讨论区 (https://bbs.wehack.space/forum-14.html)
+--- 主题: [笔记] 用ANTLR可以根据语法直接从源码生成S表达式形式的语法树 (/thread-366.html)



[笔记] 用ANTLR可以根据语法直接从源码生成S表达式形式的语法树 - vimacs - 05-04-2023

我想直接从语法产生一个语法分析器,直接从源码生成语法树。问了一下Bing AI,发现ANTLR可以做这件事。

vimacs: Hmm, what I want is a tool that makes a parser for me, when I have the grammar like "expr : expr '+' expr". I don't want to write any C snippets, instead I want this tool to make a parser that just emits the parsed source to some structured format like JSON or SXML. For example, when the input is an expr "a + b", the output is something like ('expr (+ ('expr "a") ('expr "b"))).

AI: If you want a tool that makes a parser for you from a grammar without writing any C snippets, you might want to try **PackCC**¹. PackCC is a parser generator for C that uses a PEG (Parsing Expression Grammar) to describe the grammar. It supports direct and indirect left recursion and generates easy-to-understand parser source codes. You can also use **$$** and **$n** symbols to build a parse tree, and then output it to any format you like, such as JSON or SXML. For example, if you have a rule like this:

`expr <- expr '+' expr / IDENTIFIER`

This means that an expression can be composed of two expressions separated by a plus sign, or an identifier. The value of this expression is the same as the matched text by default, but you can change it by using **$$**. For example, you can write something like this:

`expr <- expr '+' expr { $$ = json_object("type", "expr", "op", "+", "left", $1, "right", $3); } / IDENTIFIER { $$ = json_object("type", "expr", "value", $1); }`

This will output a JSON object with the type, operator, and operands of the expression. You can find more details and examples on how to use PackCC on its GitHub page¹.

Source: Conversation with Bing, 5/4/2023
(1) GitHub - arithy/packcc: A parser generator for C. https://github.com/arithy/packcc.
(2) Good tools for creating a C/C++ parser/analyzer - Stack Overflow. https://stackoverflow.com/questions/526797/good-tools-for-creating-a-c-c-parser-analyzer.
(3) Free Grammar Checker - QuillBot AI. https://quillbot.com/grammar-check.
(4) Types of Parsers in Compiler Design - GeeksforGeeks. https://www.geeksforgeeks.org/types-of-parsers-in-compiler-design/.

vimacs: Hmm, it still requires writting some C code to work.

AI: Yes, it does. PackCC generates a parser in C, so you need to write some C code to use it. However, you don't need to write any C snippets in the grammar file, unless you want to customize the value of each rule. You can also use some helper functions to output the parse tree in different formats, such as json_object or sxml_object. If you want a tool that doesn't require any C code at all, you might want to look for some alternatives that are not parser generators, but parsers themselves. For example, you can try **ANTLR**, which is a powerful tool that can generate parsers in different languages, such as Java, Python, C#, etc. You can also use **ANTLRWorks**, which is a graphical user interface for ANTLR that allows you to create and test grammars interactively. ANTLR also uses a PEG-like syntax to describe the grammar, and supports direct and indirect left recursion. You can find more information and examples on how to use ANTLR on its website and documentation.

If you want a tool that makes a parser for you from a grammar without writing any C code at all, you might want to try **ANTLR**¹. ANTLR is a powerful parser generator that uses a LL(*) algorithm for parsing. It can generate parsers in different languages, such as Java, C#, Python, JavaScript, and more. You can use ANTLR to describe your grammar in a PEG-like notation, and then use the ANTLR tool to generate the parser code. You can also use ANTLR to output the parse tree in different formats, such as JSON or XML. For example, if you have a rule like this:

`expr : expr '+' expr | ID ;`

This means that an expression can be composed of two expressions separated by a plus sign, or an identifier. You can use the **-tree** option to output the parse tree in a Lisp-like notation. For example, if the input is an expr "a + b", the output is something like this:

`(expr (expr a) + (expr b))`

You can find more details and examples on how to use ANTLR on its website¹ and GitHub page².

Source: Conversation with Bing, 5/4/2023
(1) ANTLR. https://www.antlr.org/.
(2) Antlr Project · GitHub. https://github.com/antlr.
(3) ANTLR - Wikipedia. https://en.wikipedia.org/wiki/ANTLR.

vimacs: Thanks. The ANTLR -tree option is the thing I want. Are there more tools with similar features?

You're welcome. There are some other tools that can output parse trees from grammars, but they might not have the same features as ANTLR. For example, you can try:

- **PHP Syntax Tree**¹, which is an online tool that can draw parse trees from bracketed notation. You can use it to visualize your grammar rules and their output, but it doesn't generate any code or format the output in JSON or SXML.
- **Parse Tree Formatter**², which is a Python script that can format parse trees in plain or fancy mode. You can use it to collapse or expand your parse trees, but it doesn't handle any grammar notation or generate any code or output format.
- **Parseviz**³, which is another online tool that can draw parse trees from bracketed notation. It can also output the parse tree in DOT format, which can be used to generate graphs with Graphviz. However, it doesn't generate any code or format the output in JSON or SXML.
- **Syntax Tree Generator**⁴, which is yet another online tool that can draw parse trees from bracketed notation. It can also output the parse tree in LaTeX format, which can be used to generate documents with LaTeX. However, it doesn't generate any code or format the output in JSON or SXML.
- **Parse Tree Generator**⁵, which is a Java program that can generate parse trees from context-free grammars. It can also output the parse tree in XML format, which can be converted to JSON or SXML with other tools. However, it doesn't handle any PEG notation or left recursion.

Source: Conversation with Bing, 5/4/2023
(1) Parse Tree Formatter Tools? - Linguistics Stack Exchange. https://linguistics.stackexchange.com/questions/32006/parse-tree-formatter-tools.
(2) nlp - Tool for drawing parse trees - Software Recommendations Stack .... https://softwarerecs.stackexchange.com/questions/74012/tool-for-drawing-parse-trees.
(3) Generate a parse tree for the following sentence - Kaggle. https://www.kaggle.com/general/208640.
(4) Syntax Tree - Natural Language Processing - GeeksforGeeks. https://www.geeksforgeeks.org/syntax-tree-natural-language-processing/.
(5) Ch. 10: Parsing WHAT IS PARSING? Output = a tree for the sentence .... https://faculty.cs.niu.edu/~freedman/657/parse03.pdf.

让AI总结一下:

我向你介绍了两种可以从语法生成解析器的工具:**PackCC**和**ANTLR**。PackCC是一个用C语言编写的解析器生成器,它使用PEG(解析表达式语法)来描述语法。它支持直接和间接的左递归,并生成易于理解的解析器源代码。你可以使用**$$**和**$n**符号来构建解析树,并输出到你喜欢的格式,比如JSON或SXML。ANTLR是一个强大的解析器生成器,它使用LL(*)算法进行解析。它可以生成不同语言的解析器,比如Java,C#,Python,JavaScript等。你可以使用ANTLR来用类似PEG的符号描述你的语法,并使用ANTLR工具来生成解析器代码。你也可以使用ANTLR来输出不同格式的解析树,比如JSON或XML。我还向你介绍了一些其他可以从语法输出解析树的工具,但它们可能没有PackCC和ANTLR那么强大。例如,你可以尝试:

- **PHP Syntax Tree**¹,这是一个在线工具,可以从括号表示法绘制解析树。你可以用它来可视化你的语法规则和它们的输出,但它不会生成任何代码或格式化输出为JSON或SXML。
- **Parse Tree Formatter**²,这是一个Python脚本,可以以简单或花哨的模式格式化解析树。你可以用它来折叠或展开你的解析树,但它不会处理任何语法表示法或生成任何代码或输出格式。
- **Parseviz**³,这是另一个在线工具,可以从括号表示法绘制解析树。它还可以将解析树输出为DOT格式,这可以用于生成Graphviz图形。然而,它也不会生成任何代码或格式化输出为JSON或SXML。
- **Syntax Tree Generator**⁴,这是又一个在线工具,可以从括号表示法绘制解析树。它还可以将解析树输出为LaTeX格式,这可以用于生成LaTeX文档。然而,它也不会生成任何代码或格式化输出为JSON或SXML。
- **Parse Tree Generator**⁵,这是一个Java程序,可以从上下文无关语法生成解析树。它还可以将解析树输出为XML格式,这可以用其他工具转换为JSON或SXML。然而,它不会处理任何PEG表示法或左递归。


在Arch Linux下,ANTLR可以不通过Python脚本运行:

java -cp /usr/share/java/antlr-complete.jar org.antlr.v4.gui.Interpreter Expr.g4 prog -tree

不过感觉ANTLR输出的语法树需要细化,它没法把类似"20*30"这样的表达书输出成S表达式"(* (expr:3 20) (expr:3 30))",而是直接按顺序输出为"((expr:3 20) * (expr:3 30))",不过这个也算是比较容易解析的了。