Assignment 1 version 1.0
After completing this assignment, you will be able to:
- formally define the lexicon of a programming language.
- use ANTLR to implement a lexer for a programming language.
- formally define the grammar of a programming language.
- use ANTLR to implement a recognizer for a programming language.
1 Specification
In this assignment, you are required to write a lexer and a recognizer for a program written in BKIT. To complete this assignment, you need to:
- Install Python 3 if you have not installed it yet.
- Download initial.zip and unzip it.
- Download antlr-4.8-complete.jar from https://www.antlr.org/download.html and set the environment variable ANTLR_JAR to the path of this file. Then install antlr4-python3-runtime (see the instructions in the Python Targets section of the above webpage).
- Remove all files in folders initial/src/main/bkit/utils, initial/src/main/bkit/astgen, initial/src/main/bkit/checker if any.
- Test the initial code again with just the following three instructions:
    python run.py gen
    python run.py test LexerSuite
    python run.py test ParserSuite
- Rename the folder initial to assignment1.
To complete this assignment, you need to:
- Read the specification of the BKIT language carefully.
- Modify BKIT.g4 in the initial code to formally describe the BKIT language. Please fill in your ID in the header of this file.
- Add more tests to LexerSuite and ParserSuite in the initial code.
This assignment is divided into two phases: the lexer phase and the recognizer phase. These phases are assessed independently.
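Before running the three run.py commands from the setup steps, it may help to confirm that the environment is ready. The following is a minimal sketch, not part of the initial code: the helper names check_antlr_jar and runtime_installed are hypothetical, and it assumes only that ANTLR_JAR should hold the path to the downloaded jar and that antlr4-python3-runtime installs a package importable as antlr4.

```python
import importlib.util
import os


def check_antlr_jar(environ):
    """Return a problem description, or None if ANTLR_JAR looks usable.

    `environ` is any mapping of variable names to values (e.g. os.environ).
    """
    jar = environ.get("ANTLR_JAR")
    if not jar:
        return "ANTLR_JAR is not set"
    if not os.path.isfile(jar):
        return "ANTLR_JAR does not point to an existing file: " + jar
    return None


def runtime_installed():
    """True if the antlr4 package (from antlr4-python3-runtime) is importable."""
    return importlib.util.find_spec("antlr4") is not None


if __name__ == "__main__":
    problem = check_antlr_jar(os.environ)
    print(problem or "ANTLR_JAR OK")
    print("antlr4 runtime installed:", runtime_installed())
```

If either check reports a problem, python run.py gen is unlikely to succeed, so it is worth fixing the setup first.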
1.1 Phase 1: Lexer
In this phase, you are required to write a lexer for a program written in BKIT. To complete this phase, you need to:
- Modify BKIT.g4 to detect tokens in the BKIT language.
- Make 100 test cases for LexerSuite to test your code.
- For lexical errors, please return the following tokens together with specific lexemes:
  - ERROR_CHAR with <unrecognized char> lexeme: when the lexer detects an unrecognized character.
  - UNCLOSE_STRING with <unclosed string> lexeme: when the lexer detects an unterminated string. The <unclosed string> lexeme does not include the opening quote.
  - ILLEGAL_ESCAPE with <wrong string> lexeme: when the lexer detects an illegal escape in a string. The <wrong string> lexeme runs from the beginning of the string (without the opening quote) to the illegal escape.
  - UNTERMINATED_COMMENT without any lexeme: when the lexer detects an unterminated comment.
- You can assume that there is only one error in each test case.
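The lexeme rules for the two string errors can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not the required ANTLR solution. The function scan_string, the token name STRING, the double-quote delimiter, the backslash as escape introducer, and the default set of legal escape letters are all hypothetical; the real rules come from the BKIT specification. It also assumes that the <wrong string> lexeme includes the illegal escape itself.

```python
def scan_string(text, legal_escapes="bfrnt'\\"):
    """Scan `text`, which must start with an opening double quote.

    Returns (token_name, lexeme). The string syntax used here is an
    assumption for illustration only, not the BKIT definition.
    """
    assert text.startswith('"')
    body = text[1:]  # the lexeme never includes the opening quote
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            # Closing quote found: a well-formed string.
            return ("STRING", body[:i])
        if ch == "\n":
            # The line ended before the closing quote.
            return ("UNCLOSE_STRING", body[:i])
        if ch == "\\":
            if i + 1 >= len(body):
                # Input ended right after the backslash.
                return ("UNCLOSE_STRING", body)
            if body[i + 1] not in legal_escapes:
                # Lexeme runs from the start up to and including the bad escape.
                return ("ILLEGAL_ESCAPE", body[: i + 2])
            i += 2  # skip the legal two-character escape
            continue
        i += 1
    # Input ended without a closing quote.
    return ("UNCLOSE_STRING", body)


if __name__ == "__main__":
    print(scan_string('"abc'))
    print(scan_string('"ab\\hcd"'))
```

The same shape carries over to test cases: each one feeds an input containing a single error and compares the returned token and lexeme with the expected pair.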
1.2 Phase 2: Recognizer
In this phase, you are required to write a recognizer for a program written in BKIT. To complete this phase, you need to:
- Modify BKIT.g4.
- Make 100 test cases for ParserSuite to test your code.
- You can assume that there is at most one error in each test case.


