Description
CSE 340 Project 2

1. Introduction

You will be given a lexer that reads tokens from standard input. Your goal is to write, in C or C++, a program that reads all tokens from standard input by calling the lexer function getToken() and storing certain tokens in a linked list. After all tokens are read, your program will print out the content of the linked list in a specific order. The next section describes the lexer API. You also need to read the provided code in lexer.c and understand how the lexer works. You will only use the getToken() function in this project.

2. Lexer API

There are two functions that the lexer defines. These two functions compose the application programming interface (API) of our lexer. These functions are declared in lexer.h (and implemented in lexer.c). You will find the files lexer.h and lexer.c on the submission site for Project 2.

- getToken() reads the next token from standard input and returns its type as a token_type enum. If the token is of type ID, NUM, IF, WHILE, DO, THEN, or PRINT, then the actual token value is stored in the global variable current_token as a null-terminated character array and the length of the string is stored in the global variable token_length. There are two special token_type values: END_OF_FILE, which is returned when the lexer encounters the end of standard input, and ERROR, which is returned when the lexer encounters an unrecognized character in the input.

- ungetToken() causes the next call to getToken() to return the last token read by the previous call to getToken(). Note that this means the next call to getToken() will not read from standard input. It’s a logical error to call ungetToken() before calling getToken(). This function is useful for writing recursive descent parsers that you will see later on in this course.

There are four global variables declared in lexer.h that are set when getToken() is called:

- t_type: the token type is stored here. Note that this will be the same value that was returned by getToken().
- current_token: the token value is stored in the array current_token. If the token is of type ID, NUM, IF, WHILE, DO, THEN, or PRINT, then current_token contains the token string. For all other token types, current_token contains the empty string.
- token_length: the length of the string stored in current_token.
- line: the current line number of the input when the token was read.

You should read the source code provided in lexer.c and lexer.h. Here is a hint for using the lexer: you can use the token type labels such as NUM or END_OF_FILE directly in the code. For example, if you want to check if the token type is NUM, you can write the following code:
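As a rough sketch of the read loop described above, the C fragment below calls getToken() repeatedly until it reports the end of input. It is only an illustration under stated assumptions: the real token_type enum and getToken() come from the provided lexer.h and lexer.c, so the reduced enum, the canned stub_input array, and the helper read_all_tokens() are hypothetical stand-ins that let the sketch compile on its own. END_OF_FILE is the end-of-input value described in the lexer API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the enum that the real lexer.h declares. */
typedef enum { END_OF_FILE, NUM, ID } token_type;

/* Canned input for the stub lexer: three tokens, then end of input. */
static const token_type stub_input[] = { ID, NUM, NUM, END_OF_FILE };
static size_t stub_pos = 0;

/* Stub lexer: returns the next canned token. Once END_OF_FILE is reached
   it keeps returning END_OF_FILE. The real getToken() reads from stdin. */
token_type getToken(void)
{
    assert(stub_pos < sizeof stub_input / sizeof stub_input[0]);
    token_type t = stub_input[stub_pos];
    if (t != END_OF_FILE)
        stub_pos++;
    return t;
}

/* Reads every token until END_OF_FILE and returns how many were seen. */
int read_all_tokens(void)
{
    int count = 0;
    while (getToken() != END_OF_FILE) {
        /* A real program would inspect the token here and decide
           whether it has to be stored in the linked list. */
        count++;
    }
    return count;
}
```

In an actual submission the stub would be dropped in favor of including lexer.h, and the loop body would do the filtering and storing instead of counting.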
    if (t_type == NUM) {
        // …
    }

3. Requirements

Your program should use the provided lexer and read all tokens from the input by repeatedly calling the getToken() function. Certain token strings and additional data should be stored in a linked list. Specifically, if either of the following conditions is true:

- The token is of type NUM, OR
- The token is of type ID AND the actual token is equal to one of the following values: “cse340”, “programming”, or “language”

then the token string and other information need to be stored in a node of a linked list. The information that needs to be stored about each of these tokens in the linked list is the following:

- Token type (from t_type)
- Token value (from current_token)
- Line number of the input where the token was read (from line)

After reading all tokens from the input and storing information about tokens that match the criteria, your program should go over the linked list and print the information in reverse order from when that token was encountered.

Each of the tokens in the linked list must be printed to standard output on a separate line with the following format:

Note that
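One possible way to get the reverse order is to prepend each matching token to the front of a singly linked list, so that walking the list from its head visits the most recently read token first. The sketch below illustrates that idea and is not the required implementation. The names tok_kind, node, should_store(), push_front(), and free_list() are hypothetical, and tok_kind is a simplified stand-in for the token_type enum in lexer.h. Printing is left out on purpose, since it has to follow the format given by the assignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the lexer's token_type enum. */
typedef enum { TOK_ID, TOK_NUM, TOK_OTHER } tok_kind;

/* One stored token: type, value, and the input line it was read on. */
typedef struct node {
    tok_kind type;
    char *value;          /* private copy of the token string */
    int line;
    struct node *next;
} node;

/* Returns 1 if the token matches the storage criteria, 0 otherwise. */
int should_store(tok_kind type, const char *value)
{
    if (type == TOK_NUM)
        return 1;
    return type == TOK_ID &&
           (strcmp(value, "cse340") == 0 ||
            strcmp(value, "programming") == 0 ||
            strcmp(value, "language") == 0);
}

/* Prepends a new node and returns the new head.
   If allocation fails, the list is left unchanged. */
node *push_front(node *head, tok_kind type, const char *value, int line)
{
    assert(value != NULL);
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->value = malloc(strlen(value) + 1);
    if (n->value == NULL) {
        free(n);
        return head;
    }
    strcpy(n->value, value);
    n->type = type;
    n->line = line;
    n->next = head;
    return n;
}

/* Releases every node together with its copied string. */
void free_list(node *head)
{
    while (head != NULL) {
        node *next = head->next;
        free(head->value);
        free(head);
        head = next;
    }
}
```

In the real program the three arguments would come from t_type, current_token, and line after each getToken() call. Copying the string matters because current_token is a single global array that the lexer overwrites on the next call.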




