Compilation
Summary
The three compilation stages are lexical analysis, syntactic analysis and code generation.
Lexical analysis breaks down source code into its component parts and stores them as token types in a token table.
Syntactic analysis checks that the source code follows the grammar rules of the language and analyses the relationships between tokens.
Lexical analysis
In computer programming, a lexicon is a set of terms, called 'keywords' or 'reserved words', that the compiler is designed to understand.
Compilers for each language will have different lexicons, containing the keywords used by that language.
A lexeme is the smallest unit of language. Lexemes cannot be broken down further without losing meaning.
The compiler scans the source code character by character, looking for keywords that it recognises from its lexicon. These are immediately labelled as lexemes, since they cannot be broken down without losing their meaning. For example, it may find keywords such as IF, WHILE, FOR and END.
Token
As the compiler scans the source code, it finds lexemes, converts them into tokens, and notes what each token does. Eventually, the compiler has converted the whole source code into a token table in memory that connects all this information together.
After finding the keywords, the compiler joins the remaining unrecognised characters, such as 't' 'i' 'm' 'e', into the lexeme "time", ignoring the space that follows. "time" isn't a recognised keyword, so the compiler assumes it is either a variable or a constant and labels it as such.
Lexical analysis breaks down the source code into a set of tokens held as a token table in memory.
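The scanning process described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular compiler works: the lexicon in KEYWORDS, the function names tokenise and classify, and the token type labels are all assumptions made for the example.

```python
# Hypothetical lexicon: the keywords this toy compiler understands.
KEYWORDS = {"IF", "THEN", "WHILE", "FOR", "END"}

def classify(lexeme):
    """Label a lexeme with a token type (labels are illustrative)."""
    if lexeme in KEYWORDS:
        return "keyword"
    if lexeme.isdigit():
        return "number"
    # Not in the lexicon, so assume a variable or constant name.
    return "identifier"

def tokenise(source):
    """Scan the source character by character and build a token table."""
    table = []
    lexeme = ""
    for ch in source + " ":  # the trailing space flushes the last lexeme
        if ch.isalnum() or ch == "_":
            lexeme += ch  # keep joining characters into one lexeme
            continue
        if lexeme:
            table.append((lexeme, classify(lexeme)))
            lexeme = ""
        if not ch.isspace():  # spaces are ignored, other symbols are kept
            table.append((ch, "symbol"))
    return table

token_table = tokenise("IF time > 10 THEN")
for lexeme, token_type in token_table:
    print(lexeme, token_type)
```

Here "time" is not in the lexicon, so it is labelled as an identifier, which stands in for "variable or constant" in the notes above.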
Code generation
Code generation is the final action of a compiler. It takes the output of lexical and syntactic analysis and converts it into machine code.
Create machine code
During code generation, each token is converted into machine code instructions.
Another task of the code generation stage is to allocate memory locations to the tokens that were noted as variables and constants during lexical analysis.
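As a rough sketch of this allocation step, the Python below walks a token table and gives each distinct variable or constant name the next free memory location. The table layout, the "identifier" label, the function name allocate, and the starting address of 100 are all assumptions chosen for illustration.

```python
def allocate(token_table, base_address=0, size=1):
    """Give each distinct identifier its own memory location."""
    addresses = {}
    next_free = base_address
    for lexeme, token_type in token_table:
        if token_type == "identifier" and lexeme not in addresses:
            addresses[lexeme] = next_free
            next_free += size  # assume every value takes `size` locations
    return addresses

# Hypothetical token table as it might come out of lexical analysis.
sample_table = [
    ("time", "identifier"),
    (">", "symbol"),
    ("limit", "identifier"),
    ("time", "identifier"),  # seen before, so it reuses its address
]
print(allocate(sample_table, base_address=100))
```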
The code generator also works out relative addresses for jumping around within the program. An IF statement, for example, might cause the program to jump to different points in the code if a particular condition is met.
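To make the idea of a relative jump concrete, here is a small Python sketch that emits instructions for an IF statement. The instruction names (LOAD, CMP_GT, STORE, JUMP_IF_FALSE) are made up for the example and do not belong to any real machine; the offset is assumed to be counted from the jump instruction itself.

```python
def generate_if(condition_code, body_code):
    """Emit code for an IF: skip the body when the condition is false."""
    code = list(condition_code)
    # Relative offset from the jump to the first instruction after the body.
    code.append(("JUMP_IF_FALSE", len(body_code) + 1))
    code.extend(body_code)
    return code

# Hypothetical instructions for: IF time > 10 THEN late = 1
condition = [("LOAD", "time"), ("CMP_GT", 10)]
body = [("LOAD", 1), ("STORE", "late")]
program = generate_if(condition, body)

jump_index = len(condition)
offset = program[jump_index][1]
print(program)
print("jump target:", jump_index + offset)
```

Because the offset is relative, the same instructions still work if the whole block is placed at a different position in the program.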
Optimise the code
Finally, the code generator tries to optimise the code, making it as fast and efficient as possible.
Syntax analysis
This stage is called syntactic analysis or 'parsing'. Syntactic analysis can be further broken down into two stages.
The first stage is checking the structure of the code. This is called syntax analysis. Every programming language has a particular grammar. During syntax analysis, the compiler checks that the way the tokens are arranged makes sense in the grammar of the language. Programmers commonly make mistakes like forgetting "end of line" characters, leaving loops unclosed, or forgetting brackets. These are picked up during syntax analysis.
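One way to picture this structural check is a stack that remembers every opener until its closer arrives. The Python below is a simplified sketch under stated assumptions: it only pairs "(" with ")" and WHILE with END, and the function name check_structure and the error wording are invented for the example.

```python
def check_structure(tokens):
    """Report unclosed or unexpected brackets and loops."""
    closers = {")": "(", "END": "WHILE"}  # assumed pairs for this toy grammar
    openers = set(closers.values())
    errors = []
    stack = []
    for position, token in enumerate(tokens):
        if token in openers:
            stack.append((token, position))
        elif token in closers:
            if stack and stack[-1][0] == closers[token]:
                stack.pop()  # matched the most recent opener
            else:
                errors.append(f"unexpected {token} at token {position}")
    # Anything left on the stack was opened but never closed.
    for token, position in stack:
        errors.append(f"unclosed {token} opened at token {position}")
    return errors

print(check_structure(["WHILE", "(", "x", ")", "END"]))
print(check_structure(["WHILE", "(", "x"]))
```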
Once the structure of the code is checked, the second stage is to check whether the tokens themselves make sense. This is semantic analysis. For example, if one token is expected to be a number and is instead a string or a boolean, the compiler notices this error during semantic analysis.
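A semantic check of this kind can be sketched as a function that looks at the type recorded for each operand. The rule used here (arithmetic operators accept only numbers), the function name check_operands, and the type labels are assumptions for illustration, not the rules of a real language.

```python
ARITHMETIC = {"+", "-", "*", "/"}

def check_operands(operator, operands):
    """Return error messages for operands whose type does not fit."""
    errors = []
    if operator in ARITHMETIC:
        for lexeme, token_type in operands:
            if token_type != "number":
                errors.append(
                    f"operator {operator} expects a number, "
                    f"but {lexeme} is a {token_type}"
                )
    return errors

print(check_operands("+", [("3", "number"), ("4", "number")]))
print(check_operands("+", [("3", "number"), ("hello", "string")]))
```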
If an error is detected, it is added to the 'compiler report'. Once the compiler has finished analysing the code, the complete report is returned to the programmer. The errors have to be fixed in the source code, and then the amended source code put back into the compiler to be checked again.