Please enable JavaScript.
Coggle requires JavaScript to display documents.
Lexical Analysis image, image, image, image, image, image, image - Coggle…
Lexical Analysis
Lexical tokens
An example.
-
The lexical analyzer would read this input combining adjacent characters to form syntactically correct tokens, ignoring irrelevant input and passing a steam of tokens to the next phase.
-
-
-
-
The role of the lexical analysis phase of a compiler is to read the program being compiled, breaking up the input into a sequence of tokens.
-
Direct implementation
-
More general issues
Identifiers and Reserved Words. The syntax of identifiers and reserved words/keywords is eas to specify and the rules have to be followed in the lexucal analyser. Issues which sometimes cause mistakes include:
-
If the language has keywords, is the cas of letters significant?
Is there a limit on the leght of an identifier, and are all characters significants?
-
In some cases, the language specification may use the term "implementation-defined"
Numerical constants. The complexity of the analysis code is further increased by having to deal with a wide range of numeric constant types supported by most programming languages. Floating points constants have more complex syntax and require systematic approach to the coding of their lexical analysis otherwise messy and unreliable codes results.
Testing. Beginners to compiler writing often fall into the trap of inadequate testing. They will start a compiler construction project, perfectly reasonably, by coding a lexical analyzer, but only testing it on a handful of small programs. The lexical analyze should not be overly difficult to test, but the testing has to be systematic and thorough.
Difficult languages. Most programming languages include "features" that pose particular problems for the lexical analyzer.
Evaluation. The overall structure of the lexical analyzer is intuitive, with clear separation between the sections of code dealing with each individual token. The code for most tokens seems straightforward. However, some aspects of the design of most programming languages can cause implementation difficulties and may requiere annoying-to-program lookahead.
-
-
Regular expressions
-
Finite-State machines.
This transition diagram which is another directed graph but with labelled branches. Each node in the transition diagram is called a state. One or more states are enclosed in double circled they are accepting states.
-
The transition diagram can be represented in other ways. In particular, representing it as a transition table may be helpful.
Assuming that it is manageable to transform regular expressions into finite-states machines, this approach looks as though as it might be a good, practical route for the implementation of lexical analyzers,
-
Regular expression arise from Chomsky type 3 Grammars. A Chomsky type 3 Grammar has productions of the form: A → a or A →aB
A regular expression is made up of symbols of the language being defined together with operators that support:
-
-
Repetition. Symbols or group of symbols are followed by the * operator to signify zero or more rrepetitions.
Parenthesis can aso be used to group symbols. Conventionally, the repetition operator has highest precedence, followed by concatenation, with alternation having lowest precedence.
-
-
-
-
-
-
-