Chapter 1. Introduction
Present and future
The quality of code generated by many of today's compilers is astonishingly good, often far better than that produced by a competent assembly/machine code programmer.
The compiler can cope well with the complex interacting features of computer architectures. But there are practical limits.
Today's compilers and language tools can deal with complex programming languages, generating code for a huge range of computer architectures, both real and virtual.
Compilers are not just about generating target code from high-level language programs. Programmers need software tools, probably built on compiler technology, to support the generation of high-quality and reliable software.
High Level Languages
Disadvantages
The program may need to perform some low-level, hardware-specific operations which do not correspond to any high-level language feature.
The use of low-level languages is often justified on the grounds of efficiency, in terms of execution speed or runtime storage requirements.
Advantages
High-level language programs are generally easier to read, understand and hence maintain.
Problem solving is significantly faster. Moving from the problem specification to code is simpler using a high-level language.
High-level language programs can be structured more easily to reflect the structure of the original problem.
High-level languages can offer software portability. This demands some degree of language standardisation.
Compile-time checking can remove many bugs at an early stage, before the program actually runs.
Efficiency
In the early development of language implementations, the issue of efficiency strongly influenced the design of compilers. The key disadvantage of high-level languages was seen as being one of poor efficiency.
Today, compiler-generated code for a wide range of programming languages and target machines is likely to be just as efficient, if not more so, than hand-written code.
Efficiency is usually concerned with the minimization of computation time, but other constraints such as memory usage or power consumption could be more important.
The case for using high-level languages for almost all applications is now very strong. In order to run programs written in high-level languages, we need to consider how they can be implemented.
When developing software, a valuable rule to remember is that there is no need to optimise if the code is already fast enough.
Access to the Hardware
A program running on a computer system needs to have access to its environment. It may input data from the user, it may need to output results, create a file, find the time of day and so on.
This is conventionally done by providing the programmer with a library acting as an interface between the program and the operating system and/or hardware.
To address the stated advantage of low-level languages mentioned earlier, there is nothing to stop the construction of operating system and machine-specific libraries to perform special-purpose tasks such as providing access to a particular storage location or executing a particular machine instruction.
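As a small illustration, here is a minimal sketch in Python (using only standard-library calls; the file name is an arbitrary assumption) of a program touching its environment purely through such library interfaces:

```python
import time

# Each operation below reaches the environment only through a library
# routine; the library, and beneath it the operating system, hides the
# hardware-specific detail from the program.
now = time.ctime()                      # find the time of day
with open("results.txt", "w") as f:     # create a file
    f.write(f"Run started at {now}\n")  # output results
```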
Historical development
Originally, programming was done in machine code; it required considerable skill and was hard work, slow and error-prone.
Assembly languages were developed, relieving the programmer from having to deal with much of the low-level detail, but requiring an assembler, a piece of software to translate from assembly code to machine code.
The development of high-level languages gathered speed in the 1950s and beyond. The importance of formal language specifications was recognized and the correspondence between particular grammar types and straightforward implementation was understood.
The extensive use of high-level languages prompted the rapid development of a wide range of new languages, some designed for particular application areas such as COBOL for business applications and FORTRAN for numerical computation.
Why study compilers?
Studying compiler construction offers computer science students many insights. It gives a practical application area for many fundamental data structures and algorithms, and it allows the construction of a large-scale and inherently modular piece of software.
Indeed, one of the best ways of learning a programming language is to write a compiler for that language, preferably writing it in its own language.
One of the key motivations for studying this technology is that compiler-related algorithms have relevance to application areas outside the field of compilers.
Writing a compiler also gives some insight into the design of target machines, both real and virtual.
Introduction
The idea of a compiler, traditionally translating from the high-level language source program to machine code for some real hardware processor, is well known, but there are other routes for language implementation.
Compilers and Interpreters
Analysis of Programs
Grammars
The term "grammar" has a wide range of definitions and nuances, and it is hardly surprising that we need a tight and formal definition for it when used in the context of programming languages.
A grammar gives rules for deriving strings of characters conforming to the syntactic rules of the grammar. A sentential form is any string that can be derived from S, the starting symbol.
If BNF, EBNF or syntax diagrams are used to specify the production rules, all rules have the particular characteristic that the left-hand sides of productions are always single non-terminal symbols.
This is certainly allowed by the rules defining a grammar, but the restriction gives these grammars certain important features.
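As an illustration (a conventional textbook-style fragment, not drawn from any particular language definition), the BNF rules below define simple expressions, and the derivation beneath them produces a sequence of sentential forms ending in the sentence x + y. Notice that every left-hand side is a single non-terminal symbol, exactly the restriction described above.

```
<expr>   ::= <expr> + <term> | <term>
<term>   ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | x | y

<expr> => <expr> + <term> => <term> + <term> => <factor> + <term>
       => x + <term> => x + <factor> => x + y
```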
Chomsky Hierarchy
In the 1950s, Noam Chomsky produced a hierarchical classification of formal grammars which provides a framework for the definition of programming languages as well as for the analysis of programs written in these languages.
This hierarchy is made up of four levels, as follows:
Type 0, or a free grammar or an unrestricted grammar, contains productions of the form α → β. These grammars are not sufficiently restricted to be of any practical use for programming languages.
Type 1, or a context-sensitive grammar, has productions of the form αAβ → αγβ where α, β, γ ∈ U∗, γ is non-null and A is a single non-terminal symbol. This type of grammar turns out to have significant relevance to programming languages.
Type 2, or a context-free grammar, has productions of the form A → γ where A is a single non-terminal symbol. These productions correspond directly to BNF rules. In general, if BNF can be used in the definition of a language, then that language is no more complex than Chomsky type 2.
Type 3, or a regular grammar or a finite-state grammar, puts further restrictions on the form of the productions. Here, all productions are of the form A → a or A → aB, where A and B are non-terminal symbols and a is a terminal symbol. These grammars turn out to be far too restrictive for defining the syntax of conventional programming languages, but they do have a key place in the specification of the syntax of the basic lexical tokens dealt with by the lexical analysis phase of a compiler.
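For instance, a type 3 grammar for identifiers (a letter followed by letters or digits) can be implemented directly as a finite-state recognizer. The sketch below is in Python; the grammar and the state names are illustrative assumptions rather than part of any particular compiler:

```python
def is_identifier(s: str) -> bool:
    """Recognize the regular language letter (letter | digit)*.

    This is the automaton equivalent of a type 3 grammar such as
        Id   -> letter Rest | letter
        Rest -> letter Rest | digit Rest | letter | digit
    """
    state = "START"
    for ch in s:
        if state == "START":
            state = "IN_ID" if ch.isalpha() else "FAIL"
        elif state == "IN_ID":
            state = "IN_ID" if ch.isalnum() else "FAIL"
        else:
            return False
    return state == "IN_ID"

assert is_identifier("x1") and not is_identifier("1x")
```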
Parsing
The output of the Parser
As well as indicating whether the input to the parser forms a syntactically correct sentence, the parser must also generate a data structure reflecting the syntactic structure of the input.
The parser obtains a stream of tokens (from the lexical analyzer in a conventional compiler) and matches them with the tokens in the production rules.
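A minimal sketch in Python of such a data structure (the node layout is an assumption; practical compilers usually build richer abstract syntax trees):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a parse tree: a grammar symbol plus its children.

    A leaf holds a terminal (token); an internal node holds the
    non-terminal on the left-hand side of the production applied.
    """
    symbol: str
    children: list["Node"] = field(default_factory=list)

# A possible tree for the input "x + y", using the expression grammar above.
tree = Node("expr", [
    Node("expr", [Node("term", [Node("factor", [Node("x")])])]),
    Node("+"),
    Node("term", [Node("factor", [Node("y")])]),
])
```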
Parsing strategies
Top-down parsers start with the starting symbol of the grammar, and hence with the root of the parse tree. The goal is to match the available input with the definition of the starting symbol.
So if the starting symbol S is defined as S → AB, the goal of recognizing an S will be achieved by recognizing an instance of an A followed by an instance of a B.
When the right-hand side of a production being matched with the input contains terminal symbols, these symbols can be matched directly with the input string.
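A minimal recursive-descent sketch in Python of exactly this situation, assuming the toy grammar S → AB, A → a, B → b (the grammar and the simplified token handling are illustrative assumptions):

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, expected):
        # Consume one terminal symbol, or fail with a syntax error.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")

    def parse_S(self):   # S -> A B
        self.parse_A()
        self.parse_B()

    def parse_A(self):   # A -> a
        self.match("a")

    def parse_B(self):   # B -> b
        self.match("b")

Parser(["a", "b"]).parse_S()  # succeeds; ["b", "a"] would raise SyntaxError
```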
The bottom-up parser perhaps reflects a more obvious way of thinking about parsing. Instead of starting with the starting symbol, we start with the input string, choose a substring to be matched with the right-hand side of a production, and replace that substring with the corresponding left-hand side.
The parse tree is then constructed upwards from the leaves, finally reaching the starting symbol at the root. The key problem here is of course one of determining which reductions to apply and in which order.
The process of parsing repeatedly matches substrings with the right-hand sides of productions, replacing each matched substring with the corresponding production's left-hand side.
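The sketch below in Python makes this reduction process concrete for the same toy grammar S → AB, A → a, B → b (an illustrative assumption); it naively applies the first reduction it finds, whereas a real bottom-up parser chooses reductions much more carefully:

```python
# Productions written as (left-hand side, right-hand side).
productions = [("S", "AB"), ("A", "a"), ("B", "b")]

def bottom_up(sentence: str) -> bool:
    """Repeatedly replace a substring matching some right-hand side
    with the corresponding left-hand side, until only S remains."""
    while sentence != "S":
        for lhs, rhs in productions:
            if rhs in sentence:
                sentence = sentence.replace(rhs, lhs, 1)
                break
        else:
            return False      # no reduction applies: not a sentence
    return True

assert bottom_up("ab")        # ab -> Ab -> AB -> S
```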
Suppose we have a set of BNF (or equivalent) production rules defining the grammar of a programming language. We can use these grammar rules as a reference while writing programs in that language, to help ensure that what is written is syntactically correct.
The reverse process, going from a program to some data structure representing the structure and details of the program, while also checking that the program is indeed syntactically correct, is unfortunately much harder.
Because the BNF specification lacks the power to define the context-sensitive aspects of the language, we will also need additional rules for, for example, making sure that names are declared and that types match.
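As an illustration of one such context-sensitive rule, the sketch below (a toy check over a simplified statement list; the statement format and names are assumptions) verifies that every variable is declared before it is used, something no context-free production can express:

```python
def check_declared_before_use(statements):
    """Each statement is ('decl', name) or ('use', name).
    Return the names used before being declared."""
    declared, errors = set(), []
    for kind, name in statements:
        if kind == "decl":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors

# 'y' is used before its declaration, which a grammar alone cannot catch.
print(check_declared_before_use(
    [("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "y")]))  # ['y']
```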