Chapter 1. Introduction
Present and future
The quality of code generated by many of today's compilers is astonishingly good, often far better than that produced by a competent assembly/machine code programmer.
The compiler can cope well with the complex interacting features of computer architectures. But there are practical limits.
Today's compilers and language tools can deal with complex programming languages, generating code for a huge range of computer architectures, both real and virtual.
Compilers are not just about generating target code from high-level language programs. Programmers need software tools, probably built on compiler technology, to support the generation of high-quality and reliable software.
High Level Languages
Disadvantages
The program may need to perform some low-level, hardware-specific operations which do not correspond to any high-level language feature.
The use of low-level languages is often justified on the grounds of efficiency, in terms of execution speed or runtime storage requirements.
Advantages
High-level language programs are generally easier to read, understand and hence maintain.
Problem solving is significantly faster. Moving from the problem specification to code is simpler using a high-level language.
High-level language programs can be structured more easily to reflect the structure of the original problem.
High-level languages can offer software portability. This demands some degree of language standardisation.
Compile-time checking can remove many bugs at an early stage, before the program actually runs.
Efficiency
In the early development of language implementations, the issue of efficiency strongly influenced the design of compilers. The key disadvantage of high-level languages was seen as being one of poor efficiency.
Today, compiler-generated code for a wide range of programming languages and target machines is likely to be just as efficient, if not more so, than hand-written code.
Efficiency is usually concerned with the minimization of computation time, but other constraints such as memory usage or power consumption could be more important.
The case for using high-level languages for almost all applications is now very strong. In order to run programs written in high-level languages, we need to consider how they can be implemented.
When developing software, a valuable rule to remember is that there is no need to optimise if the code is already fast enough.
Access to the Hardware
A program running on a computer system needs to have access to its environment. It may input data from the user, it may need to output results, create a file, find the time of day and so on.
This is conventionally done by providing the programmer with a library acting as an interface between the program and the operating system and/or hardware.
To address the stated advantage of low-level languages mentioned earlier, there is nothing to stop the construction of operating system and machine-specific libraries to perform special-purpose tasks such as providing access to a particular storage location or executing a particular machine instruction.
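As a small illustration, here is a minimal sketch in Python (using only standard-library calls; the file name is an arbitrary assumption) of a program touching its environment purely through such library interfaces:

```python
import time

# Each operation below reaches the environment only through a library
# routine; the library, and beneath it the operating system, hides the
# hardware-specific detail from the program.
now = time.ctime()                      # find the time of day
with open("results.txt", "w") as f:     # create a file
    f.write(f"Run started at {now}\n")  # output results
```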
Historical development
Originally, programming was done in machine code; it required considerable skill and was hard work, slow and error-prone.
Assembly languages were developed, relieving the programmer from having to deal with much of the low-level detail, but requiring an assembler, a piece of software to translate from assembly code to machine code.
The development of high-level languages gathered speed in the 1950s and beyond. The importance of formal language specifications was recognized and the correspondence between particular grammar types and straightforward implementation was understood.
The extensive use of high-level languages prompted the rapid development of a wide range of new languages, some designed for particular application areas such as COBOL for business applications and FORTRAN for numerical computation.
Why study compilers?
Studying compiler construction offers computer science students many insights. It gives a practical application area for many fundamental data structures and algorithms, and it allows the construction of a large-scale and inherently modular piece of software.
Indeed, one of the best ways of learning a programming language is to write a compiler for that language, preferably writing it in its own language.
One of the key motivations for studying this technology is that compiler-related algorithms have relevance to application areas outside the field of compilers.
Writing a compiler also gives some insight into the design of target machines, both real and virtual.
Introduction
The idea of a compiler, traditionally translating from the high-level language source program to machine code for some real hardware processor, is well known, but there are other routes for language implementation.
Compilers and Interpreters
Analysis of Programs
Grammars
The term "grammar" has a wide range of definitions and nuances, and it is hardly surprising that we need a tight and formal definition for it when used in the context of programming languages.
A grammar gives rules for deriving strings of characters conforming to the syntactic rules of the grammar. A sentential form is any string that can be derived from S, the starting symbol.
If BNF, EBNF or syntax diagrams are used to specify the production rules, all rules have the particular characteristic that the left-hand sides of productions are always single non-terminal symbols.
This is certainly allowed by the rules defining a grammar, but the restriction gives these grammars certain important features.
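As an illustration (a conventional textbook-style fragment, not drawn from any particular language definition), the BNF rules below define simple expressions, and the derivation beneath them produces a sequence of sentential forms ending in the sentence x + y. Notice that every left-hand side is a single non-terminal symbol, exactly the restriction described above.

```
<expr>   ::= <expr> + <term> | <term>
<term>   ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | x | y

<expr> => <expr> + <term> => <term> + <term> => <factor> + <term>
       => x + <term> => x + <factor> => x + y
```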
Chomsky Hierarchy
In the 1950s, Noam Chomsky produced a hierarchical classification of formal grammars which provides a framework for the definition of programming languages as well as for the analysis of programs written in these languages.
This hierarchy is made up of four levels, as follows:
Type 0, or a free grammar or an unrestricted grammar, contains productions of the form α → β. These grammars are not sufficiently restricted to be of any practical use for programming languages.
Type 1, or a context-sensitive grammar, has productions of the form αAβ → αγβ where α, β, γ ∈ U∗, γ is non-null and A is a single non-terminal symbol. This type of grammar turns out to have significant relevance to programming languages.
Type 2, or a context-free grammar, has productions of the form A → γ where A is a single non-terminal symbol. These productions correspond directly to BNF rules. In general, if BNF can be used in the definition of a language, then that language is no more complex than Chomsky type 2.
Type 3, or a regular grammar or a finite-state grammar, puts further restrictions on the form of the productions. Here, all productions are of the form A → a or A → aB, where A and B are non-terminal symbols and a is a terminal symbol. These grammars turn out to be far too restrictive for defining the syntax of conventional programming languages, but they do have a key place in the specification of the syntax of the basic lexical tokens dealt with by the lexical analysis phase of a compiler.
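For instance, a type 3 grammar for identifiers (a letter followed by letters or digits) can be implemented directly as a finite-state recognizer. The sketch below is in Python; the grammar and the state names are illustrative assumptions rather than part of any particular compiler:

```python
def is_identifier(s: str) -> bool:
    """Recognize the regular language letter (letter | digit)*.

    This is the automaton equivalent of a type 3 grammar such as
        Id   -> letter Rest | letter
        Rest -> letter Rest | digit Rest | letter | digit
    """
    state = "START"
    for ch in s:
        if state == "START":
            state = "IN_ID" if ch.isalpha() else "FAIL"
        elif state == "IN_ID":
            state = "IN_ID" if ch.isalnum() else "FAIL"
        else:
            return False
    return state == "IN_ID"

assert is_identifier("x1") and not is_identifier("1x")
```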
Parsing
The output of the Parser
As well as indicating whether the input to the parser forms a syntactically correct sentence, the parser must also generate a data structure reflecting the syntactic structure of the input.
The parser obtains a stream of tokens (from the lexical analyzer in a conventional compiler) and matches them with the tokens in the production rules.
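A minimal sketch in Python of such a data structure (the node layout is an assumption; practical compilers usually build richer abstract syntax trees):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a parse tree: a grammar symbol plus its children.

    A leaf holds a terminal (token); an internal node holds the
    non-terminal on the left-hand side of the production applied.
    """
    symbol: str
    children: list["Node"] = field(default_factory=list)

# A possible tree for the input "x + y", using the expression grammar above.
tree = Node("expr", [
    Node("expr", [Node("term", [Node("factor", [Node("x")])])]),
    Node("+"),
    Node("term", [Node("factor", [Node("y")])]),
])
```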
Parsing strategies
Top-down parsers start with the starting symbol of the grammar, and hence with the root of the parse tree. The goal is to match the available input with the definition of the starting symbol.
So if the starting symbol S is defined as S → AB, the goal of recognizing an S will be achieved by recognizing an instance of an A followed by an instance of a B.
When the right-hand side of a production being matched with the input contains terminal symbols, these symbols can be matched directly with the input string.
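A minimal recursive-descent sketch in Python of exactly this situation, assuming the toy grammar S → AB, A → a, B → b (the grammar and the simplified token handling are illustrative assumptions):

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, expected):
        # Consume one terminal symbol, or fail with a syntax error.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")

    def parse_S(self):   # S -> A B
        self.parse_A()
        self.parse_B()

    def parse_A(self):   # A -> a
        self.match("a")

    def parse_B(self):   # B -> b
        self.match("b")

Parser(["a", "b"]).parse_S()  # succeeds; ["b", "a"] would raise SyntaxError
```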
The bottom-up parser perhaps reflects a more obvious way of thinking about parsing. Instead of starting with the starting symbol, we start with the input string, choose a substring to be matched with the right-hand side of a production, and replace that substring with the corresponding left-hand side.
The parse tree is then constructed upwards from the leaves, finally reaching the starting symbol at the root. The key problem here is of course one of determining which reductions to apply and in which order.
The process of parsing repeatedly matches substrings with the right-hand sides of productions, replacing each matched substring with the corresponding production's left-hand side.
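The sketch below in Python makes this reduction process concrete for the same toy grammar S → AB, A → a, B → b (an illustrative assumption); it naively applies the first reduction it finds, whereas a real bottom-up parser chooses reductions much more carefully:

```python
# Productions written as (left-hand side, right-hand side).
productions = [("S", "AB"), ("A", "a"), ("B", "b")]

def bottom_up(sentence: str) -> bool:
    """Repeatedly replace a substring matching some right-hand side
    with the corresponding left-hand side, until only S remains."""
    while sentence != "S":
        for lhs, rhs in productions:
            if rhs in sentence:
                sentence = sentence.replace(rhs, lhs, 1)
                break
        else:
            return False      # no reduction applies: not a sentence
    return True

assert bottom_up("ab")        # ab -> Ab -> AB -> S
```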
Suppose we have a set of BNF (or equivalent) production rules defining the grammar of a programming language. We can use these grammar rules as a reference while writing programs in that language, to help ensure that what is written is syntactically correct.
The reverse process, going from a program to some data structure representing the structure and details of the program, while also checking that the program is indeed syntactically correct, is unfortunately much harder.
Because the BNF specification lacks the power to define the context-sensitive aspects of the language, we will also need additional rules for, for example, making sure that names are declared and that types match.
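As an illustration of one such context-sensitive rule, the sketch below (a toy check over a simplified statement list; the statement format and names are assumptions) verifies that every variable is declared before it is used, something no context-free production can express:

```python
def check_declared_before_use(statements):
    """Each statement is ('decl', name) or ('use', name).
    Return the names used before being declared."""
    declared, errors = set(), []
    for kind, name in statements:
        if kind == "decl":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors

# 'y' is used before its declaration, which a grammar alone cannot catch.
print(check_declared_before_use(
    [("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "y")]))  # ['y']
```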