Browsers: Populating the page, User request: Enter a web address, clicks…

- - - - DNS lookup
        
        TCP Handshake
        
        TLS Negotiation
        
        For secure connections established over HTTPS, another "handshake" is required. This handshake, or rather the TLS negotiation, determines which cipher will be used to encrypt the communication, verifies the server, and establishes that a secure connection is in place before beginning the actual transfer of data. A full TLS 1.2 handshake adds two more round trips to the server before the request for content is actually sent; TLS 1.3 reduces this to one.
        
        While making the connection secure adds time to the page load, a secure connection is worth the latency expense, as the data transmitted between the browser and the web server cannot be decrypted by a third party.
        Only after the DNS lookup, the TCP handshake, and the TLS negotiation have all completed is the browser finally able to make the request.
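
The cost of connection setup can be estimated with a little arithmetic. The sketch below is only a rough model: it assumes one round trip for the DNS lookup, one for the TCP handshake, and two (TLS 1.2) or one (TLS 1.3) for the TLS negotiation, with no caching, connection reuse, or session resumption. The function name and the table of round-trip counts are illustrative assumptions.

```python
# Round trips needed before the first HTTP request can be sent.
# Assumption: cold start (no cached DNS answer, no reused connection,
# no TLS session resumption) and the same round-trip time for every step.
SETUP_ROUND_TRIPS = {
    "dns": 1,
    "tcp": 1,
    "tls1.2": 2,
    "tls1.3": 1,
}


def setup_latency_ms(rtt_ms: float, tls_version: str = "tls1.3") -> float:
    """Estimate the connection setup time for one new HTTPS origin."""
    trips = (
        SETUP_ROUND_TRIPS["dns"]
        + SETUP_ROUND_TRIPS["tcp"]
        + SETUP_ROUND_TRIPS[tls_version]
    )
    return trips * rtt_ms
```

With a hypothetical 100 ms round-trip time, `setup_latency_ms(100, "tls1.2")` gives 400 ms of waiting before any content is requested.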
        
        Once the IP address is known, the browser sets up a connection to the server via a TCP three-way handshake. This mechanism is designed so that two entities attempting to communicate—in this case the browser and web server—can negotiate the parameters of the network TCP socket connection before transmitting data, often over HTTPS.
        
        https://developer.mozilla.org/en-US/docs/Glossary/TCP_handshake
        
        TCP's three-way handshake is often referred to as SYN, SYN-ACK, ACK, because TCP transmits three messages to negotiate and start a session between two computers.
        This means three more messages back and forth between the client and the server, and the request has yet to be made.
        The connection can be terminated independently by each side of the connection via a four-way handshake.
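
The SYN, SYN-ACK, ACK exchange can be sketched as three segments whose sequence and acknowledgement numbers depend on each other. This is a toy model, not a network implementation: the `Segment` class, the function name, and the initial sequence numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged to open a TCP connection."""
    # 1. Client proposes its initial sequence number.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server acknowledges it (client_isn + 1) and proposes its own.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. Client acknowledges the server's sequence number.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]
```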
        
        The first step of navigating to a web page is finding where the assets for that page are located. If you navigate to https://example.com, the HTML page is located on the server with IP address of 93.184.216.34. If you've never visited this site, a DNS lookup must happen.
        
        Your browser requests a DNS lookup, which is eventually fielded by a name server, which in turn responds with an IP address. After this initial request, the IP will likely be cached for a time, which speeds up subsequent requests by retrieving the IP address from the cache instead of contacting a name server again.
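
The caching behaviour described above can be sketched as a small lookup table with an expiry time. This is a minimal illustration, not how any particular browser or operating system implements its resolver cache: the `DnsCache` class, the injected `resolve` function, and the 300-second default lifetime are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Cache hostname -> IP answers for a fixed time to live."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve      # hypothetical function that asks a name server
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.lookups = 0             # how many times the name server was contacted

    def lookup(self, hostname: str) -> str:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]         # still fresh: no network round trip
        self.lookups += 1
        ip = self._resolve(hostname)
        self._entries[hostname] = (ip, now + self._ttl)
        return ip
```

Passing the resolver and clock in as parameters keeps the sketch testable without touching the network.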
        
        DNS lookups usually only need to be done once per hostname for a page load. However, DNS lookups must be done for each unique hostname the requested page references.
        
        If your fonts, images, scripts, ads, and metrics all have different hostnames, a DNS lookup will have to be made for each one.
        
        This can be problematic for performance, particularly on mobile networks. When a user is on a mobile network, each DNS lookup has to go from the phone to the cell tower to reach an authoritative DNS server. The distance between a phone, a cell tower, and the name server can add significant latency.
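
To see how many DNS lookups a page may trigger, it is enough to count the distinct hostnames among its resources. The sketch below does that with the standard library; the function name and the URLs in the usage note are made up for illustration.

```python
from urllib.parse import urlsplit


def hostnames_needing_lookup(resource_urls):
    """Return each distinct hostname once, in order of first appearance."""
    seen = []
    for url in resource_urls:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen
```

For a page that loads `https://example.com/index.html`, `https://fonts.example.net/a.woff2`, and `https://example.com/app.js`, this returns two hostnames, so at most two lookups are needed on a cold cache.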
    - - Parsing
        
        Once the browser receives the first chunk of data, it can begin parsing the information received.
        Parsing is the step the browser takes to turn the data it receives over the network into the DOM and CSSOM, which is used by the renderer to paint a page to the screen.
        The DOM is the internal representation of the markup for the browser. The DOM is also exposed, and can be manipulated through various APIs in JavaScript.
        Even if the requested page's HTML is larger than the initial 14KB of data, the browser will begin parsing and attempting to render an experience based on the data it has.
        This is why it's important for web performance optimization to include everything the browser needs to start rendering a page, or at least a template of the page (the CSS and HTML needed for the first render), in the first 14 kilobytes.
        But before anything is rendered to the screen, the HTML, CSS, and JavaScript have to be parsed.
        
        Building the DOM tree
        
        Preload scanner
        
        Building the CSSOM
        
        JavaScript Compilation
        
        The second step in the critical rendering path is processing CSS and building the CSSOM tree.
        The CSS object model is similar to the DOM. The DOM and CSSOM are both trees. They are independent data structures.
        The browser converts the CSS rules into a map of styles it can understand and work with.
        The browser goes through each rule set in the CSS, creating a tree of nodes with parent, child, and sibling relationships based on the CSS selectors.
        
        While the browser builds the DOM tree, this process occupies the main thread.
        As this happens, the preload scanner will parse through the content available and request high priority resources like CSS, JavaScript, and web fonts.
        Thanks to the preload scanner, we don't have to wait until the parser finds a reference to an external resource to request it.
        It will retrieve resources in the background so that by the time the main HTML parser reaches the requested assets, they may already be in flight or have been downloaded.
        The optimizations the preload scanner provides reduce blockages.
        
        <link rel="stylesheet" href="styles.css"/>
        <script src="myscript.js" async></script>
        <img src="myimage.jpg" alt="image description"/>
        <script src="anotherscript.js" async></script>
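
What the preload scanner does can be approximated by a single pass over the markup that only collects URLs worth fetching. The sketch below uses Python's `html.parser` as a stand-in for the browser's scanner; the `PreloadScanner` class, the `scan` helper, and the choice of which tags count as fetchable are assumptions, not a description of any browser's real implementation.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collect external resources without building a document tree."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(("style", attributes["href"]))
        elif tag == "script" and attributes.get("src"):
            self.resources.append(("script", attributes["src"]))
        elif tag == "img" and attributes.get("src"):
            self.resources.append(("image", attributes["src"]))


def scan(markup):
    """Return (kind, url) pairs in document order."""
    scanner = PreloadScanner()
    scanner.feed(markup)
    return scanner.resources
```

A real scanner would hand these URLs to the network stack with a priority; here they are simply returned as a list.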
        
        We describe five steps in the critical rendering path. The Critical Rendering Path is the sequence of steps the browser goes through to convert the HTML, CSS, and JavaScript into pixels on the screen.
        DOM
        CSSOM
        Render tree
        Layout
        Paint
        
        https://developer.mozilla.org/en-US/docs/Web/Performance/Critical_rendering_path
        
        The first step is processing the HTML markup and building the DOM tree.
        HTML parsing involves tokenization and tree construction.
        HTML tokens include start and end tags, as well as attribute names and values.
        If the document is well-formed, parsing it is straightforward and faster.
        The parser parses tokenized input into the document, building up the document tree.
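
Tree construction can be illustrated with a stack-like "current node" pointer: a start tag adds a child and descends into it, an end tag climbs back up. The sketch below lets Python's `html.parser` do the tokenizing and is far simpler than the tree construction algorithm in the HTML specification; the `Node`, `TreeBuilder`, and `outline` names and the short list of void elements are assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements that never have children (partial list, for the sketch only).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node          # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():                 # skip whitespace-only text
            self.current.children.append(Node("#text", self.current))


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```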
        
        https://developer.mozilla.org/en-US/docs/Glossary/DOM
        
        https://developer.mozilla.org/en-US/docs/Glossary/CSSOM
        
        Render
        
        Interactivity
        
        Once the main thread is done painting the page, you would think we would be "all set."
        That isn't necessarily the case.
        If the load includes JavaScript that was correctly deferred and only executed after the onload event fires, the main thread might be busy and not available for scrolling, touch, and other interactions.
        
        Time to Interactive (TTI) is the measurement of how long it took from that first request, which led to the DNS lookup and TLS connection, to when the page is interactive: the point in time after the First Contentful Paint when the page responds to user interactions within 50ms.
        If the main thread is occupied parsing, compiling, and executing JavaScript, it is not available and therefore not able to respond to user interactions in a timely (less than 50ms) fashion.
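
The effect of a busy main thread on responsiveness can be modelled very simply: an input that arrives while a task is running has to wait until that task ends. The sketch below is an illustrative model only, not a TTI measurement tool; the function names and the task list format are assumptions, and the 50 ms budget is the figure quoted above.

```python
RESPONSE_BUDGET_MS = 50  # responsiveness threshold quoted in the text


def input_delay_ms(tasks, input_time_ms):
    """Return how long an input waits for the main thread.

    tasks: list of (start_ms, duration_ms) main-thread tasks, assumed
    not to overlap.
    """
    for start_ms, duration_ms in tasks:
        end_ms = start_ms + duration_ms
        if start_ms <= input_time_ms < end_ms:
            return end_ms - input_time_ms   # blocked until the task finishes
    return 0                                # main thread was idle


def is_responsive(tasks, input_time_ms):
    """True when the input is handled within the budget."""
    return input_delay_ms(tasks, input_time_ms) <= RESPONSE_BUDGET_MS
```

With a hypothetical 400 ms script task starting at 100 ms, a tap at 150 ms waits 350 ms and misses the budget, while a tap at 50 ms is handled immediately.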
        
        In our example, maybe the image loaded quickly, but perhaps the anotherscript.js file was 2MB and our user's network connection was slow.
        In this case the user would see the page super quickly, but wouldn't be able to scroll without jank until the script was downloaded, parsed and executed.
        That is not a good user experience. Avoid occupying the main thread, as demonstrated in this WebPageTest example:
        
        1 more item...
        
        https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event
        
        Rendering steps include style, layout, paint and, in some cases, compositing.
        The CSSOM and DOM trees created in the parsing step are combined into a render tree which is then used to compute the layout of every visible element, which is then painted to the screen.
        In some cases, content can be promoted to its own layers and composited, improving performance by painting portions of the screen on the GPU instead of the CPU, freeing up the main thread.
        
        Style
        
        The third step in the critical rendering path is combining the DOM and CSSOM into a render tree.
        Construction of the computed style tree, or render tree, starts at the root of the DOM tree and traverses each visible node.
        
        Tags that aren't going to be displayed, like the <head> and its children and any nodes with display: none, such as the script { display: none; } you will find in user agent stylesheets, are not included in the render tree as they will not appear in the rendered output.
        Nodes with visibility: hidden applied are included in the render tree, as they do take up space. As we have not given any directives to override the user agent default, the script node in our code example above will not be included in the render tree.
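
The inclusion rules above translate into a short recursive filter: a node with `display: none` is dropped together with its subtree, while a node with `visibility: hidden` stays. In this sketch the `DomNode` class, the `UA_DISPLAY_NONE` set (a simplified stand-in for a user agent stylesheet), and the default of `block` for everything else are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DomNode:
    tag: str
    style: Dict[str, str] = field(default_factory=dict)
    children: List["DomNode"] = field(default_factory=list)


# Elements a user agent stylesheet typically hides (simplified assumption).
UA_DISPLAY_NONE = {"head", "script", "style", "meta", "link", "title"}


def build_render_tree(node: DomNode) -> Optional[DomNode]:
    """Copy the visible part of a DOM subtree, or return None if it is not rendered."""
    default = "none" if node.tag in UA_DISPLAY_NONE else "block"
    if node.style.get("display", default) == "none":
        return None  # the node and all of its descendants are skipped
    rendered = []
    for child in node.children:
        rendered_child = build_render_tree(child)
        if rendered_child is not None:
            rendered.append(rendered_child)
    # visibility: hidden nodes are kept because they still take up space.
    return DomNode(node.tag, dict(node.style), rendered)
```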
        
        Layout
        
        The fourth step in the critical rendering path is running layout on the render tree to compute the geometry of each node.
        Layout is the process by which the width, height, and location of all the nodes in the render tree are determined, giving the size and position of each object on the page.
        Reflow is any subsequent size and position determination of any part of the page or the entire document.
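
As a minimal picture of what layout computes, the sketch below stacks block-level boxes vertically inside a viewport and records each box's position and size. Real layout handles inline content, floats, flexbox, and much more; the function name and the dictionary shape are assumptions for illustration.

```python
def layout_blocks(heights, viewport_width, x=0, y=0):
    """Stack full-width blocks top to bottom and return their geometry."""
    boxes = []
    for height in heights:
        boxes.append({"x": x, "y": y, "width": viewport_width, "height": height})
        y += height  # the next block starts where this one ends
    return boxes
```

Changing the first height and calling the function again moves every later box, which is the kind of recomputation the text calls reflow.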
        
        Paint
        
      - Once we have an established connection to a web server, the browser sends an initial HTTP GET request on behalf of the user, which for websites is most often for an HTML file.
        Once the server receives the request, it will reply with relevant response headers and the contents of the HTML.
        
        The response to this initial request contains the first byte of data received.
        Time to First Byte (TTFB) is the time between when the user made the request—say by clicking on a link—and the receipt of this first packet of HTML.
        The first chunk of content is usually 14KB of data.
        
        TCP Slow Start / 14KB rule
        
        The first burst of response data will be about 14KB.
        The 14 KB magic number comes from the packet size: each TCP packet can be up to 1500 bytes, but 40 of those bytes are taken by the IP and TCP headers, leaving 1460 bytes for actual data. An initial window of 10 such packets means you can deliver 14,600 bytes, or about 14 KB (14.26 KB when a kilobyte is counted as 1024 bytes).
        This is part of TCP slow start, an algorithm which balances the speed of a network connection.
        Slow start gradually increases the amount of data transmitted until the network's maximum bandwidth can be determined.
        
        In TCP slow start, once the initial burst has been acknowledged, the server doubles the amount of data it sends in the next round trip to around 28KB.
        The amount sent per round trip keeps increasing until a predetermined threshold is reached, or congestion is experienced.
        
        If you've ever heard of the 14KB rule for initial page load, TCP slow start is the reason why the initial response is 14KB, and why web performance optimization calls for focusing optimizations with this initial 14KB response in mind.
        TCP slow start gradually builds up transmission speeds appropriate for the network's capabilities to avoid congestion.
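
The arithmetic behind the 14 KB figure, and the doubling that follows, can be written out directly. The sketch assumes a 1500-byte packet with 40 bytes of IPv4 and TCP headers (no options) and an initial window of 10 packets; the constant names, the function names, and the default threshold of 64 packets are illustrative assumptions.

```python
MTU_BYTES = 1500
HEADER_BYTES = 40              # 20 bytes IPv4 + 20 bytes TCP, no options
INITIAL_WINDOW_SEGMENTS = 10


def payload_per_segment():
    """Bytes of application data carried by one full-size packet."""
    return MTU_BYTES - HEADER_BYTES


def slow_start_schedule(round_trips, ssthresh_segments=64):
    """Bytes that can be delivered in each successive round trip.

    The window doubles every round trip until it reaches the threshold;
    packet loss is not modelled.
    """
    window = INITIAL_WINDOW_SEGMENTS
    schedule = []
    for _ in range(round_trips):
        schedule.append(window * payload_per_segment())
        window = min(window * 2, ssthresh_segments)
    return schedule
```

`slow_start_schedule(3)` yields 14,600 bytes, then 29,200, then 58,400, which is why content beyond the first 14 KB costs at least one extra round trip.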
        
        Congestion control
        
        As the server sends data in TCP packets, the user's client confirms delivery by returning acknowledgements, or ACKs.
        The connection has a limited capacity depending on hardware and network conditions.
        If the server sends too many packets too quickly, they will be dropped, meaning there will be no acknowledgement.
        The server registers this as missing ACKs.
        Congestion control algorithms use this flow of sent packets and ACKs to determine a send rate.
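
One classic way to turn ACKs and missing ACKs into a send rate is additive increase, multiplicative decrease (AIMD): grow the window a little for every acknowledged round, and cut it sharply on loss. The text does not say which algorithm a given server uses, so treat this as one illustrative scheme; the function names and the parameter values are assumptions.

```python
def aimd(window, acked, increase=1.0, decrease=0.5, minimum=1.0):
    """Return the next congestion window, measured in packets."""
    if acked:
        return window + increase             # additive increase on success
    return max(minimum, window * decrease)   # multiplicative decrease on loss


def simulate(events, start=10.0):
    """Apply AIMD to a sequence of rounds (True = ACKed, False = loss)."""
    window = start
    history = []
    for acked in events:
        window = aimd(window, acked)
        history.append(window)
    return history
```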
  - - - Web performance is what we have to do to make the page load happen as quickly as possible.
        
        https://developer.mozilla.org/en-US/docs/Web/Performance
    - - For the most part, browsers are considered single-threaded. The developer's goal is to ensure performant site interactions, from smooth scrolling to being responsive to touch.
        
        Render time is key: the main thread must be able to complete all the work we throw at it and still always be available to handle user interactions.
        
        Web performance can be improved by understanding the single-threaded nature of the browser and minimizing the main thread's responsibilities, where possible and appropriate, to ensure rendering is smooth and responses to interactions are immediate.
- - - - Response:
        Once we have an established connection to a web server, the browser sends an initial HTTP GET request on behalf of the user, which for websites is most often for an HTML file.
        Once the server receives the request, it will reply with relevant response headers and the contents of the HTML.
        The response to this initial request contains the first byte of data received.
        Time to First Byte (TTFB) is the time between when the user made the request, say by clicking on a link, and the receipt of this first packet of HTML.
        The first chunk of content is usually 14KB of data, a consequence of TCP slow start.
        The 14 KB magic number comes from the packet size: each TCP packet can be up to 1500 bytes, but 40 of those bytes are taken by the IP and TCP headers, leaving 1460 bytes for actual data. An initial window of 10 such packets means you can deliver 14,600 bytes, or about 14 KB (14.26 KB when a kilobyte is counted as 1024 bytes).
        Another point to note is that HTML is actually read by most browsers as a stream of bytes and so you don't need to download the whole of HTML before the browser starts to process it. Basically browsers are pretty impatient too – because of those impatient users – and so will start looking at HTML as soon as it starts arriving. It may see references to CSS, JavaScript and other resources and start fetching those so it can render the page as quickly as possible. Therefore, even if you can't send the whole page in the first 14 KB (though please do if you can!), having as much critical data in that first 14 KB will allow the browser to start working on that data earlier.
        While 14 KB may not seem a lot in these days of multi-megabyte pages, remember that you are not trying to fit your entire page into that 14 KB limit (though again do if you can!), but only trying to optimise what the browser sees in that first chunk of data. Delivering all your critical resources in the first 14 KB therefore gives you the best chance of maximising the browser's first read and should lead to a faster page, hence why web performance experts have been giving that advice.
        In an ideal world, that will even be enough to start rendering if you inline critical CSS – which I don't actually like btw! – but even if you can't get it as far as a one round-trip render, getting the browser to start downloading all the required resources as quickly as possible will also help.
        However, that advice may not be entirely accurate, and personally I think fixating on that magic 14 KB number isn't actually that helpful. It makes several assumptions that aren't really realistic on the web of today, if they ever were.
        
        https://www.tunetheweb.com/blog/critical-resources-and-the-first-14kb/
        
        Congestion control
        
        https://en.wikipedia.org/wiki/TCP_congestion_control
        
        A request for a web page or app starts with an HTML request. The server returns the HTML - response headers and data.
        The browser then begins parsing the HTML, converting the received bytes to the DOM tree.
        The browser initiates requests every time it finds links to external resources, be they stylesheets, scripts, or embedded image references.
        Some requests are blocking, which means the parsing of the rest of the HTML is halted until the imported asset is handled.
        Render-blocking resources are static files such as CSS and synchronously loaded JavaScript; fonts can additionally delay the rendering of text.
        The browser continues to parse the HTML making requests and building the DOM, until it gets to the end, at which point it constructs the CSS object model.
        With the DOM and CSSOM complete, the browser builds the render tree, computing the styles for all the visible content.
        After the render tree is complete, layout occurs, defining the location and size of all the render tree elements.
        Once complete, the page is rendered, or 'painted' on the screen.
        
        Not a context free grammar
        As we have seen in the parsing introduction, grammar syntax can be defined formally using formats like BNF.
        Unfortunately, none of the conventional parser topics apply to HTML.
        HTML cannot easily be defined by the context free grammar that parsers need.
        There is a formal format for defining HTML - DTD (Document Type Definition) - but it is not a context free grammar.
        This appears strange at first sight; HTML is rather close to XML. There are lots of available XML parsers. There is an XML variation of HTML - XHTML - so what's the big difference?
        The difference is that the HTML approach is more "forgiving": it lets you omit certain tags (which are then added implicitly), or sometimes omit start or end tags, and so on. On the whole it's a "soft" syntax, as opposed to XML's stiff and demanding syntax.
        This seemingly small detail makes a world of difference. On one hand, this is the main reason why HTML is so popular: it forgives your mistakes and makes life easy for the web author. On the other hand, it makes it difficult to write a formal grammar. To summarize, HTML cannot be parsed easily by conventional parsers, since its grammar is not context free, and it cannot be parsed by XML parsers.
        
        HTML DTD
        The HTML definition is in a DTD format. This format is used to define languages of the SGML family. The format contains definitions for all allowed elements, their attributes and hierarchy. As we saw earlier, the HTML DTD doesn't form a context free grammar.
        There are a few variations of the DTD. The strict mode conforms solely to the specifications but other modes contain support for markup used by browsers in the past. The purpose is backwards compatibility with older content. The current strict DTD is here: www.w3.org/TR/html4/strict.dtd
        
        The parsing algorithm
        As we saw in the previous sections, HTML cannot be parsed using the regular top down or bottom up parsers.
        The reasons are:
        1- The forgiving nature of the language.
        2- The fact that browsers have traditional error tolerance to support well known cases of invalid HTML.
        3- The parsing process is reentrant. For other languages, the source doesn't change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.
        Unable to use the regular parsing techniques, browsers create custom parsers for parsing HTML.
        The parsing algorithm is described in detail by the HTML5 specification. The algorithm consists of two stages: tokenization and tree construction.
        Tokenization is the lexical analysis, parsing the input into tokens. Among HTML tokens are start tags, end tags, attribute names and attribute values.
        The tokenizer recognizes the token, gives it to the tree constructor, and consumes the next character for recognizing the next token, and so on until the end of the input.
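
The tokenization stage can be sketched as a loop that alternates between reading a tag and reading text. This is a heavily simplified illustration and not the state machine defined in the HTML specification: it ignores attributes, comments, and quoted `>` characters, and the `tokenize` function and token names are assumptions.

```python
def tokenize(markup):
    """Split markup into ("start_tag" | "end_tag" | "text", value) tokens."""
    tokens = []
    i = 0
    while i < len(markup):
        if markup[i] == "<":
            close = markup.find(">", i)
            if close == -1:
                # Unterminated tag: treat the remainder as text.
                tokens.append(("text", markup[i:]))
                break
            body = markup[i + 1:close].strip()
            if body.startswith("/"):
                tokens.append(("end_tag", body[1:].strip().lower()))
            else:
                parts = body.rstrip("/").split()
                tokens.append(("start_tag", parts[0].lower() if parts else ""))
            i = close + 1
        else:
            next_tag = markup.find("<", i)
            if next_tag == -1:
                next_tag = len(markup)
            tokens.append(("text", markup[i:next_tag]))
            i = next_tag
    return tokens
```

Each token would then be handed to the tree constructor before the next one is read.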
        
        HTML PARSING
        
        Render tree construction
        
        Style
        
        Interactivity
        