What (actually) happens when you type a web address in your browser and hit enter…
Of course, the simplest answer to the question in the title of this article is: “you get to see a web site”. But in this case let’s go deeper and find out the details of everything that happens between the moment you type your web site address (in this article, for the sake of example, we will be using https://holbertonschool.com), hit the “Enter” key, and finally get to see the web site of your choice.
www.holber… what?
Holberton School is a software learning institution based in San Francisco, and I am writing this blog post as a requirement of their curriculum. Here's a screenshot of the main page from Microsoft Edge:
Web browsers such as Edge, Chrome or Firefox take a text string (because at the end of the day www.holbertonschool.com or any other web site name is a text string) and direct you to the web site that corresponds to that string.
Now, in order to match the text string it receives with a corresponding web site (with the web site's IP address in fact, more about that later), the browser parses the string and divides it into smaller chunks that it can identify and relate to something else. In the case of the complete address https://www.holbertonschool.com, the breakdown goes like this:
- https: the PROTOCOL, specifically HTTPS (HyperText Transfer Protocol Secure). This is the data transfer method used between the client and the server.
- www.holbertonschool.com: the DOMAIN name that maps to the web site's specific IP address.
- The PORT is the specific port on the server to which the request will be sent. As our example does not mention a specific port, HTTPS uses its default one, which is port 443.
- The PATH AND FILE NAME of the file we are requesting from the server. As this is also left blank in the example, we are querying the server at its root.
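To make the breakdown above more concrete, here is a minimal sketch that uses `urlsplit` from Python's standard library to split an address into the same four components. The function name `break_down` and the `DEFAULT_PORTS` table are my own illustrative choices; a real browser does considerably more than this.

```python
from urllib.parse import urlsplit

# Default ports for the two web protocols; anything else is out of scope here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def break_down(url):
    """Split a URL into protocol, domain, port and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        # parts.port is None when the URL does not name a port explicitly.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An empty path means we are asking for the root of the server.
        "path": parts.path or "/",
    }


print(break_down("https://www.holbertonschool.com"))
```

For our example address this reports the `https` protocol, the `www.holbertonschool.com` domain, port 443 and the root path `/`.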
We could, of course, give the browser the exact IP (Internet Protocol) address of the server we want to access, but human brains are notoriously bad at remembering long combinations of numbers, while we can easily remember text strings as long as they match actual words from human languages.
DNS (Domain Name System) is what maps the text strings we use as domain names to the specific IP addresses of the servers behind them.
So, again, when we type www.holbertonschool.com and press Enter, we are telling our browser that we want to access the file located at the root directory of the server that hosts www.holbertonschool.com.
Moreover, we are also instructing the browser to contact the server using the HTTPS protocol through port 443.
A remarkable feature of browsers is that they have a cache memory that allows them to “remember” information they have previously processed. This cache stores recently requested domain names, along with the matching IP address for each one.
If the browser already has the domain name stored in its own cache, it brings up the IP address with no further steps. If it does not, the browser searches for it in the operating system's cache. If the domain name is not there either, then something called the DNS resolution process must start.
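The lookup order described above can be sketched in a few lines. This is only an illustration: the two caches are plain dictionaries, `fake_resolver` stands in for the real DNS resolution process, and 192.0.2.10 is a reserved documentation address, not the real address of the site.

```python
def lookup(domain, browser_cache, os_cache, resolve):
    """Return (ip, source), trying the cheapest source first."""
    if domain in browser_cache:
        return browser_cache[domain], "browser cache"
    if domain in os_cache:
        ip, source = os_cache[domain], "OS cache"
    else:
        # Neither cache knows the domain, so full DNS resolution is needed.
        ip, source = resolve(domain), "DNS resolution"
        os_cache[domain] = ip
    # Remember the answer so the next lookup stops at the first step.
    browser_cache[domain] = ip
    return ip, source


def fake_resolver(domain):
    """Hypothetical stand-in for a real DNS resolver."""
    return "192.0.2.10"


browser_cache, os_cache = {}, {}
print(lookup("www.holbertonschool.com", browser_cache, os_cache, fake_resolver))
print(lookup("www.holbertonschool.com", browser_cache, os_cache, fake_resolver))
```

The first call has to fall through to the resolver, while the second one is answered straight from the browser cache.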
DNS resolution process.
Having failed to find the domain name in both its own cache and the operating system’s cache, the browser sends the domain name off to the nearest resolver server, which most of the time is run by your ISP (Internet Service Provider), to be resolved into its IP address through the Domain Name System.
Now, DNS is a complex process in itself. If you'd like to learn more about it, I highly recommend the extremely effective yet funny explanation on this link.
The resolver contacts a root server, then the top-level domain server (.com, in our case) and finally the domain's authoritative name server before successfully matching the domain name to its corresponding IP address. At the end of the process, the browser finally knows the specific IP address that corresponds to our web site www.holbertonschool.com.
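A resolver typically works its way down a chain: a root server points it to the top-level domain server, which points it to the name server that is authoritative for the domain, which finally returns the IP address. The sketch below imitates that chain with in-memory dictionaries. All the server names and the 192.0.2.10 address are hypothetical placeholders, and no real DNS queries are made.

```python
# Hypothetical stand-ins for the real servers involved in a lookup.
ROOT_SERVER = {"com": "tld-server-for-com"}
TLD_SERVERS = {
    "tld-server-for-com": {"holbertonschool.com": "ns-for-holbertonschool"},
}
AUTHORITATIVE_SERVERS = {
    "ns-for-holbertonschool": {"www.holbertonschool.com": "192.0.2.10"},
}


def resolve(domain):
    """Follow the referrals from the root down to the final answer."""
    labels = domain.split(".")
    tld = labels[-1]               # "com"
    zone = ".".join(labels[-2:])   # "holbertonschool.com"
    # Step 1: the root server says who is responsible for ".com".
    tld_server = ROOT_SERVER[tld]
    # Step 2: the TLD server says which name server knows the domain.
    name_server = TLD_SERVERS[tld_server][zone]
    # Step 3: the authoritative name server returns the IP address.
    return AUTHORITATIVE_SERVERS[name_server][domain]


print(resolve("www.holbertonschool.com"))
```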
TCP/IP.
Having figured out the IP address of the web site, the browser starts communicating with the server. This communication will most likely be based on TCP/IP, which stands for Transmission Control Protocol/Internet Protocol, as this is the standard for applications where instant delivery (such as streaming) IS NOT required. This is because TCP retransmits lost data so that everything arrives complete and in order, even if that takes longer than with faster protocols such as UDP (User Datagram Protocol).
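To get a feel for what a TCP connection looks like in code, here is a small self-contained sketch that opens a connection on the local machine only (127.0.0.1), sends a few bytes and reads them back. The helper names `echo_once` and `tcp_round_trip` are made up for this example, and it assumes Python 3.8 or newer for `socket.create_server`.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever arrives."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def tcp_round_trip(message):
    """Send a short message over TCP on the loopback interface."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server_sock,))
        worker.start()
        # create_connection sets up the TCP connection before any data flows.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(tcp_round_trip(b"hello over TCP"))
```

A browser does the same kind of thing, except that the other end is the web server's IP address and port 443 instead of a helper thread on the same machine.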
SSL
Now that the browser has established communication with the server's IP address, the first thing it does is send a message containing the Transport Layer Security (TLS) versions it supports. TLS is an encryption protocol used to protect the privacy and integrity of the transferred data. While TLS is the current standard, the original and traditionally used method was SSL, which stands for Secure Sockets Layer. The name SSL is still widely used informally, even though TLS is what now provides the “S” (Secure) in the HTTPS acronym.
After receiving the TLS version from the browser, the server chooses its preferred TLS version and cipher suite and responds with a security certificate that includes the server's public key. In the classic RSA key exchange, the browser then uses this public key to encrypt a pre-master secret that is sent back to the server.
When the server receives the encrypted pre-master secret, it decrypts it using its own private key. Only the holder of that private key can perform this decryption, so the browser can trust that it is talking to the server named in the certificate. Both sides derive the same session keys from the pre-master secret, and the browser and the server have now set up a trusted connection.
This security process is known as the TLS handshake. Browsers show a lock icon in the address bar as an indication that a secure connection has been established.
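In Python, the whole handshake is hidden behind the standard `ssl` module. The sketch below shows roughly how a client would set it up. The function names are my own, and requiring at least TLS 1.2 is an assumption of this example rather than something every browser does. Calling `negotiated_tls_version` needs Internet access, so it is only defined here, not run.

```python
import socket
import ssl


def make_client_context():
    """Build a client-side TLS context with certificate checks turned on."""
    context = ssl.create_default_context()
    # Assumption of this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host, port=443):
    """Connect to host, run the TLS handshake and report the agreed version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the handshake and checks that the
        # certificate really belongs to server_hostname.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


context = make_client_context()
print(context.verify_mode, context.check_hostname)
```

With a working connection, `negotiated_tls_version("www.holbertonschool.com")` would return the version string that both sides agreed on during the handshake.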
HTTPS.
Breaking down the acronym, the first four letters, HTTP, stand for HyperText Transfer Protocol, which is the standard protocol for communication on the Web. HTTP defines how clients (such as browsers) and servers exchange requests and responses.
Once the TLS handshake is complete, the browser sends an HTTP request message to the server. This message must follow a pre-defined format:
The first line defines what kind of request message we are talking about. In this case it is a GET message, which asks the server to send web content back to the client.
In the header section the browser can specify details of the request, such as whether the connection to the server should be closed immediately or kept open, or which cookies it holds for the site (cookies are pieces of information that can remain stored in the browser even after the session is terminated).
The request body is optional, and is normally empty for GET requests.
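The three parts described above (first line, headers, optional body) can be put together by hand. Here is a minimal sketch of what the request for our example might look like as text. The function `build_get_request` and the particular headers it includes are illustrative choices; a real browser sends many more headers.

```python
def build_get_request(host, path="/", keep_alive=True):
    """Assemble the text of a simple HTTP/1.1 GET request."""
    lines = [
        # First line: the kind of request, the path and the HTTP version.
        f"GET {path} HTTP/1.1",
        # Header section: details about the request.
        f"Host: {host}",
        f"Connection: {'keep-alive' if keep_alive else 'close'}",
        "Accept: text/html",
    ]
    # Every line ends with CRLF and an empty line closes the headers.
    # A GET request has no body, so nothing follows the empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("www.holbertonschool.com"))
```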
LOAD BALANCER.
Web sites, in particular those with heavy traffic, divide their workload among more than one physical server. With over half of the world's population accessing the Internet, and with some huge web sites holding a semi-monopoly over certain Internet functions, it is clear that the load must be balanced between multiple servers.
A load balancer is intermediary software which, well… balances the incoming requests: it directs them to the servers according to a load balancing algorithm, chosen by the designers of the back end so that their servers respond as efficiently as possible. It can be installed on the same server that hosts the requested web content or on a server of its own.
There are many load balancing algorithms. One of the most commonly used is round-robin, which sends requests to the servers in turn, cycling through them in a fixed order.
In the real world, web sites are configured with multiple load balancers that are able to back each other up in the event of one of them failing. This is done to avoid what is known as a single point of failure: with only one load balancer, the entire system would collapse if it failed.
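Round-robin is simple enough to sketch in a few lines. The class below is only an illustration of the idea, with made-up server names; production load balancers also track which servers are healthy, among many other things.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers one after another, starting over at the end."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)


# Hypothetical server names, used only for this example.
balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
print([balancer.pick() for _ in range(5)])
```

Five requests in a row go to web-01, web-02, web-03 and then back to web-01 and web-02.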
FIREWALL.
Before the GET request is finally received by the server, the message goes through one last security check: the firewall.
Firewalls are hardware, software, or a combination of both that filter all traffic coming into and out of a server. TLS is effective at preventing data from being intercepted mid-transmission, yet it assumes that received data is coming from a trusted source. Firewalls make no such assumptions, and use a combination of packet filters, application gateways, circuit-level gateways and proxy servers to check that a packet does not contain viruses or other malicious software.
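Of the techniques listed above, a packet filter is the easiest one to illustrate. The sketch below checks each incoming packet against two simple rules. The `Packet` class, the allowed ports and the blocked address range (a reserved documentation range) are all assumptions made for this example and do not describe any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    source_ip: str
    destination_port: int


# Hypothetical rules: only web traffic is let in, and one address range is banned.
ALLOWED_PORTS = {80, 443}
BLOCKED_NETWORKS = [ip_network("203.0.113.0/24")]


def is_allowed(packet):
    """Return True if the packet passes every rule."""
    source = ip_address(packet.source_ip)
    if any(source in network for network in BLOCKED_NETWORKS):
        return False
    return packet.destination_port in ALLOWED_PORTS


print(is_allowed(Packet("198.51.100.7", 443)))
print(is_allowed(Packet("203.0.113.9", 443)))
```

The first packet is let through, while the second one is dropped because it comes from the blocked range.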
HOST SERVER.
The final destination of the GET request. Host servers run a web stack consisting of multiple parts. Let's break them down:
Operating system: the operating system on which the host server runs. According to this article, at this moment (April 2021) 96.3% of the top 1 million web servers in the world run on some distribution of Linux.
HTTP Web Server: This is the software that handles HTTP request/response messages and ultimately delivers the static web page. The most popular and widely used are Apache and Nginx.
The database server: This is the database software, typically SQL-based, that stores information such as user accounts. A typical website will be configured with multiple database servers, with one configured as a “primary” database having exclusive write privileges, whose changes are copied out to “replica” databases having only read privileges. This setup is referred to as a “primary-replica” model. MySQL is one of the most commonly used database systems.
Application Server: Application servers are network computers that store and run an application for client computers. Whatever their function, they occupy a large chunk of computing territory between database servers and the end user. Most broadly, this is called “middleware”, which tells us something about what application servers do: first and foremost, they connect database information (usually coming from a database server) with our browser. PHP and Python are two high-level languages commonly used on application servers to generate dynamic content.
It is extremely common for a typical setup to use popular open source solutions to build its web stack. The combination of the most widely used open source solutions for each component (Linux, Apache, MySQL and PHP/Python) has given its name to what is known as the LAMP model.
Recap:
- A GET request is received by the web server. The web server pulls up the file configured at the given location (in our example, the HTML file configured at the root (/) of the machine).
- If the file contains dynamic content, the application server is run (in a LAMP model the corresponding Python or PHP scripts are run). The result of these scripts is inserted into the web page.
- If the dynamic content involves stored data, the scripts query it from the database server.
- The web server delivers the web page.
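The recap above can be condensed into a toy example. In this sketch a dictionary plays the part of the static files, a small function plays the application server, and an in-memory SQLite database stands in for the MySQL database server (SQLite is used here only because it ships with Python). Every name, path and row below is invented for illustration.

```python
import sqlite3

# Hypothetical static content served directly by the "web server".
STATIC_FILES = {"/": "<h1>Welcome</h1>"}


def make_database():
    """Create a tiny in-memory database with two made-up users."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT)")
    db.executemany("INSERT INTO users VALUES (?)", [("ada",), ("linus",)])
    return db


def user_count_page(db):
    """'Application server' step: build a page from stored data."""
    (count,) = db.execute("SELECT COUNT(*) FROM users").fetchone()
    return f"<p>{count} registered users</p>"


DYNAMIC_ROUTES = {"/users": user_count_page}


def handle_get(path, db):
    """Return (status code, page) for a GET request."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path in DYNAMIC_ROUTES:
        return 200, DYNAMIC_ROUTES[path](db)
    return 404, "<h1>Not Found</h1>"


db = make_database()
print(handle_get("/", db))
print(handle_get("/users", db))
```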
After pulling up the HTML file configured at the root of www.holbertonschool.com, the host server sends it back to the web browser in an HTTP response message formatted as follows:
“200 OK” in the first line is the status code of the response (a successful request in this case). Other common status codes include 301 (page redirection) and 404 (page not found).
In the response header, the host server states information about the delivered page such as its type (HTML, in our case) and size.
Finally, in the response message body, the host server delivers the actual, entire HTML code itself. The browser now uses its HTML and CSS engines to parse the code, break it down into its Document Object Model, and render the page. Any JavaScript scripts included in the file are run. Finally the browser displays the Holberton School website home page.
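As a closing illustration, here is a minimal sketch of how the three parts of a response message (status line, headers and body) can be pulled apart. The sample response in `RAW` is made up for this example and is much shorter than what a real server sends, and `parse_response` is an illustrative helper, not what browsers actually run.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # An empty line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# A made-up response, only for illustration.
RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)

code, reason, headers, body = parse_response(RAW)
print(code, reason, headers["content-type"], body)
```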