1.6. How HTTPS and SSL/TLS Encryption Work

 

Why HTTPS is Needed

A secure connection over HTTP is achieved through the SSL/TLS protocols, which is why HTTP in this mode is referred to as HTTPS (HTTP Secure, or HTTP over SSL). The SSL/TLS protocols comprise a specific set of encryption and hashing algorithms used to create a secure communication tunnel.

Why create a secure connection at all?

Many modern web applications deal with sensitive data such as passwords, credit card numbers, confidential documents, and more. Without protection, it is relatively easy for malicious actors to intercept data or impersonate a website. SSL/TLS protocols therefore implement mechanisms for encrypting traffic and for authenticating the server and, optionally, the client.

Here's what HTTPS provides:

  • Authentication of connection participants - both the server and client can ensure they're exchanging data with genuine participants.
  • Data integrity - any tampering with or alteration of the data in transit, for example by an attacker who intercepts it, is detected by the recipient.
  • Confidentiality of messages - no one can read transmitted data, even if intercepted; it will appear as a meaningless set of characters.

What HTTPS does NOT protect you from:

  • Attacks on the application itself - there are numerous vulnerabilities and attack methods targeting web servers and applications, such as SQL injection, command injection, and more. Protecting against them requires separate measures on both the server and client sides.
  • Anonymity - HTTPS doesn't make your connection anonymous. Your browser, third-party services, and devices can see and track which sites you visited and when. However, they won't see the specific data you and the server exchanged.
  • Visiting malicious sites - fraudulent websites may contain scripts and programs that can harm you. Additionally, popular sites might be impersonated to deceive you into taking certain actions. For instance, consider your favorite online store eshop.com, where you frequently make purchases. A hacker could create a copy of the site and publish it under the name esh0p.com, where the letter "o" is replaced with the digit "0". Not everyone will notice this, so the hacker could send thousands of emails offering a sale and mimicking the site's appearance. Trusting users might click the link and make purchases, giving the hacker their money and, more importantly, their card information. HTTPS cannot protect you from this.

Now, let's explore how HTTPS works. In simple terms, both participants, the server and the client, possess an identical encryption key known only to them. This key enables them to encrypt and decrypt transmitted messages.

However, a question arises. How do communication participants obtain a shared encryption key without risking its exposure? Moreover, the key must change continually during the communication session to enhance security.

A mechanism is required for securely exchanging keys even when the communication channel is eavesdropped on by malicious actors intercepting data. This is where asymmetric encryption and the Diffie-Hellman algorithm come to our aid.

 

Symmetric and Asymmetric Encryption

Symmetric encryption uses a single key for both message encryption and decryption:

Principles of Symmetric encryption
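
To make the "single key" idea concrete, here is a minimal toy sketch in Python. It uses a byte-wise XOR with a random key as long as the message (a one-time pad), which is an illustration chosen for brevity and not the cipher that SSL/TLS actually uses. The function name xor_bytes and the sample message are assumptions made for this example.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the matching key byte (toy one-time pad)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"transfer 100 to Alice"

# The shared key: both sides must already know it, and nobody else may.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # sender encrypts
recovered = xor_bytes(ciphertext, shared_key)  # receiver decrypts with the SAME key
```

The same function and the same key are used in both directions, which is exactly what makes the scheme symmetric. It also shows the weak point: the key itself has to reach the other side somehow.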

Asymmetric encryption employs a pair of keys: the public key and the private key. The public key is known to everyone and is used for encrypting messages. The private key is known only to the message recipient and is used for decrypting messages:

Principles of Asymmetric encryption

The most well-known algorithm for asymmetric encryption is the RSA cipher. It is used in SSL/TLS protocols and VPNs. The diagram below illustrates the exchange of a symmetric key using asymmetric encryption:

Flow of shared encryption key exchange

  1. The client generates a symmetric encryption key.
  2. Then, using the server's public key, it encrypts the generated key and sends it to the server. At this stage, the communication channel is vulnerable to interception by an attacker. However, even if the encrypted data is intercepted, it is difficult to decrypt without the private key. Therefore, to increase the difficulty of decryption, long keys and complex encryption algorithms are used.
  3. The server, upon receiving the message, decrypts it with its private key. From this point onward, both participants possess the same session key.
  4. With the shared key, the client and server establish a secure communication channel.
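
The four steps above can be sketched with "textbook" RSA in a few lines of Python. This is only an illustration under simplifying assumptions: the two primes are small, publicly known Mersenne primes, and no padding is applied, whereas real deployments use much larger random primes and a padding scheme. All variable names are chosen for this example.

```python
import math
import secrets

# Toy server key pair. Real keys use large random primes, not known ones.
p, q = 2**61 - 1, 2**89 - 1
n = p * q                       # public modulus
phi = (p - 1) * (q - 1)
e = 65537                       # public exponent
assert math.gcd(e, phi) == 1    # e must be invertible modulo phi
d = pow(e, -1, phi)             # private exponent (Python 3.8+)

# Step 1: the client generates a symmetric session key.
session_key = secrets.randbits(128)

# Step 2: the client encrypts it with the server's public key (e, n).
encrypted_key = pow(session_key, e, n)

# Step 3: the server decrypts it with its private key (d, n).
decrypted_key = pow(encrypted_key, d, n)

# Step 4: both sides now hold the same session key.
```

An eavesdropper sees only encrypted_key, e, and n. Recovering session_key from those requires d, which never leaves the server.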

Now, the question arises: Why not use asymmetric encryption for two-way communication?

The reason is that this algorithm requires significant computational resources and is much slower than symmetric encryption.

 

Diffie-Hellman Protocol

In the Diffie-Hellman algorithm, the shared secret key is not transmitted over the network at all. Instead, each participant computes the same key independently using special mathematical calculations. A simplified version of the process is shown below:

Principle of key calculation with Diffie Hellman algorithm

 

  1. Initially, special numerical parameters a and b are generated. These parameters are known to both participants and can be openly transmitted over the network.
  2. The client then generates a random numerical sequence x, the so-called private key. This key is not transmitted over the network. Using the private key, the client calculates the public key M and sends it to the server.
  3. The server also generates a private key y and calculates the public key N using the previously received parameters a and b. The key N is then sent to the client.
  4. Now the participants calculate the shared session key. The client uses parameter b, the server's public key N, and its private key x. The server uses parameter b, the client's public key M, and its private key y. As a result of the calculations, both sides obtain the same key K.

If an attacker intercepts a, b, M, and N, they still cannot feasibly compute the shared key, because doing so requires one of the private keys, x or y, and neither is ever sent over the network.
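
The calculation can be sketched in Python with modular exponentiation. The text does not say which role each parameter plays, so this sketch assumes that a is the public base and b is the public prime modulus. The concrete values (a = 3, b = 2^127 - 1) are toy choices for illustration; real deployments use standardized groups or elliptic curves.

```python
import secrets

# Step 1: public parameters, known to everyone (assumed roles: a = base, b = modulus).
a = 3
b = 2**127 - 1   # a known prime, far too small for real use

# Step 2: the client picks a private key x and computes its public key M.
x = secrets.randbelow(b - 2) + 1
M = pow(a, x, b)

# Step 3: the server picks a private key y and computes its public key N.
y = secrets.randbelow(b - 2) + 1
N = pow(a, y, b)

# Step 4: each side combines the other's public key with its own private key.
client_K = pow(N, x, b)   # uses b, N and x
server_K = pow(M, y, b)   # uses b, M and y
```

Both results equal a raised to the power x*y modulo b, so client_K and server_K are the same number even though x and y were never exchanged.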

 

Hash Function and Hashing

A hash function is a mathematical function that takes input data of any size and transforms it into an output sequence of fixed length.

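Hashing is the process of applying a hash function to data to obtain a "fingerprint" or "hash value" for that data. With a good hash function, two different inputs producing the same fingerprint is so unlikely that the value can be treated as practically unique.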

Hashing has two distinctive features:

  • The result always has a fixed length, regardless of the size of the original message.
  • Hashing is a one-way process, meaning that it is practically impossible to retrieve the original message from the hash value.

Hash calculation description
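
Both features are easy to observe with Python's standard hashlib module. The sketch below uses SHA-256; the inputs are arbitrary sample data chosen for this example.

```python
import hashlib

# Inputs of very different sizes produce digests of the same length.
short_digest = hashlib.sha256(b"hi").hexdigest()
long_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()

# A one-character change in the input gives a completely different digest.
digest_1 = hashlib.sha256(b"hello world").hexdigest()
digest_2 = hashlib.sha256(b"hello world!").hexdigest()

print(len(short_digest), len(long_digest))   # both are 64 hex characters
print(digest_1 == digest_2)                  # False
```

There is no hashlib function that goes the other way: the module offers no call that turns a digest back into the original data.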

What is the purpose of a hash?

Hash functions are widely used in computer science for data integrity verification, message identification and authentication, and the protection of passwords and other confidential data. They are also used in network security and cryptography to create digital signatures and protect data from tampering or forgery.

Imagine that we have a message that we are sending to different recipients. During transmission, the message can be intercepted and modified. To detect any changes in the message, the sender passes the data through a hash function and then publishes the hashes so that they are available to everyone.

Upon receipt, the recipients also generate a hash from the received data and compare it to what the sender published. If they match, the message is authentic; if not, it may have been altered or accidentally distorted.

A vivid example of this approach is file repositories. A hash is generated for each file. After downloading a file, you can generate your own hash. If both hashes match, it means the file is not corrupted.
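
A download check of this kind might look as follows in Python. This is a minimal sketch: the function names sha256_of_file and matches_published_hash are invented for this example, and the "download" is simulated with a temporary file.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the locally computed hash with the one the publisher listed."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.lower())


# Simulated download: write sample content to a temporary file.
content = b"example release archive contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

published = hashlib.sha256(content).hexdigest()   # what the site would publish
intact = matches_published_hash(demo_path, published)

with open(demo_path, "ab") as f:                  # simulate corruption in transit
    f.write(b"!")
corrupted = matches_published_hash(demo_path, published)

os.remove(demo_path)
```

Here intact is True and corrupted is False: a single extra byte is enough to change the hash.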

Hashing is also used in HTTPS to verify the integrity of transmitted messages. Another use of hash functions is to speed up data retrieval in databases, since hash values can be compared quickly without comparing the actual data.

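Commonly used hashing algorithms include MD5, SHA-1, and SHA-256, among others. MD5 and SHA-1 are no longer considered collision-resistant, so security-sensitive applications should prefer SHA-256 or a newer algorithm.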

 

Digital Signature

When receiving data over an unprotected channel, the recipient wants to ensure that the data has not been tampered with or forged. For this purpose, the mechanism of a digital signature was devised. The sender adds a digital signature to each message, and the recipient verifies it; if something is amiss, the message is not accepted.

The digital signature mechanism involves two algorithms: hashing and asymmetric encryption. The diagram below illustrates the principle of how a digital signature works:

The flow of digital signature creation

  1. At the first stage, the sender generates the hash of the message before sending it.
  2. Then, using the private key known only to them, the sender encrypts the hash and appends it to the end of the transmitted message. This is the digital signature.
  3. Next, the signed message is sent to the recipient. Since the communication channel is not secure, it is possible that a hacker might tamper with the message. However, they won't be able to generate the signature since they lack the sender's private key.
  4. Upon receiving, the recipient decrypts the signature using the public key, which is known to everyone, and obtains the original hash.
  5. Then, the recipient generates a hash from the received message.
  6. Both hashes are compared, and if they match, the received message is authentic; otherwise, it is discarded.
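
The six steps can be sketched in Python by combining SHA-256 with "textbook" RSA. This is a toy illustration under simplifying assumptions: the key is built from small, publicly known primes, the hash is reduced modulo n to fit the key, and no signature padding is used. The function names are invented for this example.

```python
import hashlib

# Toy sender key pair (real keys use large random primes).
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                             # public exponent, known to everyone
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent, known only to the sender


def message_hash(message: bytes) -> int:
    """Steps 1 and 5: hash the message (reduced mod n to fit the toy key)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_exponent: int) -> int:
    """Step 2: 'encrypt' the hash with the private key."""
    return pow(message_hash(message), private_exponent, n)


def verify(message: bytes, signature: int, public_exponent: int) -> bool:
    """Steps 4 to 6: recover the hash from the signature and compare."""
    return pow(signature, public_exponent, n) == message_hash(message)


message = b"Pay 10 EUR to Bob"
signature = sign(message, d)

genuine_ok = verify(message, signature, e)
tampered_ok = verify(b"Pay 9000 EUR to Mallory", signature, e)
```

Verification succeeds for the original message and fails for the altered one, because the attacker cannot produce a matching signature without d.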

This is how message integrity is verified. It's important to understand that a digital signature doesn't prevent a message from being modified in transit; it allows the recipient to detect the modification and to confirm the authenticity of the message and the sender.

 

Certificate

Every website operating over HTTPS must have a certificate. A certificate serves as a kind of passport for the site, confirming its authenticity. Certificates are issued by specialized organizations known as Certification Authorities (CA), which are trusted by all communication participants (browsers, servers).

An analogy to a website certificate is a citizen's passport, which contains certain information about them and is issued by a specific government agency. By verifying the citizen's passport, we establish their authenticity, trusting the organization that issued the passport.

So, what does a certificate consist of?

A certificate is a small file containing essential information about the site: the site's URL, host name, public key, issuer, and much more, along with a digital signature. The signature is generated by the certification authority, and no one else can generate an identical signature:

Simplified structure of digital certificate

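Typically, certificates are stored in files with extensions like .cer, .p12, or .pfx. The structure of the certificate itself is defined by the X.509 standard, while .p12 and .pfx files are PKCS #12 containers that can bundle a certificate together with its private key.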

When a client accesses a website, it requests its certificate and verifies the following information:

  • Issuer of the certificate. Modern browsers come with pre-installed certificates and contain a list of trusted organizations and certification authorities.
  • Certificate's validity period.
  • Domain name of the site.
  • Digital signature.
  • Revocation status of the certificate. Sometimes, for various reasons, a certification authority might revoke a certificate, rendering it invalid.

If any of these checks fail, the browser will display a message that the certificate is not trusted and will recommend not visiting the site. As mentioned earlier, browsers maintain a list of trusted organizations and certification authorities, along with their public keys. If a certificate is issued by an organization not in the browser's list, the browser will issue a warning that the certificate is not reliable and should not be trusted.
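
A non-browser client can perform similar checks. The sketch below uses Python's standard ssl module, whose default context loads the system's trusted authorities and requires a valid certificate chain, validity period, and matching host name. Revocation checking is not part of these defaults and would need extra configuration. The function name fetch_verified_certificate is an assumption made for this example.

```python
import socket
import ssl

# Default context: trusted CA list loaded, certificate required, host name checked.
context = ssl.create_default_context()


def fetch_verified_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host and return its certificate as a dict.

    Raises ssl.SSLCertVerificationError if any verification step fails.
    """
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.getpeercert()


# Example usage (needs network access):
#   cert = fetch_verified_certificate("example.com")
#   print(cert["issuer"], cert["notAfter"])
```

Where a browser would show a warning page, this client raises an exception and never sends any application data.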

Is it possible to create your own certificate and sign it?

Yes, it is. Such certificates are called self-signed.

What will the browser do if it receives such a certificate?

If the certificate is signed by an organization that the browser doesn't recognize, which includes a certificate you signed yourself, it will display a warning about an unreliable certificate. The certificates of root certification authorities are also self-signed: in them, the authority specifies its name and its own public key and signs the certificate itself. Browsers accept these certificates because root authorities have the highest level of trust and their certificates are already in the browser's pre-installed list.

When visiting any website, you can verify the certificate yourself, especially if the browser is showing a warning. To do this, click on the lock icon in the browser's address bar (demonstrated here for Mozilla Firefox):

Depiction of HTTPS padlock with certificate data

If you click on "Connection secure," you will see information about the certificate:

Issuer of certificate

To view the entire certificate, click on "More information," and then switch to the "Security" tab. This will provide you with detailed information about the certificate, its issuer, validity, and other security-related details:

Details of IMVK site certificate

Here you can see the organization that issued the certificate, the version of the TLS protocol, and the set of encryption algorithms used. If you click on "View certificate," the full information from the certificate will be displayed in a new browser tab:

Example of detailed certificate information

 

Certificate Authority

A Certificate Authority (CA) is an organization that possesses all the technical means to issue and revoke certificates, as well as the highest level of trust. The requirements for such organizations are quite high, so not everyone can become a Certificate Authority.

However, there is a certain hierarchy that allows for the creation of chains of trusted organizations that can issue certificates.

If we return to the previous illustration, you can notice additional tabs:

Tabs with the chain of certificate issuers

This is the chain of trusted organizations, and each child element in the chain inherits trust from its parent elements. In the rightmost tab, you will find information about the certificate of the root Certificate Authority:

Self signed cert of CA - root issuer

Take note of the Subject Name and Issuer Name. They are identical, indicating that this is a self-signed certificate.
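
The same comparison can be made in code. The sketch below works on the dictionary layout that Python's ssl module returns from getpeercert(). The function name looks_self_signed and both sample certificates are hypothetical and exist only for this illustration.

```python
def looks_self_signed(cert: dict) -> bool:
    """Return True when the subject and issuer names are identical.

    Matching names are only an indicator; a real validator also checks that
    the signature verifies against the certificate's own public key.
    """
    subject = cert.get("subject")
    return subject is not None and subject == cert.get("issuer")


# Hypothetical sample data in the layout used by ssl.SSLSocket.getpeercert().
root_name = ((("organizationName", "Example Root CA"),),
             (("commonName", "Example Root CA"),))
site_name = ((("commonName", "www.example.com"),),)

root_cert = {"subject": root_name, "issuer": root_name}
site_cert = {"subject": site_name, "issuer": root_name}

print(looks_self_signed(root_cert))   # True
print(looks_self_signed(site_cert))   # False
```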

 

SSL/TLS - How It Works

Now that you have a basic understanding of cryptography, you can grasp how SSL/TLS protocols work. But first, let's briefly describe SSL and TLS protocols.

SSL (Secure Sockets Layer) uses asymmetric encryption for key exchange and symmetric encryption to establish a secure channel for transmitting user data. The protocol includes an extensive set of encryption and hashing algorithms, although many of them are considered obsolete. There are three versions of the protocol: SSL 1.0 (which was never publicly released), SSL 2.0, and SSL 3.0. All versions of the protocol are considered vulnerable, which led to the development of an enhanced successor called TLS (Transport Layer Security).

TLS has four versions: 1.0, 1.1, 1.2, and 1.3. The protocol initially supported asymmetric (RSA) encryption for session key exchange, but this scheme is not recommended and was removed in version 1.3. The Diffie-Hellman protocol, which is available in all versions, is used instead. As of today, it is recommended to use TLS 1.2 and TLS 1.3, with version 1.3 being the most secure and reliable.
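
A client can enforce this recommendation explicitly. The sketch below uses Python's standard ssl module (Python 3.7 or newer) to refuse anything older than TLS 1.2; the variable name context is an arbitrary choice for this example.

```python
import ssl

context = ssl.create_default_context()

# Refuse SSL 3.0, TLS 1.0 and TLS 1.1; allow TLS 1.2 and newer.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Whether the local TLS library supports TLS 1.3 at all.
print(ssl.HAS_TLSv1_3)
```

With this setting, a handshake with a server that only offers an older protocol version fails instead of silently falling back.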

So, how does SSL/TLS work?

If you look at the TCP/IP stack, the SSL/TLS protocol sits between the application and transport layers:

Layers on which HTTP and HTTPS operate

Any HTTPS connection starts with establishing a TCP connection. Then, an SSL/TLS connection is established, and after that, HTTP starts transmitting data over the encrypted channel. A simplified diagram of how HTTPS works is presented below:

SSL handshake flow


  1. A TCP connection is established.
  2. The client then sends the server a list of supported ciphers and the SSL/TLS protocol version.
  3. The server selects the highest protocol version that both sides support and a secure set of encryption algorithms from the provided list.
  4. The server sends its own certificate to the client.
  5. The browser verifies the certificate and its signature, authenticating the server. If the certificate is not valid, the browser displays a warning to the user.
  6. The server may also request the client to provide its certificate to ensure that the client can communicate with the server and access its resources. This happens when a web application handles sensitive data and needs to restrict access to specific users. To achieve this, browsers are pre-configured with the necessary certificates.
  7. Upon receiving the client's certificate, the server verifies it. If everything is in order, the server continues the connection; otherwise, it terminates the connection.
  8. The browser generates a symmetric session key, encrypts it with the server's public key taken from the certificate, and sends it to the server. (This is the RSA-based key exchange; with Diffie-Hellman, as in TLS 1.3, both sides compute the session key instead of transmitting it.)
  9. The server, using its private key, retrieves the session key. At this point, the establishment of the SSL/TLS connection is completed.
  10. With a shared session key for encryption, both sides can now encrypt and transmit data to each other.
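
From a client program's point of view, the whole sequence collapses into a few calls. The sketch below uses Python's standard socket and ssl modules. The function names build_get_request and https_get_status_line are invented for this example, and the library decides internally which key exchange (RSA-based or Diffie-Hellman) is negotiated.

```python
import socket
import ssl


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def https_get_status_line(host: str, port: int = 443, timeout: float = 5.0):
    """Return ((tls_version, cipher_name), http_status_line) for host."""
    context = ssl.create_default_context()
    # Step 1: establish the TCP connection.
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # Steps 2 to 9: wrap_socket() performs the TLS handshake, including
        # cipher negotiation and verification of the server's certificate.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            negotiated = (tls.version(), tls.cipher()[0])
            # Step 10: HTTP now travels over the encrypted channel.
            tls.sendall(build_get_request(host))
            status_line = tls.recv(4096).split(b"\r\n", 1)[0]
    return negotiated, status_line.decode("latin-1")


# Example usage (needs network access):
#   print(https_get_status_line("example.com"))
```

If certificate verification fails during the handshake, wrap_socket() raises an exception and the HTTP request is never sent.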