1.5. Session Management

Session management in web technologies refers to the process of maintaining a user's state across their interactions with a website. When a user visits a website, the server typically assigns them a unique session identifier. This identifier is used to track the user's activity on the site, preserve their settings and form data, and enforce security checks such as access control.

In modern web applications, identifying and tracking users is a crucial task. For instance, consider an online store where a user adds items to their cart. Information about these items is stored in the user's session until they complete the purchase. If the session identifier is kept in a persistent cookie, the user can close the browser, return to the site later, and still find the added items in their cart.

However, HTTP is a stateless protocol: each request is processed independently, and the protocol itself retains no memory of the user or the device between requests. To work around this, several methods have been used to identify users and sessions, including:

  • HTTP Headers
    • From
    • User-Agent
    • Referer
    • Client-ip
    • X-Forwarded-For
  • Fat URLs
  • User Authentication
  • Session Identifiers/Tokens in Cookie Headers and Hidden Forms

Now, let's discuss each method in detail.

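The headers listed above were used to identify users: From carried the user's email address, User-Agent the browser type and operating system, Referer the page the user came from, and Client-ip and X-Forwarded-For the client's IP address. However, this information was unreliable and not unique: many users can share one IP address behind a proxy or NAT, browsers rarely send From at all, and any client can forge these values. Nowadays, these headers are no longer used for session management.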
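To see why this approach breaks down, here is a minimal sketch of a header-based "user ID". The function name naive_user_id and the idea of hashing the header values together are illustrative assumptions, not a historical standard; the IP addresses come from the reserved documentation range.

```python
# Hypothetical sketch: derive a pseudo-identifier from identifying request headers.
import hashlib


def naive_user_id(headers: dict[str, str]) -> str:
    """Combine identifying headers into a short pseudo-identifier (illustrative only)."""
    # X-Forwarded-For may hold a comma-separated proxy chain; take the first hop.
    ip = headers.get("X-Forwarded-For", headers.get("Client-ip", "")).split(",")[0].strip()
    parts = [
        ip,
        headers.get("From", ""),        # the user's email; browsers rarely send it
        headers.get("User-Agent", ""),  # browser type and operating system
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# Two different people behind the same office NAT, using the same browser build.
alice = {"X-Forwarded-For": "203.0.113.7", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
bob = {"X-Forwarded-For": "203.0.113.7", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
print(naive_user_id(alice) == naive_user_id(bob))  # True: the "unique" ID collides

# Nothing stops a client from simply copying someone else's header values.
mallory = dict(alice)
print(naive_user_id(mallory) == naive_user_id(alice))  # True: trivially spoofed
```

Both checks print True, which illustrates the two weaknesses described above: the values are not unique, and they are fully under the client's control.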

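Fat URLs contain a unique identifier that the server generates for each user and appends to every link it serves, for example: https://example.com/documents/002-1145265-8016838. Because every request carries the identifier, the server can track and remember all of a user's actions. The drawbacks are significant, however: the identifier leaks whenever a URL is shared, bookmarked, logged, or sent in a Referer header; the session is lost if the user navigates to a URL without it; and the per-user URLs defeat caching. The technique is still used occasionally but has become much less common.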
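A fat-URL scheme can be sketched in a few lines. This is a minimal sketch, assuming an identifier shaped like the example URL above (three digit groups joined by hyphens) appended as the last path segment; the helper names new_session_id, fatten, and extract_session_id are hypothetical.

```python
# Hypothetical sketch of fat-URL session tracking.
import re
import secrets
from typing import Optional

# Matches an ID like 002-1145265-8016838 at the end of the URL path.
FAT_ID = re.compile(r"/(\d{3}-\d{7}-\d{7})$")


def new_session_id() -> str:
    """Generate a random 17-digit ID in the 3-7-7 layout of the example URL."""
    digits = "".join(str(secrets.randbelow(10)) for _ in range(17))
    return f"{digits[:3]}-{digits[3:10]}-{digits[10:]}"


def fatten(url: str, session_id: str) -> str:
    """Append the session ID to a link before the server sends it to the user."""
    return f"{url.rstrip('/')}/{session_id}"


def extract_session_id(url: str) -> Optional[str]:
    """Recover the session ID from an incoming request URL, if present."""
    match = FAT_ID.search(url)
    return match.group(1) if match else None


sid = new_session_id()
link = fatten("https://example.com/documents", sid)
print(link)                              # a unique per-user URL
print(extract_session_id(link) == sid)   # True
print(extract_session_id("https://example.com/documents"))  # None: a plain link loses the session
```

The last line shows one of the drawbacks in practice: any link that was not "fattened" by the server (for example, one typed by hand) carries no identifier, and anyone who receives a fattened link inherits the session.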

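User authentication is another way to manage sessions. HTTP has built-in authentication mechanisms based on the WWW-Authenticate and Authorization headers: when a client requests a protected resource without credentials, the server responds with 401 Unauthorized and a WWW-Authenticate header, which makes the browser display a login and password dialog. The browser then repeats the request with the credentials in the Authorization header. The working principle is illustrated below: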

HTTP 401 - Authorization on the Web

Because HTTP is stateless, the credentials must accompany every subsequent request to a protected resource. To avoid prompting the user each time, browsers remember the login and password and add them to later requests automatically. Even so, the approach is inconvenient (the login dialog cannot be customized, and there is no standard way to log out), so it is rarely used in modern web applications.
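The challenge/response exchange can be sketched for the Basic scheme, the simplest of the built-in mechanisms, in which the Authorization header carries "username:password" in Base64. This is a minimal sketch: the USERS table, the realm name, and the function names are hypothetical, and a real server would store password hashes and accept Basic credentials only over HTTPS.

```python
# Hypothetical sketch of the HTTP Basic authentication exchange.
import base64
import hmac

USERS = {"alice": "wonderland"}  # hypothetical user table (plaintext only for illustration)
REALM = "example"                # hypothetical realm name


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization value the browser sends after the login dialog."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_request(headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status, response headers) for a request to a protected resource."""
    challenge = (401, {"WWW-Authenticate": f'Basic realm="{REALM}"'})
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return challenge  # no credentials yet: the browser shows the login window
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return challenge
    username, sep, password = decoded.partition(":")
    expected = USERS.get(username)
    if not sep or expected is None:
        return challenge
    # Constant-time comparison on bytes avoids leaking timing information.
    if not hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8")):
        return challenge
    return 200, {}


print(check_request({}))  # (401, {'WWW-Authenticate': 'Basic realm="example"'})
print(check_request({"Authorization": basic_auth_header("alice", "wonderland")}))  # (200, {})
```

Since the server keeps no state between requests, the browser has to attach the same Authorization header to every later request; note also that Base64 is an encoding, not encryption.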

The most common and well-established method is session management through Set-Cookie and Cookie headers.

The principle is quite simple. The server generates a unique identifier, or token, and passes it to the client in a Set-Cookie response header. The client (the browser) stores the token and sends it back in a Cookie header with every subsequent request to the site. On receiving the token, the server looks it up in its session store, for example a database, and thereby "recognizes" the user:

Session management with Cookies
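The cookie-based flow can be sketched in Python, reusing the shopping-cart example from the beginning of this section. This is a minimal sketch, assuming an in-memory dictionary (SESSIONS) in place of the server's session database and a cookie named sid; both names and the helper functions are hypothetical. The token comes from secrets.token_urlsafe so that it cannot be guessed.

```python
# Hypothetical sketch of cookie-based session management.
import secrets
from typing import Optional

SESSIONS: dict[str, dict] = {}  # stand-in for the server's session database


def start_session(user: str) -> str:
    """Create a session and return the Set-Cookie header value for the response."""
    token = secrets.token_urlsafe(32)  # unpredictable, URL-safe identifier
    SESSIONS[token] = {"user": user, "cart": []}
    return f"sid={token}; Path=/; Secure; HttpOnly"


def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a Cookie request header ("a=1; b=2") into name/value pairs."""
    pairs: dict[str, str] = {}
    for item in header.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs


def lookup_session(cookie_header: str) -> Optional[dict]:
    """Find the session that matches the sid cookie, if any."""
    token = parse_cookie_header(cookie_header).get("sid")
    return SESSIONS.get(token) if token else None


set_cookie = start_session("alice")         # server -> browser: Set-Cookie
cookie_pair = set_cookie.split(";")[0]      # the browser keeps "sid=<token>"
session = lookup_session(f"theme=dark; {cookie_pair}")  # browser -> server: Cookie
session["cart"].append("book")
print(lookup_session(cookie_pair)["cart"])  # ['book']
print(lookup_session("sid=forged"))         # None
```

Keeping only a random token in the cookie and the actual data on the server means the client never sees or edits the cart directly; it can only present a token that the server either recognizes or rejects.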

Cookies can be session (temporary) or persistent. Session cookies have no expiration date and are kept only until the browser session ends, usually when the browser is closed, after which they are deleted (some browsers restore them when they restore a previous session). Persistent cookies have an expiration date and survive closing the browser and even restarting the system until that date passes. Persistent cookies are what let a site recognize a returning user, so that the user does not have to re-enter their login and password on every visit.
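Whether a cookie is temporary or persistent is decided entirely by the Set-Cookie header that creates it. The sketch below classifies a header value accordingly; it assumes the RFC 6265 rule that a cookie is persistent when it carries an Expires attribute or the Max-Age attribute that RFC defines for the same purpose. The function name cookie_lifetime is hypothetical.

```python
# Hypothetical sketch: is a Set-Cookie header creating a session or persistent cookie?


def cookie_lifetime(set_cookie: str) -> str:
    """Return "persistent" or "session" for a Set-Cookie header value."""
    # The first segment is name=value; the remaining segments are attributes.
    for attr in set_cookie.split(";")[1:]:
        name = attr.strip().partition("=")[0].lower()
        if name in ("expires", "max-age"):
            return "persistent"  # an expiration is set, so it outlives the browser session
    return "session"  # no expiration: deleted when the browser session ends


print(cookie_lifetime("sid=abc123; Path=/; HttpOnly"))  # session
print(cookie_lifetime("remember=xyz; Expires=Wed, 21 Oct 2026 07:28:00 GMT; Path=/"))  # persistent
```

One caveat: an Expires date in the past (or a Max-Age of zero or less) tells the browser to delete the cookie immediately, which is how servers typically clear a login cookie; the simple check above does not look at the date itself.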

The structure of Set-Cookie and Cookie headers is depicted in the diagram below:

Structure of Cookie header

As you may have noticed, cookies have special attributes. Let's briefly describe what they mean:

  • Name and Value - the name and value of the cookie.
  • Expires - the date and time after which the browser deletes the cookie.
  • Domain - the domain (including its subdomains) to which the browser will send the cookie.
  • Path - the URL path prefix that a request must match for the cookie to be sent.
  • Secure - the cookie is sent only over the secure HTTPS protocol.
  • HttpOnly - the cookie is sent only in HTTP headers and cannot be read by JavaScript (for example, through document.cookie).
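To see how these attributes appear on the wire, here is a small helper that assembles a Set-Cookie header value from them. It is a sketch only: build_set_cookie and its parameters are hypothetical names, and web frameworks provide their own equivalents. It relies on email.utils.format_datetime from the Python standard library to produce the GMT date format that Expires uses.

```python
# Hypothetical sketch: assemble a Set-Cookie header value from its attributes.
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def build_set_cookie(
    name: str,
    value: str,
    expires: Optional[datetime] = None,
    domain: Optional[str] = None,
    path: Optional[str] = None,
    secure: bool = False,
    httponly: bool = False,
) -> str:
    parts = [f"{name}={value}"]
    if expires is not None:
        # HTTP dates are always written in GMT, e.g. "Wed, 21 Oct 2026 07:28:00 GMT".
        parts.append("Expires=" + format_datetime(expires.astimezone(timezone.utc), usegmt=True))
    if domain:
        parts.append(f"Domain={domain}")
    if path:
        parts.append(f"Path={path}")
    if secure:
        parts.append("Secure")    # flag attribute: send over HTTPS only
    if httponly:
        parts.append("HttpOnly")  # flag attribute: hide from JavaScript
    return "; ".join(parts)


header = build_set_cookie(
    "sid", "abc123",
    expires=datetime(2026, 10, 21, 7, 28, tzinfo=timezone.utc),
    domain="example.com", path="/", secure=True, httponly=True,
)
print(header)
# sid=abc123; Expires=Wed, 21 Oct 2026 07:28:00 GMT; Domain=example.com; Path=/; Secure; HttpOnly
```

Secure and HttpOnly are flags with no value, while the other attributes are name=value pairs; omitting Expires yields a session cookie.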

We will return to a more detailed examination of these attributes when we study certain types of attacks and ways to protect against them.

Additionally, special tokens are carried in hidden HTML form fields. Here, standard HTTP headers are not used: the server embeds the token in the HTML code of the response body, and the browser sends it back with the form data when the form is submitted. The token is hidden from the user's view but not secret, since it is visible in the page source. In most cases, such tokens are used to protect against CSRF attacks.
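A hidden-field token can be sketched as follows, using what is commonly called the synchronizer token pattern. This is a minimal sketch under stated assumptions: the session is a plain dictionary, and the field name csrf_token, the /transfer action, and the helper names are hypothetical.

```python
# Hypothetical sketch of a hidden-form CSRF token (synchronizer token pattern).
import hmac
import html
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a per-session token and remember it on the server side."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def render_form(session: dict) -> str:
    """Embed the token in the HTML response body as a hidden input field."""
    token = html.escape(issue_csrf_token(session), quote=True)
    return (
        '<form method="post" action="/transfer">\n'
        f'  <input type="hidden" name="csrf_token" value="{token}">\n'
        '  <input type="text" name="amount">\n'
        '  <button type="submit">Send</button>\n'
        "</form>"
    )


def is_valid_submission(session: dict, form_data: dict[str, str]) -> bool:
    """Accept the POST only if the submitted token matches the session's token."""
    expected = session.get("csrf_token")
    submitted = form_data.get("csrf_token", "")
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


session: dict = {}
page = render_form(session)  # sent to the browser inside the HTML body
token = session["csrf_token"]
print(is_valid_submission(session, {"csrf_token": token, "amount": "10"}))  # True
print(is_valid_submission(session, {"amount": "10"}))  # False
```

A forged request from another site can make the browser send the victim's cookies, but it cannot read the victim's page to learn the hidden token, so its submission fails the check.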