6.8. Protection Against XSS Attacks

 

A combination of protective measures is used to prevent XSS attacks. In theory, the simplest defense is to disable JavaScript in the browser entirely. In practice this is not feasible, because most modern web applications depend on JavaScript to function, and disabling it would make many websites unusable.

 

Escaping and Encoding Special Characters

When the server inserts user-supplied data into a page, it must encode that data for the context in which it appears, so that the browser treats it as data rather than as executable code. Dynamic content can appear in HTML element bodies, HTML attributes, CSS style properties, JavaScript variables, and more. In other words, malicious code can be injected almost anywhere. Below are simple and effective techniques for securing your application.

 

HTML Context Encoding

Imagine the following server-side code:

<div> Search for: $variable </div>

If an attacker submits malicious input, such as <svg onload=alert(1)>, the browser parses the resulting page as follows and executes the injected handler:

<div> Search for: <svg onload="alert(1)"> </svg> </div>

However, if we perform encoding, we will get the following representation:

<div> Search for: &lt;svg onload=&quot;alert(1)&quot;&gt; &lt;/svg&gt; </div>

The browser will display this code as plain text and will not execute it.
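As a minimal sketch, assuming a Python backend (the lesson does not name a server-side language), the standard library's html.escape performs this kind of encoding. The render_search function and its template string are hypothetical names used only for illustration.

```python
import html


def render_search(term: str) -> str:
    """Insert untrusted input into an HTML element body (hypothetical template)."""
    # quote=True also encodes the " and ' characters, not just &, < and >.
    safe_term = html.escape(term, quote=True)
    return f"<div> Search for: {safe_term} </div>"


payload = '<svg onload="alert(1)">'
print(render_search(payload))
# <div> Search for: &lt;svg onload=&quot;alert(1)&quot;&gt; </div>
```

Most template engines apply an equivalent encoding automatically. The point of the sketch is only that the encoding must happen before the value reaches the page.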

Let's consider a simple example. Open the Juice Shop application and try to insert the following code in the product comments:

[Figure: JS exploit in the comments of Juice Shop]

At first glance, everything looks good:

[Figure: Encoded JS exploit in the comments of Juice Shop]

However, if you right-click the comment in the browser's developer tools and select the "Edit as HTML" option from the context menu, you will see that the special characters have been encoded as HTML entities:

&   =>   &amp;
<   =>   &lt;
>   =>   &gt;
"   =>   &quot;
'   =>   &#x27;

 

Encoding content in HTML attributes, JavaScript, and CSS styles

Sometimes, it's necessary to insert dynamic content inside HTML tags as attributes, in JavaScript code, and within CSS styles.

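Always wrap such values in single or double quotes, and also encode them for the context in which they appear. With a quoted value, the attacker must first break out of the quotes. Encoding the quote characters themselves (for example, " as &quot;) prevents that, so the injected code is treated by the browser as plain text and is not executed. Quoting alone is not sufficient: an unencoded quote in the input simply closes the value.

Compare:

<div data-content=$variable> </div>   - unsafe
<div data-content="$variable"> </div> - safe (with $variable HTML-encoded)

<style> selector { property: $variable; } </style>   - unsafe
<style> selector { property: "$variable"; } </style> - safe (with $variable CSS-encoded)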
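The sketch below illustrates quoting combined with context-specific encoding, again assuming a Python backend. The function names render_attribute and js_string_literal are hypothetical. The JavaScript helper relies on json.dumps plus a few extra escapes, which is one common approach rather than the only one. CSS values need their own encoder and are not covered here.

```python
import html
import json


def render_attribute(value: str) -> str:
    """Place untrusted input inside a double-quoted HTML attribute."""
    # Encoding " as &quot; keeps the input from closing the attribute early.
    return f'<div data-content="{html.escape(value, quote=True)}"> </div>'


def js_string_literal(value: str) -> str:
    """Serialize untrusted input as a JavaScript string literal for an inline script."""
    literal = json.dumps(value)  # adds surrounding quotes, escapes " and backslashes
    # Also escape characters that could close the <script> block or start markup.
    for char, escaped in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        literal = literal.replace(char, escaped)
    return literal


attribute_attack = '" onmouseover="alert(1)'
script_attack = "</script><svg onload=alert(1)>"

print(render_attribute(attribute_attack))
print("<script> var term = " + js_string_literal(script_attack) + "; </script>")
```

In both cases the payload stays inside the quoted value: the attribute attack cannot add a new attribute, and the script attack cannot terminate the script element.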

 

The use of Content Security Policy (CSP)

CSP is an effective mechanism for mitigating attacks such as XSS, clickjacking, and the injection of external resources. The principle is straightforward: with each page, the server sends a predefined set of directives that tell the browser what is allowed and what is not.

Directives can be delivered to the client in one of two ways:

Through an HTTP header:

Content-Security-Policy: directive-list

Through a meta tag in the HTML code:

<meta http-equiv="Content-Security-Policy" content="directive-list">
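As a rough illustration of the header-based delivery, here is a minimal WSGI application in Python that attaches the header to its response. The policy value, the app function, and the page body are assumptions made for the example and are not taken from any particular application.

```python
# Example policy only; a real directive list depends on the application.
CSP_POLICY = "script-src 'self'; style-src 'self'"


def app(environ, start_response):
    """Minimal WSGI application that sends a CSP header with its page."""
    body = b"<!doctype html><title>CSP demo</title><p>Hello</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Content-Security-Policy", CSP_POLICY),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the application directly to inspect the headers it would send.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


app({}, fake_start_response)
print(captured["headers"]["Content-Security-Policy"])
```

The same app object can be served locally with wsgiref.simple_server.make_server from the standard library. In production the header is often set by the web framework or by the web server configuration instead.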

Let's provide a few examples to understand how CSP works:

Example 1. Prohibit the loading and execution of all scripts, including inline scripts:

Content-Security-Policy: script-src 'none';

Example 2. Allow only self-hosted scripts and styles, and block scripts and styles from external sources:

Content-Security-Policy: script-src 'self'; style-src 'self';

Example 3. Allow loading and execution of scripts from the site's own origin and from a specific server (e.g., google.com), and block all other scripts:

Content-Security-Policy: script-src 'self' https://google.com;

Example 4. Allow loading frames (iframes) from the site's own origin and from an external site (google.com), and block all others:

Content-Security-Policy: frame-src 'self' https://google.com;

Now, let's consider a more complex example:

Content-Security-Policy: frame-src 'none';
                         img-src *;
                         script-src google.com yandex.ru;
                         style-src 'self';
                         connect-src 'self';

This CSP policy does the following:

frame-src 'none'; - Blocks loading frames (iframes) from any source.

img-src *; - Allows loading images from any source. The asterisk (*) means "any."

script-src google.com yandex.ru; - Permits loading scripts only from the domains google.com and yandex.ru. Scripts from any other source will be blocked, including inline scripts and scripts hosted on the site itself, because 'self' is not listed.

style-src 'self'; - Allows only self-hosted styles to be used on the page; styles from other sources will not be applied.

connect-src 'self'; - Allows script-initiated requests (such as AJAX/fetch and WebSocket connections) only to the site's own origin. All requests to external resources will be blocked.
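Policies with several directives are easier to maintain when they are assembled from structured data rather than typed as one long string. The following Python sketch shows one way to do that. The build_csp helper is hypothetical and is not part of any library.

```python
from typing import Mapping, Sequence


def build_csp(directives: Mapping[str, Sequence[str]]) -> str:
    """Join directive names and their source lists into a CSP header value."""
    return "; ".join(
        name + " " + " ".join(sources) for name, sources in directives.items()
    )


# The same policy as in the complex example above.
policy = {
    "frame-src": ["'none'"],
    "img-src": ["*"],
    "script-src": ["google.com", "yandex.ru"],
    "style-src": ["'self'"],
    "connect-src": ["'self'"],
}

print("Content-Security-Policy: " + build_csp(policy))
```

Keywords such as 'self' and 'none' keep their single quotes inside the source lists, because the browser treats an unquoted self as a host name.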

 

Cookie Settings

In the previous lesson, we saw how easy it is to extract cookies from a victim's browser. To prevent this on the server side, a special attribute called HttpOnly is added to cookies:

Set-Cookie: myCookie=myValue; HttpOnly

When this attribute is set, the browser prevents JavaScript from accessing the cookie. You can easily verify this by opening any website, such as Google, and switching to the Application tab in Chrome DevTools. You will see that some cookies already have this attribute set.

[Figure: Cookies in Chrome DevTools]

If you set HttpOnly for all cookies except one or two and then run document.cookie in the console, only the cookies without the HttpOnly attribute will be displayed:

[Figure: Retrieving cookies with JavaScript]
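On the server side, the Set-Cookie header shown earlier can be produced with Python's standard http.cookies module. This is a minimal sketch that reuses the cookie name and value from the example. How the header is attached to a response depends on the framework in use.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["myCookie"] = "myValue"
# Mark the cookie as HttpOnly so that document.cookie cannot read it.
cookie["myCookie"]["httponly"] = True

# OutputString() returns the header value without the "Set-Cookie:" prefix.
header_value = cookie["myCookie"].OutputString()
print("Set-Cookie: " + header_value)
```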

 

 

List of Allowed Characters - Whitelist

Many applications need to let users submit certain HTML tags. A common mistake is to filter input with a list of forbidden characters or words, known as a blacklist. For instance, the developers might blacklist the <script> and <iframe> tags. However, this restriction is easily bypassed with other tags (such as the <svg onload=alert(1)> payload shown earlier) and with alternative encodings. Moreover, a blacklist cannot cover every unsafe character, word, or combination; something will inevitably be missed. Technologies also keep evolving: new features are added, and attackers find new ways to abuse them.

It's much simpler and more effective to use a list of allowed characters/words, known as a whitelist. For example, suppose the developers decide to allow users to apply italics and bold font in their comments, which can be achieved using the <i> and <b> tags, respectively. You just need to add these tags (without any attributes), alphabet letters, and some punctuation marks to the whitelist. Anything that is not on the whitelist is rejected or stripped out.
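The whitelist idea can be sketched in Python with the standard html.parser module. The WhitelistSanitizer class and the sanitize function are hypothetical names. The sketch keeps only the <i> and <b> tags, drops every attribute, and encodes all remaining text. It does not balance unclosed tags, so a production system would more likely rely on a well-tested sanitization library.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {"i", "b"}  # the tags from the comment-formatting example


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags; everything else becomes text or is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append("<" + tag + ">")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        self.parts.append(html.escape(data))  # text is always HTML-encoded


def sanitize(comment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(comment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(sanitize('<b>Great</b> juice <img src=x onerror=alert(1)> <i onclick="x()">really</i>'))
```

The <img> tag disappears entirely and the onclick attribute is removed from <i>, while the bold and italic formatting survives.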

 

WAF

A Web Application Firewall (WAF) is a firewall that inspects the web traffic passing through it. It is deployed in front of the protected web server as a reverse proxy and can block malicious requests when a threat is detected. The firewall can analyze both incoming and outgoing traffic.

Traffic analysis is based on special patterns (signatures). If a specific pattern is found in a request, that request is blocked. Any WAF can be bypassed to some extent, so the firewall must be configured carefully and individually for each application.

Nevertheless, a WAF can prevent many XSS attacks and create a headache for hackers trying to bypass the protection.
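To make the signature idea concrete, here is a deliberately simplified Python sketch of signature matching on a query string. The three regular expressions and the is_suspicious function are illustrative assumptions. Real WAF rule sets are far larger and more carefully tuned.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical XSS signatures; real WAF rule sets are far more extensive.
SIGNATURES = [
    re.compile(r"<\s*script", re.IGNORECASE),      # opening script tag
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),     # inline event handler, e.g. onload=
    re.compile(r"javascript\s*:", re.IGNORECASE),  # javascript: URL scheme
]


def is_suspicious(query_string: str) -> bool:
    """Return True if the URL-decoded query string matches any signature."""
    decoded = unquote_plus(query_string)  # decodes the query string once
    return any(signature.search(decoded) for signature in SIGNATURES)


print(is_suspicious("q=juice+shop"))                           # False
print(is_suspicious("q=%3Csvg+onload%3Dalert(1)%3E"))          # True
print(is_suspicious("q=%253Cscript%253Ealert(1)%253C/script%253E"))  # False
```

The last request is double-encoded and slips past this naive filter, which illustrates why signatures alone can be bypassed and why a WAF needs per-application tuning.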