Prevent cross-site scripting attacks by encoding HTML responses


Cross-site scripting attacks

In a cross-site scripting (XSS) attack, the attacker injects malicious code into a legitimate web page so that the page delivers malicious client-side script. When a user visits the infected page, the script is downloaded to, and runs in, the user's browser. There are many variations on this scheme; for example, the malicious script could access browser cookies, session tokens, or other sensitive information retained by the browser. However, all XSS attacks follow the pattern shown in Figure 1.

Figure 1. Typical XSS attack
[Figure: four boxes, starting with the hacker and ending with the web server, showing the path of an attack]

XSS vulnerabilities

In a typical XSS attack, the attacker finds a way to insert a string into a server's web page. Suppose the attacker injects the following string into the web page: <script>alert("attacked")</script>. Every time an end user visits this page, the browser downloads this script and runs it as part of rendering the page. In this case, the script runs and the user sees an alert pop up that says "attacked."
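To see how such an injected string reaches the browser, consider a minimal sketch of a server-side method that builds HTML by concatenating user input. The class and method names (VulnerablePage, renderGreeting) are hypothetical; the point is only that the input is copied into the markup unchanged, so the browser parses it as a live script element.

```java
// Hypothetical example of a vulnerable page fragment: user input is
// concatenated into HTML without any encoding.
class VulnerablePage {

    // Builds an HTML fragment around a user-supplied name (no encoding!).
    static String renderGreeting(String name) {
        return "<p>Hello, " + name + "</p>";
    }

    public static void main(String[] args) {
        String injected = "<script>alert(\"attacked\")</script>";
        // The script tag ends up in the page as real markup.
        System.out.println(renderGreeting(injected));
    }
}
```

Running it prints <p>Hello, <script>alert("attacked")</script></p>, which a browser would treat as a paragraph containing an executable script.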

Impact of XSS

When attackers successfully exploit XSS vulnerabilities in a web application, they can insert script that gives them access to end users' account credentials. Attackers can perform a variety of malicious activities, such as:

  • Hijack an account
  • Spread web worms
  • Access browser history and clipboard contents
  • Control the browser remotely
  • Scan and exploit intranet appliances and applications

Preventing XSS attacks

To help prevent XSS attacks, an application needs to ensure that all variable output in a page is encoded before it is returned to the end user. Encoding variable output replaces HTML markup characters with alternate representations called entities. The browser displays the entities but does not run them. For example, <script> is converted to &lt;script&gt;.
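A minimal sketch of this substitution in Java follows, assuming that only the five characters with special meaning in HTML text and attribute values (&, <, >, " and ') need replacing. The class name HtmlEncoder is hypothetical. The single quote is written as the numeric reference &#39; because the named entity &apos; is not defined in HTML 4.

```java
// Hypothetical minimal HTML encoder: replaces the characters that HTML
// treats as markup with entities.
class HtmlEncoder {

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // so a literal & cannot start an entity
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);                // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("<script>")); // prints &lt;script&gt;
    }
}
```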

Table 1 shows the entity name for some common HTML characters.

Table 1. Entity names for HTML characters
Result | Description | Entity name | Entity number
(space) | Non-breaking space | &nbsp; | &#160;
< | Less than | &lt; | &#60;
> | Greater than | &gt; | &#62;
® | Registered trademark | &reg; | &#174;
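The entity number in Table 1 is the character's decimal code point wrapped in &# and ;. The following sketch (class and method names are hypothetical) derives the Entity number column from the characters themselves; it assumes characters in the Basic Multilingual Plane, where a single Java char holds the whole code point.

```java
// Hypothetical helper: builds a numeric character reference from a character.
class EntityNumbers {

    // Valid for characters in the Basic Multilingual Plane (one char each).
    static String numericReference(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        char[] chars = {'\u00A0', '<', '>', '\u00AE'}; // nbsp, <, >, registered trademark
        for (char c : chars) {
            System.out.println(numericReference(c)); // &#160; &#60; &#62; &#174;
        }
    }
}
```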

When a web browser encounters the entities, it converts them back to the characters they represent and displays them, but it does not run them. For example, if an attacker injects <script>alert("you are attacked")</script> into a variable field of a server's web page, a server that uses this strategy returns &lt;script&gt;alert("you are attacked")&lt;/script&gt;.

When the web browser downloads the encoded script, it converts the entities back to <script>alert("you are attacked")</script> and displays that text as part of the web page, but it does not run the script.
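To illustrate the difference between what is sent and what is shown, the following sketch mimics, in a very simplified way, how entities in text content are turned back into displayed characters. It is not how a browser is implemented: the class name is hypothetical and only four entities are handled. Because the markup arrived as entities, the browser treats it as text, so the recovered string is shown rather than parsed as a <script> element.

```java
// Simplified, hypothetical model of turning entities in text content back
// into the characters a user sees. Only a few entities are handled.
class DisplayedText {

    static String decode(String encoded) {
        return encoded
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
    }

    public static void main(String[] args) {
        String sent = "&lt;script&gt;alert(\"you are attacked\")&lt;/script&gt;";
        // Prints the script as plain text: <script>alert("you are attacked")</script>
        System.out.println(decode(sent));
    }
}
```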

Adding HTML encoding to a server-side Java application

To ensure that malicious scripting code is not output as part of a page, your application needs to encode all variable strings before they are displayed on a page. Encoding converts each character that has a special meaning in HTML to its HTML entity name, as shown in the Java code example in Listing 1.

Listing 1. Convert characters to HTML entity name
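import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

public class EscapeUtils {

	// Maps a character's decimal value to its HTML entity name.
	public static final Map<Integer, String> m = new HashMap<>();
	static {
		m.put(34, "&quot;"); // " - double quotation mark
		m.put(60, "&lt;");   // < - less-than
		m.put(62, "&gt;");   // > - greater-than
		// Map all other HTML entities you need to their decimal values here.
		// See Table 2 for the decimal values of some common entities.
	}

	public static String escapeHtml(String str) {
		try {
			// Start with 1.5 times the input length to leave room for entity names.
			StringWriter writer = new StringWriter((int) (str.length() * 1.5));
			escape(writer, str);
			return writer.toString();
		} catch (IOException ioe) {
			// Required by escape()'s signature; StringWriter does not actually throw it.
			return null;
		}
	}

	public static void escape(Writer writer, String str) throws IOException {
		int len = str.length();
		for (int i = 0; i < len; i++) {
			char c = str.charAt(i);
			int code = (int) c;
			String entityName = m.get(code);
			if (entityName == null) {
				if (c > 0x7F) {
					// Non-ASCII character without a mapped name: write a numeric reference.
					writer.write("&#" + code + ";");
				} else {
					// Unmapped ASCII character: write it unchanged.
					writer.write(c);
				}
			} else {
				writer.write(entityName);
			}
		}
	}

	public static void main(String[] args) {
		String str = "<script>alert(\"abc\")</script>";
		System.out.println("encoded string is " + escapeHtml(str));
	}
}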

In Listing 1, the input to encode is the string <script>alert("abc")</script>. The code performs the following steps.

  1. Create a HashMap of HTML entities keyed by their decimal values. The example shows only three entries; you need to map all the HTML entities your pages require. Table 2 shows some of the commonly used entities and their decimal values. For a complete reference of all character entities, see the HTML Entities Reference in Resources.
  2. Create a StringWriter whose initial buffer size is 1.5 times the length of the input string, leaving room for the longer entity names.
  3. Pass both this writer and the input string to the escape method, which reads the characters of the string one by one and gets the integer value of each.
  4. Look up each integer value in the map you created in step 1. If the map contains an entity name, write that name to the writer; otherwise, write the character itself (or, for a non-ASCII character, a numeric character reference).

    Every character that has an entry in the map is converted into its entity name; other ASCII characters are written unchanged.

The output is &lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;.

Table 2 maps the HTML entities to their decimal values.

Table 2. Decimal values of HTML entities
Decimal value | Entity name | Description
160 | &nbsp; | Non-breaking space
60 | &lt; | Less than
62 | &gt; | Greater than
174 | &reg; | Registered trademark
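The following is a sketch of how the map from step 1 might be populated from Table 2, keyed by decimal value, together with a lookup loop that uses it. Besides the Table 2 entries, it assumes two more: 34 (&quot;, as in Listing 1) and 38 (&amp;), which Table 2 does not list but which an encoder also needs so that a literal ampersand in the input cannot begin an entity. The class and field names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity table keyed by decimal value, following Table 2.
class EntityTable {

    static final Map<Integer, String> ENTITIES;
    static {
        Map<Integer, String> m = new HashMap<>();
        m.put(34, "&quot;");  // double quotation mark (as in Listing 1)
        m.put(38, "&amp;");   // ampersand (assumed addition, not in Table 2)
        m.put(60, "&lt;");    // less than
        m.put(62, "&gt;");    // greater than
        m.put(160, "&nbsp;"); // non-breaking space
        m.put(174, "&reg;");  // registered trademark
        ENTITIES = Collections.unmodifiableMap(m);
    }

    // Replaces each mapped character with its entity name.
    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String entity = ENTITIES.get((int) c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Tom &amp; Jerry&reg; &lt;b&gt;
        System.out.println(encode("Tom & Jerry\u00AE <b>"));
    }
}
```

Wrapping the map with Collections.unmodifiableMap keeps other code from changing the entity table after initialization.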


Cross-site scripting is still one of the most common ways to attack a user's machine. However, you can largely eliminate an attacker's ability to infect your web application with malicious code. When writing your application, be diligent about encoding all variable output in a page before sending it to the end user's browser.
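As a sketch of this discipline, the following hypothetical page fragment encodes the user-supplied value at the point where it is inserted into the markup, so that an injected script arrives at the browser as text. The class and method names are assumptions, and the encoder handles only &, <, >, " and '.

```java
// Hypothetical page fragment that encodes variable output before
// inserting it into HTML.
class SafePage {

    // Encodes the characters that HTML treats as markup.
    static String encodeForHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Only the variable part is encoded; the page's own markup stays intact.
    static String renderGreeting(String name) {
        return "<p>Hello, " + encodeForHtml(name) + "</p>";
    }

    public static void main(String[] args) {
        // Prints: <p>Hello, &lt;script&gt;alert(&quot;attacked&quot;)&lt;/script&gt;</p>
        System.out.println(renderGreeting("<script>alert(\"attacked\")</script>"));
    }
}
```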
