Prevent cross-site scripting attacks by encoding HTML responses

Uncover the basics of cross-site scripting attacks and learn how you can prevent them using a Java™-based approach to encode HTML output from a server.

Usha Ladkani (uladkani@in.ibm.com), Senior Staff Software Engineer, IBM

Usha Ladkani has extensive experience in application development, with a focus on Java/J2EE and middleware technologies. She has worked with the B2B development and maintenance team since 2005. Usha developed a framework preventing cross-site scripting issues in WebSphere Partner Gateway to make products more secure. Her contributions to other IBM products include autonomic computing, enabling WebSphere Application Server update installer to be used for WPG, build engineering, and static code analysis. Usha is the focal point for application and static scan analysis for WPG and Sterling Integrator. She has an electronics and telecommunication engineering degree.



30 July 2013


Cross-site scripting attacks

Creating a security culture


Companies move applications to the web to improve customer interactions, lower business processing costs, and speed outcomes. But doing so also increases vulnerabilities—unless security is an integral component of the application development process. To learn more about creating a culture of security in your organization, download and read "Secure Web Applications: Creating a Security Culture."

In a cross-site scripting (XSS) attack, the attacker injects malicious client-side script into a legitimate web page. When a user visits the infected web page, the script is downloaded to, and run by, the user's browser. There are many variations on this scheme. For example, the malicious script could access browser cookies, session tokens, or other sensitive information retained by the browser. However, all XSS attacks follow the pattern shown in Figure 1.

Figure 1. Typical XSS attack
[Figure: four boxes, starting with the hacker and ending with the web server, showing the path of an attack]

XSS vulnerabilities

In a typical XSS attack, the attacker finds a way to insert a string into a server's web page. Suppose the attacker injects the following string into the web page: <script>alert("attacked")</script> . Every time an end user visits this page, their browser will download this script and run it as part of rendering the page. In this case, the script will run and the user sees an alert pop up that says "attacked."
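
To make the injection concrete, the following minimal sketch shows how a page assembled by plain string concatenation passes the attacker's string to the browser unchanged. The class name VulnerablePage, the method renderComment, and the surrounding markup are hypothetical and exist only for illustration.

```java
class VulnerablePage {

    // Hypothetical page template: the user-supplied comment is concatenated
    // into the markup without any encoding, which is the XSS vulnerability.
    public static String renderComment(String comment) {
        return "<html><body><p>" + comment + "</p></body></html>";
    }

    public static void main(String[] args) {
        String injected = "<script>alert(\"attacked\")</script>";
        // The script element reaches the browser intact, so the browser
        // would treat it as markup and run it.
        System.out.println(renderComment(injected));
    }
}
```

Because the response contains a real script element, every visitor's browser runs it while rendering the page.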

Impact of XSS

When attackers successfully exploit XSS vulnerabilities in a web application, they can insert script that gives them access to end users' account credentials. Attackers can perform a variety of malicious activities, such as:

  • Hijack an account
  • Spread web worms
  • Access browser history and clipboard contents
  • Control the browser remotely
  • Scan and exploit intranet appliances and applications

Preventing XSS attacks

To help prevent XSS attacks, an application needs to ensure that all variable output in a page is encoded before being returned to the end user. Encoding variable output substitutes HTML markup with alternate representations called entities. The browser displays the entities but does not run them. For example, <script> gets converted to &lt;script&gt;.

Table 1 shows the entity name for some common HTML characters.

Table 1. Entity names for HTML characters
Result | Description          | Entity name | Entity number
       | Non-breaking space   | &nbsp;      | &#160;
<      | Less than            | &lt;        | &#60;
>      | Greater than         | &gt;        | &#62;
&      | Ampersand            | &amp;       | &#38;
¢      | Cent                 | &cent;      | &#162;
£      | Pound                | &pound;     | &#163;
¥      | Yen                  | &yen;       | &#165;
€      | Euro                 | &euro;      | &#8364;
§      | Section              | &sect;      | &#167;
©      | Copyright            | &copy;      | &#169;
®      | Registered trademark | &reg;       | &#174;
™      | Trademark            | &trade;     | &#8482;
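
The entity number in Table 1 is simply the character's decimal Unicode value wrapped in &# and a semicolon. The following short sketch illustrates that relationship; the class EntityNumbers and the method entityNumber are hypothetical names chosen for this example.

```java
class EntityNumbers {

    // Builds the numeric entity for a single character from its
    // decimal Unicode value.
    public static String entityNumber(char ch) {
        return "&#" + (int) ch + ";";
    }

    public static void main(String[] args) {
        System.out.println(entityNumber('<'));      // prints &#60;
        System.out.println(entityNumber('\u00A9')); // copyright sign, prints &#169;
        System.out.println(entityNumber('\u20AC')); // euro sign, prints &#8364;
    }
}
```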

When a web browser encounters the entities, it converts them back to the characters they represent and displays those characters as text, but it does not run them. For example, if an attacker injects <script>alert("you are attacked")</script> into a variable field of a server's web page, a server that uses this strategy returns &lt;script&gt;alert("you are attacked")&lt;/script&gt;.

When the web browser downloads the encoded script, it converts the encoded script back to <script>alert("you are attacked")</script> and displays it as text on the web page, but the browser does not run the script.
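
The round trip described above can be sketched as follows. This is only an approximation under stated assumptions: the class EntityRoundTrip and both method names are hypothetical, decodeForDisplay is a rough stand-in for the browser's own decoding and not how a real browser is implemented, and the sketch also encodes double quotes, as Listing 1 does.

```java
class EntityRoundTrip {

    // Server side: replace markup-significant characters with entities.
    // The ampersand is replaced first so that the ampersands introduced
    // by the other entities are not encoded a second time.
    public static String encode(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Rough stand-in for the browser turning entities back into text.
    // The ampersand is restored last so that "&amp;lt;" becomes "&lt;"
    // rather than "<".
    public static String decodeForDisplay(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String injected = "<script>alert(\"you are attacked\")</script>";
        String encoded = encode(injected);
        System.out.println(encoded);                   // what the server sends
        System.out.println(decodeForDisplay(encoded)); // what the user reads
    }
}
```

The decoded characters appear only as visible text. Because the browser never parses them as markup, no script element is created.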


Adding HTML encoding to a server-side Java application

To ensure that malicious scripting code is not output as part of a page, your application needs to encode all variable strings before they're displayed on a page. Encoding converts each character that has special meaning in HTML to its HTML entity name, as shown in the Java code example in Listing 1.

Listing 1. Convert characters to HTML entity name
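import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

public class EscapeUtils {

	// Maps a character's decimal value to its HTML entity name.
	public static final Map<Integer, String> m = new HashMap<>();
	static {
		m.put(34, "&quot;"); // " - double quote
		m.put(60, "&lt;");   // < - less-than
		m.put(62, "&gt;");   // > - greater-than
		// Map the remaining HTML entities, including & (38, "&amp;"), to their
		// decimal values here. Table 2 lists common entities and the integer
		// value of each character.
	}

	// Encodes a fixed sample string and returns the result.
	public static String escapeHtml() {
		String str = "<script>alert(\"abc\")</script>";
		try {
			// Entities are longer than the characters they replace, so start
			// with a buffer 1.5 times the input length.
			StringWriter writer = new StringWriter((int) (str.length() * 1.5));
			escape(writer, str);
			System.out.println("encoded string is " + writer.toString());
			return writer.toString();
		} catch (IOException ioe) {
			ioe.printStackTrace();
			return null;
		}
	}

	// Writes str to writer, replacing mapped characters with entity names and
	// other non-ASCII characters with numeric entities.
	public static void escape(Writer writer, String str) throws IOException {
		int len = str.length();
		for (int i = 0; i < len; ) {
			// Read a full code point so that surrogate pairs are not split.
			int codePoint = str.codePointAt(i);
			i += Character.charCount(codePoint);
			String entityName = m.get(codePoint);
			if (entityName == null) {
				if (codePoint > 0x7F) {
					writer.write("&#");
					writer.write(Integer.toString(codePoint, 10));
					writer.write(';');
				} else {
					writer.write(codePoint);
				}
			} else {
				writer.write(entityName);
			}
		}
	}
}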

The Java example in Listing 1 encodes the input string "<script>alert(\"abc\")</script>" using the following steps.

  1. Create a hashmap of all HTML entities and their decimal values. The example shows only three entries. You need to map all HTML entities to their decimal values in this map. Table 2 shows some of the commonly used entities and their decimal values. For a complete reference of all character entities, see the HTML Entities Reference in Resources.
  2. Create a StringWriter with an initial buffer of 1.5 times the input string length.
  3. Pass both this writer and the input string to the escape method, which reads the string one character at a time and gets the integer value of each character.
  4. Look up the integer value in the map you created in step 1. If an entity name is found, write it to the writer. Otherwise, write the character as a numeric entity (&#, the decimal value, and a semicolon) if its value is above 0x7F, or write it unchanged if it is not.

    Every character that has an entry in the map is converted into its entity name.

The output is &lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;.
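
The escape method in Listing 1 has three outcomes for each character: a mapped entity name, a numeric entity for unmapped characters above 0x7F, or the character itself. The following compact sketch exercises all three. It is a variant written for illustration, not a replacement for Listing 1: the class EscapeBranches is a hypothetical name, it builds a String with a StringBuilder instead of a Writer, and it adds the ampersand entry (38) from Table 2.

```java
import java.util.HashMap;
import java.util.Map;

class EscapeBranches {

    private static final Map<Integer, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put(34, "&quot;");
        ENTITIES.put(38, "&amp;");
        ENTITIES.put(60, "&lt;");
        ENTITIES.put(62, "&gt;");
    }

    public static String escape(String str) {
        StringBuilder out = new StringBuilder((int) (str.length() * 1.5));
        for (int i = 0; i < str.length(); ) {
            int codePoint = str.codePointAt(i);
            i += Character.charCount(codePoint);
            String entityName = ENTITIES.get(codePoint);
            if (entityName != null) {
                out.append(entityName);                        // mapped character
            } else if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';'); // numeric entity
            } else {
                out.append((char) codePoint);                  // plain ASCII
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00E9 and \u00E8 are accented letters with no entry in the map.
        System.out.println(escape("<b>caf\u00E9 & cr\u00E8me</b>"));
        // prints &lt;b&gt;caf&#233; &amp; cr&#232;me&lt;/b&gt;
    }
}
```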

Table 2 maps the HTML entities to their decimal values.

Table 2. Decimal values of HTML entities
Decimal | Entity  | Description
160     | &nbsp;  | Non-breaking space
60      | &lt;    | Less than
62      | &gt;    | Greater than
38      | &amp;   | Ampersand
162     | &cent;  | Cent
163     | &pound; | Pound
165     | &yen;   | Yen
8364    | &euro;  | Euro
167     | &sect;  | Section
169     | &copy;  | Copyright
174     | &reg;   | Registered trademark
8482    | &trade; | Trademark
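
As a minimal sketch of step 1, the rows of Table 2 could be loaded into the map as shown here. The class EntityTable and the method build are hypothetical names, and the map holds only the twelve entries from Table 2, not the complete set of HTML entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityTable {

    // Returns a map from decimal character value to entity name,
    // populated with the rows of Table 2 in table order.
    public static Map<Integer, String> build() {
        Map<Integer, String> entities = new LinkedHashMap<>();
        entities.put(160, "&nbsp;");
        entities.put(60, "&lt;");
        entities.put(62, "&gt;");
        entities.put(38, "&amp;");
        entities.put(162, "&cent;");
        entities.put(163, "&pound;");
        entities.put(165, "&yen;");
        entities.put(8364, "&euro;");
        entities.put(167, "&sect;");
        entities.put(169, "&copy;");
        entities.put(174, "&reg;");
        entities.put(8482, "&trade;");
        return entities;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, String> row : build().entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```

A LinkedHashMap is used here only so that the entries print in table order. A plain HashMap, as in Listing 1, works equally well for lookups.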

Conclusion

Cross-site scripting is still one of the most common ways to attack a user's machine. However, you can largely eliminate an attacker's ability to infect your web application with malicious code. When writing your application, be diligent about encoding all variable output in a page before sending it to the end user's browser.

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, or use a product in a cloud environment.
  • The Apache commons-lang package contains useful methods for handling encoding and other common tasks. For the specific case mentioned in this article, the static method org.apache.commons.lang3.StringEscapeUtils.escapeHtml4(String input) can be used to escape a string.

