How to configure server-side privacy with Experience Analytics

You can use Experience Analytics to mask sensitive information to protect the privacy of users who interact with your website or mobile applications.

The information that follows explains how to mask sensitive information on the server side of your Experience Analytics solution. Server-side privacy is one part of a privacy solution. See the Watson™ Customer Engagement | For Developers documentation for more information about privacy masking of sensitive user input on the client side.

In Experience Analytics, the Hit attribute can block and replace sensitive information. For privacy masking to work, the Event for which the Hit attribute is a condition must fire.

What defines a Hit attribute?

Hit attributes are defined by the following characteristics:
  • The start and/or end patterns of strings
  • A regular expression that is applied to the found text

    The regular expression (regex) can enhance hit attribute filtering (match/privacy blocking/replacement) when start and end patterns do not isolate the target text completely.

  • A specific string constant, like “SSN”
    Note: A hit attribute can match either a start / end pattern, OR a specific string, but not both. You can apply a regular expression on either.
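
The characteristics above can be illustrated with a minimal Python sketch. It is a simulation only, not Experience Analytics code: the function name find_hit_text, the sample string, and the rule that a capture group takes precedence over the whole match are all assumptions made for illustration.

```python
import re
from typing import Optional


def find_hit_text(text: str, start: str, end: str,
                  regex: Optional[str] = None) -> Optional[str]:
    """Return the text between the first start/end pair.

    If a regex is given, it is applied to the found text to narrow the
    result (hypothetical behavior, modeled on the description above).
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    found = text[begin:stop]
    if regex is None:
        return found
    match = re.search(regex, found)
    if match is None:
        return None
    # Prefer the first capture group when the regex defines one.
    return match.group(1) if match.groups() else match.group(0)


# Made-up sample: the start/end patterns alone return too much text.
sample = 'Account: <span id="acct">ID 4411-2290 (primary)</span>'
print(find_hit_text(sample, '<span id="acct">', '</span>'))
print(find_hit_text(sample, '<span id="acct">', '</span>', r'(\d{4}-\d{4})'))
```

The first call returns everything between the two patterns. The second call shows how a regex can isolate the target text when the patterns do not.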

How to mask sensitive data with IBM® Tealeaf® on Cloud Hit attributes

Ideally, you never want sensitive data to be sent to Experience Analytics. In most cases, you can block sensitive data through client-side privacy configuration.

Server-side privacy (implemented with Hit attributes) is the mechanism for blocking or replacing content that was not blocked on the client side. The UIC on the client side handles tagged content well, but can struggle with searching large chunks of text for complex patterns. Server-side privacy serves as a "catch-all" for sensitive information that might not be stopped by client-side privacy rules and configuration.

For example, suppose that the following response was captured. It contains an error message and a user’s social security number.

<form name="webform" action="form_submit" method="get">
    First Name: <input type="text" name="fname">
    Last Name: <input type="text" name="lname" value="">
      <font color="red">THIS IS A REQUIRED FIELD</font>
    SSN: <input type="text" name=ssn value="012-34-5678">
    <input type="button" value="send_form">
</form>

To mask the social security number on the Experience Analytics server side, you need to create and configure a Hit attribute.

The following example explains how to create and configure the Hit attribute to mask the SSN.
  1. Log on to Experience Analytics.

  2. Select Event Manager.

  3. Select New > Hit Attribute.

  4. Set the fields for the Hit attribute.
    For example, set the following fields:
    Hit attribute name
    Enter a descriptive name for the Hit attribute. For example, Mask SSN.
    Match
    Set Match to Response.
    Use
    Set Use to Start/End expression.
    Start tag
    For this example, you would enter the following value for the Start tag
    <input type="text" name=ssn value="
    End tag
    For this example, you would enter the following value for the End tag
    
    ">
    Block / Replace
    Select the check box for Block / Replace.
    Block replacement
    For the Block replacement, you might enter the following text:
    XXX_SSN_XXX

    The expressions that you add to the Start tag and End tag are the text that encloses the SSN in the HTML that is captured in a DOM snapshot.


  5. Click Save to save the Hit attribute.
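
The effect of the Start tag, End tag, and Block replacement settings in this example can be approximated with a short Python sketch. It only simulates the block-and-replace behavior on a string. The function name mask_between and the one-line response are hypothetical, and the real server-side implementation is not documented here.

```python
def mask_between(response: str, start_tag: str, end_tag: str,
                 replacement: str) -> str:
    """Replace every span enclosed by start_tag and end_tag with replacement."""
    out = []
    pos = 0
    while True:
        begin = response.find(start_tag, pos)
        if begin == -1:
            break
        begin += len(start_tag)
        stop = response.find(end_tag, begin)
        if stop == -1:
            break
        out.append(response[pos:begin])  # keep everything up to the sensitive text
        out.append(replacement)          # substitute the sensitive text
        pos = stop                       # resume at the End tag
    out.append(response[pos:])
    return "".join(out)


# Hypothetical one-line response that uses the same markup as the SSN field.
response = 'SSN: <input type="text" name=ssn value="012-34-5678">'
masked = mask_between(
    response,
    '<input type="text" name=ssn value="',  # Start tag from the example
    '">',                                   # End tag from the example
    "XXX_SSN_XXX",                          # Block replacement from the example
)
print(masked)
# SSN: <input type="text" name=ssn value="XXX_SSN_XXX">
```

Only the text between the two patterns changes. The surrounding markup is left intact, so the masked field still renders in Replay.
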
Consider the following items when masking sensitive data.
  • The code for a web form can vary as follows.
    Figure 1. Web form variations
    
    SSN: <input type="text" name="ssn" value="012-34-5678" required>
    SSN: <input type="text" name="ssn" required value="012-34-5678">
    SSN: <input type="text" name="ssn" disabled value="012-34-5678">
    
    Note: For any of these variations, the Start/End patterns would differ from the configuration settings that are specified in step 4.
    In these cases, use a regular expression to mask only the content of the value attribute.
    
    value="(.*?)"
    Everything inside the parentheses is replaced with the value of the “Block replacement” field.
    Note: Include the final double quotation mark in the regex only if the End pattern does not already contain it, for example, when the End pattern is >.
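
The regular expression approach can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function name mask_value, the Start pattern name="ssn", and the End pattern > are hypothetical choices. The sketch also assumes that the regex is applied to the text found between the patterns and that only the capture group is replaced.

```python
import re

# Regex from the text: the capture group is the part to be masked.
VALUE_RE = re.compile(r'value="(.*?)"')


def mask_value(response: str, start: str, end: str, replacement: str) -> str:
    """Mask the regex capture group inside the first start/end span."""
    begin = response.find(start)
    if begin == -1:
        return response
    begin += len(start)
    stop = response.find(end, begin)
    if stop == -1:
        return response
    found = response[begin:stop]
    match = VALUE_RE.search(found)
    if match is None:
        return response
    # Replace only the captured value, not the attribute name or quotes.
    masked = found[:match.start(1)] + replacement + found[match.end(1):]
    return response[:begin] + masked + response[stop:]


variations = [
    'SSN: <input type="text" name="ssn" value="012-34-5678" required>',
    'SSN: <input type="text" name="ssn" required value="012-34-5678">',
    'SSN: <input type="text" name="ssn" disabled value="012-34-5678">',
]
results = [mask_value(v, 'name="ssn"', ">", "XXX_SSN_XXX") for v in variations]
for line in results:
    print(line)
```

Because the End pattern here is >, the closing double quotation mark has to be part of the regex. One set of patterns then covers all three attribute orders.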

Adding the privacy Hit attribute as an Event condition

After you configure and save the privacy Hit attribute to block and replace sensitive information, it is applied to incoming data automatically.

If you configure the privacy Hit attribute correctly, the sensitive information in captured DOM snapshots is sent to the Experience Analytics UI as masked data. When the DOM captures are viewed in Replay, the data appears masked in the fashion that you specified.

The privacy Hit attribute configuration described here, which matches against the "Response", works equally well against DOM capture data, whether full DOM captures or DOM diffs.