Removing data from the Ariel database

The ACP (Ariel Copy) tool reads through an Ariel database, applies criteria, and then rewrites the filtered data to another location. This tool is useful for GDPR (General Data Protection Regulation) compliance. For example, you might need to remove all data that is flagged with a given user name or source IP address.

Note: This technical blog article is as-is and didn’t go through any extra vetting.

Usage

Provide the tool with the database name (-n, events or flows), the begin and end times of the period to filter (-b and -e), the AQL query (-q) or the key creator with its parameter and tested value (-k, -p, -v) to apply, the user to run as (-u), and the destination directory (-d).

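The ACP tool reads through the Ariel database for the specified times, applies the filter, and copies all the records that match the filter to the specified destination directory. Records that do not match are left out of the copy, so to remove data you write a filter that matches the records you want to keep.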

[root@m5arch06 templates]# /opt/qradar/bin/runjava.sh com.q1labs.ariel.io.ACP
data base name is required
Configured data bases:
		flows
		hc
		simarc
		events
		simevent
usage: ACP -n events -d /store/new_ariel -b "2013/11/21 1:00:00" -e
           "2013/11/21 4:00:00"
           ACP -n events -d /store/new_ariel -b "2013/11/21 1:00:00" -e "2013/11/21
           4:00:00" -q "username is not null and sourceip = '127.0.0.1'" -u admin
           Options:
 -n,--name           data base name, for example -n events
 -b,--begintime      begin time, for example -t "2018/08/28 09:04",
                     optional, by default current system time
 -e,--endtime        end time, for example -t "2018/08/28 10:04",
                     optional, by default current system time
 -d,--destination    Destination directory i.e. /store/filtered_ariel/
 -h,--help           print this message
 -k,--keycreator     class for IKeyCreator
 -p,--param          optional parameter for IKeyCreator
 -q,--query          AQL 'where clause' to copy records
 -u,--user           username (default=admin)
 -v,--testedvalue    value to compare createKey with
[root@m5arch06 templates]#
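
The options in the help output above can be assembled into a command line before you run anything. The following is a minimal sketch, assuming a hypothetical helper named build_acp_cmd that only prints the command for review. It uses only the flags that the help output lists, and it does not escape double quotation marks inside the query.

```shell
# Sketch: assemble an ACP command line from its parts and print it.
# build_acp_cmd is a hypothetical helper; it never runs the tool itself.
ACP_RUNNER="/opt/qradar/bin/runjava.sh com.q1labs.ariel.io.ACP"

build_acp_cmd() (
    db=$1       # database name, for example: events
    begin=$2    # begin time, "YYYY/MM/DD HH:MM:SS"
    end=$3      # end time, same format
    query=$4    # AQL where clause that matches the records to KEEP
    user=$5     # user name, for example: admin
    dest=$6     # destination directory
    printf '%s -n %s -b "%s" -e "%s" -q "%s" -u %s -d %s\n' \
        "$ACP_RUNNER" "$db" "$begin" "$end" "$query" "$user" "$dest"
)

build_acp_cmd events "2018/08/28 09:00:00" "2018/08/28 10:00:00" \
    "username not like 'root'" admin /store/new_ariel
```

Printing the command first gives you a chance to check the time window and the filter before the tool reads through the database.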

Primer on the Ariel file structure

Before we go any further, let's review how Ariel data is laid out on disk. Ariel is a time-series database. It does not know what data is on disk until it searches for it, so you can copy Ariel data between systems without side effects and without the Ariel database tracking the change. Ariel data is stored in minute-by-minute files under a directory layout similar to the following example:

/store/ariel/[events | flows ]/[records | payloads | md]/YYYY/MM/dd/hh/
Where:
  • events or flows is the name of the Ariel database
  • records is the normalized event record
  • payloads is the raw payload associated with the records
  • md are the hash signatures or message digests (optionally HMAC encrypted) of the Ariel data
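
To make the layout concrete, here is a minimal sketch that turns a timestamp into the matching hour directory. The helper name ariel_hour_dir is hypothetical. It assumes that the month, day, and hour components are written without leading zeros, which is how they appear in the ACP output later in this article (for example, 2018/8/28/9).

```shell
# Sketch: map a "YYYY/MM/DD HH:MM" timestamp to an Ariel hour directory.
# ariel_hour_dir is a hypothetical helper, not part of QRadar.
ARIEL_ROOT=/store/ariel

ariel_hour_dir() (
    db=$1      # events or flows
    kind=$2    # records, payloads, or md
    stamp=$3   # for example: "2018/08/28 09:00"

    date_part=${stamp%% *}   # 2018/08/28
    time_part=${stamp#* }    # 09:00
    year=${date_part%%/*}
    rest=${date_part#*/}     # 08/28
    month=${rest%%/*}
    day=${rest#*/}
    hour=${time_part%%:*}

    # Assumption: path components carry no leading zero (8, not 08).
    printf '%s/%s/%s/%s/%s/%s/%s\n' "$ARIEL_ROOT" "$db" "$kind" \
        "$year" "${month#0}" "${day#0}" "${hour#0}"
)

ariel_hour_dir events records "2018/08/28 09:00"
```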

In general, the Ariel data files have the following structure:

identifier~Minute#_File#~UUID~UUID~RetentionBucketNumber

In the records directory, you see files like the following examples:
  • events~5_0~35c5c6bddcb24578~ad5711e8c1337858~0

    This record represents the normalized event record for the 5th minute and the 0th retention bucket (default bucket).

  • SourceIP~5_0~35c5c6bddcb24578~ad5711e8c1337858~0

    The sourceIP index for the 5th minute and the 0th retention bucket (default bucket).

In the payloads directory, you see files like the following example:

payload_events~5_0~35c5c6bddcb24578~ad5711e8c1337858~0

The raw payloads for the 5th minute and the 0th retention bucket (default bucket).

In the md directory, you see files like the following examples:

  • events~5_0~35c5c6bddcb24578~ad5711e8c1337858~0.HMACSHA512

    The HMAC-SHA512 signature for the normalized event records for the 5th minute and 0th retention bucket.

  • payload_events~5_0~35c5c6bddcb24578~ad5711e8c1337858~0.HMACSHA512

    The HMAC-SHA512 signature for the raw payloads for the 5th minute and the 0th retention bucket.

    Note: The ACP tool creates a directory called hashes instead of md. If you have file hashing or integrity hashing enabled, rename hashes to md when you copy the filtered data back into the Ariel directory.
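
The file naming scheme described in this section can be split on the ~ separator. The following is a minimal sketch, assuming a hypothetical helper named parse_ariel_name. It strips the .HMACSHA512 suffix when one is present and ignores the two UUID fields.

```shell
# Sketch: split an Ariel file name into its fields.
# Layout: identifier~Minute#_File#~UUID~UUID~RetentionBucketNumber
# parse_ariel_name is a hypothetical helper, not part of QRadar.
parse_ariel_name() (
    name=${1##*/}               # drop any leading directory
    name=${name%.HMACSHA512}    # drop the digest suffix used under md

    IFS='~'                     # split on the tilde separator
    set -f                      # no file name globbing while splitting
    set -- $name                # $1=identifier $2=Minute_File $5=bucket

    printf 'identifier=%s minute=%s file=%s bucket=%s\n' \
        "$1" "${2%%_*}" "${2#*_}" "$5"
)

parse_ariel_name 'events~5_0~35c5c6bddcb24578~ad5711e8c1337858~0'
```

The function body runs in a subshell, so the changes to IFS and the globbing setting do not leak into the calling shell.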

Example - Removing events that have root as the user name

In this example, we want to remove all events that have the user name of root from the Ariel events database for a time period between 09:00 and 10:00.

  1. First, we search and show what we want to remove.
    The following screen capture shows all events:
    Figure 1. All events
    Image showing all events that are captured in a specific timeframe.
  2. Next, run the ACP tool:
    [root@m5arch06 ~]# /opt/qradar/bin/runjava.sh com.q1labs.ariel.io.ACP -n events -b "2018/08/28 09:00:00" -e "2018/08/28 10:00:00" -q "username not like 'root'" -u admin -d /store/new_ariel
    AQL criteria: [username not like 'root'] User: admin
    Copying: [events] to /store/new_ariel Timeline from Tue Aug 28 09:00:00 NDT 2018 to Tue Aug 28 09:59:00 NDT 2018
    Trying to copy dir: /store/ariel/events/records/2018/8/28/9[18-08-28,09:00:00]
    Reader started ...
    Processing interval /store/ariel/events/records/2018/8/28/9/events~59_0~13e6848042143d0~92f58da705b4b452~0[18-08-28,09:59:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~58_0~23b0a632f1df4d10~bbe50149b34453d8~0[18-08-28,09:58:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~57_0~745684fd0cca424c~a9857807a2952671~0[18-08-28,09:57:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~56_0~89c02507d81940c3~a54edd654c2ce796~0[18-08-28,09:56:00]
    …. … ...
    Processing interval /store/ariel/events/records/2018/8/28/9/events~4_0~2c48501471734615~b8ca3cb590a16e05~0[18-08-28,09:04:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~3_0~3e369b679aa64306~a97b95e8528a1a9b~0[18-08-28,09:03:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~2_0~4e370db759d44bd2~81f695b0164b5a66~0[18-08-28,09:02:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~1_0~cdbe9e25b9ee4dbd~982ef0da1ab7e85d~0[18-08-28,09:01:00]
    Processing interval /store/ariel/events/records/2018/8/28/9/events~0_0~ecd2e4bf6bbb425a~b2acfe04fda202a6~0[18-08-28,09:00:00]
    Reader stopped. Written 9507759 records, Skipped 2654 records
    
    Completed copying dir: /store/ariel/events/records/2018/8/28/9[18-08-28,09:00:00]
    
    [root@m5arch06 ~]#
    
    The following screen capture shows all events with the root user name:
    Figure 2. All events with root user name
    Image showing all events that are captured with the root user name in a specific timeframe.
    The following screen capture shows all events without the root user name.
    Figure 3. All events without root user name
    Image showing all events that are captured without the root user name in a specific timeframe.

    As explained previously, the goal is to remove all events from 09:00 to 10:00 that have the user name root. Because the ACP tool copies only the records that match the criteria, we specified that the username was NOT root. Every other event is written to the destination, and the root events are skipped.

    Also, we placed the filtered records in the /store/new_ariel directory.

  3. Replace the old Ariel data with the new Ariel data.

    Looking in the /store/new_ariel directory (the destination we chose in this example), we can see that we processed one hour's worth of data:

    [root@m5arch06 new_ariel]# find /store/new_ariel -maxdepth 6
    /store/new_ariel
    /store/new_ariel/events
    /store/new_ariel/events/hashes
    /store/new_ariel/events/hashes/2018
    /store/new_ariel/events/hashes/2018/8
    /store/new_ariel/events/hashes/2018/8/28
    /store/new_ariel/events/hashes/2018/8/28/9
    /store/new_ariel/events/records
    /store/new_ariel/events/records/2018
    /store/new_ariel/events/records/2018/8
    /store/new_ariel/events/records/2018/8/28
    /store/new_ariel/events/records/2018/8/28/9
    /store/new_ariel/events/payloads
    /store/new_ariel/events/payloads/2018
    /store/new_ariel/events/payloads/2018/8
    /store/new_ariel/events/payloads/2018/8/28
    /store/new_ariel/events/payloads/2018/8/28/9
    [root@m5arch06 new_ariel]#
    

    In this example, the data was chosen to line up on hour boundaries to make the archive and copy steps easier (the mv command can be used on whole hour directories). If your data does not line up on hour boundaries, you need to migrate it file by file.

  4. Archive the old data from the ariel directory to the ariel_archive directory by typing the following commands:
    [root@m5arch06 store]# mv /store/ariel/events/records/2018/8/28/9 /store/ariel_archive/events/records/2018/8/28/
    [root@m5arch06 store]# mv /store/ariel/events/payloads/2018/8/28/9 /store/ariel_archive/events/payloads/2018/8/28/
    [root@m5arch06 store]# mv /store/ariel/events/md/2018/8/28/9 /store/ariel_archive/events/hashes/2018/8/28/
    [root@m5arch06 store]#
    

    At this point, that hour of data is not searchable. The Ariel database cannot find it because the files are no longer under the /store/ariel directory.

  5. Copy (or move) the new data from the new_ariel directory to the ariel directory by typing the following commands:
    [root@m5arch06 store]# cp -a /store/new_ariel/events/records/2018/8/28/9 /store/ariel/events/records/2018/8/28
    [root@m5arch06 store]# cp -a /store/new_ariel/events/payloads/2018/8/28/9 /store/ariel/events/payloads/2018/8/28
    [root@m5arch06 store]# cp -a /store/new_ariel/events/hashes/2018/8/28/9 /store/ariel/events/md/2018/8/28
    [root@m5arch06 store]#
    
    Note: We changed from hashes to md. See the note in the Ariel file structure primer section.

    Because we ran the searches against this data before we began, let's remove those searches so that identical searches are not answered from the cursor cache. Remove the searches by using Manage Search Results on the Log Activity tab.

  6. Run the searches for all data from 09:00 to 10:00.
    Figure 4. New search
    Image showing a search that is run in a specific timeframe.
    Note: The total results are the same as the "not" root user name results from before we ran the ACP tool.
  7. Search for the root user name.
    Figure 5. New search for root user name
    Image showing a search for root usernames that is run within a specific timeframe.
    Note: Some of my indexes were not regenerated. The regular minute-by-minute property-based indexes get created, but the super indexes and the free text (Lucene) indexes do not. This is a known issue. Luckily for us, we have the /opt/qradar/bin/ariel_offline_indexer.sh tool!
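
Steps 4 and 5 above can be previewed as a dry run before you touch any data. The following is a minimal sketch, assuming a hypothetical helper named plan_hour_swap that only prints the mv and cp -a commands for one hour directory. The directory names come from this example, and the md and hashes names are swapped in the same way as in the commands above.

```shell
# Dry-run sketch: print the archive (mv) and restore (cp -a) commands for
# one hour directory. plan_hour_swap is a hypothetical helper; nothing is
# moved or copied.
LIVE=/store/ariel
ARCHIVE=/store/ariel_archive
FILTERED=/store/new_ariel

plan_hour_swap() (
    db=$1                    # for example: events
    hour_path=$2             # for example: 2018/8/28/9
    parent=${hour_path%/*}   # 2018/8/28

    # Step 4: archive the live data. This example files md under "hashes"
    # in the archive tree.
    echo "mv $LIVE/$db/records/$hour_path $ARCHIVE/$db/records/$parent/"
    echo "mv $LIVE/$db/payloads/$hour_path $ARCHIVE/$db/payloads/$parent/"
    echo "mv $LIVE/$db/md/$hour_path $ARCHIVE/$db/hashes/$parent/"

    # Step 5: copy the filtered data in. ACP writes "hashes"; the live
    # tree uses "md".
    echo "cp -a $FILTERED/$db/records/$hour_path $LIVE/$db/records/$parent"
    echo "cp -a $FILTERED/$db/payloads/$hour_path $LIVE/$db/payloads/$parent"
    echo "cp -a $FILTERED/$db/hashes/$hour_path $LIVE/$db/md/$parent"
)

plan_hour_swap events 2018/8/28/9
```

The mv commands expect the parent directories under the archive location to exist already, so create them first if they do not.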

Usage for the offline indexer script

Use the /opt/qradar/bin/ariel_offline_indexer.sh script to generate the super indexes and the free text indexes.

Usage is:
[root@m5arch06 store]# /opt/qradar/bin/ariel_offline_indexer.sh --help
usage: options
 -R,--repair         re-build corrupted super indices
 -d,--duration       time duration to look files for in minutes, for
                     example -d 5
 -n,--name           ariel data base name, for example -n events
 -t,--endtime        end time, for example -t "2018/08/28 11:00",
                     optional, by default current system time
 -F,--renamefrom     rename from (internal use)
 -L,--light          load minimal QRadar frameworks
 -T,--renameto       rename to (internal use)
 -V,--validate       validate super indices
 -a,--auto           backfill all active indexes
 -b,--batchmode      run in batch mode with options in a file
 -f,--fts            create free text search indices
 -h,--help           print this message
 -k,--key            property java class name
 -l,--list           list all enabled indices from the configuration
 -p,--param          optional paramiter for property (key creator
                     construction)
 -r,--remove         remove indices for a property
 -s,--superindices   create super indices from the minute indices
 -v,--verbose        verbose (optional, default = false)
 -w,--threads        maximum number of threads to produce minute indices
                     if requested, default is 48
[root@m5arch06 store]#
We'll regenerate the free text index and the super indexes by typing the following command:
[root@m5arch06 store]# /opt/qradar/bin/ariel_offline_indexer.sh -n events -t "2018/08/28 10:00" -d 60 -f -s

2018/08/28 11:04	Running command [-n, events, -t, 2018/08/28 10:00, -d, 60, -f, -s] ...
Trying to index dir: /store/ariel/events/records/2018/8/28/9/lucene[18-08-28,09:00:00]
Completed indexing dir: /store/ariel/events/records/2018/8/28/9/lucene[18-08-28,09:00:00]

All done in 0:02:39.840
[root@m5arch06 store]#

Now our free text index and super indexes are populated.
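
The indexer takes an end time (-t) and a duration in minutes (-d) rather than a begin and end time. Judging by the example above, -t "2018/08/28 10:00" with -d 60 covers the 09:00 hour. The following is a minimal sketch that derives both values from a begin and end time on the same day. The helper names minutes_of_day and build_indexer_cmd are hypothetical, and the command is only printed, not run.

```shell
# Sketch: derive the indexer's -t and -d values from a begin and end time.
# Both helpers are hypothetical; timestamps use "YYYY/MM/DD HH:MM".
INDEXER=/opt/qradar/bin/ariel_offline_indexer.sh

minutes_of_day() (
    t=$1            # HH:MM
    h=${t%%:*}
    m=${t#*:}
    # Strip one leading zero so that 08 and 09 are not read as octal.
    echo $(( ${h#0} * 60 + ${m#0} ))
)

build_indexer_cmd() (
    db=$1 begin=$2 end=$3
    if [ "${begin%% *}" != "${end%% *}" ]; then
        echo "begin and end must be on the same day" >&2
        exit 1
    fi
    duration=$(( $(minutes_of_day "${end#* }") - $(minutes_of_day "${begin#* }") ))
    # -f and -s request the free text and super indexes, as in the example.
    printf '%s -n %s -t "%s" -d %s -f -s\n' "$INDEXER" "$db" "$end" "$duration"
)

build_indexer_cmd events "2018/08/28 09:00" "2018/08/28 10:00"
```

The sketch refuses windows that cross midnight to keep the arithmetic simple. For longer windows, run the indexer once per day or compute the duration by other means.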