This page has been copied and moved to the new blog site at https://developer.ibm.com/qradar/2017/08/22/1775/. Updates, including guidance on estimating storage requirements, are available on that page.
What is an Event?
In QRadar terms, an event is a message we receive and process from a device on your network that represents the log of a particular action on that device. Examples include an "ssh login" on a Unix server, a VPN connection to a VPN device, or a firewall deny logged by your perimeter firewall. Each of these actions occurs at a specific instant in time and is logged.
What identifies a unique event?
QRadar identifies a unique event based on a series of properties: source IP, destination IP, destination port, protocol, username, and log source ID/event ID. In some rare circumstances, source port is also used. Once 4 events come in with the same key properties, they are coalesced into a single record for a period of 10 seconds. Once this period passes, the cycle repeats. See this related post for additional discussion on coalescing.
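The key-and-window behaviour described above can be sketched in Python. This is a simplified model, not QRadar's implementation: the class and field names are hypothetical, and it assumes that the 10-second window starts at the first event of a cycle and that the first three matching events in a window are stored individually before coalescing begins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

THRESHOLD = 4   # from the text: coalescing starts with the 4th matching event
WINDOW = 10.0   # from the text: coalescing period in seconds


@dataclass(frozen=True)
class Event:
    # Hypothetical field names chosen for this sketch.
    time: float                 # arrival time in seconds
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    username: Optional[str]
    log_source_id: int
    event_id: str
    payload: str = ""


@dataclass
class Record:
    first: Event    # base event; the only one kept in its entirety
    count: int = 1  # number of actual events this record represents


def coalescing_key(e: Event) -> Tuple:
    """Properties that identify a unique event (source port omitted)."""
    return (e.src_ip, e.dst_ip, e.dst_port, e.protocol,
            e.username, e.log_source_id, e.event_id)


def coalesce(events: List[Event]) -> List[Record]:
    """Coalesce events that are already sorted by arrival time."""
    records: List[Record] = []
    cycles: Dict[Tuple, dict] = {}
    for e in events:
        key = coalescing_key(e)
        cycle = cycles.get(key)
        # Assumption: a cycle lasts WINDOW seconds from its first event.
        if cycle is None or e.time - cycle["start"] >= WINDOW:
            cycle = {"start": e.time, "seen": 0, "record": None}
            cycles[key] = cycle
        cycle["seen"] += 1
        if cycle["seen"] < THRESHOLD:
            records.append(Record(first=e))      # stored individually
        elif cycle["record"] is None:
            cycle["record"] = Record(first=e)    # opens the coalesced record
            records.append(cycle["record"])
        else:
            cycle["record"].count += 1           # only the count increases
    return records
```

Under this model, ten matching events arriving within five seconds produce four records: three individual records and one coalesced record with a count of 7.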
Coalescing - what are the reasons behind using or not using it?
Coalescing is used to reduce the amount of data going into the pipeline. When a large burst of events comes in, such as a firewall reporting a denial of service attack, coalescing can convert hundreds of thousands of events into only a few dozen records while maintaining the count of actual events. This lets QRadar see and track an attack on a huge scale, while protecting the performance of the pipeline by reducing the workload of the system, including the storage requirements for those events.
One limitation of coalescing, however, is that when data is normalized, only the first event in the coalesced record, which is used as the base record, is kept in its entirety, including the payload. For this reason, many users need to disable coalescing for devices/log sources that are used to meet audit and compliance requirements in their environment. Examples of these kinds of devices include custom applications, customer-facing services, critical assets, and other similarly important devices.
How do different event/log sources compare?
QRadar supports many different log sources and kinds of log sources, including firewalls, authentication devices, scanners, file servers, and application platforms. Each of these log source types, as they are referred to in QRadar, provides a different perspective and type of information about your network. For example, a firewall will report the number of remote systems trying to get into your network, while a Windows or LDAP authentication server will provide information about local staff members logging into network resources. Your monitoring, audit, and security needs will influence the kinds of log sources you end up sending into QRadar.
If we use a load balancer, do our events get parsed by any event collector? Do multiple log sources get created?
Yes, any syslog-based source that sends data to a load balancer in front of QRadar can be parsed on any of the event collectors. All auto-detected log sources in QRadar can be processed by any event collector in the deployment. When auto-detection fires and creates a log source, a "create log source" request is sent to the QRadar console, and the log source is created. Within a minute, all event processors/collectors are aware of this new log source, and any data sent to any event processor is associated with that log source automatically. It was implemented this way so that users can place a load balancer in front of multiple event collectors/processors.
Only 1 log source is created in this scenario. While multiple "create" commands may be sent from multiple processors during the first few minutes of a log source coming in, it is only created once. When the log source management code on the console receives a 'create' command, it checks whether the log source already exists. If it does not, the log source is created; if it does, the additional create request is simply ignored.
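The create-once behaviour described above is an idempotent "create if absent" check. A minimal Python sketch follows; the `LogSourceRegistry` class and its fields are hypothetical stand-ins for illustration and do not reflect QRadar's actual console code.

```python
from typing import Dict


class LogSourceRegistry:
    """Toy stand-in for the console's log source management (hypothetical)."""

    def __init__(self) -> None:
        self._sources: Dict[str, dict] = {}

    def create(self, identifier: str, source_type: str) -> bool:
        """Create the log source if it is absent.

        Returns True if it was created, False if the request was ignored
        because a log source with this identifier already exists.
        """
        if identifier in self._sources:
            return False
        self._sources[identifier] = {"type": source_type}
        return True

    def __len__(self) -> int:
        return len(self._sources)
```

If three processors each send a create request for the same identifier, only the first call creates anything; the other two are ignored and the registry still holds a single log source.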
What do the 3 timestamps seen in event details represent?
There are 3 timestamp fields on events in QRadar - Start Time, Storage Time, and Log Source Time. These can have different values depending on where the data originated, when it arrived, and when it was written in QRadar. Below are details for each time property:
The "Start Time" in an event record represents when the event arrived at a QRadar event collector. When an event arrives in the pipeline, an object is created in memory and its 'start time' is set to that moment.
The "Storage Time" is when the data is written to disk by the Ariel component at the end of the event pipeline. This can be useful for determining whether the event pipeline is backed up for performance or licensing reasons. If you are investigating events "delayed" in the pipeline, or messages about licensing or events dropped because of licensing, compare the Start Time and Storage Time values; the gap between them indicates how delayed the pipeline may be.
Log Source Time
The "Log Source Time" is a timestamp that is pulled from the event payload itself. Most commonly, the time in the event's syslog header is the value that is used, though some log sources include timestamps in the payload, such as Windows logs that have a "MessageTime" field in the body of the payload. Keep this in mind when looking at timestamp discrepancies, and look in payloads for possible timestamp values. If there is no time available in the payload at all, the "Log Source Time" field is populated with the same value as "Start Time".
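As a small illustration of comparing Start Time with Storage Time, the gap can be computed as a simple difference. This is only a sketch: the function names are hypothetical, and the 60-second threshold is an arbitrary assumption rather than a QRadar default.

```python
from datetime import datetime, timedelta


def pipeline_delay(start_time: datetime, storage_time: datetime) -> timedelta:
    """Time an event spent in the pipeline before being written to disk."""
    return storage_time - start_time


def is_delayed(start_time: datetime, storage_time: datetime,
               threshold: timedelta = timedelta(seconds=60)) -> bool:
    """True if the event took longer than `threshold` to reach storage.

    The default threshold is an illustrative assumption.
    """
    return pipeline_delay(start_time, storage_time) > threshold
```

For example, an event with a Start Time of 14:00:00 and a Storage Time of 14:05:30 spent 5 minutes 30 seconds in the pipeline, which this sketch would flag as delayed.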
How does QRadar assign a source & destination ip address to my events?
QRadar events require both a source and a destination IP address; neither can be blank. QRadar looks for an IP address in up to 3 locations, in the following order:
from the event payload - primary method
Supported log sources (and universal log sources, if you create your own parsing regex patterns) search the payload of received events for a source and a destination IP address. When located, these are placed into the associated source and destination fields of the event record. If no IP addresses are available in the payload, the following methods are used.
hostname field in syslog header - second method
If the event payload does not supply both IP addresses, the hostname field of the syslog header is used. This is quite common for events from log sources that include only a "source" address, such as web service logs that include only the remote host IP address. In events such as these, the "destination" IP address is populated from the hostname field of the syslog header.
If the syslog header hostname field contains a hostname and NOT an IP address, no DNS lookup is performed; instead, method 3 below is used.
source ip address of the network packet - third/last method
If no IP addresses are available in a) the event payload or b) the syslog header hostname field, then the source IP address of the network packet is used as the only available address. As with method 2, if the source IP address was pulled from the payload, the packet IP address is used only for the destination address field. This is normally appropriate, because the device that sent QRadar the event message was the destination of the event in the first place.
If neither a source nor a destination IP address was found in the payload or in the syslog header hostname field, then both the source and destination IP addresses of the event are assigned the network packet IP address.
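The three-step fallback described above can be sketched as a single Python function. The function and parameter names are hypothetical. Filling a missing source address from the fallback in the same way as a missing destination is an assumption made by symmetry, since the text only describes the missing-destination case.

```python
import ipaddress
from typing import Optional, Tuple


def _as_ip(value: Optional[str]) -> Optional[str]:
    """Return `value` if it is a literal IP address, otherwise None."""
    if not value:
        return None
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return None  # a hostname: no DNS lookup is attempted
    return value


def assign_addresses(payload_src: Optional[str],
                     payload_dst: Optional[str],
                     header_hostname: Optional[str],
                     packet_ip: str) -> Tuple[str, str]:
    """Pick (source, destination) IP addresses for an event.

    Method 1: addresses parsed from the payload.
    Method 2: syslog header hostname field, only if it is an IP address.
    Method 3: source IP address of the network packet.
    """
    fallback = _as_ip(header_hostname) or packet_ip
    src = payload_src or fallback
    dst = payload_dst or fallback
    return src, dst
```

For instance, a web log that carries only a remote host address and arrives with a string hostname in its syslog header would get the packet IP address as its destination under this sketch.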
IP address caveat - Central Syslog Servers and/or NAT devices
As mentioned above, when QRadar parses IP addresses from events, it has 3 options. When there are no IP addresses available in the payload, the syslog header hostname field is used if it contains an IP address, and as a last resort, the network packet IP address is used.
This becomes an issue for customers who have an existing central syslog infrastructure and want to add a "forwarding" rule to it that copies a stream of all events to the QRadar system. For any sources that do not include IP address fields, or that use hostnames in the syslog header, the IP address that QRadar uses is the packet IP address. This is why many users with a central syslog server often see its IP address in a large number of events, and even in the log source names themselves, for any log sources that are auto-detected by traffic analysis and don't have their own syslog headers.
One way to avoid this with a central syslog server is to configure the syslog server to prepend a new syslog header that includes the original source IP address of the packet that was received. This is a fairly common feature and practice when forwarding events - QRadar itself even offers it as part of the "Forwarding Destinations" options (see attachment). When this is done, the IP address of the original event source device is always present in the syslog header hostname field, and QRadar can use it in the events. With NAT devices, though, this cannot really be done, and you may need to go back to the log source devices themselves and reconfigure them to use the IP address of the host in the syslog header hostname field, rather than a string-based hostname.
For example, syslog-ng refers to this option as "chain_hostnames" - https://www.google.ca/search?q=syslog-ng+chain_hostnames
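To make the header-prepending idea concrete, here is a minimal Python sketch of what a forwarding relay does conceptually. It is not how syslog-ng or QRadar implement it: the function name is hypothetical, the header layout follows the traditional BSD syslog style of priority, timestamp, and hostname, and the default priority of 13 is an arbitrary assumption.

```python
from datetime import datetime

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def prepend_syslog_header(raw: str, sender_ip: str,
                          received: datetime, pri: int = 13) -> str:
    """Prefix `raw` with a syslog header whose hostname field is the
    IP address of the device that originally sent the message."""
    # BSD-style timestamp: the day of month is space-padded to width 2.
    stamp = f"{_MONTHS[received.month - 1]} {received.day:2d} {received:%H:%M:%S}"
    return f"<{pri}>{stamp} {sender_ip} {raw}"
```

With a header like this in place, the hostname field carries the original device's IP address, so the forwarded event no longer depends on the relay's packet IP address.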
For additional information on events, please check this Knowledgebase Article on event data and normalization. If you have any additional questions, feel free to enter them below, and I'll expand on them in this post.