Background of sequential patterns
Sequential patterns are frequent patterns that are available in one or several successive transactions of many input sequences. Due to its scalability, the PrefixSpan algorithm is used for sequential patterns mining.
The PrefixSPan algorithm organizes the search for frequent patterns by systematically considering item sets and item set sequences of increasing size in consecutive iterations. The PrefixSPan algorithm does multiple passes through the data and creates iteratively several prefix trees, one per prefix. A prefix is a frequent item. All prefix trees together are a compact representation of the data set contents. Nodes of the tree represent single frequent items, and store their occurrence counts and the transaction time in which they are found. The path from the root node of the tree to the node represents a sequential pattern.
The PrefixSpan algorithm searches the complete set of patterns but avoids the creation of unnecessary candidates. Moreover, sorting the items and prefix-projection reduce the size of projected databases significantly and lead to efficient processing.