Teiresias-based Pattern Discovery On Integers


What Is This Tool For?

This tool allows the user to carry out pattern discovery on event streams that consist of positive integers.

Why did we do this? In the papers that described the Teiresias algorithm and its applications, we used small alphabets of alphanumeric characters in order to simplify the presentation. Examples include: nucleotides (= {A, C, G, T}), amino acids (= {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}), the English alphabet (= {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}), etc. However, such alphabets do not contain enough distinct symbols for many other problems where pattern discovery can be useful. Consequently, we have implemented a version of Teiresias where the permitted "alphabet" is the set of positive integers. In fact, you can use as many as 2^31 - 1 distinct positive integers to form input streams.

There is a very large number of problems that can be solved using this version of the algorithm. Essentially, any pattern discovery problem that can be converted into a stream of positive integers can be solved with this tool.

__________________________________________________________________________

Input Format

This tool takes as input data lines consisting of space/tab-separated integers. Each new line starts a new event stream. The web version of the tool does not require that label lines precede data lines; moreover, it will automatically add the integer "-1" at the end of each data line so that it can be processed by the Teiresias algorithm. However, if you run this version of Teiresias on the command line, you will need to add label lines and also terminate each data line with "-1".
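As an illustration of the data-line layout described above, the following minimal Python sketch turns an event stream into a space-separated line terminated by "-1", as needed for command-line use. The function name format_data_line and the sample streams are hypothetical. Label lines are deliberately left out because their layout is not described on this page.

```python
def format_data_line(stream):
    """Return one data line: space-separated positive integers ending in -1.

    Hypothetical helper for illustration; it is not part of the tool.
    """
    values = [int(v) for v in stream]
    if not values:
        raise ValueError("an event stream must contain at least one integer")
    for v in values:
        if v <= 0:
            raise ValueError(f"only positive integers are allowed, got {v}")
    # The trailing -1 marks the end of the data line.
    return " ".join(str(v) for v in values + [-1])


# Made-up event streams, one per data line.
streams = [[12, 7, 7, 305], [7, 7, 305, 44, 12]]
lines = [format_data_line(s) for s in streams]
print("\n".join(lines))
```

When pasting into the web version, the trailing "-1" can be omitted, since the tool adds it automatically.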

__________________________________________________________________________

Options

The following option is available to the user:

__________________________________________________________________________

Parameters

The parameters you can set here are the following:

__________________________________________________________________________

Mode Of Operation

The input to be processed consists of streams of space/tab-separated positive integers. We assume that you will re-map the original set of "symbols" to a set of positive integers of your choice which we will then process for you. The data lines are permitted to have different lengths in the general case.
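One possible way to carry out the re-mapping is sketched below in Python. It assigns each distinct symbol the next unused positive integer in order of first appearance and keeps the mapping so that discovered patterns can be translated back. All names (build_symbol_map, encode, decode) and the sample symbols are hypothetical; any one-to-one mapping onto positive integers would serve equally well.

```python
def build_symbol_map(streams):
    """Assign each distinct symbol a positive integer, in order of first appearance."""
    symbol_to_id = {}
    for stream in streams:
        for symbol in stream:
            if symbol not in symbol_to_id:
                symbol_to_id[symbol] = len(symbol_to_id) + 1  # ids start at 1
    return symbol_to_id


def encode(stream, symbol_to_id):
    """Translate a stream of original symbols into positive integers."""
    return [symbol_to_id[s] for s in stream]


def decode(ids, symbol_to_id):
    """Translate positive integers back into the original symbols."""
    id_to_symbol = {i: s for s, i in symbol_to_id.items()}
    return [id_to_symbol[i] for i in ids]


# Made-up event streams of different lengths.
raw_streams = [
    ["login", "view", "view", "buy"],
    ["view", "view", "buy", "logout", "login"],
]
mapping = build_symbol_map(raw_streams)
encoded = [encode(s, mapping) for s in raw_streams]
print(encoded)  # [[1, 2, 2, 3], [2, 2, 3, 4, 1]]
```

Keeping the mapping alongside the encoded input makes it possible to read the reported integer patterns in terms of the original symbols.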

__________________________________________________________________________

Output Format

After you have prepared and entered the input in the provided window, click the COMPUTE button. Once the processing has completed, the results will be reported as follows:

Each line corresponds to an integer-based pattern, with a dot representing a wild-card. The leftmost number in each line is the rank of the pattern. The second and third numbers are, respectively, the number of instances of the pattern and the number of input sequences that contain those instances.
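To make the two counts concrete, the Python sketch below checks a pattern with wild-cards against a few integer streams by brute force and returns the number of instances and the number of sequences containing at least one instance. It is not the Teiresias algorithm, only a way to verify what the counts mean. The space-separated pattern notation ("7 . 305"), the function names, and the sample streams are assumptions made for illustration.

```python
def parse_pattern(text):
    """Parse a pattern such as "7 . 305"; a dot becomes None (wild-card)."""
    return [None if tok == "." else int(tok) for tok in text.split()]


def find_instances(pattern, stream):
    """Return the start offsets at which the pattern matches the stream."""
    width = len(pattern)
    offsets = []
    for start in range(len(stream) - width + 1):
        window = stream[start:start + width]
        # A wild-card matches any integer; other positions must be equal.
        if all(p is None or p == v for p, v in zip(pattern, window)):
            offsets.append(start)
    return offsets


def support(pattern, streams):
    """Return (number of instances, number of streams with an instance)."""
    per_stream = [find_instances(pattern, s) for s in streams]
    instances = sum(len(offsets) for offsets in per_stream)
    sequences = sum(1 for offsets in per_stream if offsets)
    return instances, sequences


# Made-up input streams.
streams = [[12, 7, 9, 305, 7, 1, 305], [7, 7, 305, 44], [3, 4, 5]]
pattern = parse_pattern("7 . 305")
print(support(pattern, streams))  # (3, 2)
```

Here the pattern occurs twice in the first stream and once in the second, giving three instances spread over two sequences.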

Next, you can click on a pattern to select it and then click the SEQUENCES button.

This will open a new window that will show the original input sequences with the instances of the selected pattern highlighted.

__________________________________________________________________________

References