With the increase in the amount of data collected, knowledge discovery in databases (KDD) has become an important research area. The goal of KDD is to efficiently identify valid, novel, and non-trivial knowledge in large amounts of data. Within KDD, knowledge discovery from temporal data is of great importance in many application domains, such as engineering, medicine, biology and finance. The research in this doctoral dissertation is set in the context of a long temporal event sequence, that is, a large number of events ordered by their timestamps. We investigate two fundamental research problems in KDD: defining a new type of pattern and discovering it.
Most traditional studies in temporal data mining treat every type of event equally. However, in some applications (e.g., telecommunication network fault analysis), not every type of event is of equal importance to the goals of data analysis. To account for these differences among event types, we define a new type of pattern, called the event-oriented (E-O) pattern, which has either a positive or a negative association with a special type of event, called the target event. E-O patterns can help in understanding the dependency between the target event and other types of events. Moreover, E-O patterns that are strongly associated with the target event can be used to predict the target event.
We classify E-O patterns into four types according to the type of association between the target event and the E-O pattern. Based on this classification, the problem of finding E-O patterns is decomposed into sub-tasks, each of which is studied in detail.
For the purpose of predicting target events, we define the strong event-oriented (S-E-O) pattern, which can be regarded as a variation on the first type of E-O pattern with a strong positive association. After addressing the problem of finding S-E-O patterns, we investigate two related research problems. First, because inaccurate temporal information about events may affect the correctness of the mining result, we propose an uncertainty model to deal with this negative impact. Second, as a post-analysis of the S-E-O pattern mining result, we refine S-E-O patterns by reducing the size of their prediction intervals.