Streaming time series data have been attracting huge interests in database area and data mining research for decades, due to the its important role in modeling the data properties in various real applications such as processing medical, financial and scientific data. Although existing database techniques have been extended for processing streaming data, the Data Stream Management System can only handle limited number of operations. The practical applications introduce more challenges on the adaptivity and efficiency and desire novel and advanced techniques for online processing. This work utilizes the approximation techniques to address some challenging problems for streaming time series data. Based on some essential studies of the piecewise linear approximation on streaming data, this work further addresses some advanced applications in stream mining, and designs efficient algorithms making use of linear approximation technique.
Along with the review of current research, this thesis addresses the optimized error-bound linear approximation on data stream, which is a fundamental problem for advanced applications. By summarizing several essential properties of the linear function and the convex hull bounding the feasible region, this work interprets the proposed optimal algorithms, which produce the Piecewise Linear Representation with linear time and space complexity. The comparison with existing works demonstrates the superiority of the proposed solutions.
The buffer technique has been applied to optimize error-bound linear representation in streaming scenario. In multi-stream environment, the overall buffer should be divided and allocated to each stream. Since buffer size can affect the approximation quality, this thesis addresses the allocation of the limited overall buffer resource, which aims to optimize the overall approximation quality by adjusting the buffer assigned to each stream. By discovering the statistical relationship between the data behavior and the buffer demand, this work designs a dynamic framework to decide the buffer allocation, which is adaptive to the actual buffer demand and optimizes the overall approximation quality.
This thesis also focuses on the detection of local correlation for streaming data, which is an important issue in stream mining. Such problem is more challenging when there can be certain time delay between correlated subsequences, which is called lag correlation. Based on the fundamental results of Fast Fourier Transform, this work designs the basic solution to detect correlation supporting time delay. This thesis further utilizes the results of linear approximation to estimate the linear relationship between subsequences, which can effectively accelerate the correlation detection, and satisfy the online requirement.
This work also puts efforts to apply approximation technique in real applications. The continuous detection of online video near-duplicates can be modeled as the continuous query on streaming data, however the high dimensionality of video data results in the failure of conventional pattern matching solutions. By discovering the inherent relationship between video frames, this work designs an approximation signature named Video Trend Stream to effectively reduce the dimensionality. This thesis discovers the information of content structure in video data, and proposes cut signature to accelerate the online processing.