Data Intensive Mediator-based Web Services Composition with Self-tuning Histogram

Zhang, Yu (2006). Data Intensive Mediator-based Web Services Composition with Self-tuning Histogram. MPhil Thesis, School of Information Technology and Electrical Engineering, University of Queensland.

Attached Files
Name                   MIME Type        Size      Downloads
n01front_Zhang.pdf     application/pdf  143.24KB  3
n02content_Zhang.pdf   application/pdf  654.09KB  1
Author Zhang, Yu
Thesis Title Data Intensive Mediator-based Web Services Composition with Self-tuning Histogram
School, Centre or Institute School of Information Technology and Electrical Engineering
Institution University of Queensland
Publication date 2006
Thesis type MPhil Thesis
Supervisor Assoc. Prof. Xiaofang Zhou
Total pages 136
Subjects 291799 Communications Technologies not elsewhere classified
Abstract/Summary Effectively using heterogeneous, distributed information has attracted much research in recent years. Current web services technologies have been used successfully in some non-data-intensive distributed systems, but the performance of web services applied to data intensive distributed systems still needs investigation. Going a step further, combining several distributed services normally requires web service composition techniques, and different composition plans can end up with very different performance. In a data intensive environment, choosing an optimal plan becomes even more critical. For example, the Environmental Protection Agency, Queensland (EPA) needs an effective and optimal solution for providing spatial information services drawn from several organizations. The volumes of data that need to be transferred and processed are extremely high: a bad composition plan may run for days, whereas a good plan may take only a couple of minutes. Our work therefore provides EPA with a system prototype that is smart enough to effectively provide spatial information services using web services over the Internet.

To make the system able to identify the best plan, we propose a cost model that predicts and estimates the cost of each plan. This cost model requires data selectivity estimation as a key parameter; in other words, the more precise the selectivity estimates over the underlying datasets, the more likely the system is to pick an optimal plan. To obtain more accurate estimates and to suit the dynamic nature of web services, we propose a histogram and parametric technique, based on learning from query execution feedback, to estimate range query selectivity. Histograms have been studied extensively in the context of selectivity estimation and approximate query processing. Compared with static histograms, which require periodic reconstruction to reflect changes in the underlying data distribution, workload-aware dynamic histograms tune themselves based on user query feedback. Without systematically scanning or sampling the underlying datasets, dynamic histograms allocate more buckets not only to the areas with the most skewed data distribution but also according to users' interests. A major limitation of such an approach, however, is that it takes a long time to warm up (i.e., a large number of queries must be processed before the histogram can provide satisfactory coverage and accuracy), and it adapts less effectively to workload changes.

To sum up, in this thesis we investigate the performance issues of applying web services in data intensive environments. To optimize dynamic web service composition strategies, we propose a cost model for comparing different composition plans, and we develop a self-tuning histogram and parametric technique to estimate key parameters of the proposed cost model. This is the core contribution of our work, as it has a direct impact on the system's performance.
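The central idea of the abstract, refining a histogram from query execution feedback rather than by rescanning the underlying data, can be illustrated with a small sketch. The Python code below is a hypothetical, minimal example only: the class name, the damping factor alpha, and the proportional refinement rule are assumptions in the spirit of feedback-based self-tuning histograms, not the exact technique developed in the thesis. It maintains an equi-width histogram, estimates range query selectivity under a uniformity assumption within buckets, and folds the observed result size of each executed query back into the bucket frequencies.

# Illustrative sketch only: a minimal query-feedback self-tuning histogram for
# range selectivity estimation. Names and the refinement rule are assumptions,
# not the thesis's actual algorithm.

from dataclasses import dataclass, field
from typing import List


@dataclass
class SelfTuningHistogram:
    lo: float                      # lower bound of the attribute domain
    hi: float                      # upper bound of the attribute domain
    num_buckets: int = 20
    alpha: float = 0.5             # damping factor for feedback refinement
    freqs: List[float] = field(default_factory=list)

    def __post_init__(self):
        # Start from a uniform assumption: equal frequency in every bucket.
        self.freqs = [1.0] * self.num_buckets
        self.width = (self.hi - self.lo) / self.num_buckets

    def _overlap(self, i: int, a: float, b: float) -> float:
        """Fraction of bucket i covered by the query range [a, b]."""
        b_lo = self.lo + i * self.width
        b_hi = b_lo + self.width
        return max(0.0, min(b, b_hi) - max(a, b_lo)) / self.width

    def estimate(self, a: float, b: float) -> float:
        """Estimated result size of the range query [a, b]."""
        return sum(self._overlap(i, a, b) * f for i, f in enumerate(self.freqs))

    def refine(self, a: float, b: float, actual: float) -> None:
        """Fold the observed result size of [a, b] back into the buckets."""
        error = self.alpha * (actual - self.estimate(a, b))
        overlaps = [self._overlap(i, a, b) for i in range(self.num_buckets)]
        covered = sum(o * f for o, f in zip(overlaps, self.freqs))
        for i, o in enumerate(overlaps):
            if o == 0.0:
                continue
            # Distribute the error proportionally to each bucket's current
            # share of the estimate (fall back to overlap share when empty).
            share = (o * self.freqs[i] / covered) if covered > 0 else o / sum(overlaps)
            self.freqs[i] = max(0.0, self.freqs[i] + error * share)


# Usage: after each range query executes, feed the true cardinality back in.
h = SelfTuningHistogram(lo=0.0, hi=100.0)
print(h.estimate(10.0, 30.0))      # estimate before any feedback
h.refine(10.0, 30.0, actual=450)   # observed result size from execution feedback
print(h.estimate(10.0, 30.0))      # estimate now moves toward the observation

The sketch shows only the histogram side of the estimation problem; the thesis additionally combines such feedback-driven refinement with a parametric technique and uses the resulting selectivity estimates as inputs to the composition cost model.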

 
Created: Fri, 21 Nov 2008, 14:50:32 EST