Data mining over large data sets is an important research subject because of its commercial potential. However, it is also a major challenge because of its complexity and computational intensity. Exploiting the inherent parallelism of data mining algorithms offers a direct solution, drawing on the large-scale data retrieval and processing power of parallel architectures. In this paper, we classify data mining algorithms according to their most effective parallel structure. We study induction-based classification algorithms, neural networks, clustering algorithms and genetic algorithms. The classification is based on our intensive research into the parallelisation of data mining algorithms. We also present a methodology for determining the appropriate parallelisation strategy, based on algorithmic skeletons and performance modelling. This research aims to provide a systematic way to develop parallel data mining algorithms and applications.