Journal of Computer Science

FROM DATA MINING AND KNOWLEDGE DISCOVERY TO BIG DATA ANALYTICS AND KNOWLEDGE EXTRACTION FOR APPLICATIONS IN SCIENCE

Subana Shanmuganathan

DOI: 10.3844/jcssp.2014.2658.2665

Volume 10, Issue 12

Pages 2658-2665

Abstract

“Data mining” for “knowledge discovery in databases” and the associated computational operations, first introduced in the mid-1990s, can no longer cope with the analytical issues relating to so-called “big data”. The recent buzzword big data refers to large volumes of diverse, dynamic, complex, longitudinal and/or distributed data, noisy and structured or unstructured, generated from instruments, sensors, Internet transactions, email, video, click streams and/or all other digital sources available today and in the future, at speeds and on scales never seen before in human history. Big data, also described using the 3 Vs of volume, variety and velocity (with an additional 4th V for “veracity” and, more recently, a 5th V for “value”), requires a set of new technologies, such as high performance computing (i.e., exascale), architectures (distributed or grid), algorithms (for data clustering and generating association rules), programming languages and automated and scalable software tools, to uncover hidden patterns, unknown correlations and other useful information, lately referred to as “actionable knowledge” or “data products”, from the massive volumes of complex raw data. In view of the above, the paper gives an introduction to the synergistic challenges in “data-intensive” science and “exascale” computing for resolving “big data analytics” and “data science” issues in four main disciplines, namely computer science, computational science, statistics and mathematics. It also outlines the basic problems that need to be addressed adequately in the respective disciplines to realise the foundational aspects identified as vital to an effective cyber infrastructure. Finally, the paper looks at five scientific research projects that are urgently in need of high performance computing; this is in contrast to earlier situations, where private business enterprises were the drivers of better, faster and more modern technologies.

Copyright

© 2014 Subana Shanmuganathan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.