TrAgLor - Turkish Agricultural Learning Objects Repository

Object Details

Efficiency of Random Sampling Based Data Size Reduction on Computing Time and Validity of Clustering in Data Mining

Identifier : Catalog : URI
Entry : http://journal.magisz.org/index.php/jai/article/download/266/pdf_266


Title : Efficiency of Random Sampling Based Data Size Reduction on Computing Time and Validity of Clustering in Data Mining


Language : English



Description : In data mining, cluster analysis is one of the most widely used analytical techniques for discovering existing groups in datasets. However, traditional clustering algorithms have become inadequate for analyzing big data, which has emerged from the enormous growth in the amount of data collected in recent years. Consequently, scalability has become one of the most intensively studied research topics in clustering big data. Parallel clustering algorithms and MapReduce-based techniques running on multiple machines are becoming popular ways to scale big data analysis. Nevertheless, applying sampling techniques to big datasets can still serve as an alternative or complementary approach that allows traditional algorithms to run on a single machine. The results of this study showed that data size reduction by simple random sampling can be used successfully in cluster analysis of large datasets. The clustering validity obtained by running the K-means algorithm on the sample datasets was as high as that obtained on the complete datasets. Additionally, the execution time required for cluster analysis of the sample datasets was significantly shorter than that required for the complete datasets.
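
The sampling-then-clustering idea summarized in this description can be illustrated with a small Python sketch. It is not the authors' code: the synthetic three-cluster data, the 10% sampling fraction, k-means++ seeding with three restarts, and the Rand index (one common external validity index; this record does not say which indices the study used) are illustrative assumptions, and all helper names are hypothetical. The sketch draws a simple random sample without replacement, runs K-means on both the complete and the sampled data, and reports the Rand index against the known labels together with the wall-clock time of each run.

```python
import random
import time
from collections import Counter
from math import comb

# All helper names below are hypothetical illustrations, not from the paper.


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns (labels, SSE)."""
    # k-means++ seeding: each new center is drawn with probability
    # proportional to its squared distance from the nearest existing center
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for (x, y), lab in zip(points, labels):
            sums[lab][0] += x
            sums[lab][1] += y
            sums[lab][2] += 1
        new_centers = [(sx / cnt, sy / cnt) if cnt else centers[j]  # keep old center if empty
                       for j, (sx, sy, cnt) in enumerate(sums)]
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def rand_index(true_labels, pred_labels):
    """External validity: fraction of point pairs on which two partitions agree."""
    pairs = comb(len(true_labels), 2)
    same_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return (pairs + 2 * same_both - same_true - same_pred) / pairs


def make_blobs(n_per_cluster, rng):
    """Assumed test data: three well-separated Gaussian blobs with known labels."""
    means = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
    points, labels = [], []
    for lab, (mx, my) in enumerate(means):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)))
            labels.append(lab)
    return points, labels


def evaluate(points, labels, k, rng, n_init=3):
    """Best-of-n_init K-means (lowest SSE); returns (Rand index, elapsed seconds)."""
    start = time.perf_counter()
    best_labels, _ = min((kmeans(points, k, rng) for _ in range(n_init)), key=lambda r: r[1])
    elapsed = time.perf_counter() - start
    return rand_index(labels, best_labels), elapsed


def compare(n_per_cluster=2000, fraction=0.1, k=3, seed=42):
    """Cluster the complete data and a simple random sample of it."""
    rng = random.Random(seed)
    points, labels = make_blobs(n_per_cluster, rng)
    # Simple random sampling without replacement: keep `fraction` of the rows
    idx = rng.sample(range(len(points)), round(fraction * len(points)))
    sample_points = [points[i] for i in idx]
    sample_labels = [labels[i] for i in idx]
    ri_full, t_full = evaluate(points, labels, k, rng)
    ri_sample, t_sample = evaluate(sample_points, sample_labels, k, rng)
    return {"ri_full": ri_full, "t_full": t_full,
            "ri_sample": ri_sample, "t_sample": t_sample, "n_sample": len(idx)}


if __name__ == "__main__":
    r = compare()
    print(f"full:   RI={r['ri_full']:.4f}  time={r['t_full']:.3f}s")
    print(f"sample: RI={r['ri_sample']:.4f}  time={r['t_sample']:.3f}s  (n={r['n_sample']})")
```

With clusters this well separated, both runs are expected to recover the groups almost perfectly while the sample run processes one tenth of the points; actual timings depend on the machine, and nothing here reproduces the study's reported figures.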


Keywords : data reduction, random sampling, cluster analysis, external validity indices, big data, k-means clustering


Coverage : World


Structure : Atomic


Aggregation Level : Level 1


Version : JAI, 2016


Status : Final


Contribute : Role : Publisher
Date : 2016-05-03
name : Journal of Agricultural Informatics
e-mail : herdon@agr.unideb.hu
organization : The Hungarian Association of Agricultural Informatics




Identifier : Catalog : URI
Entry : http://traglor.cu.edu.tr/common/object_xml.aspx?id=1952


Contribute : Role : Author
Date : 2016-05-03
name : Zeynel Cebeci
e-mail : cebeciz@gmail.com
organization : Çukurova University, Faculty of Agriculture, Department of Biometry and Genetics (Çukurova Üniversitesi Ziraat Fakültesi Biyometri ve Genetik Anabilim Dalı)


Metadata Schema : TrAgLor LOM AP


Language : English
Format : Text


Requirements : Operating System: Multi-OS
Browser: Any


Duration : Year : 0 Month : 0 Day : 0 Hour : 0 Minutes : 0


Location : http://journal.magisz.org/index.php/jai/article/download/266/pdf_266


Interactivity Type : Expositive


Learning Resource Type : Research


Interactivity Level : Low


Semantic Density : Very High


Intended End User Role : Other


Context : University Postgraduate


Typical Age Range : 18 and over


Difficulty Level : Difficult


Duration : Year : 0 Month : 0 Day : 0 Hour : 0 Minutes : 0


Cost : No


Copyright and Other Restrictions : Yes


Description : This resource is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license (CC BY-NC-ND 3.0).


Kind : IsPartOf


Resource : Catalog : ISSN
Entry : 2061-862X


Description : Journal of Agricultural Informatics (ISSN 2061-862X), 2016, Vol. 7, No. 1, pp. 53-64


Purpose : Discipline


Source : AGRICOLA


Entry : Computer and Library Sciences


Keywords : data mining, data reduction, cluster analysis, random sampling