IMAGE: Estimating the variance of the number of clusters and the sample size for which it is maximum can give us an estimate of the total number of clusters for the...
Credit: Ryo Maezono from JAIST.
Ishikawa, Japan - High-performance computing must be able to process vast amounts of data in a short time, a capability on which entire fields, such as data science and Big Data analytics, are built. Usually, the first step in managing a large dataset is either to classify it based on well-defined attributes or, as is typical in machine learning, to "cluster" the data points into groups such that points within a group are more similar to one another than to those in other groups. However, for an extremely large dataset, which can contain trillions of sample points, even grouping the data points into clusters becomes tedious and entails huge memory requirements.
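
To make the memory issue concrete, the following is a minimal Python sketch of clustering data in small batches so that memory use stays bounded by the batch size rather than the dataset size. The use of scikit-learn's MiniBatchKMeans and the synthetic three-blob data are illustrative assumptions, not the method used in the study.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 2-D, standing in for real data.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

# Fit incrementally: each batch is processed and then discarded,
# so memory use depends on the batch size, not the total sample count.
model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
for _ in range(100):  # stand-in for batches streamed from a huge dataset
    batch = rng.normal(loc=centers[rng.integers(3, size=1000)], scale=0.5)
    model.partial_fit(batch)

print(model.cluster_centers_)  # learned centroids, one per cluster

Streaming the data this way trades a small loss in clustering quality for the ability to handle datasets far larger than available memory, which is the bottleneck the paragraph above describes.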