IMAGE: Estimating the variance of the number of clusters and the sample size for which it is maximum can give us an estimate of the total number of clusters for the...
Credit: Ryo Maezono from JAIST.
Ishikawa, Japan - High-performance computing must be able to process vast amounts of data in a short time, a capability on which entire fields, such as data science and Big Data analytics, are built. Usually, the first step in managing a large dataset is either to classify it based on well-defined attributes or, as is typical in machine learning, to "cluster" the data points into groups such that points within a group are more similar to one another than to those in other groups. However, for an extremely large dataset, which can contain trillions of sample points, even grouping the data points into clusters becomes tedious and entails huge memory requirements.
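
To make the memory issue concrete, the following is a minimal Python sketch of clustering data in small batches so that memory use stays bounded by the batch size rather than the dataset size. The use of scikit-learn's MiniBatchKMeans and the synthetic three-blob data are illustrative assumptions, not the method used in the study.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 2-D, standing in for real data.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

# Fit incrementally: each batch is processed and then discarded,
# so memory use depends on the batch size, not the total sample count.
model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
for _ in range(100):  # stand-in for batches streamed from a huge dataset
    batch = rng.normal(loc=centers[rng.integers(3, size=1000)], scale=0.5)
    model.partial_fit(batch)

print(model.cluster_centers_)  # learned centroids, one per cluster

Streaming the data this way trades a small loss in clustering quality for the ability to handle datasets far larger than available memory, which is the bottleneck the paragraph above describes.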