New system cleans messy data tables automatically

May 12, 2021 | MIT
MIT researchers have created a new system that automatically cleans “dirty data” — the typos, duplicates, missing values, misspellings, and inconsistencies dreaded by data analysts, data engineers, and data scientists. The system, called PClean, is the latest in a series of domain-specific probabilistic programming languages written by researchers at the Probabilistic Computing Project that aim to simplify and automate the development of AI applications (others include one for
According to surveys conducted by Anaconda and Figure Eight, data cleaning can take a quarter of a data scientist’s time. Automating the task is challenging because different datasets require different types of cleaning, and common-sense judgment calls about objects in the world are often needed (e.g., which of several cities called “Beverly Hills” someone lives in). PClean provides generic common-sense models for these kinds of judgment calls that can be customized to specific databases and types of errors.
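The "Beverly Hills" judgment call can be illustrated with a small Bayes' rule calculation. This is only a minimal sketch in Python, not PClean's actual code or API: the candidate list, the prior weights, and the likelihood values are all hypothetical numbers chosen for illustration, and the helper names `posterior` and `most_likely` are assumptions of this example.

```python
def posterior(prior, likelihood):
    """Combine a prior and a likelihood over the same candidates (Bayes' rule)."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: weight / total for c, weight in unnormalized.items()}


def most_likely(distribution):
    """Return the candidate with the highest probability."""
    return max(distribution, key=distribution.get)


# Hypothetical prior: how plausible each city is before looking at the row.
PRIOR = {
    "Beverly Hills, California": 0.85,
    "Beverly Hills, Florida": 0.08,
    "Beverly Hills, Texas": 0.05,
    "Beverly Hills, Missouri": 0.02,
}

# Hypothetical likelihood: how well the row's other fields (for example,
# a ZIP code or phone prefix) fit each candidate city.
LIKELIHOOD = {
    "Beverly Hills, California": 0.01,
    "Beverly Hills, Florida": 0.90,
    "Beverly Hills, Texas": 0.01,
    "Beverly Hills, Missouri": 0.01,
}

# With no other evidence, the prior alone decides.
GUESS_WITHOUT_EVIDENCE = most_likely(PRIOR)

# With evidence from the rest of the row, the answer can change.
POSTERIOR = posterior(PRIOR, LIKELIHOOD)
GUESS_WITH_EVIDENCE = most_likely(POSTERIOR)

if __name__ == "__main__":
    print("Without evidence:", GUESS_WITHOUT_EVIDENCE)
    print("With evidence:   ", GUESS_WITH_EVIDENCE)
```

The point of the sketch is that a well-known city wins by default, but strong evidence elsewhere in the row can override that default. Customizing a model to a specific database amounts, roughly, to choosing priors and error models that fit that data.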
