Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information.
The process of knowledge discovery in databases:
Input Data
-> Data Preprocessing (Feature Selection, Dimensionality Reduction, Normalization, Data Subsetting); often the most laborious and time-consuming step
-> Data Mining
-> Postprocessing (Filtering Patterns, Visualization, Pattern Interpretation)
-> Information
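A minimal sketch of the pipeline above, to make the three stages concrete. Everything here is an illustrative assumption, not part of the notes: the toy transaction strings, the function names `preprocess`, `mine`, and `postprocess`, and the choice of pair counting as the mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw input: one comma-separated shopping basket per string.
raw_transactions = [" Bread, milk", "bread,Butter ", "milk, bread, butter", "Milk"]

def preprocess(raw):
    """Preprocessing: turn raw strings into sets of cleaned item labels."""
    return [{item.strip().lower() for item in line.split(",")} for line in raw]

def mine(transactions):
    """Data mining (assumed technique): count how often each item pair co-occurs."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts

def postprocess(patterns, min_count):
    """Postprocessing: filter out patterns seen fewer than min_count times."""
    return {pair: n for pair, n in patterns.items() if n >= min_count}

transactions = preprocess(raw_transactions)
patterns = mine(transactions)
information = postprocess(patterns, min_count=2)
print(information)
```

Each stage consumes the output of the previous one, mirroring the arrows in the outline; in practice each stage is far more involved than a single function.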
The purpose of preprocessing: transform the raw input data into an appropriate format for subsequent analysis.
Steps involved in data preprocessing:
1. fusing data from multiple sources;
2. cleaning data to remove noise and duplicate observations;
3. selecting records and features that are relevant to the data mining task at hand.
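The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the two sources, the field names (`id`, `age`, `income`, `notes`), the rule that a missing value counts as noise, and the income threshold are all hypothetical.

```python
# Hypothetical records coming from two separate sources.
source_a = [
    {"id": 1, "age": 34, "income": 52000, "notes": "call back"},
    {"id": 2, "age": 45, "income": 61000, "notes": ""},
]
source_b = [
    {"id": 2, "age": 45, "income": 61000, "notes": ""},       # duplicate of a record in source_a
    {"id": 3, "age": None, "income": 48000, "notes": "n/a"},  # missing value, treated as noise here
    {"id": 4, "age": 29, "income": 39000, "notes": ""},
]

def fuse(*sources):
    """Step 1: fuse data from multiple sources into one list of records."""
    return [record for source in sources for record in source]

def clean(records):
    """Step 2: drop records with missing values and exact duplicate observations."""
    seen = set()
    cleaned = []
    for record in records:
        if any(value is None for value in record.values()):
            continue  # assumed noise rule: any missing field disqualifies the record
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(record)
    return cleaned

def select(records, features, predicate):
    """Step 3: keep only the records and features relevant to the task."""
    return [{f: r[f] for f in features} for r in records if predicate(r)]

fused = fuse(source_a, source_b)
cleaned = clean(fused)
prepared = select(cleaned, ["age", "income"], lambda r: r["income"] >= 40000)
print(prepared)
```

Real cleaning usually needs more than exact-match deduplication and dropping incomplete rows (for example, resolving near-duplicates or imputing values); the sketch keeps only the simplest version of each step.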