Data Preparation...
Is the most important step when positioning an analysis for modeling. If the data is not properly cleaned and validated, all of your calculations and models will be built on a foundation of inaccurate data.
Everything hangs on the validity and accuracy of the data sets that you decide to use.
For example, misspelled records will not be grouped with the records they belong to.
Say you want to group data from the State of California, and 20% of the records use an abbreviation or a misspelling. Those records fall into separate groups, so you will undercount the true quantity.
Every calculation you then build on the State of California will be incorrect unless the data is cleaned first.
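To make the California example concrete, here is a minimal sketch in Python. The sample records, the `STATE_ALIASES` lookup, and the `normalize_state` helper are all made up for illustration; a real alias table would have to be built from the variants that actually appear in your data.

```python
from collections import Counter

# Hypothetical lookup of known variants, keyed by lowercase spelling.
STATE_ALIASES = {
    "ca": "California",
    "calif": "California",
    "californa": "California",
    "california": "California",
}

def normalize_state(raw: str) -> str:
    """Map a raw state value to its canonical name when a variant is known."""
    key = raw.strip().lower().rstrip(".")
    # Unknown values are returned trimmed but otherwise unchanged.
    return STATE_ALIASES.get(key, raw.strip())

# Made-up sample: 10 records, 2 of them (20%) abbreviated or misspelled.
records = ["California"] * 8 + ["CA", "Californa"]

raw_counts = Counter(records)
clean_counts = Counter(normalize_state(r) for r in records)

print(raw_counts["California"])    # 8  (undercounted)
print(clean_counts["California"])  # 10 (true quantity)
```

Grouping on the raw values misses two of the ten California records, which is exactly the 20% gap described above. Any total or average computed per state would inherit that error.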
Furthermore, the same applies if the data is simply not valid and comes from an unverified source, or if it came from your own company but passed through dormant or improperly ingested and distributed ways of handling data.
How accurate is your data?