Role of Statistics in Data Science
While studying Data Science, I learned that there are several competencies one should have to be a Data Scientist. One of them is Statistics. In this article I will discuss the relevance and importance of statistics in Data Science.
Relevance of Statistics in the field of Data Science
Long before the advent of high-performance computers, statistics was at the forefront of many of the innovations we see today. One good example of its use before computers is research, where statistics quantifies how likely an event is to happen based on measurable observations. We can say that Data Science, at its core, is a marriage between Statistics and Technology. Key statistical concepts such as regression and probability distributions are widely used in the field of Data Science.
Importance of Data Preparation before Data Analysis
Data preparation is probably the most time-consuming part of Data Analysis. Having a clean data set is extremely important for analyzing it and deriving insight. For instance, if we are computing the average age of people in a certain country and one record has an age of 1000, the average is pulled upward, and in a small sample it can land on a value that is unrealistic for a human population. It is always a good idea to check your data before starting your analysis.
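To make the age example concrete, here is a minimal sketch in Python using only the standard library. The ages are made up for illustration, and the 0 to 120 plausible range and the helper name plausible_ages are my own assumptions, not a standard rule.

```python
from statistics import mean, median

# Hypothetical ages, invented for illustration; 1000 is a data-entry error.
ages = [23, 35, 41, 29, 52, 38, 1000]


def plausible_ages(values, lower=0, upper=120):
    """Keep only values inside an assumed plausible human age range."""
    return [v for v in values if lower <= v <= upper]


raw_mean = mean(ages)          # distorted by the single bad record
clean = plausible_ages(ages)   # drops the 1000
clean_mean = mean(clean)       # average of the remaining six ages
raw_median = median(ages)      # the median barely reacts to the outlier

print(f"mean with outlier:    {raw_mean}")
print(f"mean after cleaning:  {clean_mean:.1f}")
print(f"median with outlier:  {raw_median}")
```

With this small sample, the single bad record pushes the mean to 174, while the cleaned mean is about 36.3. The median of the raw data is 38, which shows why it is often used as a quick sanity check next to the mean. Whether to drop, correct, or flag such a record depends on the data set, so treat the filter above as one possible approach.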