Data analysis begins with a question that we are trying to answer. Once we have the question and have identified our data sources, the analysis itself can begin.
The three key stages of analytics are:
- Preparing to run the model/analysis
- Running the model/analysis
- Interpreting and communicating the results
Stage 1: Preparing to run the model: This is the most challenging and most critical of all the steps involved. The data we get could be structured or unstructured, and it is not always in the format we need. The challenge is to clean and process it until it is in a state where we can start our analysis.
Some of the steps involved are:
- Collection and acquisition
Clean: The raw data that we access is usually dirty and not in a state where we can start the analysis work. For example, some data points may be missing, or date and currency formats may be inconsistent across records. The first step is therefore to clean the data. Tools like openrefine.org can be used for this.
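As a rough illustration of the cleaning step, the sketch below normalizes inconsistent dates and currency strings and marks missing values explicitly. The records, the list of date formats, and the helper names `parse_date` and `parse_amount` are all made up for this example; real data will need its own rules.

```python
from datetime import datetime

# Hypothetical raw records: mixed date formats, mixed currency notation,
# and one missing amount.
raw_records = [
    {"date": "2021-03-05", "amount": "$1,200.50"},
    {"date": "05/03/2021", "amount": "USD 300"},
    {"date": "07.03.2021", "amount": ""},
]

# Assumed set of formats present in the data (day comes before month).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known format; return an ISO date string, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Drop currency symbols and thousands separators; return None if missing."""
    digits = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(digits) if digits else None


cleaned = [
    {"date": parse_date(r["date"]), "amount": parse_amount(r["amount"])}
    for r in raw_records
]

for row in cleaned:
    print(row)
```

Missing values are kept as `None` rather than silently replaced with zero, so that later steps can decide how to treat them.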
Shape: Before we begin our analysis, the data should be in tidy format, which means that each variable we measure is in its own column and each observation of that variable is in its own row. Data in wide format may have to be converted into narrow format before analysis.
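The wide-to-narrow conversion can be sketched in a few lines. The table below (cities with one sales column per year) and the function name `to_narrow` are hypothetical, chosen only to show the reshaping.

```python
# Hypothetical wide table: one row per city, one column per year.
wide = [
    {"city": "Pune", "2019": 10, "2020": 12},
    {"city": "Oslo", "2019": 7, "2020": 9},
]


def to_narrow(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    narrow = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            narrow.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return narrow


narrow = to_narrow(wide, "city", "year", "sales")

for row in narrow:
    print(row)
```

After the conversion, "year" and "sales" are each a single column, and every city/year pair is a separate row.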
Augmenting: Sometimes the data is spread across multiple tables or sources. To do the analysis and get a complete picture, the data should be combined from these various sources.
Stage 2: Conducting the analysis: Once we have processed the raw data, we can start the analysis work. Analysis can be of many types, depending on the type of study. It can range from a simple descriptive analysis to predictive analysis.
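One common way to augment a table is to join it with another table on a shared key. The sketch below assumes two made-up tables, `orders` and `customers`, linked by a `customer_id` column; `left_join` is an illustrative helper, not a library function.

```python
# Hypothetical tables that share the key "customer_id".
orders = [
    {"order_id": 1, "customer_id": "c1", "total": 40.0},
    {"order_id": 2, "customer_id": "c2", "total": 15.5},
    {"order_id": 3, "customer_id": "c9", "total": 8.0},  # no matching customer
]
customers = [
    {"customer_id": "c1", "country": "IN"},
    {"customer_id": "c2", "country": "NO"},
]


def left_join(left, right, key):
    """Keep every row of `left`; add columns from the matching `right` row.

    Rows without a match get None in the added columns.
    Assumes `key` is unique in `right`.
    """
    lookup = {row[key]: row for row in right}
    right_columns = sorted({c for row in right for c in row if c != key})
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        extra = {c: match.get(c) for c in right_columns}
        joined.append({**row, **extra})
    return joined


augmented = left_join(orders, customers, "customer_id")

for row in augmented:
    print(row)
```

Keeping unmatched rows, instead of dropping them, makes gaps between the sources visible before the analysis starts.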
- Descriptive Analysis
Descriptive analysis is about describing the data, and it does not involve decision making. Exploratory analysis, as the name suggests, is about exploring the data to discover new relationships or connections, but not necessarily to confirm them. Discovering a relationship can then lead to more rigorous analysis. Inferential analysis is about drawing inferences about the population at large through statistical analysis of a small random sample drawn from that population. Predictive analysis is about predicting the value of a variable after training a predictive model on a training dataset.
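To make the two ends of that range concrete, the sketch below first summarizes a small made-up dataset (descriptive) and then fits a straight line by ordinary least squares to predict an unseen value (predictive). The numbers and the names `hours`, `scores`, and `fit_line` are assumptions for illustration; a real study would pick a model suited to its data.

```python
from statistics import mean, median, stdev

# Hypothetical training data: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Descriptive analysis: summarize the data without drawing conclusions.
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "stdev": stdev(scores),
}
print(summary)


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive analysis: train on the known data, then predict a new case.
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept
print(f"predicted score for 6 hours: {predicted:.1f}")
```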
Stage 3: Interpreting and communicating the results: Once the analysis is complete, the results can be published for a wider audience. Since that audience may not be technical, it is important to communicate the results in a manner they can readily understand.
Visualization makes analytics products available to the broadest possible audience. It helps us make sense of complexity. Appropriate use of charts, visualizations and dashboards makes the communication of results more effective.
Data can be visually represented in many formats, such as Sankey diagrams, charts and tables. Once we have the data/results in pictorial form, we can start interpreting them in the following ways:
- Look for outliers
- Look for similarities and differences
- Look for patterns
- Look for trends
- View on maps
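The first check in the list, looking for outliers, can also be done numerically alongside a chart. The sketch below uses the common 1.5 x IQR rule, which is one convention among several; the readings and the function name `iqr_outliers` are made up for the example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical readings with one obviously unusual value.
readings = [10, 12, 11, 13, 12, 11, 95, 10, 12]
outliers = iqr_outliers(readings)
print(outliers)
```

A flagged value is only a candidate for closer inspection; whether it is an error or a genuine observation depends on the context of the data.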