Hi guys I didn’t put any post in regarding to the data analytics and visualization i didn’t have time to do that and now we will do a small data analysis using Microsoft Excel.
So you know that we can do many number of calculations like sum, average, difference, median, variance and etcetra in Excel. So today we will do some couple of data analysis calculation with a data set. You can download the data set from this link.
So let us set up the environment in the Excel for the Data analysis. First open your Microsoft Excel if you don’t have Excel please download in this link.
Next import the dataset to the Excel using “open” option in the file.
Now we can see the attributes as Id, Sepal Length, Sepal Width, Petal Length, Petal Width and Species. Actually we will remove Species because it is a categorical value. Now lets start the data analysis using this simple data set.
Before we start the data analysis we need to download some of the add ins to the excel because when we download the excel it will come as the basic version and if we need we can put as many as the add ins and make it as an advanced excel.
First step is click “file” menu and go to options.
Once you click “option” You should get a dialog box like given below.
If you get this dialog box you are fine enough to start your analysis. The next step is to click “Add-Ins” in the dialog box.
Don’t change any settings just click “Go” in the dialog box to enable another dialog box.
In this small dialog box just check in the first two boxes and press “ok” button. Once you are done the excel will download some patch files from internet so you are asked to connected to internet.
Once it is over we need to check where add ins package is downloaded in the excel. Now go to Data section and check if you see a Data Analysis button.
If you cannot see the data analysis button please go back and do the process again the problem expected to be you have not connected to the internet properly that’s why you will get these errors.
Ok let’s start with t-test. What is t-test?
The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis. A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
Lets do it practical by taking two attributes the one is Sepal Length and Petal Length.
Click the “t-test paired Two samples for means”. Once you click that you will get another dialog box.
Give the values which is given above and click ok. The above values depict that B2:B151 tells that Iam taking the value from B’Column and 2'Row to 151'Row. And put the output has H”Row. Click ok.
Check the output the mean depicts the average of the number. Variance is the calculated from the mean. The number of observation shows how many records in the rows. Pearson correlation definition I will tell in the next line. “df” says the difference, the difference is zero because the we took the both the rows as 150. So 150–150=0. And the other or the other statistical calculation of t-test.
Pearson Correlation is a number between -1 and 1 that indicates the extent to which two variables are linearly related. The Pearson correlation is also known as the “product moment correlation coefficient” (PMCC) or simply “correlation”.
Try other statistical calculation and take a screenshot and drop in the comment box. If you have any question please put it in comment box.