The Journey Begins
I have started on a journey to learn more about data science. There is a lot of science and statistics behind using big data tools, and this blog is about making a few notes along the way.
One of the things I learnt about today was the t-test. There are two common variants, Student's t-test and Welch's t-test. Both are hypothesis tests on two samples: Student's version assumes the two populations have equal variances, while Welch's does not. The t-test can be used, for example, to determine whether the means of two sets of data are significantly different from each other.
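As a rough sketch of what Welch's test computes, the snippet below works out the test statistic by hand and compares it with what scipy.stats.ttest_ind returns when equal_var=False. The two samples are randomly generated for illustration, and the helper name welch_t is my own, not part of SciPy.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's t statistic: difference of means over the combined standard error.

    Hypothetical helper for illustration; each sample keeps its own variance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # ddof=1 gives the unbiased sample variance
    standard_error = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / standard_error

# Synthetic samples with different sizes and different spreads
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=30)
sample_b = rng.normal(loc=0.5, scale=3.0, size=100)

manual_t = welch_t(sample_a, sample_b)
welch = stats.ttest_ind(sample_a, sample_b, equal_var=False)
student = stats.ttest_ind(sample_a, sample_b, equal_var=True)

print("manual Welch t:", manual_t)
print("scipy Welch t: ", welch.statistic, "p =", welch.pvalue)
print("scipy Student t:", student.statistic, "p =", student.pvalue)
```

The hand-computed value should agree with SciPy's Welch statistic, while the Student's version pools the two variances and so generally gives a different number when the sample sizes and spreads differ.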
Below is a piece of code that uses Python's SciPy library to run Welch's t-test.
import scipy.stats
import pandas

def compare_averages(filename):
    baseball_data = pandas.read_csv(filename)
    # split the rows into left-handed and right-handed batters
    l_data = baseball_data[baseball_data['handedness'] == 'L']
    r_data = baseball_data[baseball_data['handedness'] == 'R']
    # select the batting average column of each group
    l_avg = l_data['avg']
    r_avg = r_data['avg']
    # equal_var=False asks for Welch's t-test instead of Student's
    test_statistic, p_value = scipy.stats.ttest_ind(l_avg, r_avg, equal_var=False)
    # hypothesis stays True when we fail to reject the null hypothesis
    # (equal means) at the 5% significance level
    hypothesis = True
    if p_value < 0.05:
        hypothesis = False
    return (hypothesis, (test_statistic, p_value))
The baseball data is from the Lahman database.
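To try the same comparison without downloading the Lahman data, here is a minimal sketch on a tiny made-up table. The rows below are invented purely for illustration and are not taken from the Lahman database; only the column names handedness and avg follow the code above.

```python
import io
import pandas
import scipy.stats

# Made-up rows for illustration only; not real player data.
csv_text = """name,handedness,avg
A,L,0.300
B,L,0.280
C,L,0.310
D,L,0.295
E,R,0.250
F,R,0.265
G,R,0.240
H,R,0.255
"""

# read_csv accepts a file-like object as well as a filename
data = pandas.read_csv(io.StringIO(csv_text))
l_avg = data.loc[data['handedness'] == 'L', 'avg']
r_avg = data.loc[data['handedness'] == 'R', 'avg']

# Welch's t-test, as in the function above
test_statistic, p_value = scipy.stats.ttest_ind(l_avg, r_avg, equal_var=False)

# True when we fail to reject the null hypothesis of equal means at the 5% level
same_mean = p_value >= 0.05
print(same_mean, test_statistic, p_value)
```

With only four rows per group this says nothing about real batters; it just shows the shape of the inputs and outputs.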