###QUESTION 05

###Topic: Sheet 9 Exercise 1
###Questions:
##1) Why do you add "var.equal = FALSE" to the command?
#     In my solution I did not use this argument and got exactly the same result
##2) I also wanted to know whether the argument "paired = FALSE" is necessary in this case
#     or whether it can be skipped because R assumes an unpaired t-test by default?
##3) Finally, I wanted to ask why a comma has to be added inside the square brackets
#     "data.ccrt[data.ccrt$population == "BKK",]" after "BKK", before closing them?

###Answer:
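##1)Leaving out var.equal = FALSE gives exactly the same result, because FALSE is the default (see 2). The result only changes if you set var.equal = TRUE: compare the degrees of freedom (df) and the p-values below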

data.ccrt <- read.table("C:/Users/Ingo/Documents/A-Uni/EES/3rd_semester/Rcourse2020_Tutoring/data/ccrt.txt", header = TRUE)

var_noneq <- t.test(data.ccrt[data.ccrt$population == "KATH",]$ccrt, 
       data.ccrt[data.ccrt$population == "BKK",]$ccrt, 
       paired = FALSE, 
       var.equal = FALSE)

var_eq <- t.test(data.ccrt[data.ccrt$population == "KATH",]$ccrt, 
       data.ccrt[data.ccrt$population == "BKK",]$ccrt, 
       paired = FALSE, 
       var.equal = TRUE)

var_noneq$parameter
##       df 
## 175.9615
var_eq$parameter
##  df 
## 223
var_noneq$p.value
## [1] 3.686528e-10
var_eq$p.value
## [1] 7.150701e-11
#The solution says that we don't know the variances. Strictly, we don't know the population variances, but we can estimate them from the samples (you may remember var = sd^2)
tapply(data.ccrt$ccrt, data.ccrt$population, var)
##       BKK      KATH 
## 141.57834  66.85144
#Even without looking at these estimates we should not assume equal variances, because Bangkok and Kathmandu flies are different populations
#and there is no reason to expect different populations to have the same variance.
#If the variances are assumed to be equal, R performs a Two Sample t-test (Student's t-test); if not, it performs a Welch Two Sample t-test
#You can also see in the output which t-test has been run
var_noneq$method
## [1] "Welch Two Sample t-test"
var_eq$method
## [1] " Two Sample t-test"
#The two tests calculate the standard error (and therefore the t-statistic) and the df differently (more on that in the stats lecture next semester)
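#A minimal sketch of how the two calculations differ, using simulated data (NOT the ccrt data)
#x, y and the numbers passed to rnorm() are made up for illustration only
set.seed(1)
x <- rnorm(122, mean = 37, sd = 8)    #hypothetical sample 1
y <- rnorm(103, mean = 46, sd = 12)   #hypothetical sample 2
nx <- length(x); ny <- length(y)
vx <- var(x);    vy <- var(y)
#Welch: each sample contributes its own variance to the standard error
se_welch <- sqrt(vx/nx + vy/ny)
t_welch  <- (mean(x) - mean(y)) / se_welch
df_welch <- (vx/nx + vy/ny)^2 / ((vx/nx)^2/(nx - 1) + (vy/ny)^2/(ny - 1))
#Equal variances: both samples are pooled into a single variance estimate
var_pooled <- ((nx - 1)*vx + (ny - 1)*vy) / (nx + ny - 2)
se_eq <- sqrt(var_pooled * (1/nx + 1/ny))
t_eq  <- (mean(x) - mean(y)) / se_eq
df_eq <- nx + ny - 2
#The hand-calculated values can be compared with what t.test() reports
all.equal(df_welch, unname(t.test(x, y)$parameter))
all.equal(t_eq, unname(t.test(x, y, var.equal = TRUE)$statistic))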

##2)You are right: if you check the help file (?t.test) you can see that paired = FALSE is the default, so once you know this you can leave it out
#If you are just starting to use a command, however, I would recommend still writing it out to help you remember what the default setting is
#The same applies to var.equal = FALSE, which is also the default and can be left out

t.test(data.ccrt[data.ccrt$population == "KATH",]$ccrt, 
       data.ccrt[data.ccrt$population == "BKK",]$ccrt)
## 
##  Welch Two Sample t-test
## 
## data:  data.ccrt[data.ccrt$population == "KATH", ]$ccrt and data.ccrt[data.ccrt$population == "BKK", ]$ccrt
## t = -6.6436, df = 175.96, p-value = 3.687e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.948003  -6.475203
## sample estimates:
## mean of x mean of y 
##  36.77869  45.99029
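#If you want to check the defaults without opening the help file, one option is to inspect the arguments of the default t.test method
#(a small sketch; t_defaults is just a name chosen here)
t_defaults <- formals(getS3method("t.test", "default"))
t_defaults$paired      #default value of paired
t_defaults$var.equal   #default value of var.equal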
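##3)Inside the square brackets of a data frame the comma separates the row selection (before the comma) from the column selection (after the comma)
#In data.ccrt[data.ccrt$population == "BKK",] the part before the comma picks the rows, and the empty space after it means that you want all columns
#Leaving the comma out makes R treat the condition as a column selection, which results in an "undefined columns selected" error
#Alternatively you can select only one or some of the columns, for example with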
data.ccrt[data.ccrt$population == "BKK",2]
##   [1] 24 25 26 27 27 27 27 27 27 28 29 30 30 31 31 32 32 33 34 34 35 35 36 36 37
##  [26] 37 37 37 38 38 38 39 39 39 40 41 42 42 43 43 43 43 43 44 44 45 45 45 46 46
##  [51] 46 46 46 47 47 47 47 48 48 48 48 49 49 50 51 51 51 51 51 52 53 53 53 54 54
##  [76] 54 54 54 55 55 55 56 58 58 58 58 59 59 59 60 61 61 61 62 62 63 65 68 68 69
## [101] 69 69 70
#which only gives you the ccrt values (column 2) without the population name
#When you leave the space after the comma empty, you are saying that you want all of the columns
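#A small sketch with a made-up data frame to try out the variants (toy and its values are hypothetical, not the ccrt data)
toy <- data.frame(population = c("BKK", "KATH", "BKK"), ccrt = c(40, 35, 50))
is_bkk <- toy$population == "BKK"
toy[is_bkk, ]         #rows where population is BKK, all columns
toy[is_bkk, "ccrt"]   #same rows, only the ccrt column (selecting by name avoids counting column positions)
#Without the comma the logical vector is used to select columns, which fails here
#tryCatch() keeps the script running and stores the error message instead
no_comma <- tryCatch(toy[is_bkk], error = function(e) conditionMessage(e))
no_comma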