### QUESTION 2
#
## Topic: Sheet 8 Exercise 2
## Question:
# So my question here would be: would the for loop then not be necessary for vectors? (regarding the two codes below)
# If so, would it only be necessary for data frames, or also for lists and matrices? (when writing functions)
# code that student provided:
which.NNA<- function(data) {
for (i in 1:length(data)){
NNA<- which(is.na(c(data)))
fNNA<- data[-NNA]
}
return(fNNA)
}
# code from the solutions
which.na <- function(x){
return (which(is.na(x)))
}
## Answer:
# Although you define a for loop, you are not actually using it in the function (notice that you
# didn't use i anywhere in the function), so the same two lines simply run length(data) times.
# Additionally, you could delete c() from the NNA expression: c() turns a data frame into a list
# of columns, so is.na() would no longer check the individual cells. Try your function with the data below.
# Why don't we need a for loop? Explanation:
# Let's create a data frame, a vector and a matrix to see what happens:
test_dataframe <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
favourite_number = c(4, 2, NA, 89, 35, NA))
test_vector <- c(4, 2, NA, 89, 35, NA)
test_matrix <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)
# Now try which(is.na()) on each data type:
which(is.na(test_dataframe))
## [1] 3 5 9 12
# Explanation: the which() function returns the vector of indices where the values are NA.
# When we applied it to the data frame, we got 3, 5, 9 and 12. That's because is.na() turns the
# data frame into a logical matrix, and its cells are numbered column by column as follows:
# 1 7
# 2 8
# 3 9
# 4 10
# 5 11
# 6 12
# So you would not need a for loop to get these default indices, if that's your goal.
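# A possible follow-up (a sketch, not part of the sheet solution): if you would rather have row and
# column positions than the single running index, which() accepts the argument arr.ind = TRUE.
# demo_df is a hypothetical copy of test_dataframe, redefined so that the snippet runs on its own.
demo_df <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                      favourite_number = c(4, 2, NA, 89, 35, NA))
na_positions <- which(is.na(demo_df), arr.ind = TRUE)
na_positions
# Each row of na_positions describes one NA: the "row" column holds the row number and the
# "col" column holds the column number, so index 9 from above shows up as row 3, col 2.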
# Let's try it with the vector.
which(is.na(test_vector))
## [1] 3 6
# Since vectors are one-dimensional, the indices run along a single line.
# In our test vector the indices are 1 2 3 4 5 6, so we have NA at positions 3 and 6.
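# Coming back to your function: dropping the NAs from a vector does not need a loop either.
# A minimal sketch (remove_na and demo_vector are hypothetical names, not from the sheet):
remove_na <- function(x) {
  # logical indexing keeps every element that is not NA
  x[!is.na(x)]
}
demo_vector <- c(4, 2, NA, 89, 35, NA)
remove_na(demo_vector) # returns the values 4, 2, 89 and 35
# One reason to prefer this over data[-which(is.na(data))]: when there is no NA at all, which()
# returns integer(0), and data[-integer(0)] is an empty vector. Logical indexing avoids that:
remove_na(c(1, 2, 3)) # returns 1, 2 and 3 unchanged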
# And finally the matrix. Note: when we created the matrix we added the argument byrow = TRUE,
# which means that the matrix is filled row by row. Check what the matrix looks like, and try which().
which(is.na(test_matrix))
## [1] 5 6
# The output is 5 and 6 because of the matrix shape. Indexing works as in the data frame:
# the cells are numbered column by column, regardless of byrow. So we have these indices:
# 1 3 5
# 2 4 6
# and that's why we got 5 and 6 as an output.
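# To see this layout for any matrix, you can fill a matrix of the same shape with its own
# running indices (a small sketch; demo_matrix is a hypothetical copy of test_matrix,
# redefined so that the snippet runs on its own):
demo_matrix <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)
index_layout <- matrix(seq_along(demo_matrix), nrow = nrow(demo_matrix))
index_layout
# index_layout[1, 3] is 5 and index_layout[2, 3] is 6, matching the two NA cells,
# and demo_matrix[5] addresses the same cell as demo_matrix[1, 3].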
# When would we use a for loop? E.g. when we would like to report, for each column, whether it
# contains NA values and in which rows. Here is an example of such a for loop that will hopefully
# help a bit (compare the output with the original data frame, and try it with the matrix as well).
for (j in seq_len(ncol(test_dataframe))) {
  # ncol() gives the total number of columns, so this for loop checks each column in turn
  if (sum(is.na(test_dataframe[, j])) > 0) {
    # this if says: if there is any NA in column j, do the following:
    print(paste("There is NA in the column:", j)) # pastes the sentence and the column number together and prints them
    na_rows <- which(is.na(test_dataframe[, j]))
    # na_rows stores the vector of all row indices where column j is NA
    print("in the rows:") # prints text
    print(na_rows) # prints the vector with the row numbers
  }
}
## [1] "There is NA in the column: 1"
## [1] "in the rows:"
## [1] 3 5
## [1] "There is NA in the column: 2"
## [1] "in the rows:"
## [1] 3 6
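# If you only need the number of NAs per column, a loop-free sketch is also possible, because
# is.na() on a data frame returns a logical matrix and colSums() counts the TRUE values per column.
# demo_df is a hypothetical copy of test_dataframe, redefined so that the snippet runs on its own.
demo_df <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                      favourite_number = c(4, 2, NA, 89, 35, NA))
na_per_column <- colSums(is.na(demo_df))
na_per_column
# Both columns contain 2 NAs here; the loop above is still the way to go when you want
# a custom message per column.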