### QUESTION 2
#
## Topic: Sheet 8 Exercise 2

## Question:
# So my question here would be: is the for loop not necessary for vectors, then? (regarding the two code snippets below)
# If so, would it only be necessary for data frames, or also for lists and matrices (when writing functions)?

# code that the student provided:
which.NNA <- function(data) {
  for (i in 1:length(data)) {
    NNA <- which(is.na(c(data)))
    fNNA <- data[-NNA]
  }
  return(fNNA)
}

# code from the solutions

which.na <- function(x) {
  return(which(is.na(x)))
}

# Although you define a for loop, you never actually use it inside the function (notice that
# i does not appear anywhere in the loop body), so the same two lines are simply repeated
# length(data) times. Additionally, you could remove c() from the NNA expression: c() turns a
# data frame into a plain list, whereas without c() is.na() checks every cell of a data frame
# as well. Try your function with the data below:

# Why don't we need a for loop? Explanation:
# Let's create a data frame, a vector and a matrix to see what happens:

test_dataframe <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                             favourite_number = c(4, 2, NA, 89, 35, NA))

test_vector <- c(4, 2, NA, 89, 35, NA)

test_matrix <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)

# Now try which(is.na()) on each data type:

which(is.na(test_dataframe))
##   3  5  9 12
# Explanation: which(is.na()) returns the vector of indices where the values are NA.
# When we apply it to the data frame we get 3, 5, 9 and 12, because the cells of this
# data frame are numbered column by column, as follows:
#      1 7
#      2 8
#      3 9
#      4 10
#      5 11
#      6 12
# So you do not need a for loop if these linear indices are what you are after.
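
# A small sketch (not part of the original solution) of how such an index maps back to a
# row and a column. It assumes the column-by-column numbering shown in the table above;
# n_rows, idx, row_of_idx and col_of_idx are illustrative names.
n_rows <- 6                               # number of rows in the data frame
idx <- c(3, 5, 9, 12)                     # indices returned by which(is.na(...))
row_of_idx <- (idx - 1) %% n_rows + 1     # position within the column
col_of_idx <- (idx - 1) %/% n_rows + 1    # number of full columns before it, plus one
cbind(index = idx, row = row_of_idx, col = col_of_idx)
# e.g. index 9 is row 3 of column 2, and index 12 is row 6 of column 2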

# Let's try it with the vector.
which(is.na(test_vector))
##  3 6
# Since vectors are one-dimensional, the indices run along a single line.
# In our test vector the indices are 1 2 3 4 5 6, so we have NA at positions 3 and 6.
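
# A possible loop-free version of your function for vectors (a sketch only; drop_na_sketch
# and example_vector are illustrative names, not from the exercise sheet). It uses logical
# indexing instead of data[-NNA], because x[-which(is.na(x))] returns an empty vector
# when x contains no NA at all.
drop_na_sketch <- function(x) {
  x[!is.na(x)]   # keep only the elements that are not NA
}
example_vector <- c(4, 2, NA, 89, 35, NA)
drop_na_sketch(example_vector)
##  4  2 89 35
drop_na_sketch(c(1, 2, 3))   # no NA: the vector comes back unchanged
##  1 2 3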

# And finally the matrix. Note: when we created the matrix we passed the argument byrow = TRUE,
# which means that the matrix is filled row by row. Check what the matrix looks like, and then try which().

which(is.na(test_matrix))
##  5 6
# The output is 5 and 6 because of the shape of the matrix. Indexing works as in the data frame:
# the cells are numbered column by column. So we have these indices:
#      1 3 5
#      2 4 6
# and that's why we got 5 and 6 as an output.
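
# If you would rather get row and column numbers than these indices, which() has an
# arr.ind argument. A minimal sketch (example_matrix is an illustrative copy of the
# matrix above, na_positions is an illustrative name):
example_matrix <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)
na_positions <- which(is.na(example_matrix), arr.ind = TRUE)
na_positions   # a two-column matrix: one line per NA, giving its row and its column
# here the NAs sit at (row 1, column 3) and (row 2, column 3)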

# When would we use a for loop? E.g. when we would like to report, column by column, which
# rows contain an NA value. Here is an example of such a for loop that will hopefully help a bit
# (compare the output with the original data frame, and try it with the matrix as well).

for (col_idx in seq_len(ncol(test_dataframe))) {
  # ncol() gives the total number of columns, so this loop visits each column in turn
  # (the loop variable is called col_idx so that it does not reuse the name of the c() function)
  if (anyNA(test_dataframe[, col_idx])) {
    # this if says: if there is at least one NA in column col_idx, do the following:

    # paste() joins the sentence and the column number, print() shows the result
    print(paste("There is NA in the column:", col_idx))

    # na_rows stores the vector of row indices of all NAs in this column
    na_rows <- which(is.na(test_dataframe[, col_idx]))

    print("in the rows:")  # prints text
    print(na_rows)         # prints a vector with row numbers
  }
}
##  "There is NA in the column: 1"
##  "in the rows:"
##  3 5
##  "There is NA in the column: 2"
##  "in the rows:"
##  3 6
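
# The same information can also be collected without printing inside a loop. A sketch using
# lapply() (example_dataframe and na_rows_by_column are illustrative names). Because a data
# frame is a list of columns, the same pattern also works for an ordinary list of vectors.
example_dataframe <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                                favourite_number = c(4, 2, NA, 89, 35, NA))
na_rows_by_column <- lapply(example_dataframe, function(column) which(is.na(column)))
na_rows_by_column$students
##  3 5
na_rows_by_column$favourite_number
##  3 6
colSums(is.na(example_dataframe))   # number of NAs in each column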