question04.R

### QUESTION 2
#
## Topic: Sheet 8 Exercise 2

## Question: 
# So my question here would be: is the for loop not necessary for vectors, then? (regarding the two codes below)
# If so, would it only be necessary for data frames, or also for lists and matrices? (when creating functions)

# code that student provided: 
which.NNA<- function(data) {
    for (i in 1:length(data)){
      NNA<- which(is.na(c(data)))
      fNNA<- data[-NNA]
    }
    return(fNNA)
  }

# code from the solutions
  
which.na <- function(x){
    return (which(is.na(x)))
  }


## Answer: 
# Although you define a for loop, you are not actually using it in the function (notice that i never
# appears in the loop body, so every iteration repeats the same computation). Additionally, you could
# delete c() from the NNA expression: c() turns a data frame into a plain list of its columns, so without
# it is.na() sees the data frame itself. Try your function with the data below:

# Why don't we need a for loop? Explanation:
# Let's create a data frame, a vector and a matrix to see what happens:

test_dataframe <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"), 
                              favourite_number = c(4, 2, NA, 89, 35, NA))

test_vector <- c(4, 2, NA, 89, 35, NA)

test_matrix <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)

# Now try which(is.na()) on each data type:

which(is.na(test_dataframe))

## [1]  3  5  9 12

# Explanation: the which() function returns a vector of the indices where the values are NA.
# When we applied it to the data frame, we got 3, 5, 9 and 12. That's because is.na() returns a
# logical matrix for a data frame, and its elements are numbered column by column, as follows:
#      1 7 
#      2 8
#      3 9
#      4 10
#      5 11
#      6 12
# So you would not need a for loop to get these (linear) indices, if that's your goal.
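
# The column-by-column numbering above can be translated back into row and column positions.
# A minimal sketch, assuming a copy of the example data frame under the made-up name df_example:
# the arr.ind argument of which() returns (row, col) pairs instead of linear indices.

```r
# copy of the example data frame (df_example is a name made up for this sketch)
df_example <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                         favourite_number = c(4, 2, NA, 89, 35, NA))

# linear indices, counted column by column
linear_index <- which(is.na(df_example))
print(linear_index)

# the same positions as (row, col) pairs
positions <- which(is.na(df_example), arr.ind = TRUE)
print(positions)

# rebuild the linear index by hand: (col - 1) * number of rows + row
rebuilt <- (positions[, "col"] - 1) * nrow(df_example) + positions[, "row"]
print(rebuilt)
```

# For example, index 9 is row 3 of column 2, because (2 - 1) * 6 + 3 = 9.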

# Let's try it with the vector. 
which(is.na(test_vector))

## [1] 3 6

# Since vectors are one-dimensional, the indices simply run along one line.
# In our test vector the indices are 1 2 3 4 5 6, so we have NA at positions 3 and 6. 
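
# Removing the NA values from a vector, which is what which.NNA() is aiming for, needs no loop either.
# A minimal sketch, assuming a made-up helper name drop_na_values(): logical indexing with !is.na()
# keeps every element that is not NA. It also behaves well when the vector has no NA at all,
# whereas negative indexing with which() does not.

```r
x_with_na <- c(4, 2, NA, 89, 35, NA)
x_without_na <- c(4, 2, 89)

# hypothetical helper: keep only the elements that are not NA
drop_na_values <- function(x) {
  x[!is.na(x)]
}

print(drop_na_values(x_with_na))
print(drop_na_values(x_without_na))

# pitfall: which() finds nothing here, and indexing with an empty
# index vector selects nothing instead of dropping nothing
print(x_without_na[-which(is.na(x_without_na))])
```

# The last line prints an empty vector (numeric(0)), even though nothing should have been removed.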


# And finally the matrix. Note: when we created the matrix we added the argument byrow = TRUE, 
# which means that the matrix is filled row by row. Check what the matrix looks like, and then try which(). 

which(is.na(test_matrix))

## [1] 5 6

# The output is 5 and 6 because of the matrix shape. Indexing works as in the data frame: 
# the elements are numbered column by column. So we have these indices: 
#      1 3 5 
#      2 4 6
# and that's why we got 5 and 6 as the output. 
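
# To see how byrow interacts with the numbering, here is a small sketch (the names values,
# m_by_row and m_by_col are made up for illustration). byrow only changes how the matrix is filled;
# which() always counts down the columns.

```r
values <- c(4, 2, NA, 89, 35, NA)

# filled row by row, as in the example above
m_by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
# filled column by column (the default)
m_by_col <- matrix(values, nrow = 2, ncol = 3)

# as.vector() lists the elements in the order in which which() counts them
print(as.vector(m_by_row))

print(which(is.na(m_by_row)))
print(which(is.na(m_by_col)))
```

# With byrow = TRUE both NA values end up in the last column (indices 5 and 6);
# with the default filling they sit at indices 3 and 6, just as in the vector.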

# When would we use a for loop? E.g. when we would like to report, column by column, which rows 
# contain an NA value. Here is an example of such a for loop, which will hopefully help a bit 
# (compare the output with the original data frame, and try it with the matrix as well). 

for (j in seq_len(ncol(test_dataframe))) {
  # ncol() gives the total number of columns, so this for loop checks each column in turn
  if (anyNA(test_dataframe[, j])) {
    # this if says: if there is any NA in column j, do the following:
    
    print(paste("There is NA in the column:", j)) # pastes the sentence and the column number together and prints them
    
    na_rows <- which(is.na(test_dataframe[, j]))
    # na_rows stores the vector of row indices that are NA in column j
    
    print("in the rows:") # prints text
    print(na_rows) # prints the vector of row numbers
  }
}

## [1] "There is NA in the column: 1"
## [1] "in the rows:"
## [1] 3 5
## [1] "There is NA in the column: 2"
## [1] "in the rows:"
## [1] 3 6
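
# Since the question was also about creating functions: the loop above can be wrapped in a function
# that accepts a data frame or a matrix. This is only a sketch; the name na_rows_by_column() and the
# "column_<number>" labels are made up for illustration. Instead of printing, it collects the row
# numbers in a list with one entry per column that contains NA.

```r
# hypothetical helper: for each column that contains NA, store the row numbers of the NA values
na_rows_by_column <- function(data) {
  result <- list()
  for (j in seq_len(ncol(data))) {
    na_rows <- which(is.na(data[, j]))
    if (length(na_rows) > 0) {
      result[[paste0("column_", j)]] <- na_rows
    }
  }
  result
}

# copies of the example data (names made up for this sketch)
df_example <- data.frame(students = c("Mateo", "Laura", NA, "Ana", NA, "Mia"),
                         favourite_number = c(4, 2, NA, 89, 35, NA))
m_example <- matrix(c(4, 2, NA, 89, 35, NA), nrow = 2, ncol = 3, byrow = TRUE)

print(na_rows_by_column(df_example))
print(na_rows_by_column(m_example))
```

# For the matrix only the third column is reported, with rows 1 and 2.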

Kosmickizaborav

2020-03-12