### QUESTION 3
#
## Topic: na.rm, function rm.NA, Sheet 8 Exercise 2 

## Question: 
# In Exercise 2 the rm.NA() function doesn't work at all, not even the one from the solutions.
# Could you please explain the meaning of the na.rm option?

# Answer

# na.rm and rm.NA are two different things.

# na.rm is an argument found in many R functions. It specifies whether the NA values
# contained in the input should be removed before the function is executed.
# For example, the function mean has na.rm = FALSE by default.
# Thus, if you call mean on a vector containing NA without specifying na.rm = TRUE, the NA is not removed
# and the output is NA (the mean of a vector containing NA cannot be calculated).
x <- c(10,2,5,3, NA)
mean(x)
## [1] NA
# if we add the option na.rm=TRUE when we call mean we can get rid of the NA
# and get the mean value of the non-NA elements of the vector
mean(x, na.rm = TRUE)
## [1] 5
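# The same argument exists in other summary functions such as sum and max.
# A short sketch (z is just an extra example vector, not part of the exercise):
z <- c(4, NA, 6)
sum(z)                 # the NA propagates to the result
## [1] NA
sum(z, na.rm = TRUE)   # 4 + 6
## [1] 10
max(z, na.rm = TRUE)   # largest non-NA element
## [1] 6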
# If we want, we can also add this option to our own functions. It is simply one of the arguments of the function.
# We have to give it a name and (if we want) a default value. We could give it any other name,
# but it is more convenient to call it na.rm as in other R functions.
# Then, in the body of our function, we have to specify what to do when na.rm is TRUE and what to do when it is FALSE.
# You can see an example in the se function from Exercise 3:
se <- function(x, na.rm = FALSE)
{
  if (!is.numeric(x))  # the standard error only makes sense for numeric input
  {
    warning("Argument is not numeric: returning NA")
    return(NA)
  }
  
  if (na.rm)  # drop the NA values only when the caller asks for it
  {
    x <- x[!is.na(x)]
  }
  
  return(sd(x) / sqrt(length(x)))
}
# By default we have decided to have na.rm = FALSE (see the first line of the function),
# and if it is TRUE we remove the NAs from the vector before calculating
# the standard error (lines 9-12 of the function).
# If we call it without specifying na.rm = TRUE, na.rm takes the default value FALSE
# and thus the NA is not removed. This is the same behaviour as mean. Let's see an example.
# First you need to run the function code so that R knows it, and then:
se(x)
## [1] NA
# it was not able to calculate the value and returned NA
# now if I set the option na.rm = TRUE
se(x, na.rm = TRUE)
## [1] 1.779513
# it calculated the standard error from the elements of x that are not NA
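# A side note on why the function filters x before calling sd() and length():
# length() counts NA elements too, so a shortcut such as
# sd(x, na.rm = TRUE) / sqrt(length(x)) would divide by the wrong n.
# A small sketch (the vector v is only an illustration, not part of the exercise):
v <- c(10, 2, 5, 3, NA)
length(v)        # counts the NA as well
## [1] 5
sum(!is.na(v))   # number of non-NA elements
## [1] 4
sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v)))   # divides by the number of non-NA elements
## [1] 1.779513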


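# rm.NA is a function we want to write. It does not exist in R, so we have to create it.
# The goal is that the function removes the NA from a vector and returns the modified vector.
# Here is the function from the solution. Note that the solution names it rm.na (lower case).
# R is case-sensitive, so it must be called as rm.na(x); calling rm.NA(x) gives a
# "could not find function" error unless you have defined a function under that exact name yourself.
# Let's check that it works: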
rm.na <- function(x)
{
  return(x[!is.na(x)])
}
# you need to execute it (run the lines of code above) and then test it on some vectors:
# a vector without NA
y <- c(10,2,5,3)
rm.na(y)
## [1] 10  2  5  3
# fine it has not removed anything

# now a vector with NA
x <- c(10,2,5,3, NA)
rm.na(x)
## [1] 10  2  5  3
# fine the output is the vector without the NA
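# The function is not restricted to numeric vectors, because is.na works on other types too.
# A sketch with two more test vectors (w and n are illustrative names, not from the exercise):
rm.na <- function(x) x[!is.na(x)]   # same function as above, repeated so this block runs on its own
w <- c("a", NA, "b", NA)            # character vector with two NA
rm.na(w)
## [1] "a" "b"
n <- c(NA, NA)                      # a vector containing only NA
rm.na(n)                            # nothing is left, so the result is an empty vector
## logical(0)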

# Now let's explain how it works.
# The code is.na(x) uses the R function is.na, which returns, for each element of a vector,
# TRUE if it is an NA value and FALSE otherwise.
# Let's see an example with the vector x defined above:
is.na(x)
## [1] FALSE FALSE FALSE FALSE  TRUE
# In the code of the function we add ! before that to say we want to keep only
# the elements that are not NA, i.e. those that have FALSE for is.na.
# Adding ! simply negates each value. Let's see:
!is.na(x)
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
# we could also write is.na(x)==FALSE to get the same output
is.na(x)==FALSE
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
# Then we keep from the original vector only the elements for which this is TRUE, with
x[!is.na(x)]
## [1] 10  2  5  3
# or 
x[is.na(x)==FALSE]
## [1] 10  2  5  3
# this is what is returned by the function
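
# Two related remarks, as a sketch (the vectors a and b are illustrative, not from the exercise).
# 1) Logical indexing with !is.na() also works when there is no NA at all.
#    Negative indexing with which() does not: -which(...) is then empty
#    and R returns an empty vector instead of the full one.
a <- c(10, 2, 5, 3)
a[!is.na(a)]
## [1] 10  2  5  3
a[-which(is.na(a))]
## numeric(0)
# 2) Base R has na.omit(), which also drops the NA, but it attaches an extra
#    attribute recording which positions were removed. After stripping the
#    attributes with as.vector() the result is the same as with !is.na().
b <- c(10, 2, 5, 3, NA)
identical(as.vector(na.omit(b)), b[!is.na(b)])
## [1] TRUE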