### QUESTION 3
#
## Topic: na.rm, function rm.NA, Sheet 8 Exercise 2
## Question:
# In exercise 2 the rm.NA() function doesn't work at all, not even the one from the solutions.
# Could you please explain the meaning of the argument na.rm?
# Answer
# na.rm and rm.NA are two different things.
# na.rm is an argument that you find in many R functions. It specifies whether the NA values
# contained in the input are removed before the function is executed.
# For example, the function mean has na.rm = FALSE by default.
# Thus, if you use mean on a vector containing NA without specifying na.rm = TRUE, the NA is not removed
# and the output is NA (the mean of a vector containing NA cannot be calculated).
x <- c(10,2,5,3, NA)
mean(x)
## [1] NA
# if we add the option na.rm=TRUE when we call mean we can get rid of the NA
# and get the mean value of the non-NA elements of the vector
mean(x, na.rm = TRUE)
## [1] 5
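# The same option exists in several other base R summary functions, for example sum, max and sd.
# A short sketch (the vector v is only an illustrative example, not taken from the exercise sheet):
v <- c(10, 2, 5, 3, NA)
sum(v)                 # NA, because na.rm is FALSE by default
sum(v, na.rm = TRUE)   # 20
max(v, na.rm = TRUE)   # 10
sd(v, na.rm = TRUE)    # standard deviation of the non-NA elements 10, 2, 5, 3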
# If we want, we can also add this option to our own functions. It is simply one of the arguments of the function.
# We have to give it a name and (if we want) a default value. We could just as well give it another name,
# but it is more convenient to call it na.rm as in other R functions.
# Then we have to specify in the body of our function what to do when na.rm is TRUE and what to do when it is FALSE.
# You can see an example in the se function from Exercise 3:
se <- function(x, na.rm = FALSE)
{
  # only numeric input makes sense for a standard error
  if (!is.numeric(x))
  {
    warning("Argument is not numeric: returning NA")
    return(NA)
  }
  # remove the NAs only if the user asked for it
  if (na.rm)
  {
    x <- x[!is.na(x)]
  }
  return(sd(x) / sqrt(length(x)))
}
# By default we have decided to have na.rm = FALSE (see the first line of the function),
# and if it is TRUE we remove the NAs from the vector before calculating
# the standard error (the second if block of the function).
# If we use it without specifying na.rm = TRUE, it has the default value FALSE
# and thus the NA is not removed. This is the same behaviour as mean. Let's see an example.
# First you need to run the function code so that R knows it, and then:
se(x)
## [1] NA
# it was not able to calculate the value and returned NA
# now if I set the option na.rm = TRUE
se(x, na.rm = TRUE)
## [1] 1.779513
# it calculated the standard error for the elements of x without the NA
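# Another common pattern, shown here only as a sketch: instead of removing the NAs ourselves,
# a function can pass its own na.rm argument on to the R functions it calls.
# The function cv (coefficient of variation) and the vector z are hypothetical examples,
# they are not part of the exercise sheet.
cv <- function(x, na.rm = FALSE)
{
  # sd and mean both have their own na.rm option, so we simply hand ours over
  return(sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm))
}
z <- c(10, 2, 5, 3, NA)
cv(z)                # NA, because the default na.rm = FALSE keeps the NA
cv(z, na.rm = TRUE)  # sd divided by mean of the non-NA elements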
# rm.NA is a function we want to write. No function with this name exists in R, so we create it ourselves.
# The goal is that the function removes the NA from a vector and returns the modified vector.
# Here is the function from the solution. Note that it is named rm.na there, and R is case-sensitive,
# so it must be called as rm.na, not rm.NA.
# Let's check that it works:
rm.na <- function(x)
{
  return(x[!is.na(x)])
}
# you need to execute it (run the lines of code above) and then test it on some vectors:
# a vector without NA
y <- c(10,2,5,3)
rm.na(y)
## [1] 10 2 5 3
# fine, it has not removed anything
# now a vector with NA
x <- c(10,2,5,3, NA)
rm.na(x)
## [1] 10 2 5 3
# fine, the output is the vector without the NA
# Now let's explain how it works.
# The code is.na(x) uses the R function is.na, which returns for each element of a vector
# TRUE if it is NA and FALSE otherwise.
# Let's see an example with the vector x defined above:
is.na(x)
## [1] FALSE FALSE FALSE FALSE TRUE
# In the code of the function we add ! in front of is.na(x) to say that we want to keep only
# the elements that are not NA, i.e. those for which is.na gives FALSE.
# Adding ! simply reverses TRUE and FALSE. Let's see:
!is.na(x)
## [1] TRUE TRUE TRUE TRUE FALSE
# we could also write is.na(x)==FALSE to get the same output
is.na(x)==FALSE
## [1] TRUE TRUE TRUE TRUE FALSE
# and then we keep from the original vector only the elements for which this is TRUE, with
x[!is.na(x)]
## [1] 10 2 5 3
# or
x[is.na(x)==FALSE]
## [1] 10 2 5 3
# this is what is returned by the function
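# For comparison, base R also has a built-in function na.omit that drops the NAs.
# A short sketch (the vector v is only an illustrative example): unlike our own function,
# na.omit attaches extra attributes recording which positions were removed,
# so here we wrap it in as.vector to get a plain vector back.
v <- c(10, 2, 5, 3, NA)
sum(is.na(v))           # 1, the number of NA values in v
as.vector(na.omit(v))   # the elements 10, 2, 5, 3 as a plain numeric vector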