R - Matrix, List, Factor, Data Frame


  • Matrix
    • A matrix is a vector with a dimension attribute (the attribute itself is an integer vector of length 2: nrow and ncol).
    •  > m <- matrix(1:10, nrow = 2, ncol = 5) # create matrix   
       > m # print matrix  
            [,1] [,2] [,3] [,4] [,5]
       [1,]    1    3    5    7    9
       [2,]    2    4    6    8   10
       > dim(m) # get dimension of m  
       [1] 2 5  
       > attributes(m) # get attributes of m  
       $dim  
       [1] 2 5  
      
    • As you can see, the matrix is filled in column-first (column-wise) order.
    • A matrix can also be created from a vector by adding a dimension attribute.
    •  > m <- 1:10  # vector
       > dim(m) <- c(2,5)  # add dimension attribute to vector, m will become Matrix
       > m  
            [,1] [,2] [,3] [,4] [,5]
       [1,]    1    3    5    7    9
       [2,]    2    4    6    8   10
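
    • The column-wise fill noted above is only the default. As a minimal sketch, the byrow argument of matrix() switches to row-wise fill; the names m_col and m_row are illustrative and not part of the original notes.

```r
# Same data, two fill orders (m_col and m_row are illustrative names)
m_col <- matrix(1:10, nrow = 2, ncol = 5)                # default: fill down each column
m_row <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)  # fill across each row

m_col[1, ]  # first row when filled column-wise: 1 3 5 7 9
m_row[1, ]  # first row when filled row-wise:    1 2 3 4 5

# Both objects carry the same dimension attribute
identical(dim(m_col), dim(m_row))  # TRUE
```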
      
    • A matrix can also be created with the cbind() or rbind() functions.
    •  > a <- 1:2  
       > b <- 3:4  
       > c <- 5:6  
       > d <- 7:8  
       > e <- 9:10  
       > cbind(a,b,c,d,e)  # create Matrix with cbind()
            a b c d  e
       [1,] 1 3 5 7  9
       [2,] 2 4 6 8 10
       > f <- 1:10  # what if the columns have different lengths?
       > cbind(a,b,c,d,e,f)  # f has length 10, so the shorter vectors are recycled to fill 10 rows
             a b c d  e  f
        [1,] 1 3 5 7  9  1
        [2,] 2 4 6 8 10  2
        [3,] 1 3 5 7  9  3
        [4,] 2 4 6 8 10  4
        [5,] 1 3 5 7  9  5
        [6,] 2 4 6 8 10  6
        [7,] 1 3 5 7  9  7
        [8,] 2 4 6 8 10  8
        [9,] 1 3 5 7  9  9
       [10,] 2 4 6 8 10 10
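
    • The notes mention rbind() but only show cbind(). A short sketch of the row-wise counterpart, plus what happens when the lengths do not divide evenly; the variables r, g and w are introduced here for illustration only.

```r
a <- 1:2
f <- 1:10

# rbind() stacks vectors as rows; the shorter vector a is recycled to length 10
r <- rbind(a, f)
dim(r)        # 2 10
r["a", 1:4]   # 1 2 1 2

# When the longer length is not a multiple of the shorter one, R still
# recycles but raises a warning; tryCatch() captures its message here
g <- 1:3
w <- tryCatch(cbind(a, g), warning = function(cond) conditionMessage(cond))
w
```
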
    • How do you multiply matrices? Use * for element-wise multiplication and %*% for matrix multiplication.
    •  > x <- matrix(1:4, 2, 2); y <- matrix(rep(10,4),2,2)  
       > y
            [,1] [,2]
       [1,]   10   10
       [2,]   10   10
       > x
            [,1] [,2]
       [1,]    1    3
       [2,]    2    4
       > x * y  # * multiplies element by element
            [,1] [,2]
       [1,]   10   30
       [2,]   20   40
       > x %*% y  # %*% is true matrix multiplication
            [,1] [,2]
       [1,]   40   40
       [2,]   60   60
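
    • The square example above hides the dimension rules. As a rough sketch with non-square matrices (a, b, p and q are illustrative names): %*% needs the column count of the left operand to equal the row count of the right one, while * needs both operands to have the same dimensions.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
b <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2

p <- a %*% b   # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
dim(p)         # 2 2

# a * b would fail with a "non-conformable arrays" error because the
# dimensions differ; transposing b makes the shapes match
q <- a * t(b)
dim(q)         # 2 3
```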
      
  • List
    • A special type of vector that can contain objects of different classes.
    •  > x <- list(TRUE, "list", 1L)  
       > x  
       [[1]]  
       [1] TRUE  
       [[2]]  
       [1] "list"  
       [[3]]  
       [1] 1  
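
    • To make the "different classes" point concrete, here is a small sketch of how elements are pulled back out of a list. The element names flag, label and count are assumptions added for illustration; the original list is unnamed.

```r
x <- list(flag = TRUE, label = "list", count = 1L)  # names are illustrative

sapply(x, class)    # each element keeps its own class
x[["label"]]        # [[ ]] extracts the element itself: "list"
x$count             # $ is shorthand for [[ ]] with a name: 1
class(x["label"])   # [ ] returns a sub-list, so the class is "list"

# By contrast, c() coerces everything to one common type
v <- c(TRUE, "list", 1L)
class(v)            # "character"
```
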
  • Factor
    • Used to represent categorical data.
    •  > x <- factor(c("male","male","female","female"))  # create factor
       > x  # print x
       [1] male   male   female female
       Levels: female male
       > table(x)  # table() counts how many items fall under each level
       x
       female   male
            2      2
      
    • Factor levels have an order (alphabetical by default); you can set it explicitly with the levels argument of factor().
    •  > x <- factor(c("male","male","female","female"), levels = c("male","female"))  
       > x  
       [1] male   male   female female
       Levels: male female  # order has been changed from the example above
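
    • A brief sketch of how to inspect the level order once it has been set, using the same factor as above. The integer codes follow the position of each value in levels(x).

```r
x <- factor(c("male", "male", "female", "female"),
            levels = c("male", "female"))

levels(x)       # "male" "female"
nlevels(x)      # 2
as.integer(x)   # 1 1 2 2  (position of each value in levels(x))
table(x)        # counts are reported in level order, so male comes first
```
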
  • Data Frame
    • Used for storing tabular data; each column can hold a different class of object.
    •  > x <- data.frame(sex = c("male","female", "male", "male", "female"), age = 26:30)  
       > x  
              sex age
       1   male  26
       2 female  27
       3   male  28
       4   male  29
       5 female  30
      
    • Use the nrow() and ncol() functions to get the number of rows and columns.
    •  > nrow(x)  
       [1] 5  
       > ncol(x)  
       [1] 2  
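
    • Alongside nrow() and ncol(), a couple of other base R calls give a quick overview of a data frame. This is a minimal sketch on the same x; the class reported for the sex column depends on the R version (character in R 4.0 and later, factor by default before that), so no output is shown for it.

```r
x <- data.frame(sex = c("male", "female", "male", "male", "female"),
                age = 26:30)

dim(x)             # 5 2  (rows first, then columns)
sapply(x, class)   # class of each column
str(x)             # compact summary of the structure
```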
      
    • How do you change the column names of a data frame? Use colnames().
    •  > my_data  
         patients X1 X2 X3 X4 X5
       1     Bill  1  5  9 13 17
       2     Gina  2  6 10 14 18
       3    Kelly  3  7 11 15 19
       4     Sean  4  8 12 16 20
       > cnames <- c("patient","age","weight","bp","rating","test")
       > colnames(my_data) <- cnames
       > my_data
         patient age weight bp rating test
       1    Bill   1      5  9     13   17
       2    Gina   2      6 10     14   18
       3   Kelly   3      7 11     15   19
       4    Sean   4      8 12     16   20
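
    • The notes never show how my_data was built, so the construction below is a hypothetical reconstruction that happens to produce the same printout. It also sketches renaming a single column by position; the new name "score" is made up for illustration.

```r
# Hypothetical reconstruction of my_data (not shown in the original notes)
my_data <- data.frame(patients = c("Bill", "Gina", "Kelly", "Sean"),
                      matrix(1:20, nrow = 4, ncol = 5))
colnames(my_data)   # "patients" "X1" "X2" "X3" "X4" "X5"

# Replace all column names at once
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

# Rename just one column by indexing into colnames()
colnames(my_data)[6] <- "score"
colnames(my_data)
```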
      
    • Check how much memory the dataset occupies with object.size().
    •  > object.size(x)  
       1312 bytes  
      
    • If the data frame is large, with thousands of rows, you can inspect just the first few rows with head(). (Use tail() for the last n rows.)
    •  > head(x)  # head() returns the first 6 rows by default; x has only 5
              sex age
       1   male  26
       2 female  27
       3   male  28
       4   male  29
       5 female  30
       > head(x,3)  # specify how many rows you want to see
              sex age
       1   male  26
       2 female  27
       3   male  28
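
    • tail() is mentioned above but not demonstrated. A short sketch on the same x; the negative-n form of head() is an extra that the original notes do not cover.

```r
x <- data.frame(sex = c("male", "female", "male", "male", "female"),
                age = 26:30)

last_two <- tail(x, 2)    # last 2 rows (rows 4 and 5)
last_two$age              # 29 30

all_but_two <- head(x, -2)  # negative n: everything except the last 2 rows
nrow(all_but_two)           # 3
```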
      
  • Sequence
    • There are several ways to create a sequence.
    •  > 1:20
        [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
       > seq(1,20)  # seq() does exactly the same thing as the : operator
        [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
       > seq(1,20,by=0.5)  # the by argument sets the increment to 0.5
        [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
       [16]  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5
       [31] 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0
      
       > x <- seq(1,20,length = 5)  # 5 evenly spaced values from 1 to 20 (length partially matches length.out)
       > length(x)
       [1] 5
       > 1:length(x)  # use the length to build an index vector
       [1] 1 2 3 4 5
       > seq_along(x)  # same result as 1:length(x) whenever x is non-empty
       [1] 1 2 3 4 5
       > rep(1, times = 10)  # rep() repeats the value 1 ten times
        [1] 1 1 1 1 1 1 1 1 1 1
       > rep(c(0,1,2), times = 5)  # repeat the whole vector 0,1,2 five times
        [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
       > rep(c(0,1,2), each = 5)  # repeat each element five times before moving to the next
        [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2
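
    • 1:length(x) and seq_along(x) agree in the example above, but they differ when x is empty. A minimal sketch of that edge case; the variable names are illustrative.

```r
empty <- numeric(0)

colon_idx <- 1:length(empty)       # 1 0  (counts down from 1 to 0)
along_idx <- seq_along(empty)      # integer(0)
len_idx   <- seq_len(length(empty))  # integer(0)

length(colon_idx)   # 2, so a for loop over it would run twice
length(along_idx)   # 0, so a for loop over it would not run at all
```

    • For this reason seq_along() or seq_len() is usually the safer choice for loop indices.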
      
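  • Change class
    • Use as.factor() or factor() to convert a character vector to a factor; as.numeric() on a factor returns the integer code of each level.
    •  > yesno <- sample(c("yes","no"), size = 10, replace = TRUE)  # random draw, so your values will differ
       > class(yesno)
       [1] "character"
       > yesnoFactor <- as.factor(yesno)  # levels default to alphabetical order
       > yesnoFactor
        [1] yes no  yes no  yes yes yes no  no  no
       Levels: no yes
       > yesnoFactor <- factor(yesno, levels = c("yes","no"))  # set the level order explicitly
       > yesnoFactor
        [1] yes no  yes no  yes yes yes no  no  no
       Levels: yes no
       > as.numeric(yesnoFactor)  # yes = 1, no = 2, following the level order
        [1] 1 2 1 2 1 1 1 2 2 2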
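
    • Because as.numeric() on a factor returns level codes rather than the labels, it can surprise you when the labels themselves look like numbers. A small sketch of that case and the usual workaround; the factor f is made up for illustration.

```r
f <- factor(c("10", "20", "10", "30"))   # labels that look numeric

codes  <- as.numeric(f)                  # 1 2 1 3  (level codes, not the values)
values <- as.numeric(as.character(f))    # 10 20 10 30  (the actual numbers)
```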