- Matrix
- A matrix is a vector with a dimension attribute (the dimension itself is an integer vector of length 2: nrow and ncol).
> m <- matrix(1:10, nrow = 2, ncol = 5) # create matrix
> m # print matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> dim(m) # get dimension of m
[1] 2 5
> attributes(m) # get attributes of m
$dim
[1] 2 5
As you can see, the matrix is filled in column-first order (column-wise).
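Column-wise filling is only the default. As a small sketch (the name m_row is ours, not from the notes above), passing byrow = TRUE to matrix() fills the same values row by row instead:

```r
# Fill the same values row by row instead of column by column
m_row <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
m_row[1, ]   # first row:  1 2 3 4 5
m_row[2, ]   # second row: 6 7 8 9 10
```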
A matrix can also be created from a vector by adding a dimension attribute.
> m <- 1:10 # vector
> dim(m) <- c(2,5) # add a dimension attribute to the vector; m becomes a matrix
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
A matrix can also be created with the cbind() or rbind() function.
> a <- 1:2
> b <- 3:4
> c <- 5:6
> d <- 7:8
> e <- 9:10
> cbind(a,b,c,d,e) # create a matrix with cbind()
     a b c d  e
[1,] 1 3 5 7  9
[2,] 2 4 6 8 10
> f <- 1:10 # what if the vectors have different lengths?
> cbind(a,b,c,d,e,f) # f has length 10, so the shorter vectors are recycled
      a b c d  e  f
 [1,] 1 3 5 7  9  1
 [2,] 2 4 6 8 10  2
 [3,] 1 3 5 7  9  3
 [4,] 2 4 6 8 10  4
 [5,] 1 3 5 7  9  5
 [6,] 2 4 6 8 10  6
 [7,] 1 3 5 7  9  7
 [8,] 2 4 6 8 10  8
 [9,] 1 3 5 7  9  9
[10,] 2 4 6 8 10 10
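rbind() is mentioned above but not shown. A minimal sketch, reusing the vectors a and b from the cbind() example (the name r is ours): each vector becomes one row, and the vector names become row names.

```r
a <- 1:2
b <- 3:4
r <- rbind(a, b)   # each vector becomes one row of the matrix
dim(r)             # 2 2
rownames(r)        # "a" "b"
r["b", ]           # the row built from b: 3 4
```

The same recycling rule applies as with cbind(): shorter vectors are repeated to match the longest one, and R warns when the longest length is not a multiple of a shorter one.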
How do you multiply matrices?
> x <- matrix(1:4, 2, 2); y <- matrix(rep(10,4),2,2)
> y
     [,1] [,2]
[1,]   10   10
[2,]   10   10
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> x * y # element-wise multiplication
     [,1] [,2]
[1,]   10   30
[2,]   20   40
> x %*% y # matrix multiplication
     [,1] [,2]
[1,]   40   40
[2,]   60   60
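The two operators also differ in which shapes they accept. As a rough sketch (p and q are hypothetical names), %*% needs the number of columns on the left to equal the number of rows on the right, whereas * needs both matrices to have identical dimensions:

```r
p <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
q <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2
dim(p %*% q)   # 2 2
dim(q %*% p)   # 3 3
# p * q would stop with a "non-conformable arrays" error,
# because the dimensions of p and q differ
```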
- List
- A special type of vector that can contain objects of different classes.
> x <- list(TRUE, "list", 1L)
> x
[[1]]
[1] TRUE
[[2]]
[1] "list"
[[3]]
[1] 1
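To pull elements back out of the list, a short sketch using the same x as above: double brackets return the element itself, while single brackets return a shorter list.

```r
x <- list(TRUE, "list", 1L)
class(x[[2]])      # "character": [[ ]] extracts the element itself
class(x[2])        # "list": [ ] returns a sub-list of length 1
sapply(x, class)   # "logical" "character" "integer"
```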
- Factor
- used for categorical data
> x <- factor(c("male","male","female","female")) # create factor
> x # print x
[1] male male female female
Levels: female male
> table(x) # call table() to count how many items each level has
x
female   male
     2      2
Factor levels have an order (alphabetical by default); you can set it with the levels argument when calling factor().
> x <- factor(c("male","male","female","female"), levels = c("male","female"))
> x
[1] male male female female
Levels: male female # the order differs from the example above
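The level order matters because a factor stores integer codes that index into its levels. A minimal sketch with the same factor as above, using levels() and as.integer() to look underneath:

```r
x <- factor(c("male", "male", "female", "female"),
            levels = c("male", "female"))
levels(x)       # "male" "female"
as.integer(x)   # 1 1 2 2, because "male" is now the first level
nlevels(x)      # 2
```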
- Data Frame
- Used for storing tabular data; each column can hold a different class of object.
> x <- data.frame(sex = c("male","female", "male", "male", "female"), age = 26:30)
> x
     sex age
1   male  26
2 female  27
3   male  28
4   male  29
5 female  30
Use the nrow() and ncol() functions to get the number of rows and columns.
> nrow(x)
[1] 5
> ncol(x)
[1] 2
How do you change the column names of a data frame? Use colnames().
> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
> cnames <- c("patient","age","weight","bp","rating","test")
> colnames(my_data) <- cnames
> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
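To rename only one column rather than all of them, one possible approach is to index colnames() by the old name. The data frame below is a cut-down stand-in for my_data, rebuilt here only so the sketch runs on its own:

```r
# Reduced, hypothetical copy of my_data (first three columns only)
my_data <- data.frame(patients = c("Bill", "Gina", "Kelly", "Sean"),
                      X1 = 1:4,
                      X2 = 5:8)
# Replace just the name "X1" and leave the others untouched
colnames(my_data)[colnames(my_data) == "X1"] <- "age"
colnames(my_data)   # "patients" "age" "X2"
```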
Check how much memory the dataset is occupying.
> object.size(x)
1312 bytes
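For larger objects the raw byte count is hard to read. As a sketch (df_example and sz are hypothetical names), the size can be formatted in other units. The exact numbers depend on the R version and platform, so none are shown here.

```r
df_example <- data.frame(sex = c("male", "female", "male", "male", "female"),
                         age = 26:30)
sz <- object.size(df_example)
format(sz, units = "Kb")    # size expressed in kilobytes, as a string
print(sz, units = "auto")   # let R choose a readable unit
```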
If a data frame is large, with thousands of rows, you can inspect just the first few rows with head() (use tail() for the last n rows).
> head(x) # head() returns the first 6 rows by default (x has only 5)
     sex age
1   male  26
2 female  27
3   male  28
4   male  29
5 female  30
> head(x,3) # specify how many rows you want to see
     sex age
1   male  26
2 female  27
3   male  28
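tail() works the same way from the other end. A minimal sketch with the same data frame (last2 is our name); note that the original row names are kept:

```r
x <- data.frame(sex = c("male", "female", "male", "male", "female"),
                age = 26:30)
last2 <- tail(x, 2)   # last 2 rows
rownames(last2)       # "4" "5"
last2$age             # 29 30
```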
- Sequence
- There are several ways to create a sequence.
> 1:20
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> seq(1,20) # seq() does exactly the same thing the : operator does
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> seq(1,20,by=0.5) # set the by argument to increase by 0.5 each step
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5
[31] 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0
> x <- seq(1,20,length.out = 5) # 5 evenly spaced values from 1 to 20
> length(x)
[1] 5
> 1:length(x) # use the length of x to create an index vector
[1] 1 2 3 4 5
> seq_along(x) # the same as 1:length(x)
[1] 1 2 3 4 5
> rep(1, times = 10) # repeat the value 1 ten times
[1] 1 1 1 1 1 1 1 1 1 1
> rep(c(0,1,2), times = 5) # repeat the whole vector 0,1,2 five times
[1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
> rep(c(0,1,2), each = 5) # repeat each element five times in turn
[1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2
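1:length(x) and seq_along(x) agree only when x is non-empty. A small sketch of the difference (empty is a hypothetical name): for a zero-length vector, 1:length(x) counts down from 1 to 0, while seq_along() returns an empty index, which is usually what a loop needs.

```r
empty <- integer(0)
1:length(empty)     # 1 0, two indices even though there are no elements
seq_along(empty)    # integer(0), so a for loop over it runs zero times
seq_len(3)          # 1 2 3, the counterpart when you already have a length
```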
- Change class
> yesno <- sample(c("yes","no"), size = 10, replace = TRUE)
> class(yesno)
[1] "character"
> yesnoFactor <- as.factor(yesno) # levels default to alphabetical order
> yesnoFactor
[1] yes no yes no yes yes yes no no no
Levels: no yes
> yesnoFactor <- factor(yesno, levels = c("yes","no")) # set the level order explicitly
> yesnoFactor
[1] yes no yes no yes yes yes no no no
Levels: yes no
> as.numeric(yesnoFactor)
[1] 1 2 1 2 1 1 1 2 2 2
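as.numeric() on a factor returns the level codes, not the labels. That is what you want above, but it is a common trap when the labels themselves look like numbers. A minimal sketch (f is a hypothetical example factor):

```r
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1, the level codes
as.numeric(as.character(f))   # 10 20 10, the values the labels spell out
```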