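I noticed that once the data gets large, dplyr and data.table differ a lot in processing speed (in my own work the gap was already obvious at around 2 million rows).
This article has an introduction to data.table: link
See that article for the details.
Below are three examples, some of which it does not cover:
dplyr's mutate, group_by, and grepl-based filtering,
and how each one is done in data.table.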
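To see the speed gap on your own machine, here is a rough timing sketch. The data is synthetic and made up purely for illustration: 2 million rows with random IDs and values, shaped like the small DT/DF example below. The row count, seed, and column contents are assumptions. Real timings depend on hardware and package versions, so no numbers are claimed here, and a single `system.time()` run is only a coarse measurement.

```r
library(dplyr)
library(data.table)

# Synthetic data (assumption): 2 million rows with the same columns as the
# small example, filled with random values purely for timing purposes.
set.seed(1)
n <- 2e6
big_df <- data.frame(
  ID   = sample(letters, n, replace = TRUE),
  a    = runif(n),
  b    = runif(n),
  c    = runif(n),
  word = sample(c("Hello!", "Hi!", "Good!", "Hi!Hello", "High", "Ha"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
big_dt <- as.data.table(big_df)  # separate copy, so := below does not touch big_df

# dplyr: new column, per-group sum, then regex filter
t_dplyr <- system.time({
  res_dplyr <- big_df %>%
    mutate(new_value = b + c) %>%
    group_by(ID) %>%
    mutate(new_sum_value = sum(a)) %>%
    ungroup()
  hits_dplyr <- res_dplyr %>% filter(grepl("Hi", word))
})

# data.table: the same three steps, with := adding columns by reference
t_dt <- system.time({
  big_dt[, new_value := b + c]
  big_dt[, new_sum_value := sum(a), by = ID]
  hits_dt <- big_dt[word %like% "Hi"]
})

print(t_dplyr)
print(t_dt)
```

Both pipelines keep the original row order, so their results can be compared column by column to confirm they compute the same thing before comparing the timings.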
library(dplyr)
library(data.table)
DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18, word=c("Hello!", "Hi!", "Good!", "Hi!Hello", "High", "Ha"))
DT
DF <- data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18, word=c("Hello!", "Hi!", "Good!", "Hi!Hello", "High", "Ha"))
DF
##Add new col
#data.table
DT[, new_value:=b+c]
DT
#dplyr
DF <- DF %>% mutate(new_value = b + c)
DF
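The two "add a column" lines differ in more than syntax: `:=` adds the column to the data.table by reference, which is why the data.table line has no `<-`, while `mutate()` returns a new data frame, which is why the dplyr line reassigns `DF`. A small sketch of what that means in practice; the table names (`same_table`, `independent`, `DF_new`) are made up for illustration.

```r
library(dplyr)
library(data.table)

# := modifies the table in place: a plain `<-` does not copy a data.table,
# so both names below refer to the same object.
DT <- data.table(a = 1:3, b = 4:6, c = 7:9)
same_table <- DT
same_table[, new_value := b + c]
print(names(DT))      # DT gained new_value as well

# copy() makes an independent table, so := on the copy leaves the original alone
DT2 <- data.table(a = 1:3, b = 4:6, c = 7:9)
independent <- copy(DT2)
independent[, new_value := b + c]
print(names(DT2))     # still a, b, c

# mutate() never changes its input; the result must be assigned to keep it
DF <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DF_new <- DF %>% mutate(new_value = b + c)
print(names(DF))      # still a, b, c
print(names(DF_new))  # a, b, c, new_value
```

Modifying in place avoids copying the whole table for each new column, which is one reason `:=` holds up well on large data; the cost is that `copy()` is needed whenever the original must stay untouched.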
##Group
#data.table
DT[, new_sum_value:=sum(a), by=.(ID)]
DT
#dplyr (ungroup() so later steps see a plain, ungrouped data frame)
DF <- DF %>% group_by(ID) %>% mutate(new_sum_value = sum(a)) %>% ungroup()
DF
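The grouped lines above attach the group sum to every row. When one row per group is wanted instead, data.table puts `.()` in `j` together with `by`, and dplyr uses `summarise()`. The two differ in group order: `by` keeps groups in order of first appearance, while `keyby` and `summarise()` return them sorted. The sketch also checks that `group_by()` followed by `mutate()` leaves the result grouped. The column name `sum_a` is made up for illustration, and `summarise(.groups = ...)` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(data.table)

DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6)
DF <- data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, stringsAsFactors = FALSE)

# One row per group instead of a new column on every row
dt_by    <- DT[, .(sum_a = sum(a)), by = ID]     # groups in order of first appearance: b, a, c
dt_keyby <- DT[, .(sum_a = sum(a)), keyby = ID]  # groups sorted: a, b, c
df_sum   <- DF %>% group_by(ID) %>% summarise(sum_a = sum(a), .groups = "drop")  # sorted: a, b, c

print(dt_by)
print(dt_keyby)
print(df_sum)

# mutate() after group_by() keeps the grouping until ungroup() removes it
DF_grouped <- DF %>% group_by(ID) %>% mutate(new_sum_value = sum(a))
DF_plain   <- ungroup(DF_grouped)
print(is_grouped_df(DF_grouped))  # TRUE
print(is_grouped_df(DF_plain))    # FALSE
```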
##Grepl
#data.table
DT[word %like% "Hi"]
#dplyr
DF %>% filter(grepl("Hi", word))
#base R
DF[grepl("Hi", DF$word), ]
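In the grepl example, `%like%` (just like `grepl()`) treats the pattern as a regular expression, so characters such as `|`, `^` and `$` are not matched literally. A short sketch of what that changes; the patterns are made-up examples, and the `fixed = TRUE` call is plain base-R `grepl()` used inside `i`.

```r
library(data.table)

DT <- data.table(word = c("Hello!", "Hi!", "Good!", "Hi!Hello", "High", "Ha"))

# Regex alternation: "|" means "or", so rows containing Hi or Ha match
regex_hits <- DT[word %like% "Hi|Ha"]
print(regex_hits$word)   # "Hi!" "Hi!Hello" "High" "Ha"

# fixed = TRUE matches the literal text "Hi|Ha", which no word contains
fixed_hits <- DT[grepl("Hi|Ha", word, fixed = TRUE)]
print(nrow(fixed_hits))  # 0

# Anchors: ^ pins the match to the start, $ to the end
prefix_hits <- DT[word %like% "^Hi"]
suffix_hits <- DT[word %like% "Hello$"]
print(prefix_hits$word)  # "Hi!" "Hi!Hello" "High"
print(suffix_hits$word)  # "Hi!Hello" only, since "Hello!" ends with "!"
```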