[R] Data Frame 資料集合併(merge, join)

資料集的合併常用的大致為Inner Join、Full Join、Left Join、Right Join。
這些方式的差異主要為最後留下的key(依照哪個欄位來合併)是什麼。
下圖整理了大致上差異為何。


下面依照網路資源,
修正整理了四大類join方法,
分別使用merge function以及、sqldf、dplyr、data.table套件。
第一及三種方法相當直覺,
而第二種方法對熟悉SQL的使用者來說則比較方便。

#http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right
df1 <- data.frame(CustomerId = c(1:5), Product = c(rep("Toaster", 3), rep("Radio", 2)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))
##1
##merge
#Inner join
merge(x = df1, y = df2, by = "CustomerId")
#Full join
merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
#Left join
merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
#Right join
merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
##2
#SQL
library(sqldf)
#Inner join
sqldf("SELECT CustomerId, Product, State
FROM df1
JOIN df2 USING(CustomerID)")
#Full join
sqldf("SELECT CustomerId, Product, State
FROM df1 LEFT JOIN df2 USING(CustomerID)
UNION
SELECT CustomerId, Product, State
FROM df2 LEFT JOIN df1 USING(CustomerID)")
#Left join
sqldf("SELECT CustomerId, Product, State
FROM df1
LEFT JOIN df2 USING(CustomerID)")
##3
library(dplyr)
#Inner join
inner_join(df1, df2)
#Outer join
full_join(df1, df2)
#Left join
left_join(df1, df2)
##4
library(data.table)
setDT(df1)
setDT(df2)
##Set key
setkey(df1, CustomerId)
setkey(df2, CustomerId)
df1[df2]
df2[df1]
merge(df1,df2,by="CustomerId",all=TRUE)
view raw join hosted with ❤ by GitHub

Related Posts:

沒有留言:

張貼留言