這些方式的差異主要為最後留下的key(依照哪個欄位來合併)是什麼。
下圖整理了大致上差異為何。
下面依照網路資源,
修正整理了四大類join方法,
分別使用merge function以及、sqldf、dplyr、data.table套件。
第一及三種方法相當直覺,
而第二種方法對熟悉SQL的使用者來說則比較方便。
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right | |
df1 <- data.frame(CustomerId = c(1:5), Product = c(rep("Toaster", 3), rep("Radio", 2))) | |
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1))) | |
##1 | |
##merge | |
#Inner join | |
merge(x = df1, y = df2, by = "CustomerId") | |
#Full join | |
merge(x = df1, y = df2, by = "CustomerId", all = TRUE) | |
#Left join | |
merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE) | |
#Right join | |
merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE) | |
##2 | |
#SQL | |
library(sqldf) | |
#Inner join | |
sqldf("SELECT CustomerId, Product, State | |
FROM df1 | |
JOIN df2 USING(CustomerID)") | |
#Full join | |
sqldf("SELECT CustomerId, Product, State | |
FROM df1 LEFT JOIN df2 USING(CustomerID) | |
UNION | |
SELECT CustomerId, Product, State | |
FROM df2 LEFT JOIN df1 USING(CustomerID)") | |
#Left join | |
sqldf("SELECT CustomerId, Product, State | |
FROM df1 | |
LEFT JOIN df2 USING(CustomerID)") | |
##3 | |
library(dplyr) | |
#Inner join | |
inner_join(df1, df2) | |
#Outer join | |
full_join(df1, df2) | |
#Left join | |
left_join(df1, df2) | |
##4 | |
library(data.table) | |
setDT(df1) | |
setDT(df2) | |
##Set key | |
setkey(df1, CustomerId) | |
setkey(df2, CustomerId) | |
df1[df2] | |
df2[df1] | |
merge(df1,df2,by="CustomerId",all=TRUE) |
沒有留言:
張貼留言