R Merge Two Datasets

Using rbind to merge two R data frames

We’ve encountered rbind before, when appending rows to a data frame. This function stacks two data frames on top of each other, appending the rows of the second to the first. For it to work, both data frames need to have the same number of columns and the same column names.

Combining many datasets in R

At least once a year I meet with a graduate student who has many separate datasets that need to be combined into a single file. The data usually come from a series of data loggers (e.g., iButtons or RFID readers) that record data remotely over a specified time period.

Currently dplyr supports four types of mutating joins, two types of filtering joins, and a nesting join. Mutating joins combine variables from the two data frames. inner_join() returns all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
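
The row-binding case described above can be sketched as follows. The data frames `spring` and `autumn` and the list `loggers` are invented for illustration; the only requirement carried over from the text is that every frame has the same column names.

```r
# Hypothetical data: two logger downloads with identical column names
spring <- data.frame(id = c("a", "b"), temp = c(11.2, 12.5))
autumn <- data.frame(id = c("a", "c"), temp = c(8.1, 7.4))

# Stack the second frame under the first
stacked <- rbind(spring, autumn)
nrow(stacked)   # 4 rows, same two columns

# Many frames: keep them in a list and bind them in one call
loggers  <- list(spring, autumn, spring)
combined <- do.call(rbind, loggers)
dim(combined)
```

Keeping the frames in a list and calling `do.call(rbind, ...)` once avoids growing the result inside a loop.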

merge {base}    R Documentation

Merge Two Data Frames

Description

Merge two data frames by common columns or row names, or do other versions of database join operations.

Usage
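
For reference, the generic takes `x`, `y` and `...`, and the data-frame method adds the arguments described below. The signature in the comment is the one used by R 3.5.0 and later (the release that introduced `no.dups`); the snippet only inspects the installed function, so it also shows what an older installation provides.

```r
# Generic: merge(x, y, ...)
names(formals(merge))

# Data-frame method (R >= 3.5.0):
# merge(x, y, by = intersect(names(x), names(y)),
#       by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
#       sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE,
#       incomparables = NULL, ...)
defaults <- formals(merge.data.frame)
defaults$all    # FALSE: only rows with a match in both frames are kept
defaults$sort   # TRUE: result is sorted on the by columns
```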

Arguments

x, y

data frames, or objects to be coerced to one.

by, by.x, by.y

specifications of the columns used for merging. See ‘Details’.

all

logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

all.x

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y

logical; analogous to all.x.

sort

logical. Should the result be sorted on the by columns?

suffixes

a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which are not used for merging (appearing in by etc).

no.dups

logical indicating that suffixes are appended in more cases to avoid duplicated column names in the result. This was implicitly false before R version 3.5.0.

incomparables

values which cannot be matched. See match. This is intended to be used for merging on one column, so these are incomparable values of that column.

...

arguments to be passed to or from methods.

Details

merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the 'data.frame' method.

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
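
A minimal sketch of merging on differently named key columns, using invented data frames `tags` and `visits`. It also shows a key that matches more than once contributing one row per match.

```r
# Hypothetical data: the key is "id" in one frame and "animal" in the other
tags   <- data.frame(id = c(1, 2, 3), sex = c("F", "M", "F"))
visits <- data.frame(animal = c(1, 1, 3, 4),
                     site   = c("north", "south", "north", "east"))

# Name the key column separately for x and y
m <- merge(tags, visits, by.x = "id", by.y = "animal")
m

# id 1 matches two visits, so it appears twice;
# id 2 and animal 4 have no partner and are dropped
nrow(m)
```

The key column in the result takes its name from `x`, here `id`.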

Columns to merge on can be specified by name, number or by a logical vector: the name 'row.names' or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
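
The Cartesian-product case can be checked directly against the dimension formula above. The two small data frames are made up for the purpose.

```r
# Hypothetical data with no columns in common
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"), c = c(TRUE, FALSE))

# A zero-length 'by' pairs every row of x with every row of y
r <- merge(x, y, by = NULL)
dim(r)

# Compare with the formula from the text
c(nrow(x) * nrow(y), ncol(x) + ncol(y))
```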

If all.x is true, all the non matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.

If the columns in the data frames not used in merging have any common names, these have suffixes ('.x' and '.y' by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.

If a by.x column name matches one of y, and if no.dups is true (as by default), the y version gets suffixed as well, avoiding duplicate column names in the result.
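
A short sketch of the suffix behaviour for a non-key column that exists in both inputs. The frames and the custom suffixes `_left` and `_right` are illustrative choices.

```r
# Hypothetical data: both frames carry a non-key column called "value"
x <- data.frame(id = 1:2, value = c(10, 20))
y <- data.frame(id = 1:2, value = c(0.1, 0.2))

# Default suffixes
default_names <- names(merge(x, y, by = "id"))
default_names   # "id" "value.x" "value.y"

# Custom suffixes
custom_names <- names(merge(x, y, by = "id", suffixes = c("_left", "_right")))
custom_names    # "id" "value_left" "value_right"
```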

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
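
The four join types named above map onto the `all`, `all.x` and `all.y` arguments as follows. The data frames are invented so that each has one key the other lacks.

```r
# Hypothetical data: k = 1 exists only in x, k = 4 only in y
x <- data.frame(k = c(1, 2, 3), vx = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), vy = c("B", "C", "D"))

inner <- merge(x, y)                 # keys 2, 3
left  <- merge(x, y, all.x = TRUE)   # keys 1, 2, 3; vy is NA for k = 1
right <- merge(x, y, all.y = TRUE)   # keys 2, 3, 4; vx is NA for k = 4
full  <- merge(x, y, all = TRUE)     # keys 1, 2, 3, 4

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```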

Value

A data frame. The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y. If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.
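
To illustrate the row-name case, the sketch below merges two invented data frames on their row names and inspects the resulting column names.

```r
# Hypothetical data keyed by row names rather than by a column
x <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
y <- data.frame(b = c(10, 30), row.names = c("r1", "r3"))

# by = "row.names" (or by = 0) matches on the row names
m <- merge(x, y, by = "row.names")
names(m)        # "Row.names" "a" "b"
m
```

The former row names now live in the `Row.names` column, and the result carries automatic row names instead.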

Note

This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all.

Currently long vectors are not accepted for inputs, which are thus restricted to less than 2^31 rows. That restriction also applies to the result for 32-bit platforms.

See Also

data.frame, by, cbind.

dendrogram for a class which has a merge method.

Examples
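
An illustrative example (not the one shipped with the R help page; the data frames `temp`, `humid` and `light` are invented) that combines several tables sharing a key by folding merge over a list with Reduce:

```r
# Hypothetical data: three logger tables sharing a "time" key
temp  <- data.frame(time = 1:3, temp  = c(10.5, 11.0, 11.4))
humid <- data.frame(time = 1:3, humid = c(80, 78, 75))
light <- data.frame(time = 2:4, light = c(300, 320, 310))

# Merge the frames pairwise from left to right, keeping unmatched rows
wide <- Reduce(function(a, b) merge(a, b, by = "time", all = TRUE),
               list(temp, humid, light))
wide
```

With `all = TRUE` every time stamp seen in any table is kept, and readings missing from a table show up as NA.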
