Combining Columns In R

The cols_merge() function from the gt package takes input from two or more columns and merges their contents into a single column, using a pattern that specifies the formatting. The columns to merge are given in the columns argument and the string-combining pattern in the pattern argument. The first column in the columns series operates as the target column (i.e., it will undergo mutation).

The generic functions cbind and rbind take a sequence of vector and/or matrix arguments and combine them as the columns or rows, respectively, of a matrix. If there are several matrix arguments, they must all have the same number of columns (or rows), and this will be the number of columns (or rows) of the result.
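As a small illustration of cbind and rbind, here is a sketch using made-up vectors (the names a, b and m are hypothetical, not taken from the text above):

```r
# Two vectors of equal length (illustrative values)
a <- 1:3
b <- 4:6

# cbind() treats each vector as a column: 3 rows, 2 columns
by_col <- cbind(a, b)

# rbind() treats each vector as a row: 2 rows, 3 columns
by_row <- rbind(a, b)

# A vector can also be bound onto an existing matrix as a new column
m <- matrix(1:6, nrow = 3)
wider <- cbind(m, total = rowSums(m))

print(by_col)
print(by_row)
print(wider)
```

The argument names become the column (or row) names of the result, which is why by_col has columns a and b.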

2017/09/09

Introduction

This is the tenth post in the series Data Visualization With R. In the previous post, we learnt how to add text annotations to plots. In this post, we will learn how to combine multiple plots. Often, it is useful to have multiple plots in the same frame as it allows us to get a comprehensive view of a particular variable or compare among different variables. The Graphics package offers two methods to combine multiple plots. par() can be used to set graphical parameters regarding plot layout using the mfcol and mfrow arguments. layout() serves the same purpose but offers more flexibility by allowing us to modify the height and width of rows and columns.

The merge() function in R is similar to a database join operation in SQL. By default, the data frames are joined on the columns whose names they share. The arguments to merge() let you perform an inner join (the default), a left join, a right join, a full outer join and a cross join. Semi joins and anti joins have no dedicated merge() argument.

par() allows us to customize the graphical parameters (title, axis, font, color, size) for a particular session. For combining multiple plots, we can use the graphical parameters mfrow and mfcol. These two parameters create a matrix of plots filled by rows and by columns, respectively. Let us combine plots using both parameters.

Option   Description       Arguments
mfrow    Fill by rows      Number of rows and columns
mfcol    Fill by columns   Number of rows and columns

mfrow combines plots filled by rows, i.e., it takes two values, the number of rows and the number of columns, and then fills the plot grid row by row. The syntax is par(mfrow = c(nrow, ncol)).

Libraries, Code & Data

All the data sets used in this post can be found here and code can be downloaded from here.

Case Study 1

Let us begin by combining 4 plots in 2 rows and 2 columns. The plots will be filled by rows as we are using the mfrow parameter:
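A minimal sketch of this case follows. It assumes the built-in mtcars data set as a stand-in for the post's data, and the four plot types are chosen only for illustration.

```r
# par() returns the previous settings, so they can be restored later
op <- par(mfrow = c(2, 2))   # 2 rows, 2 columns, filled by row

plot(mtcars$disp, mtcars$mpg, main = "mpg vs disp")   # row 1, column 1
hist(mtcars$mpg, main = "Histogram of mpg")           # row 1, column 2
boxplot(mtcars$mpg, main = "Boxplot of mpg")          # row 2, column 1
barplot(table(mtcars$cyl), main = "Cylinders")        # row 2, column 2

grid_used <- par("mfrow")    # record the grid that was in effect
par(op)                      # restore the original single-plot layout
```

Restoring the saved settings with par(op) keeps later plots from being squeezed into the same grid.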

Case Study 2

Combine 2 plots in 1 row and 2 columns.
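One possible version of this case, again assuming mtcars as illustrative data:

```r
# 1 row, 2 columns: the two plots sit side by side
op <- par(mfrow = c(1, 2))

plot(mtcars$wt, mtcars$mpg, main = "mpg vs wt")   # left
hist(mtcars$wt, main = "Histogram of wt")         # right

grid_used <- par("mfrow")
par(op)   # back to one plot per page
```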

Case Study 3

Combine 2 plots in 2 rows and 1 column.

Case Study 4

Combine 3 plots in 1 row and 3 columns.

Case Study 5

Combine 3 plots in 3 rows and 1 column.
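A sketch of the stacked arrangement, with mtcars and the choice of variables assumed for illustration:

```r
# 3 rows, 1 column: the plots are stacked vertically
op <- par(mfrow = c(3, 1))

hist(mtcars$mpg, main = "mpg")
hist(mtcars$disp, main = "disp")
hist(mtcars$hp, main = "hp")

grid_used <- par("mfrow")
par(op)
```

For a single row of three plots, the same code works with mfrow = c(1, 3).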

mfcol

mfcol combines plots filled by columns, i.e., it takes two values, the number of rows and the number of columns, and then fills the plot grid column by column. The syntax is par(mfcol = c(nrow, ncol)).

Let us begin by combining 4 plots in 2 rows and 2 columns:
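The sketch below shows the column-wise fill order, assuming mtcars as illustrative data. It queries par("mfg"), which reports the row and column of the figure currently being drawn, to confirm where the second plot lands.

```r
op <- par(mfcol = c(2, 2))   # 2 rows, 2 columns, filled by column

plot(mtcars$disp, mtcars$mpg, main = "Plot 1")   # row 1, column 1
hist(mtcars$mpg, main = "Plot 2")                # row 2, column 1

# Position (row, column) of the plot that was just drawn
second_plot_pos <- par("mfg")[1:2]

boxplot(mtcars$mpg, main = "Plot 3")             # row 1, column 2
barplot(table(mtcars$cyl), main = "Plot 4")      # row 2, column 2

par(op)
print(second_plot_pos)
```

With mfrow instead of mfcol, the second plot would be placed in row 1, column 2.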

Case Study 6

Combine 3 plots in 3 rows and 1 column.

Special Cases

What happens if we supply fewer or more graphs than the grid has room for? In the next two examples, we will supply fewer or more graphs than we ask the par() function to combine. Let us see what happens in such instances:

Case 1: Fewer graphs specified. We will specify that 4 plots need to be combined in 2 rows and 2 columns but provide only 3 graphs.

Case 2: Extra graphs specified. We will specify that 4 plots need to be combined in 2 rows and 2 columns but provide 6 graphs instead of 4.
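Both cases can be tried with the sketch below, which assumes mtcars as illustrative data. With three plots the fourth cell simply stays empty. With six plots the grid is full after the fourth, so the fifth and sixth plots start a fresh grid (a new page on file devices such as pdf; on a screen device the first four plots are replaced).

```r
# Case 1: a 2 x 2 grid but only 3 plots
op <- par(mfrow = c(2, 2))
plot(mtcars$disp, mtcars$mpg, main = "Plot 1")
hist(mtcars$mpg, main = "Plot 2")
boxplot(mtcars$mpg, main = "Plot 3")
pos_after_three <- par("mfg")[1:2]   # position of the last plot drawn
par(op)

# Case 2: a 2 x 2 grid but 6 plots
op <- par(mfrow = c(2, 2))
for (v in c("mpg", "disp", "hp", "wt", "qsec", "drat")) {
  hist(mtcars[[v]], main = v)
}
pos_after_six <- par("mfg")[1:2]     # position of the sixth plot
par(op)

print(pos_after_three)
print(pos_after_six)
```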

Layout

At the core of the layout() function is a matrix. We communicate the structure in which the plots must be combined using a matrix. As such, the layout function is more flexible compared to the par() function.

Option    Description                            Value
mat       Matrix specifying location of plots    matrix
widths    Width of columns                       vector
heights   Height of rows                         vector

Let us begin by combining 4 plots in a 2 row/2 column structure. We do this by creating a layout using the matrix function.

Case Study 1

Combine 4 plots in 2 rows/2 columns filled by rows.
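A minimal sketch of this layout, assuming mtcars as illustrative data. Each number in the matrix is the index of the plot that will occupy that cell.

```r
# Layout matrix filled by row:
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
plot_grid <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

# layout() returns the number of figures in the layout
n_plots <- layout(plot_grid)

plot(mtcars$disp, mtcars$mpg, main = "Plot 1")
hist(mtcars$mpg, main = "Plot 2")
boxplot(mtcars$mpg, main = "Plot 3")
barplot(table(mtcars$cyl), main = "Plot 4")

layout(1)   # reset to a single plot per page
```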

Case Study 2

Combine 4 plots in 2 rows/2 columns filled by columns

To fill the plots by column, we specify byrow = FALSE in the matrix.
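The column-wise version can be sketched as follows. layout.show() is used here only to preview where each plot would go, instead of drawing real plots.

```r
# Layout matrix filled by column:
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
by_col <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = FALSE)

n_plots <- layout(by_col)
layout.show(n_plots)   # draw the outline and number of each cell

layout(1)              # reset to a single plot per page
```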

Case Study 3

Combine 3 plots in 2 rows/2 columns filled by rows

The magic of the layout() function begins here. We want to combine 3 plots: the first plot should occupy both the columns in row 1 and the next 2 plots should be in row 2. In the layout matrix, 1 is specified twice and, since the matrix is filled by row, the first plot will occupy both the columns in the first row. Similarly, plots 2 and 3 will each occupy one column in the second row. It will be crystal clear when you see the plot.
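The matrix described above can be written as in this sketch, which assumes mtcars as illustrative data:

```r
# Plot 1 spans the whole first row; plots 2 and 3 share the second row
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    3
wide_top <- matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE)

n_plots <- layout(wide_top)

plot(mtcars$disp, mtcars$mpg, main = "Plot 1 (full width)")
hist(mtcars$mpg, main = "Plot 2")
boxplot(mtcars$mpg, main = "Plot 3")

layout(1)   # reset to a single plot per page
```

Repeating a plot index in adjacent cells is what makes that plot span them.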

Case Study 4

Combine 3 plots in 2 rows/2 columns filled by rows

The plots must be filled by rows and the third plot must occupy both the columns of the second row while the other two plots will be placed in the first row. The matrix would look like this:
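A sketch of that matrix, previewed with layout.show() rather than real plots:

```r
# Plots 1 and 2 share the first row; plot 3 spans the whole second row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    3
wide_bottom <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)

n_plots <- layout(wide_bottom)
layout.show(n_plots)   # outline of each plot region

layout(1)              # reset to a single plot per page
```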

Case Study 5

Combine 3 plots in 2 rows/2 columns filled by columns

The plots must be filled by columns and the first plot must occupy both the rows of the first column while the other two plots will be placed in the second column in two rows. The matrix would look like this:
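One way to write this matrix, previewed with layout.show():

```r
# Plot 1 spans both rows of the first column; plots 2 and 3 stack in the second
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    3
tall_left <- matrix(c(1, 1, 2, 3), nrow = 2, byrow = FALSE)

n_plots <- layout(tall_left)
layout.show(n_plots)   # outline of each plot region

layout(1)              # reset to a single plot per page
```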

Case Study 6

Combine 3 plots in 2 rows/2 columns filled by columns

The plots must be filled by columns and the third plot must occupy both the rows of the second column while the other two plots will be placed in the first column in two rows. The matrix would look like this:
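A sketch of a matrix in which the second column is a single tall cell, previewed with layout.show(). Because the matrix is filled by column, the first column receives plots 1 and 2, and the plot spanning the second column is plot 3.

```r
# Plots 1 and 2 stack in the first column; plot 3 spans the second column
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    3
tall_right <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = FALSE)

n_plots <- layout(tall_right)
layout.show(n_plots)   # outline of each plot region

layout(1)              # reset to a single plot per page
```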

Widths

In all the layouts created so far, the rows and columns have been of equal size. What if you want to modify the width of the columns and the height of the rows? The widths and heights arguments of the layout() function address this. Let us look at them one by one.

The widths argument specifies the width of the columns. You supply one value for each column in the layout. Let us look at some examples.

Case Study 7

Width of the 2nd column is twice the width of the 1st column
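A minimal sketch, assuming mtcars as illustrative data. The widths are relative, so c(1, 2) gives the second column two thirds of the total width.

```r
# One row, two columns; the second column is twice as wide as the first
two_cols <- matrix(c(1, 2), nrow = 1)

n_plots <- layout(two_cols, widths = c(1, 2))

hist(mtcars$mpg, main = "Narrow column")
plot(mtcars$disp, mtcars$mpg, main = "Wide column")

layout(1)   # reset to a single plot per page
```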

Case Study 8

Width of the 2nd column is twice that of the first and last column

Heights

The heights argument is used to modify the height of the rows. Based on the number of rows specified in the layout, we can specify the height of each row.

Case Study 9

Height of the 2nd row is twice that of the first row
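A sketch of this case, again with mtcars assumed as illustrative data:

```r
# Two rows, one column; the second row is twice as tall as the first
two_rows <- matrix(c(1, 2), nrow = 2)

n_plots <- layout(two_rows, heights = c(1, 2))

hist(mtcars$mpg, main = "Short row")
plot(mtcars$disp, mtcars$mpg, main = "Tall row")

layout(1)   # reset to a single plot per page
```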

Putting it all together…

Before we end this section, let us combine plots using both the widths and heights arguments.
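One way to combine the two arguments is sketched below. The 2 x 2 grid, the size ratios and the use of mtcars are all illustrative assumptions.

```r
# 2 x 2 grid filled by row; second column twice as wide, second row twice as tall
sized_grid <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

n_plots <- layout(sized_grid, widths = c(1, 2), heights = c(1, 2))

hist(mtcars$mpg, main = "Plot 1")                # small cell
plot(mtcars$disp, mtcars$mpg, main = "Plot 2")   # wide cell
boxplot(mtcars$mpg, main = "Plot 3")             # tall cell
plot(mtcars$wt, mtcars$mpg, main = "Plot 4")     # largest cell

layout(1)   # reset to a single plot per page
```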

merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the 'data.frame' method.

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
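The sketch below illustrates by.x and by.y with two made-up data frames (employees and salaries are hypothetical names). The key column is called emp_id in one and id in the other, and one key has two matches.

```r
employees <- data.frame(emp_id = c(1, 2, 3),
                        name   = c("Ann", "Bob", "Cy"))
salaries  <- data.frame(id     = c(2, 3, 3, 4),
                        amount = c(50, 60, 65, 70))

# Match employees$emp_id against salaries$id
matched <- merge(employees, salaries, by.x = "emp_id", by.y = "id")

print(matched)
```

Employee 3 has two rows in salaries, so it contributes two rows to the result. Employees 1 and salary id 4 have no match and are dropped.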

Columns to merge on can be specified by name, number or by a logical vector: the name 'row.names' or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
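The Cartesian product can be checked with a small sketch. The data frames sizes and colours are hypothetical and deliberately share no column names.

```r
sizes   <- data.frame(size = c("S", "M"), stock = c(5, 8))
colours <- data.frame(colour = c("red", "green", "blue"))

# by = NULL pairs every row of sizes with every row of colours
every_pair <- merge(sizes, colours, by = NULL)

# Expected dimensions according to the formula above
expected_dim <- c(nrow(sizes) * nrow(colours), ncol(sizes) + ncol(colours))

print(every_pair)
print(dim(every_pair))
```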

If all.x is true, all the non matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.

If the columns in the data frames not used in merging have any common names, these have suffixes ('.x' and '.y' by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.
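To see the suffixes in action, here is a sketch with two hypothetical data frames that both contain a non-key column called value:

```r
first  <- data.frame(id = 1:2, value = c(10, 20))
second <- data.frame(id = 1:2, value = c(0.1, 0.2))

# Default suffixes: the clashing columns become value.x and value.y
default_names <- names(merge(first, second, by = "id"))

# Custom suffixes supplied through the suffixes argument
custom_names <- names(merge(first, second, by = "id",
                            suffixes = c("_first", "_second")))

print(default_names)
print(custom_names)
```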

If a by.x column name matches one of y, and if no.dups is true (as by default), the y version gets suffixed as well, avoiding duplicate column names in the result.

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
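The four join types can be compared side by side in a short sketch. The data frames customers and orders are hypothetical and share only the id column, so merge() joins on it by default.

```r
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(id = c(2, 3, 4), total = c(20, 35, 50))

inner_join <- merge(customers, orders)                 # ids in both
left_join  <- merge(customers, orders, all.x = TRUE)   # all customers
right_join <- merge(customers, orders, all.y = TRUE)   # all orders
full_join  <- merge(customers, orders, all = TRUE)     # everything

print(inner_join)
print(left_join)
print(right_join)
print(full_join)
```

Rows without a partner are kept only in the outer joins, with NA filling the columns that come from the other data frame.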