Data Manipulation in R (9 Examples)

 

This article shows how to manipulate data frames in the R programming language.

You’re here for the answer, so let’s get straight to the exemplifying R code!

 

Creation of Example Data

To begin with, we need to load some example data. In this tutorial, we’ll use the iris flower data set. We can load this data set using the data() function as shown below:

data(iris)                                            # Load iris data set

We can now use the head() function to print the first six rows of the iris data set to the RStudio console:

head(iris)                                            # Print first six rows of data frame

 

Table 1: First six rows of the iris data frame

 

Table 1 shows the first six rows of our example data.

The variables Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width are numerical, and the variable Species has the factor class.
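If you want to verify these column classes yourself, a quick check could look like the following. This is only a minimal sketch using base R; the object name col_classes is my own choice for illustration.

```r
data(iris)                                            # Load iris data set
col_classes <- sapply(iris, class)                    # Get class of each column
col_classes                                           # Print named vector of column classes
str(iris)                                             # Compact overview of data frame structure
```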

As indicated by the column names, the rows of this data set contain information on different flowers, i.e. the length and width of certain components and the flower species.

In the following examples, I’ll use this data set to demonstrate some of the most important techniques for the wrangling and manipulation of data frames in R.

Let’s do this!

 

Example 1: Select Column of Data Frame

In Example 1, I’ll demonstrate how to extract a certain column from a data frame.

For this task, we can use the $ operator as shown in the following R code:

my_col <- iris$Sepal.Length                           # Extract column as vector

The previous R code has created a new data object called my_col. This data object contains the values of the column Sepal.Length.

We can print the first six elements of this data object using the head function:

head(my_col)                                          # First six elements of vector
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

We might now analyze this data object as desired. Just to show you an example, the following R code calculates the mean value of our data object:

mean(my_col)                                          # Calculate mean of vector
# [1] 5.843333
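The $ operator is not the only way to extract a column. As a small sketch, the following code compares it with double and single square brackets; the object names col_a, col_b, col_c, and col_df are just illustrative.

```r
data(iris)                                            # Load iris data set
col_a <- iris$Sepal.Length                            # Extract column using $ operator
col_b <- iris[["Sepal.Length"]]                       # Extract column using double brackets
col_c <- iris[ , "Sepal.Length"]                      # Extract column using row/column indexing
identical(col_a, col_b)                               # Compare first two vectors
identical(col_a, col_c)                               # Compare first and third vector
col_df <- iris["Sepal.Length"]                        # Single brackets without comma keep a data frame
class(col_df)                                         # Check class of the result
```

The double-bracket version can be handy when the column name is stored in a character string, since the $ operator expects the name to be typed literally.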

 

Example 2: Remove Column from Data Frame

Example 2 shows how to delete particular columns from a data frame.

For this example, I’m first creating a duplicate of our example data frame, since I want to keep an original version of the example data:

irisA <- iris                                         # Duplicate data frame

After running the previous line of code, a new data frame object called irisA has been created, which contains exactly the same values as the original iris data frame.

Next, we can use the colnames function and the != operator to drop a specific column (i.e. the variable Sepal.Length):

irisA <- irisA[ , colnames(irisA) != "Sepal.Length"]  # Drop one column
head(irisA)                                           # Print first six rows of data frame

 

Table 2: iris data frame without the column Sepal.Length

 

After executing the previous R programming syntax, the new data frame shown in Table 2 has been created. As you can see, we have removed the column Sepal.Length.
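There are other ways to drop columns as well. The following sketch shows two possible alternatives: assigning NULL to a single column, and removing several columns at once with the %in% operator. The object names irisA2, irisA3, and drop_cols are hypothetical and only used for this illustration.

```r
data(iris)                                            # Load iris data set
irisA2 <- iris                                        # Duplicate data frame
irisA2$Sepal.Length <- NULL                           # Drop one column by assigning NULL
drop_cols <- c("Sepal.Length", "Petal.Width")         # Columns to remove (illustrative choice)
irisA3 <- iris[ , !colnames(iris) %in% drop_cols,     # Drop several columns at once
               drop = FALSE]
colnames(irisA3)                                      # Print remaining column names
# [1] "Sepal.Width"  "Petal.Length" "Species"
```

The argument drop = FALSE makes sure the result stays a data frame even if only a single column is left over.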

 

Example 3: Add New Column to Data Frame

In this example, I’ll demonstrate how to add a new column to a data frame.

To achieve this, I’m using the $ operator once again:

irisB <- iris                                         # Duplicate data frame
irisB$new_col <- (1:nrow(irisB))^2                    # Add new column
head(irisB)                                           # Print first six rows of data frame

 

Table 3: iris data frame with the additional column new_col

 

Table 3 illustrates the output of the previous R programming code: a new data frame containing an additional column called new_col.
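In practice, a new column is often derived from columns that already exist. The code below is a minimal sketch of this idea; the column names Sepal.Ratio and Petal.Area are made up for this example and have no special meaning in the iris data.

```r
data(iris)                                            # Load iris data set
irisB2 <- iris                                        # Duplicate data frame
irisB2$Sepal.Ratio <- irisB2$Sepal.Length /           # Add column based on two existing columns
  irisB2$Sepal.Width
irisB3 <- transform(iris,                             # Same idea using transform()
                    Petal.Area = Petal.Length * Petal.Width)
head(irisB3)                                          # Print first six rows of data frame
```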

 

Example 4: Rename Columns of Data Frame

It is also possible to change the names of the columns of a data frame.

We are using the colnames function for this once again. Furthermore, we are using the paste0 function to create column names with the prefix x and a range from 1 to the number of columns in our data frame (i.e. x1, x2, and so on…).

irisC <- iris                                         # Duplicate data frame
colnames(irisC) <- paste0("x", 1:ncol(irisC))         # Change column names
head(irisC)                                           # Print first six rows of data frame

 

Table 4: iris data frame with renamed columns x1 to x5

 

By executing the previously shown syntax, we have created Table 4, i.e. a data frame with renamed columns.
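Sometimes only a single column should be renamed. One possible way to do this is to combine colnames with a logical condition, as sketched below. The new name flower_type is a hypothetical choice for this example.

```r
data(iris)                                            # Load iris data set
irisC2 <- iris                                        # Duplicate data frame
colnames(irisC2)[colnames(irisC2) == "Species"] <-    # Rename only the column Species
  "flower_type"
colnames(irisC2)                                      # Print updated column names
```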

 

Example 5: Reorder Rows of Data Frame

Example 5 shows how to order the rows of a data frame.

More precisely, we are sorting our data frame based on the values in the column Sepal.Length by using the order function:

irisD <- iris                                         # Duplicate data frame
irisD <- irisD[order(irisD$Sepal.Length), ]           # Sort rows by values
head(irisD)                                           # Print first six rows of data frame

 

Table 5: iris data frame sorted by the column Sepal.Length

 

As shown in Table 5, we have created a new version of our data frame where the rows have been sorted based on the column Sepal.Length.
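The order function sorts in ascending order by default. As a short sketch, the following code shows how descending order and sorting by more than one column might look; the object names irisD2 and irisD3 are only illustrative.

```r
data(iris)                                            # Load iris data set
irisD2 <- iris[order(iris$Sepal.Length,               # Sort rows in descending order
                     decreasing = TRUE), ]
irisD3 <- iris[order(iris$Species,                    # Sort by Species, then by descending
                     -iris$Petal.Length), ]           # Petal.Length within each species
head(irisD2)                                          # Print first six rows of data frame
```

The minus sign in front of a numeric column reverses the sorting direction for that column only.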

 

Example 6: Subset Data Frame Rows

Example 6 illustrates how to create a data frame subset by selecting certain rows.

For this task, we can specify the row indices that we want to extract within the c() function:

irisE <- iris                                         # Duplicate data frame
irisE <- irisE[c(1, 3, 5), ]                          # Create data frame subset
irisE                                                 # Print data frame subset

 

Table 6: Subset containing rows 1, 3, and 5 of the iris data frame

 

As illustrated in Table 6, we have created a data frame subset using the previous R programming code. This subset contains only three rows of the original input data.
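Rows can also be selected by a logical condition instead of fixed indices. The following is a minimal sketch of this approach, once with square brackets and once with the subset function. The threshold 5.5 and the object names irisE2 and irisE3 are arbitrary choices for illustration.

```r
data(iris)                                            # Load iris data set
irisE2 <- iris[iris$Species == "setosa" &             # Subset rows by logical condition
                 iris$Sepal.Length > 5.5, ]
irisE3 <- subset(iris,                                # Same condition using subset()
                 Species == "setosa" & Sepal.Length > 5.5)
irisE2                                                # Print data frame subset
```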

 

Example 7: Replace Values in Data Frame

In this example, I’ll illustrate how to exchange certain values in a data frame.

Once again, we can use the c() function. In this specific example, I’m replacing the first, third, and fifth values in the first column with the value 999:

irisF <- iris                                         # Duplicate data frame
irisF[c(1, 3, 5), 1] <- 999                           # Replace values in data frame
head(irisF)                                           # Print first six rows of data frame

 

Table 7: iris data frame with replaced values in the first column

 

It is also possible to replace certain values based on a logical condition instead of fixed row indices.
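As a minimal sketch of a replacement based on a logical condition, the following code sets all Sepal.Length values larger than 7 to 999. The threshold and the object name irisF2 are arbitrary and only serve as an illustration.

```r
data(iris)                                            # Load iris data set
irisF2 <- iris                                        # Duplicate data frame
irisF2$Sepal.Length[irisF2$Sepal.Length > 7] <- 999   # Replace values that meet the condition
sum(irisF2$Sepal.Length == 999)                       # Count replaced values
```

Note that this kind of replacement works directly for numeric columns. For a factor column such as Species, the new value would first have to exist as a factor level.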

Example 8: Remove Duplicate Rows in Data Frame

This example shows how to drop duplicate rows from a data frame.

As preparation for this example, we first have to create a data frame with duplicates:

irisG <- rbind(iris[1:3, ], iris[2:5, ])              # Create data frame with duplicates
irisG                                                 # Print data frame with duplicates

 

Table 8: Data frame with duplicate rows

 

The output of the previously shown R programming code is shown in Table 8. As you can see, the fourth and fifth rows contain the same values as the second and third rows.

If we want to delete these duplicates, we can use the unique function as demonstrated in the following R code:

irisH <- unique(irisG)                                # Drop duplicates from data frame
irisH                                                 # Print data frame with unique rows

 

Table 9: Data frame with unique rows only

 

Table 9 reveals the output of the previous R syntax: we have kept only the unique rows of our input data frame.
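If you first want to see which rows are duplicates before removing them, the duplicated function might be useful. The sketch below recreates the data frame with duplicates from this example; the object names dup_flag and irisH2 are my own illustrative choices.

```r
data(iris)                                            # Load iris data set
irisG <- rbind(iris[1:3, ], iris[2:5, ])              # Create data frame with duplicates
dup_flag <- duplicated(irisG)                         # Flag rows that repeat an earlier row
dup_flag                                              # Print logical indicator
# [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
irisH2 <- irisG[!dup_flag, ]                          # Keep only rows that are not flagged
irisH2                                                # Print data frame with unique rows
```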

 

Example 9: Aggregate Values in Data Frame by Group

As you might have already noticed, our data frame contains different species groups (or categories). We may use this information to calculate certain descriptive metrics by group.

Example 9 shows how to use the aggregate function to calculate the sum of the Sepal.Length values within each species group.

irisI <- aggregate(Sepal.Length ~ Species,            # Calculate sum by group
                   iris,
                   sum)
irisI                                                 # Print sum by group

 

Table 10: Sum of Sepal.Length by Species

 

By running the previous syntax, we have managed to construct Table 10, i.e. an aggregated data set showing the sum by group.
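The same approach can be applied to other summary functions and to several columns at once. The following code is a small sketch of some possible variations using the mean function and tapply; the object names irisI2, irisI3, and sum_by_group are hypothetical.

```r
data(iris)                                            # Load iris data set
irisI2 <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,  # Mean of two columns by group
                    iris,
                    mean)
irisI3 <- aggregate(. ~ Species,                      # Mean of all other columns by group
                    iris,
                    mean)
sum_by_group <- tapply(iris$Sepal.Length,             # Sum by group as a named array
                       iris$Species,
                       sum)
irisI2                                                # Print mean by group
```

The dot in the formula . ~ Species stands for all columns of the data frame that do not appear on the right-hand side.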

 

Video, Further Resources & Summary

Have a look at the following video on my YouTube channel. I’m showing the examples of this tutorial in the video:

 

The YouTube video will be added soon.

 

Besides the video, you may want to have a look at the related tutorials on my website. There’s much more to explore in case you want to manipulate, wrangle, and manage data frames in R.

 

In addition, you may have a look at the dplyr package of the tidyverse and the data.table package. They also provide excellent functions for manipulating and working with data sets – especially for advanced R users.

This tutorial has shown basic and advanced examples on how to edit and handle data frames in the R programming language. Please let me know in the comments section, in case you have further comments and/or questions.

 
