Read, Row-Bind, Summarize & Visualize Multiple Data Sets in R (4 Examples)

 

In this tutorial, I’ll demonstrate how to import, combine, summarize, and visualize two data sets in the R programming language.

This is a bonus tutorial for the Statistics Globe online course on “Data Manipulation in R Using dplyr & the tidyverse”, so we will focus on tidyverse functions in this article as well. By the way, if you’d like to learn more about the online course, you may check out this page.

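The post is structured as follows:

1) Installing & Loading tidyverse packages
2) Read Multiple tibbles Using readr Package
3) Remove Certain Rows of a tibble
4) Bind Two tibbles by Rows
5) Summarize tibble
6) Visualize tibble Using ggplot2
7) Video, Further Resources & Summary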

Let’s dive right into it…

 

Installing & Loading tidyverse packages

If we want to use the commands and functions of the tidyverse such as dplyr and ggplot2, we first have to install and load the tidyverse packages:

install.packages("tidyverse")                      # Install & load tidyverse package
library("tidyverse")

Now, we are set up and can move on to importing our data sets.

 

Read Multiple tibbles Using readr Package

In this tutorial, we will use two CSV files. You may download them here:

When reading or writing data in R, the first step is to specify a file location path:

my_path <- "C:/Users/Joach/Desktop/my project/"    # Specify path to data folder

Next, we can use the read_csv function of the readr package in combination with the str_c function of the stringr package to import our first data set as a tibble object.

data1 <- read_csv(str_c(my_path, "data1.csv"))     # Import first data set
data1                                              # Print first data set

 

Table 1: First rows of the first example tibble (data1)

 

Table 1 shows the first rows of our first imported example tibble. Our data consists of two columns: The ID column is an identifier for the participants, and the country column contains the country each participant is from.
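If you don’t have the example files at hand, the import step can be tried out with a small self-made file. The following is only a minimal sketch: the temporary folder, the file name demo1.csv, and the three rows are made up for illustration and are not part of the course data. The col_types argument is optional; I use it here to state the column classes explicitly instead of letting read_csv guess them.

```r
library(readr)
library(stringr)

# Hypothetical stand-in for the project folder: a temporary directory
demo_path <- str_c(tempdir(), "/")

# Write a tiny made-up CSV file so that this sketch runs on its own
write_csv(data.frame(ID = c("ID1", "ID2", "ID3"),
                     country = c("Germany", "Mexico", "Spain")),
          str_c(demo_path, "demo1.csv"))

# Import the file as a tibble and declare both columns as character
demo1 <- read_csv(str_c(demo_path, "demo1.csv"),
                  col_types = cols(ID = col_character(),
                                   country = col_character()))
demo1                                              # Print imported tibble
```

Base R’s file.path(tempdir(), "demo1.csv") would build the same kind of path without the trailing slash in demo_path.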

This first data set contains the participants of the initial registration phase of the dplyr course. However, there was a late registration which resulted in another country data set. Let’s read this second data set:

data2 <- read_csv(str_c(my_path, "data2.csv"))     # Import second data set
data2                                              # Print second data set

 

Table 2: First rows of the second example tibble (data2)

 

Table 2 shows the output of the previous R programming code: Our second example tibble with the same column names as our first tibble.
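Since we want to stack the two tibbles later on, it can be worth confirming that the column names and classes really match. Here is a small sketch of such a check, using two made-up tibbles (demo1 and demo2, with the invented identifier ID104) instead of the real course data:

```r
library(tibble)

# Two small made-up tibbles with the same structure as the example data
demo1 <- tibble(ID = c("ID1", "ID2"),
                country = c("Germany", "Mexico"))
demo2 <- tibble(ID = "ID104",
                country = "Spain")

# Compare column names (including their order)
same_names <- identical(names(demo1), names(demo2))
same_names

# Compare the class of each column
same_types <- identical(sapply(demo1, class), sapply(demo2, class))
same_types
```

If both checks return TRUE, the rows of the two tibbles line up column by column.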

Let’s work with these data!

 

Remove Certain Rows of a tibble

In this section, I’ll demonstrate how to remove specific rows from a data set.

The reason I’m doing this is that 3 of the 103 participants of the initial registration phase used their 30-day money-back guarantee, so they should no longer appear in the data. That’s a great result, by the way, since it shows that almost everybody is happy with the course! 🙂

We can remove the corresponding rows by using the filter function of the dplyr package:

data1_upd <- data1 %>%                             # Filter rows
  filter(! ID %in% c("ID5", "ID13", "ID55"))
head(data1_upd)

 

Table 3: First rows of the filtered tibble (data1_upd)

 

After running the previous R code, the new tibble illustrated in Table 3 has been created. As you can see, the row with the identifier ID5, for example, was removed from the data set.
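To see the logic of the negated %in% condition in isolation, consider the following sketch. The six-row tibble and the refunds vector are made up for illustration. As an alternative, the same rows can be dropped with the anti_join function of dplyr, which is handy when the IDs to be removed are stored in their own data set.

```r
library(dplyr)

# Made-up example data with six participants
demo <- tibble(ID = paste0("ID", 1:6),
               country = c("Germany", "Mexico", "Spain",
                           "Canada", "Ireland", "Sweden"))

refunds <- c("ID2", "ID5")                         # Hypothetical IDs to remove

demo_upd <- demo %>%                               # Keep rows whose ID is NOT in refunds
  filter(! ID %in% refunds)
demo_upd

demo_upd2 <- demo %>%                              # Same result using anti_join
  anti_join(tibble(ID = refunds), by = "ID")
demo_upd2
```

Both versions keep the four rows with the identifiers ID1, ID3, ID4, and ID6.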

 

Bind Two tibbles by Rows

The next step is to combine our two tibbles into a single tibble. For this, we can apply the bind_rows function of the dplyr package as shown below:

data_all <- data1_upd %>%                          # Row-bind multiple data sets
  bind_rows(data2)
head(data_all)                                     # Head of combined data

 

Table 4: First rows of the combined tibble (data_all)

 

The output of the previous code is shown in Table 4: The first six rows of a unified tibble that contains 111 rows, i.e. all final participants of the course.

We may also use the tail function to return the bottom rows of our combined tibble:

tail(data_all)                                     # Tail of combined data

 

Table 5: Last rows of the combined tibble (data_all)

 

Let’s check the final number of participants once again using the nrow function:

nrow(data_all)                                     # Total number of participants
# [1] 111

111 participants, what a great success!
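In case you would like to keep track of the data set each row came from, bind_rows offers the .id argument. The sketch below uses two made-up tibbles; the names first_phase and late_phase, the labels "initial" and "late", and the column name phase are my own choices for this illustration.

```r
library(dplyr)

# Made-up stand-ins for the two registration phases
first_phase <- tibble(ID = c("ID1", "ID2"),
                      country = c("Germany", "Mexico"))
late_phase <- tibble(ID = "ID3",
                     country = "Spain")

demo_all <- bind_rows(initial = first_phase,       # Names of the inputs become labels
                      late = late_phase,
                      .id = "phase")               # Store the labels in a new column
demo_all

nrow(demo_all)                                     # Row counts of both inputs add up
```

The new phase column appears as the first column of the combined tibble.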

 

Summarize tibble

Next, I would like to check the country distribution in the course. For this task, I use the group_by and summarize functions of dplyr.

data_summ <- data_all %>%                          # Start with combined tibble
  group_by(country) %>%                            # Group tibble by country
  summarize(country_count = n()) %>%               # Calculate country count
  arrange(desc(country_count))                     # Sort in descending order
data_summ                                          # Print country data
# # A tibble: 31 × 2
#    country        country_count
#    <chr>                  <int>
#  1 United States             44
#  2 United Kingdom            12
#  3 Germany                    5
#  4 Mexico                     5
#  5 Australia                  4
#  6 Canada                     4
#  7 Netherlands                4
#  8 Ireland                    3
#  9 Spain                      3
# 10 Sweden                     3
# # ℹ 21 more rows
# # ℹ Use `print(n = ...)` to see more rows

As you can see, we have created another tibble that contains the country counts of the participants in our course.
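By the way, dplyr also provides the count function, which combines the grouping, counting, and sorting steps in a single call. The following sketch compares both approaches on a small made-up tibble (not the course data):

```r
library(dplyr)

# Made-up country column with six participants
demo_all <- tibble(country = c("Spain", "Germany", "Spain",
                               "Mexico", "Spain", "Germany"))

summ_long <- demo_all %>%                          # Step-by-step version
  group_by(country) %>%
  summarize(country_count = n()) %>%
  arrange(desc(country_count))

summ_short <- demo_all %>%                         # Shortcut using count()
  count(country, sort = TRUE, name = "country_count")

summ_short
all.equal(summ_long, summ_short)                   # Compare both results
```

The step-by-step version is a bit longer, but it makes each operation explicit, which is why it is often easier to follow when learning dplyr.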

 

Visualize tibble Using ggplot2

Finally, I would like to visualize our country data in a ggplot2 barplot to identify the countries with the most participants.

Take a look at the R code below:

data_summ %>%                                      # Create ggplot2 plot
  ggplot(aes(x = reorder(country, - country_count),
             y = country_count)) +
  geom_col() +                                     # Specify to draw a barplot
  theme(axis.text.x = element_text(angle = 90,     # Vertical x-axis labels
                                   hjust = 1,
                                   vjust = 0.3)) +
  xlab("Country") +                                # Change x-axis label
  ylab("Count") +                                  # Change y-axis label
  annotate("text",                                 # Add text element to plot
           x = 15,
           y = 25,
           label = "Course Participants\nby Country",
           size = 15,
           color = "#1b98e0")

 

ggplot2 barplot

 

As demonstrated in the graph above, most participants come from the United States, followed by the United Kingdom, Germany, and Mexico. What a nice international group! 🙂
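With 31 countries, the x-axis gets quite crowded. One possible variation is to plot only the countries with the highest counts and to draw the bars horizontally. The sketch below assumes a recent ggplot2 version (3.3.0 or later), where geom_col accepts the categories on the y-axis. The four rows are typed in by hand from the output shown earlier, and the names demo_summ, demo_top, and top_plot are my own.

```r
library(dplyr)
library(ggplot2)

# Hand-typed subset of the country counts printed earlier
demo_summ <- tibble(country = c("Germany", "United States",
                                "Australia", "United Kingdom"),
                    country_count = c(5L, 44L, 4L, 12L))

demo_top <- demo_summ %>%                          # Keep the three largest counts
  slice_max(country_count, n = 3)

top_plot <- demo_top %>%                           # Horizontal barplot
  ggplot(aes(x = country_count,
             y = reorder(country, country_count))) +
  geom_col() +
  xlab("Count") +
  ylab("Country")
top_plot
```

Because the country names are placed on the y-axis, no rotation of the axis labels is needed in this version.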

 

Video, Further Resources & Summary

Do you want to know more about the topics of this article? Then I recommend watching the following video on my YouTube channel, in which I illustrate the examples of this tutorial.

 

 

In addition, you might take a look at the other tutorials on my website:

 

At this point of the post, you should know how to work with multiple data sets using the functions of the tidyverse in R. If you have further questions, please let me know in the comments section. Furthermore, please subscribe to my email newsletter for updates on the newest articles.

 



2 Comments

  • George Carter
    February 8, 2024 4:58 pm

    Great video! How can I rotate plot for easier reading?

    • Thanks George, glad you like it! 🙂

      You may use the coord_flip() function for this:

      data_summ %>%                                      # Create ggplot2 plot
        ggplot(aes(x = reorder(country, - country_count),
                   y = country_count)) +
        geom_col() +                                     # Specify to draw a barplot
        theme(axis.text.x = element_text(angle = 90,     # Vertical x-axis labels
                                         hjust = 1,
                                         vjust = 0.3)) +
        xlab("Country") +                                # Change x-axis label
        ylab("Count") +                                  # Change y-axis label
        annotate("text",                                 # Add text element to plot
                 x = 15,
                 y = 25,
                 label = "Course Participants\n by Country",
                 size = 15,
                 color = "#1b98e0") +
        coord_flip()

      ggplot2 barplot

      Regards,
      Joachim
