Analyze & Visualize Country Data in R Using dplyr & ggplot2 (Example)

 

I recently launched the first-ever Statistics Globe online course, “Data Manipulation in R Using dplyr & the tidyverse”, which has 103 participants (Hooray! 🙂).

In this tutorial, I will use the country information from these participants to show how to analyze and visualize country data in the R programming language.

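The content of the post is structured as follows:

1) Creating Example Data
2) Manipulate & Analyze Country Data
3) Draw Barplot of Country Counts
4) Do Everything in One Line of Code
5) Video & Further Resources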

Let’s start right away.

 

Creating Example Data

The first step is to create some data that we can use in the tutorial later on. Since we want to analyze the country information of our participants, we first have to store these data in a vector object:

x <- c("United Kingdom",                             # Create vector of countries
       "United Kingdom",
       "Australia",
       "United States",
       "United States",
       "United Kingdom",
       "Netherlands",
       "Austria",
       "United States",
       "United States",
       "Ireland",
       "United States",
       "United States",
       "United States",
       "Japan",
       "United States",
       "United States",
       "Bangladesh",
       "Congo",
       "Spain",
       "Spain",
       "Netherlands",
       "United States",
       "United States",
       "United States",
       "Chile",
       "United States",
       "Canada",
       "United States",
       "Spain",
       "United Kingdom",
       "Ireland",
       "United Kingdom",
       "Mexico",
       "Namibia",
       "United States",
       "India",
       "United States",
       "Romania",
       "Mexico",
       "Canada",
       "Tanzania",
       "Netherlands",
       "Portugal",
       "Germany",
       "United Kingdom",
       "United States",
       "United States",
       "Australia",
       "Sweden",
       "Japan",
       "Canada",
       "United States",
       "Italy",
       "France",
       "Germany",
       "Germany",
       "Sweden",
       "Mexico",
       "New Zealand",
       "Mexico",
       "South Korea",
       "United States",
       "United States",
       "United States",
       "South Africa",
       "United States",
       "United States",
       "Australia",
       "United States",
       "United States",
       "United States",
       "Iceland",
       "United States",
       "United States",
       "United Kingdom",
       "United Kingdom",
       "Ireland",
       "Germany",
       "United States",
       "United States",
       "Singapore",
       "United Kingdom",
       "United States",
       "Mexico",
       "United Kingdom",
       "Norway",
       "Brazil",
       "United States",
       "United States",
       "Canada",
       "Netherlands",
       "Canada",
       "Sweden",
       "United States",
       "United States",
       "United Kingdom",
       "Germany",
       "United States",
       "United States",
       "United States",
       "United States",
       "Trinidad and Tobago")

The previous R code has created a vector object called x, which contains one country name for each participant in the course.
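
Before analyzing such a vector, a quick sanity check can be helpful. The sketch below uses a small hypothetical vector called x_demo (not the course data) so that it runs on its own; the same base R calls could be applied to x:

```r
x_demo <- c("United States",                         # Hypothetical toy vector
            "Canada",
            "United States",
            "Mexico")

length(x_demo)                                       # Number of entries
unique(x_demo)                                       # Distinct country names
anyNA(x_demo)                                        # Check for missing values
sort(table(x_demo), decreasing = TRUE)               # Frequencies, largest first
```

Applied to the full vector, length(x) should return 103, i.e. one entry per participant.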

Let’s work with these data!

 

Manipulate & Analyze Country Data

The following code shows how to calculate summary statistics for our country data using the packages of the tidyverse (in particular, dplyr & ggplot2).

First, we have to install and load the tidyverse packages by running the code below:

install.packages("tidyverse")                        # Install tidyverse package
library("tidyverse")                                 # Load tidyverse
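
Note that install.packages() downloads the package again every time it is run. If you execute the script repeatedly, one possible variant is to install only when the package is missing. This is just a sketch using the base R function requireNamespace(), not a required step of this tutorial:

```r
if (!requireNamespace("tidyverse", quietly = TRUE)) {  # Install only if missing
  install.packages("tidyverse")
}
library("tidyverse")                                 # Load tidyverse
```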

We can use the group_by() and summarize() functions if we want to calculate summary statistics by group using the tidyverse.

In this specific example, I’m interested in the country counts. To calculate this, we can use the n() function within the summarize() function.

Furthermore, I’d like to order my grouped output tibble from the most represented countries to the least represented countries. We can do this using the arrange() and desc() functions.

Take a look at the syntax and its output below:

my_tib_grouped <- tibble(country = x) %>%            # Convert vector to tibble
  group_by(country) %>%                              # Group tibble
  summarize(country_count = n()) %>%                 # Calculate country count
  arrange(desc(country_count))                       # Sort in descending order
my_tib_grouped                                       # Print country data
# # A tibble: 30 × 2
#    country        country_count
#    <chr>                  <int>
#  1 United States             40
#  2 United Kingdom            11
#  3 Canada                     5
#  4 Germany                    5
#  5 Mexico                     5
#  6 Netherlands                4
#  7 Australia                  3
#  8 Ireland                    3
#  9 Spain                      3
# 10 Sweden                     3
# # ℹ 20 more rows
# # ℹ Use `print(n = ...)` to see more rows

As you can see, we have created a summarized tibble with two columns: the first column shows the different countries, and the second column shows the corresponding count. For instance, there are 40 participants from the United States and 11 participants from the United Kingdom.

Based on this output, we can also see the number of rows of this tibble. This tells us that our group of participants contains 30 different countries.
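
The number of different countries can also be computed directly instead of reading it off the tibble header. A minimal sketch, assuming the dplyr package is installed and using a hypothetical toy vector x_demo in place of x:

```r
library("dplyr")                                     # Load dplyr

x_demo <- c("United States",                         # Hypothetical toy vector
            "Canada",
            "United States",
            "Mexico")

n_distinct(x_demo)                                   # Number of different countries
# [1] 3

tibble(country = x_demo) %>%                         # Same result via row count
  count(country) %>%
  nrow()
# [1] 3
```

For the course data, both approaches should return 30, matching the tibble header above.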

Great – that’s very international!
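
As a side note, the group_by(), summarize(), and arrange() steps above can be condensed with the count() function of dplyr. The sketch below is an alternative, not the approach used in the rest of this tutorial. It runs on a hypothetical toy vector x_demo, and the share column is an extra added for illustration:

```r
library("dplyr")                                     # Load dplyr

x_demo <- c("United States",                         # Hypothetical toy vector
            "Canada",
            "United States",
            "Mexico")

tib_demo <- tibble(country = x_demo) %>%             # Convert vector to tibble
  count(country,                                     # Count & sort in one step
        sort = TRUE,
        name = "country_count") %>%
  mutate(share = country_count / sum(country_count)) # Add relative frequency
tib_demo                                             # Print toy result
```

With the course data, this share would be roughly 39% for the United States (40 of 103 participants).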

 

Draw Barplot of Country Counts

Now that we know the country counts of the course participants, I would also like to visualize these results using the ggplot2 package.

In the code below, I specify that I want to draw an ordered barplot with vertical x-axis labels, an x-axis title called “Country”, a y-axis title called “Count”, a main title “dplyr Course Participants by Country”, and a little text message inside the plot.

Let’s do this:

my_ggp <- my_tib_grouped %>%                         # Create ggplot2 plot
  ggplot(aes(x = reorder(country, - country_count),
             y = country_count)) +
  geom_col() +                                       # Specify to draw a barplot
  theme(axis.text.x = element_text(angle = 90,       # Vertical x-axis labels
                                   hjust = 1,
                                   vjust = 0.3)) +
  xlab("Country") +                                  # Change x-axis label
  ylab("Count") +                                    # Change y-axis label
  ggtitle("dplyr Course Participants by Country") +  # Change main title
  annotate("text",                                   # Add text element to plot
           x = 15,
           y = 25,
           label = "Thank You !!",
           size = 15,
           color = "red")
my_ggp                                               # Draw ggplot2 plot

 

Figure: Ordered ggplot2 barplot of dplyr course participants by country (x-axis: Country, y-axis: Count)

 

The graphic above visualizes the country counts of our participants in an ordered barplot. Looks great!
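
The order of the bars comes from the reorder() function inside aes(). By default, ggplot2 would arrange the countries alphabetically; reorder() sorts the factor levels by a second variable in ascending order, so passing the negative count puts the largest group first. The base R sketch below illustrates this with hypothetical categories and counts:

```r
categ  <- c("A", "B", "C")                           # Hypothetical categories
n_part <- c(1, 3, 2)                                 # Hypothetical counts

levels(factor(categ))                                # Default: alphabetical order
# [1] "A" "B" "C"

lev_desc <- levels(reorder(categ, - n_part))         # Order by descending count
lev_desc
# [1] "B" "C" "A"
```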

 

Do Everything in One Line of Code

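The elegance of the pipe operator lies in its ability to chain nearly all of these steps into a single command, without storing any intermediate objects. The dplyr steps are connected with %>%, and the ggplot2 layers are added with +. Check this out: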

tibble(country = c("United Kingdom",                 # Create tibble with country data
                   "United Kingdom",
                   "Australia",
                   "United States",
                   "United States",
                   "United Kingdom",
                   "Netherlands",
                   "Austria",
                   "United States",
                   "United States",
                   "Ireland",
                   "United States",
                   "United States",
                   "United States",
                   "Japan",
                   "United States",
                   "United States",
                   "Bangladesh",
                   "Congo",
                   "Spain",
                   "Spain",
                   "Netherlands",
                   "United States",
                   "United States",
                   "United States",
                   "Chile",
                   "United States",
                   "Canada",
                   "United States",
                   "Spain",
                   "United Kingdom",
                   "Ireland",
                   "United Kingdom",
                   "Mexico",
                   "Namibia",
                   "United States",
                   "India",
                   "United States",
                   "Romania",
                   "Mexico",
                   "Canada",
                   "Tanzania",
                   "Netherlands",
                   "Portugal",
                   "Germany",
                   "United Kingdom",
                   "United States",
                   "United States",
                   "Australia",
                   "Sweden",
                   "Japan",
                   "Canada",
                   "United States",
                   "Italy",
                   "France",
                   "Germany",
                   "Germany",
                   "Sweden",
                   "Mexico",
                   "New Zealand",
                   "Mexico",
                   "South Korea",
                   "United States",
                   "United States",
                   "United States",
                   "South Africa",
                   "United States",
                   "United States",
                   "Australia",
                   "United States",
                   "United States",
                   "United States",
                   "Iceland",
                   "United States",
                   "United States",
                   "United Kingdom",
                   "United Kingdom",
                   "Ireland",
                   "Germany",
                   "United States",
                   "United States",
                   "Singapore",
                   "United Kingdom",
                   "United States",
                   "Mexico",
                   "United Kingdom",
                   "Norway",
                   "Brazil",
                   "United States",
                   "United States",
                   "Canada",
                   "Netherlands",
                   "Canada",
                   "Sweden",
                   "United States",
                   "United States",
                   "United Kingdom",
                   "Germany",
                   "United States",
                   "United States",
                   "United States",
                   "United States",
                   "Trinidad and Tobago")) %>%       
  group_by(country) %>%                              # Group tibble
  summarize(country_count = n()) %>%                 # Calculate country count
  ggplot(aes(x = reorder(country, - country_count),
             y = country_count)) +
  geom_col() +                                       # Specify to draw a barplot
  theme(axis.text.x = element_text(angle = 90,       # Vertical x-axis labels
                                   hjust = 1,
                                   vjust = 0.3)) +
  xlab("Country") +                                  # Change x-axis label
  ylab("Count") +                                    # Change y-axis label
  ggtitle("dplyr Course Participants by Country") +  # Change main title
  annotate("text",                                   # Add text element to plot
           x = 15,
           y = 25,
           label = "Thank You !!",
           size = 15,
           color = "red")

Great, isn’t it? 🙂
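
To make the structure of such a chain easier to see, here is a compact sketch on a hypothetical toy vector x_demo. It uses count() and labs() as shorter stand-ins for the summarize() and xlab() / ylab() steps above, and it assumes that the dplyr and ggplot2 packages are installed:

```r
library("dplyr")                                     # Load dplyr
library("ggplot2")                                   # Load ggplot2

x_demo <- c("United States",                         # Hypothetical toy vector
            "Canada",
            "United States",
            "Mexico")

p_demo <- tibble(country = x_demo) %>%               # dplyr steps are chained with %>%
  count(country, name = "country_count") %>%         # Calculate country count
  ggplot(aes(x = reorder(country, - country_count),
             y = country_count)) +                   # ggplot2 layers are added with +
  geom_col() +                                       # Specify to draw a barplot
  labs(x = "Country", y = "Count")                   # Change axis labels
p_demo                                               # Draw ggplot2 plot
```

Because %>% binds more tightly than +, the entire piped expression up to ggplot() is evaluated first, and the layers are then added to the resulting plot object.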

 

Video & Further Resources

Do you want to learn more about the handling of country data using the dplyr & ggplot2 packages? Then I can recommend taking a look at the following video on my YouTube channel. In the video, I explain the contents of this post in more detail.

 

 

In addition, you could read the related articles on this website.

 

At this point, you should have learned how to analyze and visualize country data using the packages of the tidyverse in the R programming language. Don’t hesitate to let me know in the comments if you have further questions.

Furthermore, make sure to visit the course description page and join the waiting list in case you’d like to take part in such a course in the future.

 
