Get Frequency of Words in Character String in R (Example)
In this article you’ll learn how to get the word frequencies in a character string in the R programming language.
So let’s get started!
Creation of Example Data
First of all, we need to create some example data:
x <- "hello yay hello what is going on going on on"    # Example string
x                                                      # Print string
# [1] "hello yay hello what is going on going on on"
The previous RStudio console output shows the structure of our example data: we have created a character string containing multiple words, each separated by a single space.
Example: Create Frequency Table of Words Using strsplit, unlist, table & sort
This section demonstrates how to count how often each word occurs in a character string, a very common task in text mining and text analysis.
For this task, we can use a combination of several Base R functions: strsplit, unlist, table, and sort.
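To see what each of these functions contributes, here is a minimal sketch that applies them one step at a time. The intermediate object names (words_list, words, counts, freq_steps) are only illustrative and are not part of the original example.

```r
x <- "hello yay hello what is going on going on on"  # Same example string as before

words_list <- strsplit(x, " ")                 # List with one element: a character vector of words
words <- unlist(words_list)                    # Flatten the list into a plain character vector
counts <- table(words)                         # Count each distinct word (names in alphabetical order)
freq_steps <- sort(counts, decreasing = TRUE)  # Reorder from most to least frequent

length(words)                                  # Total number of words in the string
freq_steps["on"]                               # Look up the count of a single word
```

Splitting the work like this makes it easy to inspect each intermediate result. Nesting the same calls into one expression gives the same table in a more compact form.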
Check out the following R code:
freq_x <- sort(table(unlist(strsplit(x, " "))),    # Create frequency table
               decreasing = TRUE)
freq_x                                             # Print word frequency
#    on going hello    is  what   yay
#     3     2     2     1     1     1
As you can see, we have created a table showing the count of each word in our character string. The word “on” occurs most often (3 times), “going” and “hello” each occur twice, and the words “is”, “what”, and “yay” appear only once.
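Real text often contains capital letters, punctuation, and irregular spacing. With a plain split on a single space, “Hello,” and “hello” would be counted as different words, and double spaces would produce empty strings. The following sketch shows one possible way to normalize the text first. The input string txt and the object names are made up for illustration, and the cleaning rules are an assumption that may need adjusting for your own data.

```r
txt <- "Hello, hello!  What is going on? Going on..."   # Hypothetical messy input

clean <- tolower(txt)                                   # Treat "Hello" and "hello" as the same word
clean <- gsub("[[:punct:]]", "", clean)                 # Remove punctuation characters
words <- unlist(strsplit(trimws(clean), "[[:space:]]+"))  # Split on runs of whitespace

freq_txt <- sort(table(words), decreasing = TRUE)       # Frequency table, most frequent first
freq_df <- as.data.frame(freq_txt,                      # One row per word, counts in column Freq
                         stringsAsFactors = FALSE)
freq_txt
freq_df
```

Converting the table to a data frame is optional, but it can be convenient if you want to filter, merge, or plot the word counts afterwards.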
Video, Further Resources & Summary
Have a look at the following video tutorial on my YouTube channel. In the video, I’m explaining the examples of this post.
Furthermore, you may want to read the other posts on my homepage. I have published several articles already:
- Count Number of Words in Character String
- Extract Numbers from Character String Vector in R
- Remove Newline from Character String in R
- All R Programming Tutorials
In summary: In this R tutorial, you have learned how to list the word frequencies in a character string. Don’t hesitate to let me know in the comments section if you have any additional questions.