How to Draw Histograms with plotly in Python (Example)
Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. He holds a Ph.D. in Industrial-Organizational Psychology.
Modules and Example Data
As a first step, please install the vega_datasets and plotly packages (if you have not done so already) and import the following modules:
from vega_datasets import data
import plotly.express as px
We’ll use the stocks dataset for this example, which is included with the vega_datasets package. It contains monthly stock prices for several companies over a number of years. We’ll store this in a data frame called df:
df = data.stocks()
df
# symbol   date         price
# MSFT     2000-01-01   39.81
# MSFT     2000-02-01   36.35
# MSFT     2000-03-01   43.22
# etc...
Let’s create a simple histogram to look at the distribution of all the stock prices in this dataset:
fig1 = px.histogram(data_frame = df, x = "price")
fig1.show()
This histogram makes it easy to see that the most common stock prices were between $0 and $50, as indicated by the tallest bar on the left. We can also see a cluster of observations with stock prices between $300 and $600, and that no stock price exceeds $750.
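To make the idea of a bin concrete, here is a minimal sketch in plain Python that counts how many values fall into each $50-wide range. The prices list and the bin_label helper are made up for illustration (they are not taken from the stocks data), and plotly may choose different bin edges on its own.

```python
from collections import Counter

# Hypothetical prices, invented for this sketch (not from the stocks dataset)
prices = [12.5, 30.0, 45.2, 48.9, 75.0, 120.3, 310.0, 455.5, 590.1]
bin_width = 50  # assumed bin width in dollars


def bin_label(price, width):
    """Return the (start, end) range of the half-open bin [start, end) containing price."""
    start = int(price // width) * width
    return (start, start + width)


# Count how many prices land in each bin; each count is the height of one bar
counts = Counter(bin_label(p, bin_width) for p in prices)

for (start, end), n in sorted(counts.items()):
    print(f"${start}-${end}: {n}")
```

Each count corresponds to the height of one bar, so the tallest bar simply marks the price range that contains the most rows.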
Plotting Multiple Groups
To see how the stock prices vary by company, we can create multiple histograms and overlay them on top of each other. We set color = 'symbol' to plot each company in a different color and barmode = 'overlay' to draw the histograms on top of each other:
fig2 = px.histogram(data_frame = df,
                    x = 'price',
                    color = 'symbol',
                    barmode = 'overlay')
fig2.show()
Overlaid histograms are not always ideal, as it can still be difficult to distinguish between groups. An alternative method is to create a separate histogram for each company, sometimes called facets or a ‘small multiples’ plot:
fig3 = px.histogram(data_frame = df,
                    x = 'price',
                    color = 'symbol',
                    facet_col = 'symbol')
fig3.show()
This makes it much clearer that the prices of the MSFT stock are typically lower than the others, and that GOOG tends to have the highest prices.
Bin Sizes and Alternative Graphs
Until now, we have used the default settings for the width of the bars in our histograms. However, you may change the number or size of bins according to your own needs.
In this example, I’ll show how to set the number of bins. The nbins argument sets the maximum number of bins, and plotly then picks a bin size that stays within this limit. We’ll recreate our first graph but limit the number of bins to 10:
fig4 = px.histogram(data_frame = df, x = "price", nbins = 10)
fig4.show()
A common alternative for visualizing the distribution of numeric values is the violin plot. A violin plot draws a smoothed, mirrored density curve for each group, so you can think of it as a smoothed histogram turned on its side. This can be a simple way to display multiple distributions alongside each other. Consider the example graphic below:
fig5 = px.violin(data_frame = df,
                 x = 'symbol',
                 color = 'symbol',
                 y = 'price')
fig5.show()
This is another way to clearly show that most MSFT prices are in a relatively small range while the GOOG prices are spread across a wide range.
Finally, another way to visualize a distribution is the boxplot. A boxplot summarizes each group by its median and quartiles, which makes differences in spread between groups easy to compare:
fig6 = px.box(data_frame = df,
              x = 'symbol',
              color = 'symbol',
              y = 'price')
fig6.show()
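To see what a box actually summarizes, the sketch below computes the quartiles and the interquartile range for two made-up lists of prices using only the Python standard library. The box_summary helper and both lists are hypothetical, and plotly's own quartile calculation may differ slightly from the "inclusive" method assumed here.

```python
from statistics import quantiles


def box_summary(values):
    """Return the summary statistics that a boxplot is built from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # height of the box
    }


# Hypothetical prices: one narrow group and one widely spread group
narrow_prices = [20, 25, 30, 35, 40, 45, 50, 55, 60]
wide_prices = [100, 200, 300, 400, 500, 600, 700, 800, 900]

narrow = box_summary(narrow_prices)
wide = box_summary(wide_prices)

print(narrow)
print(wide)
```

A group with a small interquartile range gets a short box, while a widely spread group gets a tall one.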
You can check out these other articles for more detailed examples and videos of these popular charts in plotly:
- plotly Barplot in Python
- plotly Boxplot in Python
- plotly Line Plot in Python
- plotly Scatterplot in Python
- Introduction to the plotly Package in Python
- Introduction to the Python Programming Language