How to Draw a plotly Scatterplot in Python (Example)
This tutorial shows several examples on how to draw scatterplots in plotly using the Python programming language.
Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. His Ph.D. is in Industrial-Organizational Psychology.
Overview
Scatterplots (also called xy-plots) are one of the most fundamental types of graphics. They are used to find relationships between two numeric variables.
Each pair of values is shown as a dot. Dots that appear higher have larger y values, and dots that appear further to the right have larger x values.
Modules and Example Data
Please install and load the following modules, in case you haven’t done so yet:
    from vega_datasets import data
    import plotly.express as px
We’ll use the iris dataset for this example, which is included with the vega_datasets module. We’ll store this data frame in an object called df.
    df = data.iris()
    df
    #    sepalLength  sepalWidth  petalLength  petalWidth species
    # 0          5.1         3.5          1.4         0.2  setosa
    # 1          4.9         3.0          1.4         0.2  setosa
    # 2          4.7         3.2          1.3         0.2  setosa
    # 3          4.6         3.1          1.5         0.2  setosa
Basic Scatterplot
Let’s look at the relationship between lengths and widths of iris petals. A simple plotly chart can be created using the following code:
    fig1 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
    )
    fig1.show()
The px.scatter() function creates the scatterplot figure, while data_frame = df indicates the data frame where the data is stored. x = 'petalLength' and y = 'petalWidth' specify that the length of each petal is plotted along the x (horizontal) axis, while the width of each petal is plotted on the y (vertical) axis.
Adding Color
If we want to encode the species of each data point in the colors of each dot, we can add that with a single argument in our function.
    fig2 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
    )
    fig2.show()
Common Modifications
Marker Shape
When we want our graphics to be more accessible for those with atypical vision (e.g., colorblindness), we may want to change the shape of each marker in addition to changing the color. We can map symbol to species:
    fig3 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
    )
    fig3.show()
Marker Size
You can also emphasize certain aspects of your data by mapping the size of each dot to a variable (which may or may not be one of your x/y variables). You can specify this by passing a column name to the size argument:
    fig4 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
        size='petalWidth',
    )
    fig4.show()
Marker Transparency
Mapping size to a variable can sometimes make your scatterplot overcrowded, though. A common strategy for dealing with this is to adjust the transparency of each dot, so that areas with many overlapping points appear darker than areas with few. You can set the opacity argument to any value from 0 (completely transparent) to 1 (completely opaque).
    fig5 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
        size='petalWidth',
        opacity=0.5,
    )
    fig5.show()
Other Possibilities
Group Trendline
Spotting the relationship between your x and y variables is sometimes easier when you plot a trend line in the scatterplot. To add a regression line fitted with ordinary least squares (OLS), set trendline = 'ols' (this option requires the statsmodels package to be installed):
    fig6 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
        size='petalWidth',
        trendline='ols',
    )
    fig6.show()
Overall Trendline
Plotly defaults to creating a separate trendline within each group, but you can ask for a single overall trendline by setting trendline_scope = 'overall'. You can set its color with the trendline_color_override argument.
    fig7 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
        size='petalWidth',
        trendline='ols',
        trendline_scope='overall',
        trendline_color_override='black',
    )
    fig7.show()
Marginal Distributions
Plotly makes it easy to add a simple distribution plot to the margins of the scatterplot, which can be very useful during the data exploration phase of an analytics project. You can specify whether you want one on the x or y axis (or both) and select from a variety of types (rug, box, violin, and histogram). Here is an example of a marginal plot on each axis:
    fig8 = px.scatter(
        data_frame=df,
        x='petalLength',
        y='petalWidth',
        color='species',
        symbol='species',
        size='petalWidth',
        marginal_x='histogram',
        marginal_y='rug',
    )
    fig8.show()
Further Resources
You can check out these other tutorials for more details and examples on the plotly library:
- plotly Barplot in Python
- plotly Boxplot in Python
- plotly Histogram in Python
- plotly Line Plot in Python
- Introduction to plotly in Python
- Introduction to Python