How to Draw a plotly Scatterplot in Python (Example)

 

This tutorial shows several examples on how to draw scatterplots in plotly using the Python programming language.

 

 


Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. His Ph.D. is in Industrial-Organizational Psychology.

 

Overview

Scatterplots (also called xy-plots) are one of the most fundamental types of graphics. They are used to find relationships between two numeric variables.

Each pair of values is shown as a dot. Dots farther to the right have larger values on the x variable, and dots placed higher have larger values on the y variable.

 

Modules and Example Data

Please install the plotly and vega_datasets packages (for example, with pip) and load the following modules, in case you haven’t done so yet:

from vega_datasets import data
import plotly.express as px

We’ll use the iris dataset for this example, which is included in the vega_datasets package. We’ll store this data frame in an object called df.

df = data.iris()
df
 
# sepalLength	sepalWidth	petalLength	petalWidth	species
#0	5.1	3.5	1.4	0.2	setosa
#1	4.9	3.0	1.4	0.2	setosa
#2	4.7	3.2	1.3	0.2	setosa
#3	4.6	3.1	1.5	0.2	setosa

 

Basic Scatterplot

Let’s look at the relationship between lengths and widths of iris petals. A simple plotly chart can be created using the following code:

fig1 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
)
 
fig1.show()

The px.scatter() function creates the scatterplot figure, and data_frame = df indicates the data frame where the data is stored. The arguments x = 'petalLength' and y = 'petalWidth' specify that the length of each petal is plotted along the x (horizontal) axis, while the width of each petal is plotted along the y (vertical) axis. Calling fig1.show() then renders the figure.

Adding Color

If we want to show the species of each data point through the color of its dot, we can do so with a single additional argument in our function call.

fig2 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
)
 
fig2.show()

 

Common Modifications

Marker Shape

When we want our graphics to be more accessible for those with atypical vision (e.g., colorblindness), we may want to change the shape of each marker in addition to changing the color. We can map symbol to species:

fig3 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
)
 
fig3.show()

Marker Size

You can also emphasize certain aspects of your data by mapping the size of each dot to a variable. This can be one of your x/y variables or any other numeric column. You can specify this by passing a column name into the size argument:

fig4 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
    ,size = 'petalWidth'
)
 
fig4.show()

Marker Transparency

Mapping size to a variable can sometimes make your scatterplot overcrowded, though. A common strategy for dealing with this is to adjust the transparency of each dot. This lets areas with many overlapping dots appear darker than areas with few dots. You can set the opacity argument to any value between 0 (completely transparent) and 1 (completely opaque).

fig5 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
    ,size = 'petalWidth'
    ,opacity = .5
)
 
fig5.show()

 

Other Possibilities

Group Trendline

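Spotting the relationship between your x and y variables is sometimes easier when you plot a trend line in the scatterplot. Plotting a regression line fitted with ordinary least squares (OLS) is very easy to do. Note that Plotly Express relies on the statsmodels package to fit the line, so it needs to be installed: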

fig6 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
    ,size = 'petalWidth'
    ,trendline = 'ols'
)
 
fig6.show()

Overall Trendline

Plotly defaults to creating a separate trendline within each group, but you can ask for a single overall trendline by setting trendline_scope = 'overall'. The color of this line can be set with the trendline_color_override argument.

fig7 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
    ,size = 'petalWidth'
    ,trendline = 'ols'
    ,trendline_scope = 'overall'
    ,trendline_color_override = 'black'
)
 
fig7.show()

Marginal Distributions

Plotly makes it easy to add a simple distribution plot to the margins of the scatterplot, which can be very useful during the data exploration phase of an analytics project. You can specify whether you want one on the x or y axis (or both) and select from a variety of types (rug, box, violin, and histogram). Here is an example of a marginal plot on each axis:

fig8 = px.scatter(
    data_frame = df
    ,x = 'petalLength'
    ,y = 'petalWidth'
    ,color = 'species'
    ,symbol = 'species'
    ,size = 'petalWidth'
    ,marginal_x = 'histogram'
    ,marginal_y = 'rug'
)
 
fig8.show()

 

