R for UX Researchers Series: Article #10
Tutorial: Prioritizing Features with MaxDiff Analysis
Summary: In this tutorial, you'll learn how to analyze MaxDiff survey data to prioritize software features. We'll walk through setting up your R environment, importing data, and generating a great data viz for feature comparison.
About This Tutorial
This article is the first of my reader-requested add-on topics to this series. Articles 1-9 are meant to be read chronologically, but this one is a one-off. Let's get to it!
The Scenario
I used to work for Jungle Scout, the world's most popular analytics platform for Amazon sellers. The product leadership team wanted to prioritize feature development based on user preferences, but this was tricky because the app has so many features. One day, the VP of Product asked me:
"Which features should we focus on to deliver the most value to our users?"
My task was clear. I needed to gather user preferences on various features in comparison to one another, and a simple forced ranking exercise wouldn't be good enough. So, I decided to run a classic MaxDiff survey analysis. MaxDiff surveys are a powerful technique for understanding relative preferences among a set of items. Unlike traditional surveys where users might rate features independently, MaxDiff forces respondents to make trade-offs, providing more granular insights into what they value most.
I used R and RStudio to perform this analysis because it can be done wayyyyyyy faster than in Excel or Google Sheets. By the end of this tutorial, you'll have a clear visual representation of user preferences that you can share with the product team to guide their decisions.
Why Not Just Use Excel or Google Sheets?
When it comes to analyzing MaxDiff data, Excel and Google Sheets just don't cut it. Here's why R is the way to go:
Advanced Analysis Capabilities: R's specialized packages are designed for analysis like this. Excel or Google Sheets can't do this without a ton of workarounds.
Handling Complex Data: R manages and manipulates large datasets a lot better than Excel and Google Sheets.
Reproducibility and Automation: R scripts automate the entire process, keeping projects consistent and saving time when updating reports or applying them to different datasets, which is important for studies that gather data over time.
Advanced Visualization: R can produce complex, highly customized visualizations with packages like ggplot2, which Excel or Google Sheets cannot match.
Prerequisites
Before starting the tutorial, ensure you have completed the basic setup for R and RStudio as described in the "Getting Started with R & RStudio Tutorial." Additionally, if you are using Windows, you will need to install Rtools.
Step 1: Download the Dataset
First, download this MaxDiff dataset from a Dropbox link.
✏️ NOTE: I couldn't find a good example dataset online, so I made my own.
🚨 Disclaimer: This is not real data. I simulated it from a real project I did at Jungle Scout, but the features and scores were made up specifically for this tutorial.
Click this Dropbox link to the dataset.
Click the Download button in the top right of the page to get the file named MaxDiff_Data.csv.
Save the downloaded file on your computer in a location you can easily access.
✏️ NOTE: Make a note of the full path to the MaxDiff_Data.csv file on your computer. You'll need that path to import the dataset in Step 5 below.
Step 2: Start a New RStudio Project
Open RStudio.
Go to File > New Project > New Directory > New Project.
Name your project (e.g., "MaxDiff_Analysis") and choose a location.
Click Create Project.
Set Up Your Project Structure: Within your new project directory folder on your computer, create a new folder and name it data.
Move the downloaded MaxDiff_Data.csv file into the new data folder you just created. Its relative path, meaning its location relative to the project folder, is now data/MaxDiff_Data.csv.

Step 3: Install Necessary Packages
Install the required packages by copying and pasting the code snippet below into the RStudio Console:
install.packages("tidyverse")
install.packages("corrplot")
install.packages("ggplot2")
Step 4: Load the Libraries
Next, load the necessary libraries in your R script:
library(tidyverse)
library(corrplot)
library(ggplot2)
✏️ NOTE: Disregard the Conflicts section shown in the Console.
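If you rerun this tutorial often, Steps 3 and 4 can be tightened so that packages are only installed when they are missing. The snippet below is a minimal sketch of that idea. The helper name missing_packages is hypothetical (it is not part of any package), and the install call is wrapped in interactive() so nothing is downloaded when the script runs unattended.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "corrplot")
to_install <- missing_packages(needed)

# Only install from an interactive session (e.g., the RStudio Console).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Note that ggplot2 is not listed separately here because it is installed and loaded as part of the tidyverse.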
Step 5: Import the Dataset
Import the dataset using its file path. If you moved the file in Step 2, its relative path from the project folder is data/MaxDiff_Data.csv.
# Load the dataset
file_path <- "path/to/your/dataset/data/MaxDiff_Data.csv"
data <- read_csv(file_path)
# Explore the dataset
str(data)
head(data)
✏️ NOTES:
Replace "path/to/your/dataset/" with the actual location of your project folder. If the RStudio project from Step 2 is open, "data/MaxDiff_Data.csv" on its own should work, because by default relative paths are resolved from the project folder.
Viewing the first few rows with head(data) and checking the column names with colnames(data) is a good way to verify that the dataset has been imported correctly. This helps ensure that all necessary columns are present and gives you an overview of the data before proceeding.
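A common stumbling block at this step is a path that does not point to the file. The sketch below shows one way to fail early with a readable message. The function name maxdiff_path and its arguments are hypothetical, and it assumes the CSV sits in a data folder inside the project, as set up in Step 2.

```r
# Hypothetical helper: build the path to the CSV inside the project's
# data folder and stop with a clear message if the file is not there.
maxdiff_path <- function(project_dir = ".", file_name = "MaxDiff_Data.csv") {
  path <- file.path(project_dir, "data", file_name)
  if (!file.exists(path)) {
    stop("Could not find ", path,
         ". Check that the CSV is inside the project's data folder.")
  }
  path
}

# Usage from inside the RStudio project (commented out because it
# needs the downloaded file):
# data <- read_csv(maxdiff_path())
```

Using file.path() instead of pasting strings together keeps the separator handling consistent across Windows, macOS, and Linux.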
Step 6: Summary Data
In this step, we'll summarize our data to understand its structure and contents before analyzing it.
# Summary of the data
summary(data)
Here's a quick overview of the dataset:
Number of Rows: 72
Number of Columns: 14
Key Variables:
win: Indicator for the "Best" choice
The different features being evaluated (Product Tracker, Keyword Scout, Opportunity Finder, etc.)
Block: Groups of questions presented together
Set: The specific group of features presented in each block
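Before scoring, it can help to confirm that the data frame actually has this shape. The following is a rough sketch on a made-up toy data frame, not the real file. The helper check_maxdiff is hypothetical, and it assumes that every column other than win, Block, and Set is a feature coded as -1, 0, or 1, which is what the scoring code in Step 7 compares against.

```r
# Toy data frame with the same kind of columns as the overview above.
# All values are made up for illustration.
toy <- data.frame(
  win = c(1, 0, 1),
  Block = c(1, 1, 2),
  Set = c(1, 2, 1),
  Alerts = c(1, 0, -1),
  Promotions = c(-1, 1, 0),
  check.names = FALSE
)

# Hypothetical helper: confirm the ID columns exist and the feature
# columns only contain -1, 0, or 1. Returns the feature column names.
check_maxdiff <- function(df, id_cols = c("win", "Block", "Set")) {
  missing_ids <- setdiff(id_cols, colnames(df))
  if (length(missing_ids) > 0) {
    stop("Missing columns: ", paste(missing_ids, collapse = ", "))
  }
  feature_cols <- setdiff(colnames(df), id_cols)
  is_valid <- vapply(
    df[feature_cols],
    function(x) all(x %in% c(-1, 0, 1)),
    logical(1)
  )
  if (any(!is_valid)) {
    stop("Unexpected values in: ", paste(feature_cols[!is_valid], collapse = ", "))
  }
  feature_cols
}

features <- check_maxdiff(toy)
print(features)
```

On the real dataset you would call check_maxdiff(data) after importing and expect it to return the feature column names.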
✏️ NOTE: Survey data, especially from a MaxDiff survey, doesn't typically benefit from a lot of exploratory data analysis (EDA) visuals like other types of data might. The nature of MaxDiff surveys is to force respondents to make trade-offs, and the main focus is on understanding these preferences directly. That means we won't be using traditional EDA plots in this tutorial. Instead, we'll move straight into analyzing and visualizing the MaxDiff scores for our insights.
Step 7: Conduct the MaxDiff Analysis
Now, it's time to conduct the MaxDiff analysis. We'll manually prepare the data and use simple counts for the analysis so we don't have to use any special MaxDiff packages.
# Prepare data for BWS analysis
bws_data <- data %>%
  gather(key = "Item", value = "Value", -win, -Block, -Set) %>%
  group_by(Item) %>%
  summarise(Best = sum(win == 1 & Value == 1),
            Worst = sum(win == 1 & Value == -1)) %>%
  mutate(NetScore = Best - Worst)
# View the calculated net scores
print(bws_data)
✏️ NOTES:
First, I prepared the data by reshaping it to calculate the counts of "Best" and "Worst" choices for each feature.
Then, I calculated the difference between the number of times a feature was chosen as the "Best" and the number of times it was chosen as the "Worst." This is called the net score.
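To make the counting logic concrete, here is a small worked example on made-up data (three features, four rows). It mirrors the steps above but uses pivot_longer(), which is tidyr's newer replacement for gather(). The toy values and the numeric coding of Block and Set are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Made-up toy data: one row per set, features coded -1 / 0 / 1
toy <- data.frame(
  win = c(1, 1, 0, 1),
  Block = c(1, 1, 2, 2),
  Set = c(1, 2, 1, 2),
  Alerts = c(1, 1, 1, 0),
  Promotions = c(-1, 0, -1, -1),
  `Keyword Scout` = c(0, -1, 0, 1),
  check.names = FALSE
)

toy_scores <- toy %>%
  # Reshape to long format: one row per (set, feature) pair
  pivot_longer(cols = -c(win, Block, Set),
               names_to = "Item", values_to = "Value") %>%
  group_by(Item) %>%
  # Count Best (1) and Worst (-1) picks, only in rows where win == 1
  summarise(Best = sum(win == 1 & Value == 1),
            Worst = sum(win == 1 & Value == -1),
            .groups = "drop") %>%
  mutate(NetScore = Best - Worst)

print(toy_scores)
```

In this toy example Alerts ends up with a net score of 2 (two Best, zero Worst), Promotions with -2, and Keyword Scout with 0. The third row is ignored entirely because its win value is 0.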
Step 8: Visualize the Results
Now for the moment you've all been waiting for: let's visualize the results of our MaxDiff analysis! We'll use ggplot2 to create a bar plot of the net scores.
# Plot the MaxDiff scores
bws_data %>%
  ggplot(aes(x = reorder(Item, NetScore), y = NetScore)) +
  geom_bar(stat = "identity", aes(fill = NetScore > 0)) +
  coord_flip() +
  geom_point(aes(y = 0), shape = 21, size = 3, fill = "black") +
  labs(title = "MaxDiff Scores for Jungle Scout Features", x = "Feature", y = "Net Score") +
  theme_minimal() +
  scale_fill_manual(values = c("TRUE" = "palegreen3", "FALSE" = "salmon2")) +
  theme(legend.position = "none")
As you can see, this visualization is super stakeholder-friendly. It will help them quickly identify which features are most valued by users and should be prioritized in development.
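If you want to drop the chart into a slide deck or a report, one option is to store the plot in a variable and write it to a file with ggsave(). The sketch below uses made-up scores and a temporary output folder so it runs on its own. The object names and the image dimensions are assumptions you would adjust for your own project.

```r
library(ggplot2)

# Made-up net scores, for illustration only
toy_scores <- data.frame(
  Item = c("Feature A", "Feature B", "Feature C"),
  NetScore = c(3, 0, -2)
)

# Same style of chart as above, stored in a variable instead of printed
p <- ggplot(toy_scores, aes(x = reorder(Item, NetScore), y = NetScore)) +
  geom_col(aes(fill = NetScore > 0)) +
  coord_flip() +
  labs(title = "MaxDiff Scores (toy data)", x = "Feature", y = "Net Score") +
  theme_minimal() +
  scale_fill_manual(values = c("TRUE" = "palegreen3", "FALSE" = "salmon2")) +
  theme(legend.position = "none")

# Write to a temporary folder here; in a real project you might save
# into the project directory instead.
out_file <- file.path(tempdir(), "maxdiff_scores.png")
ggsave(out_file, plot = p, width = 8, height = 5, dpi = 300)
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the chart itself is the same kind of bar plot as in the code above.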
Interpreting the Results
The MaxDiff analysis provides a clear picture of which features are most and least valued by users. The positive scores indicate features that users prefer the most, while negative scores show features that are less preferred. This is the exact kind of information and level of detail the VP of Product was looking for to make decisions around the feature roadmap.
Specific Insights from the Dataset
Here's what the analysis reveals about our Jungle Scout features:
Alerts: With the highest net score of +4, Alerts is the most valued feature among users.
Listing Builder: This feature also scores high with a net score of +3, indicating strong user preference.
Jungle Scout Academy: Another highly valued feature with a net score of +3.
Inventory Manager: With a net score of +2, this feature is also important to users.
Sales Analytics: Has a net score of +1, showing moderate user preference.
Keyword Scout: This feature has a net score of -3, indicating it is less favored by users.
Promotions: The least preferred feature with a net score of -4.
✏️ NOTE: Features like Opportunity Finder, Product Tracker, Rank Tracker, and Review Automation have neutral or slightly negative scores, indicating they are neither strongly favored nor disfavored by users.
Data-driven Recommendations for the Product Team
Based on these insights, here are some recommendations for the product team:
Focus on High-Value Features: Prioritize working on the Alerts, Listing Builder and Jungle Scout Academy features.
Re-evaluate Lower-Value Features: Consider conducting further foundational research for features like Keyword Scout and Promotions because of their low net scores. Understanding why these features are less favored can help in making necessary adjustments.
Maintain Moderate-Value Features: Features like Inventory Manager and Sales Analytics should be maintained and potentially enhanced based on user feedback, as they show moderate user preference.
Monitor Neutral Features: Keep an eye on features with neutral scores such as Opportunity Finder and Rank Tracker. These features might not need immediate changes but should be monitored for any shifts in user preference over time.
By following these recommendations, the product team can ensure that development efforts are aligned with user preferences, leading to a more user-centric product roadmap.
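The grouping behind these recommendations can also be sketched in code, which makes it easier to reapply when the survey is rerun. The example below uses the net scores reported in the insights above. The cut points are my own assumption, chosen so the tiers match the groupings in this article. They are not part of the original analysis, and the tier labels are hypothetical.

```r
# Net scores as reported in the insights above
scores <- data.frame(
  Item = c("Alerts", "Listing Builder", "Jungle Scout Academy",
           "Inventory Manager", "Sales Analytics",
           "Keyword Scout", "Promotions"),
  NetScore = c(4, 3, 3, 2, 1, -3, -4)
)

# Assumed cut points (not from the original analysis):
#   -3 or lower: re-evaluate, -2 to 0: monitor,
#   1 to 2: maintain, 3 or higher: prioritize
scores$Tier <- cut(
  scores$NetScore,
  breaks = c(-Inf, -3, 0, 2, Inf),
  labels = c("Re-evaluate", "Monitor", "Maintain", "Prioritize")
)

# Show the features from highest to lowest net score
print(scores[order(-scores$NetScore), ])
```

With such a small range of net scores, the exact thresholds are a judgment call, so treat the tiers as a conversation starter with the product team and not as a hard rule.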
Conclusion
Imagine a world where teams no longer endlessly debate which features should be prioritized in guiding their day-to-day work. By interpreting MaxDiff scores, we can help our organizations better understand user preferences and make better decisions about feature prioritization and development. This data-driven approach ensures that product roadmaps are user-centered, leading to more effective and satisfying products.
Feedback
I hope you found this tutorial helpful. If you have any questions or feedback, please feel free to reach out. Thanks all!








Hi, thanks for sharing this excellent example and tutorial!
I have a question regarding the raw data structure. I am a little confused by the values (1, 0) under the "win" column. I understand the values (i.e., -1, 0, 1) under each attribute, which indicate whether the attribute was selected as Best, Worst, or not selected in the survey. Can you elaborate a little further on how the "win" column differs from the values under each attribute? Thank you.
Hello Trevor! Thank you for this, very useful! I would like to ask some questions about the data. I haven't done a MaxDiff survey before, so I am a bit confused about the dataset.
(1). Is each row for a different participant? I mean, what does each row represent? If each row represents a participant, does that mean the data is for 8 participants?
(2). I also don't understand the Set and win columns. I mean, if the win column is 0, how come the Set column is best? I am using row 2 as an example here.
Could you please help me to understand this data?