Scope of Work

Data Analyst:

Shuang Li

Client:

Bellabeat

Purpose:

The goal of this project is to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. The project will then examine how these trends apply to Bellabeat customers and how they can inform Bellabeat's marketing strategy.

Scope / Major Project Activities:

| Activity | Description |
|---|---|
| Collect data | Collect public Fitbit fitness tracker data from Kaggle |
| Identify trends | Identify trends in smart device usage |
| Visualize findings | Visualize key findings in trends |
| Create marketing recommendations | Create marketing strategy recommendations based on these trends |
| Deliver final report | Deliver final report and recommendations to key stakeholders |

Deliverables:

  • A clear summary of the business task
  • A description of all data sources used
  • Documentation of any cleaning or manipulation of data
  • A summary of the analysis
  • Supporting visualizations and key findings
  • Top high-level content recommendations based on the analysis

This project does not include:

  • Collecting and/or analyzing user demographics data;
  • Collecting and/or analyzing workout type and session duration data;
  • Projecting seasonal changes;
  • Implementing any solutions or recommendations.

Schedule Overview / Major Milestones:

| Milestone | Expected Completion Date | Description/Details |
|---|---|---|
| Data Review | 2025-03-25 | Review of all data |
| Data Cleaning and Analysis | 2025-03-27 | Initial data analysis completed |
| Identify Trends | 2025-03-28 | Top trends identified |
| Create Visualization | 2025-03-29 | Visualization created |
| Make Tailored Recommendation | 2025-03-30 | List of marketing strategy recommendations |
| Final Report | 2025-03-31 | Final report detailing all work |

Estimated date for completion:

2025-03-31


Prepare and Understand the Dataset

  1. Download the FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius).

  2. Import the datasets into RStudio and, where two files share the same name, rename them by prefixing the month in which the data was collected.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
march_dailyActivity_merged <- read.csv("mturkfitbit_export_march/dailyActivity_merged.csv")
march_heartrate_seconds_merged <- read.csv("mturkfitbit_export_march/heartrate_seconds_merged.csv")
march_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_march/minuteIntensitiesNarrow_merged.csv")
march_minuteSleep_merged <- read.csv("mturkfitbit_export_march/minuteSleep_merged.csv")
march_weightLogInfo_merged <- read.csv("mturkfitbit_export_march/weightLogInfo_merged.csv")
april_dailyActivity_merged <- read.csv("mturkfitbit_export_april/dailyActivity_merged.csv")
april_heartrate_seconds_merged <- read.csv("mturkfitbit_export_april/heartrate_seconds_merged.csv")
april_minuteIntensitiesNarrow_merged <- read.csv("mturkfitbit_export_april/minuteIntensitiesNarrow_merged.csv")
april_minuteSleep_merged <- read.csv("mturkfitbit_export_april/minuteSleep_merged.csv")
april_weightLogInfo_merged <- read.csv("mturkfitbit_export_april/weightLogInfo_merged.csv")
  3. Review the metrics of the collected data and identify its components and limitations in order to determine how to answer the business questions (a sketch of such checks follows this list).
  4. Determine the credibility of the data.
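One way to carry out this review is to inspect each table's columns, number of unique users, date coverage, and missing values. The lines below are a minimal, illustrative sketch run against the March daily-activity table imported above; the specific checks are suggestions rather than steps prescribed by the original analysis.
# Illustrative review of one table (repeat for the other tables as needed)
colnames(march_dailyActivity_merged)                 # which metrics are recorded
n_distinct(march_dailyActivity_merged$Id)            # number of unique users
range(mdy(march_dailyActivity_merged$ActivityDate))  # date range covered
sum(is.na(march_dailyActivity_merged))               # count of missing values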

Process and Analyze the Data

➤ Find frequency of device usage

  1. Identify the master sheet from each month and ‘count’ the number of days each user used the fitness tracker per month.
march_activity_record <- march_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(march_activity_record)
## Rows: 35
## Columns: 2
## $ Id             <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 8, 10, 12, 32, …
april_activity_record <- april_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(non_null_count = sum(!is.na(ActivityDate)))
glimpse(april_activity_record)
## Rows: 33
## Columns: 2
## $ Id             <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927972…
## $ non_null_count <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4, …
  2. Cross-check the user IDs in both tables to verify whether the sample groups from the two months are the same.
matching_count <- march_activity_record %>%
  inner_join(april_activity_record, by = "Id") %>%
  nrow()
print(matching_count)
## [1] 33
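As an optional sanity check (not part of the original pipeline), an anti_join can list the users who appear in March but not in April; the object name march_only_users below is illustrative.
# Users present in March but missing from April (sketch)
march_only_users <- march_activity_record %>%
  anti_join(april_activity_record, by = "Id")
nrow(march_only_users)  # 35 March users - 33 matches = 2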
  3. Inner join the two tables to get a dataset of device usage frequency for future analysis.
joined_activity_days <- march_activity_record %>% inner_join(april_activity_record, by = "Id")
glimpse(joined_activity_days)
## Rows: 33
## Columns: 3
## $ Id               <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…
  4. Find the average usage per month.
average_usage <- joined_activity_days %>% rowwise() %>% mutate(usage_per_month = mean(c(non_null_count.x,non_null_count.y)))
glimpse(average_usage)
## Rows: 33
## Columns: 4
## Rowwise: 
## $ Id               <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 19279…
## $ non_null_count.x <int> 19, 19, 10, 12, 12, 12, 12, 12, 15, 12, 10, 12, 32, 3…
## $ non_null_count.y <int> 31, 31, 30, 31, 31, 31, 31, 31, 18, 31, 20, 30, 31, 4…
## $ usage_per_month  <dbl> 25.0, 25.0, 20.0, 21.5, 21.5, 21.5, 21.5, 21.5, 16.5,…

➤ Identify core user groups by fitness level

  1. In the same two master sheets, group users by ID and add FairlyActiveMinutes and 2 × VeryActiveMinutes to find the total active minutes per user per month.
march_active_minutes <- march_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(total_active_minutes = sum(2 * VeryActiveMinutes + FairlyActiveMinutes))  # very active minutes count double
glimpse(march_active_minutes)
## Rows: 35
## Columns: 2
## $ Id                   <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 660…
april_active_minutes <- april_dailyActivity_merged %>%
  group_by(Id) %>%
  summarise(total_active_minutes = sum(2 * VeryActiveMinutes + FairlyActiveMinutes))  # very active minutes count double
glimpse(april_active_minutes)
## Rows: 33
## Columns: 2
## $ Id                   <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1…
## $ total_active_minutes <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 106…
  2. Find the weekly average active minutes per user.
weekly_active_minutes <- march_active_minutes %>%
  inner_join(april_active_minutes, by = "Id") %>%
  # scale the 62-day (March + April) total down to an average week
  mutate(weekly_active_minutes = as.integer(round(total_active_minutes.x + total_active_minutes.y) / 62 * 7))
glimpse(weekly_active_minutes)
## Rows: 33
## Columns: 4
## $ Id                     <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes  <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
  3. Categorize user fitness levels into Beginner, Intermediate, and Advanced.
weekly_active_minutes <- weekly_active_minutes %>% mutate(fitness_level = case_when(
      weekly_active_minutes < 150 ~ "Beginner",
      weekly_active_minutes >= 150 & weekly_active_minutes < 300 ~ "Intermediate",
      weekly_active_minutes >= 300 ~ "Advanced"
    ))
glimpse(weekly_active_minutes)
## Rows: 33
## Columns: 5
## $ Id                     <dbl> 1503960366, 1624580081, 1644430081, 1844505072,…
## $ total_active_minutes.x <dbl> 1663, 39, 731, 27, 20, 1232, 0, 35, 701, 194, 2…
## $ total_active_minutes.y <dbl> 2994, 718, 1215, 48, 106, 2850, 14, 164, 856, 1…
## $ weekly_active_minutes  <int> 525, 85, 219, 8, 14, 460, 1, 22, 175, 142, 81, …
## $ fitness_level          <chr> "Advanced", "Beginner", "Intermediate", "Beginn…
  4. Count the number of users in each category.
user_fitness_levels <- weekly_active_minutes %>% count(fitness_level)
glimpse(user_fitness_levels)
## Rows: 3
## Columns: 2
## $ fitness_level <chr> "Advanced", "Beginner", "Intermediate"
## $ n             <int> 10, 17, 6

➤ Key features usage

  1. Review the data collected from the different features and note that core features such as Distance, Steps, Intensity Minutes, and Calories are tracked automatically. Other key features that are used frequently are Heart Rate, Sleep, and Weight Log.

  2. Count the number of users who used each feature in each month.

n_distinct(march_heartrate_seconds_merged$Id)
## [1] 14
n_distinct(march_minuteSleep_merged$Id)
## [1] 23
n_distinct(march_weightLogInfo_merged$Id)
## [1] 11
n_distinct(april_heartrate_seconds_merged$Id)
## [1] 14
n_distinct(april_minuteSleep_merged$Id)
## [1] 24
n_distinct(april_weightLogInfo_merged$Id)
## [1] 8
feature_usage <- tibble(
key_feature = c("Sleep Tracking","Heart Rate Monitor","Weight Log"),
March = c(n_distinct(march_heartrate_seconds_merged$Id), n_distinct(march_minuteSleep_merged$Id), n_distinct(march_weightLogInfo_merged$Id)),
April = c(n_distinct(april_heartrate_seconds_merged$Id), n_distinct(april_minuteSleep_merged$Id), n_distinct(april_weightLogInfo_merged$Id))
) %>% 
  mutate(
    march_percentage = round(March/ 35 * 100, 1),
    april_percentage = round(April/ 33 * 100, 1)
      )
glimpse(feature_usage)
## Rows: 3
## Columns: 5
## $ key_feature      <chr> "Sleep Tracking", "Heart Rate Monitor", "Weight Log"
## $ March            <int> 14, 23, 11
## $ April            <int> 14, 24, 8
## $ march_percentage <dbl> 40.0, 65.7, 31.4
## $ april_percentage <dbl> 42.4, 72.7, 24.2

➤ Find time of day users are more active

  1. Figure out how the intensity is classified.
n_distinct(march_minuteIntensitiesNarrow_merged$Intensity)
## [1] 4
  2. The intensity is rated from 0 to 3, where 0 is sedentary and 3 is vigorous. Therefore, identify the times of day when an intensity of 3 appears.
march_high_intensity_time <- march_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(march_high_intensity_time)
## Rows: 19,098
## Columns: 3
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
april_high_intensity_time <- april_minuteIntensitiesNarrow_merged %>% filter(Intensity == 3)
glimpse(april_high_intensity_time)
## Rows: 19,838
## Columns: 3
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
  3. Split the datetime and keep only the hour.
march_high_intensity_time <- march_high_intensity_time %>% 
  mutate(
    datetime_parsed = mdy_hms(ActivityMinute),    # Parse character to POSIXct
    date_only = as.Date(datetime_parsed),         # Extract Date
    hour_only = hour(datetime_parsed)             # Extract Hour (24-hour format)
   ) %>%
select(-datetime_parsed)
glimpse(march_high_intensity_time)
## Rows: 19,098
## Columns: 5
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "3/12/2016 10:59:00 AM", "3/12/2016 11:00:00 AM", "3/12…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only      <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-0…
## $ hour_only      <int> 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,…
april_high_intensity_time <- april_high_intensity_time %>% 
  mutate(
    datetime_parsed = mdy_hms(ActivityMinute),    # Parse character to POSIXct
    date_only = as.Date(datetime_parsed),         # Extract Date
    hour_only = hour(datetime_parsed)             # Extract Hour (24-hour format)
   ) %>%
select(-datetime_parsed)
glimpse(april_high_intensity_time)
## Rows: 19,838
## Columns: 5
## $ Id             <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ ActivityMinute <chr> "4/12/2016 2:51:00 PM", "4/12/2016 2:52:00 PM", "4/12/2…
## $ Intensity      <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ date_only      <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-12, 2016-0…
## $ hour_only      <int> 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15,…
  4. Remove duplicate records (same user, same date and hour).
march_high_intensity_time <- march_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(march_high_intensity_time)
## Rows: 1,348
## Columns: 3
## Groups: Id, date_only, hour_only [1,348]
## $ Id        <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12, 2016-03-12,…
## $ hour_only <int> 10, 11, 12, 14, 15, 16, 10, 11, 18, 19, 23, 9, 23, 9, 12, 20…
april_high_intensity_time <- april_high_intensity_time %>% select(-ActivityMinute, -Intensity) %>% group_by(Id, date_only, hour_only) %>% distinct()
glimpse(april_high_intensity_time)
## Rows: 1,374
## Columns: 3
## Groups: Id, date_only, hour_only [1,374]
## $ Id        <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, …
## $ date_only <date> 2016-04-12, 2016-04-12, 2016-04-12, 2016-04-13, 2016-04-13,…
## $ hour_only <int> 14, 15, 20, 14, 17, 18, 23, 13, 20, 21, 17, 22, 23, 12, 13, …
  5. Count the number of times each hour appears.
march_time_count <- march_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(march_time_count)
## Rows: 23
## Columns: 2
## $ hour_only <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ n         <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73, …
april_time_count <- april_high_intensity_time %>% ungroup() %>% count(hour_only)
glimpse(april_time_count)
## Rows: 22
## Columns: 2
## $ hour_only <int> 0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …
## $ n         <int> 7, 4, 2, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, 63, 1…
  6. Full join the two tables and add the counts together.
total_time_count <- march_time_count %>%
  full_join(april_time_count, by = "hour_only") %>%
  mutate(across(everything(), ~replace_na(., 0))) %>%  # hours missing from one month count as 0
  rowwise() %>%
  mutate(total_count = sum(n.x, n.y))
glimpse(total_time_count)
## Rows: 24
## Columns: 4
## Rowwise: 
## $ hour_only   <int> 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ n.x         <int> 7, 3, 1, 1, 21, 28, 50, 66, 82, 86, 94, 94, 89, 93, 55, 73…
## $ n.y         <int> 7, 4, 0, 0, 31, 41, 50, 83, 70, 85, 78, 105, 94, 101, 55, …
## $ total_count <int> 14, 7, 1, 1, 52, 69, 100, 149, 152, 171, 172, 199, 183, 19…

Visualize and Share Findings

Trend 1: User Segmentation

🔍 Discover user fitness levels and identify core user groups.

The active-minute thresholds for the different fitness levels (Beginner, Intermediate, and Advanced) are based on guidelines from the WHO, CDC, and ACSM (American College of Sports Medicine), as well as common fitness tracker classifications.

# Add percentage and label
user_fitness_levels <- user_fitness_levels %>%
  mutate(
    Category = factor(fitness_level, levels = c("Advanced", "Intermediate", "Beginner")), 
    Percent = n / sum(n) * 100,
    Label = paste0(fitness_level, "\n", round(Percent, 1), "%")
  ) %>%
  arrange(Category)

# Custom color mapping
custom_colors <- c(
  "Beginner" = "#00A5E3",
  "Intermediate" = "#8DD7BF",
  "Advanced" = "#FF828B"
)

# Pie chart with proper color assignment
ggplot(user_fitness_levels, aes(x = "", y = n, fill = Category)) +
  geom_bar(stat = "identity", width = 1, color = "grey40") +  
  coord_polar(theta = "y") +
  geom_text(aes(label = Label), fontface = "bold", position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = custom_colors) +  
  labs(title = "User Fitness Levels", fill = "Fitness Level") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16))

Trend 2: Device Usage Patterns

🔍 Discover frequency of device usage per week

# Categorize into usage buckets
average_usage <- average_usage %>%
  mutate(usage_category = case_when(
    usage_per_month >= 0  & usage_per_month < 5  ~ "0–1",
    usage_per_month >= 5  & usage_per_month < 9  ~ "1–2",
    usage_per_month >= 9  & usage_per_month < 13 ~ "2–3",
    usage_per_month >= 13 & usage_per_month < 17 ~ "3–4",
    usage_per_month >= 17 & usage_per_month < 21 ~ "4–5",
    usage_per_month >= 21 & usage_per_month < 27 ~ "5–6",
    usage_per_month >= 27 & usage_per_month < 32 ~ "6–7",
    TRUE ~ NA_character_  # fallback for unexpected values
  ))
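# Note (added for clarity, as an interpretation of the cutoffs above): the monthly
# ranges roughly translate the 31-day month into weekly buckets (~4.4 weeks per month).
# For example, 24 days/month ≈ 24 * 7 / 31 ≈ 5.4 days/week, which falls in the "5–6" bucket.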

# Factor for correct order in plotting
average_usage$usage_category <- factor(average_usage$usage_category, levels = c(
  "0-1",
  "1–2",
  "2–3",
  "3–4",
  "4–5",
  "5–6",
  "6–7"
))
usage_summary <- average_usage %>% 
  count(usage_category) %>%
  complete(usage_category, fill = list(n = 0)) %>% 
  mutate(percent = n / sum(n) * 100)

# Create Histogram-Style Bar Chart for Categorized Usage
ggplot(usage_summary, aes(x = usage_category, y = n)) +
  geom_col(fill = "#FF828B", color = "white", width = 0.5) +
  geom_text(aes(label = paste0(round(percent, 1), "%")), 
            vjust = -0.5) +
  labs(
    title = "Frequency of Device Usage",
    x = "Average Days Used per Week",
    y = "Number of Users"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    axis.text.x = element_text(hjust = 1, face = "bold",margin = margin(b = 15)),
    axis.text.y = element_text(hjust = 1, face = "bold", margin = margin(b = 15))
  )

Trend 3: Key Features Usage

🔍 Discover percentage of key features used

# Actual user totals per month (manually defined)
user_totals <- tibble(
  Month = c("March", "April"),
  total_users = c(35, 33)
)
# Reshape for plotting
feature_usage_long <- feature_usage %>%
  pivot_longer(cols = c(March, April), names_to = "Month", values_to = "count") %>%
  left_join(user_totals, by = "Month") %>%
  mutate(
    Month = factor(Month, levels = c("March", "April")),
    percent = count / total_users * 100,
    label = paste0(round(percent, 1), "%")
  )
feature_usage_long <- feature_usage_long %>%
  group_by(key_feature) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  mutate(key_feature = fct_reorder(key_feature, total))

# Custom color mapping
custom_colors <- c(
  "March" = "#FF828B",
  "April" = "#00A5E3"
)
bar_width <- 0.25  # make bars thinner for spacing
dodge_width <- 0.6  # slightly more than bar width

ggplot(feature_usage_long, aes(x = count, y = key_feature, fill = Month)) +
  geom_col(
    position = position_dodge2(width = dodge_width, reverse = TRUE),
    width = bar_width
  ) +
  geom_text(
    aes(label = label),
    position = position_dodge2(width = dodge_width, reverse = TRUE),
    hjust = -0.1
  ) +
  labs(
    title = "Feature Usage by Month",
    x = "User Count",
    y = "Key Feature",
    fill = "Month"
  ) +
  scale_fill_manual(values = custom_colors) +  
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    axis.text.x = element_text(hjust = 1, face = "bold"),
    axis.text.y = element_text(hjust = 1, face = "bold")
  ) +
  xlim(0, max(feature_usage_long$count) * 1.3)

Trend 4: Device Engagement Patterns

🔍 Discover time of day users engage the most

# Define tiers as horizontal bands (ymin to ymax)
tier_bands <- tibble(
  tier = factor(c("Peak", "Great", "Good", "Average", "Low"),
                levels = c("Peak", "Great", "Good", "Average", "Low")),
  ymin = c(200, 150, 100, 50, 0),
  ymax = c(Inf, 199.9, 149.9, 99.9, 49.9)
)

# Assign colors to each tier
tier_colors <- c(
  "Peak"    = "#2C7FB8",  # Deep blue
  "Great"   = "#41B6C4",  # Teal
  "Good"    = "#7FCDBB",  # Soft mint
  "Average" = "#C7E9B4",  # Pale green
  "Low"     = "#EDF8B1"   # Very light green-yellow
)

peak_point <- total_time_count %>% ungroup() %>% filter(total_count == max(total_count))

# Plot
ggplot() +
  # Tier background bands
  geom_rect(
    data = tier_bands,
    aes(ymin = ymin, ymax = ymax, xmin = -Inf, xmax = Inf, fill = tier),
    alpha = 0.3
  ) +
  # Area + line on top
  geom_area(data = total_time_count, aes(x = hour_only, y = total_count), fill = "#FF828B", alpha = 0.4) +
  geom_line(data = total_time_count, aes(x = hour_only, y = total_count), color = "#FF828B", linewidth = 1.2) +
  
  scale_x_continuous(breaks = 0:23, expand = c(0, 0)) +
  scale_fill_manual(values = tier_colors, name = "Rating") +
  labs(
    title = "User Engagement by Hour of Day",
    x = "Hour of Day",
    y = "Engagement Count"
  ) +
  geom_text(
  data = peak_point,
  mapping = aes(
    x = hour_only + 0.3,  # 👈 shifts label slightly to the right
    y = total_count,
    label = paste0("Peak: ", hour_only, ":00")
  ),
  hjust = 0,  # left-align the text
  vjust = 0.5,
  fontface = "bold",
  color = "#2C7FB8",
  size = 3,
  inherit.aes = FALSE
) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", margin = margin(b = 15)),
    legend.position = "right",
    axis.text.x = element_text(hjust = 1, margin = margin(b = 10)),
    axis.text.y = element_text(hjust = 1, margin = margin(b = 10))
  )


Marketing Strategy Recommendations for Bellabeat Ivy+

1. Segment by Fitness Level — With a Wellness Twist

| Fitness Group | Bellabeat Approach |
|---|---|
| Beginner (51.5%) | Focus on gentle wellness, cycle-based movement, and habit building. |
| Intermediate (18.2%) | Introduce guided workouts + cycle syncing and light goal tracking. |
| Advanced (30.3%) | Promote stress tracking, recovery insights, and performance during high-energy cycle phases. |

💡 Campaign Ideas:

  • “Your Wellness, Synced to Your Cycle” — educate beginners on aligning habits with menstrual phases.

  • “3 Easy Moves for Luteal Days” — content driven by Ivy+’s cycle and hormone insights.

2. Lean Into Health Features — Heart Rate, Sleep, Stress, Cycle

  • Heart Rate (72.7%) → Already a strong engagement area

    “Your Calm vs. Stress Timeline” → Weekly digest with HR + stress interpretation

    “You’re most recovered during your follicular phase — let’s build on that!”

  • Sleep Tracking (42.4%) → Big opportunity

    Tie sleep into hormonal balance and stress recovery

    “Quality sleep during the luteal phase reduces PMS symptoms — let’s track it together.”

  • Weight Log (24.2%) → Reframe it to body awareness rather than weight loss

    “Track how your body changes through your cycle — it’s not just about the scale.”

3. Time-of-Day Engagement: Pair Wellness with Routine

| Time Slot | Wellness Strategy |
|---|---|
| 07:00–08:00 | “Gentle Morning Routines” → Push mindfulness content, hydration reminders |
| 08:00–14:00 | Wellness tracking prompts: “Log your mood + symptoms” |
| 17:00–19:00 | Energy is high → promote active minutes, walking challenges, yoga |
| 19:00–21:00 | Push evening rituals: guided meditation, sleep prep, breathing exercises |
| 23:00–05:00 | Silence pushes, activate “Wind Down” mode messaging |

4. Feature-Focused Campaigns

Convert underused features into daily rituals:
  • 💤 Sleep Tracking (42.4%)

    “3 nights of quality sleep = 1 full day of hormonal balance”

    Offer guided bedtime audio + insight summaries

  • ⚖️ Weight/Body Awareness (24.2%)

    Replace “weight log” messaging with “body state” — less focus on pounds, more on hydration, inflammation, and bloat through the cycle.

  • 💗 Heart Rate & Stress

    “See how your breathing changed during today’s meeting” → Real-time stress coaching

    Daily “wellness readiness” based on HR + sleep

5. Personalized Wellness Routines Based on Engagement

| Device Usage (days/week) | Suggested Strategy |
|---|---|
| 3–4 Days | Re-engagement nudges: “You’re 2 days from a self-care streak!” + cycle tips |
| 4–5 Days | “Wellness Builder” weekly summary + next week’s focus (based on cycle phase) |
| 5–6 Days | Push deeper features: “Let’s add meditation to your strong routine” |
| 6–7 Days | VIP messages: “You’re part of our 3% elite 🌟 Here’s early access to…” |

Appendix & References

Appendix Notes

  1. Variable Definitions
  • Active Minutes: Calculated as fairly active minutes + 2 × very active minutes, based on fitness guidelines that equate 1 minute of vigorous activity to 2 minutes of moderate activity (see the sketch after this list).
  2. Data Cleaning & Preprocessing
  • Sample Size: The March dataset had 35 users, but only 33 remained for April. Analyses (except key feature usage) excluded the 2 users with missing April data.
  • Key Feature Usage: Based on the full March and April datasets.
  3. Model Assumptions
  • Missing Days: Days with no data were assumed to be inactive (0 active minutes), based on the likelihood that users wear the tracker when they work out.
  • Engagement Time: Assumed to align with physical activity, as users typically interact with the device before or after workouts.
  4. Limitations
  • No demographic data (age, gender, location, etc.).
  • Limited time range (March and April only).
  • No data on workout type or session duration.
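To make the Active Minutes weighting concrete, here is a minimal worked example on made-up values (the numbers are illustrative and not taken from the dataset):
# Illustrative only: very active minutes count double
fairly_active <- c(10, 0, 25)    # made-up fairly active minutes
very_active   <- c(30, 5, 0)     # made-up very active minutes
fairly_active + 2 * very_active  # 70 10 25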

References