Case Study: Cyclistic Bike-Share Data Analysis
A Google Capstone Project for Coursera
By Shaurya Singh Bhati

Introduction
This is my attempt at Case Study 1, the project work for Google’s Data Analysis course on Coursera.
I come from a non-programming background, and this is my first project of this kind.
It took me three days in total to finish, so any feedback or suggestions are welcome. I hope to use it as the starting point of my portfolio as I move into this field.
I have listed the project’s sections in the table of contents below. You can skip to the end for my analysis, insights and suggestions for the Cyclistic bike-share venture.
Table of contents:
- About the Company
  - About my role
  - About my business task
- Analysis & Insights
- Suggestions
- Conclusion
About the company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Lily Moreno is the director of marketing and my manager. Moreno is responsible for developing campaigns and initiatives to promote the bike-share program. These may include email, social media and other channels.
My role: a junior data analyst on the marketing analyst team at Cyclistic.
Business task: The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently.
Process
I started the project by reviewing the project criteria and then reading about the company, my role and the main question: “How do casual riders and annual members use Cyclistic bikes differently?”
To answer that question, I began by noting down my own questions based on the available data:
- What was the total number of rides, and what percentage of them did each rider type account for?
- What was the average ride duration for each rider type?
- What was the busiest day, hour, time of day, month and season?
- What is the price difference between the plans? (This data wasn’t available, so I kept it out of the scope of my analysis.)
I used these questions as the core of my work and started my analysis. I began with Excel and then moved on to RStudio, where I combined the year’s monthly CSV files into a single data frame.
This made the questions above easy to answer and formed the basis of my graphs.
Microsoft Excel
First, I used Excel. I opened the CSV files for the most recent year one by one, which in my case ran from August 2022 to August 2023.
I edited each file by adding three columns: day, start_at and end_at (the day of the week, the start time and the end time, respectively). All three were derived from the existing started_at and ended_at columns, which held the date and time of each ride together.
[I first tried the separate function in R, but it failed: it couldn’t read the ‘dttm’ class when I ran it, even though the console described the column’s class as ‘dttm’. After trying other functions and getting the same error, I edited the CSV files in Excel first and then imported them into RStudio.]
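One way to sidestep the separate() problem is to skip splitting text altogether: parse the timestamp once, then format out the day and the times. This is a minimal base-R sketch, not the method used in this project; it assumes the raw text looks like `2023-07-01 08:15:00`, treats it as UTC, and uses invented sample rows.

```r
# Invented sample rows in the assumed raw format (not real Cyclistic data)
trips <- data.frame(
  started_at = c("2023-07-01 08:15:00", "2023-07-03 17:40:30"),
  ended_at   = c("2023-07-01 08:42:10", "2023-07-03 17:55:00"),
  stringsAsFactors = FALSE
)

# Parse the combined date-time text; the format and time zone are assumptions
fmt <- "%Y-%m-%d %H:%M:%S"
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at,   format = fmt, tz = "UTC")

# Derive the three columns that were added in Excel
trips$day      <- weekdays(trips$started_at)            # day name, in the session's locale
trips$start_at <- format(trips$started_at, "%H:%M:%S")  # start time only
trips$end_at   <- format(trips$ended_at,   "%H:%M:%S")  # end time only

print(trips[, c("day", "start_at", "end_at")])
```

In an English locale the day column reads Saturday and Monday; lubridate’s `wday(x, label = TRUE)` is a tidyverse alternative for the day name.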
Then I removed other unnecessary columns, such as start_station_id, start_lat and start_lng.
Finally, I saved all these files in a separate folder so I could import them together in RStudio.
RStudio
Here’s the code for the first R session and the second R session.
First, I installed the “dplyr” package and then loaded the already installed “tidyverse”, “lubridate” and “readxl” packages.
Then I used the read.csv function with the pipe operator to import every CSV file in the folder, and the bind_rows function to combine them into one data frame. This turned the monthly data into a single year of data, which made further analysis easier.
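The combine step can be sketched in base R as follows, assuming every monthly file has the same columns. It builds two tiny invented files in a temporary folder so it runs anywhere. With dplyr loaded, `bind_rows(lapply(files, read.csv))` does the same stacking and also fills in NA when files have different columns.

```r
# Build a temporary folder holding two small invented "monthly" files
folder <- file.path(tempdir(), "cyclistic_months")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(folder, "2023_06.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(folder, "2023_07.csv"), row.names = FALSE)

# Find every CSV in the folder, read each one, and stack them row-wise
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
all_trips <- do.call(rbind, lapply(files, read.csv))

nrow(all_trips)   # 3 rows from the two files
```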
Next, I created a second data frame derived from the first one. I kept the two separate so I could edit the new one and add columns to it.
Then I added these columns to the new data frame:
- Hours
- Month
- Year
- Time of day (Morning, Afternoon, Evening, Night)
- Season (Summer, Spring, Fall, Winter)
- Duration (the amount of time each rider rode the bike)
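The derived columns above can be sketched in base R like this. The write-up doesn’t state its cut-offs, so the time-of-day boundaries (night 0 to 5, morning 6 to 11, afternoon 12 to 17, evening 18 to 23) and the meteorological seasons are assumptions. lubridate’s hour(), month() and year() are alternatives for the first three columns.

```r
# Invented rides; in practice these columns come from the combined data frame
rides <- data.frame(
  started_at = as.POSIXct(c("2023-01-15 07:30:00", "2023-04-02 13:05:00",
                            "2023-07-08 19:45:00", "2023-10-21 23:10:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2023-01-15 07:48:00", "2023-04-02 13:35:00",
                            "2023-07-08 20:06:00", "2023-10-21 23:22:00"), tz = "UTC")
)

lt <- as.POSIXlt(rides$started_at)
rides$hour  <- lt$hour          # 0-23
rides$month <- lt$mon + 1       # POSIXlt months are 0-based
rides$year  <- lt$year + 1900   # POSIXlt years count from 1900

# Assumed buckets: (-1,5] Night, (5,11] Morning, (11,17] Afternoon, (17,23] Evening
rides$time_of_day <- cut(rides$hour, breaks = c(-1, 5, 11, 17, 23),
                         labels = c("Night", "Morning", "Afternoon", "Evening"))

# Assumed meteorological seasons (Northern Hemisphere), indexed by month number
season_of <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
               "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
rides$season <- season_of[rides$month]

# Ride duration in minutes
rides$duration <- as.numeric(difftime(rides$ended_at, rides$started_at, units = "mins"))

print(rides[, c("hour", "month", "year", "time_of_day", "season", "duration")])
```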
After this, I ran a cleanup check to confirm the data’s reliability, making sure there were no duplicate records and no rows with NA values. Duplicates would have double-counted rides, and NA rows would have meant analysing incomplete data.
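A base-R sketch of that check, on a small invented data frame with one duplicated row and one row containing NA. Counting repeated `ride_id` values is an extra step, assumed here because ride IDs should be unique: it catches the same ride appearing twice with slightly different values, which `duplicated()` on whole rows would miss.

```r
# Invented data: row 3 duplicates row 2, and row 4 has a missing value
df <- data.frame(
  ride_id       = c("r1", "r2", "r2", "r3"),
  member_casual = c("member", "casual", "casual", NA),
  duration      = c(12.5, 21.0, 21.0, 8.0),
  stringsAsFactors = FALSE
)

n_dup_rows <- sum(duplicated(df))           # rows identical to an earlier row
n_dup_ids  <- sum(duplicated(df$ride_id))   # repeated ride IDs (assumed unique key)
n_na_rows  <- sum(!complete.cases(df))      # rows with at least one NA

# Keep rows that are neither duplicates nor incomplete
clean <- df[!duplicated(df) & complete.cases(df), ]

cat("duplicate rows:", n_dup_rows, "| duplicate IDs:", n_dup_ids,
    "| NA rows:", n_na_rows, "| rows kept:", nrow(clean), "\n")
```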
Then I saved this new data frame as a CSV file on my computer and closed that R session.
Next, I started a new R session, opened a new R script and imported the CSV file into a new data frame with the read.csv function.
I did this to free up memory, which is important for keeping RStudio smooth and fast. (Anyone who has used RStudio knows how slowly it runs when you work with several large data files.)
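A minimal sketch of the save-and-reload step with base R’s write.csv() and read.csv(); the file name and data are invented. One detail worth knowing: a CSV stores timestamps as plain text, so they come back as character and need parsing again. Formatting them explicitly before writing keeps that text predictable.

```r
# Invented cleaned data with a timestamp column
trips <- data.frame(
  ride_id    = c("r1", "r2"),
  started_at = as.POSIXct(c("2023-07-01 08:15:00", "2023-07-03 17:40:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)
original_times <- trips$started_at

# First session: write timestamps as explicit text, without a row-number column
fmt  <- "%Y-%m-%d %H:%M:%S"
path <- file.path(tempdir(), "cyclistic_clean.csv")
to_save <- trips
to_save$started_at <- format(to_save$started_at, fmt)
write.csv(to_save, path, row.names = FALSE)

# Second session (a fresh R session in the original workflow): read it back
trips2 <- read.csv(path, stringsAsFactors = FALSE)
class(trips2$started_at)   # "character": CSV has no date-time type
trips2$started_at <- as.POSIXct(trips2$started_at, format = fmt, tz = "UTC")
```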
In this new session, I did most of my analysis with the group_by function to answer the questions listed earlier. For that, I made more data frames from this new data frame, covering:
- Total rides
- Total rides per member
- Total rides per hour
- Total rides per hour per member
- Total rides per day
- Total rides per day per member
- Total rides per time of day
- Total rides per time of day per member
- Total rides per month
- Total rides per month per member
- Total rides per season
- Total rides per season per member
- Average duration
- Average duration per member
- Average duration per bike type
- Average duration per bike type per member
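Most of these summaries follow one pattern: group by one or two columns, then count or average. Here is a base-R sketch with table() and aggregate() on invented rows; the column names member_casual, rideable_type and duration are assumptions modelled on the public Divvy trip files. With dplyr, `group_by(member_casual) %>% summarise(rides = n(), avg = mean(duration))` is the equivalent pattern.

```r
# Invented rides; duration is in minutes
rides <- data.frame(
  member_casual = c("member", "member", "casual", "member", "casual", "casual", "member"),
  day           = c("Monday", "Saturday", "Saturday", "Tuesday", "Saturday", "Sunday", "Wednesday"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "classic_bike",
                    "electric_bike", "classic_bike", "classic_bike"),
  duration      = c(10, 14, 25, 12, 30, 20, 8),
  stringsAsFactors = FALSE
)

# Total rides, and total rides per member type
total_rides      <- nrow(rides)
rides_per_member <- table(rides$member_casual)

# Total rides per day per member type (rows = day, columns = member type)
rides_per_day_member <- table(rides$day, rides$member_casual)

# Average duration per member type, and per bike type per member type
avg_duration_member <- aggregate(duration ~ member_casual, data = rides, FUN = mean)
avg_duration_bike_member <- aggregate(duration ~ rideable_type + member_casual,
                                      data = rides, FUN = mean)

print(rides_per_member)
print(avg_duration_member)
```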
I then used these data frames to create graphs with the “ggplot” function.
This part of the coding was very tricky, especially because every graph shows the two member types side by side for easier comparison.
I then refined these graphs by adding color, titles, subtitles and data points, and saved them as images with the “ggsave” function.
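In ggplot2, the side-by-side layout described here is a dodged bar chart (`geom_col(position = "dodge")`, saved with `ggsave()`). So that this sketch runs without extra packages, it uses the base-R equivalent instead: barplot() with beside = TRUE, value labels added with text(), and a pdf() device in place of ggsave(). The counts are invented.

```r
# Invented ride counts: rows = member type, columns = day
counts <- matrix(c(120, 80, 125, 160, 118, 140), nrow = 2,
                 dimnames = list(c("member", "casual"),
                                 c("Friday", "Saturday", "Sunday")))

out <- file.path(tempdir(), "rides_per_day.pdf")
pdf(out, width = 7, height = 4)   # open a file device (plays the role of ggsave())

# beside = TRUE draws the member and casual bars next to each other for each day
mids <- barplot(counts, beside = TRUE,
                col = c("steelblue", "orange"),
                legend.text = rownames(counts),
                ylim = c(0, max(counts) * 1.2),
                main = "Rides per day by member type",
                sub = "Illustrative numbers only",
                ylab = "Number of rides")

# Print each value just above its bar
text(mids, counts, labels = counts, pos = 3, cex = 0.8)

invisible(dev.off())              # close the device so the file is written
```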
Tableau Public
After creating visuals in R, I created another set of visuals in Tableau.
Here’s the link to the Tableau dashboard.
I have attached an image of it here too, but for a better understanding I suggest visiting the link.
I first tried importing the whole one-year file, but it was too big for Tableau, so I used the separate CSV files I had created in R for each particular visual. I couldn’t manage to apply one common legend to the whole page the way a filter works, so each graph keeps its own legend even though the legend data is the same.

Analysis:
For both types of riders, the busiest time of day was the afternoon and the busiest season was summer.
But how do they use the bikes differently?
- Annual members took the majority of all rides: 56% more rides than casual riders.
- Casual riders rode for longer than annual members, with an average ride duration of 21 minutes.
- Casual ridership spikes suddenly at weekends, particularly on Saturdays.
So, while the annual members’ figures were fairly consistent, casual riders showed sudden spikes at weekends and in the summer season.
Casual riders also rode for longer.
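Ride counts like these can be reported two ways: as each group’s share of all rides, or as how many more rides one group took than the other. The sketch below uses hypothetical counts, not the real Cyclistic totals, to show how each is calculated and why the two numbers differ: a 60/40 split means members took 50% more rides, not 60% more.

```r
# Hypothetical ride counts per rider type (not the real Cyclistic totals)
rides <- c(member = 3000, casual = 2000)

# Share of all rides, in percent
share <- 100 * rides / sum(rides)

# How many percent more rides members took than casual riders
pct_more <- 100 * (rides[["member"]] - rides[["casual"]]) / rides[["casual"]]

print(round(share, 1))   # member 60, casual 40
print(pct_more)          # 50
```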
Suggestions:
Based on the analysis above, I would suggest the following:
- Offer a price incentive and additional perks for riding a certain distance.
  For example, since casual riders use the bikes more at weekends and ride for longer than members, the company could create a special weekend offer that makes these potential customers more inclined to buy a full year’s subscription instead of a one-day pass.
- Launch student discount plans, and partner with delivery companies and corporate parks to reach people who ride as part of their daily commute.
- Assuming the company runs on a mobile app that also processes payments, launch a reward-points program whose points can be redeemed for more rides or for discounts at partner stores (for example, a discount at Starbucks or similar brands).
All of these would help increase the company’s market share and reach the potential customers who currently ride as casual riders.
The partnerships would also strengthen the brand’s presence and, if executed well, could soon make the bikes part of work culture and build a sense of community, just like grabbing a ‘cup of joe’ (coffee) on the way to work.
Conclusion
I have provided my insights above. I hope my analysis helps establish how these two member types used Cyclistic bikes differently.
Beyond that, any further suggestions would be based on my personal understanding of the market rather than on the available data, and making them wouldn’t have been right.
While working on this project, I learned:
- Excel and its functions
- Working with RStudio: its functions, creating data frames, filtering and sorting them, and creating graphs with design elements
- Working with Tableau and its design elements
Please leave feedback or suggestions. They will be a big help in improving my analysis technique and process.
Resources:
- Coursera course modules.
- Google Docs, as a record log to organise and arrange my work.
- Google searches for help with my R errors.
- Bing Chat (GPT-4) for help with R coding errors.
- Kelly Adams’s Google capstone project, which inspired me. Her work served as a guide to help me align and streamline my process.