Wearable data analytics: A year-long exploration of smartwatches and data engineering
Oct. 2, 2023 | fitness, wearables, tech


Intro

Embarking on a project that blends one's passions with professional expertise is often seen as a gateway to innovation and learning. My recent year-long endeavor encapsulates this blend perfectly. In this narrative, I unfold my journey through a meticulous project at the intersection of wearable technology and cloud-based data engineering. The quest involved performing Extract, Transform, and Load (ETL) operations on a large dataset: millions of rows of smartwatch data from three Garmin smartwatches (Vivoactive 3, Vivoactive 4, Forerunner 945 LTE) and an Apple Watch Series 8. Skip to the bottom of the blog post for an architectural overview of the data engineering workflow in AWS.

The seedling of this project was sown with a vision that spanned across three primary goals. Initially, the endeavor was an open field to unify my fervor for fitness, affinity for wearables, and a knack for technology with my technical prowess in data engineering, data science, and full-stack software engineering. It was a stage set for the convergence of hobbies and expertise.

Moreover, a spark of curiosity ignited the analytical side of me. The comparison of data output from the crème de la crème of smartwatches, the Garmin array and the Apple Watch Series 8, promised a venture into an insightful analytical expedition. The subtle nuances in the data captured by these top-tier gadgets were not only intriguing but were a reflection of technological advancements in the wearable domain.

Lastly, this project was a ticket to a deeper dive into the realms of cloud computing and web development. The journey entailed running robust ETL processes on Amazon Web Services (AWS), crafting analytical dashboards, architecting databases, and weaving API endpoints on a Django site hosted on Google Cloud (GCP). The venture was not just a skill-sharpening exercise, but a playground filled with tech stacks that resonated with my career path.

Background

The Apple Watch has been touted as having the most accurate optical heart-rate (HR) smartwatch sensor on the market (1, 2, 3), so I was interested to see how close the Garmin data was. For context, the Garmin Forerunner 945 LTE uses Garmin's Elevate fourth-generation optical HR monitor, while the Apple Watch Series 8 has Apple's latest third-generation optical HR sensor (see here). Note: Garmin recently (June 2023) released their fifth-generation Elevate optical HR sensor on the Garmin Fenix 7 Pro. These generations of optical HR monitors have been claimed to be close to the "gold standard" of non-invasive HR monitoring: chest straps, which use electrical signals rather than optical measurements.

Over the last 6 years, I've collected data from 3 Garmin watches that I have owned: Vivoactive 3 (2017-2019), Vivoactive 4 (2019-2021), and Forerunner 945 LTE (2021-present). In total, I have 6 years of Garmin data and 1 year of Apple Watch data. With each watch recording data every few seconds, 24 hours a day, 7 days a week, 365 days a year, this quickly accumulates to millions of data points (for reference, there are roughly 525k minutes in a year).
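To put a rough number on "millions of data points", here is a back-of-the-envelope sketch. The 5-second sampling interval is purely an assumption for illustration ("every few seconds"); the real interval varies by device, metric, and activity.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def samples_per_year(sample_interval_s: float) -> int:
    """Readings per year for one metric at a fixed sampling interval."""
    return int(MINUTES_PER_YEAR * SECONDS_PER_MINUTE / sample_interval_s)


# Assumed interval: one reading every 5 seconds (illustrative only).
print(f"{MINUTES_PER_YEAR:,} minutes per year")
print(f"{samples_per_year(5):,} readings per year at a 5 s interval")
```

Even at a far sparser interval, a single metric on a single watch lands in the hundreds of thousands of rows per year, so several metrics over several years reach the millions quickly.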

Over the last year (starting September 2022) I have worn both the Apple Watch and the Garmin Forerunner so that I can collect data for direct comparisons between the "latest and greatest" smartwatches that each brand has to offer. I did not opt for the Apple Watch Ultra, as the main sensors that I (and probably 99% of people) care about are identical between the Ultra and the Series 8.

Data pre-processing

Each device has its own unique quirks for obtaining and processing the data.

Garmin is not privacy-first by default when it comes to your data: they upload all smartwatch data straight to their cloud infrastructure for analysis, then send that data back to your watch / Garmin Connect for you to view. To get my hands on my own Garmin data I used a tool called GarminDB (thanks Tom Goetz) that allows you to pull all of your Garmin data (6 years in my case) using your Garmin Connect login credentials. It also parses the FIT files into SQLite databases, which can be easily exported to CSV for uploading to S3 for ETL.
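The SQLite-to-CSV step can be sketched roughly as follows. This is not GarminDB's own export code, just a minimal standard-library version that dumps every table in one database to its own CSV file; the function name and any paths you pass in are placeholders of mine.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path: str, out_dir: str) -> list[Path]:
    """Write every user table in a SQLite database to <out_dir>/<table>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            csv_path = out / f"{table}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                # Header row comes from the cursor's column metadata.
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            written.append(csv_path)
    finally:
        conn.close()
    return written
```

Calling it once per database file gives one folder of CSVs per database, which maps neatly onto prefixes in a "raw" S3 bucket.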

For the Apple Watch, the process of obtaining your data is easier, as Apple stores your health data on your iPhone rather than taking Garmin's approach of sending all data to company servers for storage/analysis. In the iPhone Health app you can choose to export all of your health data as a single XML file and send it to a MacBook via AirDrop. I wrote a Python script to parse the XML file into multiple CSVs, one for each category within the XML file, e.g. HeartRate.csv, Sleep.csv, Steps.csv, etc. I went on to write a custom iOS Shortcut that utilises HTTP requests to post the health data as JSON to an API endpoint. This seemed to work well and would be a more viable option for running daily or weekly data dumps if this were ever spun out into a business.
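For a flavour of the XML-to-CSV step, here is a minimal sketch rather than my exact script. It assumes the export contains `Record` elements carrying `type`, `sourceName`, `unit`, `startDate`, `endDate` and `value` attributes; the choice of columns and the file naming (stripping the `HK...Identifier` prefix from the type) are illustrative choices.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Columns to keep from each Record element (an illustrative subset).
FIELDS = ["sourceName", "unit", "startDate", "endDate", "value"]
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def short_name(record_type: str) -> str:
    """Turn e.g. 'HKQuantityTypeIdentifierHeartRate' into 'HeartRate'."""
    for prefix in PREFIXES:
        if record_type.startswith(prefix):
            return record_type[len(prefix):]
    return record_type


def split_export(xml_path: str, out_dir: str) -> dict[str, int]:
    """Stream the export and write one CSV per record type; return row counts."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files, writers, counts = {}, {}, {}
    try:
        # iterparse streams the file, so the whole export is never held in memory.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "Record":
                name = short_name(elem.get("type", "Unknown"))
                if name not in writers:
                    files[name] = (out / f"{name}.csv").open(
                        "w", newline="", encoding="utf-8"
                    )
                    writers[name] = csv.DictWriter(files[name], fieldnames=FIELDS)
                    writers[name].writeheader()
                    counts[name] = 0
                writers[name].writerow({f: elem.get(f, "") for f in FIELDS})
                counts[name] += 1
            elem.clear()  # free the element's attributes and children once handled
    finally:
        for fh in files.values():
            fh.close()
    return counts
```

Streaming matters here because a multi-year export can be a very large file, and loading it as a full tree is slow and memory-hungry.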

Now the data was ready for uploading to "the cloud", or AWS S3 buckets in my case. I fired up the AWS CLI, made 2 buckets in AWS S3, and uploaded 100 MB of data.

Extract, transform, load (ETL)

With the CSVs in their S3 buckets, I wrote an AWS Lambda that fires when new files land in the "raw" S3 buckets. The Lambda triggers an AWS Glue job that uses PySpark to perform ETL, with the transformed data ending up in "clean" S3 buckets. Another AWS Lambda is then triggered to start an AWS Glue Crawler that indexes the CSV structure, ready for querying in AWS Athena via SQL and eventually dashboarding in AWS QuickSight. The alternative path for data analysis is an AWS Lambda that POSTs the data to my Django website, hosted on Google Cloud, via a Django-Ninja endpoint that parses the data into Cloud SQL, ready for my Django site to pick up the new data and visualise it via Django-Plotly-Dash.
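The first hop in that chain (S3 upload, then Lambda, then Glue) might look roughly like the sketch below. The job name and the `--source_bucket` / `--source_key` argument names are hypothetical, not my real configuration, and the optional `glue` parameter exists only so the handler can be exercised locally with a stand-in client.

```python
import urllib.parse

GLUE_JOB_NAME = "smartwatch-etl"  # hypothetical job name


def build_job_arguments(record: dict) -> dict:
    """Pull the bucket and object key out of one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {"--source_bucket": bucket, "--source_key": key}


def handler(event, context, glue=None):
    """Lambda entry point: start one Glue job run per newly uploaded object."""
    if glue is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue = boto3.client("glue")
    run_ids = []
    for record in event.get("Records", []):
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments=build_job_arguments(record),
        )
        run_ids.append(response["JobRunId"])
    return {"job_run_ids": run_ids}
```

Passing the bucket and key as job arguments lets a single Glue script handle whichever file triggered it, rather than rescanning the whole "raw" bucket on every upload.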

 

Insights to come...



- MM3