
Saturday, June 7, 2025

First steps toward a personal health LLM

Google published a manuscript and accompanying blog post introducing a novel large language model (LLM) tailored for understanding and reasoning about personal health questions and data. Just as the goal of precision medicine is to diagnose and treat each person as a unique individual rather than an average patient, the goal here was to fine-tune LLMs for personal health tasks. What types of tasks? Google’s team focused on insights into sleep and fitness, using data from wearable personal trackers.

One part of the study used wearable data and patient-reported outcomes to evaluate the LLM's ability to predict sleep quality outcomes. The researchers took advantage of a dataset of 4,759 individuals with 20 wearable device measurements and extracted a set of 16 sleep-related outcomes, such as tiredness, feeling refreshed, getting enough sleep, and trouble sleeping. The wearable monitor data included bedtimes and wake times, time spent in various sleep stages, and overall physical activity while awake (Figure 1).

The model, termed PH-LLM (Personal Health LLM), was trained in the following manner. First, the researchers selected a base model, which was an early version of Google Gemini. Second, the base model was fine-tuned to predict the patient-reported outcomes from the wearable sensor data. More specifically, a multilayer perceptron (MLP) adapter was used to map the wearable measurements into the latent token space of the LLM, i.e., to create embeddings that translate the numerical values into "latent tokens". The wearable measurement latent tokens could then be provided as context in the prompt for predicting patient sleep outcomes.
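To make the adapter idea concrete, here is a minimal sketch in plain Python. It is not the architecture from the manuscript: the hidden width, the number of latent tokens, the embedding width, the random weights, and the synthetic input values are all assumptions chosen for illustration. Only the 20 input measurements come from the study description. The point is just the shape of the computation: a small MLP turns one person's numeric measurements into a few vectors that sit alongside the embedded text prompt.

```python
import math
import random

N_FEATURES = 20  # wearable measurements per person (from the study description)
HIDDEN = 32      # hypothetical hidden-layer width
N_TOKENS = 4     # hypothetical number of latent tokens produced by the adapter
D_MODEL = 8      # hypothetical LLM embedding width (real models are far wider)


def init_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply a dense layer: weights @ x + bias."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def adapter(features, layer1, layer2):
    """Map a vector of wearable measurements to N_TOKENS latent tokens."""
    hidden = relu(linear(features, layer1))
    flat = linear(hidden, layer2)
    # Split the flat output into N_TOKENS vectors of width D_MODEL.
    return [flat[i * D_MODEL:(i + 1) * D_MODEL] for i in range(N_TOKENS)]


rng = random.Random(0)
layer1 = init_linear(N_FEATURES, HIDDEN, rng)
layer2 = init_linear(HIDDEN, N_TOKENS * D_MODEL, rng)

# Synthetic stand-in for one person's normalized wearable measurements.
features = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
latent_tokens = adapter(features, layer1, layer2)

# Placeholder for the embedded text prompt (5 tokens of zeros here);
# the latent tokens are prepended to it as context.
prompt_embeddings = [[0.0] * D_MODEL for _ in range(5)]
llm_input = latent_tokens + prompt_embeddings

print(len(llm_input), len(llm_input[0]))  # 9 8
```

In a real setup the adapter weights would be learned during fine-tuning rather than left random, so that the latent tokens carry information the LLM can use.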

Frankly, the results were not particularly impressive, with the authors stating that PH-LLM was able to achieve "only modest predictive power for perceived sleep quality metrics." In addition, there were no statistically significant differences in performance between PH-LLM and specialized logistic regression models, which served as simple control models.
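For readers unfamiliar with the control model, a logistic regression baseline is about as simple as a classifier gets: a weighted sum of the input measurements passed through a sigmoid. The sketch below trains one with plain gradient descent on synthetic data. The data, the three made-up features, the learning rate, and the epoch count are all assumptions for illustration, and nothing here reproduces the study's data or results.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 0/1 predictions using a 0.5 probability threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


# Synthetic data: three made-up "wearable" features and a noisy binary outcome.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] - x[1] + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0 for x in X]

w, b = train_logistic_regression(X, y)
preds = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

A model this small being statistically indistinguishable from a fine-tuned LLM on the same prediction task is what makes the result above underwhelming.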

Although the results are nothing to write home about, this work represents an important first step toward personalized AI health applications that can be fine-tuned on wearable data. In the future, one can imagine more sophisticated models that relate wearable data to health conditions and behaviors and achieve more accurate predictive performance.
Figure 1. Using data from a wearable monitor such as a smartwatch to train a personalized LLM to predict sleep outcomes (https://research.google/blog/advancing-personal-health-and-wellness-insights-with-ai/). 
