Combined analysis of accelerometer and GPS data

I am delighted to inform you about a set of software tools I have been working on for the HABITUS project led by Jasper Schipperijn. I already mentioned this project in a blog post from 2020, so I think it is time for an update. The tools I worked on, named hbGPS, hbGIS, and HabitusGUI, aid the combined analysis of wearable accelerometer and GPS data. Here, I focussed on data collected with GPS sensors from the brand Qstarz. However, the accelerometer part of the functionality should be applicable to any accelerometer brand.

Earlier this year, Teun Remmers and Dave van Kann at Fontys University, who co-funded the development, successfully used the tools in a research publication. Nonetheless, we think ongoing community efforts are needed to improve the tools for use in the wider research community. In this blog post I summarise:

  • The work that has been done.
  • The additional work that we think is needed.
  • How you, as the research community, can help move the project forward.

A short history

Using GPS and accelerometer data for tracking human behaviour is not new. In fact, the research community has explored various tools and algorithms. Unfortunately, many of the tools that have been used have not been well preserved. For example, the PALMS software for detecting trips from GPS data and the software named palmsplus for matching PALMS output with GIS data are no longer maintained or available.

Efforts have been made to address this with the Python library PALMSpy, which aims to reimplement PALMS, and the R package palmsplusr, which aims to replace palmsplus. Although this has been great work, a couple of challenges remained:

  • PALMSpy has primarily been developed to work in a cloud environment and not on desktop computers, which hampers small-scale usage. Further, PALMSpy has been tailored to older count-based activity monitor data that is no longer collected, has no functionality to estimate indoor versus outdoor time, and is only functional for a limited number of GPS file formats.
  • palmsplusr has been developed to be used interactively by the user and not as a dependency inside other software.

Additionally, the need for working with multiple tools written in Python and R complicates the local installation and usage for health researchers.

Initial efforts to address the limitations led me to the development of a set of new tools:

hbGPS

R package hbGPS aims to replace PALMSpy, which in turn was based on the PALMS software. Just like PALMS and PALMSpy, hbGPS identifies trips from GPS data and fuses these with accelerometer data. However, instead of having its own accelerometer data processing functionality, it fully relies on R package GGIR. As you may know, GGIR facilitates both historic and modern data formats and offers a wide variety of data processing options. Another advantage of hbGPS over PALMSpy is that hbGPS is much faster and can run on a laptop.

Please note that hbGPS does not attempt to replicate either PALMS or PALMSpy but has its own new algorithm for trip detection. It uses a simple knowledge-driven algorithm to offer an intuitive base functionality, although we do not exclude adding data-driven techniques in the future.
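
To give a flavour of what such a knowledge-driven rule set looks like, the sketch below flags a trip when GPS speed stays above a threshold for a minimum duration. The thresholds, epoch length, and function name are made-up placeholders for illustration; hbGPS's actual algorithm and defaults differ.

```python
# Hypothetical sketch of knowledge-driven trip detection: flag sustained
# above-threshold movement as a trip. Not the actual hbGPS algorithm.

def detect_trips(speeds_kmh, epoch_sec=10, min_speed=2.0, min_duration_sec=60):
    """Return (start, end) index pairs where speed stays at or above
    min_speed for at least min_duration_sec."""
    min_epochs = min_duration_sec // epoch_sec
    trips, start = [], None
    for i, s in enumerate(speeds_kmh):
        if s >= min_speed and start is None:
            start = i                      # possible trip begins
        elif s < min_speed and start is not None:
            if i - start >= min_epochs:    # long enough to count as a trip
                trips.append((start, i))
            start = None
    if start is not None and len(speeds_kmh) - start >= min_epochs:
        trips.append((start, len(speeds_kmh)))  # trip runs to end of recording
    return trips

speeds = [0.5] * 6 + [5.0] * 12 + [0.3] * 6  # stationary, moving 120 s, stationary
print(detect_trips(speeds))                   # -> [(6, 18)]
```

A data-driven variant would learn these boundaries from labelled trips instead of fixing them by hand, which is the kind of extension mentioned above.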

hbGIS

R package hbGIS is a re-implementation of R package palmsplusr, developed by Tom Stewart, to ease using it as a dependency inside other software. For example, the package now offers a single-function interface and allows for configuration files. hbGIS is primarily intended to facilitate the analysis of the following scenarios:

  1. Time spent in, and travel between, locations that participants are affiliated with (e.g. children and their school) and participants' base locations (e.g. home).
  2. Time spent in, and travel between, public locations that participants are not affiliated with (e.g. a central park and a beach park), or categories of locations (e.g. parks).
  3. Combinations of 1 and 2.

In the output you will find a summary of travel and physical behaviour per time point, per day, per trip, and per recording. This part is largely based on code provided by Tom Stewart, which I restructured to fit inside the new package.

HabitusGUI

HabitusGUI is a Shiny app that serves as a graphical user interface to configure and start the analyses of wearable sensor data with GGIR, hbGPS, and/or hbGIS. Originally the app was developed with a cloud environment in mind, as PALMSpy was designed for the cloud. However, now that hbGPS and hbGIS can both run fast on a desktop computer, the Shiny app may be considered less essential, as all tools can also be used from the R environment directly.

Documentation

Each of the above-mentioned packages has its own documentation, but additionally Josef Heidler, PhD student at SDU, created a more high-level documentation overview page.

Acknowledgements

I would like to thank a number of people:

  • Josef Heidler and Jasper Schipperijn for the productive brainstorm sessions that helped me develop this tool.
  • Teun Remmers, Line Matthiesen and Josef Heidler for helping with testing.
  • Tom Stewart for developing palmsplusr on which hbGIS is based and for guiding me through the code.
  • Dave van Kann and Jasper Schipperijn for trusting my service as a consultant.

Interested in using the tools in research?

If you are interested in the potential of these tools then I would like to invite you to help me continue the development via any of the following routes. Please let me know if you want to brainstorm about any of these options.

  1. Test the software tools and let me know your observations and problems.
  2. Collect or share data to help formally evaluate the tools in a publication.
  3. Volunteer to help evaluate and describe the software in a publication. To facilitate such effort I included a narrative description inside the software here and here.
  4. Help review and improve the taxonomy.
  5. Volunteer to help continue the development and maintenance of the software.
  6. Find funding to sponsor my ongoing involvement in the development. I am happy for someone else to partly take over my role as developer, but if you want me to continue this work then I need funding for my time. I am not an academic and I am self-employed, which means I depend on the research community to find funding for my time investment. A few thousand euros would enable me to provide some limited coordination and maintenance work, while a larger amount would allow for a more active involvement. The software is free, which means that I need an alternative source of income. Please contact me if you are interested in helping with this.


Comparability of raw data from different accelerometer brands and configurations

A new article was recently published from the project I did with Annelinde Lettink and colleagues at Amsterdam University Medical Center.

The gap in knowledge we aim to fill

For many years the field has focussed on finding useful algorithms to extract insight from raw accelerometer data. People, myself included, either assume that an algorithm is applicable across sensor brands and configurations given that they all store data in the same gravitational units, or we investigate comparability at the algorithm output level. However, comparability at the most fundamental level, the raw data itself, has never been studied.

How to assess comparability?

To compare raw data between accelerometers we need to be sure that the movement is 100% identical. It is impossible to attach a set of accelerometers to the exact same human body location. Neither can we expect an individual to repeatedly make the exact same body movement while wearing a different accelerometer at each repetition. Therefore, the only way to investigate comparability at the raw data level is by using mechanical movements. For this we dusted off the mechanical shaker table that I previously used during my McRoberts days in 2005-2008.

The next challenge was to get access to a sufficiently large pool of accelerometers from various brands. The pandemic lockdowns offered a solution, as many research groups were not using their accelerometers. So, we asked colleagues in the field to lend us a sample of their accelerometers.

What makes this study valuable?

The results of our study reveal that raw data differs between brands in both the time and frequency domain.

However, this is not some geek project with no relevance to the real world! Awareness of differences in raw sensor data allows us to better anticipate how the output of any algorithm applied to it will be affected. And yes, this applies to any algorithm, including both machine learning techniques and domain knowledge driven approaches.

Further, understanding raw data comparability helps us to be more confident when using these algorithms across sensor brands and configurations.

Open access data and open source code

All data, code, and more detailed documentation of the experiments have been shared publicly to enable reproducibility of our findings and facilitate future research. On that note, we even share additional experiments which we did not use in the published article. We did these additional experiments in the hope of maximising the potential value of the dataset.

For example, I was curious to know whether it is possible to develop an experiment that anyone can do in any research context and that would be as informative as a mechanical shaker experiment, but without requiring an actual mechanical shaker. So, I attached a series of accelerometers to the edge of a door and moved the door repeatedly. I know this may sound ridiculous, but what if this little experiment could help studies to identify problematic sensors? I have not had time yet to look at the data and investigate this, but maybe you do?!

Further, it may be worth highlighting that there is a video to summarise the experiments inside the documentation (credits to Annelinde for creating the video!).

If you plan to use any of the data or code and run into difficulties then do not hesitate to reach out!

Federated Data Analysis in ProPASS

The aim of this project is to help implement Federated Data Analysis in ProPASS. ProPASS is the international research consortium for Prospective Physical Activity, Sitting and Sleep. The consortium aims to facilitate pooled analyses of data from members around the world to boost statistical power. Their data includes accelerometer data and various other typical epidemiological data types. ProPASS actively works on addressing challenges of pooled analysis such as data harmonization.

Pooled data analysis when data cannot be shared

Privacy concerns and research regulations can prohibit the sharing of research data for a pooled analysis. Study-level meta-analysis (SLMA) is the next best option. However, we know that SLMA becomes inefficient when conducting explorative analysis: a local person at each data site needs to run the analysis script and share the results for every iteration of the analysis, by every researcher in the consortium.

Individual-level meta-analysis (ILMA), often referred to as Federated Data Analysis or Federated Learning, circumvents this problem. Here, secure multi-party computation is used to perform statistical analysis on individual data points across multiple data sites without disclosing identifiable information. In other words, a single person can run the analysis across all the data in the consortium without the need for action from staff at the data sites.
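
To make the contrast with SLMA concrete, the toy sketch below shows the federated principle in its simplest form: each site discloses only aggregate statistics, never individual records, yet the coordinator recovers the exact pooled result. This is only an illustration of the idea; DataSHIELD adds far more on top, such as disclosure checks and a full statistical vocabulary.

```python
# Toy illustration of federated analysis (not DataSHIELD itself):
# sites return only aggregates, the coordinator combines them.

def site_summary(values):
    """The only statistics a site is willing to disclose."""
    return {"n": len(values), "sum": sum(values)}

def pooled_mean(summaries):
    """Exact pooled mean from per-site aggregates alone."""
    total_n = sum(s["n"] for s in summaries)
    total_sum = sum(s["sum"] for s in summaries)
    return total_sum / total_n

site_a = [62.0, 58.5, 70.2]  # individual-level data stays at site A
site_b = [65.1, 59.9]        # ... and at site B
print(pooled_mean([site_summary(site_a), site_summary(site_b)]))  # -> 63.14
```

The same pattern generalises to variances, regression coefficients, and other statistics that decompose into per-site sufficient statistics, which is what makes consortium-wide analysis possible without any record leaving its site.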

Accelting’s involvement

The ProPASS working group has asked me to help coordinate the implementation of Federated Data Analysis in ProPASS. We identified DataSHIELD and the OBiBa software stack as the most promising technologies to enable Federated Data Analysis. Several other consortia in the life sciences use DataSHIELD and it has an active developer community.

I am not doing this project on my own, but work closely with a task group of early career researchers: Doua El-Fatouhi, Jairo Migueles, Jonah Thomas, and Esther Smits. Also, Joel Nothman from the Sydney Informatics Hub has been making valuable contributions to our discussions.

So far, we have piloted the installation and usage of DataSHIELD on machines that hold non-confidential dummy data. Further, we will soon roll out a survey among a sample of ProPASS members to gain better insight into their statistical needs. For the upcoming year the goal is to set up the infrastructure and try to find additional partners to help set up some of the infrastructure components such as a user management system and tooling to aid process management.

Update August 2023:

Following this post I helped ProPASS to establish a working agreement with the DataSHIELD consortium to develop the infrastructure for ProPASS. As a result, my involvement in ProPASS could be concluded.


Classifying behaviour in Covid-19 patients

Chronic fatigue is a condition where patients experience extreme fatigue for long periods of time. Mental or physical activity worsens the fatigue, while paradoxically taking rest is not always a solution. The Dutch Expert Center for Chronic Fatigue explores how patients with chronic fatigue can best be supported. The issue of chronic fatigue is especially timely now that many Covid-19 patients also experience chronic fatigue during their recovery, which is the focus of the Center's new ReCOVer study. If you are living in The Netherlands and have had Covid-19, please check out the study's participant recruitment website.

Physical behaviour assessment

A good understanding of fatigue requires insight into a patient's lifestyle in terms of physical behaviour. To quantify physical behaviour the Expert Center has used an ankle-worn accelerometer, named Actometer, since 1997 (see image). In short, the Actometer and the accompanying software classify an individual into one of three classes: pervasively passive, fluctuating active, and pervasively active. Clinicians use these classifications together with other information about the patient to inform a treatment program. Unfortunately, the manufacturing of Actometers has stopped. Therefore, a transition to modern accelerometers is needed, although it would be great if we can somehow preserve the informative value of the Actometer software output that clinicians are familiar with and that is supported by the literature.

Can we replicate the old technology?

Around 2014 I already explored whether Actometer output can be mimicked with modern accelerometer data. For this I relied on the descriptions provided in the original publications by Vercoulen et al. 1997 and van der Werf et al. 2000. Unfortunately, an accurate replication proved difficult at the time, possibly explained by unknown differences in hardware properties between old and new sensors and by the software not being open source.

In 2019, Prof. Dr. Hans Knoop from the Dutch Center for Chronic Fatigue asked me to pick up the work again. However, this time with an additional question: Can we replicate ankle-worn Actometer output based on a wrist-worn accelerometer? Wrist-attachment opens the door for better comparability with many modern studies, especially for sleep assessment. In this blog post I am providing an overview of the work I have done to help the Dutch Center for Chronic Fatigue with this.

Classification model

Given the known challenges with replicating Actometer counts, and 1990s sensor output in general, I decided to go for a simple interpretable indicator of body movement. The indicator roughly reflects the Actometer approach as it applies a band-pass filter to the three acceleration signals. Here, I use the same filter settings as described for the Actometer. Next, I calculate the average vector magnitude during waking hours per day. To identify waking hours, I use the sleep detection algorithm I previously developed for UK Biobank, aided by a sleep diary (if available) as I described in a separate study. After this I calculate the 91.66th percentile of the distribution of this variable across the days. Here, the 91.66th percentile matches the required 11 out of 12 days of inactivity in the original Actometer algorithm described by van der Werf. Finally, I use the resulting acceleration value to classify a person as pervasively passive with logistic regression. Note that I merge the classes 'fluctuating active' and 'pervasively active' into one active class, as discriminating between those was considered less relevant.
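
The decision steps above can be summarised in a short sketch. The band-pass filtering and sleep detection are not reproduced here, and the coefficients b0 and b1 below are made-up placeholders, not the fitted values of the actual model.

```python
import math

# Sketch of the per-person decision: 91.66th percentile of daily
# waking-hour movement, then a logistic classification.

def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    f, c = math.floor(k), math.ceil(k)
    return xs[f] + (xs[c] - xs[f]) * (k - f)

def classify(daily_waking_acceleration, b0=-4.0, b1=0.5):
    """Return 'pervasively passive' or 'active' from per-day averages.
    b0/b1 are hypothetical logistic-regression coefficients."""
    x = percentile(daily_waking_acceleration, 91.66)  # ~ 11 of 12 days
    p_active = 1 / (1 + math.exp(-(b0 + b1 * x)))
    return "active" if p_active >= 0.5 else "pervasively passive"

days = [3.1, 2.8, 3.5, 2.9, 3.3, 3.0, 2.7, 3.2, 3.4, 2.6, 3.1, 3.0]
print(classify(days))  # low movement on nearly all days -> pervasively passive
```

Using a high percentile rather than the mean makes the classification robust to a few unusually active days, mirroring the "11 out of 12 days" rule of the original algorithm.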

Training the model

The logistic regression model was trained with a dataset where patients simultaneously wore one Actometer on the ankle and one accelerometer (ActiGraph) on the wrist. The performance of the classification model was assessed with leave-one-out cross-validation. Given that there are no hyperparameters to optimize and given the moderate sample size (N=50), I decided not to create an additional independent test set. However, when more data becomes available later this year, we could use it to improve the model or as an independent test set.
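
The validation scheme itself is simple to write down. In the sketch below a trivial threshold "model" (the midpoint between class means) stands in for the logistic regression, purely to show how leave-one-out cross-validation refits on N-1 individuals and scores the held-out one; the data values are invented.

```python
# Leave-one-out cross-validation: hold each individual out once,
# refit on the rest, score the held-out case.

def fit_threshold(xs, ys):
    """Midpoint between class means; a stand-in for the real model fit."""
    passive = [x for x, y in zip(xs, ys) if y == 0]
    active = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(passive) / len(passive) + sum(active) / len(active)) / 2

def loocv_accuracy(xs, ys):
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # everyone except individual i
        train_y = ys[:i] + ys[i + 1:]
        t = fit_threshold(train_x, train_y)
        pred = 1 if xs[i] >= t else 0   # score the held-out individual
        correct += pred == ys[i]
    return correct / len(xs)

x = [1.0, 1.2, 1.1, 0.9, 4.0, 4.2, 3.9, 4.1]  # acceleration summaries
y = [0,   0,   0,   0,   1,   1,   1,   1]    # 0 = passive, 1 = active
print(loocv_accuracy(x, y))  # well-separated toy classes -> 1.0
```

With only one summary value per person and no hyperparameters, each fold is cheap, which is what makes LOOCV practical at N=50.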

Reflections on model performance

The model currently classifies 42 out of the 50 individuals correctly: 6 active individuals are misclassified as pervasively passive, and 2 pervasively passive individuals are misclassified as active. However, the model's classification of behaviour appears consistent with what we see in the data. Patients classified as active truly move a lot and patients classified as pervasively passive truly move very little. One plausible explanation for the misclassified patients is the difference in sensor placement. For example, it is not hard to imagine that an ankle-worn sensor picks up cycling better than a wrist-worn sensor. On the other hand, an ankle sensor will miss the arm movements made when sitting down. We did not encounter any major issues with the quality of the data.

Clinical research implementation

Aside from developing a classification model I worked on its implementation in the clinical research setting. The intended users are research nurses and psychologists who have limited time to be involved in the data handling and are not computer savvy.

In a previous project, named Optimistic, this was addressed with a cloud service, but in the current project I decided to initially work on a local offline solution. This allowed me to focus on the model development first. The development of a cloud service could of course be a future enhancement to facilitate scaling it up.

Software design

I used R package GGIR to pre-process the accelerometer data. Therefore, I also developed the rest of the data analysis in the R programming language and organized it as a complementary open-source R package. For now, the code is in a pragmatic location, with a temporary name, and with limited documentation. I am aiming to improve those aspects later this year.

Software interface

Before using the model for the first time, the user (research nurse/psychologist) will have to install R and RStudio on their laptop. Next, the user must download a specific R script, open it in RStudio, and press the Source button in RStudio. This will trigger the installation of all relevant software dependencies.

Initially I explored RShiny as a more attractive user interface. However, I decided not to pursue this yet as the added software complexity worried me. Complex software is harder to maintain and more bug-prone, and I also wanted to prioritize my time towards the model itself.

After the installation, the software asks the user to specify the input and output directories and then the data processing automatically starts.

Software output and speed

At the end of the process the software creates a pdf report. The pdf report shows the classification of the patient as pervasively passive or active, a Z-score of the average acceleration relative to a reference population, a visual overview of the day-by-day physical behaviour level, sleep characteristics, and accelerometer wear time. The software processes a 14-day accelerometer recording within 10 minutes on a standard laptop.

Do you want the same for your study?

Recently the ReCOVer study started to use the solution that I am describing in this blog post. If you have feedback, want to know more, or would like to hire me to work on similar wearable data challenges then let me know!

HABITUS

Merging and processing accelerometer and GPS data

HABITUS is a system for merging and processing accelerometer and GPS data hosted by the University of Southern Denmark, and the successor to the PALMS system which used to be at the University of California, San Diego. HABITUS, which stands for Human Activity Behavior Identification Tool and data Unification System, is part of the University of Southern Denmark (SDU) cloud environment built and maintained by SDU eScience. The development of HABITUS is led by Dr. Jasper Schipperijn from the Institute of Sports Science and Clinical Biomechanics at SDU.

Implementing GGIR

I became involved in HABITUS during the summer of 2019. My initial task was to make R package GGIR available on the platform. To this end, I modified GGIR to be able to process multiple data files in parallel via R package foreach. Further, I expanded GGIR such that it accepts a configuration file as an alternative to specifying function arguments explicitly. Next, I put the new version of GGIR in a Docker container to facilitate reproducible research. Our partners at SDU eScience then put this Docker container in an app within the SDU cloud environment.
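
The configuration-file idea is simply a table mapping argument names to values, so an analysis can be rerun or shared without editing code. The sketch below is illustrative only: the two-column layout and the argument names are plausible but not guaranteed to match GGIR's actual configuration file format.

```python
import csv
import io

# Illustrative parser for a key-value configuration file of the kind
# described above; GGIR's real config layout may differ in detail.

CONFIG = """argument,value
mode,1
epochvalues2csv,TRUE
windowsizes,5
"""

def read_config(text):
    """Map each row of the config table to an argument name/value pair."""
    return {row["argument"]: row["value"] for row in csv.DictReader(io.StringIO(text))}

params = read_config(CONFIG)
print(params["mode"], params["windowsizes"])  # -> 1 5
```

Keeping the configuration outside the code is what makes a pipeline like this reproducible: the same container plus the same config file yields the same analysis.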

Facilitating other algorithms

Making GGIR available in HABITUS helped us to identify potential bottlenecks for other HABITUS users. The next goal was to explore how we can facilitate more algorithms than just the ones implemented in GGIR. The challenge here is that algorithms developed by others do not always come with extensive data reading or report generation functionality. To address this I made another enhancement to GGIR such that it can embed external functions. This means that we can now run any algorithm for modern accelerometer data on HABITUS via GGIR. As a first test case we used the ActiGraph count imitation algorithm developed by Dr. Jan Brønd and implemented by Dr. Ruben Brondeel in R package activityCounts. In short, ActiGraph counts were the output from one commercial accelerometer brand in the 1990s and early 2000s, named ActiGraph. Replicating this data metric could facilitate historical comparisons of human behavior. The imitated counts can then serve as input for the PALMSplus software by Dr. Tom Stewart, who is also part of the project team. PALMSplus facilitates the combined analysis of GPS, GIS, and ActiGraph count data.

Next steps

The project team is applying for funding to support the development of HABITUS. For more information and specific questions about HABITUS, see https://www.habitus.eu.

Sleepsight analytics pipeline

Sleepsight

People living with psychosis often experience problems with their sleep, particularly when symptoms worsen. Sleepsight is an innovative research study taking place in South London which uses wearable and mobile technologies to study the links between sleep, activity and symptom levels in psychosis. The study is led by Nick Meyer, a psychiatrist based at the Institute of Psychiatry, Psychology & Neuroscience at King’s College London, and South London and Maudsley NHS Foundation Trust.

Data collection

For the duration of one year, patients used a study smartphone and wore a consumer wearable. The smartphone collected data on GPS, app usage, battery status, accelerometer, and screen status. Additionally, patients filled in a daily sleep and mood questionnaire.

Analytics pipeline

Consultant Chris Karr developed a phone app and server-side software to collect the data. Next, I developed complementary software (GitHub link) to assess data quality, omit data segments as required, and fuse the data types into a single activity score. Further, the software stores its output in a research-friendly format. And as a final step, it generates a heatmap visualisation with ggplot2 to aid quick exploration of the data.
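
As a hypothetical sketch of the fusion step, one could min-max normalise each data stream per time window and average the streams into a single score. The real pipeline's quality rules and weighting are more involved, and the stream names and values below are made up.

```python
# Hypothetical fusion of several sensor streams into one activity score
# per time window: normalise each stream to [0, 1], then average.

def fuse(streams):
    """streams: dict name -> list of per-window values (equal lengths)."""
    def normalise(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    norm = [normalise(v) for v in streams.values()]
    # average across streams, window by window
    return [sum(col) / len(col) for col in zip(*norm)]

streams = {"accelerometer": [0.0, 5.0, 10.0],
           "gps_distance_m": [0.0, 50.0, 100.0]}
print(fuse(streams))  # -> [0.0, 0.5, 1.0]
```

Normalising first keeps one high-variance stream (such as GPS distance in metres) from dominating the combined score.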

Project status

The data collection phase has ended and data analysis is in progress.


Physical Activity descriptors for Cardiometabolic Health

Earlier this year I started a new project with Dr. Séverine Sabia at the French National Institute for Health Research (INSERM). I have successfully collaborated with Dr. Sabia in the past. Our new project explores how novel descriptors of accelerometer data relate to cardiometabolic health.

Time series extraction

My involvement is to help implement these novel descriptors in R package GGIR. A first step was to revise the existing software code to extract cleaned time series data. For example, we did not want to rely on sleep detection for the first recording night, but we considered it valuable to have some estimate of waking-up time on the second recording day. So, we had to decide what information about the first night to trust and use for that. Further, we want to exclude nights for sleep analysis when an accelerometer is not worn. However, we want to include those nights for 24 hour time-use analysis if sleep diary indicates that the accelerometer was only not worn during sleep.
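
The night-inclusion rules described above can be written out as a small decision function. The argument and field names below are illustrative, not GGIR's actual variables.

```python
# Encode the inclusion rules: a night without wear is excluded from sleep
# analysis, but kept for 24 h time-use analysis if the sleep diary says
# the accelerometer was only taken off during sleep.

def night_usage(worn_during_night, diary_says_nonwear_only_during_sleep):
    """Decide which analyses a recording night may feed into."""
    use_for_sleep = worn_during_night
    use_for_timeuse = worn_during_night or diary_says_nonwear_only_during_sleep
    return {"sleep_analysis": use_for_sleep, "timeuse_analysis": use_for_timeuse}

print(night_usage(worn_during_night=False,
                  diary_says_nonwear_only_during_sleep=True))
# -> {'sleep_analysis': False, 'timeuse_analysis': True}
```

Making such rules explicit in one place keeps the time series extraction consistent across the different downstream analyses.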

Next steps

The next step in the project will be to use these time series as input for various behavioural descriptors. The technique I will mainly focus on is the behavioral fragmentation analysis as most recently implemented in R package ActFrag by Junrui Di and colleagues (https://doi.org/10.1101/182337). The plan is to implement these metrics in GGIR and explore opportunities for improvement.


Data Quality in German National Health Cohort

Accelerometer data in NAKO

The German National Health Cohort (NAKO) has collected high resolution accelerometer data in a large sample of the German population. After visiting the study centre, the study participant wears the accelerometer on their hip for the next seven days.

Data quality assessment

The University of Regensburg, which is one of the 18 study centres, has asked me to help build and implement a data quality assessment pipeline. It is expected that R package GGIR will provide most of the required functionality. In this project I will work closely with Prof. Dr. Leitzmann's group to implement GGIR. Also, I will develop new functionality needed for this specific dataset. Another aspect of this project is to educate project partners about accelerometer data analysis. Further, we will create detailed documentation of the data collection and data quality assessment process.

Status and next steps

As a result of COVID-19 we started the project digitally. Nonetheless, we expect to perform a first pilot analysis in the upcoming months. Data quality and processing decisions can have a major impact on the research done with the data. Therefore, we will seek consensus with other experts in the field.