GGIR release 3.1-5

The GGIR 3.1-5 release is now on CRAN. In this blog post I will talk you through the main updates since the 3.1-0 CRAN release from May 2024. For a full overview, see the GGIR changelog, where # numbers refer to specific issues in the GGIR GitHub repository.

Reverted non-wear detection change from release 3.1-3

In release 3.1-3 I added a new parameter named nonwear_range_threshold. This parameter controls the allowed range in raw acceleration values as part of accelerometer non-wear detection. When adding this parameter, it seemed a good idea to change the default from 150 to 50 mg. However, I soon realised that this was a bad idea, as it actually prevents non-wear detection in some recordings. I have therefore reverted the default to what it has been for years: 150 mg.

Thanks to Michael Reuschman (Boston) for spotting the issue and bringing it to my attention.

Controlling inclusion of incomplete last night in GGIR part 5

Parameter require_complete_lastnight_part5 has been added. It controls whether the last window in GGIR part 5 should be included when the recording ends early. Effectively, this helps us to avoid the risk that recording endings bias the sleep estimates. Parameter require_complete_lastnight_part5 is turned off (FALSE) by default.

As you may remember, GGIR part 5 is not day or night oriented but window oriented. GGIR allows us to define a window in one of the following ways:

  • From waking up to waking up the following day (WW)
  • From sleep onset to sleep onset the following day (OO)
  • From midnight to midnight (MM).

When require_complete_lastnight_part5 is set to TRUE:

  • The last WW window in the recording is excluded if the recording ends between midnight and 3pm and the window starts on a date that is on or one day before the recording end date.
  • The last OO and MM windows are excluded if the recording ends between midnight and 9am and the window starts on a date that is on or one day before the recording end date.
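
To make the two rules above concrete, here is a minimal sketch in Python. It is not GGIR's own R code: the function name, the argument names, and the choice to treat the cut-off times as exclusive upper bounds are assumptions made for illustration only.

```python
from datetime import date, datetime, time

# Clock hour before which a recording end counts as "early", per window type,
# following the two rules described in the text.
CUTOFF_HOUR = {"WW": 15, "OO": 9, "MM": 9}


def exclude_last_window(window_type: str, window_start: date, recording_end: datetime) -> bool:
    """Return True if the last window would be excluded (hypothetical helper)."""
    # Assumption: "between midnight and 3pm/9am" means strictly before the cut-off.
    ends_early = recording_end.time() < time(CUTOFF_HOUR[window_type])
    # The window has to start on the recording end date or the day before it.
    days_before_end = (recording_end.date() - window_start).days
    starts_close_to_end = days_before_end in (0, 1)
    return ends_early and starts_close_to_end


# A recording that ends at 1am: the WW window that started the day before is dropped.
print(exclude_last_window("WW", date(2024, 6, 9), datetime(2024, 6, 10, 1, 0)))
```

With a recording that ends at 10am, the same WW window would still be excluded, whereas an OO or MM window would be kept because 10am falls after the 9am cut-off.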

Motivation

The idea for this parameter originated in a project where the recordings ended around 1am. GGIR is triggered to run the sleep analyses on the last night when it includes the midnight timestamp. In GGIR part 4 output this night is likely to be skipped as it has less than 16 hours of valid data. However, GGIR part 5 still uses the estimates from part4_nightsummary_sleep_full.csv to define the end of the final window. One could argue that this is not trustworthy as the person may have fallen asleep after the end of the recording. Similarly, a recording ending at 9am could be argued to complicate reliable assessment of wake-up time. Additionally, there could be concerns that the algorithms have not been demonstrated to work reliably in nights that are cut short by the recording end.

For now I have set it to FALSE by default considering the negative impact on the number of windows included. However, maybe in the future we can decide to turn it on by default.

Recommended time window of data collection?

It seems inevitable that we always end up with partial days at the beginning and/or end of a recording that are hard to use. Therefore, it seems best to aim for more than 7 days of data collection and wear instruction, even if you are only interested in 7 days of data.

This work was sponsored by the University of Regensburg in Germany and I would like to thank Jairo Migueles for testing the new functionality.

Handling study protocol where accelerometer is not worn at night

GGIR’s sleep detection is by design split into an algorithm that classifies rest and an algorithm or diary that guides the labelling of those rest periods as sleep. For study protocols where participants are asked to take off the accelerometer during the night we now offer the guider “NotWorn”. This guider considers the main non-wear/zero movement period as the SPT window, see documentation for details. Use HASPT.algo = "NotWorn" to set this guider. This thus far experimental algorithm underwent some updates:

  1. Value of parameter HASPT.ignore.invalid is automatically set to NA to ensure that truly all non-wear is forced to be a potential part of the guider.
  2. Fixed an issue that caused the use of guider “NotWorn” not to be logged correctly.
  3. When “NotWorn” is used the corresponding nights are now automatically skipped in the GGIR part 4 csv report for sleep analyses as we know that no meaningful sleep analyses can be derived from these.
  4. Algorithm itself revised to work with both count and raw data.
  5. Option added to specify a second guider next to “NotWorn” via parameter HASPT.algo. This second guider will be used when the accelerometer is unexpectedly worn by the participant for more than 75% of the noon-noon time window. For example, HASPT.algo = c("NotWorn", "HDCZA").
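
The fallback described in the last item can be sketched as follows. This is a Python illustration rather than GGIR's R code, and the function name and the worn_fraction argument are hypothetical.

```python
def choose_guider(guiders: list, worn_fraction: float, threshold: float = 0.75) -> str:
    """Pick the guider for one noon-noon window (hypothetical helper).

    guiders: values as given to HASPT.algo, for example ["NotWorn", "HDCZA"].
    worn_fraction: fraction of the noon-noon window in which the device was worn.
    """
    has_fallback = len(guiders) > 1
    if guiders[0] == "NotWorn" and has_fallback and worn_fraction > threshold:
        # The device was unexpectedly worn for most of the window,
        # so the second guider takes over.
        return guiders[1]
    return guiders[0]


print(choose_guider(["NotWorn", "HDCZA"], worn_fraction=0.9))  # HDCZA
print(choose_guider(["NotWorn", "HDCZA"], worn_fraction=0.5))  # NotWorn
```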

This work was made possible via support from Dr. Verswijveren at Deakin University, Australia, and the University of Regensburg, Germany.

Migration of package vignettes

Various sections from the main package vignette have been migrated to the GGIR github-pages: wadpac.github.io/GGIR/. The new location offers a number of advantages:

  • Easier navigation for the visitor. The documentation is now split up in chapters that can be navigated via tabs and search bar. For example, all existing CRAN vignettes can be found via the Annexes-tab.
  • Easier to maintain as minor updates can be made without the need to create a new package release.

I created most of the new documentation but would like to thank a few people. Gaia Segastin, PhD-student in Amsterdam, provided detailed feedback on the first four chapters. Further, I thank Fabian Schwendinger and Jairo Migueles for proofreading all chapters.

Other updates worth highlighting:

  • External csv data: Timegap imputation option enabled for ad-hoc csv data.
  • Sleep efficiency: GGIR part 5 output variable sleep_efficiency was renamed to sleep_efficiency_after_onset. This was done to better reflect its calculation and to avoid misinterpretation with other definitions of sleep efficiency as used in sleep research.
  • Continuous LXMX window analyses in part 2: LXhr and MXhr had a one-hour offset when the timing was after midnight; this has now been fixed.
  • data_cleaning_file functionality: Fixed a bug that caused the night_part4 column in the data_cleaning_file to be ignored.

GGIR release 3.1-0

In this blog post we will talk you through the main updates in the GGIR 3.1-0 release. For a full list of updates since the 3.0-0 CRAN release, see the GGIR changelog.

Major review and tidy up of GGIR part 1 code

Lena Kushleyeva conducted a thorough review and tidy-up of all code related to GGIR part 1. Small differences can be expected in the part 1 output compared with prior versions. This is due to small improvements in the management of timestamps and input data block boundaries. The influence on calibration coefficients may be larger as a bug was fixed. To account for this we have changed the default value of parameter minloadcrit to 168 hours to improve the quality of the auto-calibration procedure.

We would like to thank Lena Kushleyeva for the enormous amount of effort she has put into this!

Revisions to handling of externally derived epoch data

The code and algorithm have been simplified to better match expected non-wear detection behaviour. Further, the window length used for non-wear detection in externally derived epoch data can now be modified with the third value of parameter windowsizes (in seconds). As a result, this functionality is now consistent with how it is used for raw data. For example, child behaviour researchers may like to set this to 20 minutes, as is common in the literature.

Further, GGIR now also handles the Sensewear xls file format. The Sensewear is an accelerometer brand that no longer exists but for which some research groups still have historical data. To use this, set dataFormat = "sensewear_xls".

The addition of the Sensewear xls format has been made possible with support from researchers at the University of Pittsburgh.

Option to specify study date range per individual

Newly added parameters study_dates_file and study_dateformat allow you to specify, respectively, a csv file with the start and end dates between which the accelerometer is expected to be worn and the corresponding date format. This can be useful for studies where accelerometers were sent to participants by mail.

This work has been made possible with support from Nancy W Glynn from University of Pittsburgh.

Logging of encountered time gaps

Time gaps are automatically accounted for in GGIR and this is now also logged for quality assurance purposes. More specifically, we log the number of timegaps and total time imputed for ad-hoc csv and ActiGraph data.

Keeping track of GGIR version numbers

So far, GGIR has stored its version number in the config.csv file. However, we realised that this is ambiguous when using different GGIR versions for different GGIR parts. You would then only find the GGIR version number for the latest run. To address this the GGIR version is now always stored inside each individual milestone file and in each csv report to allow for tracking the version used at each stage of the process for each individual data file.

Implemented and removed update to HDCZA algorithm

In 3.0-7 (28 February 2024) an experimental change was made to the HDCZA algorithm, which was reversed in sub-release 3.0-10 (19 April 2024). The change was assumed not to cause problems, but we then discovered that it did and reversed it. Therefore, we advise against using GGIR part 3, 4 and 5 output from releases 3.0-7 (28 February 2024), 3.0-8 (5 March 2024), and 3.0-9 (19 March 2024).

GGIR is now more friendly towards users without UK/US computer

  • Parameters dec_reports and dec_config were added to configure the decimal separator that GGIR uses when it stores csv reports and the csv config file. You may, for example, want to combine a “;” column separator with a “,” decimal separator if your computer is configured for France or The Netherlands.
  • Further, we noticed that in some parts of GGIR the naming of days can switch to the local computer language. To address this, GGIR now consistently forces the language used for referring to week days to UK/US English.

These developments have been sponsored by an ERC grant led by Dr. Séverine Sabia, Université de Paris, Inserm in Paris.

Day segment analysis able to handle fraction of minutes

Parameter qwindow can now also handle fraction of minutes, which is primarily useful when qwindow is used with activity diaries.

This work has been made possible with support from Marion Gasser and Elena Mathieu from University of Bern.

Key updates to GGIR part 5

  • The behaviour of parameter includedaycrit.part5 has been changed for values above 1: these are now interpreted as the minimum number of valid hours during the waking hours of a day. If you prefer to keep the old functionality, then divide your old value by 24. The resulting number is the fraction of expected valid data during the waking hours of the day.
  • Automatically stores a dictionary for all variable names in part 5 csv reports.
  • Fragmentation: The computation for fragmentation metric CoV (Coefficient of Variance) was incorrect and has now been fixed.
  • Timing of LX is now expressed on a scale between 12 and 36 to allow for meaningful person level summary of this value.
  • Parameter HASPT.ignore.invalid can now take value NA, which means that invalid periods are automatically considered candidate sleep periods when working with algorithms HDCZA, HorAngle, or NotWorn. Note that the default value for parameter HASPT.ignore.invalid is still FALSE, which means that invalid time periods are imputed at metric level and then used for deriving the guider. This is the approach used in our publications, but the new approach might be preferable. More research is needed before we can make this the new default.
  • We have reverted the original decision to prohibit segmentDAYSPTcrit.part5 to be set to c(0, 0). The default remains unchanged and the documentation for segmentDAYSPTcrit.part5 now emphasizes the downside of using c(0, 0).
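
The two ways of reading includedaycrit.part5 described in the first item can be sketched as below. This is a Python illustration and not GGIR's R code; the function name, the argument names, and the use of "greater than or equal to" in the comparison are assumptions.

```python
def waking_hours_valid(valid_waking_hours: float, waking_hours: float, crit: float) -> bool:
    """Check one day against includedaycrit.part5 (hypothetical helper).

    crit > 1  : minimum number of valid hours during waking hours.
    crit <= 1 : minimum fraction of waking hours that must be valid.
    Assumes waking_hours is greater than zero.
    """
    if crit > 1:
        return valid_waking_hours >= crit
    return valid_waking_hours / waking_hours >= crit


# A day with 16 waking hours of which 10 are valid.
print(waking_hours_valid(10, 16, crit=12))       # False: fewer than 12 valid hours
print(waking_hours_valid(10, 16, crit=12 / 24))  # True: 62.5% valid, at least 50% required
```

The second call shows the conversion suggested in the text: an old value of 12 divided by 24 becomes a required fraction of 0.5.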

The updates to CoV, includedaycrit.part5, and LX have been made possible with support from an ERC grant project led by Dr. Séverine Sabia, Université de Paris, Inserm in Paris. The automated storage of a variable dictionary has been made possible with support from Nancy W Glynn from University of Pittsburgh.

GGIR part 5 inclusion criteria for the entire day

So far, GGIR part 5 only had inclusion criteria for waking hours of the window. This was motivated by the fact that daytime behaviours are the most important aspect of part 5 and need to be representative. However, on top of this you can now also specify inclusion criteria for the entire window. Remember that in part 5 the definition of a window depends on parameter timewindow, which can be OO (sleep onset – sleep onset), WW (waking up – waking up), or MM (midnight to midnight).

To use this new functionality you need to specify parameter includedaycrit with a vector of two values, e.g. `c(16, 16)`. The first value will be used in part 2 and the second value will be used in part 5. If the second value is not specified then it will default to 0 hours, which means that no inclusion criterion is applied.
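
As a small illustration of how the one or two values are split across the two parts, consider the Python sketch below. It is not GGIR's R code and the function name is hypothetical.

```python
def split_includedaycrit(includedaycrit):
    """Return (part 2 criterion, part 5 criterion) in hours (hypothetical helper)."""
    if isinstance(includedaycrit, (list, tuple)):
        values = list(includedaycrit)
    else:
        values = [includedaycrit]
    part2 = values[0]
    # A missing second value defaults to 0 hours: no whole-window criterion in part 5.
    part5 = values[1] if len(values) > 1 else 0
    return part2, part5


print(split_includedaycrit((16, 16)))  # (16, 16)
print(split_includedaycrit(16))        # (16, 0)
```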

This work has been made possible with support from Nancy W Glynn from University of Pittsburgh.

GGIR release 3.0-0

In this blog post we will talk you through the main updates in the GGIR 3.0-0 release. For a full list of updates since the 2.9-0 CRAN release, see the GGIR changelog.

New version of non-wear detection

GGIR’s non-wear detection has been the same since its first release in 2013. It works as follows: The standard deviation of a rolling signal window needs to be close to the noise level of the sensor. If the condition is met, GGIR labels the middle 15 minutes of that 60 minute window as non-wear. You can find a more elaborate description here. This last step of only labelling the middle 15 minutes was intended to make the method conservative in its classification. However, in practice we see that it often leads to missing the first and last 22.5 minutes of a non-wear period.

To address this we have implemented an alternative version of the algorithm to label the entire 60 minute window as non-wear if the threshold condition is met. If you want to keep using the older approach then set argument nonwear_approach to “2013”, but by default it will now be “2023” to label the full window as non-wear.
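
The difference between the two approaches can be illustrated with a minimal Python sketch. It is not GGIR's R code: the function name is hypothetical, and the sketch assumes that the 60 minute window moves in steps of 15 minutes and that we already know which windows met the threshold condition.

```python
WINDOW_MIN = 60.0  # length of the rolling window in minutes
LABEL_MIN = 15.0   # length of the middle section labelled by the "2013" approach


def labelled_intervals(window_starts, approach="2023"):
    """Merge the non-wear labels of all windows that met the threshold condition.

    window_starts: start times in minutes of the windows that met the condition.
    Returns a list of (start, end) tuples in minutes.
    """
    if approach == "2013":
        offset = (WINDOW_MIN - LABEL_MIN) / 2  # 22.5 minutes on either side
        spans = [(s + offset, s + offset + LABEL_MIN) for s in window_starts]
    elif approach == "2023":
        spans = [(float(s), s + WINDOW_MIN) for s in window_starts]
    else:
        raise ValueError("approach must be '2013' or '2023'")
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Touching or overlapping labels are joined into one interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# A 3 hour non-wear period: every window starting between minute 0 and 120 fits inside it.
starts = list(range(0, 121, 15))
print(labelled_intervals(starts, "2013"))  # [(22.5, 157.5)]
print(labelled_intervals(starts, "2023"))  # [(0.0, 180.0)]
```

In this example the “2013” approach misses the first and last 22.5 minutes of the 180 minute period, while the “2023” approach labels all of it.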

Onset-Onset (OO) windows in part 5

As you probably know GGIR part 5 facilitates two definitions of a day window:

  • From midnight to midnight referred to as MM.
  • From waking-up to waking-up, referred to as WW.

This functionality has now been extended with:

  • From sleep onset to sleep onset, referred to as OO.

To use this, specify argument timewindow = “OO”.

The addition of the OO window has been made possible with support from Dr. Séverine Sabia (ERC grant), Université de Paris, Inserm in Paris.

Sleep efficiency metric

For a while now, GGIR part 4 derives the sleep efficiency if time in bed is recorded in a sleep diary. The calculation assumes that the end of diary-based time in bed defines the end of the night. Pieter-Jan Marent pointed out to us that this may be incorrect if the person stays in bed after waking up. Therefore, GGIR now offers a new argument to specify how to deal with this: sleepefficiency.metric = 1 (default and consistent with previous GGIR versions) will calculate sleep efficiency as the total sleep duration during SPT divided by the diary-based duration in bed, while sleepefficiency.metric = 2 (new) will divide the total sleep duration in SPT by the accelerometer-detected duration in sleep period time plus the sleep latency (i.e., difference between diary-derived sleep onset and accelerometer-detected sleep onset).
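
The two definitions can be written out as a short Python sketch. This is an illustration of the formulas in the paragraph above and not GGIR's R code; the function and argument names are hypothetical and all durations are in hours.

```python
def sleep_efficiency(metric, sleep_in_spt, time_in_bed, spt_duration, sleep_latency):
    """Sleep efficiency as a fraction between 0 and 1 (hypothetical helper).

    sleep_in_spt : total sleep duration during the SPT window
    time_in_bed  : diary-based duration in bed
    spt_duration : accelerometer-detected duration of the SPT window
    sleep_latency: diary-derived sleep onset to accelerometer-detected sleep onset
    """
    if metric == 1:
        return sleep_in_spt / time_in_bed
    if metric == 2:
        return sleep_in_spt / (spt_duration + sleep_latency)
    raise ValueError("metric must be 1 or 2")


# Someone who sleeps 6.5 hours, has an SPT window of 7.5 hours, takes half an hour
# to fall asleep, and reports 9 hours in bed because they stay in bed after waking up.
print(round(sleep_efficiency(1, 6.5, 9.0, 7.5, 0.5), 3))  # 0.722
print(round(sleep_efficiency(2, 6.5, 9.0, 7.5, 0.5), 3))  # 0.812
```

The example reflects the situation Pieter-Jan Marent pointed out: time spent in bed after waking up lowers metric 1 but does not affect metric 2.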

The addition of the sleepefficiency.metric option has been supported by Pieter-Jan Marent from KU Leuven.

Readability of GGIR code

With GGIR being developed over many years with ever growing functionality it is tempting for us as developers to only focus on fixing bugs, adding new functionalities, and tidying up the old code we encounter during these efforts. Nevertheless, the fact that some parts of the code function well does not mean they are efficiently written. Lena Kushleyeva has been conducting a thorough review of some of the older parts of GGIR and has been proposing and implementing a number of changes that have improved readability and efficiency. Thank you Lena!

Reading Axivity cwa files

The readAxivity function from the GGIRread package has been revised and is not only much faster now but also conducts an integrity check for each data block. If the integrity check does not pass for a block, which typically equals 2-3 seconds of data, the block is imputed with a constant value. A log is kept of the blocks that are imputed and incorporated in the data_quality_report.csv file. Corresponding new column names are now documented in the GGIR vignette in the section that discusses GGIR results.

These developments have been sponsored by Dr. Séverine Sabia (ERC grant), Université de Paris, Inserm in Paris, and we would like to thank Lena Kushleyeva for her help with optimising the implementation.

Bug fix in processing Actigraph .gt3x files

A bug was discovered in how .gt3x data (ActiGraph) were processed. More specifically, the autocalibration routine was performed but the calibration correction coefficients were not actually applied to the data. Importantly, this has affected all analyses done on gt3x data since the gt3x reading functionality was added to GGIR in version 2.5-4. We are sorry for not spotting this sooner.

Deprecated functionality for reading .wav and old GENEA .bin files

We have deprecated functionality for processing file formats .wav (Axivity Ltd) and .bin (Unilever Discover). Keeping the corresponding code up to date with newer R versions is currently too time consuming. If you are interested in re-inserting this functionality and helping us to make a plan for ongoing maintenance then please contact us. Note that this does not affect processing .cwa (Axivity Ltd) and .bin (ActivInsights Ltd) files.

Day segment analysis in part 5

Argument qwindow that aids day segment analysis in part 2 now also works for part 5.

However, to do this meaningfully it only considers day segments that overlap for 90% with waking hours. This is controlled by argument segmentDAYSPTcrit.part5 which by default is set to c(0.9, 0) to require at least 90% of the window to overlap with daytime. Alternatively, to focus on overlap with SPT window change this to c(0, 0.9). The documentation for argument segmentDAYSPTcrit.part5 provides further clarifications.
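
One way to read the two values of segmentDAYSPTcrit.part5 is as minimum fractions of the segment that have to be classified as daytime and as SPT, respectively. The Python sketch below follows that reading. It is not GGIR's R code, and both the function name and the rule of checking the two minimums together are assumptions.

```python
def keep_segment(day_fraction: float, spt_fraction: float, crit=(0.9, 0.0)) -> bool:
    """Decide whether a day segment is included (hypothetical helper).

    day_fraction: fraction of the segment classified as daytime (waking hours).
    spt_fraction: fraction of the segment classified as SPT.
    crit        : (minimum day fraction, minimum SPT fraction).
    """
    return day_fraction >= crit[0] and spt_fraction >= crit[1]


# A segment that is 95% daytime and 5% SPT.
print(keep_segment(0.95, 0.05))                   # True with the default (0.9, 0)
print(keep_segment(0.95, 0.05, crit=(0.0, 0.9)))  # False when focusing on SPT
```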

This new functionality has been made possible with support from Marion Gasser and Elena Mathieu (University of Bern, Institute for Sports Science).

Intensity gradient over full window in part 5

Intensity gradient is now also calculated for the full window in part 5 and column names now indicate the window (i.e., “_day_” or “_day_spt_”). Subsequently, the functionality has been investigated in a recent publication.

These developments have been sponsored by Dr. Séverine Sabia (ERC grant), Université de Paris, Inserm in Paris.

Inclusion of first day in part 5 midnight-midnight (MM) analysis

When recording starts after midnight and before 4am, GGIR used to skip the wake-up time detection of this first window and classify the full window as awake. Now, parts 3-4 ensure that this first wake-up time is derived to be used in part 5. This first night starting between midnight and 4am is skipped in the part 4 csv reports as it might be biased, yet the wake-up time is used in part 5 to better define the awake and sleep windows.

Identification of column names in advanced sleeplog format is now case insensitive. Further, arguments sleeplogsep and nnights are now deprecated as the separator and number of nights are detected automatically in sleep logs without the need for user input.

Appending accelerometer recordings

Some studies use multiple consecutive accelerometer recordings in order to monitor a person for longer periods of time. However, GGIR did not offer a clear strategy for handling such study designs. To address this, GGIR now facilitates appending recordings with matching ID and within a specified maximum time interval in between recordings. To do this it first processes the individual files in GGIR part 1 and then appends the resulting time series in the milestone data, imputes time gaps or uses the newer recording in case of overlap. For more information see documentation for argument maxRecordingInterval.

This additional functionality has been made possible with support from Dr. Spilsbury at the Dept. of Population & Quantitative Health Sciences within Case School of Medicine, Ohio, US.

Handling of externally derived epoch data

Although GGIR focuses on the processing of raw data, some groups are still in possession of older epoch level data produced by external software or firmware such as Actiwatch and ActiGraph. To support these groups, GGIR is now also able to process these epoch data. This involves the following arguments:

  • dataFormat, which can be set to “actigraph_csv”, “ukbiobank_csv”, “actiwatch_csv”, “actiwatch_awd”
  • extEpochData_timeformat to specify format of the timestamps

For more information see the argument documentation page.

This functionality has been made possible with support from Dr. Mylène Bohmer at Erasmus MC in The Netherlands and additional feedback from Chris King based in Cincinnati, US.

Updated OSS license

Open-Source Software License for GGIR changed to Apache 2.0 as discussed in our previous blog post.

Comma separator handling

  • You can now specify the separator used by GGIR to write csv-report files with argument sep_reports. This is used for reports in part 2, part 4 and part 5, data_quality_report.
  • You can now specify the separator used by GGIR to write the .csv config file with argument sep_config.
  • GGIR further has increased flexibility to read csv files related to sleeplog, activity log, data cleaning file, and csv containing calibration coefficients, by automatically detecting any separator argument from the set of [,\t |;] with function data.table::fread.
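
To give an idea of what automatic separator detection involves, here is a deliberately simplified Python sketch. It is not how data.table::fread works internally, which is more elaborate. The function name is hypothetical, and the rule used here (pick the candidate that occurs equally often on every line, preferring the highest count) is an assumption made for illustration.

```python
CANDIDATES = [",", "\t", " ", "|", ";"]


def guess_separator(text: str):
    """Guess the column separator of csv-like text (hypothetical helper)."""
    lines = [line for line in text.splitlines() if line]
    best, best_count = None, 0
    for sep in CANDIDATES:
        counts = {line.count(sep) for line in lines}
        # A plausible separator occurs the same number of times on every line.
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = sep, count
    return best


sleeplog = "ID;onset;wakeup\n1;23:10;07:05\n2;22:45;06:30\n"
print(repr(guess_separator(sleeplog)))  # ';'
```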

Visual report

Various bugs were fixed in the “visualreport”, which is a pdf file with plots of the data and classification (argument visualreport). Further, a new Boolean argument, visualreport_without_invalid, was added, which allows you to only visualise the days that are considered valid by GGIR part 5.

The work has been sponsored by the Program in Sleep Medicine Epidemiology at Brigham and Women’s Hospital, Boston, United States.

New guidance on how to use count cut-points in GGIR

The vignette on cut-points has been expanded with guidance on how to work with count metrics and corresponding cut-points.

New options to account for study protocol in the masking of data

Argument ‘strategy=3’, that already existed before, looks for the most active continuous window of X days. This window can start at any time in a day. We have now added the option ‘strategy=5’. Strategy 5 facilitates selecting the most active X calendar days. Further, strategy 3 and 5 can now be used in combination with arguments hrs.del.start and hrs.del.end which then ignores the specified number of hours at the start and/or end of the derived window, respectively.

The first half of the work has been made possible with support from Nina Vansweevelt from KU Leuven. The second half of the work has been made possible with support from Nancy W Glynn from University of Pittsburgh.

Planned GGIR documentation enhancements

We are planning to partially migrate and revise the GGIR documentation to a new location https://wadpac.github.io/GGIR/. At the moment this is only an empty framework and the exact URL may change. Advantages of this effort should be:

  • Opportunity to revisit the structure and readability of the explanations currently given in the CRAN vignettes.
  • Easier for the user to navigate.
  • Easier for us to maintain.

The existing CRAN vignettes will not disappear, but parts of it will eventually migrate to this new documentation framework.

To develop this free resource for the community we are looking for volunteers to help us with:

  • Reviewing and providing feedback on the current draft table of contents.
  • Reviewing and providing feedback on chapter drafts.
  • Creating visualisations to support the text.
  • Finding funding to hire an editor and/or graphics artists to help optimise the documentation.

What we expect from you:

  • Experience with GGIR and the functionalities you will help us to document.

Advantages for you:

  • By interacting with us you will test and possibly expand your own GGIR knowledge.
  • Opportunity to learn R markdown or advance your existing R markdown skills.
  • You will be listed as documentation contributor.

Please get in touch if you are interested in helping!

10th Anniversary of GGIR!

Today marks the 10th anniversary of GGIR as R package on CRAN. A lot of time and effort has gone into GGIR over the years, and it is still being used! Therefore, this is a moment to celebrate and to reflect on how we got here.

The road to GGIR’s first release

GGIR did not start out of nothing in 2013. In some way it started back in 2003 when I was a Bachelor student in Human Kinetic Technology. At the end of that year, I gained my first experience with raw data accelerometry via my internship and later job at the wearable technology company McRoberts B.V. in The Hague. It was an inspiring time in which I learnt a lot about accelerometers and their data, experiences that would prove valuable for the work on GGIR.

In 2008 I moved to the UK to do my PhD within the MRC Epidemiology Unit at the University of Cambridge. My PhD was themed around the question: How to process the raw data from wrist-worn accelerometers for large scale population research? At that time the ambition of all involved was twofold:

  • To convince UK Biobank, a large-scale biomedical database and research resource in the United Kingdom that had recently started, to also ask a subsample of its 500k study volunteers to wear an accelerometer on their wrist that stores raw data. By ‘raw’ we mean minimally pre-processed. The challenge in making this argument was that both the attachment at the wrist and the collection of raw data in population research were unprecedented.
  • To develop algorithms needed for making sense of this new data type.

In the following years, we (I, my supervisor Søren Brage, and collaborators) established that movement indicators derived from the wrist accelerometer data are acceptably correlated with daily energy expenditure (doubly labelled water study), and that the method had promising feasibility. Further, we explored how best to summarise these data into a single indicator of human body acceleration, the so-called acceleration metrics. Additionally, we did four other studies which I will skip here as they are less directly related to GGIR.

All seemed on track in terms of algorithm development, but as a result of the financial crisis the original accelerometer manufacturer pulled out and a new manufacturer had to be found. A UK Biobank accelerometry working group was formed to come up with a solution and consisted of Mike Catt, Søren Brage, Marcelo Pias, Salman Taherian and myself. As part of this group I helped draft the requirements on the accelerometer. Patrick Olivier’s team in Newcastle later developed the Axivity (AX3) accelerometer that was used by UK Biobank.

However, at the time the original manufacturer pulled out, it complicated things for me as I was planning for the analysis of data that may never be widely collected. Luckily by the end of my PhD, alternative datasets emerged. The Pelotas birth cohorts in Brazil and the Whitehall study II in the UK had collected wrist-worn GENEActiv accelerometer (raw) data with a summed sample size of 13000. These groups welcomed me to lead on the development of a data processing pipeline for this novel raw data type. So, in 2012 I started putting together pieces of code from my earlier projects to form the pipeline, which at the time consisted of just two (long) R scripts written by me and a dependency on ActivInsights Ltd’s R package GENEAread that had been written by Joss Langford and Zhou Fang for reading the binary data files.

At that time nobody in my environment was familiar with best practices around Open Science, so the story could have ended there with a written description of my code in a paper and some R scripts on a university website. However, luckily, I met Cassim Ladha, a movement sensor researcher at the Open Movement Lab within Newcastle University (UK) at the time, who encouraged me to turn my R code into an Open Source R package. I did it and R package GGIR was born on the 8th of August 2013. In case you are interested, the name ‘GGIR’ was not inspired by the double ‘g’ in the name of the ggplot2 package as explained here.

GGIR in the beginning

GGIR was the first open-source licensed pipeline for processing multi-day raw data from wearable accelerometers. The experience with the two larger datasets as mentioned above as well as numerous smaller datasets in the Newcastle area helped to identify challenging use cases, which in turn fuelled GGIR’s development. Additionally, Séverine Sabia from the Whitehall study II provided tremendous value with her feedback on various aspects of the software. In the following years GGIR would act as a blueprint for a number of other software projects. For example, most of GGIR’s algorithms and logic were used by a second UK Biobank accelerometry working group for their Java/Python pipeline to process the UK Biobank accelerometer data. Meanwhile, I worked in Newcastle on a first sleep detection algorithm specifically tailored to the strengths of modern raw data collection, which would further boost the value of GGIR in the years to come.

Leaving the traditional academic career path and trying to keep GGIR alive

During those early years of GGIR (2013-2015) I became aware that if I wanted to focus on GGIR as a generic research tool to support the research community, a traditional academic career path would not be intuitive: In my perception the fields of physical activity and sleep research celebrated closed over open source software and societal impact, e.g. clinical technology or health guidelines, over efforts aimed at enhancing research methods such as research software. Also it did not feel logical to me that I was expected to go through the time consuming effort of asking for funding for my time to help other researchers via GGIR. It would seem much more intuitive if those who benefitted from my efforts asked the funders for payment for my time, because that would be the ultimate demonstration of the value of my work. Similarly, I considered it more valuable to see researchers publish work based on GGIR without my co-authorship as proof that I developed something they could use without my support. Instead the academic reward system seemed to encourage an attitude of making other researchers depend on my support and co-authorship.

So, I quit my job and started as Research Software Engineer at the Netherlands eScience Center. The eScience Center is a non-profit organisation that aims to advance science by sharing expertise around advanced digital technologies. Being surrounded by many talented engineers, the job boosted my software engineering and data science skills. Additionally, the work in this new place taught me a lot about how to make open software sustainable (i.e., software that keeps working with minimal maintenance effort), software licensing, and how to manage research software projects.

At this point in time the future of GGIR was a big question mark as my time was now fully committed to the clients of the eScience Center who worked in other academic fields with no link to accelerometry. If I wanted to work on GGIR I had to do it unpaid in my spare time, which was not realistic because:

  • GGIR is a generic pipeline for a variety of data formats and research scenarios, each of which requires user support and ongoing code maintenance.
  • There was still a lot of work to be done on the functionality. I had focussed for many years on the algorithms, but putting those algorithms effectively together in a user-friendly package is a different story.
  • The typical GGIR users were not skilled in programming, so every bug fix or code enhancement had to be done by someone who understood the code and was able to change it.

Therefore, the most logical next step was to look for ways to get GGIR stakeholders to pay my employer in exchange for my time during office hours. This worked out well and allowed me to attract several projects. One of these projects was with the Diabetes Genetics group at the University of Exeter (UK). The task was to develop an algorithm for detecting the sleep period time windows in the UK Biobank accelerometer dataset without access to a traditional sleep diary. This project was very successful and led to several prominent publications, including the first ever genome-wide association study on a broad set of device-based estimates of sleep. A major strength of our project was also that we published the methodological work separately from the applied work, so that each was scrutinised in peer review by experts in the respective field.

Projects like the project with the University of Exeter were valuable to keep GGIR moving forward but did not offer the flexibility needed to sustain it long term. Also, doing this while employed meant that I had to get approval for every new project from the Center’s managers as the new project had to align with the interests and time budgets of the Center.

Transition to a sustainable framework

The year 2017 was a year of transformation. All the preceding years I had worked alone on the GGIR code. Every software engineer knows that working as a one-person team on code is a major risk. Therefore, I was very happy that Jairo H. Migueles reached out to me in 2017 to help maintain and develop GGIR. Over the years Jairo has been a much valued contributor. In the years that followed also others would make contributions via GitHub (e.g. Matthew Patterson, Taren Sanders, and Evgeny Mirkes).

The second significant thing that happened that year started at the social part of an expert group meeting in Santa Cruz, California, organised and sponsored by the Bill & Melinda Gates Foundation. The event was aimed at discussing the global eradication of malaria. My partner was actually invited and I was only there because I needed a holiday. While mingling, I noticed that several attendees had no regular jobs but were self-employed consultants. This made me realise that this could be a way to sustain GGIR. As soon as I arrived home, I asked my employer (still the eScience Center) whether they would allow me to change my contract from full-time to part-time in order to explore this idea. They supported me and it went so well that in 2019 I made the full transition to freelance work.

Sustaining GGIR via freelance work

In the following years GGIR improved a lot. The list of enhancements is long and you can find specific blog posts on those in my updates. Some of the income I earnt I invested back in the quality of GGIR:

  • I revised how the input parameters are internally managed and checked, which has become a lot tidier.
  • Jairo and I improved the GGIR training course that I had been giving and we turned it into a full training service.
  • I involved Patrick Bos, a former colleague at the eScience Center and freelance data scientist, to advance the file processing functionalities: (1) we enhanced R package read.gt3x to read ActiGraph gt3x files in blocks to ease computational memory management when loaded by GGIR; (2) we wrote a new function to read in binary files from GENEActiv twice as fast as before; and (3) the speed of reading Axivity .cwa files has by now improved eight-fold. While doing all this I moved the key file reading functions from GGIR to a new R package named GGIRread. Recently, Lena Kushleyeva has also helped to make some further improvements to this, which will be part of the upcoming release.

How you can support GGIR

My transition to freelance work has been effective but is not the magical solution to all challenges. For example, I cannot work more than full time and even though I can count on Jairo and Patrick to occasionally help me, this would not be enough to act as a 24/7 service to an entire research community. There are a couple of things you can do to help:

  1. Report the issues you encounter and provide a detailed description, preferably with a reproducible example. For this we have a google group. If you are more familiar with the R code and GitHub you can report your issue in the issue tracker instead. We may not have time to address the issue right away but publicly sharing knowledge about the issue is essential for an Open Source community to work. Unfortunately, I have encountered groups who dedicated an entire year of work to secretly developing their own custom enhancement to GGIR without telling me. By the time they had finished their work they discovered that in the meantime I had made the same enhancement, but more generically applicable, inside GGIR.
  2. Even if you do not encounter issues yourself, register with the GGIR google group and try to help other GGIR users with their questions or problems. Helping them to clearly formulate their question or confirming that the issue is real can already be of value to the community.
  3. If you require enhancements to GGIR, consider hiring me, or someone else, to help out. Also, you may find it practical to reserve some budget on your grant applications to pay for your local staff to help test and improve GGIR.
  4. For those of you who do methodological studies to evaluate a GGIR functionality: Remember to acknowledge that you are most likely not evaluating GGIR as a whole but a specific GGIR use-case for a specific GGIR release version, in a specific population, with a specific accelerometer brand, and specific evaluation criteria. The package itself has many use-cases and we constantly try to improve GGIR via new releases. Secondly, it is not helpful for software or algorithm developers in general to be told that one method is different from the other. The only way to help us is by trying to understand why methods differ: When and where do differences occur? Do they go away if you configure GGIR differently? Is there anything else that may explain the difference? If you want to contribute to improving software and/or algorithms then it is essential that you try to answer these questions. Quite often it turns out that the problem is not with GGIR but with the expectations users have about GGIR. Keep this in mind: You are not only ‘validating’ a research method but also your own understanding of and expectations about the method. For example, if you believe that a wrist-worn sensor can capture leg movement then it is not the accelerometer or the software that will disappoint you but your own invalid understanding and expectations.
  5. Acknowledge the use of GGIR in your publications with a citation, not only in your first publication but in every publication that follows. This provides recognition for the value of GGIR.

Final word

It has been an amazing journey and I hope to be able to continue to help advance movement and sleep research via GGIR for many years to come! A big thank you to all code contributors, users who reported bugs and/or acknowledged the use of GGIR in their publications, those who paid for my time to work on GGIR, the countless study participants without whom there would be no data to process, the creators and maintainers of all the packages on which GGIR depends (Tuomo Nieminen, Junrui Di, John Muschelli, Dirk Eddelbuettel, Matt Dowle, and many others), the r-package-devel email list for being such a supportive community, and last but not least the CRAN team for building and maintaining the amazing R ecosystem!

Vincent

GGIR adopts Apache 2.0 licence

Since its inception in 2012/2013, R package GGIR has been available with an LGPL software license. The LGPL license has allowed for re-use of GGIR as a software library in both commercial and academic settings. The move to the Apache 2.0 license will make the GGIR source code even more reusable.

Why do we need an Open Source Software license?

An Open Source Software (OSS) license protects contributors and users as it clarifies the conditions under which software is shared. Without a license, there is no permission to use the software.

Why a standard Open Source Software license?

The use of standard OSS licenses is strongly encouraged by the Open Source community as it eases re-use of code. A full list of approved licenses can be found here. There are two main standard license categories:

  1. Permissive licenses like MIT, BSD and Apache allow for a high level of freedom to re-use code.
  2. Copyleft licenses like GPL and LGPL are more restrictive about combining code with external code. For example, you cannot re-use copyleft licensed code in closed-source software or inside permissively licensed software.

Why enable re-use of parts of the code?

Open Source Software (OSS) requires an ongoing maintenance and user support effort. If we developed similar software tools in every new project, it would become impossible to maintain all those tools. Unfortunately, this is a common scenario in research projects, where software is typically specific to a project, e.g. a PhD studentship, and no longer maintained after the project ends. When software is not effectively re-used and maintained, time and funding are wasted because someone else will have to develop similar functionality for the next project. Therefore, the possibility to combine and re-use existing software code is critical for the sustainability of OSS. A permissive license such as Apache 2.0 poses the fewest obstacles to this.

Why not restrict the license to academic use only?

Some may argue that all code developed in or for academia should be restricted to academic use only, via a custom academic-use-only license.

To me an academic-use only license seems inefficient and unfair for use with GGIR:

  1. More users means more people to identify, report and help fix problems. The more restricted the user community, the fewer opportunities there are to sustain the software as a community.
  2. A custom academic-use-only license would introduce an unlevel playing field in relation to permissively licensed OSS projects, as the former can benefit from the latter but not the other way around.
  3. Many Open Source software tools are developed and maintained with the help of for-profit businesses. For example, my own PhD studentship in Cambridge (2008-2012) was co-funded by industry and resulted in various algorithms and insights that have benefited the research community via their implementation in the GGIR package. Further, my commercial freelance work allows me to maintain and keep improving GGIR.
  4. Industry has played an enormous role in funding and sustaining the transition to raw data accelerometry in research. It would seem unfair to allow academic groups to benefit from all that work but not to invest back in the very same OSS ecosystem co-sustained by industry.
  5. Industry also pays the taxes from which most academic research is funded, so it is unclear why industry should not be allowed to benefit from those tax payments. Open Source research software is like the public roads: Paid for collectively and used collectively.

Why this blog post?

The source code for several large software projects in the physical activity and sleep research community is not shared with a standard OSS license or not shared at all. As a result, GGIR’s license is not self-evident in this field of work. With this blog post I hope to create awareness about the value the GGIR license choice offers. This value already existed with the LGPL license but has become even larger with the Apache 2.0 license.

To change a software license, one needs consent from the intellectual property (IP) holders of the software. GGIR has benefited from numerous IP holders over the years. In recent months I emailed all major IP holders for their permission to change the license. The IP holders that replied were happy for me to implement the license update. So, I decided to go ahead with it. Nevertheless, by publishing this blog post I would like to draw attention to the new license and invite any IP holder with questions or concerns to contact me. The license change is effective as of GGIR release 2.9-4 (available on GitHub only), and will be part of the 2.10-0 CRAN release later this summer.

GGIR release 2.9-0

In this blog post we will talk you through the main updates in the new GGIR 2.9-0 release. For a full list of updates since the previous CRAN release (version 2.8-2), see the GGIR changelog (https://cran.r-project.org/web/packages/GGIR/news.html).

Dependency on R package GENEAread deprecated

In the previous GGIR update we introduced R package GGIRread as a faster way to read GENEActiv binary (.bin) files within GGIR. As this seems to be working well we deprecated the dependency on R package GENEAread. R package GENEAread has been of high value for the development and success of GGIR. We would like to thank the GENEAread developers Joss Langford, Charles Sweetland and Zhou Fang for creating and maintaining GENEAread all those years.

Metric “brondcounts” replaced by “neishabouricounts”

We deprecated the “brondcounts” metric. This count metric was originally developed by Jan Brønd and implemented in R package activityCounts by Ruben Brondeel and colleagues. Therefore, we called it brondcounts in GGIR. However, we had to deprecate it in GGIR as R package activityCounts was removed from CRAN because of a series of issues. Instead GGIR now uses the counts proposed by Neishabouri, which we refer to in GGIR as the neishabouricounts. GGIR relies on R package actilifecounts for the calculation of the neishabouricounts. To tell GGIR to extract these counts you should use argument do.neishabouricounts = TRUE. Further, argument acc.metric is used to specify which axis is of interest, e.g. acc.metric = "NeishabouriCount_x". The low-frequency extension of the algorithm can be activated by Boolean argument actilife_LFE (see documentation on argument do.neishabouricounts for details).

Ad-hoc formatted csv input files: Now possible to account for tz of device configuration

Ad-hoc csv files can be read by GGIR via the generic function read.myacc.csv. This already allowed for accounting for the timezone where the accelerometer is worn. This function has been expanded so that the user can now define the timezone in which the accelerometer was configured (argument rmc.configtz). Note that this was already available for various other data formats, but not yet for ad-hoc csv data. Note that argument configtz (without the “rmc.”) is currently not used for this data format. Our plan for later this year is to deprecate rmc.configtz and rmc.desiredtz and make configtz and desiredtz be the only timezone arguments used across all data formats.

This work has been sponsored by a pharmaceutical company based in the United States.

Non-wear detection algorithm: Now possible to skip last step

The last step of the non-wear detection algorithm entails labelling the first or last three hours of the time series as invalid if they contain any non-wear episode. We have added the Boolean argument nonWearEdgeCorrection (default TRUE). When set to FALSE this step of the algorithm is skipped. Skipping the last step can be important when working with short recordings, such as laboratory experiments.

This work has been sponsored by Dr. Beets and the Arnold Childhood Obesity Initiative Research Group at the University of South Carolina.

Bout detection algorithm

We identified and fixed a couple of minor bugs in the existing code for bout detection:

  • GGIR overestimated the bout length by 1 epoch if the epoch length was less than 30 seconds.
  • The combination of bout length of 1 minute and epoch length of 1 minute caused bouts of 1 minute to be missed.

While addressing these bugs we deprecated input argument bout.metric that was used to select older versions of the algorithm. We have done this because we believe that the continued re-use of the older versions of the algorithm should be discouraged. To reproduce analysis from the past we recommend using older versions of GGIR. Subsequently argument closedbout has also been deprecated as this was only used in one of the older algorithm versions.

Handling of seconds in activity diary

Day segment analyses based on an activity diary used to only consider the hours and minutes of the time specified. This has been updated to also consider the seconds in the timestamps. For example, you can now define an interval from 10:30:25 to 11:15:40. If a timestamp is not a multiple of the epoch length, GGIR uses the start of the following epoch. For example, if the epoch length is 5 seconds and the user defines 10:30:27 in the diary, then 10:30:30 is used.
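The rounding rule described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not GGIR's own code; the function name next_epoch_boundary is hypothetical, and the sketch assumes that epochs are counted from midnight.

```python
from datetime import datetime, timedelta

def next_epoch_boundary(timestamp: datetime, epoch_seconds: int) -> datetime:
    """Move a diary timestamp forward to the next multiple of the epoch length.

    Assumption for this sketch: epochs are aligned to midnight of the same day.
    """
    midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((timestamp - midnight).total_seconds())
    remainder = elapsed % epoch_seconds
    if remainder == 0:
        # Already on an epoch boundary: keep the timestamp as it is.
        return midnight + timedelta(seconds=elapsed)
    return midnight + timedelta(seconds=elapsed + epoch_seconds - remainder)

aligned = next_epoch_boundary(datetime(2023, 3, 1, 10, 30, 27), 5)
print(aligned.time())  # 10:30:30
```

A timestamp that already falls on an epoch boundary, such as 10:30:25 with a 5 second epoch, is returned unchanged.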

Improved handling of DST in part 2

Previously GGIR part 2 considered daylight saving time (DST) when extracting the start time of the recording, when deriving the number of (valid) hours in a recording day, and when splitting up the recordings in time series per day. However, within the analysis per day, GGIR did not account for DST. We have now addressed this: The day-level analyses in part 2 now standardise the number of hours in a day to create a fair comparison between days:

  • When the day is 23 hours, we insert one hour of the average day at 2am
  • When a day is 25 hours, we remove the double hour between 2am and 3am

As a result, this may affect day-specific analysis of MXLX (e.g. M5, M10, L5, and L10), quantiles (argument qlevels), intensity distribution (argument ilevels), intensity gradient (argument iglevels), and time spent in MVPA for the two days in the year in timezones where DST applies.

This work was sponsored by the University of Regensburg in Germany.

Including last incomplete day in part 5 (follow-up)

In our previous CRAN release 2.8-2 we introduced argument expand_tail_max_hours. However, after additional reflection we realised that this was not intuitive. Therefore, we have now replaced it by argument recordingEndSleepHour, being the time (in hours) at which the researcher assumes the participant to be asleep. If the recording ends at or after this time, GGIR part 1 will expand the time series with synthetic sleep-like data in order to trigger sleep detection for the last night. As previously discussed, this functionality should be used with caution. An error is produced if the value is set below 19 (7pm) as such values are not permitted.
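As a rough illustration of the decision rule described above, here is a minimal Python sketch. It is not taken from the GGIR source code; the function name should_expand_tail and the representation of clock times as decimal hours are assumptions made for this example only.

```python
def should_expand_tail(recording_end_hour: float, recording_end_sleep_hour: float) -> bool:
    """Decide whether synthetic sleep-like data would be appended.

    recording_end_hour: clock time at which the recording ends, in hours (e.g. 21.5 for 9:30pm).
    recording_end_sleep_hour: hypothetical stand-in for argument recordingEndSleepHour.
    """
    if recording_end_sleep_hour < 19:
        # Mirrors the documented restriction: values before 7pm are not permitted.
        raise ValueError("recordingEndSleepHour must be 19 (7pm) or later")
    # Expansion is triggered when the recording ends at or after the assumed sleep hour.
    return recording_end_hour >= recording_end_sleep_hour

print(should_expand_tail(21.5, 21))  # True: recording ends after 9pm
print(should_expand_tail(20.0, 21))  # False: recording ends too early
```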

The work on including the last day has been sponsored by the University Medicine Greifswald, German Centre for Cardiovascular Research (DZHK), site Greifswald, Germany.

Improved determination of file directories in Windows OS

GGIR is now able to handle double backslashes in file paths when determining directories and file paths, i.e., in the datadir, outputdir, qwindow, loglocation and data_cleaning_file arguments. This is primarily useful if you work with interactive directory selection via utils::choose.dir or utils::choose.files on a Windows OS.

Handling subfolders and skipped recordings

GGIR had difficulty keeping track of the relation between filename and ID when recordings were skipped in parts of GGIR and when the original accelerometer data was stored across multiple subfolders of the data directory (argument datadir). Similarly, the code for the visual report had difficulty matching the right filename to the report when files were skipped in earlier steps of GGIR. These issues have now been addressed.

The work has been sponsored by the Program in Sleep Medicine Epidemiology at Brigham and Women’s Hospital, Boston, United States.

csv-reports

  • GGIR now skips the creation of the csv and visual reports if the required milestone data is not available.
  • When GGIR part 5 reports exclude the first and last night (based on argument excludefirstlast.part5), the code now considers the window type selected (MM or WW).
  • The output variable nonwear_perc_spt now only appears in csv-reports part4_nightsummary and part4_summary when WW windows are used in GGIR part 5. We do this to avoid inconsistencies in the definition of the night numbers, and this is now also explained in the main vignette.

Other brief updates

  • Verbose argument: User can now specify verbose = FALSE to simplify the messages printed to the console to warnings and errors only. (UPDATE: we have noticed that this currently fails to suppress some of the messages when do.parallel is set to FALSE. We aim to address this before the next release).
  • Speed increase in GGIR part 5: GGIR part 5 speed has been increased for long multi-week recordings.
  • Documentation MX metrics by Rowlands et al.: New section added in the main vignette regarding the MX-metrics. We would like to thank Alex Rowlands and Ben Maylor for providing this!
  • Sleep diary handling: A bug was fixed causing the last night in the sleeplog to be ignored in the advanced sleeplog. Further, argument sleeplogidnum has been deprecated as GGIR should now automatically recognise whether the ID in a sleeplog is in numeric or character format.

GGIR release 2.8-0

In this blog post I will talk you through the main updates in the new GGIR 2.8-0 release. For a full list of updates since the previous CRAN release (version 2.7-1), see the GGIR changelog.

Time gap handling in ActiGraph .gt3x data format and ad-hoc csv-file formats

As discussed in a previous GGIR update, ActiGraph .gt3x data can come with time gaps as a result of the idle sleep mode mechanism. The handling of these time gaps has now been improved. In addition to various bug fixes, time gaps lasting more than 90 minutes are now dealt with at epoch (metric) level rather than at raw data level to improve memory management.

A special thanks to Dr. Jairo Hidalgo Migueles for his help with getting this to work.

Speed increase GENEActiv .bin data

A new function has been written to read GENEActiv binary data approximately two times faster than before. This function has been included in a new R package named GGIRread, which is called from within GGIR. To facilitate direct comparison between new and old functionality I added the GGIR argument loadGENEActiv, which when set to “GENEAread” will use the R package GENEAread, and when set to “GGIRread” (default) will use GGIRread. Eventually this option will be deprecated, so if you have GENEActiv data then please help monitor that both options provide consistent output.

This update has been made possible with help from Dr. Patrick Bos, freelance data scientist.

Speed increase in estimation of orientation angles

The calculation of the rolling median on the raw time series has been one of the most time-intensive steps of GGIR part 1. I have now modified this calculation to only extract the rolling median from a 10 Hertz version of the signal. My justification for this change is that angle information is only reliable during non-movement conditions with movement frequencies near zero, so frequencies above 5 Hertz are unlikely to make a meaningful difference anyway.

Improved facilitation of multi-timezone studies

For the data types GENEActiv .bin and ActiGraph .gt3x, GGIR now also facilitates specifying the timezone where the accelerometer was configured in addition to where it was worn. Note that this was already possible for Axivity .cwa before. We have not yet looked into facilitating this for other brands or data formats. Please contact us if you think this would be of value and/or if you see opportunities to fund such a development.

Vignettes

  1. The main GGIR package vignette has had a section with guidance on the use of physical activity intensity cut-points. This section has now been expanded and migrated to a new vignette including look up tables, see Published cut-points and how to use them in GGIR.
  2. The documentation for all GGIR input arguments has been expanded with default values as centrally stored in the load_params function. Further, the arguments are now also displayed in a new vignette page named GGIR configuration parameters (html) in addition to the default GGIR package reference manual (pdf). Rather than copy-pasting the documentation, GGIR automatically generates the documentation to ensure it is consistent between html and pdf.

This update has been made possible thanks to a lot of hard work from Dr. Jairo Hidalgo Migueles.

Cole-Kripke algorithm

The Cole-Kripke sleep detection has been implemented and vignette text on sleep analysis has been expanded accordingly.

This new functionality has been developed in collaboration with Prof. Dr. Uher’s group at the Department of Psychiatry, Dalhousie University, Halifax, Canada.

GGIR part 2 pdf

The pdf visualization in the results/QC output folder has been part of GGIR since the early days but has undergone relatively few updates over the years. The visualization has now been enhanced with a legend and improvements to the layout.

Improved facilitation of longitudinal studies on sleep

In the sleep assessment (part 4), person-level aggregation now takes place per filename instead of per recording ID, which should prevent identical IDs in repeated recordings in the same study from being treated as a single recording. Also, ID matching between the accelerometer file name and the sleep log ID column has been improved by ignoring space characters. Here, the assumption is that a space is never a defining character in a participant ID.
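The idea of space-insensitive ID matching can be sketched as follows. This is an illustrative Python example rather than GGIR's implementation; the helper names normalise_id and match_sleeplog_row and the example IDs are made up.

```python
from typing import Optional, Sequence

def normalise_id(raw_id: str) -> str:
    """Remove space characters, assuming a space never defines a participant ID."""
    return raw_id.replace(" ", "")

def match_sleeplog_row(file_id: str, sleeplog_ids: Sequence[str]) -> Optional[int]:
    """Return the index of the first sleep log ID that matches, or None if there is no match."""
    target = normalise_id(file_id)
    for index, candidate in enumerate(sleeplog_ids):
        if normalise_id(candidate) == target:
            return index
    return None

# Hypothetical IDs: "P 012" in the sleep log should match "P012" from the file name.
print(match_sleeplog_row("P012", ["P 011", "P 012", "P013"]))  # 1
```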

GGIR part 5 can now include last incomplete day (use with caution!)

You may have noticed that GGIR does not include the last recording day in the part5 output if the recording ends before midnight. This is done because lack of clarity about sleep onset makes it impossible to say anything meaningful about the balance between waking and sleeping hours.

However, there is one scenario where you may argue that this is too stringent: Studies based in populations assumed to fall asleep before or shortly after the end of the recording, such as, perhaps, children. To facilitate this specific scenario it is now possible to tell GGIR to add artificial data to the end of the recording in order to trigger a dummy sleep analysis for the last non-existing night of data and by that generate part 5 output on time-use. You can do this with argument expand_tail_max_hours. For example, setting expand_tail_max_hours = 3 will cause the last day of recordings that end after 9pm to be included if the rest of the day meets the non-wear criteria. Note that the synthetic sleep estimates for the last day are skipped in the part 4 and part 5 reports. All GGIR relies on is the user's belief that the participant fell asleep before or shortly after the end of the recording.

Needless to say, this argument should be used with great caution as it can easily bias your results.

The addition of this feature was sponsored by University Medicine Greifswald, German Centre for Cardiovascular Research (DZHK), site Greifswald, Germany.

Movisens

GGIR now also extracts recording ID from Movisens data, and can handle Movisens temperature values expressed in Fahrenheit.

Activity diary

  • The user is now warned when the date format is inconsistent across the diary.
  • Missing values in the activity diary are now skipped, so that non-neighboring cells with valid data can form new day segments. So, for example, it is now possible to segment the day for one person in three and for another person in five.
  • Activity diary used to treat empty cells as missing values, but now also does that for cells with only a dot in them.
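To make the handling of missing diary cells more concrete, here is a small Python sketch. It is not GGIR code: the helper names are hypothetical, and it assumes that day segments run between consecutive valid timestamps within one diary row.

```python
from typing import List, Optional, Sequence, Tuple

def is_missing(cell: Optional[str]) -> bool:
    """Treat empty cells and cells holding only a dot as missing values."""
    return cell is None or cell.strip() in ("", ".")

def day_segments(diary_row: Sequence[Optional[str]]) -> List[Tuple[str, str]]:
    """Pair up consecutive valid timestamps, skipping missing cells in between."""
    valid = [cell.strip() for cell in diary_row if not is_missing(cell)]
    return list(zip(valid[:-1], valid[1:]))

# Hypothetical diary row with one empty cell and one dot-only cell.
row = ["08:00:00", "", "12:30:00", ".", "17:45:00"]
print(day_segments(row))
# [('08:00:00', '12:30:00'), ('12:30:00', '17:45:00')]
```

In this sketch a participant with more valid cells simply ends up with more segments, which is how one person's day can be split in three and another's in five.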

This work has been sponsored by Dr. Beets and the Arnold Childhood Obesity Initiative Research Group at the University of South Carolina.

OSx arm64

Fixed timezone sensitivity for OSx in Pacific time zone, which prohibited the CRAN release 2.7-0 and 2.7-1 for this architecture.

This bug was fixed with help from Dr. Taren Sanders based at the Australian Catholic University.

Version out-of-date notification

When loading GGIR with library(GGIR) you will receive an R console notification if your GGIR version is behind the CRAN version. In that way I hope to encourage users to use the latest version whenever possible. I understand that this is not always desirable, which is why it is only a notification that you are free to ignore.
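The comparison behind such a notification can be sketched as follows. This Python example only illustrates how version strings in the GGIR style (for example 2.7-1) can be compared numerically; it is not how GGIR itself performs the check, and the function names are hypothetical.

```python
import re
from typing import Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """Split a version such as '2.10-0' on dots and hyphens into integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version))

def is_outdated(installed: str, on_cran: str) -> bool:
    """True if the installed version is behind the version available on CRAN."""
    return parse_version(installed) < parse_version(on_cran)

print(is_outdated("2.7-1", "2.8-0"))   # True
print(is_outdated("2.10-0", "2.9-4"))  # False: 10 is compared as a number, not as text
```

Comparing the parts as integers matters here, because a plain text comparison would rank 2.10-0 below 2.9-4.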

GGIR release 2.7-1

In this blog post I will talk you through the main updates in the new GGIR 2.7-1 release. For a full list of updates since the previous CRAN release (version 2.6-0), see the GGIR changelog.

File handling

  • Functionality for reading GENEA (Unilever Discover) binary files has been temporarily deprecated because one of its software dependencies has been temporarily offline. This has recently been resolved and we will restore the GENEA reading functionality in the next GGIR release on CRAN (probably in August/September). This does not affect GENEActiv (ActivInsight) data because that data is stored in a different binary data format.
  • Functionality for reading GENEActiv .csv files has been deprecated. You can still use GGIR’s read.myacc.csv functionality for reading the GENEActiv .csv files, and GGIR default settings for reading GENEActiv .bin files.
  • Functionality to read ActiGraph CentrePoint .GT3X files added.
  • Fixed bug that prohibited processing files in subfolders of the data directory. As a result, you should now again be able to process multiple sub-studies with a single GGIR command using only the parent folder to specify argument datadir.
  • Fixed bug relating to how the date format of an activity diary is interpreted.
  • A new vignette was added describing how to use GGIR function read.myacc.csv to process your ad-hoc .csv format data.
  • All GGIR input arguments are now documented as part of the GGIR function documentation. Type ?GGIR on the R(Studio) command line and use the search field or scroll through the documentation.

Function g.shell.GGIR() simplified to GGIR()

Given that g.shell.GGIR() was difficult to pronounce we simplified it to just GGIR() as now clarified in the vignette. The underlying code is identical. Note that function g.shell.GGIR should still work, but please let us know if you have difficulties.

Cosinor analysis

Cosinor analysis entails the fitting of a cosine curve to the accelerometer data to estimate basic characteristics of the circadian rhythm. To enable this with GGIR, a dependency has been added to R package ActCR as developed by Dr. Junrui Di and colleagues. By specifying GGIR argument cosinor = TRUE, GGIR will now conduct both cosinor and extended cosinor analysis (as proposed by Marler et al. 2006) as part of GGIR part 2. For specific implementation details see our new vignette paragraph on Circadian Rhythm analysis with GGIR.

The addition of the cosinor and extended cosinor analysis has been sponsored by Dr. Séverine Sabia (ANR grant), Université de Paris, Inserm in Paris. Also we would like to thank Dr. Junrui Di for making a new ActCR release to facilitate the integration with GGIR.

Advanced sleeplog

The advanced sleeplog was introduced in 2021 and allows for more information than the basic sleep log. Various updates and bug fixes were made in relation to the advanced sleeplog handling:

  • Better handling of dates in the sleeplog far beyond the accelerometer recording dates.
  • Unnecessary warnings omitted to make console output more informative.
  • Bug fixed when start of the accelerometer recording does not match the start of the advanced sleeplog.
  • Bug fixed when first night is missing in the log.

The improvements to the advanced sleeplog handling have been sponsored by Dr. Beets and the Arnold Childhood Obesity Initiative Research Group at the University of South Carolina.

Output files

  • The pdf QC visualisation that has been part of GGIR for years now automatically scales the horizontal axis to the length of the recording. Previously the user had to specify the axis length with argument maxdur. Note that argument maxdur is still used for specifying the maximum number of days to consider as valid days, but no longer affects the axis length of the QC plot.
  • The output column names related to segmented day analysis are now logically ordered. Previously it ordered the columns alphabetically, as in 1, 10, 11, 2, 20 which has become 1, 2, 10, 11, 20.
  • GGIR .csv reports are now saved with a maximum of 3 decimal places for numeric variables. We do this to improve readability of the csv files, to reduce file size, and to avoid the false impression of super accuracy.
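
The last two points can be illustrated with a few lines of base R. This is only a sketch of the general idea, not the code GGIR uses internally; the column names and values below are hypothetical.

```r
# Hypothetical column names with a numeric segment index as suffix.
cols <- c("seg_1", "seg_10", "seg_11", "seg_2", "seg_20")

# Plain sort() orders them as text, which puts seg_10 before seg_2.
alphabetical <- sort(cols)

# Ordering on the numeric suffix gives the logical order instead.
segment_number <- as.integer(sub(".*_", "", cols))
logical_order <- cols[order(segment_number)]

# Rounding all numeric columns of a (made-up) report to 3 decimal places.
report <- data.frame(ID = "a", ENMO_mean = 23.456789, sleep_hours = 7.1)
is_num <- vapply(report, is.numeric, logical(1))
report[is_num] <- lapply(report[is_num], round, digits = 3)
```

Here logical_order holds the names in the order 1, 2, 10, 11, 20, and report$ENMO_mean is stored as 23.457.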

macOS 11 with arm64 architecture not updated yet

We are having difficulties getting GGIR version 2.7-1 approved for macOS 11 on arm64 architecture. If this affects you, you can either install the latest GGIR release via GitHub with R package devtools (or the remotes package) if you are not using sleeplogs, or stick with version 2.6-0 as available on CRAN. If you have access to such a machine and are interested in helping us fix the issue, then please let us know via this thread on GitHub.

 

GGIR release 2.6-0

In this blog post I will talk you through the main updates in the new GGIR 2.6-0 release. For a full list of updates since the previous CRAN release (version 2.5-0), see the GGIR changelog.

GGIR now able to read .gt3x files via R package read.gt3x

GGIR now also accepts as input .gt3x files from the accelerometer brand ActiGraph with device firmware version 2.5.0 or higher. This will take away the need to convert .gt3x files to .csv format with ActiLife software and speeds up the data processing of ActiGraph data. To create this new functionality, I added a GGIR dependency to the R package read.gt3x as developed by Dr. Tuomo Nieminen and co-maintained by John Muschelli. Next, Dr. Patrick Bos helped me to modify the read.gt3x code to read the .gt3x files in batches needed to facilitate memory management in GGIR. Memory management is critical when processing multiple files in parallel as GGIR can do.

The addition of gt3x compatibility has been sponsored by the University Medicine Greifswald, German Center for Cardiovascular Research (DZHK), site Greifswald, Germany.

Handling ActiGraph data collected with Idle Sleep Mode

Accelerometers from the ActiGraph brand come with a configuration option called Idle Sleep Mode (ISM). ISM is turned on by default and causes the recording of acceleration to stop when the sensor is not moved for 10 seconds. The idea behind ISM is to preserve battery life and by that allow for longer recordings, but it also results in time gaps in the data. To address these time gaps the ActiLife software imputes them with zeros in all three axes when converting the .gt3x file to .csv. Both the ISM and the zero-imputation are problematic.

Why ISM and zero-imputation are problematic:
  • Time gaps caused by ISM make data harder to compare with other accelerometer brands that do not have the ISM functionality. Note that there is no open-source software to reproduce the ISM behaviour across accelerometer brands.
  • The imputation by zeros is physically not plausible. An accelerometer should always capture the gravitational acceleration when at rest. Various acceleration metrics, such as SVMgs and ENMO, come with the assumption that they are applied to plausible acceleration signals. The imputation by zeros violates this assumption and by that will lead to biased estimates.
  • The disappearance and reappearance of gravitational acceleration causes a spike at the beginning and the end of the non-movement period when the signal is high-pass frequency filtered. For example, this applies to acceleration metrics such as MAD, BFEN, HFEN, AI0, or MIMSunits.
  • The lack of non-movement periods will complicate investigating and improving the acceleration sensor calibration, which depends on access to data from non-movement periods. Further, the imputation of the time gaps with non-plausible zeros will bias efforts to improve the calibration of the signals if not omitted.
  • Time gaps and the imputation by zeros will complicate estimating sleep and sedentary behaviour because small magnitudes of acceleration are replaced by zeros and information about the orientation of the accelerometer is lost.

To address the problematic imputation by zeros, GGIR now automatically re-imputes time gaps lasting longer than 2 samples. For this, GGIR uses the last XYZ acceleration value from before the time gap and normalizes it to a vector of 1 g. In other words, GGIR assumes that the orientation of the accelerometer during the non-movement period is identical to the last known recorded point in time. However, ideally the ISM should not be used in the first place and I advise all researchers who use ActiGraph devices to always disable the ISM when starting a new recording and to report clearly in publications whether the ISM was turned on or off.
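
The re-imputation step can be sketched in base R as follows. This is an illustration of the logic described above and not the GGIR source code: the function name reimpute_zero_gaps and its min_gap argument are made up, and the sketch assumes the gaps show up as rows with zeros in all three axes, as in the ActiLife .csv export.

```r
# acc: numeric matrix with one row per sample and columns x, y, z (in g).
reimpute_zero_gaps <- function(acc, min_gap = 3) {
  is_zero <- rowSums(acc == 0) == ncol(acc)   # all axes exactly zero
  runs <- rle(is_zero)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Only runs of zeros lasting longer than 2 samples are treated as gaps.
  for (i in which(runs$values & runs$lengths >= min_gap)) {
    if (starts[i] == 1) next                  # nothing recorded before the gap
    last_known <- acc[starts[i] - 1, ]
    unit_vec <- last_known / sqrt(sum(last_known^2))  # scale to 1 g
    acc[starts[i]:ends[i], ] <- matrix(unit_vec, nrow = runs$lengths[i],
                                       ncol = ncol(acc), byrow = TRUE)
  }
  acc
}

# Made-up example: a 4-sample gap (re-imputed) and a 2-sample gap (left alone).
acc <- rbind(
  c(0.02, 0.01, 1.01),
  c(0.00, 0.60, 0.90),
  c(0, 0, 0), c(0, 0, 0), c(0, 0, 0), c(0, 0, 0),
  c(0.10, 0.50, 0.80),
  c(0, 0, 0), c(0, 0, 0),
  c(0.10, 0.50, 0.85)
)
fixed <- reimpute_zero_gaps(acc)
```

After the call, rows 3 to 6 of fixed all hold the direction of row 2 scaled to a length of 1 g, while the short run in rows 8 and 9 is untouched.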

Methodological consistency:

The new imputation as described above can break methodological consistency with earlier analyses of ActiGraph data that relied on the zero-imputed data. However, this break is essential as the old approach is not justifiable.

This new functionality was sponsored by the University Medicine Greifswald, German Center for Cardiovascular Research (DZHK), site Greifswald, Germany.

Maximum number of days to consider

GGIR argument maxdur has been part of GGIR for a long time and aids the user in masking all data recorded after an integer number of 24 hour windows relative to the start of a recording. However, some studies may prefer this to be focused around calendar days rather than 24 hour windows. Therefore, GGIR now offers argument max_calendar_days, which is the number of calendar days relative to the beginning of the recording to be included in the analysis.
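
The difference between the two ways of counting can be shown with a small base R sketch. It is not taken from the GGIR code; in particular, I assume here that the (partial) day on which the recording starts counts as calendar day 1, and the timestamps are made up.

```r
# Hypothetical recording: hourly timestamps starting at 15:00, lasting 4 days.
tz <- "UTC"
start <- as.POSIXct("2022-03-01 15:00:00", tz = tz)
timestamps <- seq(start, by = "1 hour", length.out = 96)

n_days <- 2

# Counting in 24-hour windows relative to the start of the recording.
hours_since_start <- as.numeric(difftime(timestamps, start, units = "hours"))
keep_windows <- hours_since_start < n_days * 24

# Counting in calendar days relative to the date of the recording start.
day_index <- as.integer(as.Date(timestamps, tz = tz) - as.Date(start, tz = tz)) + 1
keep_calendar <- day_index <= n_days

c(windows = sum(keep_windows), calendar = sum(keep_calendar))
```

With two days, the window-based rule keeps 48 hours of data (until 15:00 on 3 March), whereas the calendar-based rule keeps 33 hours (until midnight at the end of 2 March).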

This new functionality was sponsored by the University of Regensburg in Germany.

BrondCounts added

As announced in my previous blog post about the 2.5-0 release, GGIR 2.6-0 has been expanded with the option to estimate sleep via the Sadeh algorithm and the counts calculated with the R package activityCounts. R package activityCounts was developed by Dr. Brondeel and colleagues. Please note that I refer to the counts inside GGIR as BrondCounts since ‘activity counts’ feels too generic and may lead to confusion when put next to other count metrics. The BrondCounts are calculated in GGIR by setting input argument do.brondcounts = TRUE. Next, they will be available in the time series output and used for sleep analysis when argument HASIB.algo = "Sadeh1994".

Call for validation studies

Please note that the implementations of both the BrondCounts and the zero-crossing counts are only an attempt to imitate the count measures produced by the Actigraph Motionlogger and the ActiGraph, as I discussed in my previous blog post. There is no certainty that the calculations are identical to the original, that the calculations have remained consistent over the past 30 years (almost certainly not), or that this approach provides any level of accuracy for estimating sleep. Therefore, it would be valuable if future studies can be carried out to evaluate the performance of these implementations for sleep assessment.

The implementation of the BrondCounts in GGIR has been sponsored by groups from Paris and South Carolina (see previous blog post for details).

Internal structure of parameters

The functionality of GGIR has grown over the years and with that the number of input arguments (parameters). As a result, the code readability got worse and with that the ease of making updates to the code. To address this I refactored the GGIR code such that arguments are internally grouped in parameter objects. The parameter objects are structured thematically, e.g. all parameters related to sleep are stored in the object “params_sleep”. As a user you will be able to continue using the same R scripts and input arguments for GGIR as before. The parameter objects are only used by GGIR internally. However, you may notice some changes in the structure of the package documentation.

GGIR release 2.5-0

In this blog post I will talk you through the main updates to the new GGIR 2.5-0 release. For a full list of updates since the previous CRAN release (version 2.4-0), see the GGIR changelog.

(NEW) Tutorial on segmented day analysis

Day segment specific analyses are valuable when studying context specific physical activity. A new tutorial vignette has been added on how to perform an analysis per segment of the day. The tutorial discusses how to segment the day based on clock hours of the day, but also how to use an activity log to tailor the segmentation per participant.

The creation of this tutorial has been sponsored by Dr. Beets and the Arnold Childhood Obesity Initiative Research Group at the University of South Carolina.

(NEW) Sleep Regularity Index

Another addition in GGIR 2.5-0 is the Sleep Regularity Index (SRI). This is a measure of sleep regularity between successive days, as first described by Phillips and colleagues. The SRI can have a value between -100 and 100, where 100 reflects perfect regularity (identical days), 0 reflects a random pattern, and -100 reflects perfect reversed regularity. The SRI is proposed to only be calculated based on seven, or a multiple of seven, consecutive days of data without missing values. This is to avoid imbalanced data influencing the final estimate. However, this renders many datasets unsuitable for analysis and leads to a painful loss in sample size and statistical power.

Sleep Regularity Index – Dealing with unbalanced data

To address this, I implemented the SRI in GGIR per day-pair. Per day-pair GGIR now stores the SRI value and the fraction of the 30 second epoch-pairs between both days that are valid. This fraction can be found in the output under the variable name SriFractionValid. By default, day-pairs are excluded if this fraction is below 0.66. For those familiar with GGIR this threshold is coupled with the 16-hour default value for argument “includenightcrit”. For example, if you set argument “includenightcrit = 12”, the fraction threshold will be: 12 / 24 = 0.5. Note that I have implemented the SRI calculation such that it accounts for the missing values in the denominator. As a result, the SRI value interpretation remains unchanged.
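
As an illustration of the day-pair logic, here is a minimal base R sketch. It is not the GGIR implementation: the function name sri_day_pair and the simulated days are made up, and I assume that sleep/wake is coded as 1/0 per 30 second epoch with NA for invalid epochs. The SRI is scaled as -100 + 200 times the fraction of valid epoch-pairs in which both days are in the same state, which is how missing values end up in the denominator.

```r
sri_day_pair <- function(day1, day2) {
  stopifnot(length(day1) == length(day2))
  valid <- !is.na(day1) & !is.na(day2)          # epoch-pairs with data on both days
  if (!any(valid)) return(list(SRI = NA_real_, SriFractionValid = 0))
  agreement <- mean(day1[valid] == day2[valid]) # denominator = valid pairs only
  list(SRI = -100 + 200 * agreement, SriFractionValid = mean(valid))
}

# Made-up days of 2880 epochs (24 hours of 30 second epochs).
day1 <- rep(c(1, 0), times = c(960, 1920))            # 8 hours of sleep
day2 <- c(rep(0, 120), rep(1, 960), rep(0, 1800))     # same sleep, 1 hour later
day2_gappy <- day2
day2_gappy[1441:2880] <- NA                           # second half of day invalid

full <- sri_day_pair(day1, day2)
gappy <- sri_day_pair(day1, day2_gappy)

# Day-pair inclusion threshold coupled to includenightcrit, as described above.
includenightcrit <- 16
keep_default <- gappy$SriFractionValid >= includenightcrit / 24
keep_relaxed <- gappy$SriFractionValid >= 12 / 24
```

In this example the day-pair with half of the epochs missing is dropped under the default criterion but kept when includenightcrit is lowered to 12.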

The 30 second epoch setting is automatically applied, even if the rest of the GGIR process works with a different epoch duration.

The day-pair level estimates are stored as variable SleepRegularityIndex in the .csv-report on sleep (GGIR part4 for those familiar with GGIR). Further, GGIR also stores the person-level aggregates such as: the plain average over all valid days, the average of all valid weekend days, and the average of all valid week days. No GGIR input arguments are needed to invoke the SRI calculation. The calculation is automatically performed after updating GGIR and processing your data.

Sleep Regularity Index – Benefits of the revised approach
  • It enables the user to study the day-pair to day-pair variation in SRI, and the role of day-pair inclusion criteria.
  • The access to SRI at day-pair level makes it possible to account for imbalanced datasets via multi-level regression analysis applied to the output of GGIR, with day-pair as one of the model levels.

The implementation of the Sleep Regularity Index has been sponsored by Dr. Beets and the Arnold Childhood Obesity Initiative Research Group at the University of South Carolina.

(REMOVED) Sensor fusion functionality

In GGIR version 2.2-2 (early 2021), a new functionality was added to facilitate the fusion of accelerometer and gyroscope signals at raw data level. This work was done as an expertise development project, but it has become clear that there is no major added value of this functionality for GGIR, so it has been removed. Please note that the code is still in the GitHub and CRAN history, such that it can be revived if ever needed.

(NEW) Advanced sleeplog format now facilitated

For a long time, GGIR only facilitated one specific type of sleeplog format. The format assumed that the first night in the accelerometer recording corresponds with the first night in the sleeplog. As you can imagine, this complicates the analyses of accelerometer recordings that are not aligned with sleeplog entries.

In the newly added advanced sleeplog format, recording dates are stored per night and used by GGIR to ensure correct alignment of each single night in the sleeplog with the corresponding night in the accelerometer data. Additionally, the advanced sleeplog format allows for optional storing of information such as non-wear and napping behaviour. At the moment, GGIR extracts and stores these additional types of information in the GGIR milestone data. Integration in the GGIR reports is work in progress. See GGIR vignette paragraph on sleep analysis for further details.

The addition of the advanced sleeplog format has been sponsored by Dr. Plancoulaine and Dr. Bernard from the Research Team on Early life origins of health (EAROH) at the Centre for Research in Epidemiology and Statistics, Université de Paris, Inserm in Paris.

(NEW) Sadeh and Galland sleep classification

An attempt has been made to implement the sleep classification algorithms proposed by Sadeh et al. 1994 and Galland et al. 2012. I am calling it an attempt, because both algorithms rely on summary measures of the data produced by proprietary wearable sensor technology, referred to as ‘counts’.

In short, Sadeh and colleagues detailed that their counts were generated by the Motionlogger actigraph by Ambulatory Monitoring Inc, and explained that the ‘counts’ represented a zero-crossing count of the acceleration signal. However, critical parts of the calculation to support scientific reproducibility were not published. The lack of transparency has been typical for many wearable sensor manufacturers from the late 20th century. Consequently, it has complicated comparisons of research findings in both the sleep and physical activity research communities.

Sadeh and Galland sleep classification – Brønd(eel)Counts to the rescue?

To facilitate backward comparability of modern accelerometer data with count-technology, Jan Brønd et al. proposed an algorithm to estimate the count values produced by one of the other manufacturers, and Ruben Brondeel released the algorithm as an open-source R package. Their counts, let’s call them the Brønd(eel)Counts, do not count zero-crossings but calculate the area under the filtered acceleration signal. Despite this fundamental difference in calculation, there are now preliminary hints that the Brønd(eel)Counts are possibly more suitable for use with the Sadeh algorithm than the zero-crossing counts. If this is true, it would indicate that the description given by Sadeh in 1994 did not reflect the actual proprietary calculation done by the Motionlogger actigraph at the time.
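
To show why the two types of counts are fundamentally different, here is a toy sketch in base R. It is neither the Motionlogger calculation, nor the activityCounts algorithm, nor the GGIR code: there is no frequency filtering, and the signal, sample frequency, and epoch length are made up. It only contrasts counting sign changes with summing the area under the signal.

```r
sf <- 30        # assumed sample frequency in Hz
epoch_s <- 5    # assumed epoch length in seconds
n_epoch <- sf * epoch_s
k <- seq_len(4 * n_epoch)

# A 2 Hz oscillation whose amplitude differs between four epochs.
amplitude <- rep(c(0.01, 0.2, 0.5, 0.05), each = n_epoch)
signal <- amplitude * sin(2 * pi * 2 * k / sf + 0.1)
epoch <- rep(1:4, each = n_epoch)

# Zero-crossing style count: number of sign changes within each epoch.
zero_crossings <- tapply(signal, epoch, function(x) sum(diff(sign(x)) != 0))

# Area style count: area under the rectified signal within each epoch.
area <- tapply(signal, epoch, function(x) sum(abs(x)) / sf)
```

The zero-crossing count is the same in all four epochs because it only reflects how often the signal oscillates, whereas the area follows the amplitude of the movement. The two measures therefore carry different information, which is why it matters which one the original device actually produced.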

My plan is to embed the combination of Brønd(eel)Counts and Sadeh and Galland algorithm in the next GGIR release to facilitate a direct comparison. For details on how to use the current approach see vignette paragraph on sleep analysis arguments.

The implementation of the Sadeh and Galland algorithms has been sponsored by Dr. Plancoulaine and Dr. Bernard from the Research Team on Early life origins of health (EAROH) at the Centre for Research in Epidemiology and Statistics, Université de Paris, Inserm in Paris.

(NEW) sleepwindowType

Until now the GGIR-based sleep analysis came with the assumption that either of the following conditions is true:

  • The user has not collected sleeplog data to guide the accelerometer-based sleep analysis.
  • The user collected sleeplog data, which asked for sleep onset and waking-up time (Sleep Period Time window).

The disadvantage of both scenarios is that sleep latency and sleep efficiency cannot be estimated. GGIR version 2.5-0 addresses this for the two scenarios described below. In both scenario 1 and 2, sleep latency and sleep efficiency will be estimated and included in the .csv-report on sleep (GGIR part 4).

Scenario 1: The sleeplog records time in bed

To inform GGIR about this scenario specify input argument: sleepwindowType = "TimeInBed". Note that the default value is "SPT" (Sleep Period Time window).

Scenario 2: Accelerometer worn on the hip

In this scenario we can attempt to detect lying down based on estimating the accelerometer orientation. To use this functionality specify argument: sensor.location = "hip". For a more elaborate description see vignette paragraph on guiders.
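
As a rough sketch of how the two scenarios could be passed to GGIR: the folder and file paths below are hypothetical placeholders, and the call is only attempted when GGIR is installed and the data folder exists. The function is written as GGIR(), the name introduced in a later release; in the 2.5 series it is called g.shell.GGIR(). Arguments datadir, outputdir, and loglocation are the usual GGIR arguments for the input folder, output folder, and sleeplog file.

```r
# Scenario 1: sleeplog with time in bed (hypothetical paths).
args_scenario1 <- list(
  datadir = "~/mystudy/accelerometer_files",
  outputdir = "~/mystudy/ggir_output",
  loglocation = "~/mystudy/sleeplog.csv",
  sleepwindowType = "TimeInBed"
)

# Scenario 2: hip-worn accelerometer, no sleeplog needed.
args_scenario2 <- list(
  datadir = "~/mystudy/accelerometer_files",
  outputdir = "~/mystudy/ggir_output",
  sensor.location = "hip"
)

# Only run when GGIR is installed and the (hypothetical) data folder exists.
if (requireNamespace("GGIR", quietly = TRUE) && dir.exists(args_scenario1$datadir)) {
  do.call(GGIR::GGIR, args_scenario1)
}
```

All other GGIR arguments are left at their default values here; in practice you would add your usual settings to the list.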

Note: The sleepwindowType feature was accidentally disabled in the 2.5-0 release on CRAN. This has now been corrected in the development version of GGIR on GitHub. To install the updated development version use the following commands in the R(Studio) console:

install.packages("remotes")
library(remotes)
remotes::install_github("wadpac/GGIR", ref = "2.5-1")
library(GGIR)

The addition of the sleepwindowType option has been sponsored by Dr. Plancoulaine and Dr. Bernard from the Research Team on Early life origins of health (EAROH) at the Centre for Research in Epidemiology and Statistics, Université de Paris, Inserm in Paris.