News Archives - Accelting

Today marks the 10th anniversary of GGIR as an R package on CRAN. A lot of time and effort has gone into GGIR over the years, and it is still being used! Therefore, this is a moment to celebrate and to reflect on how we got here.

The road to GGIR’s first release

GGIR did not start out of nothing in 2013. In some way it started back in 2003 when I was a Bachelor student in Human Kinetic Technology. At the end of that year, I gained my first experience with raw data accelerometry via my internship and later job at the wearable technology company McRoberts B.V. in The Hague. It was an inspiring time in which I learnt a lot about accelerometers and their data. These experiences would prove valuable for the work on GGIR.

In 2008 I moved to the UK to do my PhD within the MRC Epidemiology Unit at the University of Cambridge. My PhD was themed around the question: How to process the raw data from wrist-worn accelerometers for large-scale population research? At that time the ambition of all involved was twofold:

To convince UK Biobank, a large-scale biomedical database and research resource in the United Kingdom that had recently started, to also ask a subsample of its 500k study volunteers to wear an accelerometer on their wrist that stores raw data. By ‘raw’ we mean minimally pre-processed. The challenge in making this argument was that both the attachment at the wrist and the collection of raw data in population research were unprecedented.
To develop algorithms needed for making sense of this new data type.

In the following years, we (I, my supervisor Søren Brage, and collaborators) established that movement indicators derived from the wrist accelerometer data are acceptably correlated with daily energy expenditure (doubly labelled water study), and had promising feasibility. Further, we explored how best to summarise these data into a single indicator of human body acceleration, the so-called acceleration metric. Additionally, we did four other studies which I will skip here as they are less directly related to GGIR.

All seemed on track in terms of algorithm development, but as a result of the financial crisis the original accelerometer manufacturer pulled out and a new manufacturer had to be found. A UK Biobank accelerometry working group was formed to come up with a solution and consisted of Mike Catt, Søren Brage, Marcelo Pias, Salman Taherian and myself. As part of this group I helped draft the requirements for the accelerometer. Patrick Olivier’s team in Newcastle later developed the Axivity (AX3) accelerometer that was used by UK Biobank.

However, when the original manufacturer pulled out, it complicated things for me, as I was planning for the analysis of data that might never be widely collected. Luckily, by the end of my PhD, alternative datasets emerged. The Pelotas birth cohorts in Brazil and the Whitehall study II in the UK had collected wrist-worn GENEActiv accelerometer (raw) data with a combined sample size of 13,000. These groups welcomed me to lead on the development of a data processing pipeline for this novel raw data type. So, in 2012 I started putting together pieces of code from my earlier projects to form the pipeline, which at the time consisted of just two (long) R scripts written by me and a dependency on ActivInsights Ltd’s R package GENEAread, which had been written by Joss Langford and Zhou Fang for reading the binary data files.

At that time nobody in my environment was familiar with best practices around Open Science, so the story could have ended there with a written description of my code in a paper and some R scripts on a university website. However, luckily, I met Cassim Ladha, a movement sensor researcher at the Open Movement Lab within Newcastle University (UK) at the time, who encouraged me to turn my R code into an Open Source R package. I did it and R package GGIR was born on the 8th of August 2013. In case you are interested, the name ‘GGIR’ was not inspired by the double ‘g’ in the name of the ggplot2 package as explained here.

GGIR in the beginning

GGIR was the first open-source licensed pipeline for processing multi-day raw data from wearable accelerometers. The experience with the two larger datasets as mentioned above as well as numerous smaller datasets in the Newcastle area helped to identify challenging use cases, which in turn fuelled GGIR’s development. Additionally, Séverine Sabia from the Whitehall study II provided tremendous value with her feedback on various aspects of the software. In the following years GGIR would act as a blueprint for a number of other software projects. For example, most of GGIR’s algorithms and logic were used by a second UK Biobank accelerometry working group for their Java/Python pipeline to process the UK Biobank accelerometer data. Meanwhile, I worked in Newcastle on a first sleep detection algorithm specifically tailored to the strengths of modern raw data collection, which would further boost the value of GGIR in the years to come.

Leaving the traditional academic career path and trying to keep GGIR alive

During those early years of GGIR (2013-2015) I became aware that if I wanted to focus on GGIR as a generic research tool to support the research community, a traditional academic career path would not be the obvious choice. In my perception the fields of physical activity and sleep research celebrated closed over open source software, and societal impact (e.g. clinical technology or health guidelines) over efforts aimed at enhancing research methods, such as research software. Also, it did not feel logical to me that I was expected to go through the time-consuming effort of asking for funding for my time to help other researchers via GGIR. It would seem much more intuitive if those who benefitted from my efforts asked the funders for payment for my time, because that would be the ultimate demonstration of the value of my work. Similarly, I considered it more valuable to see researchers publish work based on GGIR without my co-authorship, as proof that I developed something they could use without my support. Instead, the academic reward system seemed to encourage an attitude of making other researchers depend on my support and co-authorship.

So, I quit my job and started as Research Software Engineer at the Netherlands eScience Center. The eScience Center is a non-profit organisation that aims to advance science by sharing expertise around advanced digital technologies. Being surrounded by many talented engineers, the job boosted my software engineering and data science skills. Additionally, the work in this new place taught me a lot about how to make open software sustainable (i.e., software that keeps working with minimal maintenance effort), software licensing, and how to manage research software projects.

At this point in time the future of GGIR was a big question mark as my time was now fully committed to the clients of the eScience Center who worked in other academic fields with no link to accelerometry. If I wanted to work on GGIR I had to do it unpaid in my spare time, which was not realistic because:

GGIR is a generic pipeline for a variety of data formats and research scenarios, each of which requires user support and ongoing code maintenance.
There was still a lot of work to be done on the functionality. I had focussed for many years on the algorithms, but putting those algorithms effectively together in a user-friendly package is a different story.
The typical GGIR users were not skilled in programming, so every bug fix or code enhancement had to be done by someone who understood the code and was able to change it.

Therefore, the most logical next step was to look for ways to get GGIR stakeholders to pay my employer in exchange for my time during office hours. This worked out well and allowed me to attract several projects. One of these projects was with the Diabetes Genetics group at the University of Exeter (UK). The task was to develop an algorithm for detecting the sleep period time windows in the UK Biobank accelerometer dataset without the traditional access to a sleep diary. This project was very successful and led to several prominent publications, including the first ever genome-wide association study on a broad set of device-based estimates of sleep. A major strength of our project was also that we published the methodological work separately from the applied work, so that each was scrutinized in peer review by experts in the respective fields.

Projects like the project with the University of Exeter were valuable to keep GGIR moving forward but did not offer the flexibility needed to sustain it long term. Also, doing this while employed meant that I had to get approval for every new project from the Center’s managers as the new project had to align with the interests and time budgets of the Center.

Transition to a sustainable framework

The year 2017 was a year of transformation. All the preceding years I had worked alone on the GGIR code. Every software engineer knows that working as a one-person team on code is a major risk. Therefore, I was very happy that Jairo H. Migueles reached out to me in 2017 to help maintain and develop GGIR. Over the years Jairo has been a much valued contributor. In the years that followed also others would make contributions via GitHub (e.g. Matthew Patterson, Taren Sanders, and Evgeny Mirkes).

The second significant thing that happened that year started at the social part of an expert group meeting in Santa Cruz, California, organised and sponsored by the Bill and Melinda Gates Foundation. The event was aimed at discussing the global eradication of malaria. My partner was actually invited and I was only there because I needed a holiday. While mingling with the attendees I noticed that several of them had no regular jobs but were self-employed consultants. This made me realise that this could be a way to sustain GGIR. As soon as I arrived home, I asked my employer (still the eScience Center) whether they would allow me to change my contract from full-time to part-time in order to explore this idea. They supported me and it went so well that in 2019 I made the full transition to freelance work.

Sustaining GGIR via freelance work

In the following years GGIR improved a lot. The list of enhancements is long and you can find specific blog posts on those in my updates. Some of the income I earnt I invested back in the quality of GGIR:

I revised how the input parameters are internally managed and checked, which has become a lot tidier.
Jairo and I improved the GGIR training course that I had been giving and we turned it into a full training service.
I involved Patrick Bos, a former colleague at the eScience Center and freelance data scientist, to advance the file processing functionalities: (1) we enhanced R package read.gt3x to read ActiGraph gt3x files in blocks to ease computational memory management when loaded by GGIR; (2) we wrote a new function to read in binary files from GENEActiv twice as fast as before; and (3) the speed of reading Axivity .cwa files has by now improved eight-fold. While doing all this I moved the key file reading functions from GGIR to a new R package named GGIRread. Recently Lena Kushleyeva has also helped to make some further improvements to this, and those will be part of the upcoming release.

How you can support GGIR

My transition to freelance work has been effective but is not the magical solution to all challenges. For example, I cannot work more than full time, and even though I can count on Jairo and Patrick to occasionally help me, this would not be enough to act as a 24/7 service to an entire research community. There are a couple of things you can do to help:

Report the issues you encounter and provide a detailed description, preferably with a reproducible example. For this we have a Google group. If you are more familiar with the R code and GitHub you can report your issue in the issue tracker instead. We may not have time to address the issue right away, but publicly sharing knowledge about the issue is essential for an Open Source community to work. Unfortunately, I have encountered groups who dedicated an entire year of work to secretly developing their own custom enhancement to GGIR without telling me. By the time they had finished their work they discovered that in the meantime I had made the same enhancement, but more generically applicable, inside GGIR.
Even if you do not encounter issues yourself, register with the GGIR Google group and try to help other GGIR users with their questions or problems. Helping them to clearly formulate their question or confirming that the issue is real can already be of value to the community.
If you require enhancements to GGIR, consider hiring me, or someone else, to help out. Also, you may find it practical to reserve some budget on your grant applications to pay for your local staff to help test and improve GGIR.
For those of you who do methodological studies to evaluate a GGIR functionality: Remember to acknowledge that you are most likely not evaluating GGIR as a whole but a specific GGIR use-case for a specific GGIR release version, in a specific population, with a specific accelerometer brand, and specific evaluation criteria. The package itself has many use-cases and we constantly try to improve GGIR via new releases. Secondly, it is not helpful for software or algorithm developers in general to be told only that one method is different from the other. The only way to help us is by trying to understand why methods differ: When and where do differences occur? Do they go away if you configure GGIR differently? Is there anything else that may explain the difference? If you want to contribute to improving software and/or algorithms then it is essential that you try to answer these questions. Quite often it turns out that the problem is not with GGIR but with the expectations users have about GGIR. Keep this in mind: You are not only ‘validating’ a research method but also your own understanding of and expectations about the method. For example, if you believe that a wrist-worn sensor can capture leg movement then it is not the accelerometer or the software that will disappoint you but your own invalid understanding and expectations.
Acknowledge the use of GGIR in your publications with a citation, not only in your first publication but in every publication that follows. This provides recognition for the value of GGIR.

Final word

It has been an amazing journey and I hope to be able to continue to help advance movement and sleep research via GGIR for many years to come! A big thank you to all code contributors, users who reported bugs and/or acknowledged the use of GGIR in their publications, those who paid for my time to work on GGIR, the countless study participants without whom there would be no data to process, the creators and maintainers of all the packages on which GGIR depends (Tuomo Nieminen, Junrui Di, John Muschelli, Dirk Eddelbuettel, Matt Dowle, and many others), the r-package-devel email list for being such a supportive community, and last but not least the CRAN team for building and maintaining the amazing R ecosystem!

Vincent

Since its inception in 2012/2013, R package GGIR has been available with an LGPL software license. The LGPL license has allowed for re-use of GGIR as a software library in both commercial and academic settings. The move to the Apache 2.0 license will make the GGIR source code even more reusable.

Why do we need an Open Source Software license?

An Open Source Software (OSS) license protects contributors and users as it clarifies the conditions under which software is shared. Without a license, there is no permission to use the software.

Why a standard Open Source Software license?

The use of standard OSS licenses is strongly encouraged by the Open Source community as it eases re-use of code. A full list of approved licenses can be found here. There are two main standard license categories:

Permissive licenses like MIT, BSD and Apache allow for a high level of freedom to re-use code.
Copyleft licenses like GPL and LGPL are more restrictive about combining code with external code. For example, you cannot re-use copyleft licensed code in closed software or inside permissively licensed software.

Why enable re-use of parts of the code?

Open Source Software (OSS) requires an ongoing maintenance and user support effort. If we developed similar software tools in every new project it would become impossible to maintain all those tools. Unfortunately, this is a common scenario in research projects where software is typically project-specific, e.g. tied to a PhD studentship, and no longer maintained after the project ends. When software is not effectively re-used and maintained, this leads to a waste of time and funding as someone else will then have to develop similar functionality for the next project. Therefore, the possibility to combine and re-use existing software code is critical for the sustainability of OSS. A permissive license such as Apache 2.0 poses the fewest obstacles to this.

Why not restrict the license to academic use only?

Some may argue that all code developed in or for academia should be restricted to academic use only, via a custom academic-use-only license.

To me an academic-use-only license seems inefficient and unfair for use with GGIR:

More users means more people to identify, report and help fix problems. The more restricted the user community, the fewer opportunities there are to sustain the software as a community.
A custom academic-use-only license would introduce an unlevel playing field in relation to permissively licensed OSS projects, as the former can benefit from the latter but not the other way around.
Many Open Source software tools are developed and maintained with the help of for-profit businesses. For example, my own PhD studentship in Cambridge (2008-2012) was co-funded by industry and resulted in various algorithms and insights that have benefited the research community via their implementation in the GGIR package. Further, my commercial freelance work allows me to maintain and keep improving GGIR.
Industry has played an enormous role in funding and sustaining the transition to raw data accelerometry in research. It would seem unfair to allow academic groups to benefit from all that work but not to invest back in the very same OSS ecosystem co-sustained by industry.
Industry also pays the taxes from which most academic research is funded, so it is unclear why industry should not be allowed to benefit from those tax payments. Open Source research software is like public roads: paid for collectively and used collectively.

Why this blog post?

The source code for several large software projects in the physical activity and sleep research community is not shared with a standard OSS license or not shared at all. As a result, GGIR’s license is not self-evident in this field of work. With this blog post I hope to create awareness about the value the GGIR license choice offers. This value already existed with the LGPL license but has become even larger with the Apache 2.0 license.

To change a software license, one needs consent from the intellectual property (IP) holders to the software. GGIR has benefited from numerous IP holders over the years. In recent months I emailed all major IP holders for their permission to change the license. The IP holders that replied were happy for me to implement the license update. So, I decided to go ahead with it. Nevertheless, by publishing this blog post I would like to draw attention to the new license and invite any IP holder with questions or concerns to contact me. The license change is effective as of GGIR release 2.9-4 (available on GitHub only), and will be part of the 2.10-0 CRAN release later this summer.
