How to keep GGIR alive?

If we want Open-Source research software to remain functional and relevant then an ongoing maintenance effort is needed. At a basic level, the maintenance effort would include the preservation of functionality. For example, fixing major bugs and keeping the software up to date with changes in dependencies. At a more advanced level, the maintenance effort would also cover the improvement of existing functionality, adding new features, and providing support to users.

Software sustainability is a term closely related to software maintenance and refers to two aspects:

  • Designing software in a way that eases maintenance.
  • The strategy to enable software maintenance over time.

In this blog post I will focus on the second aspect in relation to the Open-Source R package GGIR.

Sustainability models

As Open-Source software is free, we cannot use software sales to fund the maintenance effort. Therefore, I had to choose a different model. The model I chose is to maintain GGIR based on paid consultancies. It works as follows: GGIR users contract me to make specific enhancements to GGIR or to provide training on the use of GGIR. The experience and income I generate from doing these consultancies allow me to remain committed to the maintenance of GGIR. Although this has been and still is an effective model, it may be good to reflect on what other models I could use in addition to the consultancy service.

1. Individual sponsorship?

In an individual sponsorship the developer of the software asks those who benefit from the software to sponsor them. An example of this is the GitHub sponsorship model. This model comes in two versions:

1a. Sponsorship as a form of appreciation:

This sponsorship comes without any commitment from the developer. It is just a way for sponsors to demonstrate their appreciation for the developer’s ongoing efforts to maintain the Open-Source software.

At first, this approach feels tempting to adopt: It is easy to set up and it reinforces the open science idea that the community sustains the software. However, I also see some problems with the individual sponsorship model:

  • Conflict with commercial activities: It could create the confusing impression that I am a charity. I am not sure how I would be able to keep a clear distinction between what I do as a paid consultant and which activities are funded by the sponsorship donations.
  • Income insufficient: It is hard to imagine that this form of sponsorship can generate sufficient income to live on.
  • Unfair in relation to co-contributors: Sponsorships typically go to well-known developers and less prolific or more junior contributors may then miss out.

1b. Sponsorship as a service:

In this sponsorship type the sponsor expects a specific commitment from the developer. For example, to be available for a call once per month or to give higher priority to the needs of a sponsor. Sometimes this is done in the form of a retainer agreement. The challenges I foresee here are:

  • Users may not need long-term support: Once the client knows their way around GGIR the need for support will reduce, while the need for maintenance efforts continues. As a result, the service will effectively turn into the aforementioned ‘sponsorship as a form of appreciation’.
  • Hard to define a standard service description: Some GGIR users may only need minor support, while other GGIR users require substantial time investments. This makes it difficult to come up with a standard service model. Asking for €1000 per year would seem expensive for the first group but cheap for the second group.

2. Project-based sponsorship?

Examples of sponsorship at project level are the sponsorships provided by NumFocus and CZI. Project-based sponsorship makes it easier to share sponsorship income across developers compared with the individual sponsorship discussed above. Further, if I were to register as a foundation, which some of these sponsors require, it would become easier to separate project-based sponsorship from commercial activities. Nonetheless, for GGIR a project-based sponsorship may not be the most intuitive approach:

  • Sponsorships typically designed for teams: It appears that most sponsorships are designed to help team-based organisations, while in the case of GGIR maintenance is primarily done by me alone. Of course, I could start assembling a team, but in the initial phase all the administrative and coordination burden would be on me.
  • Limited number of funding opportunities: Both NumFocus and CZI are based in the United States. I am not aware of any European sponsorships for Open-Source software that I am eligible for. Most opportunities seem to be restricted to academic researchers only.

Project-based sponsorships are potentially valuable but are best led by an academic researcher familiar with grant applications.

3. Linking Open-Source software to commercial products and services?

At the moment I have no plans in this direction, but in theory I could offer the following services:

  • Cloud service to provide data analysis.
  • Exclusive faster second version of GGIR that can only be used with a specific accelerometer brand.

A part of the revenue generated from these services could then be used for maintenance of the software. My feeling for now is that both ideas require a level of expertise that I do not have.

4. Advertisements?

Incorporating advertisements in software is common with mobile phone apps. In the case of GGIR, I already subtly advertise my consultancy services as Accelting. If I wanted to attract external advertisers, I am not sure who would be a good candidate. The strength of GGIR is that it works across accelerometer brands, so allowing accelerometer manufacturers to advertise their product inside GGIR does not seem fitting. However, other forms of advertising may work fine. Interested? I am happy to have a chat.

Conclusion

The best route forward seems to be to keep the focus on my paid consultancy, for the following reasons:

  • A clear commitment between me, doing the work, and the client paying me.
  • Paying clients ensure that all maintenance efforts are tailored to the most urgent needs of the user community.
  • It counters the idea that Open-Source software is inherently unsustainable and should be supported like we support charities. By doing business based on Open-Source software I demonstrate the value and sustainability of GGIR as Open-Source software.

That said, this shouldn’t stop anyone else from contributing to GGIR in complementary ways.

How can you contribute?

  • Are you in a position to apply for funding to help improve GGIR? If yes, please do so, and I’d be more than happy to collaborate and support you where needed.
  • Do you have software development skills and are you in a position to contribute to GGIR? Then, again, please do so. Being part of the Open-Source software community is rewarding in itself!
  • If you know of any other sustainability model not listed here then let me know.

 

How to measure PA guideline adherence?

I often get asked the question: How can we quantify a person’s adherence to the physical activity (PA) guidelines with an accelerometer? In this blog post I will try to answer that question.

How we arrived here

As you may know, there are many ways to process accelerometer data towards an estimate of a person’s physical activity. Also, a decision needs to be made on what accelerometer to use and where to place it on the body. An accelerometer-based research method is the combination of sensor type, attachment location, algorithms used, and the software implementation of those algorithms. There is no universally agreed method for physical activity assessment, and as a result a variety of methods have been used across the physical activity research literature. PA guidelines have been constructed based on the evidence provided by this literature.

PA guideline reports usually do not guide us on what method we should use to measure adherence to the guideline. This is understandable: The underlying evidence does not use a consistent method, which makes it hard to provide such guidance. The resulting confusion about how guideline adherence should be quantified leads many people in this research field to wonder: What method should I use?

Why there is no simple answer

Before I continue, I would like to point out that a measurement method determines the definition of what is being measured. Publications sometimes start with an elegant definition of physical activity, but it is the measurement method that defines what physical activity is, not the researcher. The researcher only chooses the method.

This effectively means that the definition of physical activity according to the PA guidelines is a weighted average of all the methods that underlie the PA guidelines. Consequently, we can only quantify adherence to the PA guidelines if we apply the same weighted average of methods to our study participants. To me this seems infeasible as you would have to instruct your study participant to wear accelerometers from a variety of brands on multiple body locations simultaneously. Note that I am not even discussing self-report methods for physical activity, which would further complicate this effort.

Therefore, monitoring adherence to PA guidelines is not straightforward. Any attempt to measure guideline adherence will result in a biased estimate relative to the multi-method evidence on which a guideline is based. We should not blame methods for not being comparable. Instead, we should blame ourselves as a research community for having such an ambiguous definition of physical activity that it is impossible for any method to comply with it.

Surely a more accurate method is what we need?

Now imagine that we had access to a super method able to measure people’s energy expenditure, count their steps, and detect activity types without any error and with the highest level of feasibility. Even such a super method would be unsuitable for evaluating adherence to the current PA guidelines, for the simple reason that this super method was never used to develop the PA guidelines in the first place. Instead, we would need a method that replicates the pooled imperfections of the methods that underlie the current PA guidelines. As I mentioned before, this would be an impossible task.

Looking for a solution

I am keen to help the community find a solution, if only to avoid having to justify all the time why we cannot accurately quantify PA guideline adherence. I have already started drafting a short communication for a journal with colleagues. For now it is a good exercise in shaping our thoughts, even if we decide not to submit it in the end.

In the meantime, I think it is important to make people aware that this is a complex challenge, which should not be treated as just a methodological problem, just a harmonisation problem, just a research communication problem, or just a PA guideline construction problem. We may even need to rethink the whole research ecosystem to address the actual issue. Here, we may benefit from looking for inspiration in other professional fields that produce public health guidelines. If you have thoughts on this then I would love to hear from you.

High frequencies in an acceleration signal

It is often argued that high-frequency components in an acceleration signal should be omitted. The motivation given is that these high frequencies are a consequence of machine noise or vibrations, or at least are not of human origin. With this comes the assumption that human movement can only cause low-frequency components. Here, the dividing line between low and high frequencies varies across studies from 5 to 20 Hertz.

Intermezzo – What do we mean by ‘frequencies in an acceleration signal’?

Joseph Fourier showed that a time series can be described as the combination of multiple sine waves, each with their own frequency characteristic. A periodogram is a visualisation of the contribution of all those sines to the total variance in a time series. The frequencies are shown on the horizontal axis and their contribution to the variance is displayed on the vertical axis.
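
As a minimal sketch of this idea, the Python snippet below builds a toy signal from two sine waves and computes how much each frequency contributes to the total variance. The toy signal, the 50 Hertz sample rate, and the function name variance_by_frequency are assumptions made for illustration only; they are not part of GGIR or of the experiments described in this post. A plain discrete Fourier transform is used because it is easy to read, not because it is fast.

```python
import cmath
import math


def variance_by_frequency(signal, sample_rate):
    """Return (frequencies in Hertz, variance contribution per frequency).

    Uses a plain discrete Fourier transform, which is slow but easy to read.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # drop the constant (0 Hertz) part
    freqs, contributions = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centred))
        # Every frequency except the Nyquist frequency appears twice in the
        # full transform, so it counts double in the one-sided periodogram.
        weight = 1 if (n % 2 == 0 and k == n // 2) else 2
        freqs.append(k * sample_rate / n)
        contributions.append(weight * abs(coef) ** 2 / n ** 2)
    return freqs, contributions


# Toy signal (an assumption for illustration): a 2 Hertz sine with amplitude 1
# plus a 7 Hertz sine with amplitude 0.5, sampled at 50 Hertz for 4 seconds.
sample_rate = 50
n_samples = 4 * sample_rate
signal = [math.sin(2 * math.pi * 2 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 7 * t / sample_rate)
          for t in range(n_samples)]

freqs, contributions = variance_by_frequency(signal, sample_rate)
mean = sum(signal) / n_samples
total_variance = sum((x - mean) ** 2 for x in signal) / n_samples

# The two largest contributions sit at the two sine frequencies.
largest = sorted(range(len(freqs)), key=lambda i: contributions[i],
                 reverse=True)[:2]
for i in sorted(largest):
    print(f"{freqs[i]:.2f} Hertz contributes {contributions[i]:.3f} "
          f"of total variance {total_variance:.3f}")
```

With this scaling the contributions of all frequencies add up to the variance of the signal, so plotting contributions against freqs gives the periodogram described above. A sine wave with amplitude A contributes A²/2, which is why the 2 Hertz component dominates in this toy example.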

Are high frequencies truly just noise?

At first, the idea that high frequencies are not caused by human movement sounds plausible: A washing machine at top speed rotates at around 23 Hertz. Surely our body would not be capable of producing that kind of rotation.

Nevertheless, very little empirical research has been done to support the idea that all high frequencies are redundant. To me it does not feel satisfying to embrace this widespread belief without having clear evidence for it. For example, it would be helpful to know the relative contribution of human movement, machine noise, and non-human vibrations to the frequency content of a signal. Also, it would be good to know whether these contributions vary across the frequency spectrum. Such insights would help to make an informed decision on what frequency filter to use, if any.

To gain a better understanding I did the following short experiments together with my project partners in Regensburg:

Accelerometer data from walking and non-wear

The acceleration signal was collected with a hip-worn accelerometer during walking. This tri-axial accelerometer was configured to collect data at 100 Hertz. Additionally, I have a recording of that same accelerometer when it was lying still on a table (not worn). Both signals have a length of 45 seconds and I only look at the vertical (longitudinal) signal. The periodograms show that the frequency density is consistently higher for walking than when the accelerometer is not worn, not just at the low frequencies below 10 Hertz, but all the way up to 50 Hertz, the highest frequency that can be represented at a sample rate of 100 Hertz. This indicates that the high frequencies in the signal for walking are not simply explained by the natural level of noise seen when the sensor is not worn.

Enhancing theoretical understanding with artificial data

I complemented these accelerometer experiments with an artificial signal. The artificial signal is a rectified 1 Hertz sine wave, which looks like a pendulum motion. Pendulum motions are often used as a model for human movement where smooth rotations are interrupted by abrupt changes in movement direction. Next, I sum this artificial signal with the experimental non-wear data (discussed above). Because rectification flips the negative half of each cycle upwards, the shape repeats twice per second, which gives the pendulum a frequency of 2 Hertz. The resulting periodogram shows not only a peak at this 2 Hertz frequency, but also its harmonics at higher frequencies. Harmonics are a natural consequence of waves that do not have a perfect sine shape. In sound, the specific shape of the wave is what gives our voice and musical instruments their characteristic sound. In acceleration signals the harmonics give the wave its movement-specific shape.
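
The construction of such an artificial signal can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the code used for the experiment: seeded Gaussian noise with an arbitrary standard deviation stands in for the non-wear recording, the signal lasts 4 seconds to keep the example fast, and the helper amplitude_at is a hypothetical name.

```python
import cmath
import math
import random


def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude of the sine component at freq_hz (single-frequency Fourier sum)."""
    n = len(signal)
    coef = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / sample_rate)
               for t, x in enumerate(signal))
    return 2 * abs(coef) / n


sample_rate = 100             # Hertz, as in the recordings described above
n_samples = 4 * sample_rate   # assumption: 4 seconds keeps the example fast
rng = random.Random(1)        # fixed seed so the sketch is reproducible

# Rectified 1 Hertz sine wave: taking the absolute value folds the negative
# half of each cycle upwards, so the shape repeats twice per second.
pendulum = [abs(math.sin(2 * math.pi * 1 * t / sample_rate))
            for t in range(n_samples)]

# Assumption: Gaussian noise stands in for the real non-wear recording.
noise = [rng.gauss(0, 0.001) for _ in range(n_samples)]
signal = [p + e for p, e in zip(pendulum, noise)]

amplitudes = {f: amplitude_at(signal, sample_rate, f) for f in range(1, 11)}
for f, a in amplitudes.items():
    print(f"{f:2d} Hertz: amplitude {a:.4f}")
```

In this sketch the amplitudes at 2, 4, 6, 8, and 10 Hertz decline gradually, while the odd frequencies in between stay at the level of the simulated noise. This matches the Fourier series of a rectified sine, whose k-th harmonic has amplitude 4 / (π (4k² − 1)).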

Interpretation

The periodogram of the artificial signal looks suspiciously similar to what we see in the periodogram of the signal for walking. The peaks in both have a gradual decline towards the top frequency.

Based on this it would seem likely that higher frequencies in an acceleration signal are not only caused by machine noise but also by the harmonics of movement. In fact, the contribution of the harmonics is much larger than the contribution of signal noise in this little experiment. Further, the experiment shows that even a simple 2 Hertz pendulum movement can produce high-frequency components. However, it is not a single human body joint that is rotating at these high frequencies. It is the combined impact of multiple joint movements and the interaction of the human body segments with each other and with the environment (floor in this case) that causes the pendulum shape. The pendulum shape in turn produces a broad frequency profile. If we filtered out these harmonics, we would lose the detailed representation of body movement in the original signal.
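
To give a feel for what is lost, the sketch below looks at the abrupt turning point of the rectified sine wave, where the unfiltered signal is exactly 0. It assumes an idealised brick-wall low-pass filter that keeps every harmonic up to the cut-off and removes everything above it, which real filters only approximate. The function name cusp_value_after_lowpass is hypothetical, and the calculation relies on the known Fourier series of a rectified sine rather than on measured data.

```python
import math


def cusp_value_after_lowpass(cutoff_hz, movement_hz=2):
    """Value at the turning point of a rectified sine after an ideal low-pass.

    The unfiltered signal |sin(pi * movement_hz * t)| is exactly 0 at t = 0.
    Its Fourier series is
        2/pi - (4/pi) * sum_k cos(2*pi*k*movement_hz*t) / (4*k*k - 1),
    so an ideal (brick-wall) filter keeps only the harmonics k with
    k * movement_hz <= cutoff_hz. At t = 0 every cosine equals 1.
    """
    n_harmonics = int(cutoff_hz // movement_hz)
    kept = sum(1 / (4 * k * k - 1) for k in range(1, n_harmonics + 1))
    return 2 / math.pi - (4 / math.pi) * kept


for cutoff in (5, 10, 20, 50):
    value = cusp_value_after_lowpass(cutoff)
    print(f"cut-off {cutoff:2d} Hertz: turning point at {value:.3f} instead of 0")
```

For a signal with amplitude 1, a 5 Hertz cut-off leaves the turning point at roughly 0.13 instead of 0, while a 20 Hertz cut-off brings this down to roughly 0.03. The lower the cut-off, the more the abrupt change in movement direction is rounded off.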

How about sensor vibrations?

The other explanation offered for high frequencies is that they represent vibrations of the accelerometer relative to the human body. For example, as a result of loose attachment with an elastic strap or because of skin movement relative to the underlying bones. In general, when objects vibrate they tend to vibrate at their so-called eigenfrequency. However, when we look at the periodogram of walking we see a homogeneous increase in density across all frequencies. If the high frequencies were only caused by vibrations then we would expect to see an increase near the eigenfrequency, or a set of eigenfrequencies, for those vibrations.

Conclusion

In conclusion, high frequencies in an acceleration signal can represent body movement. Therefore, filtering out high frequencies is not necessarily a good thing. Not filtering the signal preserves the detailed representation of the movement. Additional advantages of not filtering are that it is computationally faster, and that signal processing is potentially easier to reproduce as there are fewer computational steps.

In the specific use case of assessing human daily physical activity I recommend the following:

  • Do not filter out the high frequencies unless there is clear evidence that filtering gives better estimates of physical activity. To my knowledge there are no studies that show the added value of filtering.
  • Only consider filtering out high frequencies if data comparability between accelerometer brands/generations is critical to your research question. Accelerometer brands and generations can have different sensitivities to human body acceleration. By filtering out high frequencies you create a less precise but potentially more comparable acceleration signal.

The exploration described in this blog post is part of the project I am doing for the University of Regensburg, Germany.