The ‘cut-points’ approach is one of the most criticized analytical approaches in the field of physical activity research. Despite the criticism, cut-points are still widely used. R package GGIR facilitates the use of cut-points and thereby contributes to their continued use. So, you may wonder: Why does GGIR facilitate such a controversial method? Do the people behind GGIR not know about the limitations of cut-points? To answer these questions, let me first explain what the ‘cut-points’ approach is.
What is the cut-points approach?
Wearable accelerometer data can be processed into an indicator of body acceleration over time. Although acceleration is a meaningful kinematic indicator, researchers have not incorporated acceleration directly into physical activity guidelines. Instead, the research community prefers to phrase physical activity guidelines in terms of time spent in levels of energy expenditure. Levels of energy expenditure are defined based on a construct named Metabolic Equivalent of Task (MET).
There are various ways to calculate MET. The most common approach starts from Oxygen consumption in milliliters per minute per kilogram body mass. To obtain the MET value for an activity type, we divide the Oxygen consumption during the activity by the value in rest, for which a standard value of 3.5 is often used. As a final step we convert the MET values on a continuous scale to the categorical intensity levels named sedentary, light, moderate, and vigorous, with thresholds typically chosen at 1.5, 3.0, and 6.0 MET.
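The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up for this post, and the choice to treat each threshold as an inclusive lower bound is my assumption.

```python
def met_from_vo2(vo2_ml_per_kg_min, resting_vo2_ml_per_kg_min=3.5):
    """Return MET as Oxygen consumption relative to a resting value."""
    return vo2_ml_per_kg_min / resting_vo2_ml_per_kg_min


def intensity_level(met):
    """Map a continuous MET value to a categorical intensity level."""
    # Assumption: thresholds are inclusive lower bounds,
    # so exactly 3.0 MET counts as moderate.
    if met < 1.5:
        return "sedentary"
    if met < 3.0:
        return "light"
    if met < 6.0:
        return "moderate"
    return "vigorous"


# Hypothetical example: 14 ml/kg/min during an activity gives 14 / 3.5 = 4.0 MET.
print(intensity_level(met_from_vo2(14.0)))
```

Passing a measured resting value as the second argument instead of the default 3.5 gives the variant in which the activity value is divided by the individual's own value in rest.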
Direct measurement of Oxygen consumption in the real life setting of study participants is not feasible. Therefore, other methods are used to estimate the MET levels. Accelerometry is one of those methods as it is feasible to implement under real life conditions. When using accelerometers, the cut-points are technically acceleration thresholds that attempt to segment the acceleration values into the aforementioned intensity levels. Cut-points are identified with an optimisation procedure in small studies that combine the measurement of MET values with indirect calorimetry and the measurement of acceleration with a wearable accelerometer.
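To make the idea of cut-points as acceleration thresholds concrete, here is a small sketch that converts a series of epoch-level acceleration values into time spent per intensity level. The threshold values, the unit (milli-g), and the 5 second epoch length are hypothetical choices for illustration only. They are not published cut-points and this is not how GGIR implements it internally.

```python
import numpy as np

# Hypothetical acceleration cut-points in milli-g (mg); not published values.
CUT_POINTS_MG = [30.0, 100.0, 400.0]
LEVELS = ["sedentary", "light", "moderate", "vigorous"]


def minutes_per_level(acc_mg, epoch_s=5):
    """Return minutes spent in each intensity level for equal-length epochs."""
    # np.digitize returns 0 below the first cut-point, 1 between the first
    # and second, and so on; a value equal to a cut-point goes to the higher level.
    idx = np.digitize(np.asarray(acc_mg, dtype=float), CUT_POINTS_MG)
    counts = np.bincount(idx, minlength=len(LEVELS))
    return {level: int(n) * epoch_s / 60 for level, n in zip(LEVELS, counts)}


# Twelve synthetic 5 second epochs, so one minute of data in total.
acc = [10, 12, 50, 80, 150, 500, 20, 120, 90, 15, 25, 110]
print(minutes_per_level(acc))
```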
Known criticism of the cut-point approach
- Cut-points assume that body acceleration differs by intensity (MET) level, which is not always true. Two activity types can have a different acceleration but the same intensity level, or come with the same acceleration while being in different intensity levels.
- Cut-points are derived from studies with small sample sizes and are therefore not well representative of the wider population.
- Cut-points need to be re-derived for each new accelerometer data processing approach, each age group, and each accelerometer attachment location.
- Cut-points collapse rich time series information into time spent in only three or four categories of behaviour.
- Cut-points may seem simple to implement but actually come with various non-trivial decisions, for example the choice of epoch length, bout length, bout algorithm, and whether to allow for breaks in bouts.
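The last point in the list above is easy to underestimate, so here is a small sketch of how one of those decisions, epoch length, changes the outcome. The data are synthetic and the threshold is a hypothetical value in milli-g. The helper functions are made up for this illustration and do not reflect GGIR's own bout or aggregation logic.

```python
import numpy as np


def seconds_above(acc_mg, threshold_mg, epoch_s):
    """Return total seconds in epochs whose acceleration is at or above the threshold."""
    return int(np.sum(np.asarray(acc_mg, dtype=float) >= threshold_mg)) * epoch_s


def aggregate(acc_mg, factor):
    """Average consecutive epochs in groups of `factor` (length must be a multiple)."""
    return np.asarray(acc_mg, dtype=float).reshape(-1, factor).mean(axis=1)


# One synthetic minute of 5 second epochs: six active epochs, then six quiet ones.
acc_5s = [200.0] * 6 + [20.0] * 6
threshold = 100.0  # hypothetical cut-point in mg

print(seconds_above(acc_5s, threshold, epoch_s=5))                  # 30
print(seconds_above(aggregate(acc_5s, 12), threshold, epoch_s=60))  # 60
```

With 5 second epochs only half of the minute is above the threshold. After averaging to a single 60 second epoch the mean is 110 mg, so the whole minute counts. Same data, same cut-point, twice the time.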
As a side note: to some degree all these limitations also apply to machine learned models. But this blog post is about cut-points, so I will save my thoughts about machine learning for a future blog post.
History of cut-points in GGIR
The cut-points approach was intentionally missing in the early versions of the code that would later become R package GGIR. I left it out because I hoped that this would help push the field away from cut-points. However, my somewhat idealistic ambition did not survive long. I abandoned it in 2012 when I was asked to design and implement the accelerometer data processing for the Pelotas birth cohorts in Brazil and the Whitehall study II in the United Kingdom. Especially the Pelotas birth cohort team was keen to have time spent in moderate or vigorous physical activity incorporated. Looking back, I had the following justifications for giving in:
- As an early career researcher I was keen to make an impact. Leading the design of a data processing pipeline would definitely be a nice boost to my experience. So, it did not seem worth the effort to make a big issue out of including a widely used method like the cut-points method.
- By letting the GGIR user choose their own cut-points, the responsibility for that choice lies with the user and not with me.
- The cut-points approach can be interpreted as time spent in acceleration ranges. Therefore, I considered the debate around cut-points methods largely as an interpretation problem and not a methodological problem as such.
Nevertheless, a consequence of this decision is that GGIR users are confronted with the confusing reputation of cut-points as being both popular and widely criticised.
Why this blog post?
In this blog post I aim to share my personal views on the topic of MET level estimation with cut-points. I hope that by doing so I can provide some guidance to (new) GGIR users facing the confusing reputation of cut-points.
The problematic MET construct
To navigate the discussion around cut-points, I think it is important to first convince you that the MET construct itself is the main problem:
As you may have noticed, the MET calculation comes with the assumption that dividing Oxygen consumption by body weight addresses variation in Oxygen consumption explained by body weight, irrespective of the activity type being performed. However, it is well known in the field of exercise physiology that the relation between Oxygen consumption and body weight differs by activity type. For example, the classical textbook ‘Work Physiology’ by Åstrand and Rodahl already discussed this for VO2-max measurement. To truly normalise for body weight, we may need to divide by body weight to the power of X, where X differs per activity type. The problem with that, however, is that a different coefficient X for each activity type renders MET values incomparable across activity types. As a result, it is impossible to normalise Oxygen consumption for body weight across activity types.
Now you may say that the step in the MET calculation to divide by Oxygen consumption in rest will address all this because it makes METs unitless. That would only be true if the over- or under-correction for body weight in rest were identical to the over- or under-correction for body weight during each type of activity. As discussed above, this is not what we would expect. Furthermore, if the direction of over- or under-correction is reversed between rest and the activity type of interest, then dividing by the value in rest will amplify the error.
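The argument in the two paragraphs above can be written out as a short worked equation. Purely for illustration, assume that absolute Oxygen consumption follows a power law of body mass m, with activity-specific constants a and b. These symbols are my own shorthand and are not taken from the textbook mentioned above.

```latex
\dot{V}O_{2,\mathrm{act}} = a_{\mathrm{act}}\, m^{b_{\mathrm{act}}},
\qquad
\dot{V}O_{2,\mathrm{rest}} = a_{\mathrm{rest}}\, m^{b_{\mathrm{rest}}}

\mathrm{MET}
= \frac{\dot{V}O_{2,\mathrm{act}} / m}{\dot{V}O_{2,\mathrm{rest}} / m}
= \frac{a_{\mathrm{act}}\, m^{b_{\mathrm{act}} - 1}}{a_{\mathrm{rest}}\, m^{b_{\mathrm{rest}} - 1}}
= \frac{a_{\mathrm{act}}}{a_{\mathrm{rest}}}\, m^{\,b_{\mathrm{act}} - b_{\mathrm{rest}}}
```

Under this assumption body mass only drops out of the MET value when the two exponents are equal. If one exponent is above 1 and the other below 1, the difference between them is larger than the distance of either one from 1, so the ratio depends more strongly on body mass than the numerator or the denominator does on its own.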
Why is this a problem?
This is not a nuance. The inability to normalise for body weight invalidates MET as a universal criterion method. It causes MET levels to be different between groups that differ in body weight despite performing exactly the same activity type, with bias that is activity type specific.
If you have access to measured MET values per activity type then you can see this for yourself by plotting the MET values from a single activity type as a function of body weight. In the activity type ‘sitting still’ you will probably see a negative slope, which indicates that MET overcorrects for body weight. As a result, it will push lighter individuals above the 1.5 MET cut-point towards light intensity activity. Similarly, for cycling you may see that MET overcorrects for body weight, causing heavier individuals to be ranked as lightly active. In contrast, light individuals performing the same cycling activity could be classified as moderately active.
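If you do not have measured data at hand, the expected pattern can be mimicked with synthetic numbers. The sketch below assumes that absolute Oxygen consumption while sitting scales with body mass to the power 0.75. Both the exponent and the constant are hypothetical and only serve to show the direction of the slope; real data would be needed to say anything about its size.

```python
import numpy as np

# Synthetic illustration only: both constants are hypothetical.
A, B = 10.0, 0.75

mass_kg = np.linspace(50.0, 110.0, 7)
vo2_ml_min = A * mass_kg ** B        # absolute Oxygen consumption
met = vo2_ml_min / mass_kg / 3.5     # conventional MET calculation

# Fit a straight line through MET as a function of body mass.
slope, intercept = np.polyfit(mass_kg, met, 1)
print(f"slope of MET versus body mass: {slope:.4f} MET per kg")
print("negative slope:", slope < 0)
```

Because the assumed exponent is below 1, dividing by body mass overcorrects and the lightest individual ends up with the highest MET value for the same activity.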
A second problem
A second problem with the MET construct based on Oxygen consumption is that it does not account for carbon dioxide production. As a result, it is a poor measure of energy expenditure when comparing groups that differ in diet. This is well known in exercise physiology and the reason why exercise physiologists typically try to standardise the diet of study participants prior to experiments. On a positive note, this second problem can partly be addressed by revising MET to incorporate both Oxygen and Carbon dioxide measurements.
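To give a feel for the size of this effect, the sketch below uses the abbreviated Weir equation, one commonly cited way to estimate energy expenditure from both gases. I use it here only as an example of what incorporating Carbon dioxide could look like, not as a proposal for a revised MET definition. The Oxygen consumption value is hypothetical, and the ratios 0.7 and 1.0 are the textbook respiratory exchange ratios for fat and carbohydrate oxidation.

```python
def weir_kcal_per_min(vo2_l_min, vco2_l_min):
    """Abbreviated Weir equation (urinary nitrogen term omitted)."""
    return 3.941 * vo2_l_min + 1.106 * vco2_l_min


vo2 = 1.0  # L/min, hypothetical and identical in both cases
for label, rer in [("mostly fat oxidised", 0.7), ("mostly carbohydrate oxidised", 1.0)]:
    vco2 = rer * vo2  # respiratory exchange ratio = VCO2 / VO2
    print(f"{label}: {weir_kcal_per_min(vo2, vco2):.3f} kcal/min")
```

The Oxygen consumption is the same in both cases, so a MET value based on Oxygen alone would be identical, while the estimated energy expenditure differs by several percent.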
Any method developed to estimate MET values will struggle to remain valid when cross-validated in individuals with different body weight, different composition of activity types, and/or different diet. This is not because methods are inaccurate but a direct result of the MET construct itself.
Note that I am not discussing measurement errors in indirect calorimetry. It is the MET construct derived from the Oxygen data that is problematic, not the Oxygen data itself.
Also, it is important to note that the limitations of MET are not specific to cut-points but affect all attempts to estimate or classify MET levels, ranging from accelerometers, to heart rate sensors, to self-report methods, and from cut-point techniques to machine learning methods.
Biases in studies that evaluate cut-point methods
If you are somehow not yet convinced by my critique of the MET construct, then there is a second reason to be careful with studies that propose or evaluate methods to convert accelerometer data to MET values. Many of them come with one or both of the biases listed below.
1. MET at epoch level unjustified
Energy metabolism can only be measured reliably during steady state aerobic energy metabolism. However, some studies incorrectly ignore this principle and assign the derived average MET level (during steady state) to each 5-60 second epoch within the steady state window. So, they assume that the MET level was constant during the steady state period. This is problematic because indirect calorimetry cannot provide evidence that the MET level is constant at such a high resolution.
Similarly, there have been studies that use the MET-compendium to assign MET values to epoch level data. This is equally problematic because those MET-compendium values were derived as averages from steady state data and cannot be used as reference for epoch-level data. The epoch-level data represents the full within-individual variation in epoch-level energy expenditure, while the MET compendium only provides the average across a steady state window.
Both scenarios penalize accelerometer-based methods that are sensitive to true epoch-by-epoch variations in energy expenditure. Indirect calorimetry cannot capture this. Therefore, any study that evaluates a MET-classification method at a 1-minute or shorter epoch duration without proof of steady state energy metabolism for that epoch should be treated with high suspicion.
2. Inconsistencies in method implementation
The cut-points themselves are only one component of the cut-points method. The performance is also defined by other components such as:
- acceleration sensor version
- manufacturer software version and its configuration
- software to read and process the data
- efforts to monitor or correct for calibration error
- scripts to run the statistical analysis.
The implementation of cut-points methods often differs in more than one of these components. For example, a common inconsistency across studies is the exact way MET in rest is derived. Using a different method for deriving MET in rest may cause only a minor absolute bias, but could introduce a significant bias in the intensity levels that are extracted from it.
As a result, differences in performance across studies may no longer be explained by the cut-points alone. This invalidates the comparison itself and thereby the validation study. So, studies that do not pay attention to possible methodological differences should be treated with high suspicion.
Why then still facilitate cut-points in GGIR?
Despite all these limitations, cut-points allow us to discriminate individuals with different behavioural time-use profiles. The estimates are interpretable as incremental levels of body acceleration. Body acceleration is known to be a kinematic characteristic of human behaviour. Further, it has been shown to be a crude proxy for other physiological processes. The problem has been the interpretation of the method output as a direct measure of those physiological processes. If we acknowledge that it is not a direct measure, the output can still be of real value.
Additionally, it may be important to point out that cut-points are not the only method GGIR facilitates. GGIR has also been used for:
- deriving and comparing average acceleration during the entire day or during waking hours
- deriving and comparing the distribution of acceleration values
- preparing data for use with functional data analysis
- deriving and comparing other behavioural metrics such as the intensity gradient and characterisation of the most active X hours per day
- Before investing more effort into MET classification we may first need to identify and internationally agree on a good measurement construct. Ideally, we need a measurement construct that is comparable across populations of different body weight and across activity types.
- It may be time to start expressing physical activity guidelines in terms of required acceleration levels. An advantage of such a guideline is that it can directly be incorporated into consumer wearables. Further, it avoids relying on the problematic MET construct. Especially when methods like cut-points are linear there is no added value of converting to MET, because both will explain the exact same variance in the data.
- More wide-scale adoption of alternative analytical approaches, if needed in parallel to conventional methods such as cut-points. Reporting multiple outcomes in publications helps to build up reference values for those outcomes.
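The remark in the list above that a linear conversion to MET explains the exact same variance as acceleration itself can be checked numerically. The data and the conversion coefficients below are synthetic and hypothetical; the point is only that a correlation does not change when one variable is rescaled linearly with a positive slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: acceleration (mg) and some outcome loosely related to it.
acceleration = rng.uniform(0.0, 300.0, size=200)
outcome = 0.02 * acceleration + rng.normal(0.0, 1.0, size=200)

# Hypothetical linear conversion from acceleration to MET.
met = 1.0 + 0.01 * acceleration

r_acc = np.corrcoef(acceleration, outcome)[0, 1]
r_met = np.corrcoef(met, outcome)[0, 1]

# The two correlations agree up to floating point error.
print(np.isclose(r_acc, r_met))
```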