The aim of this project is to help implement Federated Data Analysis in ProPASS. ProPASS is the international research consortium for Prospective Physical Activity, Sitting and Sleep. The consortium aims to facilitate pooled analyses of data from members around the world to boost statistical power. Their data includes accelerometer data and various other typical epidemiological data types. ProPASS actively works on addressing challenges of pooled analysis such as data harmonization.
Pooled data analysis when data cannot be shared
Privacy concerns and research regulations can prohibit the sharing of research data for a pooled analysis. Study-level meta-analysis (SLMA) is the next best option. However, SLMA becomes inefficient for exploratory analysis: a local person at each data site needs to run the analysis script and share the results for every iteration of the analysis, for every researcher in the consortium.
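As a rough illustration of what SLMA involves, the sketch below pools per-site estimates with a fixed-effect, inverse-variance weighted average. This is one common way to combine study-level results and is not necessarily the method ProPASS uses. The function name and the numbers are hypothetical. Every value in the input lists would have to be produced and sent by a local analyst at each site, and again after each change to the analysis script.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Combine per-site estimates with inverse-variance weights (fixed effect)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical regression coefficients and standard errors reported by three sites.
site_estimates = [0.10, 0.20, 0.30]
site_std_errors = [0.1, 0.1, 0.1]

pooled, pooled_se = pool_fixed_effect(site_estimates, site_std_errors)
print(f"pooled estimate: {pooled:.3f}, standard error: {pooled_se:.3f}")
```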
Individual-level meta-analysis (ILMA), often referred to as Federated Data Analysis or Federated Learning, circumvents this problem. Here, secure multi-party computation is used to perform statistical analysis on individual data points across multiple data sites without disclosing identifiable information. In other words, a single person can run the analysis across all the data in the consortium without the need for action from staff at the data sites.
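The core idea can be sketched in a few lines of Python: each site releases only aggregate quantities (count, sum, sum of squares), and the analyst combines them into the same mean and variance that the pooled individual-level data would give. This is a simplified illustration, not the DataSHIELD API and not a secure multi-party computation protocol. The names `SiteSummary`, `summarize_at_site`, `pool_summaries`, and the `MIN_COUNT` threshold are assumptions made for this example.

```python
from dataclasses import dataclass

MIN_COUNT = 5  # hypothetical disclosure threshold, not an actual DataSHIELD setting


@dataclass
class SiteSummary:
    n: int
    total: float
    total_sq: float


def summarize_at_site(values, min_count=MIN_COUNT):
    """Runs at the data site: release aggregates only, never the raw values."""
    if len(values) < min_count:
        raise ValueError("too few records to release a summary")
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pool_summaries(summaries):
    """Runs on the analyst's side: combine aggregates into pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)
    return mean, variance


# Hypothetical measurements held at two sites; they never leave summarize_at_site.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [6.0, 7.0, 8.0, 9.0, 10.0]

mean, variance = pool_summaries([summarize_at_site(site_a), summarize_at_site(site_b)])
print(f"pooled mean: {mean:.2f}, pooled variance: {variance:.2f}")
```

Unlike the study-level approach, the result here is identical to what an analysis of the combined individual-level data would return, and the site-side function can run automatically without a local analyst.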
The ProPASS working group has asked me to help coordinate the implementation of Federated Data Analysis in ProPASS. We identified DataSHIELD and the OBiBa software stack as the most promising technologies to enable Federated Data Analysis. Several other consortia in the life sciences use DataSHIELD, and it has an active developer community.
I am not doing this project on my own, but work closely with a task group of early career researchers: Doua El-Fatouhi, Jairo Migueles, Jonah Thomas, and Esther Smits. Also, Joel Nothman from the Sydney Informatics Hub has been making valuable contributions to our discussions.
So far, we have piloted the installation and usage of DataSHIELD on machines that hold non-confidential dummy data. We will also soon roll out a survey among a sample of ProPASS members to gain better insight into their statistical needs. For the upcoming year, the goal is to set up the infrastructure and to find additional partners who can help with some of its components, such as a user management system and tooling to aid process management.