Published on November 28, 2013 by

Editorial: Opportunities and challenges in data analysis training

Statistical concepts and techniques are applied in virtually every scientific domain. It is therefore not surprising that introductory courses in applied statistics form part of most undergraduate training programmes. As essential as these are, such courses do not, however, prepare postgraduate students and academic staff for research questions and data structures from the real world. The rapid expansion of research and discovery in South Africa leads to greater research opportunities, but also requires interdisciplinary research teams to manage, explore, and analyse large and complex datasets. Sadly, there is an acute shortage of biostatistics expertise in South Africa, and globally for that matter, and few researchers have access to statisticians in their own institutions in this country (1).

Capacity building in data analysis

Since its inception, SACEMA has organised and facilitated many courses on (bio)statistics, both for SACEMA-affiliated students and external participants. In this issue, brief reports on two recent 5-day courses are included: Bayesian Biostatistics, and Joint Modelling of Survival and Longitudinal Data. Through these and previous courses, as well as our financial and mentoring support for MSc and PhD students in statistics, we have attempted to contribute to scientific capacity building in data analysis. Each of the courses that we have organised has been evaluated carefully, using post-course evaluation forms and semi-structured group discussions with participants, as well as through feedback sessions with tutors and lecturers. Along the way we have observed that a large fraction of course participants, across a wide range of academic backgrounds and levels of seniority, work in an error-prone and inefficient fashion: examples include ad-hoc manipulation of data in Excel sheets, application of advanced black-box techniques without thorough prior exploration of the data, and time-consuming manual repetition of commands to construct models and generate tables and figures. Learning to manage, explore and visualise data efficiently not only boosts productivity; it is also critical for data quality assurance and enables one to construct (mental) pictures of the data from various angles, an essential starting point for building statistical models.

In our experience, low competence and poor self-efficacy in the use of statistical software packages are major obstacles to acquiring and expanding expertise in statistical analysis. We have therefore decided to increase our efforts to strengthen hands-on capacity and confidence in data management, exploration and visualisation, using the versatile, open-source package R. Specifically, we plan to offer an intensive one-week course in June 2014, which will include computer practicals with participants’ own data. We hope to achieve a fundamental shift in self-efficacy and empowerment – from “I can’t do it; I need a biostatistician to help me!” to “I can do it; I know where to look for answers!”

Conducting educational research

Besides the immediate outcome of capacity building, the course also offers an opportunity to conduct research in education – a priority area for the South African National Research Foundation. Theoretical skill acquisition models from education science suggest that active-exploratory training is superior in many contexts to classical guided training (2). In guided training, the learner is assumed to be a passive participant, and a comprehensive, step-by-step approach with explicit instructions is used (3). Proficiency comes through repeated practice, and making mistakes is avoided. Active-exploratory training, on the other hand, views trainees as active participants (Bell & Kozlowski, 2008). Far less instruction is given, and errors are viewed as beneficial, since they promote exploration and help develop the know-how to avoid and overcome mistakes (4).

For software training, it is natural to assume the superiority of active-exploratory over guided training, because the former promotes the development of essential skills to keep performance anxiety and frustration at bay during task engagement. However, the performance of active-exploratory training has not yet been assessed for training in data management and visualisation using R.

Furthermore, any effective statistical package training should promote the development of self-help skills and a positive attitude to continued learning, beyond the time window of the initial training course. This process, known as adaptive transfer, may be promoted successfully using an active-exploratory training approach, but again, this has not yet been confirmed in the context of statistical package competence training with R. Lastly, while motivation to acquire statistical package competence can presumably be enhanced by using real-world datasets provided by the course participants, the added value of employing such datasets is yet to be confirmed (5).

In light of these unanswered questions, the course itself will be the subject of an education research project. To maximise the linkage between capacity building and research, course participants will be invited to manage and visualise some of the course monitoring and evaluation data that they provided on days one and four of the course.

If you feel this course is exactly what you have been waiting for, and you want to be part of this exciting initiative, check out the course details and the opening and closing dates for course applications on the SACEMA website.