On The Future Of Statistical Languages

Seth Brown, a data scientist in the telecommunications industry, has recently written an article on his blog entitled ‘On The Future Of Statistical Languages’.

The article analyses the statistical languages currently in use and offers a reasoned prediction of where the field is heading.

There is currently a crossover between programming languages and statistical languages: most statistical languages offer only limited programming features, and vice versa. For this area to develop further, bridges need to be built to improve that crossover and make each area less exclusive, with a view to creating ‘an efficient, modern data analysis workflow’.

Taking the author's example, using languages such as ‘R’ and ‘Python’ and having to switch between them to complete different tasks within the same project is inefficient. Whilst not intending to critique the languages currently on offer, the author goes on to advocate a ‘rich data analysis API on top of a more general open source programming language’.

But what does this mean for the future?

New inventions, technologies and increased reliance on digital devices mean that the amount of data collected is growing exponentially. To this end, the author proposes that statistical languages focus on the statistical side and leave ‘the nuts and bolts’ of language design to experts in that field. The language needs to be easy to understand and approachable for the students, statisticians and scientists using it, so that they can concentrate on the data and the methodology rather than on the language itself.

It should be free, unlike proprietary products such as MATLAB, SPSS and Stata. This prevents monopolization and ensures that data can be shared across platforms and research can be advanced, avoiding what the author calls ‘The Microsoft Word problem’.

New tools should not be limited by domain-specific languages or sunk-cost projects, as the Haskell case has proved.

The author then suggests that, going forward, ‘Python is the most obvious choice’, as it is already widely used and already has tools in place. Even Python, however, needs to improve its accessibility to non-programmers, adopt a more user-friendly environment and reach deeper into academia in order to instigate the construction of the programming-statistical bridge.

‘On The Future Of Statistical Languages’ / Seth Brown. 18th December 2013. On the blog “Dr. Bunsen”.