John Kratz of the California Digital Library recently published an article entitled ‘Fifteen ideas about data validation (and peer review)’.
He describes it as a “longish list of non-parallel, sometimes-overlapping ideas about how data review, validation, or quality assessment could or should work,” and lays out fifteen observations and recommendations to improve the process.
Problems with data validation can arise because academic researchers often publish only raw datasets alongside their articles.
As a result, it can be difficult to assess the reliability and relevance of this data.
Whilst, as the author notes, there are some mechanisms in place to validate data, they are severely lacking in comparison to those in place for citations, for example, where several widely recognised styles already exist.
This is somewhat surprising; data validation is clearly of high importance in assuring the credibility of an academic article, and strong mechanisms, even a standard procedure, should therefore be in place to ensure that data is properly validated.
One theme running throughout Kratz’s ideas is the depth to which data needs to be reviewed: not by one person alone, but with the work divided among people or even organisations. Both data and metadata should be reviewed, not only by other academics, but also by experts in the field, the community and the users of the data. Similarly, beyond mere validation, actual use of the data is a form of review in itself, and helps to confirm whether the data really is relevant, applicable and fit for purpose.
Watch this video from Nature for more information on the subject: