A recent story in Scientific American, "Twitter to Release All Tweets to Scientists", explores the concept of harnessing historical tweets to advance research. This is an interesting notion that would certainly open the door to massive, longitudinal data sets. In theory it could help spot medical trends by analyzing disease progression and outbreaks. But how much more information would this approach really offer over and above what is already in place?
The industry already has the Health Protection Agency in the UK and the Centers for Disease Control and Prevention in the US. Both have direct access to screening and testing services, enabling them to spot potential outbreaks quickly and intervene. This raises the question: how useful is data from Twitter when these organizations are already in place?
One problem that instantly springs to mind is the compliance issue. How would you anonymize the data coming from Twitter? As with translational sciences, the same patient confidentiality issues apply and would need to be overcome. The burden from this compliance issue is huge and will need to be handled with the utmost care in order to avoid infringing people’s right to privacy.
This proposition also raises questions around data trustworthiness. Researchers would have to infer a lot from the language used in a tweet. By contrast, the systems used to conduct medical testing and capture medical information are arguably more robust in validating what is actually being said about the patient, because the data is entered by a trained individual such as a nurse, carer, doctor, surgeon or researcher.
A big data analytics approach to extract any patterns or generate testable hypotheses would also be essential. Harnessing Twitter data to advance research is an interesting proposition, but whether it would work in practice remains to be seen. Just because you can doesn’t mean you should…