
Ethics Issues and Big Data

Big Data is a term that refers to large and diverse sets of data, characterized by the so-called V’s: variability, variety, velocity, veracity, volume, validity, and volatility. In public health, there are many possible uses of Big Data, and three main streams can be identified: traditional medical data, “omics” data, and data from social media. Despite the potential benefits of leveraging Big Data to improve human health, there are myriad unresolved ethical issues to consider such as privacy, consent, transparency, and trustworthiness, as well as the risks of deepening “digital divides” and global power imbalances.

What is Big Data?
Boyd and Crawford (2012) define Big Data as “a cultural, technological, and scholarly phenomenon that rests on the interplay of: 1) Technology: Maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets; 2) Analysis: Drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims; and 3) Mythology: The widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy”.1
However, Big Data is not simply defined by the large volume of its datasets. Currently, other characteristics (V’s) attributed to Big Data can be found in the literature, including:2
Variability: Lack of structure, consistency, and context
Variety: Includes audio files, imagery, numerical data, and text data
Velocity: Real-time processing and very high speed of transmission
Veracity: Accuracy, noise, and uncertainty in data
Volume: Extremely large data sets

In addition, two other V’s are also mentioned:3
Validity: Data correctness and accuracy for the intended use
Volatility: How long are the data valid, and how long are they stored? At what point are the data obsolete for current needs?

How can Big Data be used in health?
Several studies have illustrated the potential of Big Data in diverse health fields including epidemiology, precision medicine, and public health.
For instance, the timely tracking of emerging epidemics can be achieved via online queries on disease symptoms posted on social media platforms like Facebook and Twitter. In clinical trials for precision medicine, large-scale population studies and Big Data analytics can lead to more precise drug development and more accurate treatments.2
In this regard, Hansen et al. (2014) broadly identify three Big Data streams in health:4
• Traditional medical data primarily originating from health systems (e.g., EMRs, personal and family health history, medication history, lab reports, pathology results).
• “Omics” data from large-scale datasets in the biological and molecular fields (e.g., genomics, microbiomics, proteomics, and metabolomics).
• Data from social media (e.g., Twitter, Facebook) and the quantified-self movement (e.g., FitBit, mobile applications, wearable devices).

What are the ethics issues related to Big Data?
The current literature points to several ethics issues in the context of Big Data:
Privacy: In the era of Big Data, the ability to protect individual privacy is limited. This creates potentially important consequences for privacy rules in scientific research.5 As such, the potential benefits and progress of Big Data “cannot be ensured unless the cultural and ethical issues related to patient privacy and the need for individual institutions to maintain some degree of control over the data they collect are addressed.”6
Moreover, while privacy protection is a right, the risk of data security breaches may be increased in settings where legislation is underdeveloped, which is of particular concern for vulnerable populations and communities.7
Consent: Traditionally, consent is obtained for participation in a single study, whereas Big Data aims to reveal unforeseen connections among many data points, involving multiple uses of individuals’ data over time.8 Consequently, researchers are exploring novel consent models such as “broad” and “blanket” consent.
Transparency and Trustworthiness: Richards and King (2013) argue that Big Data creates a “Transparency Paradox” because while it promises to make the world more transparent, “its collection is invisible, and its tools and techniques are opaque.”9 People have the right to know how their data is being used; therefore, transparency and accountability in Big Data must be enhanced.
The commercial use of health data is particularly concerning, due to risks of data misuse, profiteering, and commercialization. Hence, trustworthiness and transparency of processes are key.10
Deepening “Digital Divides” and Global Power Imbalances: Reaching the full potential of Big Data depends on proper infrastructure and human resource capacities to store, process, analyze, and exchange data.
However, because IT infrastructure requires a significant investment of time and money, ensuring these capacities can be challenging in resource-constrained settings.11 Most of the necessary hardware for Big Data is located in high-income countries “and access to information and resources is skewed by a very unequal distribution of telecommunication capabilities to access them.”12

Richards and King (2013) argue that Big Data generates a “Power Paradox” that will create winners and losers, likely benefiting “institutions who wield its tool over the individuals being mined, analyzed, and sorted”.9 Moreover, there are ethical questions related to ensuring data accuracy, limiting bias, and determining which populations benefit from Big Data analytics. For example, findings derived from social media may not be representative because of limited access among minority groups2 or because of digital and data divides between urban and rural areas.13

The Big Data phenomenon continues to take place in an environment of uncertainty and rapid change. Although various studies are underway to analyze the potential benefits and costs of Big Data, critical questions need to be further explored. In particular, we need to know “what all this data means, who gets access to what data, how data analysis is deployed, and to what ends.”1

Documents from multilateral organizations on Big Data, health, and ethics issues:

·       UNESCO. “Report of the IBC on Big Data and Health,” September 15, 2017. https://unesdoc.unesco.org/ark:/48223/pf0000248724

·       United Nations. “UNSDG | Data Privacy, Ethics and Protection: Guidance Note on Big Data for Achievement of the 2030 Agenda,” November 2017. https://unsdg.un.org/resources/data-privacy-ethics-and-protection-guidance-note-big-data-achievement-2030-agenda

·       World Health Organization. “Ethics and Governance of Artificial Intelligence for Health,” June 28, 2021. https://www.who.int/publications-detail-redirect/9789240029200

·       World Health Organization. “Big Data and Artificial Intelligence for Achieving Universal Health Coverage: An International Consultation on Ethics,” October 15, 2018. https://www.who.int/publications-detail-redirect/WHO-HMM-IER-REK-2018-2

1.      Boyd, Danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society 15, no. 5 (June 1, 2012): 662–79. https://doi.org/10.1080/1369118X.2012.678878.

2.      Benke, Kurt, and Geza Benke. “Artificial Intelligence and Big Data in Public Health.” International Journal of Environmental Research and Public Health 15, no. 12 (December 2018): 2796. https://doi.org/10.3390/ijerph15122796.

3.      Asokan, G.V., and Vanitha Asokan. “Leveraging ‘Big Data’ to Enhance the Effectiveness of ‘One Health’ in an Era of Health Informatics.” Journal of Epidemiology and Global Health 5, no. 4 (2015): 311–14. https://doi.org/10.1016/j.jegh.2015.02.001.

4.      Hansen, M. M., T. Miron-Shatz, A. Y. S. Lau, and C. Paton. “Big Data in Science and Healthcare: A Review of Recent Literature and Perspectives.” Yearbook of Medical Informatics 9, no. 1 (August 15, 2014): 21–26. https://doi.org/10.15265/IY-2014-0004.

5.      Schadt, Eric E. “The Changing Privacy Landscape in the Era of Big Data.” Molecular Systems Biology 8 (September 11, 2012): 612. https://doi.org/10.1038/msb.2012.47

6.      Larson, Eric B. “Building Trust in the Power of ‘Big Data’ Research to Serve the Public Good.” JAMA 309, no. 23 (June 19, 2013): 2443–44. https://doi.org/10.1001/jama.2013.5914

7.      Wyber, Rosemary, Samuel Vaillancourt, William Perry, Priya Mannava, Temitope Folaranmi, and Leo Anthony Celi. “Big Data in Global Health: Improving Health in Low- and Middle-Income Countries.” Bulletin of the World Health Organization 93, no. 3 (March 1, 2015): 203–8. https://doi.org/10.2471/BLT.14.139022

8.      Mittelstadt, Brent Daniel, and Luciano Floridi. “The Ethics of Big Data: Current and Foreseeable Issues in Biomedical Contexts.” Science and Engineering Ethics 22, no. 2 (April 2016): 303–41. https://doi.org/10.1007/s11948-015-9652-2

9.      Richards, Neil M., and Jonathan H. King. “Three Paradoxes of Big Data.” Stanford Law Review Online 66 (2013): 41–46.

10.      Dove, Edward S., and Vural Özdemir. “What Role for Law, Human Rights, and Bioethics in an Age of Big Data, Consortia Science, and Consortia Ethics? The Importance of Trustworthiness.” Laws 4, no. 3 (September 1, 2015): 515–40. https://doi.org/10.3390/laws4030515

11.      Roski, Joachim, George W. Bo-Linn, and Timothy A. Andrews. “Creating Value in Health Care Through Big Data: Opportunities and Policy Implications.” Health Affairs 33, no. 7 (July 1, 2014): 1115–22. https://doi.org/10.1377/hlthaff.2014.0147

12.      Luna, D. R., J. C. Mayan, M. J. García, A. A. Almerares, and M. Househ. “Challenges and Potential Solutions for Big Data Implementations in Developing Countries.” Yearbook of Medical Informatics 9, no. 1 (August 15, 2014): 36–41. https://doi.org/10.15265/IY-2014-0012

13.        McDonald, Sean. “Ebola: A Big Data Disaster,” 2016. https://cis-india.org/papers/ebola-a-big-data-disaster

November 2021