Chapter 2 Privacy and Security

AI is revolutionizing business practices and processes at an ever-faster pace. Banks are already using AI tools to assess customer credit scores, and insurance firms are evaluating customer risk profiles based on transaction and payment history.

Marketers are using data not only to understand past trends but also to predict future behavior. Predictive analytics gives brands the ability to automate marketing responses in any given customer situation, such as live web sessions, with intelligent agents connecting the dots and finding products, content, or medical treatments specifically targeted to the consumer. In more advanced applications, AI takes a further step by drawing on unstructured data and handling client interactions through chatbots. All in all, AI is enabling machines to interact with users at an ever more intimate level.

Privacy has traditionally been recognized as a prerequisite for the exercise of human rights such as freedom of expression, freedom of association, and freedom of choice. In the AI era, privacy hinges on our ability to control how our data is stored, modified, and exchanged between different parties. In this chapter, we study the crucial role of data in most modern AI applications. Through a couple of case studies, we explore how data breaches and consent issues can affect the outcomes derived from AI technologies.

2.1 Data is at the core of AI

“The recent advances in key AI capabilities such as deep learning have been made possible by vast troves of data. This data has to be collected and used, which means issues related to AI are closely intertwined with those that relate to privacy and data.” – (main?)

AI offers new capabilities, but these capabilities also have the potential to breach privacy regulations in new ways. If AI technologies can re-identify anonymized data, for example, this has repercussions for what data organizations can safely use.

2.2 Identification and tracking

AI enables the ever more efficient identification of individual persons by both public and private entities. It can be used to identify, track, and monitor individuals across multiple devices, whether they are at work, at home, or in a public location. Even if your personal data is anonymized within a large data set, it can potentially be de-anonymized (re-identified) based on inferences drawn from other devices and data sources.
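A minimal sketch of how such re-identification can work in practice is shown below. It uses small, invented pandas DataFrames: one "anonymized" dataset with names removed, and one auxiliary dataset (for example, a public electoral roll) that shares quasi-identifiers such as postcode, birth date, and gender. All names and values are hypothetical and purely illustrative.

    # A toy linkage attack: re-identifying records in an "anonymized" dataset
    # by joining it with an auxiliary dataset on shared quasi-identifiers.
    # All data below is synthetic and purely illustrative.
    import pandas as pd

    # "Anonymized" health records: direct identifiers (names) have been removed.
    anonymized = pd.DataFrame({
        "postcode":   ["2000", "2000", "3056"],
        "birth_date": ["1985-03-12", "1990-07-01", "1985-03-12"],
        "gender":     ["F", "M", "F"],
        "diagnosis":  ["asthma", "diabetes", "hypertension"],
    })

    # Auxiliary public dataset (e.g., an electoral roll) containing names
    # alongside the same quasi-identifiers.
    auxiliary = pd.DataFrame({
        "name":       ["Alice Smith", "Bob Jones"],
        "postcode":   ["2000", "2000"],
        "birth_date": ["1985-03-12", "1990-07-01"],
        "gender":     ["F", "M"],
    })

    # Joining on the quasi-identifiers re-attaches names to "anonymous" records.
    reidentified = anonymized.merge(
        auxiliary, on=["postcode", "birth_date", "gender"], how="inner"
    )
    print(reidentified[["name", "diagnosis"]])

Even though no single quasi-identifier is identifying on its own, their combination is often unique enough to single out an individual once another data source is available for linkage.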

2.2.1 Case study: Identification through geo-profiling

The following case study is extracted from Chapter 3.3.2 of (main?).

Locating people via geo-profiling

A recently published study used geo-profiles generated from publicly available information to (possibly) identify the artist Banksy, who has chosen to remain anonymous. The study was framed as an investigation into the use of geo-profiling to solve a "mystery of modern art." The authors suggest that these methods could also be used by law enforcement to locate terrorist bases based on the graffiti terrorists leave behind.

2.2.2 Case study: The usage of biometric data

Voice recognition and facial recognition are two common, scalable methods of biometric identification at which AI is becoming increasingly adept. Identification of individuals is sometimes the desired outcome, when aligned with ethical principles (for example, in detecting fraud, money laundering, or terrorist financing). However, these methods have the potential to severely compromise anonymity in the public sphere. For example, law enforcement agencies could use facial and voice recognition to find individuals without probable cause or reasonable suspicion, thus circumventing legal procedures they would otherwise have to uphold.

The following case study is extracted from the article at https://www.theguardian.com/technology/2014/may/04/facial-recognition-technology-identity-tesco-ethical-issues.

Targeted Marketing through Face Recognition

Advertisements are being made more personal by using facial recognition technology, and in 2014 several companies were already bringing these ideas to (digital) life. UK grocer Tesco, for example, introduced a digital screen outside its Express store in Lincoln, allowing it to promote its own ranges and engage with visiting customers. The screens use inbuilt cameras equipped with facial recognition algorithms to estimate the age and gender of individual shoppers.

A Californian startup called Emotient, meanwhile, focuses on facial expression analysis. Incorporated into next-generation TVs by way of a webcam, this technology could potentially be used to monitor viewers' engagement levels with whatever entertainment is placed in front of them.

2.3 Privacy breach through inference

While explicit data is recorded about us, and we expect that data to be kept secure, implicit data is also available about us to anyone with the predictive tools to infer it.

Explicit vs. Implicit data

  • Explicit data is information that is provided intentionally, for example through surveys and membership registration forms.
  • Implicit data is information that is not provided intentionally but is gathered from available data streams, either directly or through analysis of explicit data.
    Source: https://whatis.techtarget.com/definition/implicit-data
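To make the distinction concrete, the sketch below trains a toy classifier that infers an implicit attribute from explicit purchase records, assuming scikit-learn is available. The customers, the purchase counts, and the "expecting a baby" label are all invented; the point is only that a simple model can turn explicit data into implicit, and potentially sensitive, inferences.

    # Toy illustration: inferring an implicit attribute from explicit purchase data.
    # All records and the target label are synthetic; no real retailer data is used.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Explicit data: counts of purchases in three product categories
    # [unscented_lotion, vitamin_supplements, cotton_wool] per customer.
    purchases = np.array([
        [5, 4, 6],
        [4, 5, 5],
        [0, 1, 0],
        [1, 0, 1],
        [6, 3, 4],
        [0, 0, 1],
    ])
    # Implicit attribute the retailer wants to infer (1 = likely expecting a baby).
    label = np.array([1, 1, 0, 0, 1, 0])

    model = LogisticRegression().fit(purchases, label)

    # The model infers the implicit attribute for a new customer
    # from nothing more than their explicit purchase history.
    new_customer = np.array([[4, 6, 5]])
    print(model.predict_proba(new_customer)[0, 1])  # probability of the implicit label

The case studies that follow show this kind of inference operating at scale, on social media activity and retail transaction histories.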

2.3.1 Case study: Social media and the loss of confidentiality

Watch the following TED talk by Jennifer Golbeck, discussing how researchers are able to predict many things about users from their social media activity.

2.3.2 Case study: Retailers’ Predictions

Every time we go shopping, we implicitly share intimate details about our consumption patterns with retailers. Many of those retailers are studying those details to figure out what we like, what we need, and which coupons are most likely to make us happy.
Watch the following video, in which Charles Duhigg details how some retailers profit by predicting major changes in our lives.

2.3.3 Case study: Facebook manipulating the mood states and perceptions of users

A controversial research study (Kramer8788?) on "emotional contagion," published in a peer-reviewed journal, used Facebook's platform to demonstrate that users' moods can be manipulated by filtering their feeds (comments, videos, pictures, and web links posted by their Facebook friends). The social media company altered the news feeds (the main page users land on for a stream of updates from friends) of nearly 700,000 users. Feeds were changed to show more "positive" or more "negative" content, to determine whether seeing more sad messages makes a person sadder. The study showed that reducing exposure to positive content led users to post fewer positive posts, and the same pattern occurred for negative content. The study was publicly criticized for failing to gain informed consent from Facebook users. Setting aside the issue of informed consent, however, the study highlights the power of AI-driven "filtering practices" to shape user mood, which can be used to enhance the impact of targeted advertising.

The above case study references the following articles:

2.4 Data breaches/leakage

A data breach happens when personal information is accessed or disclosed without authorization, or is lost. For example, when:

  • a USB or mobile phone that holds an individual’s personal information is stolen
  • a database containing personal information is hacked
  • someone’s personal information is sent to the wrong person

A data breach can harm the individuals whose personal information is affected; they can, for example, suffer distress or financial loss. For organizations, data breaches can cause devastating financial losses and damage their reputation for years. From lost business to regulatory fines and remediation costs, data breaches have far-reaching consequences. Refer to the figures highlighted in the Cost of a Data Breach Report 2019, published by IBM.

Read through the following section in (main?) to understand more about data breaches, along with a case study about the Equifax data breach.

PRESCRIBED READING  
Chapter 3.2 (main?)

2.5 Open data sharing and re-identification

Open data, especially open government data, is a tremendous resource that is as yet largely untapped. A fundamental aspect of open data is that it is made available in formats and under licensing that allow others to re-use and remix it. Many individuals and organizations collect a broad range of different types of data to perform their tasks. Government is particularly significant in this respect, both because of the quantity and centrality of the data it collects, and because most of that government data is public data by law, and could therefore be made open and available for others to use.

There are many benefits to open data, and there are already a large number of areas where it is creating value. The most commonly cited benefit is transparency: open data supports public oversight of governments and helps reduce corruption by making it easier to monitor government activities, such as tracking public budget expenditure and its impacts. You can explore many other benefits of open data:

In general, these data sets are released after applying anonymization techniques to protect the privacy of the people they describe, e.g., by removing personally identifiable information (PII) such as names, addresses, and social security numbers. This assurance of privacy then allows the government to legally share limited data sets with third parties without requiring written permission. Such data has proved very valuable for researchers, particularly in health care. However, there are ethical issues to consider, because many of these forms of data can become re-identifiable given the capabilities of AI: in some scenarios, machine learning models can detect patterns and infer personal information from non-personal data. This information can be exploited in unethical ways that infringe on the right to privacy.
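One simple way to gauge this residual risk before releasing a dataset is to check its k-anonymity: the size of the smallest group of records sharing the same combination of quasi-identifiers. The sketch below, using pandas and invented records, computes that value; a k of 1 means at least one person is unique in the release and therefore exposed to the kind of linkage attack illustrated earlier.

    # Estimating re-identification risk of a release via k-anonymity.
    # A record is risky if its combination of quasi-identifiers is rare.
    import pandas as pd

    release = pd.DataFrame({
        "postcode":  ["2000", "2000", "2000", "3056", "3056"],
        "age_band":  ["30-39", "30-39", "40-49", "30-39", "30-39"],
        "gender":    ["F", "F", "M", "F", "F"],
        "condition": ["asthma", "flu", "diabetes", "asthma", "migraine"],
    })

    quasi_identifiers = ["postcode", "age_band", "gender"]
    group_sizes = release.groupby(quasi_identifiers).size()

    k = group_sizes.min()
    print(f"Dataset is {k}-anonymous on {quasi_identifiers}")
    print("Unique (high-risk) combinations:")
    print(group_sizes[group_sizes == 1])

Checks like this are only a first step: as the following case study shows, even datasets that look well protected can be undone by weaknesses in how identifiers were transformed.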

2.5.1 Case Study: De-identified Medicare and PBS open data

The following case study is extracted from the article at https://www.oaic.gov.au/privacy/privacy-decisions/investigation-reports/mbspbs-data-publication/

Re-identification possible with Australian de-identified Medicare and PBS open data

On 1 August 2016, the Department of Health published on data.gov.au a collection of Medicare Benefits Schedule and Pharmaceutical Benefits Schedule related data. The data consisted of claims information for a 10% sample of people who had made a claim for payment of Medicare Benefits since 1984, or for payment of Pharmaceutical Benefits since 2003.

The Department of Health knew that this data would be extremely valuable for medical research and policy development purposes. It appears that the Department’s decision to release the data was made in good faith for the public interest, on an understanding that the privacy interests of all relevant individuals had been protected.

A range of steps were taken by the Department of Health to de-identify the dataset before its public release. However, one month after the dataset was published, Drs Chris Culnane, Benjamin Rubinstein, and Vanessa Teague of the University of Melbourne identified a weakness in the technique used to encrypt Medicare service provider numbers in the dataset, allowing the encryption to be reversed. This created the potential for Medicare service providers referred to in the data to be identified. After this discovery, the dataset was further examined by an interagency task force involving experts from the Australian Bureau of Statistics and Data61, and separately by Drs Culnane, Rubinstein, and Teague. This analysis identified that the detailed nature of the information in the dataset created a risk that some individuals could be identified by linking the dataset with other information sources.
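The investigation report does not detail the exact encryption scheme, but the general weakness is easy to illustrate: when identifiers are drawn from a small space and protected with a deterministic, unkeyed (or guessable) transformation, an attacker can enumerate every possible identifier, apply the same transformation, and build a reverse lookup table. The sketch below uses a plain SHA-256 hash over hypothetical six-digit provider numbers; it is not the scheme used in the MBS/PBS release, only an illustration of why "encrypted" small-keyspace identifiers can be reversible.

    # Why deterministic protection of small-keyspace identifiers is reversible:
    # enumerate every possible identifier, apply the same transform, and invert it.
    # SHA-256 stands in for whatever deterministic scheme might be used; the
    # provider numbers here are hypothetical.
    import hashlib

    def protect(provider_number: int) -> str:
        """Deterministic, unkeyed transform applied before data release."""
        return hashlib.sha256(str(provider_number).encode()).hexdigest()

    # Suppose provider numbers are 6-digit integers: only 10**6 possibilities.
    lookup = {protect(n): n for n in range(1_000_000)}

    # A "protected" value seen in the released dataset can now be reversed.
    released_value = protect(123456)   # as it would appear in the data
    print(lookup[released_value])      # -> 123456

Robust protection of identifiers therefore requires either a secret key that the attacker cannot guess or, better, replacing the identifiers with random values that have no computable relationship to the originals.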

2.6 Privacy Protection and Security

When working with personal data, getting the consent process right is fundamental to protecting privacy. Due to the sensitive nature of personal data, consent should be adequately addressed at the point of data collection. According to (main?), the Privacy Act stipulates that consent may be express or implied and that it must satisfy four key conditions:

  • The individual is adequately informed before giving consent
  • The individual gives consent voluntarily
  • The consent is current and specific
  • The individual has the capacity to understand and communicate their consent

In addition, best practices in data de-identification and in the design and use of data analytics are required to ensure effective data governance in line with the Privacy Act.

Data Governance

“Data governance is crucial to ethical AI; organizations developing AI technologies need to ensure they have strong data governance foundations or their AI applications risk being fed with inappropriate data and breaching privacy and/or discrimination laws.” – (main?)

According to the AI Ethics Framework, the principle for ensuring privacy protection and data security is:
Privacy protection and security

Throughout their lifecycle, AI systems should respect and uphold privacy rights and data protection, and ensure the security of data.

This principle aims to ensure respect for privacy and data protection when using AI systems. This includes ensuring proper data governance and management for all data used and generated by an AI system throughout its lifecycle, for example by maintaining privacy through appropriate anonymization of the data the system uses. Further, the connection between data, and the inferences drawn from that data by AI systems, should be sound and assessed on an ongoing basis.

This principle also aims to ensure appropriate data and AI system security measures are in place. This includes the identification of potential security vulnerabilities and assurance of resilience to adversarial attacks. Security measures should account for unintended applications of AI systems, and potential abuse risks, with appropriate mitigation measures.

2.6.1 Privacy Protection and Security Technologies

With vast amounts of data being collected on individuals, protecting privacy, and knowing when privacy has been compromised, is crucial.

Read the following article, published in Forbes in October 2017, to learn more about common data security and privacy technologies:

2.6.1.1 The Federated Learning approach

The need for privacy protection has attracted significant attention from the AI research community, leading to recent advances in privacy-preserving, secure machine learning and artificial intelligence systems. One example is federated learning: a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging those samples. This approach stands in contrast to traditional centralized machine learning, where all data samples are uploaded to one server, as well as to more classical decentralized approaches, which assume that local data samples are identically distributed.

Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security, data access rights, and access to heterogeneous data. Its applications span a number of industries, including defense, telecommunications, IoT, and pharmaceuticals.
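A minimal sketch of the idea, using only NumPy and synthetic data, is shown below: each simulated client fits a linear model on its own local data for a few gradient steps, and only the resulting model weights, never the raw samples, are sent to a server that averages them (the federated averaging approach). Production frameworks such as TensorFlow Federated or Flower add secure aggregation, communication, and much more; this sketch only conveys the training pattern.

    # Minimal federated averaging (FedAvg) sketch on synthetic data.
    # Clients train locally on their own data; only model weights are shared.
    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0])

    def make_client_data(n):
        """Synthetic local dataset: y = X @ true_w + noise."""
        X = rng.normal(size=(n, 2))
        y = X @ true_w + 0.1 * rng.normal(size=n)
        return X, y

    clients = [make_client_data(50) for _ in range(5)]

    def local_update(w, X, y, lr=0.1, epochs=10):
        """A few steps of gradient descent on one client's private data."""
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        return w

    # Server loop: broadcast global weights, collect local weights, average them.
    global_w = np.zeros(2)
    for _ in range(10):
        local_weights = [local_update(global_w, X, y) for X, y in clients]
        global_w = np.mean(local_weights, axis=0)

    print("learned weights:", global_w)  # close to true_w, without pooling any raw data

Note that the raw samples never leave the clients; only the weight vectors are communicated, which is what makes the approach attractive when data cannot be centralized for privacy or regulatory reasons.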

The following online comic from Google gives a straightforward interpretation of Federated Learning: