
Artificial Intelligence and Data Regulations

Can stricter privacy laws provide a lifeline for European companies?


Summary

A popular narrative is that stricter data protection laws in Europe put European AI- and data-driven companies at a disadvantage vis-à-vis international tech giants like Google, Facebook and Alibaba, which are not subject to similar restrictions in their home markets. In this paper, we propose an alternative narrative, namely that stricter European data protection laws may actually prove to be beneficial for European firms.

Two terms that quickly come up in any discussion about Fintech are big data and Artificial Intelligence. And with good reason. Due to recent advancements in data storage, collection and processing, combined with new applications of AI algorithms, services such as algorithmic trading, chatbots, voice recognition and automatic translation have gone from science fiction to an app in your pocket. While such technological advances have opened up a sea of opportunities, they also bring with them some challenges. And one of the biggest challenges faced by many firms in the financial sector is the increasing threat that international tech giants like Google, Facebook and Alibaba, which possess both really big data and superior capabilities in analysing them, have set their sights on markets in the European financial sector (Angelshaug, Knudsen and Saebi, 2019).

One popular narrative concerning this threat is that the stricter European data protection laws create a disadvantage for European firms vis-à-vis the international tech giants. The reason is that European companies will have more restrictions on the data they can use and what they can use them for, which will ultimately hamper their ability to train AI algorithms and develop AI-driven business models that stand up to competition. While this narrative seems intuitively plausible, there is also an alternative, more hopeful story that could be told. This alternative narrative is that stricter European data protection laws can actually prove to be an advantage for European firms. The logic here is that stricter European data regulations will force European companies to develop capabilities, algorithms and business models that will differ from those developed by technology giants that mainly operate in markets with looser data regulations. And if core capabilities, algorithms and business models needed to effectively compete in Europe are distinctively different from those needed in other markets, this will raise the entry barriers. In essence, this will buy European companies time to build advantages that are harder for international newcomers to overcome.

The purpose of this paper is to look more closely at this alternative narrative, and discuss whether and to what degree the stricter European data regulations might actually be beneficial for European companies in their anticipated battle with the American and Chinese tech giants. To address this issue, we first give a brief introduction to data theory, before providing a general introduction to AI and its history. Then we give an overview of some key aspects of European data protection laws and the potential restrictions they pose on AI-powered services and business models, followed by an argument for why we believe the tighter EU data regulations might be good news for European companies.

BIG DATA AND AI

What are data?

Data refers to facts and statistics collected for reference or analysis (Oxford English Dictionary, 2019). According to classic information theory, data are the raw material used to generate information and knowledge (Rowley, 2007). Individual data points begin to generate meaningful insights – information – when they are combined in ways that make sense semantically. For example, the four words hot, cat, dog, eats tell us next to nothing, but if we use them to form the phrase “cat eats hot dog”, the data points are combined in a meaningful way and we have generated information. Furthermore, when information is logically connected to other chunks of information (e.g. other phrases and sentences), we are approaching knowledge, which can provide valuable input for decisions.

In the early phases of digitalisation, data did not play a dominant role in many business models, for the simple reason that there were technical limitations on computing power, data storage and data processing.

In the late 90s, the technical limitations associated with data slowly started to evaporate. Improved tools for collecting, structuring, storing, retrieving and analysing data opened up a multitude of new applications, fuelling the rise of Web 2.0 (O’Reilly, 2007) and the advent of e-business and e-commerce (Chaffey, 2007). However, the increasing volume, velocity and variety of data that had started to become available also created a set of new challenges for firms.

One challenge in particular was that a large share of potentially useful data remained of limited use. The reason was that many of the “new” types of data being produced were unstructured, meaning that existing data processing paradigms and corresponding tools were unable to exploit them to generate information and knowledge (Beath et al., 2012; Feldman and Sanger, 2007). Accumulation and processing of unstructured data was contingent on further technological advancements in computational power and analytical tools, such as big data analysis techniques and what we today refer to as deep learning AI models, which are currently considered the most promising machine learning approach (Brynjolfsson and McAfee, 2012; McAfee and Brynjolfsson, 2013; Witten et al., 2016).

A history of AI

While many think of AI as a fairly recent phenomenon, its origins can be dated back to the mid-20th century. Originally, AI was thought to be a straightforward application of computer power, a software engineering project like any other. Famously, the MIT professors Seymour Papert and Marvin Lee Minsky assigned image recognition as a summer programming project for undergraduates in 1966 (Papert, 1966).

Once researchers came to understand that AI was a hard problem, they struggled to make progress. While the early years of AI research (1956–1966) were characterised by rapid progress and high levels of enthusiasm, developments experienced a bumpier ride over the next 30 years. Researchers experienced regular setbacks, which led to so-called AI winters, longer periods when funding and interest in AI research suffered considerable blows.

1950s–1990: Symbolic AI

For much of the period from 1956 to 1993, the leading paradigm in AI research was known as symbolic AI. The idea here was to first reduce human cognition to the manipulation of symbols, and then implement that symbolic manipulation on a computer. The inspiration was formal logic, which captured a portion of conscious reasoning that dated all the way back to Euclid and Aristotle. A key appeal of symbolic AI is that its reasoning can be explained: the computer can not only make a decision, but also explain how it arrived at it. Symbolic AI also had intellectual appeal because of its connections with mathematics. The approach was so dominant that it was nicknamed GOFAI, for Good Old-Fashioned AI (Luger, 2005; Walmsley, 2012).

A simple symbolic approach to automatic translation would be to look up each word in a bilingual dictionary, but as anyone who has ever learned a foreign language knows, the correct translation can depend on context. One source of context is grammatical – a noun is different from a verb. The symbolic approach to grammar is to parse a sentence into a subject and predicate, and parse the subject and predicate into parts of speech such as nouns, verbs and adjectives. While this approach works for simple problems, linguists have struggled to explicitly describe the rules of grammar. As an illustration, the most elaborate attempt to describe the rules of English, the Cambridge Grammar of the English Language, is 1860 pages long!
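
To make the contrast with later approaches concrete, here is a minimal sketch of a rule-based translator in the symbolic spirit. The dictionary, the phrase rule and the example sentence are our own illustrations rather than anything from the literature; the point is simply that every decision is taken by an explicit, human-written rule that can be inspected and explained.

```python
# A toy rule-based translator in the symbolic spirit: every decision is made by
# an explicit, hand-written rule, so the system can always report which rule it
# applied. Dictionary, phrase rule and example are invented for illustration.

DICTIONARY = {"cat": "katt", "eats": "spiser", "hot": "varm", "dog": "hund"}
PHRASE_RULES = {("hot", "dog"): "pølse"}  # multi-word expressions need their own rule

def translate(sentence: str) -> str:
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Rule 1: check hand-written phrase rules before single-word lookups.
        if i + 1 < len(words) and (words[i], words[i + 1]) in PHRASE_RULES:
            out.append(PHRASE_RULES[(words[i], words[i + 1])])
            i += 2
        else:
            # Rule 2: fall back to word-for-word dictionary lookup.
            out.append(DICTIONARY.get(words[i], f"<{words[i]}?>"))
            i += 1
    return " ".join(out)

print(translate("cat eats hot dog"))  # -> "katt spiser pølse", not "... varm hund"
```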

1990s–2010s: Machine learning

An alternative approach to symbolic AI is to formally model the learning process itself. This is what we refer to as machine learning. While the term machine learning was coined as early as the 1950s, and many of the ideas in use today can be traced back to the 1960s and 1970s, it was not until the 1990s that machine learning applications really started to gain traction. In contrast to symbolic AI, machine learning relied heavily on statistics and probability theory – and large amounts of data. Consider how people learn a language: rather than starting from explicit grammatical rules (as in symbolic AI), people learn the grammar of their native dialect not by consciously learning a multitude of rules, but by example. We know which sentences are grammatical because they “sound right”, for reasons that can be hard to articulate.

Machine learning algorithms mimic this logic, and for a long time the leading machine learning approach was for researchers to design a statistical model appropriate for the application domain, and then fit that model using data. That is, the model used statistics and probability theory to make predictions based on patterns in data about what “sounded most likely to be right”. Such statistical approaches also allow for imperfect fits, and are more resistant to mistakes, while a more rule-based approach is necessarily more fragile.
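
As a concrete illustration of this statistical style, the sketch below fits a tiny bigram language model to a handful of invented sentences and uses it to pick the word order that “sounds most likely”. The corpus, the smoothing and the candidate sentences are our own simplifications; real systems of this era were trained on far larger corpora with more sophisticated models.

```python
from collections import Counter
from itertools import permutations

# A tiny statistical language model: estimate bigram frequencies from a handful
# of example sentences and use them to pick the word order that "sounds most
# likely". The corpus and smoothing are deliberately simplistic.
corpus = [
    "the cat eats the hot dog",
    "the dog eats the cat food",
    "the cat sleeps",
]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def score(words):
    """Product of add-one-smoothed bigram frequencies, starting from <s>."""
    p = 1.0
    for a, b in zip(["<s>"] + words, words):
        p *= (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
    return p

candidates = [list(c) for c in permutations("cat the eats".split())]
print(max(candidates, key=score))  # ['the', 'cat', 'eats'] scores highest
```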

While such machine learning approaches represent considerable advancements, they still require substantial software engineering and more computational power (more powerful hardware). The model must be carefully designed for the domain. For example, a model for grammar must still know that sentences are divided into words, and that words are put together into hierarchical units such as “direct object” or “noun phrase”, even if the exact correspondence is learned statistically. The approach can also require considerable feature engineering. A feature is an aspect of the data that is not readily visible, but easy to extract. Feature engineering is the art of identifying the useful features and extracting them. A simple example of feature engineering is breaking the word “jumping” into the verb root and the suffix. Voice recognition, in particular, required sophisticated feature engineering to separate the relevant speech information from the ambient sound.
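
The following sketch illustrates feature engineering in this spirit: a hand-written extractor that splits a word such as “jumping” into a root and a suffix, producing features a statistical model could then use. The suffix list and feature names are our own illustrative choices.

```python
# A minimal hand-crafted feature extractor: split a word into a root and a
# suffix, the kind of feature a statistical model could then learn from.
# The suffix list and feature names are illustrative choices only.
SUFFIXES = ["ing", "ed", "s"]

def extract_features(word: str) -> dict:
    features = {"word": word, "length": len(word)}
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            features["suffix"] = suffix
            features["root"] = word[: -len(suffix)]
            break
    return features

print(extract_features("jumping"))
# {'word': 'jumping', 'length': 7, 'suffix': 'ing', 'root': 'jump'}
```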

2010s: Deep learning and neural networks

As AI has progressed, it has come to rely more and more on data – which makes intuitive sense because more and more data are available (Dean, 2014). The latest advance that big data makes possible is to replace the careful engineering of models and features with very flexible general-purpose models that are trained on enormous amounts of data. The current generation of AI models, based on deep learning, represents the endpoint of that progression. Deep learning replaces a specialised model with a general deep neural network that encodes only the most high-level information (Goodfellow, Bengio and Courville, 2016). For example, a neural network for image recognition will know which parts of a picture are near each other, while a neural network for automatic translation will know that it takes in a sequence of words and that it outputs a sequence of words. Everything else is learned by example.

While deep learning with neural networks today is the state-of-the-art machine learning technique, neural networks actually predate the invention of the computer, and were introduced as a simple model of the human brain by McCulloch and Pitts in 1943 (McCulloch and Pitts, 1943). Since then, the field of AI has had a stormy relationship with the idea.

A neural network consists of a collection of nodes, which are roughly analogous to neurons. Input nodes represent information coming into the neural network (such as text, images, or sound), while output nodes represent the conclusion the network draws from its input. There can also be one or more hidden layers, which represent internal processing. In the simplest form of network, the input layer is directly connected to the output layer. In more complicated networks, the input layer is connected to a hidden layer, which can be connected to another hidden layer, etc., which is finally connected to the output layer. Information flows through the connections. A neural network is “deep” if it has many hidden layers, where each successive layer takes the output of the previous layer's learned transformations (special math-based “filters”) and uses it as the input to the next layer.
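
The minimal sketch below (our own, with arbitrary layer sizes and random, untrained weights) shows this layered structure in code: an input vector flows through two hidden layers to an output layer.

```python
import numpy as np

# Forward pass through a small fully connected network: an input layer of 4
# nodes, two hidden layers of 8 nodes each, and an output layer of 2 nodes.
# Weights are random and untrained; training (back-propagation) is omitted.
rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Send information through the connections, layer by layer."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:    # hidden layers apply a non-linear activation
            x = np.maximum(x, 0.0)  # ReLU
    return x

print(forward(np.array([0.5, -1.0, 2.0, 0.1])))  # two raw output values
```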

Neural networks, like other methods for machine learning, by themselves do not know how to learn from example. Solving the learning problem is why it took so long for neural networks to become the dominant machine learning technique. Along the way they were abandoned several times. A method to train neural networks without hidden layers was invented in the 50s, but by the end of the 60s it was clear that many relationships required hidden layers, which led to neural networks being abandoned. In the 70s a method to train neural networks with one hidden layer was invented (Gurney, 2014; Aggarwal, 2018). In theory, one hidden layer is enough, but in practice it frequently is not. The limitations of the one-hidden-layer approach led to neural networks being abandoned again in the 90s. All this changed around 2010, when incremental advances in computational power and technique made it possible to train deep neural networks. They quickly superseded previous techniques for image recognition, voice recognition and automatic translation.

The secret to the success of deep neural networks is that the hidden layers allow the network to pass a complex problem through a multi-layered “tunnel” of extremely large numbers of “filters”, where each filter is a learned mathematical transformation adapted to the problem at hand (such as multi-class classification or non-linear regression). For example, in image recognition, as the input passes through successive sets of filters, small details of the picture, such as “this is an edge”, are picked out at the lower levels of the neural network, while big features, such as “this is a cat”, are picked out at the higher levels. The most remarkable property is that this behaviour is not programmed into the network – the network learns it from examples. This is only possible because networks today can be trained on tremendous amounts of data.
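
For the image recognition example, the sketch below defines a small convolutional network of the kind described. The architecture and input size are arbitrary illustrative choices of ours, and the network is untrained; the comments indicate the roles the layers typically take on once trained.

```python
import torch
from torch import nn

# An (untrained) convolutional network for 28x28 grey-scale images: the early
# convolutional layers apply many small filters that, once trained, typically
# respond to low-level details such as edges, while later layers combine them
# into higher-level features used for the final classification.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level filters (e.g. edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # combinations of low-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # scores for 10 classes, e.g. "cat"
)

scores = model(torch.randn(1, 1, 28, 28))        # one random 28x28 "image"
print(scores.shape)                              # torch.Size([1, 10])
```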

AI, BIG DATA AND DATA REGULATIONS

The technological progress of AI, combined with advancements in data storage, processing and generation, means that technology now has evolved to a point where virtually all kinds of structured and unstructured data can potentially be stored, processed, analysed and combined (Zikopoulos & Eaton, 2011). This has opened up a sea of opportunities for both established actors and newcomers in the financial sector. One set of opportunities concerns automating or improving services and activities that are traditionally conducted by humans. Even as we speak, algorithms are trading on stock exchanges all over the world; they automatically identify credit card scams, provide consumers with investment management and advice, and are even used by investment banks to detect rogue traders before they go astray. Another set of opportunities is related to the pursuit of new services and business models that were previously practically infeasible, or even science fiction. For example, AI and big data can be used to tailor services and products to individual consumers’ preferences and to handle customer contact via chatbots, plus a whole range of other applications that are seemingly only limited by our imagination.

But the technological advancements also come with some challenges. One particular and frequently mentioned challenge currently faced by many firms in the financial sector is that international tech giants like Google, Facebook and Alibaba, which have both really big data and superior capabilities in analysing them, have an unprecedented opportunity to dominate any sector of the economy they set their sights on – including finance. Angelshaug et al. (this issue) go as far as to say that business model innovations from such existing actors with large data advantages in other markets may represent one of the greatest threats currently faced by incumbent financial firms.

Another related challenge is that European companies face stricter data regulations than their American and Chinese counterparts. While American and Chinese companies have had a largely free hand in collecting data on their domestic clients, Europe has moved in a different direction. Privacy concerns motivated the European Union to implement the General Data Protection Regulation (GDPR) in 2018, and in 2019 the EU also released a set of ethical guidelines for “trustworthy AI”, explicitly advocating a “human-centric” approach to AI.

The General Data Protection Regulation (GDPR)

The GDPR was introduced in May 2018, and replaced the EU’s existing Data Protection Directive. The purpose of the GDPR, as stated by the EU, is threefold. First, to harmonise data privacy laws across Europe; second, to protect and empower the data privacy of EU citizens; and third, to change how firms and organisations approach data privacy within the EU.

While the main purpose of the GDPR is to protect EU citizens, its implementation effectively places a number of constraints on how AI can be used in Europe. It limits the data firms can collect, how they can use those data, and the extent to which algorithms can make automated decisions, and it also increases compliance costs. It is beyond the scope of this paper to give a detailed overview of the GDPR and how the European regulations differ from regulations elsewhere. Instead we highlight two central aspects of the regulation that are particularly relevant to our discussion, in order to exemplify how stricter European data regulations restrict the use of big data and AI by firms operating in Europe.

The first is that the GDPR requires firms to limit the amount and type of data they collect about individuals, and to refrain from using data for purposes other than those for which they were originally collected. The first part is self-explanatory, but the latter restriction is more interesting. Recalling the earlier distinction between data and information, this requirement essentially means that data cannot be used to generate information (or knowledge) beyond what the data were originally collected for. If a firm wants to use data for purposes other than those the users have consented to, it needs additional consent. To illustrate the consequences, think of Facebook. While the original purpose of Facebook’s data collection was to track our personal and social media activity in order to provide advertisers with access to micro-segments, the data in Facebook’s possession can be used for so much more. Researchers have shown that data from Facebook can be used to make accurate predictions of a person’s psychological profile (Youyou, Kosinski and Stillwell, 2015) and likelihood of depression (De Choudhury et al., 2014), and can also serve as a basis for assessing credit risk, pricing insurance, and so on (De Cnudde et al., 2019). The list could continue indefinitely. Under the GDPR, such innovative use of data is forbidden unless the user has given specific consent to the data being used for that purpose. In practice, this means either that data-driven innovation and experimentation will be curbed, and/or that a firm needs to approach its customers with updated “terms of use” on a regular basis – which is not popular with customers. Combined with the requirement that firms must limit the amount and type of data they collect on individuals, this aspect of the GDPR implies that certain areas of the big data opportunity space are out of reach in Europe.
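
To make the purpose-limitation idea concrete, the sketch below shows one simple way a firm could enforce it in code: personal data are tagged with the purposes the user has consented to, and any attempt to process them for another purpose is rejected. The class, field and purpose names are our own hypothetical illustrations, not part of the GDPR or of any real system.

```python
from dataclasses import dataclass, field

# One naive way to enforce purpose limitation in code: personal data carry the
# purposes the user consented to, and any other use is rejected. All names and
# purposes here are hypothetical.
@dataclass
class PersonalData:
    user_id: str
    consented_purposes: set = field(default_factory=set)
    payload: dict = field(default_factory=dict)

class PurposeNotConsentedError(Exception):
    pass

def process(data: PersonalData, purpose: str) -> None:
    if purpose not in data.consented_purposes:
        # New uses require new consent, not silent re-use of existing data.
        raise PurposeNotConsentedError(
            f"user {data.user_id} has not consented to '{purpose}'"
        )
    ...  # the actual processing for the consented purpose would go here

record = PersonalData("u-123", consented_purposes={"targeted_advertising"})
process(record, "targeted_advertising")          # allowed
try:
    process(record, "credit_scoring")            # not consented: rejected
except PurposeNotConsentedError as err:
    print(err)
```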

The second core aspect of the GDPR we want to highlight is that it restricts the algorithmic automation of decisions that have a considerable impact on individuals. This essentially means that “intelligent machines” cannot automatically make decisions about e.g. whether a person should get a loan or not, without that decision having been subject to human review. Furthermore, the GDPR also stipulates that individuals have the right to an explanation for why a given decision was made, thus placing additional constraints on the types of AI models that can be used as decision support. Recall from our earlier discussion of deep learning that the state-of-the-art models of today usually have many hidden layers. Neural networks with many hidden layers may produce very precise and effective algorithms, but this also implies that it is often difficult (if not impossible) for a human being to understand exactly how a neural network arrived at a given prediction. This, combined with the requirement that individuals have the right to have decisions reviewed by a human, poses considerable limitations on the use of personal data in deep learning models. In fact, the EU’s recently released ethical guidelines for trustworthy AI explicitly state that the EU aims for a “human-centric” approach to AI where human agency and oversight are key features.
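
The sketch below illustrates one possible way of accommodating these two requirements: use a model whose predictions can be traced back to individual feature contributions (here a simple logistic regression), and treat the output as a recommendation that is always routed to a human reviewer. The features, data and threshold are invented for illustration, and this is only one of several conceivable approaches to explainability and oversight.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy decision support for a loan decision: a linear model whose coefficients
# can be read out as a per-decision explanation, with the final call left to a
# human reviewer. Features, data and the 0.5 threshold are invented.
feature_names = ["income", "existing_debt", "payment_remarks"]
X = np.array([[60, 5, 0], [20, 30, 2], [45, 10, 0], [15, 25, 3], [55, 8, 1], [25, 35, 2]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = repaid, 0 = defaulted (toy labels)

model = LogisticRegression(max_iter=1000).fit(X, y)

def recommend(applicant):
    prob = model.predict_proba([applicant])[0, 1]
    # Per-feature contribution (coefficient x value) as a simple explanation.
    contributions = dict(zip(feature_names, model.coef_[0] * np.array(applicant)))
    return {
        "recommendation": "approve" if prob >= 0.5 else "decline",
        "probability_of_repayment": round(float(prob), 2),
        "explanation": contributions,
        "requires_human_review": True,  # the model only recommends; a human decides
    }

print(recommend([40, 12, 1]))
```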

The GDPR and the Competitiveness of European firms

The restrictions on AI introduced by the GDPR have led many to voice concerns about its potentially negative implications for the competitiveness of European AI firms compared with their international counterparts. The most popular narrative is that the restrictions the GDPR places on the use of AI and big data effectively reduce the ability of European firms to develop AI models and AI-driven services that can stand up to competition from their Chinese and American counterparts (e.g. Chivot and Castro, 2019). This concern is intuitively plausible. As mentioned earlier, modern AI has become increasingly data intensive, and restrictions on the use of AI, and on the type and amount of data available to train AI models, should put European firms at a disadvantage.

However, there is also an alternative narrative, namely that the strict European regulations may instead provide European firms with a lifeline in their battle against the American and Asian technology giants. The logic of this narrative is that, owing to the EU data restrictions, European companies have to rely on different data, and develop different specialised capabilities, AI algorithms and business models in order to succeed with the “human-centric” European version of AI. International competitors face exactly the same restrictions as European companies when competing in European markets. This essentially means that these firms cannot necessarily just “copy and paste” solutions, capabilities and business models from the US and China, but have to develop specialised capabilities, algorithms, etc. for the European market. And if the core capabilities, algorithms and business models needed to effectively compete in Europe are distinctively different from those needed in other markets, this will raise the entry barrier for potential newcomers by driving up the need for irreversible investments.

A key insight here is that deep learning with big data leads to a completely different style of product design. The actual software development is pushed into the background, as the most important ingredients for success are large amounts of training data for the neural network to learn from and sufficient computational resources to train it. Success comes to those with big data and the powerful computers to use them. The fewer restrictions on the use of data and automated decision making, the greater the relative importance of big data and processing power. On the surface, this provides a significant competitive advantage for US and Chinese companies.

However, as mentioned, European firms have to adopt a different style of AI, one that relies on fewer data and that incorporates human agency and oversight. The more optimistic view, from the perspective of European companies, is that this constitutes a research opportunity. Firms operating in countries with large amounts of data have little incentive to develop less data-dependent methods, while firms operating in Europe are forced to push the research frontier in “less big data-centric” approaches to AI. This will likely lead European companies to develop data capabilities that differ from those of their American counterparts, with different algorithms and different business models. The data advantage accumulated by the big technology giants in their home markets in the US and China might therefore constrain their ability to develop products and services that work in a GDPR-compliant environment.

A related, and admittedly more speculative, point worth mentioning is that the capabilities and business models developed in a GDPR-compliant Europe may eventually prove beneficial in markets outside of Europe as well. While the EU has been at the forefront of data regulations, it seems quite possible that other developed countries, such as the US, will follow suit in one way or another. If the US were to implement GDPR-style regulations further down the road, it may be the European firms that have an advantage, since they were forced to focus on more parsimonious approaches to AI than firms in other countries.

One (albeit imperfect) parallel to this logic is the Great Firewall of China. China’s creation of hurdles for American firms meant that the US tech industry lost the race to dominate the Chinese market. While this was probably not the main goal of China’s internet censorship, it was certainly not an unwelcome consequence. This is analogous to the idea that by raising protective import duties on certain products, a country can develop its local industry relative to that of more advanced outsiders. This idea has been discussed in economic theory at least since the German economist Friedrich List in the 1840s (List, 1841).

One potential course of action resulting from the narrative we have put forward here is that firms can train their algorithms outside Europe, and then use them within it. More specifically, this would imply that firms that are unconstrained by the GDPR could for example codify the results of deep-learning models trained on unconstrained amounts of data outside Europe to form a neural network that could be imported to Europe. There are, however, some obvious objections to this approach.
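
In practical terms, such an approach would amount to exporting only the trained model rather than the underlying data, roughly as in the sketch below (our illustration, with a hypothetical file name and an arbitrary network).

```python
import torch
from torch import nn

# Only the finished network weights cross the border, not the training data.
# The architecture and file name are arbitrary placeholders.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

# ... training on large, unconstrained data sets would happen outside Europe ...

torch.save(model.state_dict(), "trained_model.pt")            # exported artefact

# In the European deployment only the weights are loaded and used:
imported = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
imported.load_state_dict(torch.load("trained_model.pt"))
print(imported(torch.randn(1, 10)))                           # prediction on local input
```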

One objection is that this approach raises many of the same ethical issues that led to the implementation of the GDPR in the first place. An American company that enters European markets, while at the same time deliberately circumventing the GDPR, could earn itself a public relations nightmare. Another is that such an approach is inherently restricted by the limitations of machine learning and neural networks themselves. The algorithms can learn arbitrary patterns from the data, but they are only as good as the data they learn from. If there are features unique to the Norwegian market and Norwegian customers, an algorithm trained only on foreign data has no chance of learning them. The algorithms do not actually know anything about the world or human nature, and have no capacity for reason. Despite the dramatic successes of some applications, the algorithms are very far from human intelligence. Instead they are very sophisticated pattern-matching machines.

In sum, we therefore believe that the stricter European data regulations, such as the GDPR, will impose constraints on the ability of international companies to enter European markets. While we refrain from claiming that this will enable European firms to succeed in fighting off international competition in a similar fashion to the Great Firewall of China, we do believe that it will buy European firms time. And if this window of opportunity is used to develop superior capabilities, less data-centric algorithms and human-centric AI business models, it will probably make European AI firms more competitive vis-à-vis their international counterparts (in Europe) than they would be without strict data regulations.

CONCLUSION

The purpose of this paper has been to propose an alternative, more hopeful narrative about the competitive implications of Europe enforcing a stricter data regulation regime than countries elsewhere. In doing so, we started with an overview of basic data theory and AI, before looking more closely at some core features of the GDPR and its implications for AI-driven business models.

We do not claim that our alternative, more hopeful narrative will automatically result in European companies dominating Europe, while the international giants dominate everywhere else. Instead, we should think of this discussion more as raising awareness of a set of conditions that may delay, or complicate, the expansion plans or business model innovations of international firms targeted at European markets. At the very least, we believe the stricter European data regulations create constraints that will buy European firms more time. But whether or not the final outcome is a success for European companies depends fundamentally on how they use this window of opportunity provided by European regulators.

  • Aggarwal, C.C. (2018). Neural networks and deep learning. Cham: Springer International Publishing.
  • Angelshaug, Knudsen, & Saebi (2019). Nye Forretningsmodeller i Bank og Finans: Muligheter og Trusler (New business models in banking and finance: Opportunities and threats). Magma, December issue.
  • Beath, C., Becerra-Fernandez, I., Ross, J., & Short, J. (2012). Finding Value in the Information Explosion. MIT Sloan Management Review, 53(4), 18–20.
  • Chaffey, D. (2007). E-business and E-commerce Management: Strategy, Implementation and Practice. Pearson Education.
  • Chivot, E., & Castro, D. (2019). The EU Needs to Reform the GDPR to Remain Competitive in the Algorithmic Economy. Center for Data Innovation, 1–23. Retrieved from https://www.datainnovation.org/2019/05/the-eu-needs-to-reform-the-gdpr-to-remain-competitive-in-the-algorithmic-economy/
  • Dean, J. (2014). Big data, data mining, and machine learning: value creation for business leaders and practitioners. John Wiley & Sons.
  • De Choudhury, M., Counts, S., Horvitz, E.J., & Hoff, A. (2014, February). Characterizing and predicting postpartum depression from shared Facebook data. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (pp. 626–638). ACM.
  • De Cnudde, S., Moeyersoms, J., Stankova, M., Tobback, E., Javaly, V., & Martens, D. (2019). What does your Facebook profile reveal about your creditworthiness? Using alternative data for microfinance. Journal of the Operational Research Society, 70(3), 353–363.
  • Feldman, R., & Sanger, J. (2007). The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge University Press. https://doi.org/10.1179/1465312512Z.00000000017
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Gurney, K. (2014). An introduction to neural networks. CRC Press.
  • List, F. (1841). Das nationale System der politischen Ökonomie (The national system of political economy). Stuttgart, W Germany: JG Cotta.
  • Luger, G.F. (2005). Artificial intelligence: Structures and strategies for complex problem solving. Pearson Education.
  • McAfee, A., & Brynjolfsson, E. (2013). Big Data: The Management Revolution. Harvard Business Review, (October 2012), 1–9.
  • McCulloch, W., & Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115–133.
  • O’Reilly, T. (2007). What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. Communications & Strategies, 1(First Quarter), 17. https://doi.org/10.2139/ssrn.1008839
  • Oxford English Dictionary. (2019). Artificial Intelligence.
  • Papert, S. (1966). The Summer Vision Project. Artificial Intelligence Group, Vision Memo no. 100.
  • Rowley, J. (2007). The wisdom hierarchy: representations of the DIKW hierarchy. Journal of Information Science, 33(2), 163–180.
  • Walmsley, J. (2012). Classical Cognitive Science and “Good Old Fashioned AI”. In Mind and Machine (pp. 30–64). Palgrave Macmillan, London.
  • Witten, I.H., Frank, E., Hall, M.A., & Pal, C.J. (2016). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.
  • Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040.
  • Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media.
