Chapter 1 Introduction: Understanding identity data matching in transnational security infrastructures

The message is that there are no “knowns.” There are things we know that we know. There are known unknowns. That is to say there are things that we now know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know. So when we do the best we can and we pull all this information together, and we then say well that’s basically what we see as the situation, that is really only the known knowns and the known unknowns. And each year, we discover a few more of those unknown unknowns. (Rumsfeld 2002)

1.1 Information gaps and blind spots in identification

Former US Secretary of Defense Donald Rumsfeld made this observation at a 2002 press conference, and it has since captivated academic and lay audiences alike. He distinguished between what he called “known knowns” (i.e., facts that authorities are confident they know) and “known unknowns” (i.e., facts that authorities are aware they do not yet know). However, he also pointed out that there are “unknown unknowns,” peculiar blind spots that authorities do not know about and do not even realize they do not know. In the context of this thesis, Rumsfeld’s idea of “known unknowns” and “unknown unknowns” offers an insightful lens through which to introduce the challenges of identifying people in border security and migration control. Identifying and tracking individuals across national borders is rarely straightforward, as data are not always complete or readily available.1 This raises the question of how authorities can effectively identify individuals despite incomplete data sets, aliases, or even false identities, as well as how authorities can acknowledge and address gaps in information that they are not even aware of.

Rumsfeld’s concept of “known unknowns” can be applied to situations where an individual’s data are present in a database but are not directly linked to their identity data in other systems. These “known unknowns” or “blind spots” can hinder the ability of authorities to identify people fully because of legal, organizational, or technical challenges. One example of a “known unknown” in identifying people at borders could be an individual with multiple sets of identity data in different databases or systems that have not been linked. For instance, international watch lists contain information on individuals suspected or known to be involved in criminal activities. Individuals on these lists are often recorded with multiple known aliases to address the challenge of linking all their identity data together to establish that they may be the same person. In other words, authorities already recognize these individuals, but there remain uncertainties regarding their identification. As such, the ambiguity of personal identity data creates a “known unknown” regarding individuals’ identities, which can have implications for security and law enforcement purposes.

“Unknown unknowns” can apply to information that is not only unknown but also unidentifiable through traditional means. Technology can play a role in detecting — or should we say enacting? — such blind spots by enabling the analysis and correlation of large amounts of identity data. For instance, advanced algorithms and machine learning techniques can detect previously unknown connections and patterns in the data. In their book “Algorithmic Reason,” Aradau and Blanke (2022) offer an intriguing case in which two journalists were potentially flagged as persons of interest by a United States security agency using algorithms that detected anomalies in regular data patterns. In the previous case of “known unknowns,” individuals may be placed on a watchlist, and it is known that these persons may use different names or identity documents. Conversely, “unknown unknowns” involve entirely unknown connections and patterns that have yet to be discovered. By connecting various identities and other data and finding patterns, it becomes possible to identify or re-identify someone who is not yet known or of interest to authorities, thus potentially uncovering “unknown unknowns.”

The problem of identifying and connecting identity data is not new and has been a challenge for various domains of knowledge. However, in recent decades, data collection, storage, and analysis have been significantly impacted by processes of datafication, resulting in vast amounts of personal information being processed (e.g., Borgman 2015; Kitchin 2014). Hence, one critical development in this context has been the growth of data matching technology to identify individuals across multiple sources (e.g., Christen 2012; Harron et al. 2017; Talburt 2013). By utilizing data matching technology, individuals can be identified even when the information is incomplete or inconsistent, thanks to the comparison and reconciliation of data, such as name, address, and identification numbers. As a result, data matching tools are widely deployed in fields where up-to-date information is vital, such as healthcare, finance, and law enforcement (Talburt 2013).

In the healthcare sector, data matching technology is used to match patient data across different systems to ensure accurate patient identification and prevent medical errors (e.g., Lee, Clymer, and Peters 2016; McCoy et al. 2013; Sauleau, Paumier, and Buemi 2005). By utilizing patient information like names, birthdates, and social security numbers, medical records can be effectively matched and organized across diverse healthcare systems and databases (Zech et al. 2016). In financial intelligence, data matching is used, among others, to detect fraudulent activities and for sanctions compliance. For instance, SWIFT, a worldwide provider of secure financial messaging services, employs data matching algorithms to aid financial institutions in complying with sanctions regulations by accurately identifying individuals on sanction lists who may use aliases or fraudulent identities to avoid detection (SWIFT 2018, 2021). In law enforcement, data matching technology is used to analyze data and aid investigations, such as identifying individuals involved in organized crime networks by linking their biographical information across databases (Ferguson 2017; Steinbock, n.d.). For example, data matching techniques are employed to analyze flight passenger data to identify patterns and potential threats by linking and analyzing individuals’ travel histories across different flights and airlines (Bellanova and Duez 2012; Hobbing 2010).

The previous examples underscore how data matching technology has become crucial in linking and reconciling personal data across multiple sources, given the increasing collection of information, such as electronic medical records, financial transactions, and online purchases. The technology enables organizations to create more comprehensive profiles of individuals, contributing to detecting fraudulent activities, ensuring regulatory compliance, and dealing with the siloed nature of data sources. Nevertheless, despite the increasing use of data matching technology in various sectors, there is still a lack of understanding of how it shapes the meaning of the things it connects, including by marking data as suspicious and by shaping relations between the organizations whose data are being matched and connected. This research seeks to contribute to a more performative understanding of the role of data matching technology by investigating how it shapes the meaning of data, practices, and the organizations that use it. To that end, it is necessary to begin by recalling the history and applications of matching and linking data.

1.2 Connecting the dots: The development of data matching techniques

The use of computing technology to connect personal identity data has a long history that dates to the early days of punch card technology, even predating database technology, as is evident from research such as Dunn’s (1946) “Record linkage” and Newcombe et al.’s (1959) “Automatic linkage of vital records.” The term “record linkage” is often used in public health, epidemiology, and demography to describe the practice of matching and linking records pertaining to the same individual across multiple data sets. In the fields of public health (e.g., Jutte, Roos, and Brownell 2011) and demographics (e.g., Abbott, Jones, and Ralphs 2015), for example, linking data proved beneficial to states seeking to improve services for their citizens and facilitate research. Through the use of identifiers and shared attributes such as name, address, date of birth, or social security number, states could establish a more detailed profile of individuals by connecting data records (Newcombe and Kennedy 1962). Difficulties arose when attempting to link personal data because of data quality issues, such as inconsistencies in name spellings or missing information, prompting the development of new technologies and techniques to tackle these challenges.

The emergence of electronic computers and database technology enabled more sophisticated matching algorithms to be developed, leading to increased adoption in other fields (Batini and Scannapieco 2016; Christen 2012). As a result, the process of matching data sets and linking records is now referred to by various names, such as data matching, data linking, data merging, data integration, record linkage, deduplication, or entity resolution, depending on the context and application (Christen 2012). This dissertation will use the term data matching as it is a more general term referring to identifying records in data sets that refer to the same real-world persons (or other entities) and reconciling duplicates or inconsistencies between data sets.

Another concept related to data matching is schema matching (Bellahsene, Bonifati, and Rahm 2011; Kementsietsidis 2009). Schema matching addresses the challenge of integrating data from multiple sources that have different schema structures or data models. Schema matching’s importance stems from its capacity to facilitate data integration across disparate data sources, which is frequently required for effective data matching. For example, identifying records referring to the same person may be challenging without knowledge of the underlying data models, as the same person may be represented differently across different data sets. Therefore, in fields where data are fragmented and dispersed across multiple sources, data matching and schema matching are crucial components of successful data management and integration. While the term “schema matching” will not be used in the dissertation, the question of how to investigate correspondences and differences between different data models (i.e., schemas) that underpin the data will be explored in greater depth.
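The difficulty described above can be illustrated with a minimal sketch, under the assumption that schema matching is reduced to comparing attribute names. The two schemas, the attribute names, and the similarity threshold are hypothetical and are not taken from any actual information system; practical schema matching typically also draws on data types, instance values, and documentation.

```python
from difflib import SequenceMatcher

# Hypothetical attribute names for two systems (illustrative only).
VISA_SCHEMA = ["FamilyName", "GivenNames", "DateOfBirth", "Nationality"]
ASYLUM_SCHEMA = ["family_name", "given_name", "birth_date", "sex"]


def normalize(name):
    """Lower-case an attribute name and drop underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def match_schemas(schema_a, schema_b, threshold=0.6):
    """Map each attribute of schema_a to its most similar attribute in schema_b.

    Similarity is a plain string ratio on normalized names; attributes whose
    best score falls below the (assumed) threshold are mapped to None.
    """
    mapping = {}
    for attr_a in schema_a:
        best_attr, best_score = None, 0.0
        for attr_b in schema_b:
            score = SequenceMatcher(
                None, normalize(attr_a), normalize(attr_b)
            ).ratio()
            if score > best_score:
                best_attr, best_score = attr_b, score
        mapping[attr_a] = best_attr if best_score >= threshold else None
    return mapping
```

In this sketch, "FamilyName" and "GivenNames" are paired with their counterparts, whereas "DateOfBirth" and "birth_date" are not, even though they plausibly describe the same property. Name similarity alone is therefore not enough, which is why knowledge of the underlying data models matters.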

Over the years, various data matching methods and techniques for classifying matches have been devised (Batini and Scannapieco 2016; Christen 2012; Fellegi and Sunter 1969; Winkler 2014). The following standard data matching methods can be distinguished based on the literature. One of the most basic techniques for identifying matching records is deterministic matching, which employs predefined rules or criteria. For example, when two records have the same first name, last name, and date of birth, they are considered a match. Another approach is probabilistic matching, which uses statistical algorithms to calculate the probability that two records are a match based on the similarity of their categories of data. If, as in the previous example, two data records have similar but not identical names or dates of birth, the records may still be considered a match based on the probability calculation. Another approach is rule-based matching, which can combine deterministic and probabilistic methods to find matches and incorporate expert knowledge or domain-specific rules to increase accuracy. Finally, matching techniques based on machine learning are gaining popularity. Such methods employ algorithms that can learn from data to improve accuracy and more easily adapt to new data sources.
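The difference between deterministic and probabilistic matching can be sketched as follows. This is a simplified illustration in the spirit of the Fellegi and Sunter (1969) model, not a description of any system studied in this dissertation. The field names, the agreement probabilities in M_U, the string-similarity threshold, and the classification cut-offs are all assumed values chosen for illustration.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("first_name", "last_name", "date_of_birth")

# Assumed probabilities per field (illustrative, not estimated from data):
# m = P(field agrees | same person), u = P(field agrees | different persons).
M_U = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "date_of_birth": (0.98, 0.01),
}


def normalize(value):
    """Lower-case a value and collapse whitespace."""
    return " ".join(value.lower().split())


def deterministic_match(a, b):
    """Rule: records match only if all fields are identical after normalization."""
    return all(normalize(a[f]) == normalize(b[f]) for f in FIELDS)


def fields_agree(x, y, threshold=0.85):
    """Treat two values as agreeing if their string similarity is high enough."""
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio() >= threshold


def match_weight(a, b):
    """Sum log-likelihood weights: positive for agreement, negative otherwise."""
    total = 0.0
    for f in FIELDS:
        m, u = M_U[f]
        if fields_agree(a[f], b[f]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(a, b, upper=6.0, lower=0.0):
    """Classify a record pair using two assumed cut-offs on the total weight."""
    weight = match_weight(a, b)
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

With these assumptions, two hypothetical records for "Maria Rossi" and "Mariah Rosi" with the same date of birth fail the deterministic rule but are classified as a match by the weighted approach. A pair with the same name but different dates of birth falls between the cut-offs and would typically be left for manual review.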

While data matching may seem like a technical process, its increasing use and impact on society and individuals mean that it has significant consequences that should not be overlooked. With the growth of the internet and the digitalization of many aspects of contemporary life, data matching has become even more ubiquitous, with many actors using these techniques to link data from different sources and gain insights into individuals, their behavior, and preferences (Clarke 1994; Gandy 1989; Zuboff 2015). Furthermore, using data matching algorithms in automated systems can introduce errors and biases, making some people disproportionately the target of surveillance and control (e.g., Aradau and Blanke 2021; Benjamin 2019; Eubanks 2018). For instance, the German-Lebanese citizen Khalid al-Masri was imprisoned and tortured by the CIA in 2003 after being mistakenly identified as a suspected terrorist with a similar name (Priest 2005). Data matching technologies play a crucial role in these processes by allowing for the analysis and correlation of vast amounts of data and determining previously unknown connections in identity data.

Data matching has a long history of addressing the challenges posed by fragmented, incomplete, and duplicated information across multiple sources, and various techniques have been developed for this purpose. However, data matching is not just a technical process that can potentially discover previously unknown connections; it can also alter the things being connected. These connections can affect the meaning of the data, practices, and the organizations that use them. For instance, one could argue that when flight passenger data are matched to terrorist watch lists, the identification of a match alters the original meaning of the passenger data and changes the role of organizations such as airline carriers (see also Amoore and de Goede 2005; Bellanova and Duez 2012). Over time, passenger data have evolved from simple travel information into a powerful tool that connects data, allowing for the identification of suspicious travel patterns and the detection of individuals who may be considered security risks.

As such, it is crucial to understand how data matching technology shapes the meaning of data and practices, including identifying data as suspicious and shaping relationships between organizations whose data are being matched and connected. This research seeks to contribute to a more performative understanding of the role of data matching technology by investigating how it shapes the meaning of data, practices, and organizations. The choice of exploring data matching in border security and migration control is linked to the overarching Processing Citizenship (PC) project, which aims to understand how data infrastructures for processing migrants and refugees co-produce individuals and Europe (PC n.d.).2 The purpose of investigating the use of matching and linking data in the context of identity data in border security and migration management in this dissertation is thus closely connected to the PC project’s aim of exploring how the production, evaluation and circulation of data about third-country nationals are reshaping European governance (see also, Pelizza 2019; Pelizza and Loschi 2023). Specifically, this research aims to examine the use of matching and linking data in the context of identity data in border security and migration management. The following section takes a closer look at how data matching is used in this context by exploring a contemporary example of data matching in migration and border control within the European Union.

1.3 Leveraging data matching for border control

In light of the recent terrorist attacks in Europe and the increase in irregular migration in recent years, action needs to be taken to address this risk of information gaps and blind spots. The measures in this proposal [Interoperability of EU information systems for security, border and migration management] will ensure the various systems can exchange data and share information so that authorized bodies and officers have the information they need to strengthen our borders and better protect Europe. (European Commission 2017)


Establishing a common repository of data would overcome the current fragmentation in the EU’s architecture of data management for border control and security. This fragmentation is contrary to the data minimization principle, as it results in the same data being stored several times. Where necessary, the common repository would allow for the recognition of connections and provide an overall picture by combining individual data elements stored in different information systems. It would thus address the current knowledge gaps and shed light on blind spots for border guards and police officers. (European Commission 2016b, 18)


[One of] the four technical components of the proposal [is] a multiple identity detector — this will verify whether the biographical data that is being searched exists in multiple systems, helping to detect multiple identities. It has the dual purpose of ensuring the correct identification of bona fide persons and combating identity fraud. (European Commission 2017)


These quotes reveal how, in the European Union (EU) context, data matching is regarded as a critical component in addressing identity issues in migration and border control systems, including identifying multiple identities. The quotes above refer to a project linking identity data of different EU information systems for security, border, and migration management.3 Presently, each of these EU information systems operates independently, with its own database, and serves a distinct purpose, such as managing asylum requests, processing visa applications, or supporting law enforcement activities. The proposal explicitly identifies a potential risk of information gaps and blind spots because data are not connected. It proposes to address this risk by connecting and sharing information from those multiple systems. Furthermore, the European Commission (EC) communication underscores the importance of having the necessary information to strengthen borders and identify potential threats.

The second quote describes the need for “establishing a common repository of data,” which would “address the current knowledge gaps and shed light on blind spots for border guards and police officers” (p. 18). Finally, the third quote describes a component for finding multiple identities that refer to the same person. Fragmentation of the EU’s data management architecture for border control and security is thus portrayed as causing duplicate data storage and leaving border guards and police officers with knowledge gaps and blind spots. According to this logic, the common repository and multiple identity detector components would enable the recognition of connections and provide a holistic view by combining data elements stored in different information systems.

The EU interoperability initiative introduces new components that emphasize the growing significance of data matching technologies. By allowing the matching of biometric data, visa data, and other identity-related information, the initiative aims to enhance the accuracy and efficiency of EU information systems for mobility and border control. However, the use of these new components is not limited to improving the functioning of these systems. They will also be pivotal in implementing new, interlinked forms of identification, in which individuals are deemed suspicious based on links between data sets (Quintel 2018) that are established through probabilistic, rule-based, or machine learning-based data matching. Understanding the performative nature of data matching technology is essential, as it shapes the perceptions and treatment of individuals in the context of border security and migration management.

Note that the European Commission is not building these systems by itself; it increasingly relies on global information technology suppliers and integrators (Lemberg-Pedersen, Rübner Hansen, and Halpern 2020; Valdivia et al. 2022). A research gap exists in understanding how data matching technologies, which are increasingly developed by commercial entities for global use (see, for example, Leese 2018; Lemberg-Pedersen, Rübner Hansen, and Halpern 2020; Valdivia et al. 2022; Zureik and Hindle 2004), operate and influence processes of identification in a sensitive domain. The international and commercial dimensions of identification technology mean that there is a growing need to examine how the private sector is involved in developing and implementing these standardizing technologies. As Pollock and Williams (2009) have noted, criticism of standardized software and focus on how poorly such software adapts to different settings is not a sufficient research perspective. The widespread use of standardized identification software requires comprehending how such software is produced and adapted to operate in various contexts, as well.

1.4 Unpacking the challenge of analyzing data matching in transnational data infrastructures

The previous examples show how identity data matching has developed into an integral part of border and migration control systems to enable the identification and tracking of individuals across different systems and jurisdictions. As such, matching data from national and international sources indicates that identification practices extend beyond the borders of nation-states, highlighting the internationalization of identification. Yet, research on identification has typically concentrated on how states identify people. Data matching technology, however, is illustrative of the global dimensions of identification, which become apparent only when looking beyond the borders of individual nations.

A transnational perspective can thus shift the focus away from the nation-state and onto the various other actors involved in identification. The shift from state authorities creating and implementing identification technology to the state purchasing systems created by commercial organizations can be seen in various programmes involving global information technology companies. For example, the United States Automated Biometric Identification System relied on Cogent/Thales’ automated fingerprint identification technology (Thales Group 2021), while India’s Aadhaar biometric ID system utilized Accenture and Daon’s technology for combining different biometric modalities (Accenture 2010). Meanwhile, the upcoming European Entry/Exit System will utilize IDEMIA’s biometric matching systems (Accenture 2012). In this way, identification technology is increasingly becoming a commercial product rather than a creation of the state. Yet, little is known about how these actors developed identity data matching systems and put them to work. This lack of knowledge can be attributed to various factors, including the lack of transparency in developing these systems, limited access to information on their design and operation, and the complexity of the underlying technical and trans-organizational processes.

Data matching technology for identification can also be seen as a component of a broader data infrastructure (e.g., Flyverbom and Murray 2018; Kitchin 2014).4 Infrastructures are those things we depend on to make other things work (Edwards et al. 2009; Star and Ruhleder 1996). Hence, data infrastructure includes the technologies, protocols, regulations, habits, procedures, and agreements to handle and utilize data. In the case of data matching, the technology is an essential component of the infrastructure that enables the sharing, linking, and matching of identity data across various systems and organizations. The development and maintenance of this infrastructure require collaboration and coordination among different actors, including government agencies, private companies, and international organizations. Understanding data matching as part of data infrastructure provides a more comprehensive view of the interconnected systems that enable local, national, and international identification practices.

Considering these challenges, the following section will outline this study’s research problem, aims, and objectives, which seek to unpack the complexities of analyzing data matching in transnational data infrastructure.

1.4.1 Research problem, aims, and objectives

This dissertation aims to contribute to a more performative understanding of the role of data matching technology by investigating how it shapes the meaning of data, practices, and organizations in transnational contexts, particularly in the securitization of the European border. Recognizing a gap in research regarding the performative effects of data matching technology in transnational security infrastructures, this study aims to empirically investigate its involvement in infrastructure development, security, and internationalization within the realm of identification. To address this research problem and achieve the overarching research aims, the dissertation outlines specific research objectives:

  • To map the theoretical landscape related to internationalization, securitization, and infrastructuring of identification and derive the dissertation’s research question for investigating data matching in transnational data infrastructures (Chapter 2).
  • To develop a methodological framework for analyzing data matching in transnational infrastructures using methodological strategies to uncover the embedded and less obvious technical details of matching and linking identity data in infrastructures (Chapter 3).
  • To introduce a new method and software tool for analyzing the schemas that underpin information systems in population management (Chapter 4).
  • To examine the relationship between identity data matching technologies and routine identification practices (Chapter 5).
  • To investigate the long-term development of identification systems and building of transnational data infrastructures by identifying contingent moments in their evolution to explore how data matching expertise travels and circulates (Chapter 6).

The objective of mapping the theoretical landscape is to help understand how the meanings, practices, and technologies of identification have changed as identification has become more international, commercial, linked to security issues, and part of broader infrastructures. The methodological framework and its strategies provide an overarching approach for analyzing data matching in transnational infrastructures. The objective of introducing a new method and software tool is to examine the expectations and imaginaries embedded in the schemas that underpin information systems in population management. The objective of examining routine identification practices is to understand their relationship with identity data matching technologies, shedding light on how these technologies shape the utilization and meanings of data. Finally, the objective of investigating the long-term development of identification systems and the building of transnational data infrastructures is to explore how data matching expertise and technologies travel and circulate. The following section provides an overview of the dissertation’s structure, outlining the chapters and their focus.

1.5 Structure of the dissertation

This dissertation begins by mapping theoretical concepts, then develops a methodological framework to guide the analysis, and then turns to the empirical chapters. Chapter 2 starts by mapping theoretical concepts on identification and matching identity data, drawing on literature related to the internationalization and commercialization of identification, the securitization of identification, and the infrastructuring of identification. This chapter lays the groundwork for the subsequent chapters by discussing various theoretical perspectives on matching identity data and the implications of matching identity data for transnational data infrastructures.

Chapter 3 introduces the methodological framework for analyzing data matching in transnational infrastructures. The framework proposes three methodological strategies, wherein data matching serves as both a research topic and a methodological resource. These three strategies are based on comparing data models, analyzing data practices and tracing sociotechnical change. Comparing data models can reveal information collected by various organizations and systems; data practices can show the searching and matching of identity data within and across organizations; sociotechnical change can shed light on the circulation of data matching knowledge, technologies, and practices over time and across organizations. The chapter explains how these strategies were used in the dissertation’s fieldwork at a software company developing data matching technology. The chapter also describes the methods of data collection and the techniques of data analysis used in the dissertation.

Chapter 4 introduces the “Ontology Explorer” (OE) methodology, a semantic approach and an open-source tool to analyze the data models underlying information systems. The method draws inspiration from schema matching and is designed to compare data models in different formats used by various systems. This chapter explains how the OE is applied in the dissertation to reveal less visible assumptions and patterns in information systems design. Unlike other methods, the OE allows for the systematic comparison of non-homogeneous data formats and enables comparisons of data models across information systems run by diverse organizations and authorities. Therefore, the OE makes it possible to observe how identity data properties influence the production and circulation of data and the relations between different authorities’ data models.

Chapter 5 examines the relationship between technologies for searching and matching identity data and routine bureaucratic identification practices in migration management. The chapter focuses on how a government migration agency searches and matches applicants’ data using a data matching system. The chapter introduces the concept of “re-identification” to refer to the process by which subjects of bureaucratic procedures are re-identified in data infrastructures at various points in those procedures. The chapter demonstrates the implications of data matching in bureaucratic settings in two ways. First, the chapter shifts the usual focus from first registration to re-identification practices across data infrastructures. Second, the findings underscore that, while integrating data matching tools for re-identification alleviates data friction, it also comes with unintended costs.

Chapter 6 looks at the long-term development of identification systems and infrastructures. The chapter proposes two heuristics for detecting contingent moments in the evolution of identification technologies. First, it demonstrates how a data matching system’s changing “interpretative flexibility” allows discerning actors’ varying problematizations of identification, such as those related to the securitization of identification. Second, the chapter demonstrates how “gateway moments” make it possible to see the compromises necessary when building identification infrastructures and adapting globally honed technologies to new settings. Together, the chapter’s findings shed light on the activities of under-the-radar actors, such as commercial software vendors, whose distribution and reuse of systems have long-term implications for identification practices and infrastructures in various contexts.

The dissertation concludes in Chapter 7 with a summary of empirical findings, contributions to the literature, and reflections on the research process. Contributions include mapping the theoretical landscape of identity data matching, introducing a methodological framework for analyzing data matching in transnational infrastructures, proposing new methods for analyzing data matching, and using these methods to examine the relationship between data matching technologies and bureaucratic practices. The study makes an additional contribution by delving into the long-term evolution of identification systems and infrastructures. Finally, the chapter acknowledges the study’s limitations and suggests areas for future research. The dissertation aims to advance our understanding of identity data matching by placing it on the agendas of science and technology studies (STS) and critical data studies. It contends that matching identity data is a multifaceted phenomenon that requires a nuanced and interdisciplinary approach to understand how it shapes and is shaped by transnational data infrastructures.

References

Abbott, Owen, Peter Jones, and Martin Ralphs. 2015. “Large-Scale Linkage for Total Populations in Official Statistics.” In Methodological Developments in Data Linkage, 170–200. Chichester: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119072454.ch8.

Accenture. 2010. “Unique Identification Authority of India (UIDAI) Selects Accenture to Implement a Multimodal Biometric Solution for ‘Aadhaar’ Program.” Press Release. Accenture Newsroom. https://web.archive.org/web/20220812203854/https://newsroom.accenture.com/article_display.cfm?article_id=5040.

Accenture. 2012. “European Commission Selects Consortium of Accenture, Morpho and HP to Maintain EU Visa Information and Biometric Matching Systems.” Press Release. https://web.archive.org/web/20201206154800/https://newsroom.accenture.com/subjects/client-winsnew-contracts/european-commission-chooses-consortium-of-accenture-morpho-and-hp-to-maintain-eu-visa-information-and-biometric-matching-systems.htm.

Amoore, Louise, and Marieke de Goede. 2005. “Governance, Risk and Dataveillance in the War on Terror.” Crime, Law and Social Change 43 (2-3): 149–73. https://doi.org/10.1007/s10611-005-1717-8.

Aradau, Claudia, and Tobias Blanke. 2021. “Algorithmic Surveillance and the Political Life of Error.” Journal for the History of Knowledge 2 (1): 1–13. https://doi.org/10.5334/jhk.42.

Aradau, Claudia, and Tobias Blanke. 2022. Algorithmic Reason: The New Government of Self and Other. Oxford, UK: Oxford University Press. https://doi.org/10.1093/oso/9780192859624.001.0001.

Batini, Carlo, and Monica Scannapieco. 2016. “Object Identification.” In Data and Information Quality: Dimensions, Principles and Techniques, edited by Carlo Batini and Monica Scannapieco, 177–215. Data-Centric Systems and Applications. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-24106-7_8.

Bellahsene, Zohra, Angela Bonifati, and Erhard Rahm, eds. 2011. Schema Matching and Mapping. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-16518-4.

Bellanova, Rocco, and Denis Duez. 2012. “A Different View on the ‘Making’ of European Security: The EU Passenger Name Record System as a Socio-Technical Assemblage.” European Foreign Affairs Review 17 (SI). https://doi.org/10.54648/eerr2012017.

Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity.

Borgman, Christine L. 2015. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Mass.; London, Eng.: The MIT Press.

Bowker, Geoffrey C., Karen Baker, Florence Millerand, and David Ribes. 2009. “Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment.” In International Handbook of Internet Research, edited by Jeremy Hunsinger, Lisbeth Klastrup, and Matthew Allen, 97–117. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-1-4020-9789-8_5.

Christen, Peter. 2012. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications. Berlin; New York: Springer. https://doi.org/10.1007/978-3-642-31164-2.

Clarke, Roger. 1994. “Human Identification in Information Systems: Management Challenges and Public Policy Issues.” Information Technology & People 7 (4): 6–37. https://doi.org/10.1108/09593849410076799.

Dunn, Halbert L. 1946. “Record Linkage.” American Journal of Public Health and the Nations Health 36 (12): 1412–16. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1624512/.

Edwards, Paul N., Geoffrey C. Bowker, Steven J. Jackson, and Robin Williams. 2009. “Introduction: An Agenda for Infrastructure Studies.” Journal of the Association for Information Systems 10 (5): 364–74. https://doi.org/10.17705/1jais.00200.

Eubanks, Virginia. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York, NY: St. Martin’s Press.

European Commission. 2016a. “Commission Decision of 17 June 2016 Setting up the High Level Expert Group on Information Systems and Interoperability.” https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32016D0715(01).

European Commission. 2016b. “Communication from the Commission to the European Parliament and the Council: Stronger and Smarter Information Systems for Borders and Security.” COM/2016/0205 final. https://publications.europa.eu/en/publication-detail/-/publication/702fa423-fca2-11e5-b713-01aa75ed71a1/language-en/format-PDF.

European Commission. 2017. “Frequently Asked Questions - Interoperability of EU Information Systems for Security, Border and Migration Management.” MEMO/17/5241. Strasbourg. https://ec.europa.eu/commission/presscorner/detail/en/MEMO_17_5241.

European Union. 2018. “Regulation (EU) 2018/1726 of the European Parliament and of the Council of 14 November 2018 on the European Union Agency for the Operational Management of Large-Scale IT Systems in the Area of Freedom, Security and Justice (eu-LISA), and Amending Regulation (EC) No 1987/2006 and Council Decision 2007/533/JHA and Repealing Regulation (EU) No 1077/2011.” https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32018R1726.

European Union. 2019a. “Regulation (EU) 2019/817 of the European Parliament and of the Council of 20 May 2019 on Establishing a Framework for Interoperability Between EU Information Systems in the Field of Borders and Visa and Amending Regulations (EC) No 767/2008, (EU) 2016/399, (EU) 2017/2226, (EU) 2018/1240, (EU) 2018/1726 and (EU) 2018/1861 of the European Parliament and of the Council and Council Decisions 2004/512/EC and 2008/633/JHA.” http://data.europa.eu/eli/reg/2019/817/oj/eng.

European Union. 2019b. “Regulation (EU) 2019/818 of the European Parliament and of the Council of 20 May 2019 on Establishing a Framework for Interoperability Between EU Information Systems in the Field of Police and Judicial Cooperation, Asylum and Migration and Amending Regulations (EU) 2018/1726, (EU) 2018/1862 and (EU) 2019/816.” http://data.europa.eu/eli/reg/2019/818/oj/eng.

Fellegi, Ivan P., and Alan B. Sunter. 1969. “A Theory for Record Linkage.” Journal of the American Statistical Association 64 (328): 1183–1210. https://doi.org/10.1080/01621459.1969.10501049.

Ferguson, Andrew Guthrie. 2017. The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement. New York: NYU Press. https://doi.org/10.2307/j.ctt1pwtb27.

Flyverbom, Mikkel, and John Murray. 2018. “Datastructuring—Organizing and Curating Digital Traces into Action.” Big Data & Society 5 (2): 1–12. https://doi.org/10.1177/2053951718799114.

Gandy, Oscar H., Jr. 1989. “The Surveillance Society: Information Technology and Bureaucratic Social Control.” Journal of Communication 39 (3): 61–76. https://doi.org/10.1111/j.1460-2466.1989.tb01040.x.

Hanseth, Ole, Eric Monteiro, and Morten Hatling. 1996. “Developing Information Infrastructure: The Tension Between Standardization and Flexibility.” Science, Technology, & Human Values 21 (4): 407–26. https://doi.org/10.1177/016224399602100402.

Harron, Katie, Chris Dibben, James Boyd, Anders Hjern, Mahmoud Azimaee, Mauricio L Barreto, and Harvey Goldstein. 2017. “Challenges in Administrative Data Linkage for Research.” Big Data & Society 4 (2): 1–12. https://doi.org/10.1177/2053951717745678.

Hobbing, Peter. 2010. “Tracing Terrorists: The European Union–Canada Agreement on Passenger Name Record (PNR) Matters.” In Mapping Transatlantic Security Relations, edited by Mark B. Salter. London: Routledge.

Jutte, Douglas P., Leslie L. Roos, and Marni D. Brownell. 2011. “Administrative Record Linkage as a Tool for Public Health Research.” Annual Review of Public Health 32 (1): 91–108. https://doi.org/10.1146/annurev-publhealth-031210-100700.

Kementsietsidis, Anastasios. 2009. “Schema Matching.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 2494–97. Boston, MA: Springer. https://doi.org/10.1007/978-0-387-39940-9_962.

Kitchin, Rob. 2014. The Data Revolution: Big Data, Open Data, Data Infrastructures & Their Consequences. London: SAGE Publications Ltd. https://doi.org/10.4135/9781473909472.

Lee, Martin Laurence, Robert Clymer, and Kate Peters. 2016. “A Naturalistic Patient Matching Algorithm: Derivation and Validation.” Health Informatics Journal 22 (4): 1030–44. https://doi.org/10.1177/1460458215607080.

Leese, Matthias. 2018. “Standardizing Security: The Business Case Politics of Borders.” Mobilities 13 (2): 261–75. https://doi.org/10.1080/17450101.2017.1403777.

Lemberg-Pedersen, Martin, Johanne Rübner Hansen, and Oliver Joel Halpern. 2020. “The Political Economy of Entry Governance.” Advancing Alternative Migration (ADMIGOV) Deliverable 1.3. Copenhagen: Aalborg University. http://web.archive.org/web/20230705132811/https://admigov.eu/upload/Deliverable_D13_Lemberg-Pedersen_The_Political_Economy_of_Entry_Governance.pdf.

Loukissas, Yanni Alexander. 2019. All Data Are Local: Thinking Critically in a Data-Driven Society. Cambridge, Mass.: The MIT Press.

McCoy, Allison B., Adam Wright, Michael G. Kahn, Jason S. Shapiro, Elmer Victor Bernstam, and Dean F. Sittig. 2013. “Matching Identifiers in Electronic Health Records: Implications for Duplicate Records and Patient Safety.” BMJ Quality & Safety 22 (3): 219–24. https://doi.org/10.1136/bmjqs-2012-001419.

Newcombe, H. B., and J. M. Kennedy. 1962. “Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information.” Communications of the ACM 5 (11): 563–66. https://doi.org/10.1145/368996.369026.

Newcombe, H. B., J. M. Kennedy, S. J. Axford, and A. P. James. 1959. “Automatic Linkage of Vital Records.” Science 130 (3381): 954–59. https://doi.org/10.1126/science.130.3381.954.

PC. n.d. “Processing Citizenship: Digital Registration of Migrants as Co-Production of Citizens, Territory and Europe.” ERC-2016-STG - ERC Starting Grant. ALMA MATER STUDIORUM - UNIVERSITA DI BOLOGNA, Italy: H2020-EU.1.1. - EXCELLENT SCIENCE - European Research Council (ERC). Accessed April 11, 2023. https://doi.org/10.3030/714463.

Pelizza, Annalisa. 2019. “Processing Alterity, Enacting Europe: Migrant Registration and Identification as Co-Construction of Individuals and Polities.” Science, Technology, & Human Values 45 (2): 262–88. https://doi.org/10.1177/0162243919827927.

Pelizza, Annalisa, and Chiara Loschi. 2023. “Telling ‘More Complex Stories’ of European Integration: How a Sociotechnical Perspective Can Help Explain Administrative Continuity in the Common European Asylum System.” Journal of European Public Policy, April, 1–22. https://doi.org/10.1080/13501763.2023.2197945.

Pollock, Neil, and Robin Williams. 2009. Software and Organisations: The Biography of the Enterprise-Wide System or How SAP Conquered the World. Routledge Studies in Technology, Work and Organisations 5. London; New York: Routledge.

Pollock, Neil, and Robin Williams. 2010. “E-Infrastructures: How Do We Know and Understand Them? Strategic Ethnography and the Biography of Artefacts.” Computer Supported Cooperative Work (CSCW) 19 (6): 521–56. https://doi.org/10.1007/s10606-010-9129-4.

Quintel, Teresa Alegra. 2018. “Connecting Personal Data of Third Country Nationals: Interoperability of EU Databases in the Light of the CJEU’s Case Law on Data Retention.” SSRN. http://hdl.handle.net/10993/35318.

Rumsfeld, Donald H. 2002. “Press Conference by Former US Secretary of Defence.” NATO HQ, Brussels. http://web.archive.org/web/20220922103427/https://www.nato.int/docu/speech/2002/s020606g.htm.

Sauleau, Erik A., Jean-Philippe Paumier, and Antoine Buemi. 2005. “Medical Record Linkage in Health Information Systems by Approximate String Matching and Clustering.” BMC Medical Informatics and Decision Making 5 (1): 32. https://doi.org/10.1186/1472-6947-5-32.

Star, Susan Leigh, and Karen Ruhleder. 1996. “Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces.” Information Systems Research 7 (1): 111–34. https://doi.org/10.1287/isre.7.1.111.

Steinbock, Daniel J. n.d. “Data Matching, Data Mining, and Due Process.” Georgia Law Review 40: 1. https://heinonline.org/HOL/Page?handle=hein.journals/geolr40&id=21&div=&collection=.

SWIFT. 2018. “Simplify the Complex World of Sanctions Screening: What Do Screening Activities Cover?” http://web.archive.org/web/20220226233517/https://www.swift.com/news-events/news/helping-simplify-complex-world-sanctions-screening.

SWIFT. 2021. “Name Screening: Fulfil Your Customer Due Diligence, Maintain Accurate Customer Risk Profiles and Mitigate Business and Reputational Risks.” Factsheet. http://web.archive.org/web/20230415104752/https://www.swift.com/swift-resource/250436/download.

Talburt, John R. 2013. “Special Issue on Entity Resolution Overview: The Criticality of Entity Resolution in Data and Information Quality.” Journal of Data and Information Quality 4 (2): 6:1–6:2. https://doi.org/10.1145/2435221.2435222.

Thales Group. 2021. “DHS’s Automated Biometric Identification System IDENT – the Heart of Biometric Visitor Identification in the USA.” Thales Group. http://web.archive.org/web/20230509140409/https://www.thalesgroup.com/en/markets/digital-identity-and-security/government/customer-cases/ident-automated-biometric-identification-system.

Valdivia, Ana, Claudia Aradau, Tobias Blanke, and Sarah Perret. 2022. “Neither Opaque nor Transparent: A Transdisciplinary Methodology to Investigate Datafication at the EU Borders.” Big Data & Society 9 (2): 1–17. https://doi.org/10.1177/20539517221124586.

Winkler, William E. 2014. “Matching and Record Linkage.” WIREs Computational Statistics 6 (5): 313–25. https://doi.org/10.1002/wics.1317.

Zech, John, Gregg Husk, Thomas Moore, and Jason S. Shapiro. 2016. “Measuring the Degree of Unmatched Patient Records in a Health Information Exchange Using Exact Matching.” Applied Clinical Informatics 7 (2): 330–40. https://doi.org/10.4338/ACI-2015-11-RA-0158.

Zuboff, Shoshana. 2015. “Big Other: Surveillance Capitalism and the Prospects of an Information Civilization.” Journal of Information Technology 30 (1): 75–89. https://doi.org/10.1057/jit.2015.5.

Zureik, Elia, and Karen Hindle. 2004. “Governance, Security and Technology: The Case of Biometrics.” Studies in Political Economy 73 (1): 113–37. https://doi.org/10.1080/19187033.2004.11675154.


  1. The word data is often treated as a mass noun, and hence, something that cannot be counted or divided (e.g., “the data is available”). In contrast, this dissertation uses data in its countable plural noun form (“data are”). I follow the convention of using this form to highlight that data are multiple and “arise from and are used in varied circumstances worth acknowledging” (Loukissas 2019, 13).

  2. The Processing Citizenship project, including this PhD research, was funded by the European Research Council in the context of the European Framework Program for Research and Innovation Horizon 2020, grant agreement No 714463, principal investigator Annalisa Pelizza.

  3. The interoperability initiative follows recommendations from the “High Level Expert Group on Information Systems and Interoperability” (European Commission 2016a) and a new legislative mandate (European Union 2018) for the European Union Agency for the Operational Management of Large-Scale IT Systems in the Area of Freedom, Security and Justice (eu-LISA). The initiative was split into two proposals due to different legal provisions for regulating (a) borders and visas and (b) police and judicial cooperation, asylum, and migration (European Union 2019a, 2019b).

  4. Earlier studies have employed alternative terminologies such as “information infrastructure” (e.g., Bowker et al. 2009; Hanseth, Monteiro, and Hatling 1996) or “e-infrastructure” (e.g., Pollock and Williams 2010) to refer to these assemblages of technological and social components that enable the flow and management of data across different settings and contexts.