Chapter 3 Towards a methodological framework for analyzing data matching in transnational infrastructures

This chapter develops a methodological framework that uses data matching within transnational commercialized security infrastructures to investigate the internationalization, commercialization, securitization, and infrastructuring of identification. To this end, three distinct methodological strategies are introduced in which data matching serves as both a research topic and a methodological resource: 1) comparing data models, 2) analyzing data practices, and 3) tracing sociotechnical change. These strategies draw inspiration from Bowker's (1994) and Bowker and Star's (1999) concept of "infrastructural inversion," which was developed to counter the tendency of infrastructure, and its associated practices and technologies, to fade into the background when functioning seamlessly. Similarly, this dissertation proposes to use three aspects of data matching as inversion strategies that bring less visible aspects of identification to the forefront. Comparing data models can reveal organizations' underlying assumptions about the people they identify. Analyzing data matching practices can show how people are identified within and across organizations. Tracing the sociotechnical change of data matching technology can make it possible to track the circulation of identification knowledge, technologies, and practices over time and across organizations. The resulting methodological framework, which integrates these three inversion strategies, is then operationalized to address the research questions posed in this dissertation.

3.1 Data matching within data infrastructures as a topic and a resource of research

The methodological framework borrows the classical ethnomethodological understanding that "social structures [can serve as] both a topic and a resource for their inquiries" (Garfinkel 1964, 250). More recently, and particularly pertinent to this dissertation, scholars have extended this idea to digital devices, social media platforms, and data infrastructures, treating them as research topics while also leveraging them as tools to explore other aspects of social life (Marres 2017; Pelizza 2016a; Pelizza and Van Rossem 2023; Rogers 2013; Weltevrede 2016). For example, using social media platforms for social inquiry involves drawing on platform features like posts, likes, and tags to gain insights into social dynamics, rather than just researching social media use. However, it also always entails grappling with the ambiguity that arises when the research process is intertwined with the affordances of the studied medium (Marres 2017; Weltevrede 2016). In a similar vein, this chapter will employ data matching practices and technologies as both a research topic and a methodological resource for investigating the internationalization, commercialization, securitization, and infrastructuring of identification.

There are two compelling reasons for adopting data matching as a methodological resource. Firstly, considering data matching as a distinct research topic presents an opportunity to discover how matching and linking identity data is practically achieved. Existing research, as highlighted in the preceding chapters, has yet to comprehensively explore how identities in migration and border control are interconnected across diverse systems and databases. Secondly, delving into the "technical minutiae" (Pelizza 2016a) of data matching serves as a valuable resource for investigating broader shifts in identification. Because data matching involves an intricate interplay among various stakeholders, including diverse government agencies with interconnected data and commercial and security companies offering data matching technologies, it can serve as a resource for exploring transnational commercialized security infrastructures.

However, studying data matching presents its own challenges, given its propensity to operate inconspicuously within identification. Consider, for instance, the inconspicuous moments where our financial or travel data is automatically cross-referenced with police watch lists. Infrastructure studies have long been aware of such predicaments and have actively sought methods to bring these background elements to the forefront of analysis.

3.2 Infrastructural inversions for matching identity data

Infrastructure studies have provided different methodological strategies to invert the tendency of infrastructures to disappear and to make visible the interconnections between technical minutiae and the politics of knowledge production (e.g., Bowker 1994; Bowker and Star 1999; Edwards et al. 2009; Monteiro et al. 2013). Proposed methods include looking at, among others, moments of breakdown (Star 1999), tensions in the emergence and growth of infrastructure (Hanseth et al. 2006), and material aspects (Ribes 2019). I propose to develop methodological strategies for inverting transnational commercialized security infrastructures, in order to elucidate the links between the technical intricacies of data matching and the dynamics of knowledge production in identification. As Bowker and Star (1999) put it:

Infrastructural inversion means recognizing the depths of interdependence of technical networks and standards, on the one hand, and the real work of politics and knowledge production on the other. It foregrounds these normally invisible Lilliputian threads and furthermore gives them causal prominence in many areas usually attributed to heroic actors, social movements, or cultural mores. (p. 34)

Expanding upon this quote in the context of data matching emphasizes the significance of acknowledging the less visible interdependencies between technical intricacies, politics, and knowledge production, often overshadowed by more apparent actors and social dynamics. This section will elaborate on three inversion strategies centered around 1) comparing data models, 2) analyzing data practices, and 3) tracing sociotechnical change. Comparative analysis of data models can provide insights into organizations’ underlying assumptions about individuals, as evidenced by the distinct data categories employed in the heterogeneous models used in matching data across various data infrastructures. Examining data matching practices across different infrastructures can facilitate a comprehensive understanding of how people are identified within and across organizational boundaries. Lastly, delving into the sociotechnical evolution of data matching technology can offer the means of tracing the diffusion of identification knowledge, technologies, and practices over time, thereby uncovering the long-term infrastructural implications.

3.2.1 First inversion strategy: Comparing data models

While data matching necessitates addressing the technical alignment of categories when integrating data models from diverse origins (Christen 2012), this aspect can be expanded to serve as a valuable research resource into the dynamics of knowledge production. Data models, designed to represent phenomena, are inherently interconnected with the politics of knowledge production as they become embedded in infrastructures and practices, instituting specific ways of knowing and working (Bowker and Star 1999; Bloomfield and Vurdubakis 1997; Hine 2006; Lampland and Star 2009; Timmermans and Epstein 2010). Designing data models and the knowledge they aim to capture is inherently political, determining how information is presented, what aspects are emphasized, and which elements may be marginalized (Bowker and Star 1999). This is particularly relevant in the context of border and migration management, where data models are used to categorize and identify individuals on the move across borders (Pelizza and Van Rossem 2023). While data models delineating specific categories of data may seem to offer limited information, they can be considered valuable research topics and methodological resources.

As a research topic, a systematic analysis of data models, involving the detection of disparities and omissions, holds promise for recovering the knowledge representations embedded in them. Technical solutions designed to compare data models and automatically identify correspondences between data model concepts, as seen in knowledge engineering, linked data, and natural language processing techniques (Euzenat and Shvaiko 2007; Kementsietsidis 2009), likewise point to the potential for recovering knowledge representations by exploring the relationships inherent in data models. In this context, a parallel can be drawn between these knowledge engineering methods, which aim to enhance machine comprehension of data for improved system functioning and information source integration, and data matching's need to establish connections between data models.
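To make the parallel with such correspondence-detection techniques concrete, the following sketch shows one simple way correspondences between data model concepts might be suggested automatically. It is purely illustrative: the category labels, the similarity measure, and the threshold are assumptions made for the example, not the methods of the cited literature.

```python
from difflib import SequenceMatcher

def normalize(label: str) -> str:
    """Lowercase a category label and strip punctuation so labels become comparable."""
    return "".join(ch for ch in label.lower() if ch.isalnum() or ch.isspace()).strip()

def label_similarity(a: str, b: str) -> float:
    """Character-level similarity between two normalized category labels (0 to 1)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def suggest_correspondences(model_a, model_b, threshold=0.4):
    """Pair each category in one data model with its most similar label in the other."""
    matches = []
    for cat_a in model_a:
        best = max(model_b, key=lambda cat_b: label_similarity(cat_a, cat_b))
        score = label_similarity(cat_a, best)
        if score >= threshold:
            matches.append((cat_a, best, round(score, 2)))
    return matches

# Two hypothetical authorities' data models, reduced to their category labels.
eu_model = ["Surname", "Date of birth"]
national_model = ["Family name", "Birth date"]
```

Running suggest_correspondences(eu_model, national_model) pairs "Surname" with "Family name" and "Date of birth" with "Birth date"; actual schema matching systems combine many such lexical, structural, and instance-based heuristics (Euzenat and Shvaiko 2007).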

Data models can also be considered a methodological resource as a source of knowledge about their producers (Pelizza and Van Rossem 2023). Through comparative analysis, such as examining distinct authorities' data models for identifying individuals, we can unveil not only the presence or absence of specific categories in one model as opposed to another, but also the inherent possibilities and limitations, and thus the underlying conceptions that diverse authorities hold of the individuals they identify (Van Rossem and Pelizza 2022).9 A pertinent illustration comes from Bowker and Star's (1999) exploration of classification systems, such as their work on standards for categorizing nursing work, which unveiled the types of work organizations value. In another instance, Cornford, Baines, and Wilson (2013) compared digital data standards initiated by public services in the United Kingdom to represent familial relationships. Their study showcased how data models can be employed to analyze "the kinds of family relationships that are recorded and those that are not recorded or harder to record, any hierarchies, implicit or explicit, for family forms or relationships and the implicit and explicit assumptions that underlie the terms and classifications used" (p. 8). Thus, the first inversion strategy capitalizes on insights gained from data matching's need for data model alignment and extends this into a methodological resource for uncovering the implicit assumptions about individuals within various identification systems.

3.2.2 Second inversion strategy: Data matching practices

The second inversion strategy focuses on another dimension of data matching: the actual alignment of data within and across databases. It starts from the well-established notion that data cannot be regarded as "raw" values that effortlessly align with their database models; rather, actual data result from a combination of localized human decisions and technical limitations (Bowker and Star 1999; Gitelman 2013). The process of matching data across varied organizations is thus intricately entwined with challenges of ambiguity and uncertainty in data, which emerge from these diverse local contexts. Consequently, delving into data matching practices in identification unveils their practical mechanisms and challenges while also providing an opportunity to gain insights into the complex relationships that unfold among diverse actors as their data are interlinked within transnational commercialized security infrastructures.

This recognition of the dual use of practices is well established in "practice theory," which regards social practices as integral to broader societal investigations (Reckwitz 2002; Schatzki 2005; Shove, Pantzar, and Watson 2012). For example, practice theory scholars have examined routine activities like cooking (Rinkinen, Shove, and Smits 2019) or daily showering habits (Hand, Shove, and Southerton 2005) to gain insights into broader societal dynamics, such as food supply chains or patterns of energy consumption. A focus on practices has also been adopted to understand the interlinkages between data practices and knowledge production in migration and borders (Cakici, Ruppert, and Scheel 2020; Scheel, Ruppert, and Ustek-Spilda 2019). Scholars have, for instance, investigated how practices contribute to the enactment of race (M'charek, Schramm, and Skinner 2014) and the government of mobilities (Glouftsios 2018), or how both European and non-European populations, along with the institutions processing them, are enacted through data practices and infrastructures (Pelizza 2019). Analyzing data matching as a practice can offer a similar dual opportunity.

Firstly, it can enable the exploration of everyday practices associated with matching identity data and the dynamics of their evolution and adaptation. By analyzing data matching practices, we can also gain insight into how technologies and practices mutually influence each other, revealing their evolving interdependence (Schatzki 2010; Ruppert and Scheel 2021; Hui, Schatzki, and Shove 2016). For instance, we could investigate how practitioners match and link identity data in the context of border control despite the heterogeneity between databases. In doing so, we can look at the specific technologies for matching identity data and how their embedded expertise, such as rules for determining name similarity, shapes the mutual interdependence between technology and practice. Such an examination may be crucial for understanding the interplay of data matching technologies and practices, for example when software flags irregularities in identity data that lead to doubts about someone's identity at the border.
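To illustrate how expertise about name similarity can be embedded in software, consider the following toy decision rule. It is a hypothetical sketch, not the logic of any actual border control system: the thresholds, the similarity measure, and the example names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity of two names after case-folding and collapsing whitespace."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def assess(recorded: str, presented: str,
           match_threshold: float = 0.85, review_threshold: float = 0.6) -> str:
    """Accept close names, flag middling ones as irregularities for human review."""
    score = name_similarity(recorded, presented)
    if score >= match_threshold:
        return "match"
    if score >= review_threshold:
        return "flag for review"
    return "no match"
```

A spelling variant such as "Maria Gonzales" for a recorded "Maria Gonzalez" scores above the match threshold, while an abbreviated "M. Gonzalez" falls into the review band, illustrating how threshold choices inscribed in software shape whether an identity is doubted.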

Secondly, examining data matching as a practice can provide insights into the broader interconnections between practices and other significant societal phenomena (Shove 2016; Nicolini 2016). Just as the study of everyday practices such as driving can unveil interconnected patterns of social activities that illuminate broader societal dynamics, such as energy consumption (Shove, Watson, and Spurling 2015), the investigation of data matching practices is anticipated to unveil how these practices are shaped by and contribute to broader shifts in identification. Analyzing routine data matching practices will thus be used as a resource for investigating how these practices are intertwined with larger sociopolitical and economic forces, playing a role in the internationalization, commercialization, securitization, and infrastructuring of identification within the context of transnational commercialized security infrastructures.

3.2.3 Third inversion strategy: Sociotechnical change

The third inversion strategy leverages the evolving developmental trajectory of data matching technology, employing it as both a research topic and a methodological resource. In migration and border control, data matching can be considered a component of more extensive infrastructures that interconnect various contexts and temporal scales, facilitating the exchange and use of information collected across diverse locations and moments. The evolution and use of data matching technologies are thus interwoven with broader infrastructural developments, characterized by interactions among individuals, states, companies, and technologies, which often give rise to tensions as they strive to achieve distinct objectives (Edwards et al. 2007; Ribes and Finholt 2009). As previously noted, tensions and changes within such infrastructures tend to become invisible over time. While the first inversion strategy proposes to recover such design choices and tensions through comparative analysis, tracing technological evolution offers another approach to investigating the matching and linking of identity data in transnational commercialized security infrastructures. Although researchers often concentrate on the final products of identification systems, adopting a methodological stance that traces their development can be advantageous for two reasons.

Firstly, tracing sociotechnical evolution can unveil technological design choices and elucidate the interactions among users, designers, system builders, and technological artefacts. Designers encode specific expectations into technological artefacts regarding users, systems, and data, which can, for example, be uncovered by focusing on how designers encourage specific interpretations (Woolgar 1990) or by comparing successful and failed adaptations to actual usage (Akrich and Latour 1992; Akrich 1992). For instance, data matching systems may hold certain expectations about how heterogeneous identity data can be matched, observable in technical specifications or instances of success or failure. Overall, the creation of technologies involves long processes with numerous choices and unforeseen outcomes, where diverse social actors often attribute varying meanings to artefacts and utilize technologies in distinct ways (Jackson, Poole, and Kuhn 2002; Pinch 2008; MacKenzie and Wajcman 1985; Pinch and Bijker 1984). For example, meanings and utilization of data matching technologies may have evolved differently between contexts like border and migration control, healthcare, or finance. Documenting the design choices and trajectories of data matching technology as a research topic can prove valuable for comprehending the interplay between users, designers, and technologies throughout the practical design, development, and real-world utilization of data matching technology, shedding light on embedded expectations and providing insights from both successful and unsuccessful adaptations to actual usage.

Secondly, utilizing sociotechnical evolution as a resource can facilitate exploring how these evolutions are intricately intertwined with broader sociopolitical and economic forces that both shape and are shaped by transnational commercialized security infrastructures. The instances when social groups challenge, alter or stabilize the meanings of technologies (Pinch and Bijker 1984) can offer insights not only into how technologies adapt but also into the evolving sociotechnical problems and solutions related to linking and matching identity data. This perspective can help understand how the technical intricacies of data matching intersect with the redefinition of sociotechnical challenges and solutions for identity data linking and matching, particularly within the context of growing securitization. Moreover, data matching plays a role in interconnecting previously independent systems into more extensive infrastructures. These moments can bring to light the technical challenges that arise when integrating once incompatible systems, such as requiring gateways for compatibility (Edwards et al. 2007). These junctures offer insights into broader dynamics within identification processes, including the impact of commercially and internationally deployed data matching systems on the standardization and stability of data matching practices.

3.3 A methodological framework for analyzing data matching in transnational infrastructures

3.3.1 Working model for operationalizing the inversion strategies

This section combines the three inversion strategies into the methodological framework, which will be operationalized for the dissertation’s investigation into how practices and technologies involved in matching identity data are both shaping and shaped by transnational commercialized security infrastructures. Hence, the operationalization of this framework encompasses three approaches to using data matching, each rooted in the inversion strategies and aligned with the overarching research questions expounded in Chapter 2:

  1. Comparative analysis of data models: This strategy involves examining the types of information collected by various organizations and systems, providing insights into organizations’ underlying assumptions about people through comparisons (RQ1).
  2. Examination of data practices: This strategy centers on routine identification practices, investigating how identity data is matched and linked within and across organizational boundaries (RQ2).
  3. Investigation of sociotechnical change: This strategy delves into the sociotechnical transformations within data matching software, exploring the trajectories and dissemination of data matching knowledge, technologies, and practices over time and across various organizational contexts (RQ3).

These three strategies will harness diverse data collection and analysis methods, which will be elaborated upon in the subsequent sections. The accompanying table delineates the relationships between the research questions, inversion strategies, and the techniques employed for data collection and analysis.

Table 3.1: Methodological framework.
RQ  | Inversion strategy    | Pair of axes                      | Data                  | Analysis
RQ1 | Comparing data models | (data models, categories of data) | Traces of data models | Mixed (The Ontology Explorer)
RQ2 | Data practices        | (categories of data, data values) | Fieldwork WCC & IND   | Deductive & inductive coding
RQ3 | Sociotechnical change | (data models, data values)        | Fieldwork WCC         | Thematic and historical analysis

The methodological framework can also be visually represented in a three-dimensional graph, as illustrated in Figure 3.1. This graphical representation resembles a relational model, where data are organized into tables with interrelationships. The visualization serves as both a guide for analysis and a communicative tool, conveying the relationships between inversion strategies, data matching dimensions, and research questions. The three axes of the graph represent distinct data dimensions: data models, data categories, and data values. To better understand these dimensions, we will first explore each axis in more detail, examining its characteristics and implications for data matching. Subsequent sections will then explore the interconnections between these axes and how they correspond to the inversion strategies.

Figure 3.1: Three-dimensional graphical representation of the methodological framework.

First, the axis for data models represents the database and other informational models that standardize the kinds of information about people collected by different organizations' IT systems. An example of such a data model could be the schemas specifying the data a government asylum agency collects about migrants. Each system and database may have its own unique data models, while standardized formats can also exist for data exchange and interoperability between different systems. Second, the axis for data categories refers to the specific types of information captured about individuals within a data model. For instance, in the example of the asylum agency's data model, data categories may include "name," "place of birth," and "date of birth."

Connections between data models can often be identified, whether explicitly stated or implied. For example, one data model may utilize the data category "surname," while another may use "family name" to represent the same concept. These variations demonstrate attempts to capture the same real-world information within different data models.

Third, the axis for data values pertains to the actual values stored in databases corresponding to a particular data model and its associated data categories. For instance, a data value for the "place of birth" category could be "Brussels." It is worth noting that similarities between data values can exist across different databases. For instance, another database might contain the data value "Bruxelles" for a similar category. By considering these three axes, we can now explore three combinations of them to illustrate three interconnected aspects of data matching.

3.3.1.1 Comparing authorities’ data models

The relationship between the axis for data models and the axis for data categories represents the similarities and differences between various data models through their corresponding data categories. This relationship forms the operationalization of the first inversion strategy, which involves conducting a comparative analysis of data models. Within this strategy, the focus is on examining the types of information collected, represented as data categories, by various organizations and systems, which are represented by data models. Through these comparisons, we can examine organizations’ underlying assumptions about individuals (RQ1).

The operationalization of examining specific authorities' data models aligns with the objectives of Work Package 2 within the Processing Citizenship project. Therefore, it is necessary to contextualize the objective of data model analysis within the broader scope of the PC project, which seeks to investigate how national and European authorities formalize knowledge about individuals in data models for managing migration. Through the PC project's script analysis methodology (see also Pelizza and Aradau 2024), comparing data models contributed to producing a typology of "intended migrants," which is the formalized knowledge of migrant identities inscribed in information systems (Pelizza and Van Rossem 2023).

Data model analysis aimed to detect differences across scales, mainly by comparing data models between EU and Member State authorities. Specifically, the study considers three significant information systems developed by European Commission agencies, namely Eurodac, the Schengen Information System (SIS), and the Visa Information System (VIS), as representative of EU authorities. These three systems employ distinct data models to support various policing tasks related to travel, cross-border crime, and irregular migration. For national authorities, the analysis includes the data models of the Hellenic and German Registers of Foreigners. These specific data models were chosen as part of the broader Processing Citizenship task plan of examining actual identification practices in "processing alterity" (Pelizza 2019).

The EU and national systems are characterized by diverse data models tailored to fulfill distinct functions within policing, travel, cross-border crime, and irregular migration management. However, within this array of systems, there are thought to be both disparities and commonalities in terms of data models and data categories. Eurodac, for instance, serves the purpose of aiding in identifying asylum seekers via fingerprinting while concurrently determining the Member State responsible for processing their asylum applications as part of the Dublin System.10 This system predominantly collects the fingerprints of asylum seekers. On another front, the Schengen Information System (SIS II) contributes to external border control and law enforcement cooperation within the European Union.11 It stores alerts containing comprehensive information about individuals and objects and directives for actions when encountering these entities. The Visa Information System (VIS) facilitates the exchange of visa-related data, encompassing personal and biometric information, in support of a unified EU visa policy. On the national level, the Hellenic Register of Foreigners is used to identify and register individuals arriving at the border in Greece. This system also collects data beyond mere identification, aiding various tasks encompassing the asylum process and assessing health conditions (Pelizza and Van Rossem 2021). Similarly, the German Register of Foreigners (GRF) holds a substantial repository of personal information about foreigners in Germany, including residence permit holders, asylum seekers, and recognized refugees (Bundesverwaltungsamt 2021). Chapter 4 delves further into these EU and national systems.

Two key considerations will need to be addressed in operationalizing the inversion strategy, focusing on comparing data models. Firstly, data models come in various formats, and specific formats might be less accessible due to their confidential nature, particularly within the context of migration and border control. In such instances, alternative forms of data model description, such as legislative documents or graphical user interface screenshots, will need to be relied upon for analysis. These substitutes should allow for the reconstruction of data models, enabling a comprehensive comparison across diverse authorities. Secondly, developing connection methods is crucial to identifying links between data categories in distinct data models, as these connections may not always be evident. For instance, one authority’s data model may feature the data category “family name,” while another may employ “surname” to refer to the same concept. Although the terminology differs, both categories pertain to an individual’s familial identity. Similarly, one data model may categorize all languages simply as “language,” whereas another might differentiate between “native language” and “spoken languages.” Chapter 4 will develop and use a method for extracting, analyzing, comparing, and visualizing data models from heterogeneous sources. However, beyond merely identifying these differences, these disparities will be used for gaining insights into authorities’ expectations and intentions about individuals.
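One simple connection method is an explicit crosswalk table that records which categories in one authority's model correspond to which in another, including one-to-many links where granularity differs. The entries below are hypothetical examples of the kind discussed above, not reconstructions from any actual system.

```python
# Hand-built crosswalk between two hypothetical authorities' data models.
# Keys are categories in model A; values list corresponding categories in
# model B. One-to-many entries capture differences in granularity.
CROSSWALK = {
    "surname": ["family name"],
    "given name": ["first name"],
    "language": ["native language", "spoken languages"],
}

def translate(category: str, crosswalk=CROSSWALK):
    """Return the counterpart categories in the other model, if any are known."""
    return crosswalk.get(category.lower(), [])
```

Here translate("language") returns both finer-grained categories, making visible that one authority expects to know not just whether a person speaks a language, but which languages and which is native.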

3.3.1.2 Data matching within and across organizations

The second pair involves the relationship between the axis representing data categories and the axis denoting data values, establishing connections between data categories and their corresponding database entries. Consider, for instance, a database containing data categories such as "first name" and "last name," wherein individuals' records are associated with specific values. However, deviations from these data models' expectations are commonplace. Instances may arise where first and last name values are inadvertently interchanged, or databases contain duplicate entries for the same individual. This pair of axes is connected to examining identification practices, which inherently involve dealing with inconsistent data, a challenge often addressed through data matching technology. By employing the second inversion strategy, our investigation delves into routine identification practices, scrutinizing how identity data is matched and linked.

Furthermore, the connection between the axis of data models and the axis of data values can be conceptualized as data matching practices across databases and organizations. In this context, data matching involves the procedures for identifying and potentially linking or merging identity data that is distributed across multiple databases and organizations, where records of identity data for the same individual may coexist. A notable challenge for data matching mechanisms in this context is the inconsistent availability of unique identifiers, necessitating reliance on personal information that is at times ambiguous. For instance, a woman might use her married surname in one system and her maiden name in another following marriage. Consequently, data matching processes across different organizations must consider variations in identification and registration practices. By employing the second inversion strategy, our research not only investigates the practices and technologies involved in matching data within and between organizations, but also explores differences and shifts in identifying individuals across various data infrastructures.

The operationalization of this aspect of the empirical framework, using the second inversion strategy to examine data matching practices, will be rooted in an empirical investigation of identification procedures within a government migration agency and its utilization of data matching software. Specifically, the research will focus on the applicant identification processes at the Netherlands' Immigration and Naturalization Service (IND) and its interactions within broader inter-organizational networks. This investigation will particularly emphasize the role of the ELISE data matching software in these practices. The IND is responsible for processing residency and nationality applications, and it incorporates data matching software into its identification procedures for searching and matching applicants' identity data within the back-office system and managing data anomalies like duplicate records. The analysis of data practices will rely on data gathered through fieldwork, including interviews, documents, and field notes, conducted at both the data matching software provider and the IND agency itself. The setup of this fieldwork will be detailed in subsequent sections of this chapter.

The operationalization of the second inversion strategy involves analyzing diverse data matching practices within the agency. This analysis encompasses at least three critical data matching practices identified in the technical literature: batch data matching, real-time data matching, and data deduplication (Christen 2012). In the context of matching identity data, these practices operate at distinct moments. Batch data matching entails offline, scheduled batch processing of data sets to identify matching identity data. For example, an organization might periodically enrich the data about individuals in its database by matching and integrating data from various sources. In contrast, real-time data matching addresses the need for immediate data retrieval through direct search queries. For instance, police officers may need to query databases using data categories such as “name,” “nationality,” and “date of birth” to instantly identify approximate matches for these personal details. Finally, data deduplication employs data matching technology to detect and merge multiple records in a database that are considered to pertain to the same individual.
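
The distinction between the three practices can be illustrated with a toy sketch. The functions below are hypothetical and only gesture at the difference between the modes; real systems such as ELISE rely on far more sophisticated indexing and comparison techniques (Christen 2012), and the similarity test here is a simple stand-in.

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Approximate string comparison; a stand-in for real fuzzy matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Toy in-memory "database" of identity records.
records = [
    {"id": 1, "name": "Ali Hassan", "dob": "1985-01-02"},
    {"id": 2, "name": "Aly Hasan",  "dob": "1985-01-02"},  # likely duplicate of 1
    {"id": 3, "name": "Sara Lind",  "dob": "1992-07-30"},
]

def realtime_query(name: str) -> list:
    """Real-time matching: answer one direct search query immediately."""
    return [r for r in records if similar(r["name"], name)]

def batch_match(incoming: list) -> dict:
    """Batch matching: a scheduled run pairing each incoming record
    with candidate matches in the existing database."""
    return {r["name"]: realtime_query(r["name"]) for r in incoming}

def deduplicate() -> list:
    """Deduplication: flag pairs of records that appear to describe
    the same individual."""
    return [(a["id"], b["id"])
            for i, a in enumerate(records) for b in records[i + 1:]
            if similar(a["name"], b["name"]) and a["dob"] == b["dob"]]
```

All three modes reuse the same comparison logic; what differs is the moment of invocation (a scheduled run over whole data sets versus an immediate single query) and the target (incoming records versus the database itself).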

The methodological framework’s focus on specific data matching practices serves a two-fold purpose. First, it involves analyzing the practices and technology used to match applicant data within the IND’s data infrastructure, providing insights into the intricacies of identity matching within the IND’s operational context. Second, the research investigates how individuals are identified across the broader data infrastructure. This dual perspective not only sheds light on the identification mechanisms but also highlights the connections between shifts in identification practices and the integration of commercial systems for matching identity data. By examining these dynamics, the study offers a comprehensive understanding of the interplay between data matching practices, technological solutions, and broader shifts in identification within the context of transnational commercialized security infrastructures.

3.3.1.3 Traveling data matching software and expertise

A perspective spanning all three axes (data models, data categories, data values) can be conceptualized as the expertise ingrained within data matching technologies, encompassing expertise related to matching models, data categories, and values, along with the potential dissemination of data matching knowledge and technology across multiple organizations over time. The underlying assumption is that businesses develop and deploy tools to facilitate data matching efforts in diverse contexts and periods. As a result, expertise in domains such as comparing identity records with typographical errors and name variations may propagate from one organization to another over time through these data matching technologies. Employing the third inversion strategy, this approach investigates the sociotechnical evolution of data matching software, thereby examining how knowledge, technologies, and practices related to data matching may circulate across different organizational contexts and evolve over time.

The operationalization of this aspect of the empirical framework involves examining the evolutionary trajectory of data matching software technology. In particular, this analysis will focus on the ELISE data matching software, a commercial product developed by the company WCC12, which is widely deployed in various national and international border and migration control systems. The analysis will draw from fieldwork with WCC, encompassing the exploration of ELISE software development and deployment, as detailed in subsequent sections of this chapter. As explained above, operationalizing this third inversion strategy for data matching technology can offer insights into the dynamics between users, designers, and technologies throughout the practical phases of design, development, and real-world application. Moreover, tracing the sociotechnical evolution of the software is expected to contribute to a deeper understanding of broader shifts in internationalization, commercialization, and securitization within the realm of identification.

To operationalize the inversion strategy effectively, it is necessary to establish appropriate heuristics to identify the analytically significant moments in the evolution of data matching software. These moments represent times when the interactions and involvement of users, designers, and other actors reveal the dynamics and evolution of the data matching process (Hyysalo, Jensen, and Oudshoorn 2016). In Chapter 6, two distinct heuristics will be introduced to pinpoint such pivotal moments within the lifecycle of identification technologies. The first heuristic draws on the Social Construction of Technology framework’s concept of “interpretative flexibility” (Pinch and Bijker 1984) to identify junctures when social groups challenge, alter, or stabilize the meanings attributed to identification practices and technologies. The second heuristic employs the concept of the “gateway problem” (Edwards et al. 2007) from infrastructure studies to pinpoint moments where diverse identification software systems and infrastructures intersect and interact. Together, these heuristics will offer ways to capture and analyze shifts and developments in the evolution of data matching technology and to show how these evolutions are intricately intertwined with broader sociopolitical and economic forces that both shape and are shaped by transnational commercialized security infrastructures.

3.3.2 Contextualizing the research within the Processing Citizenship project framework

The methodological framework is closely linked to Processing Citizenship’s goal to examine how identity management systems support the creation of knowledge about non-European populations (Pelizza 2019; PC n.d.). Consequently, this section delineates the pertinent Work Packages (WPs) within the Processing Citizenship project framework, aiming to emphasize this dissertation’s distinct contributions to each of these WPs.

WP1
To develop a theoretical framework that integrates globalization studies, border studies and surveillance studies of migration with science and technology studies and media geography. Design a coherent methodological approach that combines ethnographic and computational techniques in order to establish information systems mediated registration practices at Hotspots as analytical sites.
This dissertation aligns with the objectives of WP1 by contributing to the development of a methodological approach that combines ethnographic and computational techniques for analyzing registration practices at Hotspots. Specifically, the utilization of the first inversion strategy, which involves comparing data models, is in line with this goal. Chapter 4 introduces a methodology and software tool for comparing data models, facilitating the computational analysis of diverse authorities’ assumptions about individuals. The insights derived from this approach serve as a foundational component for critical analyses when integrated with ethnographic observations of the practical utilization of information systems. Such a combination aligns with the approach of Processing Citizenship, which aimed to identify “intended migrants” and compare them with actual migrants through ethnographic analysis.
WP2
To analyze and compare information systems used to register migrants across diverse Hotspots.
WP2 encompassed a range of tasks, including the semantic analysis of the ontologies underpinning information systems and identifying the “intended migrant” for relocation or resettlement using ontologies and algorithms. The Ontology Explorer method and tool, introduced in Chapter 4 and Van Rossem and Pelizza (2022), which is based on comparing data models, provided methodological support for these tasks. This methodology offered a comprehensive means of comparing data models employed by different authorities. Furthermore, the analysis of intended migrants was further advanced in the publication Pelizza and Van Rossem (2023), shedding light on the scripts of alterity that delineate the assumptions and limitations of border security frameworks through classification schemas. Additionally, as part of tasks related to WP2, interviews were conducted with IT developers within the EU, and these interviews were carried out during the fieldwork with the supplier of data matching software.
WP3
To describe identification and registration practices at Hotspots, focusing on the material devices involved, and assess them on the basis of migrants’ adaptation or resistance.
The Ontology Explorer (OE) not only offers a valuable method and tool for the comparative analysis of scripts that uncover authorities’ assumptions about the people to be registered; it also contributes to WP3’s objectives by providing a lens to explore resistance to identification and registration practices. People do not resist in isolation but rather within the context of these scripts. The script analysis made possible by the OE thus serves as an initial analytical step, offering a foundation that can be utilized by fellow researchers within the Processing Citizenship project. While this dissertation does not directly address this WP, the tool was considered a way to support their ethnographic investigations into the various forms of resistance enacted by diverse actors.
WP4
To map architectures of data circulation in EU migration information systems.
The main contribution here is to the specific research question (RQ7A): “How are relationships between EU and MS enacted through efforts to achieve interoperability?” This dissertation contributes to WP4 by addressing the task of investigating tensions, data frictions, and controversies related to classifications, standards, and semantic interoperability. While the original plan was to delve into semantic interoperability at eu-LISA, the focus shifted towards examining a supplier of identity-matching software for the EU-VIS system. In Chapter 6, the dissertation conceptualizes the integration of WCC ELISE, a data matching software, as a gateway moment. This analysis illustrates the software’s role in facilitating semantic interoperability between EU and MS systems. Furthermore, the analysis of the data matching system in EU-VIS highlights the specific relations between the EU, MS, and commercial actors, who needed to configure it while balancing new features against backward-compatibility needs from EU member states’ systems.

3.3.3 Fieldwork context, goals and limitations

In this section, more context will be provided regarding the establishment of the fieldwork, encompassing its objectives and constraints. As elucidated earlier, one of the tasks outlined in the Processing Citizenship project was to engage in interviews with IT developers in the EU to identify tensions, data frictions and controversies over classifications, standards and semantic interoperability. At first, we sought to collaborate with eu-LISA on fieldwork related to data quality, but unfortunately, the plan did not come to fruition. This failed attempt led to a recalibration of focus toward scrutinizing a supplier of data matching software employed in both EU and Member State systems. Furthermore, the fieldwork presented the opportunity to investigate data matching practices and trace the evolutionary trajectory of data matching software.

3.3.3.1 Establishing the fieldwork site: failures and successes

The first, unfortunately unsuccessful, attempt at opening up a potential fieldwork site concentrated on the problems and technological solutions EU institutions face with identity data and data quality in border security and migration management. With the support of the PC project’s Principal Investigator, contacts were established with eu-LISA, the EU agency managing several large-scale EU border management information systems. I was able to participate in the 2018 annual eu-LISA conference and meet with the head officer for research and development to discuss a potential traineeship on-site at the agency’s office. After an initial agreement, I applied for a traineeship with a proof-of-concept proposal based on probabilistic databases, an active area of research that uses a database model designed for working with uncertain data (Keulen 2018). Such a proof of concept was thought to potentially give insight into the data uncertainties that the EU and MS encounter. Unfortunately, the proposal had to be abandoned by August 2019 due to issues surrounding the confidentiality of the data that the research would collect. As such, the study faced common barriers to qualitative secrecy research: the proposal could not pass the “gatekeepers” who could permit access to the fieldwork site (de Goede, Bosma, and Pallister-Wilkins 2020).

The second attempt at opening up a potential fieldwork site was successful, focusing instead on an eu-LISA supplier of identity-matching software. By looking at technology companies that work with eu-LISA, we pinpointed the company WCC Group, which develops technology for matching identity data in border security and migration management. Indeed, the EU Visa Information System uses WCC-produced software, namely the ELISE ID platform, to search and match identity and visa data. In short, ELISE provides data matching for fast querying based on inexact data. The company’s proprietary data matching technology uses various fuzzy logic algorithms that account for the possibility that data may be incomplete or inaccurate. In this context, fuzzy logic refers to a type of mathematical logic that computes degrees of truth, expressed as values between 0 and 1, rather than binary boolean “true” or “false” values. The two approaches differ in how they handle data uncertainties and produce results.

For example, consider a scenario where the data matching technology identifies potential duplicate records in a customer database. In a boolean search, the system would match exact values, such as names, email addresses, or phone numbers, and return either “true” (a match) or “false” (no match). In contrast, using fuzzy logic, the technology considers the possibility of minor variations or errors in the data, such as misspellings, nicknames, or incomplete information. It then calculates match probabilities based on the level of similarity between records. This approach means that records with slight discrepancies can still be identified as potential matches, with varying degrees of confidence in their accuracy. By employing fuzzy logic algorithms, the company’s data matching technology can provide more comprehensive results by accommodating variations and uncertainties in the data. The company and its software were thus deemed an excellent opportunity to empirically observe the problems and solutions of matching and linking identity data in security settings. Indeed, next to using ELISE in VIS, WCC has various other customers who use the software in border security and migration management settings.
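
A minimal sketch of this contrast, using Python's standard library as a stand-in for the proprietary fuzzy algorithms (the email addresses and threshold are invented for illustration):

```python
from difflib import SequenceMatcher

def boolean_match(a: str, b: str) -> bool:
    """Exact comparison: any typo turns the pair into a non-match."""
    return a == b

def fuzzy_score(a: str, b: str) -> float:
    """Graded similarity in [0, 1]; SequenceMatcher is only a stand-in
    for the proprietary fuzzy matching algorithms described above."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A transposed pair of letters in an email address: the boolean search
# misses the duplicate, while the fuzzy score flags it as a strong
# candidate match for human review.
a, b = "j.smith@example.com", "j.smiht@example.com"
exact = boolean_match(a, b)    # False: no match at all
graded = fuzzy_score(a, b)     # close to 1.0: likely the same contact
```

The graded score is what allows a system to rank candidates and hand borderline cases to a human operator, rather than silently discarding near-matches as a boolean comparison would.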

A meeting was set up in December 2019 with WCC at the company’s headquarters in Utrecht to discuss a research project that would meet the research needs of the Processing Citizenship and PhD projects while living up to the expectations of WCC and its customers. Some time later, we proposed a project for on-site research focusing on the deployments of ELISE in the Visa Information System and the Netherlands’ immigration and naturalization government agency, which WCC later approved. Unfortunately, at the same time, the Covid-19 virus spread worldwide and became a global pandemic. As a result, the government of the Netherlands announced in March 2020 that people should work from home. We decided against starting remotely and instead pushed back the start date because we believed the research would benefit more from face-to-face interaction.

In May and June of 2020, the government of the Netherlands loosened some restrictions on working on-site and taking public transportation. Therefore, we decided to start the fieldwork in a hybrid form, with remote access to relevant resources and a few on-site visits to the Utrecht office when staff members were present. During the summer of 2020, the office environment had a relaxed atmosphere. However, due to COVID-19 restrictions, only a limited number of people were allowed to be present at the office, and many employees were on holiday schedules. As a result, the office was less crowded than usual. In September 2020, the government introduced stricter COVID-related regulations, and subsequent fieldwork had to be conducted online. More information regarding these limitations can be found in later sections and chapters.

3.4 Methods for data collection

The methodological framework guided the collection and analysis of data about data matching in European security data infrastructures. Various techniques were used to gather and analyze qualitative data to support the investigation, including document analysis and user interviews. Naturally, these methods informed each other at various points throughout the research.

3.4.1 Data models

Regarding the collection of data models, the research draws on data collected during fieldwork conducted in the context of the Processing Citizenship project at border zones in Europe. In addition, given linguistic constraints and the PC project’s task plan organized as a matrix, some documents were collected by other researchers employed as collaborators in the Processing Citizenship project.13 Overall, the data collection efforts included desk research of European regulations over Eurodac, VIS, and SIS II, technical documents made available by European and German authorities, systems screenshots collected at border zones in the Hellenic Republic, interviews and ethnographic observation with IT developers and users in Italy and Greece, and technical documents collected during fieldwork in the Netherlands.

The collection of documents on the analysis of data models encompassed the following:

Regulation (EU) No 603/2013, European Parliament and Council, 26 June 2013
The regulation governing the establishment of Eurodac includes Article 11, which addresses the “Recording of Data.” This article outlines the specific data that must be recorded in the Central System.
Regulation (EU) No 2018/1862, European Parliament and Council, 28 Nov. 2018
The regulation pertaining to the establishment, operation, and utilization of the Schengen Information System (SIS) includes Article 20, which addresses “Categories of Data.” This article outlines the specific categories of data provided by each Member State and contained within the SIS.
Regulation (EC) No 767/2008, European Parliament and Council, 9 July 2008
The regulation governing the Visa Information System (VIS) and data exchange among Member States concerning short-stay visas contains two pertinent articles. Article 5, titled “Categories of Data,” delineates the specific categories of data that must be recorded in the VIS. Concurrently, Article 9, titled “Data upon Lodging the Application,” elucidates the categories of personal data recorded about applicants when they lodge their visa applications.
Hellenic Register of Foreigners screenshots
Screenshots of the Hellenic Register of Foreigners (also known as ALKYONI system), used in the Hellenic Republic for managing asylum applications, were captured by fellow members of the Processing Citizenship project during their fieldwork. These screenshots offer a detailed view of the system’s interface and functionalities, particularly in relation to the registration and identification of asylum seekers in Greece.
XAusländer Version 1.14.0
The XAusländer standard is a protocol based on XML (eXtensible Markup Language) that facilitates the electronic exchange of identity data among various authorities within Germany’s immigration administration. This standard is accessible through the website of the “Koordinierungsstelle für IT-Standards” (Coordination Office for IT Standards). Section “2.2 Das Informationsmodell” (2.2 The Information Model) within the standard elaborates on the specifics of the information model for individuals.

3.4.2 Documents

During fieldwork, I gathered a range of documents pertinent to the ELISE data matching systems. The collection comprised a diverse range of materials, including technical design specifications, product brochures outlining the software’s features and advantages, sales presentations delivered to customers, and other materials. Access to these materials was facilitated by my contacts within the WCC ID Team. Furthermore, when I visited WCC’s headquarters in person, I received further clarification regarding the technical complexities I had previously encountered while reviewing those materials.

This collection of documents can be categorized into two main groups. Firstly, some documents pertained to the broader technical intricacies of the ELISE ID platform, enabling me to familiarize myself with the software suite and its design by WCC. Secondly, documents linked to the platform’s specific implementations at the IND allowed me to gain insights into how WCC practically configures the system for its customers. Notably, the collection of documents encompassed archival materials, such as past presentations and meeting minutes, that shed light on the system’s deployment and configuration process at the IND. For instance, one set of meeting minutes provided detailed insights into discussions concerning the configuration of the matching criteria. Another noteworthy document was a presentation that covered an upgrade of the ELISE system and introduced new features implemented at that time.

Hence, the diverse range of materials underscores the multifaceted nature of documents and the documentation practices, highlighting that they serve a purpose beyond mere information recording, making them valuable topics and resources for research (e.g., Shankar, Hakken, and Østerlund 2017). Documents exhibit an active role within social contexts, functioning to account for and coordinate various workplace activities. In this regard, documents are closely intertwined with the processes that engender them (Hull 2012; Riles 2006). Consequently, the research adopted a two-fold approach towards these documents. On the one hand, they were examined to gain insights into the technical workings and features of the software system and its configuration. On the other hand, the documents were harnessed to comprehend larger trends in the evolution and utilization of identification technologies.

The documents collected during the research process played a dual role in contributing to the investigation, aligning with both the second and third inversion strategies within the methodological framework. Firstly, these documents provided insights into data matching practices: the technical documents outlining the data matching mechanics of ELISE shed light on the designed functioning of the data matching system. This designed functioning was then juxtaposed with the real-world utilization by different users within the IND, as discerned from interviews, which will be elaborated upon in the subsequent section. Chapter 5 explores contrasts between the design of the matching system and its practical implementation, centered on three key aspects: the formulation of search queries, the computation of search results, and the interpretation of search outcomes. This comparative analysis ultimately revealed distinct frictions between the intended design and actual usage in data matching practices, thereby shedding light on impediments that may obstruct the IND’s identification processes.

Secondly, the collected documents were used for the third inversion strategy to trace the sociotechnical evolution of the data matching software. In this context, the focus is on documentation practices, investigating alterations between document versions or the incorporation of supplementary annotations (see also, for example, Bowker and Star 1999; Shankar, Hakken, and Østerlund 2017; Sweeney 2008). For example, the documents associated with the deployment of ELISE at the IND encompassed materials detailing package updates and records of discussions about configuration changes. Chapter 6 utilizes these documents to delve into the specifics of one such update, revealing new data matching features that originated from diverse contexts, including an international name matching competition and a feature tailored for the EU-VIS. By analyzing these documents, the chapter highlights the process of disseminating and exchanging data matching knowledge among various organizations.

The collection of documents also brought about particular challenges and unexpected insights. Notably, the absence or inaccessibility of documents emerged as a significant but less evident aspect of the documentation process. During fieldwork, it became apparent that certain records related to the deployment of ELISE in the EU Visa Information System would be inaccessible. This lack of access resulted from the specific dynamics among actors involved in the development of the EU-VIS systems. In this arrangement, WCC was the technology supplier and collaborated with Accenture, the technology integrator. Consequently, the pertinent documents were under the purview of the EU agencies and the technology integrator. Moreover, even for WCC, the technology integrator effectively acted as a gatekeeper, hindering access to such documents. Although access to the technical details of the integration of ELISE in the EU-VIS was not possible, this instance proved insightful for understanding the dynamics of WCC’s role as a technology supplier and its dependencies on technology integrators and consultancies within the larger sociopolitical and economic contexts, as elucidated in greater detail in Chapter 6.

The accompanying table provides an overview of the documents that were consulted. The findings obtained from the documents played a role in shaping the structure and content of the interviews, as will be elucidated in the following section.

3.4.3 Interviews

The fieldwork encompassed a series of semi-structured interviews designed to align with the methodological framework’s second and third inversion strategies. These interviews can be categorized into two participant groups, each serving distinct research objectives. The first group was dedicated to exploring data matching practices. This interview group included a diverse range of IND personnel, encompassing individuals engaged in technical development associated with INDiGO and those who perform identity lookup and matching using the ELISE search and match engine as an integral part of their work. The second group of interviews was centered around unraveling the sociotechnical evolution of the data matching software, involving interviews with WCC employees responsible for various aspects of the company’s identity matching system, including design, implementation, and (pre-)sales efforts.

The interview protocol for IND staff was designed to delve into specific aspects of searching and matching applicant data within the IND organization. Interviews commenced with establishing interviewees’ roles to contextualize their experiences and tailor subsequent questions. Three primary factors influencing data searching and matching were addressed: query formulation, match computation, and result handling. For query formulation, questions delved into categories used, preferred data elements, and wildcard utilization. The second factor explored match result expectations and engine understanding. The third factor probed result processing, encompassing quality perception, match ranking, and score interpretation. Additionally, duplicates and deduplication criteria were discussed. Further details about the interview protocol and analysis of responses are provided in Chapter 5.

The interview approach was guided by the hypothesis that differences in data matching practices could be best discerned by contrasting how personnel from various IND departments employed these tools for searching and matching in their distinct identification tasks. This hypothesis emerged from initial insights from document analysis and discussions with the WCC ID Team. Consequently, efforts were made to recruit interview participants from different organizational units within the IND (a list of these units is available in Table ??). While it was not feasible to interview users from every unit, insights into the utilization of search and match tools in various departments did emerge during the interviews. This insight was made possible by particular participants with extensive experience, having worked across different organizational units within the IND or in roles involving collaboration with multiple departments. Five interviews with IND staff were conducted, each lasting about an hour, where participants detailed their experiences with identifying applicants and utilizing search and match tools.14 This approach yielded valuable results, and Chapter 5 leverages the diverse data matching practices revealed in these interviews to develop an analytical tool for interpreting these practices.

The second group of interviews, conducted concurrently with and following the IND interviews, centered on delving into the sociotechnical evolution of the ELISE data matching software to unveil broader shifts within the realm of identification. These interviews primarily involved discussions with staff members from WCC with experience within the identity domain. The interviews were designed to uncover significant milestones in the history of their data matching software system. Given the diversity in participants’ roles, projects, and experiences within the company, the interviews followed a more flexible format than those conducted with IND employees, allowing for tailored discussions based on each individual’s background and expertise. For instance, when interviewing individuals involved in projects like the EU-VIS, the questions were honed to delve deeper into these areas. In cases where less information was available about the interviewee, the conversation typically began by exploring their connections with current and potential customers in the security and identity market, gradually steering the discussion towards sociotechnical changes. Seven interviews were conducted in this group, each spanning approximately an hour.15 The participants comprised individuals in various roles, including consultancy, pre-sales, solutions management, software development, and user experience design.

Participants from both interview groups consented to record the sessions using Processing Citizenship’s informed consent form. The form allowed participants to specify how the research would use their provided data while guaranteeing anonymity and confidentiality. The recording procedure followed a protocol to enhance anonymity, including only audio recordings with a distorted voice. Additionally, manual transcription of interviews ensured additional confidentiality by preventing confidential information from being leaked via automated transcription platforms. Table ?? provides an overview of the interviews.

The number of interviews conducted in this study was lower than initially expected and fell short of the number initially foreseen by the Processing Citizenship project. Several factors contributed to this outcome. Firstly, due to the sensitive nature of their border control and security work, it was challenging to gain access to additional customers beyond the software technology supplier and one of their customers, despite the initial intention to include them. Clearance and background checks required for individuals involved in these areas posed difficulties for the researcher in expanding the participant pool. Additionally, the COVID-19 pandemic further complicated the situation by limiting networking opportunities and hindering the ability to find additional interview participants, particularly in the case of IND interviews. As a result, the study had to adapt to conducting online interviews, which, although enabling data collection to continue, introduced constraints such as reduced rapport-building and limitations in gathering nuanced insights compared to face-to-face interactions.16 The study made the most of available opportunities despite these challenges; it provided valuable insights within its defined scope, shedding light on the perspectives of the software technology supplier and their customer.

3.4.4 Events

The operationalization of the third inversion strategy, which examines sociotechnical change in identification technologies, also encompasses data collection through participation in relevant events. As the Social Construction of Technology approach recognizes, the development of technologies involves a wide array of relevant social groups, extending beyond developers and users to researchers, journalists, politicians, civil society organizations, and other stakeholders. Attending industry conferences, academic symposiums, and other gatherings involving diverse stakeholders, such as industry representatives, academics, Member State authorities, and EU agencies, therefore provided additional insights into the sociotechnical evolution of identification technologies. Discussions at these events revolved around related topics such as data matching, data quality, identification, and data interoperability within EU identification systems. Some of these events were directly related to the fieldwork, as WCC participated in them. A summary and description of these events, whether attended in person or online, can be found in Table ??.

Participating in such events offered a means to explore how diverse social groups define the challenges of developing identification technologies for border and migration control. The gatherings were opportunities to observe how differing interpretations of these challenges may generate conflicts, and how those conflicts might be resolved through technological means. To illustrate this, consider an observation made during a 2018 eu-LISA industry conference. During the Q&A session following a presentation on technical solutions for achieving interoperability among EU systems, an audience member raised a pertinent concern: how to handle the potentially significant volume of false positives resulting from integrating disparate databases and matching data containing outdated information. In response, one of the panelists acknowledged that the issue would have to be addressed through a one-off intensive effort. This exchange illustrates differing interpretations of the problem and its technical solutions; it calls into question the perceived smoothness of database integration and highlights hidden costs, such as the labor-intensive manual effort required to integrate diverse identification systems.

Moreover, these events offered opportunities not only to observe how various actors and social groups define and interpret problems and solutions, but also to observe how these definitions and interpretations are disseminated and circulated. Examples given by professionals may, for instance, involve data quality and matching concerns related to security and counter-terrorism, illuminating the securitization of data matching and identification practices. A particularly insightful illustration emerged during one of my online sessions and is integrated into the opening of Chapter 5. The case underscores how security companies frame the importance of technologies for searching and matching data stored in diverse databases, especially within counter-terrorism initiatives. The company recounted how authorities had added one of the “Boston bombers” to police watch list databases before the attack, but with inconsistent and invalid transliterations of his name. The professional posited that discrepant information across databases hinders authorities’ investigations and can create blind spots (the problem), which their data matching technology could resolve (the solution). I subsequently discovered that this example was recurrently cited by various companies in similar contexts. In this manner, these events also offer a glimpse into how the problems and solutions of data matching knowledge circulate and transfer across organizations.
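The transliteration problem invoked in such vendor narratives can be made concrete with a small sketch. The following Python fragment is emphatically not WCC’s or any vendor’s algorithm; it merely uses the standard library’s `difflib` to show why an exact lookup misses a variant spelling of the same name while a generic string-similarity measure still scores the two as close. The surname variants are invented for illustration:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized similarity between two name strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two plausible Latin transliterations of the same Cyrillic surname.
query = "Tsarnaev"
variant = "Tsarnayev"

exact_hit = (query == variant)            # False: an exact lookup misses the record
fuzzy_score = similarity(query, variant)  # high score: approximate matching flags it
print(exact_hit, round(fuzzy_score, 2))
```

Production matching systems combine such string-distance measures with phonetic encodings and language-specific transliteration rules; the sketch only conveys why discrepant spellings across databases create blind spots for exact searches.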

3.4.5 Other data

As previously mentioned, the sociotechnical evolution of technologies involves various pertinent social groups. To augment the analysis, supplementary data were obtained from other publications, such as news articles and press releases. These materials offered additional viewpoints, facilitating a comprehensive exploration of the historical progression and transformation of the WCC search and match software. The Nexis Uni17 database and tool were used to search for articles mentioning the company WCC and its software in the IND’s INDiGO system or the EU-VIS system. Nexis Uni allows users to search newspapers, online business publications, and other sources written in English and Dutch. For example, several articles published on Dutch IT business news websites provided information on the IND’s INDiGO system. The database also surfaced press releases pertaining to WCC’s participation in the MITRE multicultural name matching challenge, featured in the analysis of Chapter 6. Beyond this, Nexis Uni enabled the discovery of newspaper articles, including a 2011 piece from “Het Financieele Dagblad,” a prominent Dutch newspaper focused on economy and business. The article, titled “Utrechtse datatechnologie moet terroristen buiten de VS houden” (Data technology from Utrecht must keep terrorists out of the US), contributed to tracing the company’s evolutionary trajectory within the realm of security.

Furthermore, additional publications were found via conventional search engines. For example, blog posts on the websites of WCC and its competitors show how experts in the field of name matching develop and disseminate their knowledge. Consider, for instance, the blog post titled “Understanding Dari and Pashto names: a challenge to intelligence gathering in Afghanistan” (Basis Technology 2012). The post originates from a WCC competitor that develops a comparable data matching system. In it, a linguist elucidates the intricate technical hurdles associated with “Afghani names and how they challenge these software tools.” Here one can observe the practical application of linguistic expertise to security intelligence, exemplified by the assertion that “Afghani names pose a challenge to intelligence agencies.” Another example is a WCC blog post titled “Biographic matching & UMF standards for EU interoperability” (Scheers 2021). This post sheds light on a noteworthy dynamic: how WCC’s software is reinterpreted to align with the requirements set by an EU project. The post delineates how the demands of the EU project are mapped onto the capabilities of WCC’s solution, showcasing changes in design and interpretative flexibility.

3.5 Techniques of data analysis

The data analysis in this dissertation employs techniques spanning qualitative and computational methods. In Chapter 4, a distinctive methodology is devised to compare data models across different authorities; it is used to analyze data models from information systems focused on population management. Chapter 5 adopts a mixed deductive-inductive approach to analyze the data matching practices described in the interviews conducted at the IND. Lastly, Chapter 6 takes a slightly different approach, relying less on developing codes for theory building and more on writing integrative memos, which serve to elaborate ideas and interconnect various pieces of data.

The data analysis for the first inversion strategy involved developing a new method to compare data models collected in various formats and used by diverse systems, as no existing method fully met the requirements. Chapter 4 presents the methodology in full and introduces a software tool called “The Ontology Explorer,” designed to facilitate such comparisons in two primary ways. Firstly, it supports analyses of information systems that define their own data models, even when these systems are not readily comparable. Secondly, it enables a systematic, quantitative, and discursive analysis of “thin” data models by identifying differences and gaps between systems. The methodology involves extracting, analyzing, comparing, and visualizing heterogeneous data models. Applying this structured approach to the data models used by the EU and Member States revealed discrepancies and commonalities in the collected data, shedding light on the circulation of knowledge about individuals and the division of labor among the actors involved (Pelizza and Van Rossem 2023).

The data analysis for the second inversion strategy, focusing on data matching practices, used the computer-assisted qualitative data analysis software ATLAS.ti to code and analyze the data. The coding and analysis followed the “Noticing-Collecting-Thinking” (NCT) method of Friese (2014), which is tailored to ATLAS.ti and comprises three interconnected steps. In the “Noticing” step, segments of the documents were labeled with codes. This phase combined deductive codes, stemming from research hypotheses about crucial aspects of applicant identification through the search and match tools, with openness to inductive insights from the data. The deductive codes were primarily built around the three factors of search: query formulation, match computation, and manipulation of search results. Concurrently, new codes emerged inductively from the data, such as those concerning the handling of duplicate records. In the “Collecting” step, these codes were reviewed and grouped into categories. Subsequently, in the “Thinking” step, patterns, processes, and typologies were identified among the developed codes. Throughout the process there was an iterative movement between the noticing, collecting, and thinking steps, enhancing the depth and richness of the analysis. Figure 3.2 visually represents this recursive data collection and analysis process and the application of the NCT steps. Chapter 5 offers practical examples of applying this methodology, including illustrative cases that led to the development of the interpretative framework for data matching practices.

Figure 3.2: Schematic representation of the data analysis process for the second inversion strategy.
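The three factors of search used as deductive codes can be loosely illustrated as a toy pipeline. The sketch below is a generic illustration, not the system studied; the records, normalization rules, and threshold are all invented:

```python
from difflib import SequenceMatcher

records = ["Mohammed Al-Hassan", "Mohamed Alhasan", "Maria Jansen"]

def formulate_query(raw: str) -> str:
    # 1. Query formulation: normalize case, hyphens, and spacing.
    return " ".join(raw.lower().replace("-", " ").split())

def compute_matches(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    # 2. Match computation: score each candidate record against the query.
    return [(c, SequenceMatcher(None, query, formulate_query(c)).ratio())
            for c in candidates]

def manipulate_results(scored: list[tuple[str, float]],
                       threshold: float = 0.8) -> list[tuple[str, float]]:
    # 3. Manipulation of search results: filter by threshold, rank by score.
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)

hits = manipulate_results(compute_matches(formulate_query("Mohamed Al Hasan"), records))
print([name for name, _ in hits])
```

Each stage is a site where operators’ choices shape who is found and who is missed, which is precisely why the three factors served as analytically distinct codes.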

The data analysis undertaken for the third inversion strategy, which focused on understanding sociotechnical change, involved writing integrative memos to elucidate concepts and connect fragments of data. Preliminary analyses and memos were already embedded within the interview transcripts and field notes; these fragments represent the researcher’s initial development of ideas based on the data collected. Integrative memos amalgamate diverse components of the analysis from interviews, field notes, or excerpts from external materials, such as news articles. The aim was to write these memos with future readers in mind, particularly those unfamiliar with the research setting. As Emerson, Fretz, and Shaw (2011) highlight, “integrative memos provide a first occasion to begin to explicate contextual and background information that a reader who is unfamiliar with the setting would need to know in order to follow the key ideas and claims” (p. 193). For example, I mapped various actors and their involvement at different points in time to track the sociotechnical evolution of the data matching software. Rather than relying solely on coding, I found it more effective to construct narratives that inherently contained the required contextual details. These integrative memos advanced two key aspects of the analysis. Firstly, they helped refine the understanding of the social groups involved and of the interpretative flexibility of the software. Secondly, they aided in conceptualizing the software as a gateway technology that bridges diverse actors and systems. This approach facilitated the organization of pertinent information into cohesive narratives and proved instrumental in tracing the complex sociotechnical evolution of the data matching software.

References

Ajana, Btihaj. 2013. “Asylum, Identity Management and Biometric Control.” Journal of Refugee Studies 26 (4): 576–95. https://doi.org/10.1093/jrs/fet030.

Akrich, Madeleine. 1992. “The de-Scription of Technical Objects.” In Shaping Technology/Building Society: Studies in Sociotechnical Change, edited by Wiebe E. Bijker and John Law, 205–24. Inside Technology. Cambridge, Mass.: The MIT Press.

Akrich, Madeleine, and Bruno Latour. 1992. “A Summary of a Convenient Vocabulary for the Semiotics of Human and Nonhuman Assemblies.” In Shaping Technology/Building Society: Studies in Sociotechnical Change, edited by W. E. Bijker and John Law, 259–64. Cambridge, Mass.: The MIT Press.

Basis Technology. 2012. “Understanding Dari and Pashto Names: A Challenge to Intelligence Gathering in Afghanistan.” Rosette Text Analytics: NLP Blog. https://web.archive.org/web/20230605072915/https://www.rosette.com/blog/understanding-dari-and-pashto-names-a-challenge-to-intelligence-gathering-in-afghanistan/.

Bloomfield, Brian P., and Theo Vurdubakis. 1997. “Visions of Organization and Organizations of Vision: The Representational Practices of Information Systems Development.” Accounting, Organizations and Society 22 (7): 639–68. https://doi.org/10.1016/S0361-3682(96)00024-4.

Bowker, Geoffrey C. 1994. Science on the Run: Information Management and Industrial Geophysics at Schlumberger, 1920-1940. Inside Technology. Cambridge, Mass.: The MIT Press.

Bowker, Geoffrey C., and Susan Leigh Star. 1999. Sorting Things Out: Classification and Its Consequences. Inside Technology. Cambridge, Mass.: The MIT press.

Cakici, Baki, Evelyn Ruppert, and Stephan Scheel. 2020. “Peopling Europe Through Data Practices: Introduction to the Special Issue.” Science, Technology, & Human Values 45 (2): 199–211. https://doi.org/10.1177/0162243919897822.

Christen, Peter. 2012. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications. Berlin; New York: Springer. https://doi.org/10.1007/978-3-642-31164-2.

Cornford, James, Susan Baines, and Rob Wilson. 2013. “Representing the Family: How Does the State ‘Think Family’?” Policy & Politics 41 (1): 1–18. https://doi.org/10.1332/030557312X645838.

de Goede, Marieke, Esmé Bosma, and Polly Pallister-Wilkins, eds. 2020. Secrecy and Methods in Security Research: A Guide to Qualitative Fieldwork. London & New York: Routledge.

Edwards, Paul N., Geoffrey C. Bowker, Steven J. Jackson, and Robin Williams. 2009. “Introduction: An Agenda for Infrastructure Studies.” Journal of the Association for Information Systems 10 (5): 364–74. https://doi.org/10.17705/1jais.00200.

Edwards, Paul N., Steven J. Jackson, Geoffrey C. Bowker, and Cory Philip Knobel. 2007. “Understanding Infrastructure: Dynamics, Tensions, and Design.” Working Paper Final report of the workshop, "History and Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures". http://deepblue.lib.umich.edu/handle/2027.42/49353.

Emerson, Robert M., Rachel I. Fretz, and Linda L. Shaw. 2011. Writing Ethnographic Fieldnotes. Second. Chicago: University of Chicago Press.

Euzenat, Jérôme, and Pavel Shvaiko. 2007. Ontology Matching. Berlin; New York: Springer. https://doi.org/10.1007/978-3-540-49612-0.

Friese, Susanne. 2014. Qualitative Data Analysis with ATLAS.Ti. Second. London: SAGE Publications Ltd.

Garfinkel, Harold. 1964. “Studies of the Routine Grounds of Everyday Activities.” Social Problems 11 (3): 225–50. https://doi.org/10.2307/798722.

Gitelman, Lisa, ed. 2013. “Raw Data” Is an Oxymoron. Infrastructures. Cambridge, Mass.: The MIT press.

Glouftsios, Georgios. 2018. “Governing Circulation Through Technology Within EU Border Security Practice-Networks.” Mobilities 13 (2): 185–99. https://doi.org/10.1080/17450101.2017.1403774.

Hand, Martin, Elizabeth Shove, and Dale Southerton. 2005. “Explaining Showering: A Discussion of the Material, Conventional, and Temporal Dimensions of Practice.” Sociological Research Online 10 (2): 1–13. https://doi.org/10.5153/sro.1100.

Hanseth, Ole, Edoardo Jacucci, Miria Grisot, and Margunn Aanestad. 2006. “Reflexive Standardization: Side Effects and Complexity in Standard Making.” MIS Quarterly 30: 563–81. https://doi.org/10.2307/25148773.

Hine, Christine. 2006. “Databases as Scientific Instruments and Their Role in the Ordering of Scientific Work.” Social Studies of Science 36 (2): 269–98. https://doi.org/10.1177/0306312706054047.

Hui, Allison, Theodore Schatzki, and Elizabeth Shove. 2016. The Nexus of Practices: Connections, Constellations, Practitioners. London & New York: Routledge. https://doi.org/10.4324/9781315560816.

Hull, Matthew S. 2012. “Documents and Bureaucracy.” Annual Review of Anthropology 41 (1): 251–67. https://doi.org/10.1146/annurev.anthro.012809.104953.

Hyysalo, Sampsa, Torben Elgaard Jensen, and Nelly Oudshoorn, eds. 2016. The New Production of Users: Changing Innovation Collectives and Involvement Strategies. Routledge Studies in Innovation, Organization and Technology 42. New York: Routledge.

Jackson, Michèle H., Marshall Scott Poole, and Tim Kuhn. 2002. “The Social Construction of Technology in Studies of the Workplace.” In Handbook of New Media: Social Shaping and Consequences of ICTs, edited by Leah A. Lievevrouw and Sonia Livingstone, 236–53. London: SAGE Publications Ltd. https://doi.org/10.4135/9781848608245.n18.

Kementsietsidis, Anastasios. 2009. “Schema Matching.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 2494–7. Boston, MA: Springer. https://doi.org/10.1007/978-0-387-39940-9_962.

Keulen, Maurice van. 2018. “Probabilistic Data Integration.” In Encyclopedia of Big Data Technologies, edited by Sherif Sakr and Albert Zomaya, 1–9. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-63962-8_18-1.

Lampland, Martha, and Susan Leigh Star, eds. 2009. Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life. Ithaca: Cornell University Press.

Latour, Bruno. 2005. Reassembling the Social: An Introduction to Actor-Network-Theory. Clarendon Lectures in Management Studies. Oxford & New York: Oxford University Press.

MacKenzie, Donald A., and Judy Wajcman. 1985. The Social Shaping of Technology: How the Refrigerator Got Its Hum. Milton Keynes: Open University Press.

Marres, Noortje. 2017. Digital Sociology the Reinvention of Social Research. Malden, MA: Polity.

M’charek, Amade, Katharina Schramm, and David Skinner. 2014. “Topologies of Race: Doing Territory, Population and Identity in Europe.” Science, Technology, & Human Values 39 (4): 468–87. https://doi.org/10.1177/0162243913509493.

Monteiro, Eric, Neil Pollock, Ole Hanseth, and Robin Williams. 2013. “From Artefacts to Infrastructures.” Computer Supported Cooperative Work (CSCW) 22 (4): 575–607. https://doi.org/10.1007/s10606-012-9167-1.

Nicolini, Davide. 2016. “Is Small the Only Beautiful? Making Sense of ‘Large Phenomena’ from a Practice-Based Perspective.” In The Nexus of Practices Connections: Constellations, Practitioners, edited by Allison Hui, Theodore Schatzki, and Elizabeth Shove, 98–113. London & New York: Routledge. https://doi.org/10.4324/9781315560816.

PC. n.d. “Processing Citizenship: Digital Registration of Migrants as Co-Production of Citizens, Territory and Europe.” ERC-2016-STG - ERC Starting Grant. ALMA MATER STUDIORUM - UNIVERSITA DI BOLOGNA, Italy: H2020-EU.1.1. - EXCELLENT SCIENCE - European Research Council (ERC). Accessed April 11, 2023. https://doi.org/10.3030/714463.

Pelizza, Annalisa. 2010. “From Community to Text and Back: On Semiotics and ANT as Text-Based Methods for Fleeting Objects of Study.” Tecnoscienza: Italian Journal of Science & Technology Studies 1 (2): 57–89. https://research.utwente.nl/en/publications/from-community-to-text-and-back-on-semiotics-and-ant-as-text-base.

Pelizza, Annalisa. 2016a. “Developing the Vectorial Glance: Infrastructural Inversion for the New Agenda on Governmental Information Systems.” Science, Technology, & Human Values 41 (2): 298–321. https://doi.org/10.1177/0162243915597478.

Pelizza, Annalisa. 2019. “Processing Alterity, Enacting Europe: Migrant Registration and Identification as Co-Construction of Individuals and Polities.” Science, Technology, & Human Values 45 (2): 262–88. https://doi.org/10.1177/0162243919827927.

Pelizza, Annalisa, and Claudia Aradau. 2024. “Scripts of Security: Between Contingency and Obduracy.” Science, Technology, & Human Values 0 (0). https://doi.org/10.1177/01622439241258822.

Pelizza, Annalisa, and Wouter Van Rossem. 2021. “Sensing European Alterity: An Analogy Between Sensors and Hotspots in Transnational Security Networks.” In Sensing in/Security: Sensors as Transnational Security Infrastructures, edited by Nina Klimburg-Witjes, Nikolaus Pöchhacker, and Geoffrey C. Bowker, 262–86. Manchester, UK: Mattering Press. https://doi.org/10.28938/9781912729111.

Pelizza, Annalisa, and Wouter Van Rossem. 2023. “Scripts of Alterity: Mapping Assumptions and Limitations of the Border Security Apparatus Through Classification Schemas.” Science, Technology, & Human Values 0 (0): 1–33. https://doi.org/10.1177/01622439231195955.

Pinch, Trevor. 2008. “Technology and Institutions: Living in a Material World.” Theory and Society 37 (5): 461–83. https://doi.org/10.1007/s11186-008-9069-x.

Pinch, Trevor, and Wiebe E. Bijker. 1984. “The Social Construction of Facts and Artefacts: Or How the Sociology of Science and the Sociology of Technology Might Benefit Each Other.” Social Studies of Science 14 (3): 399–441. https://doi.org/10.1177/030631284014003004.

Reckwitz, Andreas. 2002. “Toward a Theory of Social Practices.” European Journal of Social Theory 5 (2): 243–63. https://doi.org/10.1177/13684310222225432.

Ribes, David. 2019. “Materiality Methodology, and Some Tricks of the Trade in the Study of Data and Specimens.” In DigitalSTS: A Field Guide for Science & Technology Studies, edited by Janet Vertesi and David Ribes, 43–60. Princeton, N.J.; Oxford: Princeton University Press. https://doi.org/10.1515/9780691190600.

Ribes, David, and Thomas A. Finholt. 2009. “The Long Now of Infrastructure: Articulating Tensions in Development.” Journal of the Association for Information Systems 10 (5): 375–98. https://doi.org/10.17705/1jais.00199.

Riles, Annelise, ed. 2006. Documents: Artifacts of Modern Knowledge. Ann Arbor: The University of Michigan Press.

Rinkinen, Jenny, Elizabeth Shove, and Mattijs Smits. 2019. “Cold Chains in Hanoi and Bangkok: Changing Systems of Provision and Practice.” Journal of Consumer Culture 19 (3): 379–97. https://doi.org/10.1177/1469540517717783.

Rogers, Richard. 2013. Digital Methods. Cambridge, MA: The MIT Press. https://doi.org/10.7551/mitpress/8718.001.0001.

Ruppert, Evelyn, and Stephan Scheel, eds. 2021. Data Practices: Making up a European People. London: Goldsmiths Press.

Schatzki, Theodore. 2005. “Introduction: Practice Theory.” In The Practice Turn in Contemporary Theory, edited by Theodore Schatzki, Karin Knorr Cetina, and Eike von Savigny, 10–23. London; New York: Routledge.

Schatzki, Theodore. 2010. “Materiality and Social Life.” Nature and Culture 5 (2): 123–49. https://doi.org/10.3167/nc.2010.050202.

Scheel, Stephan, Evelyn Ruppert, and Funda Ustek-Spilda. 2019. “Enacting Migration Through Data Practices.” Environment and Planning D: Society and Space 37 (4): 579–88. https://doi.org/10.1177/0263775819865791.

Scheers, Marie-Louise. 2021. “Biographic Matching & UMF Standards for EU Interoperability.” WCC Group. https://web.archive.org/web/20230211102336/https://www.wcc-group.com/company/post/2021/02/22/high-quality-biographic-matching-umf-standards-for-eu-interoperability/.

Shankar, Kalpana, David Hakken, and Carsten Østerlund. 2017. “Rethinking Documents.” In The Handbook of Science and Technology Studies, edited by Ulrike Felt, Rayvon Fouché, Clark A. Miller, and Laurel Smith-Doerr, Fourth, 59–85. Cambridge, Mass.; London, Eng.: The MIT Press.

Shove, Elizabeth. 2016. “Matters of Practice.” In The Nexus of Practices: Connections, Constellations, Practitioners, edited by Allison Hui, Theodore Schatzki, and Elizabeth Shove, 155–68. London & New York: Routledge. https://doi.org/10.4324/9781315560816.

Shove, Elizabeth, Mika Pantzar, and Matt Watson. 2012. The Dynamics of Social Practice: Everyday Life and How It Changes. Thousand Oaks, CA: SAGE Publications Ltd.

Shove, Elizabeth, Matt Watson, and Nicola Spurling. 2015. “Conceptualizing Connections: Energy Demand, Infrastructures and Social Practices.” Edited by Tracey Skillington. European Journal of Social Theory 18 (3): 274–87. https://doi.org/10.1177/1368431015579964.

Star, Susan Leigh. 1999. “The Ethnography of Infrastructure.” American Behavioral Scientist 43 (3): 377–91. https://doi.org/10.1177/00027649921955326.

Sweeney, Shelley. 2008. “The Ambiguous Origins of the Archival Principle of ‘Provenance’.” Libraries & the Cultural Record 43 (2): 193–213. https://www.jstor.org/stable/25549475.

Timmermans, Stefan, and Steven Epstein. 2010. “A World of Standards but Not a Standard World: Toward a Sociology of Standards and Standardization.” Annual Review of Sociology 36: 69–89. https://doi.org/10.1146/annurev.soc.012809.102629.

Van Rossem, Wouter, and Annalisa Pelizza. 2022. “The Ontology Explorer: A Method to Make Visible Data Infrastructures for Population Management.” Big Data & Society 9 (1): 1–18. https://doi.org/10.1177/20539517221104087.

Weltevrede, E. J. T. 2016. “Repurposing Digital Methods: The Research Affordances of Platforms and Engines.” PhD thesis, Universiteit van Amsterdam. https://hdl.handle.net/11245/1.505660.

Woolgar, Steve. 1990. “Configuring the User: The Case of Usability Trials.” The Sociological Review 38 (S1): 58–99. https://doi.org/10.1111/j.1467-954X.1990.tb03349.x.


  1. As semiotics has theorized, meaning emerges from comparison. For example, Latour (2005) explains the significance of not defining groups a priori because “whenever some work has to be done to trace or retrace the boundary of a group, other groupings are designated as being empty, archaic, dangerous, obsolete, and so on. It is always by comparison with other competing ties that any tie is emphasized.” (p. 32). Similarly, Pelizza (2010) recalls that “the situations where the social is made visible and graspable are those where meaning emerges from comparison” (p. 67).↩︎

  2. The Dublin System (Regulation No. 604/2013; also known as the Dublin III Regulation) establishes the criteria and mechanisms for determining which EU Member State is responsible for examining an asylum application. The system is currently operational in the Member States of the European Union (plus Norway, Iceland, Switzerland, and Liechtenstein). Law enforcement agencies and Europol also use the system, which has significantly expanded its original scope since its inception (Ajana 2013). New proposals aim to include more biographic and biometric data, including a facial image (Procedure 2016/0132/COD, a recast of the Eurodac Regulation).↩︎

  3. In addition, the SIRENE network can exchange this information between law and border enforcement authorities.↩︎

  4. Went Computing Consultancy Group BV.↩︎

  5. Over the years, PC researchers have been: A. Bacchi, E. Frezouli, Y. Lausberg, C. Loschi, L. Olivieri, A. Pelizza, A. Pettrachin, S. Scheel, P. Trauttmansdorff.↩︎

  6. Repeated efforts were made to expand the sample of IND interview participants. Initial attempts began through WCC’s primary contact at the IND, who provided contact details for several individuals, but only a subset of these individuals agreed to participate. Subsequent efforts leveraged these participants to connect with their colleagues, aiming for a respondent-driven sampling approach. However, the process encountered challenges and delays, compounded by the online nature of communication and the necessity of remote calls. Over several months, these endeavors ultimately proved more complex than anticipated. Additionally, the need to produce a report for WCC within a specific timeframe further constrained the possibilities for expanding the participant pool. Despite these challenges, the sample provided valuable insights and rich details for analysis.↩︎

  7. Considering the relatively modest size of the company and the ID Team, the sample size, to the best of my knowledge, comprehensively encompassed most individuals with relevant backgrounds or experiences related to the field or the company and its products.↩︎

  8. In the context of the IND interviews, the research could have benefitted from observing the practical use of the search and match tools. The research aimed to use the WCC company’s secure Microsoft Teams installation for these online meetings, which offered the possibility of screen sharing for such observations. However, participants faced constraints in installing and using the Microsoft Teams application on their company-issued laptops, which ultimately necessitated conducting phone interviews.↩︎

  9. http://web.archive.org/web/20230606140543/https://www.lexisnexis.com/en-us/professional/academic/nexis-uni.page (Formerly LexisNexis Academic)↩︎