Chapter 7 Conclusions
This concluding chapter summarizes and synthesizes the dissertation’s research on matching identity data in transnational commercialized security infrastructures. Throughout the dissertation, we explored the practices and technologies of matching identity data, aiming to understand the types of knowledge and assumptions inscribed in data models, the technologies organizations use to match identity data, and the circulation of data matching knowledge and technologies across organizations. The chapter begins by reviewing the research questions and key findings of the empirical chapters. It then interprets the significance of these findings and discusses their theoretical and practical implications. Finally, it reflects on the research process and outlines potential directions for future research.
7.1 Restatement of the main research question and summary of the research
The dissertation sought to investigate the interplay between data matching, identification systems, and data infrastructure to understand the internationalization, commercialization, securitization, and infrastructuring of identification through the materiality and performativity of data matching practices and technologies. Specifically, the dissertation addressed the following overarching research question:
How are practices and technologies for matching identity data in migration management and border control shaping and shaped by transnational commercialized security infrastructures?
Through this research question, the dissertation sought to contribute to a more performative understanding of data matching technology’s role in shaping the meaning of data, practices, and the organizations that employ it. In Chapter 1, the dissertation’s goal of examining data matching in transnational data infrastructures was broken down into several research objectives that were taken up in the subsequent chapters. Firstly, Chapter 2 mapped the theoretical landscape related to internationalization, securitization, and infrastructuring of identification, deriving the research questions and hypotheses that guided our investigation into data matching in transnational commercialized security infrastructures. Chapter 3 developed a methodological framework wherein data matching serves as both a research topic and a resource to investigate the internationalization, commercialization, securitization, and infrastructuring of identification.
Chapter 4 introduced a novel method and software tool designed to analyze the underlying data models of information systems in population management. This approach was used to investigate authorities’ imaginaries about people on the move through the connections between different data models. Chapter 5 examined the relationship between identity data matching technologies and routine identification practices, demonstrating the practical implications and challenges organizations encounter when matching identity data to repeatedly identify individuals within and across organizations. Lastly, Chapter 6 investigated the circulation of a commercial data matching system throughout its lifecycle. Specifically, the chapter investigated how contingent moments in the problematization and design of a data matching system (such as shifts in the meanings attributed to the software, from a generic data matching engine to a security tool, or its use as a bridge to connect disparate identification systems) influence the practices and technologies employed in transnational security infrastructures.
7.2 Overview of the findings
This section summarizes the key findings of the empirical investigations conducted in the dissertation. The findings are linked to the research objectives and questions and to the methodological framework’s infrastructural inversion strategies.
7.2.1 Chapter 4: The Ontology Explorer: A method to make visible data infrastructures for population management
Chapter 4 contributes to the research objective of introducing a new method and software tool to analyze the schemas underpinning information systems in population management. This method and tool, dubbed the “Ontology Explorer” (OE), help achieve this objective by enabling a systematic examination and comparison of non-homogeneous data models used in diverse information systems. The chapter utilizes the OE method, which draws inspiration from schema matching techniques and STS research on classifications and standards, to bring attention to the assumptions embedded in the data models used in information systems for population management.
The infrastructural inversion strategy employed in Chapter 4 compared data models to uncover authorities’ imaginaries and expectations about people on the move embedded within the technical standards of data infrastructures. Chapter 4 thus addresses Research Question 1 (Which types of knowledge and assumptions about people-on-the-move are inscribed in data models of national and transnational security infrastructures? What implications does this have for how organizations can search and match identity data?) through an illustrative analysis of the types of knowledge and assumptions about people-on-the-move inscribed in data models of national and transnational authorities. The illustrative analysis demonstrated how the OE can be used to make visible the implicit knowledge and assumptions in data infrastructures for population management.
Chapter 4 aids in answering the main research question by revealing implicit assumptions and patterns within information systems design. The findings indicate different conceptualizations of identity data among authorities involved in population management, which could have implications for matching data across various database systems. For example, the Eurodac system primarily collects fingerprint data and does not include biographical details such as name, date of birth, and nationality. Therefore, the only means of matching data with Eurodac are biometric matching methods or linking data using the Eurodac number. Furthermore, Chapter 4 contributes to Processing Citizenship’s WP1 by introducing the OE, which facilitates the computational analysis of diverse authorities’ assumptions about individuals while serving as a foundational component for critical analyses when integrated with ethnographic observations of the practical utilization of information systems.
7.2.2 Chapter 5: From registration to re-identification: Exploring the interplay of data matching software in routine identification practices
Chapter 5 contributes to the dissertation’s research objective of examining the relationships between identity data matching technologies and routine identification practices. It addresses Research Question 2 (How do organizations that collect information about people-on-the-move search and match for identity data in their systems? How is data about people-on-the-move matched and linked across different agencies and organizations?) through an empirical investigation centering on the iterative processes of identifying applicants within the Dutch Immigration and Naturalization Service (IND), the government’s migration and asylum agency, and on the role of a data matching system in these processes. Re-identification is introduced to conceptualize these multifaceted, iterative identification procedures, which include retrieving corresponding identity data from databases and determining whether multiple database records, potentially originating from different organizations, pertain to the same real-world individual.
Utilizing the second infrastructural inversion strategy, the analysis of data practices, this chapter investigates the practices surrounding identity matching and linking within the agency and across other organizations. The findings reveal considerable diversity in re-identification practices within the IND along two primary dimensions. Firstly, these practices vary in the information accessible to staff during the process. Secondly, they differ in the precision criteria required for successful re-identification. Building on these findings, the chapter introduces an interpretative framework that categorizes re-identification practices according to the specific requirements for interpreting search inputs and results. This framework yields a matrix that treats re-identification as a multifaceted process rather than a singular activity. It encompasses a spectrum of iterative practices across diverse scenarios, such as direct interactions with applicants, telephone conversations handled by staff, application forms received by post, and automated re-identification procedures.
The chapter’s findings show that re-identification practices can be impeded by data friction, potentially resulting in failed re-identification. The analysis identified three prominent forms of data friction that may hinder applicant re-identification: friction between standardized identification and differences in institutional practices; friction from variations in the precision and accuracy of identity data as it is transformed across different media and used to formulate search queries; and friction arising from the opaque calculation of match results, which requires careful interpretation and fine-tuning of search results. These forms of friction, in turn, prompted a closer examination of the costs of failed re-identification, exemplified by duplicate records and the labor-intensive process of deduplication.
Chapter 5 addresses the main research question by examining the interaction between the re-identification practices of the IND and a commercially developed data matching system. The IND’s use of the commercial data matching system influenced the agency’s re-identification strategies because of the expertise in matching identity data embedded in the system. Additionally, the chapter draws attention to the link between the deduplication process and transnational systems. The IND and partner organizations use data from major European Union information systems, connecting seemingly unrelated records within their databases. These findings underscore that re-identification processes and associated technologies are not isolated; they are intricately intertwined with broader commercialized security infrastructures.
7.2.3 Chapter 6: Uncovering the long-term development of identification infrastructures: A multi-temporal perspective
Chapter 6 delved into the evolution of a data matching system and the construction of transnational security infrastructures. The chapter’s findings contribute to the research objective of exploring the evolution of identification systems, identifying contingent moments, and examining how data matching expertise circulates among various organizations. The evolution of a commercial data matching system is thus used to investigate broader long-term developments of identification systems and transnational commercialized security infrastructures.
The chapter uses the third infrastructural inversion strategy, tracing sociotechnical change, to follow the evolution of data matching technology and design. The strategy is operationalized through “multi-temporal sampling,” which the chapter uses as an alternative to conventional longitudinal studies for exploring the contingent processes shaped by and shaping identification systems and infrastructures. The chapter proposes two heuristic devices to identify contingent moments: the changing interpretative flexibility of the data matching system, and the gateway moments when identification systems intersect with broader infrastructures. The first heuristic, drawn from the Social Construction of Technology (SCOT) approach, suggests that changes in the interpretative flexibility of the data matching system are analytically valuable moments that provide insight into how the system’s design was contingent upon specific social groups’ problematizations of data matching. The second heuristic, drawn from infrastructure studies, proposes examining gateway moments, which involve sociotechnical compromises made when integrating separate systems into more extensive infrastructures. These two heuristics were applied to analyze the contingencies in the evolution of a commercial data matching system, thereby providing insight into the internationalization, commercialization, securitization, and infrastructuring of identification.
In this way, the chapter addresses Research Question 3 (How do knowledge and technology for matching identity data circulate and travel across organizations?) by exploring the networks of people and organizations involved in developing and disseminating data matching technologies. Concerning the main research question, the chapter’s analysis offers insights into how globally honed technologies are adjusted to new contexts by uncovering the compromises and adaptations required when building identification infrastructures. Furthermore, the chapter reveals how software vendors, often overlooked actors, distribute and reuse data matching systems, thereby influencing the long-term development of identification practices and infrastructures.
Significant findings from Chapter 6 arose from examining moments of interpretative flexibility and moments when the system demonstrated gateway-like characteristics. Moments of interpretative flexibility made it possible to retrace how a private company became enrolled in a security logic through shifts in identification problems and solutions. The software was initially created as a versatile tool for data matching across industries and was only later tailored to specific needs such as border security and migration management. At different points, the interpretative flexibility of data matching was closed down and reopened as problematizations of data matching and identification shifted. The design of the technology had to address specific requirements and challenges of identification in border security and migration management, as exemplified by the development of matching functions incorporating multicultural name matching and biometrics in response to new problematizations centered on identifying known and suspected security threats. Additionally, focusing on these contingent moments made apparent the role of diverse social groups, for example in the development of international professional networks to secure contracts for building identification systems.
Gateway moments offered a complementary approach to identifying contingent moments in the long-term evolution of the data matching software by drawing attention to the intricate technical and logistical challenges of integrating different identification systems, challenges that are not visible when looking only at shifts in interpretative flexibility. As such, the chapter provided insight into the contingent circulation of identity matching expertise among organizations. The integration of the data matching system into IND systems showed its capacity to disseminate specific name matching functionalities across organizational boundaries. However, such knowledge transfer does not always occur, as seen in the use of the data matching system in the VIS Evolutions project, where factors such as backward compatibility constraints limited the circulation of data matching knowledge.
Methodologically, the multi-temporal sampling approach employing gateway moments as a heuristic revealed the contingency of the data matching system’s integration into more extensive networks, showing that the transfer of data matching knowledge depends on various elements. Gateway moments such as the Universal Message Format (UMF) and data model mapping showed that full data interoperability is not always a prerequisite for data matching. Instead, gateways can offer adaptability and openness, potentially leading to contingent standardization. However, this standardization hinges on organizations’ specific choices regarding data mapping and adaptation to integrated systems.
Together, the moments of interpretative flexibility and gateway moments help answer the main research question by demonstrating that data matching knowledge and expertise can, but do not always, circulate between different actors and organizations. These findings indicate how technologies and methods for matching identity data can spread and influence identification practices across various domains and settings. Furthermore, the chapter addresses Processing Citizenship’s WP4 by illustrating data matching’s role in facilitating semantic interoperability between EU and Member State (MS) systems. In addition, the analysis of the data matching system in EU-VIS highlights the specific relations between EU, MS, and commercial actors, who had to configure the system while balancing new features against the backward compatibility needs of EU Member States’ systems.
7.3 Discussion of theoretical and practical implications
This section discusses the theoretical and practical implications of the research findings and how each chapter contributes to existing theoretical frameworks and concepts in its respective area of focus.
7.3.1 Chapter 4
In Chapter 4, the introduction of the Ontology Explorer (OE) method and tool expands on methodological innovations in infrastructure studies and digital methods to analyze data models underpinning information systems in population management. The chapter highlights the potential for innovative methodological contributions to make data infrastructures for population management visible by examining the semantic level of information systems’ data models. Methodologically, the chapter uses correspondences and discrepancies between data models to reveal the interconnections between technical details and the politics of knowledge production. The chapter’s illustrative analysis enhances our understanding of the data models used in migration and border control information systems and their role in revealing how authorities envision managing populations.
The practical and theoretical implications of the chapter’s findings are twofold. Firstly, the OE method introduces a means to account for resistance within information systems. By using the OE to analyze “thin” data models, researchers can identify differences and absences between systems, which can serve as the basis for critical analyses. This method allows for the detection of discrepancies between what is represented in the data models and the actual use of the systems, providing insights into power dynamics and potential areas of intervention. Furthermore, analysis via the OE can be complemented by ethnographic observations to account for practices of resistance exerted by actual people.
Secondly, the OE method contributes to ongoing discussions on the politics of data and data-driven governance. While, for example, digital sociology often focuses on user-generated content, interventions into data need to examine existing structures, such as the underlying data models. The OE has the potential to present and analyze data models, which could allow for experimental forms of participation and speculation on alternate ways to represent data about populations. By engaging with the technicalities of information infrastructures rather than solely relying on textual content, the OE facilitates critical engagements with the collected categories of data. This could open up opportunities for actors typically treated as data subjects to become active participants in rethinking and reshaping data practices, leading to more inclusive and transformative approaches to data governance.
Overall, Chapter 4 highlights the potential of the OE method to uncover resistance and power dynamics within information systems and to foster more participatory and reflexive approaches to data governance. By focusing on the technical aspects of data models, the method offers a distinctive perspective that complements traditional qualitative approaches, providing researchers and practitioners with valuable tools for critical analysis and intervention in the politics of data.
7.3.2 Chapter 5
Chapter 5 extends existing discussions on identification by emphasizing the broader scope of what the chapter refers to as re-identification. While previous research has often focused on initial registration and biometric data, this chapter highlights the iterative processes of retrieving and matching identity data across time and space. The chapter thus addresses an area not covered in existing literature: the use of data matching technologies in routine bureaucratic re-identification. The chapter’s findings expand our theoretical understanding of identification beyond its initial stages and underscore the importance of considering the influence of data matching technologies on how people are identified throughout bureaucratic procedures.
Chapter 5 presents several practical implications arising from the study of re-identification practices and the use of data matching at the IND. The chapter explores the practical utility of data matching technologies as tools to mitigate errors and help ensure the precision of re-identification processes. On the one hand, organizations’ use of these tools can reduce re-identification errors and enhance the quality of identification data. On the other hand, the findings on the use of data matching in re-identification underscore the potential consequences of errors in data linkage or incorrect data entry, such as duplicate identity records. Such errors can adversely affect individuals and organizations, necessitating additional effort to verify and rectify inaccuracies.
Additionally, the integration of various systems can introduce friction and ambiguity into the search process. For example, when users must query multiple systems equipped with distinct data matching engines, discrepancies can emerge between search behaviors and query formulations. To mitigate these issues, it would be advantageous for tools to consider the search behaviors of other systems. This consideration could lead to aligning search processes or implementing interfaces capable of translating and optimizing queries for multiple systems simultaneously. By facilitating this integration and alignment, the search process can be streamlined, reducing user confusion and ultimately enhancing the efficiency of re-identification efforts.
Furthermore, the design of the data matching tools and their use within identification systems notably influences re-identification. In the context of the IND, the data matching tools do not account for the contextual nuances of a search, such as the user’s role or the stage of the identification process. However, the research findings have underscored that users often have distinct needs and preferences concerning search functionalities. For instance, certain users may need more comprehensive information explaining why a specific match result was included in the search results. To accommodate these varied requirements, developing new application features that help users craft queries and understand search outcomes could be advantageous. Moreover, a more coherent integration between backend search capabilities and the frontend user interface could enhance both the usability and the overall capabilities of these search tools.
7.3.3 Chapter 6
In Chapter 6, the theoretical implications are twofold. Firstly, the chapter offers an alternative to conventional longitudinal studies for exploring the complex processes shaping and shaped by identification systems and infrastructures. By proposing two multi-temporal sampling heuristics, moments of interpretative flexibility and gateway moments, the chapter demonstrates the value of attending to contingency in the development of sociotechnologies of identification. This contribution adds to the ongoing theoretical discourse on methods for understanding the development and use of software systems, as exemplified by the Biography of Artefacts and Practices Approach (Pollock and Williams 2009). Adopting a multi-temporal sampling approach in the study of IT in border control and migration management broadens the analytical focus and opens up new avenues for inquiry. This approach makes it possible to discern the contingency inherent in the sociotechnical processes involved in the development and evolution of identification technologies, rather than viewing identification systems simply as finished products.
Secondly, the chapter further advances our comprehension of an often overlooked facet in the existing literature by delving into a commercial software vendor’s development of data matching technology. Employing heuristics to identify moments of contingency in the evolution of data matching software, the analysis effectively retraces the sociotechnical transformations of the software. It underscores instances where the internationalization, commercialization, securitization, and the infrastructuring of identification become apparent. Consequently, this chapter’s insights contribute to the body of knowledge surrounding the involvement of non-state and commercial entities in the datafication of migration and border control.
Chapter 6 identifies practical implications stemming from using the interpretative flexibility heuristic. This heuristic underscores the inherently dynamic nature of identification technologies and practices, challenging the notion that these technologies possess fixed or stable meanings. Instead, it accentuates how various social groups actively engage in the interpretation and contestation of these technologies. This social agency, in turn, significantly influences the trajectory and outcomes of identification technologies and practices. Recognizing the concept of interpretative flexibility invites rethinking the design of identification systems in a way that is more responsive to the diverse and evolving needs of the individuals and communities they affect.
The concept of gateway moments provides practical insights into the expansion of identification systems into more extensive infrastructures. It underscores the critical role of data matching technologies as connectors that bridge gaps and enable the integration of heterogeneous identification systems and practices. Although sometimes underestimated, these gateway-like technologies are crucial facilitators in establishing and maintaining networks within larger-scale infrastructures. Recognizing their significance can inform the design and implementation of identification systems and pave the way for more robust and effective identification infrastructures, benefiting the individuals and organizations that rely on them.
The following section will reflect on the research process and discuss limitations encountered during the study. It will provide an opportunity to evaluate the methodology, data collection, and analysis techniques, discussing their strengths and weaknesses. Additionally, the section will address constraints or challenges faced throughout the research, such as data access limitations or the generalizability of findings. By reflecting on the research process and its inherent limitations, the dissertation aims to produce recommendations for future studies on identification technologies and transnational commercialized security infrastructures.
7.4 Overview of the research process
The research process was guided by a methodological framework that leveraged data matching as both a topic of investigation and a methodological resource. Drawing inspiration from Bowker and Star’s notion of “infrastructural inversions,” three distinct methodological strategies were employed: comparing data models, analyzing data practices, and tracing sociotechnical change. These strategies were integrated into a methodological framework, allowing for a comprehensive examination of the practices and technologies involved in matching identity data as a research topic. Furthermore, by reversing the tendency of data matching to recede into the background, this methodological framework made it possible to use data matching as a resource to investigate the internationalization, commercialization, securitization, and infrastructuring of identification.
The first inversion strategy employed was the comparison of data models, which provided insights into the information collected by various authorities’ information systems for population management. This work included collecting data models from EU and EU Member State authorities through desk research. Other researchers’ contributions from the Processing Citizenship project also provided indirect traces of data models, such as screenshots of graphical user interfaces that provided evidence of categories of data collected in systems. Utilizing indirect traces of data models emerged as a strategic approach to tackle the challenges of obtaining data models for systems operating in the sensitive realm of migration and border control. In this context, where technical specifications related to data models are typically not publicly accessible, the indirect traces served as a valuable alternative. However, this approach introduced an additional layer of complexity. It necessitated the development of methods to effectively compare various documents about data models, a task made more intricate by the diverse file formats and languages in which these documents were presented.
Therefore, the Ontology Explorer method dealt directly with those challenges. It successfully enabled us to compare diverse data models by employing additional steps to code, harmonize, and group all documents, categories, and values. The novelty came from integrating insights from existing classification and infrastructure studies and discourse analysis into a systematic method for analyzing data models. Additionally, drawing from an analysis of data models using the OE, Pelizza and Van Rossem (2023) built further on this work. This 2023 article described the different “scripts of alterity” of security subjects, highlighting how data infrastructures shape power dynamics among individuals and states. Furthermore, the OE tool used for this research has been released as open-source software (Van Rossem 2021), enabling the analysis of information systems’ data models in various domains and contexts beyond this research’s scope.
The second infrastructural inversion strategy delved into the analysis of data practices, aiming to unveil the intricacies of routine activities involved in searching for and matching identity data within diverse organizational contexts. To operationalize this strategy data, fieldwork and interviews were conducted at a software supplier specializing in commercial data matching tools and at one of the company’s customers, the Netherlands’ Immigration and Naturalization Service (IND). The fieldwork involved a series of meetings and interviews with the company’s personnel, which took place online and on-site at their headquarters in Utrecht, The Netherlands. Additionally, documentation was provided to me to facilitate an in-depth exploration of the technical intricacies underpinning the design of the data matching software. This approach enabled a thorough exploration of the data practices embedded within these organizations and their implications in the context of identity matching.
The initial project proposal was initially conceived to undertake a comprehensive analysis and comparison of the utilization of the company’s data matching software at both the IND and the EU Visa Information System (EU-VIS). This approach was rooted in the anticipation of encountering distinct challenges and dynamics due to the differing scales of use at these two entities. However, as the research progressed, it became evident that adaptations to the original research plan were necessary. Access to the IND proved to be more readily available, while the complexities surrounding confidentiality and access hindered extensive research into the EU-VIS project. Consequently, the research focus was redirected towards a more in-depth exploration of the software’s practical application within the IND. This shift in approach was embraced by all parties involved, as it offered mutual benefits. The software company gained valuable insights into the real-world utilization of their product. At the same time, the IND agency had the opportunity to gain a deeper understanding of the intricacies and challenges associated with the search and match functionalities.
I conducted semi-structured interviews with IND staff members to understand their day-to-day activities concerning data searching and identity matching when identifying applicants throughout the agency’s complex bureaucratic procedures. Detailed descriptions of the interview process and the subsequent data analysis can be found in Chapters 3 and 5 of this dissertation. Following the interview phase, I compiled a report summarizing my findings, which was then shared with the software company and the IND. Both parties responded positively, and I was invited to present the report at a company-wide online meeting dedicated to sharing knowledge and insights. Notably, the research offered the company valuable insights into how its software is actually used. This is because, as a software supplier, the company relies on integrator partners to deploy its products across different organizations, which leaves it with little visibility into how end users interact with its software.
The third infrastructural inversion strategy focused on mapping the dynamics of sociotechnical change, exploring the circulation of knowledge, technologies, and practices associated with data matching across organizations and over extended periods. This strategy arose as an adaptation to the constraints encountered during the research, which initially aimed to analyze and compare the use of data matching technology in the IND and EU-VIS systems. The modified research plan instead traced the historical development of the software, revealing the sociotechnical networks that played a role in the transnational circulation of identification technologies. The research examined the growth of commercial solutions within the identification technology domain and illuminated the interplay between data matching and shifting problematizations of identification. For example, I examined the company’s participation in an international competition for innovative name matching solutions sponsored by a prominent US research organization. By retracing the various parties involved, I uncovered connections between technological advancements, market dynamics, and security imperatives. This exploration made it possible to understand how data matching solutions have continuously evolved to address changing problems.
This research phase was also grounded in fieldwork and interviews conducted at the same software supplier. Here, the central objective was to examine the transformations within data matching and transnational data infrastructures, seeking to reveal contingencies in these systems’ historical development. The interviewees were generally keen to share their experiences and showed interest in reflecting on broader issues beyond their immediate organizational roles. Nevertheless, some aspects of the company’s early history remained obscure, often because interviewees had not been directly involved during that period. To address this gap, I complemented the interviews with archival sources, such as older web pages, news articles, and press releases. Cross-referencing and triangulating these sources provided a more comprehensive picture of the transformations, adaptations, and challenges involved.
This section has provided an overview of the research process employed in this study. Using a methodological framework inspired by “infrastructural inversions,” the research treated data matching as both a research topic and a resource. Three methodological strategies were used: comparing data models, analyzing data practices, and tracing sociotechnical change. These strategies enabled the examination of the interplay between identification systems and the internationalization, commercialization, and securitization of identification. Through data collection methods such as desk research, fieldwork, and interviews, the research developed new perspectives on the information collected by organizations, the everyday data practices of matching identity data and re-identification, and the dynamics of transnational commercialized security infrastructures. The next section reflects further on the findings, including unexpected discoveries and the limitations and challenges faced during the research.
7.5 Reflection on findings and limitations
During the research, I encountered several surprising findings about how the way a company’s software is integrated into its customers’ systems affects data matching and identity management in practice. In particular, these findings challenged my assumptions about data matching tools: the software’s integration into systems such as those of the Netherlands’ migration and asylum agency was largely invisible to users. I was also surprised to find that the software was adaptable and effective across different domains, which challenges conventional notions about the difficulty of adapting generic software to local circumstances. At the same time, several limitations arose during the research that directly affected the findings I was able to produce. This section examines the unexpected results, discusses the identified limitations, and reflects on the research outcomes.
7.5.1 Surprising and unexpected findings
Several surprising and unexpected findings emerged from Chapter 4 that informed our analysis of the data categories within systems and their implications. The investigation found significant differences between systems in the categories of data they stored. For example, the specificity of information about individuals’ names varied greatly. These differences were crucial for the analysis, as they revealed authorities’ conceptions and imaginaries of individuals. Categories such as alias names and violence indicators showed how these systems could signal suspicion or criminality. Furthermore, the analysis showed that only a limited number of data categories, such as birthdate, nationality, and biometric data, were consistently shared across the multiple systems. These findings showed how identity data categories, though seemingly mere technical details, can help examine authorities’ imaginaries about people on the move.
Chapter 5 presented unexpected insights regarding the integration of WCC’s data matching software into the IND’s systems. Notably, the system design, which positioned the data matching software as a relatively independent component within the IND systems, created distinctive challenges. While this technical design was efficient in some respects, it did not always align with the specific needs of all users. One significant consequence was that the search process remained largely invisible to most users, who were therefore often unaware of the underlying data matching mechanisms. The chapter’s exploration of diverse data matching scenarios within the IND underscores the potential benefits of tighter integration between the software, the local context, and the system’s graphical user interface. For instance, the interviews made evident that users often needed clarification when search results included matches based on historical data, such as names used before marriage. Contrary to my expectations before the research, this design thus produced an invisible layer of data matching, which in turn left users without a clear understanding of the search results and functionality.
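To make the point about historical data more concrete, the following Python sketch shows one way a search component could report not only which record matched but also which stored name produced the match. It is a minimal illustration under stated assumptions, not WCC’s software or the IND’s configuration: the record fields, the names `ApplicantRecord`, `normalize`, and `search`, and the exact-match rule after simple normalization are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicantRecord:
    # Hypothetical record: a current name plus names used earlier,
    # e.g. a name used before marriage.
    record_id: str
    current_name: str
    previous_names: list[str] = field(default_factory=list)


def normalize(name: str) -> str:
    # Deliberately simple normalization: case-folding and whitespace collapsing.
    return " ".join(name.casefold().split())


def search(records: list[ApplicantRecord], query: str) -> list[tuple[str, str]]:
    """Return (record_id, reason) pairs so a user can see why each record matched."""
    q = normalize(query)
    results = []
    for rec in records:
        if normalize(rec.current_name) == q:
            results.append((rec.record_id, "current name"))
        elif any(normalize(p) == q for p in rec.previous_names):
            results.append((rec.record_id, "previous name"))
    return results


records = [
    ApplicantRecord("A-001", "Maria Jansen", ["Maria de Vries"]),
    ApplicantRecord("A-002", "Maria de Vries"),
]
print(search(records, "maria de vries"))
# [('A-001', 'previous name'), ('A-002', 'current name')]
```

In this hypothetical design, the reason attached to each result (here, “previous name”) is the kind of explanation that IND users appeared to need when matches on historical data surfaced in their search results.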
Chapter 6 revealed unexpected findings concerning the role of commercial software companies within the domain of identification. The research investigated the relationships between the software company, as a technology supplier, and technology integrators, uncovering intricate professional networks focused on identification solutions. At the beginning of the research, it was unclear how the company had become involved in data matching for identity and security. It was surprising to find that these relationships significantly influenced the company’s market positioning and paved the way for its move into the realms of identity and security. A further surprising finding concerned the circulation of data matching knowledge: contrary to initial expectations, the adaptable design of the data matching software allowed it to be customized to meet customers’ specific requirements. As a result, the data matching expertise embedded in the software was not always used or circulated between organizations. Overall, Chapter 6 challenged the conventional view of the company’s data matching software as a fixed solution for identity matching in migration, border control, and security, revealing the diverse contingencies that shaped the software’s evolution.
7.5.2 Limitations
During this research, several limitations were encountered that are essential for understanding the scope and implications of the study. These limitations affected the breadth and depth of the investigation and must be considered when interpreting the research findings.
As part of the Processing Citizenship project, the initial plan, developed in collaboration with the Principal Investigator, was to conduct my fieldwork within the EU agency eu-LISA to investigate identity data and data quality in the context of border security and migration management. However, this proved challenging due to significant access restrictions and the barriers commonly encountered in qualitative research on security and secrecy. Despite considerable time and effort spent cultivating contacts and formulating a comprehensive project proposal for a potential traineeship with the agency, we could not secure access to the designated fieldwork site. Ultimately, the main stumbling block was disagreement over the ownership and use of the research data that would be generated. Consequently, we decided to establish a new fieldwork site by collaborating with a data matching technology supplier to the agency.59
The company swiftly accepted the project proposal between December 2019 and January 2020, but the COVID-19 pandemic then introduced unexpected challenges to the research project. Initially, the research had to be postponed due to COVID-19 safety measures and travel restrictions in the Netherlands. When restrictions were eventually eased, my on-site visits to the company were affected by limits on the number of people allowed in the office, which meant that few employees were present and many continued to work remotely. When restrictions were later reintroduced, I could no longer visit the company on-site. Consequently, the research had to adapt to an online format, reducing opportunities for in-person visits and direct observation during fieldwork. This absence of face-to-face interaction and observation may have limited my grasp of the intricacies and ramifications of the software’s use.
The study was also limited by the decision to concentrate on the EU Visa Information System and the systems of the Netherlands’ migration and asylum agency rather than on the company’s developments in advanced passenger information systems. This decision aligned with the broader goals of the Processing Citizenship project, but it restricted the inquiry to specific projects in the company’s portfolio; researching some other relevant customers would also have required security clearances. Additionally, obtaining relevant documentation about the use of the data matching software in the EU-VIS proved challenging due to the company’s limited access to EU documentation and the technology integrator’s confidentiality practices. This hindered my ability to examine in depth the software’s implementation and its influence on identity data management within the EU context. As a result, the research in Chapter 5 focused on the implementation within the Netherlands’ immigration and asylum agency, where the technical details of this specific context were accessible. In contrast, Chapter 6 explored connections with the EU-VIS and the broader transnational security infrastructures by tracing the software’s sociotechnical evolution, drawing on other accessible data sources.
It is also important to note that the exclusion of migrants’ experiences from this research resulted from the overall design of the Processing Citizenship project, in which other researchers focused more specifically on migrant experiences. For example, elements of Chapter 4 of this dissertation, which examines how information systems categorize data, can be complemented by the work of Lorenzo Olivieri (2023). His research examined the processes through which migrants’ identities and personal stories are translated into standardized categories and traits, using innovative techniques that included working closely with migrants. Additionally, Pelizza and Van Rossem (2023) examine how migrants are inscribed as particular kinds of users, referred to as “scripts of alterity.” The article draws on field observations from the project to analyze forms of resistance against these scripts. The Processing Citizenship project’s overarching framework thus ensured that various perspectives, including those of migrants, were incorporated into its broader scope.
Despite these limitations, this study contributes to our understanding of how the integration of commercial identification software shapes and is shaped by transnational security infrastructures. Recognizing these limitations also highlights the complexities inherent in researching technology in the realm of security, including issues related to access and secrecy (de Goede, Bosma, and Pallister-Wilkins 2020). Moreover, it emphasizes the importance of being attuned to the contingencies and unforeseen developments that can unfold during research. In particular, the shift towards collaborating with a less widely known data matching software supplier brought to light less-examined dimensions of identification and identity management. The next section explores the ethical considerations that arose throughout this collaboration and the research process.
7.6 Ethical considerations
Collaborating with a company working on identity data and security inevitably raised ethical considerations about the potential repercussions of its technology for individuals. These concerns extended to how the findings and insights derived from the research might influence the company’s ongoing development of its software solutions. The research’s dual role, offering an in-depth analysis of the software’s functionalities while also helping the company refine its technology, required reflection on the parties involved, namely the company and the individuals affected by the technology, as well as on the broader societal implications of data matching in the securitization of migration and border control.
On the one hand, during my interviews with IND personnel, I observed the potential advantages of data matching in streamlining bureaucratic processes, benefiting both employees and applicants. Data matching technology assisted staff in executing daily tasks and supported various aspects of their work. However, my observations also revealed that particular system design choices could unintentionally impede efficient operations. For instance, there was no automatic check to determine whether an applicant already existed in the agency’s database. This seemingly minor technical gap sometimes led to duplicate data entries, which then required additional effort to resolve at a later stage. Yet rectifying this issue is not straightforward, as it requires a clear and reliable definition of what constitutes a duplicate record. Moreover, duplicate entries could raise suspicions of fraud on the part of applicants, even though they often stem from earlier data entry errors rather than malicious intent. These issues underscore the complexities of data matching in sensitive contexts such as migration and identity management.
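The difficulty of defining a duplicate can be illustrated with a minimal Python sketch. It assumes a hypothetical record with only three fields (name, date of birth, nationality) and compares two illustrative rules; it does not describe the IND’s systems or any particular product, and the names `Record`, `strict_duplicate`, `fuzzy_duplicate`, and `find_existing` are invented for this example. A strict rule misses an existing record whose name was spelled slightly differently, for instance through transliteration or a data entry error, whereas a looser rule based on string similarity (here Python’s standard `difflib.SequenceMatcher`, with an arbitrary threshold of 0.85) flags it.

```python
from difflib import SequenceMatcher
from typing import Callable, NamedTuple


class Record(NamedTuple):
    # Hypothetical identity fields; real systems store many more.
    name: str
    birth_date: str   # ISO 8601 date string, e.g. "1990-04-12"
    nationality: str  # e.g. an ISO 3166 alpha-3 code


def strict_duplicate(a: Record, b: Record) -> bool:
    # Rule 1: all fields identical (names compared case-insensitively).
    return (a.name.casefold(), a.birth_date, a.nationality) == (
        b.name.casefold(), b.birth_date, b.nationality)


def fuzzy_duplicate(a: Record, b: Record, threshold: float = 0.85) -> bool:
    # Rule 2: same birth date and nationality; names only need to be similar.
    similarity = SequenceMatcher(None, a.name.casefold(), b.name.casefold()).ratio()
    return (a.birth_date == b.birth_date
            and a.nationality == b.nationality
            and similarity >= threshold)


def find_existing(db: list[Record], new: Record,
                  rule: Callable[[Record, Record], bool]) -> list[Record]:
    # The check that would run before creating a new record.
    return [r for r in db if rule(r, new)]


db = [Record("Ahmed Hassan", "1990-04-12", "SYR")]
incoming = Record("Ahmad Hassan", "1990-04-12", "SYR")  # spelling variant

print(find_existing(db, incoming, strict_duplicate))  # []
print(find_existing(db, incoming, fuzzy_duplicate))
# [Record(name='Ahmed Hassan', birth_date='1990-04-12', nationality='SYR')]
```

The looser rule catches the spelling variant, but the same tolerance could also link records belonging to different people with similar details, which is why the choice of rule, and who reviews its results, matters in practice.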
On the other hand, ethical considerations related to the development of data matching software arose in Chapter 6. One area of concern is the software’s name matching functionality, which relies on name databases. These databases contain a wide range of names for identification purposes, including variants of Arabic and Asian names. Because name matching technology is shaped by linguistic and transliteration differences, it may disproportionately flag names from these groups as potential matches. Such overrepresentation could perpetuate stereotypes and the marginalization of specific individuals. It is therefore essential to evaluate the potential consequences and unintended effects of such technological solutions. Yet such systems often operate as opaque “black boxes” that are not transparent to the public, which is a significant concern, especially when proprietary technologies of commercial entities are involved. In this respect, the “multi-temporal sampling” method used in Chapter 6 also offers a way to examine the characteristics of such systems by studying specific moments in their development.
7.7 Future research directions
Several areas require further research in order to fill knowledge gaps, anticipate emerging technological developments, advance methodologies, encourage interdisciplinary collaboration, and explore practical applications of data matching and other identification technologies in transnational commercialized security infrastructures.
Chapter 4 emphasized the potential benefits of alternative methods and tools for analyzing identification systems, highlighting the need for further research in this area. To advance research on identification technology, future studies should continue to incorporate and analyze the technical intricacies of identification systems. The methods used in Chapter 4 combined qualitative and quantitative approaches to analyze one such aspect: data models. The Ontology Explorer (OE) method and tool (Van Rossem 2021) developed and applied in this dissertation required interdisciplinary collaboration across various fields, including computer science, sociology, law, ethics, and policy. Future research should actively promote collaborations that engage researchers from diverse domains, as these can provide multifaceted perspectives and yield novel insights. The OE method and tool were instrumental in analyzing the similarities and differences among various authorities’ data models. Future research could analyze the mechanisms that hinder or facilitate data interoperability, thereby enhancing our understanding of how data models affect identification in transnational infrastructures. Lastly, the OE is flexible enough to be applied beyond the analysis of data models in population management and security, in other contexts as well.
Chapter 5 examined the interplay between data matching technologies and re-identification processes, revealing both the potential benefits and the challenges of these technologies. Future research can delve deeper into the evolving technologies used in data matching practices, particularly the influence of emerging technologies such as generative artificial intelligence and biometrics on identification. For instance, AI-driven data matching systems may present unique challenges for individuals who are falsely identified: because such matches can be harder to explain and comprehend, people may find it more difficult to dispute false positive results. It is imperative to scrutinize these technologies’ potential benefits and associated risks while considering their ethical and policy implications. Robust governance mechanisms will be essential to safeguard individual rights and to ensure accountability and transparency, for example regarding how matching results are calculated and presented.
Chapter 6 provides a detailed analysis of the evolution of a data matching system, highlighting its dependence on various factors and the significant role played by different actors in shaping its development. Future research should continue exploring identification systems’ evolution and practical applications. Such an approach can offer valuable insights into changes in the broader domain of identification, such as the role of commercial actors and shifts in securitization. Additionally, combining academic theories with practical experiences can show how specific identification technologies function across various domains, including healthcare, finance, and public administration. Collaborative efforts across multiple disciplines are essential, as experts from diverse fields can significantly contribute to our understanding of identification systems.
7.8 Final reflections and concluding remarks
This dissertation was driven by an initial curiosity about the interplay between data matching and linking processes and the blind spots faced by authorities. It aimed to understand how authorities navigate the complexities of identifying individuals when faced with incomplete data, aliases, and false identities. Mapping the theoretical landscape in order to untangle the intricacies of matching identity data in transnational commercialized security infrastructures laid the foundation for the research questions. Furthermore, the dissertation introduced a methodological framework that not only examines data matching but also harnesses it as a research resource, opening new avenues for investigating the challenges posed by identification processes across national borders, notably in the context of border security and migration control.
First, introducing a novel method and tool in this study facilitated a comprehensive exploration of the differences and commonalities embedded in national and transnational security infrastructures’ data models, shedding light on the varying knowledge and assumptions about individuals on the move. Within the categories of data found in these models, one can discern authorities’ imaginaries, revealing how they conceptualize and enact individuals in distinct ways. This conceptualization also has relevance for the realm of data matching, as the integration and interoperability of data hinge on the connections between these data models, playing a pivotal role in matching individuals’ data across diverse data sources.
Second, the focus on data matching made it possible to shift attention beyond the initial registration and identification phases to re-identification practices across space and time. Re-identification was introduced as a concept denoting the continuous use and interconnection of data from diverse sources to determine whether multiple sets of identity data correspond to a single real-world individual. Examining the iterative processes of re-identifying applicants throughout different stages of bureaucratic procedures showed that, while integrating data matching tools can reduce friction in re-identification, it can also introduce costs of its own.
Third, tracing the sociotechnical changes of a data matching system allowed us to detect the circulation of knowledge, technologies, and practices involved in data matching over time and across various organizations. The data matching system’s use in identification for migration and border control was shown not to be predetermined. Instead, we found that the system’s securitization was influenced by specific choices made by key actors and by the changing sociotechnical landscape of identification and data matching. Through this analysis, we encountered unexpected connections between software suppliers and their customers, highlighting the intricate networks that underlie identification infrastructures. These findings challenge simplistic narratives and underscore the need for a more performative understanding of the sociotechnical dynamics that shape identification practices. By emphasizing contingency in the system’s evolution, we identified moments in which development outcomes were not predetermined but were shaped by the specific circumstances, factors, and decisions of the individuals and entities involved, thereby expanding our understanding of the processes underpinning the development of identification technologies.
In conclusion, this dissertation has sought to unravel the intricate dynamics of practices and technologies for matching identity data in the sensitive domains of migration management and border control, elucidating how these processes both shape and are shaped by broader transnational commercialized security infrastructures. Its performative approach to data matching has underscored that the intersection of identification technologies, data matching, and securitization is far from deterministic or linear. Instead, it is shaped by contingent choices and sociotechnical dynamics, which means that alternative courses of action have been, and remain, possible. In this evolving landscape, the actors involved play a pivotal role in shaping the direction of identification technologies, ultimately influencing the complex interplay of identity, security, and migration. As borders, identities, and security technologies become ever more intertwined, this research highlights the importance of maintaining a critical and adaptable approach when examining identification systems.
References
de Goede, Marieke, Esmé Bosma, and Polly Pallister-Wilkins, eds. 2020. Secrecy and Methods in Security Research: A Guide to Qualitative Fieldwork. London & New York: Routledge.
Olivieri, Lorenzo. 2023. Temporalities of Migration. Time, Data Infrastructures and Intervention. Padova: Padova University Press.
Pelizza, Annalisa, and Wouter Van Rossem. 2023. “Scripts of Alterity: Mapping Assumptions and Limitations of the Border Security Apparatus Through Classification Schemas.” Science, Technology, & Human Values 0 (0): 1–33. https://doi.org/10.1177/01622439231195955.
Pollock, Neil, and Robin Williams. 2009. Software and Organisations: The Biography of the Enterprise-Wide System or How SAP Conquered the World. Routledge Studies in Technology, Work and Organisations 5. London; New York: Routledge.
Van Rossem, Wouter. 2021. “Ontology-Explorer.” Zenodo. https://doi.org/10.5281/zenodo.4899316.
As the researcher conducting this study, I acknowledge that my background played a meaningful role in helping me access the fieldwork. As a white male with an academic background in computer science and experience in software development, I was able to speak the same language as my interlocutors, which facilitated communication and helped me better understand their perspectives. Moreover, my knowledge of Dutch and being based in the Netherlands helped establish trust with the participants. It is also worth mentioning that the research was conducted under the EU-funded project Processing Citizenship, which added to its credibility and relevance.