A Supplement to Chapter 4

A.1 Definitions 1: Graphs, nodes, links

We recall that a graph is formally defined as a pair \(G=(N, L)\), where \(N\) is a set whose elements are the nodes (also called vertices or points) and \(L\) is a set of links (also called edges), which are ordered pairs of distinct nodes. \(L\) is a subset of the set of all possible links between nodes, where in our case a node cannot be linked to itself: \(L \subseteq\left\{(x, y) \mid (x, y) \in N^{2} \wedge x \neq y\right\}\). For a link \((x, y)\), \(x\) and \(y\) are called the endpoints of the link. In our approach, links are treated as undirected, and a link \((x, y)\) represents an occurrence of \(x\) in \(y\). As a shorthand we write the link between \(x\) and \(y\) as \(l_{xy}\). For example, the link \(l_{xy}\) can represent the occurrence of a category ‘date of birth’ (node \(x\)) in the document group ‘Eurodac’ (node \(y\)). All categories present in the document group have corresponding nodes and links in the graph. In fact, we can treat each data model (i.e., document group) as a separate graph. The complete graph is then in effect the combination of the different data models. With each data model represented as a separate graph \(G_{i}\), the combined graph is the disjoint union of graphs: \(G=\bigcup_{i \in I} G_{i}\).
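The definition can be illustrated with a minimal Python sketch. The function names `make_graph` and `union` and the example data are hypothetical and not part of the original model. The sketch stores each link as an unordered pair and combines graphs with a plain set union, so a category that occurs in two document groups becomes a single shared node; this is an assumption about how the combination is meant to behave.

```python
def make_graph(nodes, links):
    """Build (N, L) and check that every link joins two distinct known nodes."""
    node_set = set(nodes)
    link_set = set()
    for x, y in links:
        if x == y:
            raise ValueError(f"self-link on {x!r} is not allowed")
        if x not in node_set or y not in node_set:
            raise ValueError(f"link ({x!r}, {y!r}) has an unknown endpoint")
        # A frozenset ignores order, so (x, y) and (y, x) are the same link.
        link_set.add(frozenset((x, y)))
    return node_set, link_set


def union(*graphs):
    """Combine graphs by taking the union of their node sets and link sets."""
    nodes = set().union(*(g[0] for g in graphs))
    links = set().union(*(g[1] for g in graphs))
    return nodes, links


# Hypothetical data: two document groups that share one category.
eurodac = make_graph(
    ["Eurodac", "date of birth", "sex"],
    [("date of birth", "Eurodac"), ("sex", "Eurodac")],
)
vis = make_graph(
    ["VIS", "date of birth"],
    [("date of birth", "VIS")],
)
combined = union(eurodac, vis)
```

In this sketch `combined` contains the node ‘date of birth’ once, linked to both document groups.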

A.2 Definitions 2: Attributes

In our graph model, nodes are objects whose attributes hold their metadata. These attributes are written using the notation \(n.a\) for an attribute \(a\) of a node \(n\). The most important metadata kept for a node are \(n.name\) and \(n.type\), where \(name\) is the natural language label of the node. The attribute \(type\) can only take a limited set of values: \(n.type \in \{category, categoryValue, codeGroup, document, documentGroup\}\).
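As an illustration, a node with these two attributes could be sketched in Python as follows. The class name `Node` and the constant `ALLOWED_TYPES` are hypothetical; only the attribute names and the permitted type values are taken from the definition above.

```python
from dataclasses import dataclass

# The permitted values of n.type, as listed in the definition.
ALLOWED_TYPES = {"category", "categoryValue", "codeGroup", "document", "documentGroup"}


@dataclass(frozen=True)
class Node:
    name: str  # natural language label of the node
    type: str  # one of ALLOWED_TYPES

    def __post_init__(self):
        # Reject any type outside the limited set of values.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown node type: {self.type!r}")


dob = Node(name="date of birth", type="category")
eurodac = Node(name="Eurodac", type="documentGroup")
```

Making the class frozen keeps nodes hashable, so they can be placed in the sets \(N\) and \(L\).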

A.3 Definitions 3: Graph drawing

A drawing of a graph \(G=(N,L)\) is a collection of points in a two-dimensional space. Each point \(p_i\) with coordinates \((x_i, y_i)\) is the position of the node \(n_i\) in the layout. Whenever there exists a link \((n_i, n_j) \in L\), a line is drawn between points \(p_i\) and \(p_j\). The task of the layout algorithm is to find a positioning of points so that specific criteria are optimally met. Examples of commonly used criteria are: nodes should not overlap, neighbouring nodes should be grouped together, and the number of crossing links should be minimised. Each algorithm and set of criteria has its own benefits and drawbacks.

A.4 Definitions 4: Degree & neighbourhood

For a node \(n_i\), the degree is defined as the number of links the node has: \(deg(n_i) = \left|\left\{ n_{j}: l_{i j} \in L \vee l_{j i} \in L \right\}\right|\). The set of linked nodes is called the neighbourhood of a node. The neighbourhood \(H_{i}\) of a node \(n_{i}\) is defined as: \(H_{i}=\left\{n_{j}: l_{i j} \in L \vee l_{j i} \in L\right\}\), so that \(deg(n_i)=|H_i|\).
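A minimal Python sketch of both definitions, assuming links are stored as a set of pairs and that a pair may appear in either order. The function names and the example data are hypothetical.

```python
def neighbourhood(node, links):
    """Return every node that shares a link with `node`, in either direction."""
    result = set()
    for a, b in links:
        if a == node:
            result.add(b)
        elif b == node:
            result.add(a)
    return result


def degree(node, links):
    """The degree is the size of the neighbourhood."""
    return len(neighbourhood(node, links))


# Hypothetical links between categories and document groups.
links = {
    ("date of birth", "Eurodac"),
    ("date of birth", "VIS"),
    ("sex", "Eurodac"),
}
dob_degree = degree("date of birth", links)
eurodac_neighbours = neighbourhood("Eurodac", links)
```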

A.5 Definitions 5: Betweenness centrality

The betweenness centrality of a node \(n\) is defined as \(bc(n)=\sum_{s \neq n \neq t} \frac{\sigma_{s t}(n)}{\sigma_{s t}}\), where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(n)\) is the number of those paths that pass through \(n\). A path is a sequence of nodes in which each consecutive pair of nodes is linked. A shortest path is a path between two nodes \(s\) and \(t\) that traverses the smallest number of nodes. The equation for betweenness centrality takes into account that there may be several shortest paths from \(s\) to \(t\), with only some passing through \(n\).
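The formula can be evaluated directly with a breadth-first search from every node. The sketch below is a straightforward illustration, not an optimised algorithm, and its function names are hypothetical. It assumes the graph is given as an adjacency dictionary and that the sum runs over ordered pairs \((s, t)\) with \(s \neq t\), so each unordered pair is counted twice; halve the result if unordered pairs are intended. It uses the fact that \(\sigma_{st}(n)=\sigma_{sn}\,\sigma_{nt}\) whenever \(n\) lies on a shortest path from \(s\) to \(t\).

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): path lengths and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, n):
    """Sum sigma_st(n) / sigma_st over ordered pairs (s, t), s != n != t, s != t."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_n, sigma_n = info[n]
    total = 0.0
    for s in adj:
        if s == n:
            continue
        dist_s, sigma_s = info[s]
        if n not in dist_s:
            continue  # n is unreachable from s
        for t in adj:
            if t == n or t == s or t not in dist_s or t not in dist_n:
                continue
            # n lies on a shortest s-t path exactly when the distances add up.
            if dist_s[n] + dist_n[t] == dist_s[t]:
                total += sigma_s[n] * sigma_n[t] / sigma_s[t]
    return total


# Hypothetical example: the chain sex - Eurodac - date of birth - VIS.
adj = {
    "sex": {"Eurodac"},
    "Eurodac": {"sex", "date of birth"},
    "date of birth": {"Eurodac", "VIS"},
    "VIS": {"date of birth"},
}
bc_eurodac = betweenness(adj, "Eurodac")
bc_sex = betweenness(adj, "sex")
```

In this chain, end nodes lie on no shortest path between other nodes, while the two inner nodes each lie on the shortest paths of two unordered pairs.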

A.6 Definitions 6: Presence

The presence of all categories in a document group node \(n_x\) is the set of all category nodes linked to it: \(Categories(x) = \left\{ n_y \in N: (l_{xy} \in L \vee l_{yx} \in L) \wedge n_y.type=category\right\}\). The presence of a category \(n_x\) in document groups is the set of nodes of the type document group for which there exists a link between this category and the document group. It is formally defined as: \(Presence(x, documentGroup)=\left\{ n_y \in N : (l_{xy} \in L \vee l_{yx} \in L) \wedge n_y.type = documentGroup\right\}\).
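Both sets can be sketched in Python as set comprehensions over the nodes. The dictionary `node_type`, the helper `linked`, and the example data are hypothetical; the sketch assumes a link may be stored in either order.

```python
# Hypothetical nodes with their type attribute, and the links between them.
node_type = {
    "Eurodac": "documentGroup",
    "VIS": "documentGroup",
    "date of birth": "category",
    "sex": "category",
}
links = {
    ("date of birth", "Eurodac"),
    ("date of birth", "VIS"),
    ("sex", "Eurodac"),
}


def linked(x, y, links):
    """True if a link between x and y exists in either order."""
    return (x, y) in links or (y, x) in links


def categories(doc_group, node_type, links):
    """Category nodes linked to the given document group."""
    return {n for n, t in node_type.items()
            if t == "category" and linked(doc_group, n, links)}


def presence(category, node_type, links):
    """Document group nodes linked to the given category."""
    return {n for n, t in node_type.items()
            if t == "documentGroup" and linked(category, n, links)}


eurodac_categories = categories("Eurodac", node_type, links)
dob_presence = presence("date of birth", node_type, links)
```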

A.7 Definitions 7: Intersection and difference

The absence of categories between \(docGroup_1\) and \(docGroup_2\) is the set of categories present in the second document group minus the set of categories present in the first, that is, the categories of the second that are missing from the first. In our notation: \(Absence(docGroup_1, docGroup_2) = Categories(docGroup_2) \setminus Categories(docGroup_1)\). The categories that are common to those same two document groups are determined using the intersection of the sets of categories that are present in each: \(CommonCodes(docGroup_1, docGroup_2)=Categories(docGroup_1) \cap Categories(docGroup_2)\). This operation is not limited to two sets. The intersection of \(n\) sets can be written as \(\bigcap_{i=1}^n Categories(docGroup_i)\).
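With categories held as Python sets, these operations map directly onto set difference and set intersection. The following sketch uses invented example data, and the function names are hypothetical.

```python
# Hypothetical category sets per document group (illustrative values only).
categories = {
    "Eurodac": {"date of birth", "sex", "fingerprints"},
    "VIS": {"date of birth", "sex", "visa number"},
    "SIS": {"date of birth", "alert reason"},
}


def absence(group_1, group_2):
    """Categories present in group_2 but missing from group_1."""
    return categories[group_2] - categories[group_1]


def common(*groups):
    """Categories present in every listed group (needs at least one group)."""
    return set.intersection(*(categories[g] for g in groups))


missing_in_eurodac = absence("Eurodac", "VIS")
shared_by_all = common("Eurodac", "VIS", "SIS")
```

Note that the difference is not symmetric: swapping the two arguments of `absence` generally gives a different set.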

A.8 Table: Presence of code groups for authorities

Table A.1: Presence of code groups for EU (Eurodac, SIS, VIS), Greek (HRF), German (GRF), and their relative degree and betweenness centrality.
Degree	Code group	EU	Greek	German
3	asylum	1	1	1
3	biometrics	1	1	1
3	biometrics: photograph data	1	1	1
3	birth data	1	1	1
3	birth data: date of birth	1	1	1
3	birth data: place of birth	1	1	1
3	education	1	1	1
3	name: surname	1	1	1
3	nationality	1	1	1
3	occupation data	1	1	1
3	residence	1	1	1
3	sex & gender	1	1	1
3	travel document data	1	1	1
2	additional info / comments	1	1	0
2	biometrics: fingerprint data	1	0	1
2	contact info	0	1	1
2	criminal offence data	1	0	1
2	education: extent	0	1	1
2	family status data	0	1	1
2	language	0	1	1
2	linking data: EU	0	1	1
2	name: earlier/other names	1	0	1
2	name: forename	1	0	1
2	occupation data: current	1	0	1
2	parents data	1	1	0
2	personal ties	1	1	0
2	personal ties: in EU	1	1	0
2	procedure data	1	1	0
2	religion	0	1	1
2	residence: previous	0	1	1
2	travel: relocation	1	1	0
1	application data	0	1	0
1	asylum: rejection	0	0	1
1	citizenship	0	0	1
1	country of origin	0	0	1
1	date of application	1	0	0
1	date of entry	0	1	0
1	date of exit	1	0	0
1	ethnicity	0	1	0
1	integration	0	0	1
1	language: speaking	0	1	0
1	law enforcement	0	0	1
1	law enforcement: extradition	0	0	1
1	law enforcement: investigation	0	0	1
1	law enforcement: unauthorized entry and residence	0	0	1
1	linking data: MS	1	0	0
1	linking data: responsibility	0	0	1
1	name: variations	0	0	1
1	occupation data: past	0	0	1
1	operator data	1	0	0
1	registration status	0	0	1
1	residence: current	0	0	1
1	residency request data	0	0	1
1	restrictions: movement	0	1	0
1	stay data	1	0	0
1	temporary accommodation/housing	0	1	0
1	travel data	1	0	0
1	travel document: validity: expiration	1	0	0
1	travel document: visa-related data	1	0	0
1	travel: rejection or removal	0	0	1
1	vulnerability	0	1	0
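
The Degree column in Table A.1 appears to equal the number of authorities in which a code group is present. The sketch below assumes this reading and checks it on four rows copied from the table; the function and variable names are hypothetical.

```python
AUTHORITIES = ("EU", "Greek", "German")

# Four rows copied from Table A.1 as (EU, Greek, German) presence flags.
rows = {
    "biometrics: fingerprint data": (1, 0, 1),
    "name: forename": (1, 0, 1),
    "application data": (0, 1, 0),
    "ethnicity": (0, 1, 0),
}


def degree(flags):
    """Number of authorities in which the code group is present."""
    return sum(flags)


def present_in(flags):
    """Names of the authorities in which the code group is present."""
    return {name for name, flag in zip(AUTHORITIES, flags) if flag}


degrees = {group: degree(flags) for group, flags in rows.items()}
```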