On the Temporal Betweenness Centrality
Antonio Cruciani
antonio.cruciani@gssi.it
Graphs are ubiquitous
they evolve over time
many of them share a common feature:
Social
Biological
Transport
....
....
A temporal graph is an ordered triple \(\mathcal{G} = (V,\mathcal{E}, T) \), where:
Temporal Graph
Temporal Paths
A temporal path is a sequence \(P = \langle (u_1,v_1,t_1),(u_2,v_2,t_2),\dots ,(u_k,v_k,t_k)\rangle\) of temporal arcs in which \(t_i < t_{i+1} \) and \(v_i = u_{i+1}\) for all \(i\in [k-1]\) and each node in \(V\) is visited at most once.
\(P = \langle s \rightarrow h\rightarrow z \rangle \)
Shortest
Foremost
Fastest
\(P = \langle s \rightarrow u \rightarrow v \rightarrow w \rightarrow z \rangle \)
\(P = \langle s \rightarrow x\rightarrow y\rightarrow z \rangle \)
A (static) Centrality Zoo
The temporal betweeness centrality of each node \(v\in V\) is defined as
$$b^{(\star)}_v = \frac{1}{n(n-1)}\sum_{\substack{s, z\in V\\ s\neq z\neq v}} \frac{\sigma^{(\star)}_{s,z}(v)}{\sigma^{(\star)}_{s,z}}\in [0,1]$$
#P-Hard :(
\(\mathcal{O}(|V|^3|T|^2)\)
\(\mathcal{O}(|V| |\mathcal{E}|\log |\mathcal{E}|)\)
\(\mathcal{O}(|V||T|+|\mathcal{E}|)\)
\(\mathcal{O}(|V| + |\mathcal{E}|)\)
Time
Space
Two Pass algorithm:
For each node \(s\in V\)
Step 1 (explorative step)
Can I find a nice heuristic to ``approximate`` such a centrality measure?
Step 2 (ideal step)
Can I design a good approximation algorithm ?
Proxying Betweenness Centrality Rankings in Temporal Networks
SEA 2023
Ruben Becker,Pierluigi Crescenzi, Antonio Cruciani, and Bojana Kodric
Proxies
We say that a centrality measure \(B\) is a proxy for the centrality measure \(A\) if its ranking is "close" to the one of \(A\)
\(R_B\approx R_A\)
Here we consider one of proxy
Local
Centrality values of nodes are completely determined by the induced subgraph of their neighborhood (including themselves)
Desired properties for a proxy
Fast
Good
Ranking: list of nodes sorted in descending order of centrality value
A novel temporal degree notion: Pass-Through Degree
Existing temporal degree definitions:
Why PTD?
Can be computed in \(\mathcal{O}(|\mathcal{E}|\log |E|) = \tilde{\mathcal{O}}(|\mathcal{E}|)\) time using \(\mathcal{O}(|V|+|E|)\) space
Pass-Through Degree:
\(d_{und}(v)\)
\(d_{avg}(v)\)
\(\text{t-ptd}(v)\)
\(k+1\)
\(1\)
\(\sqrt{k}\)
\(1\)
\(1\)
\(1\)
\(k+1\)
\(1\)
\(\sqrt{\frac{k(k+1)}{2}}\approx k\)
\(N_t(v) = \{u\in N(v):(v,u,t)\in \mathcal{E}\}\)
Ego Betweenness
The temporal ego-betweenness of \(u\) is the temporal betweenness computed on the temporal graph induced by the in- and out-neighbors of \(u\)
ONBRA
Experimental setting: temporal networks
Local proxies (rank correlation)
W. Kendall's \(\tau\)
and TOP(50)
Local proxies (running time)
Summary of the experiments
MANTRA: Temporal Betweenness Centrality Approximation through Sampling
ECML-PKDD
2024
Antonio Cruciani
$$\textbf{Pr}\left(\sup_{v\in V} \left|b_v^{(\star)}-\tilde{b}_v^{(\star)}\right|\leq \varepsilon\right)\geq 1-\delta$$
MANTRA in three lines
Given a set of functions \(\mathcal{F}\) from a domain \(\mathcal{D}\) and a sample \(\mathcal{S}\)
$$\textbf{SD}(\mathcal{F},\mathcal{S}) =\sup_{f\in \mathcal{F}}{\left| a_f(\mathcal{S})-\mu_f\right|} $$
$$a_f(\mathcal{S}) = \frac{1}{|\mathcal{S}|} \sum_{i = 1}^{|\mathcal{S}|}f(s_i) $$
$$\mu_f = \mathbb{E}[a_f(\mathcal{S})] $$
The Supremum Deviation
Goal:
$$\textbf{SD}(\mathcal{F},\mathcal{S})$$
To find an Upper bound for
How do we estimate the SD of \(\mathcal{S}\) using a single sample?
\(\mathcal{S}=\)
\(\mathcal{S}=\)
\(\sup_{f\in\mathcal{F}}\left( a_f(\mathcal{S}_1) - a_f(\mathcal{S_2})\right)\)
\(\mathcal{S}=\)
\(\displaystyle\frac{2}{|\mathcal{S}|}\sup_{f\in\mathcal{F}}\sum_{i = 1}^{|\mathcal{S}|}\lambda_i f(z_i)\)
\(\mathcal{S}=\)
\(\mathcal{S}=\)
$$\bm{\lambda}=(\lambda_1,\lambda_2,\dots,\lambda_{|\mathcal{S}|}) \in \{-1,+1\}$$
\(\displaystyle RA(\mathcal{F},\mathcal{S}) = \frac{1}{|\mathcal{S}|}\mathbb{E}_{\bm{\lambda}}\left[\sup_{f\in\mathcal{F}}\sum_{i = 1}^{|\mathcal{S}|}\lambda_i f(z_i)\right]\)
\(\mathcal{S}=\)
\(\mathcal{S}=\)
$$\bm{\lambda}=(\lambda_1,\lambda_2,\dots,\lambda_{|\mathcal{S}|}) \in \{-1,+1\}$$
Rademacher Complexity captures the idea of the previous slide by considering the expectation of the last formula with respect to the random choice of \(\bm{\lambda}\)
\(\displaystyle RA(\mathcal{F},\mathcal{S}) = \frac{1}{|\mathcal{S}|}\mathbb{E}_{\bm{\lambda}}\left[\sup_{f\in\mathcal{F}}\sum_{i = 1}^{|\mathcal{S}|}\lambda_i f(z_i)\right]\)
\(\mathcal{S}=\)
\(\mathcal{S}=\)
$$\bm{\lambda}=(\lambda_1,\lambda_2,\dots,\lambda_{|\mathcal{S}|}) \in \{-1,+1\}$$
Rademacher Complexity captures the idea of the previous slide by considering the expectation of the last formula with respect to the random choice of \(\bm{\lambda}\)
\(\displaystyle SD(\mathcal{F},\mathcal{S}) \leq 2\mathbb{E}_{\mathcal{S}\sim\mathcal{D}_r}\left[RA(\mathcal{F},\mathcal{S})\right]+\sqrt{c\frac{2\ln (2/\delta)}{r}}\)
c-Monte Carlo Emprical Rademacher Averages (c-MCERA)
$$\bm{\lambda} \in \{-1,+1\}^{c\times|\mathcal{S}|}, r = |\mathcal{S}|$$
$$R^c_{r}(\mathcal{F},\mathcal{S},\bm{\lambda}) = \frac{1}{c}\sum_{j=1}^c \sup_{f\in\mathcal{F}}\frac{1}{r} \sum_{s_i\in\mathcal{S}}\lambda_{j,i} f(s_i) $$
With Probability at least
$$1-\delta$$
$$\mathcal{W}_\mathcal{F}(\mathcal{S})= \sup_{f\in \mathcal{F}} \frac{1}{r}\sum_{s_i\in\mathcal{S}}(f(s_i))^2$$
Empirical
Wimpy Variance
\(R = \tilde{R} + \frac{\ln(4/\delta)}{r}+\sqrt{\left(\frac{\ln(4/\delta)}{r}\right)^2 + \frac{2\ln(4/\delta)\tilde{R}}{r}}\)
\(\xi = 2R + \sqrt{\frac{2 \ln(4/\delta)\left(\hat{v}+4R\right)}{r}}+\frac{\ln(4/\delta)}{3r}\)
$$\textbf{SD}(\mathcal{F},\mathcal{S})\leq \xi$$
Variance Dependent Bound
\(\tilde{R} = R^c_{r}(\mathcal{F},\mathcal{S},\bm{\lambda}) + \sqrt{\frac{4 \mathcal{W}_{\mathcal{F}}(\mathcal{S})\ln(4/\delta)}{cr}}\)
\(f_v(s,z) =\frac{\sigma^{(\star)}_{s,z}(v)}{\sigma^{(\star)}_{s,z}}\)
\(\mathcal{D} = \{(s,z)\in V\times V : s\neq z\}\)
$$\tilde{b}^{(\star)}_v =\frac{1}{|\mathcal{S}|}\sum_{(s,z)\in \mathcal{S}} f_v(s,z) $$
Functions
Domain
Sample Mean
$$\mathcal{W}_\mathcal{F}(\mathcal{S})= \sup_{v\in V} \frac{1}{|\mathcal{S}|}\sum_{(s,z)\in\mathcal{S}}(f_v(s,z) )^2$$
Emp. Wimpy Variance
\(\displaystyle f_v(s) =\frac{1}{n(n-1)}\sum_{z\neq s}\frac{\sigma^{(\star)}_{s,z}(v)}{\sigma^{(\star)}_{s,z}}\)
Sample \(s\) u.a.r. from \(V\)
Perform a full temporal BFS from \(s\)
Sample \(s\neq z\) u.a.r. from \(V\)
Sample \(s\neq z\) u.a.r. from \(V\)
\(f_v(s,z) =\frac{\sigma^{(\star)}_{s,z}(v)}{\sigma^{(\star)}_{s,z}}\)
Given \(\varepsilon,\delta \in (0,1)\)
Union Bound + Hoeffding's Bound
\(\displaystyle\left|\mathcal{S}\right| = \frac{1}{2\varepsilon^2} \log\left(\frac{2n}{\delta}\right)\)
\(\textbf{SD}(\mathcal{F},\mathcal{S})\leq \varepsilon\) with probability \(\geq 1-\delta\)
Given \(\varepsilon,\delta \in (0,1)\)
Maximum number of nodes in the \((\star)\)-temporal optimal path
Vapnik–Chervonenkis (VC) Dimension
\(\displaystyle\left|\mathcal{S}\right| = \frac{0.5}{\varepsilon^2} \left(\lfloor \log D^{(\star)}-2\rfloor+1+\ln\left(\frac{1}{\delta}\right)\right)\)
\(\textbf{SD}(\mathcal{F},\mathcal{S})\leq \varepsilon\) with probability \(\geq 1-\delta\)
Given \(\varepsilon,\delta \in (0,1)\)
\(\displaystyle\left|\mathcal{S}\right|\in \mathcal{O}\left(\frac{\hat{v}+\varepsilon}{\varepsilon^2}\ln\left(\frac{\rho^{(\star)}}{\delta\hat{v}}\right)\right) \)
\(\max_{v\in V}\textbf{Var}(\tilde{b}^{(\star)}_v) \leq \hat{v}\)
\(\displaystyle \frac{1}{n(n-1)}\sum_{s,z\in V}\left|\textbf{Int}(tp_{sz})\right| \)
Variance-Aware
\(\textbf{SD}(\mathcal{F},\mathcal{S})\leq \varepsilon\) with probability \(\geq 1-\delta\)
All-Pairs \((\star)\)-Temporal Paths
\(\mathcal{O}(|V||\mathcal{E}|\)) :((
Let's use sampling to approximate these values !!!
Very high-level idea
With a sample of \(k=\Theta\left(\frac{\ln n}{\varepsilon^2}\right)\) nodes, we can approximate w.h.p.
Perform \(k\) \((\star)\)-Temporal BFS from random nodes
Temporal Connectivity Rate
Bootstrap Phase
Estimation Phase
All the algorithms have been implemented in Julia
Every algorithm is run 10 times
\(\delta = 0.1\)
\(\varepsilon\in\{0.1,0.07,0.05,0.01,0.007.0.005,0.001\}\)
Parameters:
\(c = 25\)
Machine used: Server with Intel Xeon Gold 6248R (3.0GHz)
32 cores and 1TB RAM thank you CySTAR :)
ONBRA needs more than 1TB of RAM!!!
MANTRA
STBProxy