Chapter 3 Locally dependent exponential random network models


Figure 3.1: A locally dependent random network with neighborhoods \(A_1, A_2, \dots, A_K\) and two binary node attributes, represented as gray or black and circle or diamond.

To begin, we define a random network, a concept developed by Fellows & Handcock (2012). By way of motivation, note that in the ERGM the nodal variates are fixed and enter the model as explanatory variables for making inferences about network structure. There is also a class of models, not discussed here, that treats the network as a fixed explanatory variable when modeling (random) nodal attributes. It is not difficult to imagine situations in which a researcher would like to model the network and the node attributes jointly. We therefore define a class of networks in which both the network structure and the attributes of the individual nodes are modeled as random quantities.

Definition 3.1 (Random network) Let \(\mathcal{N}\) be a countable collection of nodes (which we take to be a subset of \(\mathbb{N}\)). Let \(Y\) be the random graph on the nodes \(\mathcal{N}\) with support \(\mathcal{Y}\). Then, for each node \(i \in \mathcal{N}\), let there be a corresponding random vector of node attributes \(X_i \in \mathbb{R}^q\), and collect these into the \(|\mathcal{N}| \times q\) random matrix \(X\) with support \(\mathcal{X}\). The random network is the random variable \(Z = (Y, X)\) with support \(\mathcal{Z} = \mathcal{Y} \times \mathcal{X}\).

Now we wish to model these objects, so we follow the ERGM and turn to the exponential family (Fellows & Handcock, 2012). We write \[\begin{equation} P(Y = y, X = x| \eta) \propto e^{\eta \cdot g(y, x)}. \tag{3.1} \end{equation}\] This looks very similar to the ERGM, but note the explicit dependence on the quantity \(x\). More concretely, we can include terms that depend only on \(x\), which would have no place in an ERGM. We can further express the difference between the two models by rewriting the left hand side of (3.1) as \[\begin{equation*} P(X = x, Y = y | \eta) = P(Y = y|X = x, \eta)P(X=x|\eta), \end{equation*}\] where the first term on the right hand side is the ERGM and the second term is \[\begin{equation*} P(X = x|\eta) = \frac{C(\mathcal{Z}, \eta, x)}{C(\mathcal{Z}, \eta)}, \end{equation*}\] where \[\begin{equation*} C(\mathcal{Z}, \eta, x) = \int_{\{(v, u) \in \mathcal{Z} : u = x\}} e^{\eta \cdot g(v, u)} \qquad \text{and} \qquad C(\mathcal{Z}, \eta) = \int_{\mathcal{Z}} e^{\eta \cdot g(v, u)}. \end{equation*}\]

Roughly, \(C(\mathcal{Z}, \eta, x)/C(\mathcal{Z}, \eta)\) is the proportion of the total mass on the sample space \(\mathcal{Z}\) that is attained with \(x\) held fixed. This is not, in general, equal to one, so the ERNM does not reduce to the ERGM (Fellows & Handcock, 2012).
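
To make this decomposition concrete, the following is a minimal sketch (the statistics \(g\) and the parameter vector \(\eta\) are illustrative choices, not taken from the source) that enumerates a three-node ERNM by brute force and checks that the joint law factors into the ERGM conditional and the attribute marginal \(C(\mathcal{Z}, \eta, x)/C(\mathcal{Z}, \eta)\).

```python
# Brute-force enumeration of a toy ERNM on 3 nodes (illustrative sketch).
from itertools import product
import math

nodes = [0, 1, 2]
dyads = [(0, 1), (0, 2), (1, 2)]
eta = (0.5, 1.0, -0.3)            # hypothetical parameter vector

def g(y, x):
    """Illustrative statistics: edge count, homophily count, attribute sum."""
    edges = sum(y)
    homophily = sum(y_e for y_e, (i, j) in zip(y, dyads) if x[i] == x[j])
    return (edges, homophily, sum(x))

def weight(y, x):
    """Unnormalized probability exp(eta . g(y, x))."""
    return math.exp(sum(e * s for e, s in zip(eta, g(y, x))))

ys = list(product([0, 1], repeat=len(dyads)))   # all 2^3 graphs
xs = list(product([0, 1], repeat=len(nodes)))   # all 2^3 attribute vectors

C = sum(weight(y, x) for y in ys for x in xs)          # C(Z, eta)
C_x = {x: sum(weight(y, x) for y in ys) for x in xs}   # C(Z, eta, x)

def p_joint(y, x):      # P(Y = y, X = x | eta)
    return weight(y, x) / C

def p_x(x):             # P(X = x | eta) = C(Z, eta, x) / C(Z, eta)
    return C_x[x] / C

def p_y_given_x(y, x):  # the ERGM obtained by holding x fixed
    return weight(y, x) / C_x[x]

# The joint law factors into the ERGM conditional times the attribute marginal.
for y in ys:
    for x in xs:
        assert abs(p_joint(y, x) - p_y_given_x(y, x) * p_x(x)) < 1e-12
```

Because every configuration is enumerated, the normalizing constants here are exact; for networks of realistic size they must instead be approximated, for example by Markov chain Monte Carlo.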

3.1 Definitions and notation

We will consistently refer to a set of nodes, \(A_{k}\), as the \(k\)-th neighborhood, with an uppercase \(K\) representing the total number of neighborhoods and a lowercase \(k\) representing a specific neighborhood. The variable \(\mathcal{N}\) will refer to the domain of a random network, usually the union of a collection of neighborhoods. Nodes within the network will be indexed by the variables \(i\) and \(j\), with \(Z_{ij} = (Y_{ij}, X_i, X_j)\), where \(Y_{ij}\) refers to the edge between nodes \(i\) and \(j\), and \(X_{i}\) and \(X_j\) refer to the random vectors of node attributes. Abstracting this further, \(i\) and \(j\) will also refer to tuples of nodes, so we will write \(\vec{i} = (i_1, i_2, \dots, i_d) \in \mathcal{N} \times \mathcal{N} \times \dots \times \mathcal{N}\). The variables \(Z\) and \(Y\) will also often carry a subscript of \(W\) or \(B\) (for example \(Y_{Bij}\)), which emphasizes that the edge from \(i\) to \(j\) is within or between neighborhoods, respectively. Finally, for lack of a better notation, the indicator function \(\mathbb{I}_{B}(i,j)\) (where \(B\) is for between) is one if \(i \in A_{l}\) and \(j \in A_{p}\) where \(l \neq p\), and zero otherwise.

Definition 3.2 (Local dependence property) Extending the definition in Schweinberger & Handcock (2015), a random network model satisfies the local dependence property if there is a partition of the node set \(\mathcal{N}\) into neighborhoods \(A_1, A_2, \dots, A_K\) for \(K \geq 2\) such that the network variables \(Z_{ij}\) are dependent when \(i, j \in A_{\ell}\) for some \(\ell\) and independent otherwise. We also require that nodal attributes depend only on the attributes of nodes within the same neighborhood. Thus, the probability measure can be written as \[\begin{equation} P(Z \in \mathbf{Z}) = \prod_{k = 1}^{K}\left[ P_{kk}(Z_{kk} \in \mathbf{Z}_{kk}) \prod_{\ell = 1}^{k-1} P_{k\ell}(Z_{k\ell} \in \mathbf{Z}_{k\ell}, Z_{\ell k} \in \mathbf{Z}_{\ell k}) \right], \end{equation}\] where \(Z_{mn}\) is the subnetwork consisting of the random graph ties from nodes in \(A_m\) to those in \(A_n\), together with the appropriate node variables, and \(\mathbf{Z}_{mn}\) is a subset of the sample space of \(Z_{mn}\). Furthermore, the within-neighborhood measures \(P_{kk}\) can induce dependence between dyads, while the between-neighborhood measures \(P_{k\ell}\) (\(k \neq \ell\)) are dyad-independent.

Definition 3.3 (Sparsity) Also from Schweinberger & Handcock (2015), we say a locally dependent random network is \(\delta\)-sparse if there is some \(\delta > 0\) and some \(C > 0\) such that \[\begin{equation} \mathbb{E}\left( \left| Y_{Bij} \right|^{p} \right) \leq Cn^{-\delta}, \qquad (p = 1, 2) \end{equation}\] where \(n = |\mathcal{N}|\) and \(Y_{Bij}\) signifies the tie between neighborhoods from node \(i \in A_{l}\) to node \(j \in A_{m}\) where \(l \neq m\).
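
To illustrate the definition, consider the simple (purely illustrative) case of an unweighted network in which every between-neighborhood tie is a Bernoulli variable with success probability \(Cn^{-\delta}\). Since \(Y_{Bij} \in \{0, 1\}\), \[\begin{equation*} \mathbb{E}\left( \left| Y_{Bij} \right|^{p} \right) = P(Y_{Bij} = 1) = Cn^{-\delta}, \qquad (p = 1, 2) \end{equation*}\] so the network is \(\delta\)-sparse: as the network grows, ties between neighborhoods become increasingly rare.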

3.2 Preliminary theorems

In proving our theorems, we will make use of several classical results, all of which can be found in Billingsley (1995). The first is the Lindeberg-Feller central limit theorem for triangular arrays. The second is Lyapounov's condition, which gives a convenient way to show that the hypothesis of the Lindeberg-Feller theorem holds. Finally, we make use of a central limit theorem for dependent random variables. For the sake of brevity, in this section we state each of these without proof.

Theorem 3.1 (Billingsley, 1995 Theorem 27.2) For each \(n\) take \(X_{n1}, \dots, X_{nr_n}\), independent with \(\mathbb{E}(X_{ns}) = 0\) for all \(n\) and \(s\) (where no generality is lost in this assumption). Then we have \(\sigma^{2}_{ns} = \operatorname{Var}(X_{ns}) = \mathbb{E}(X_{ns}^2)\). Next, set \(s^{2}_{n} = \sum_{s = 1}^{r_n} \sigma^{2}_{ns}\). Now set \[\begin{equation*} S_{n} = X_{n1} + \dots + X_{nr_n}. \end{equation*}\] If the Lindeberg condition, \[\begin{equation} \lim_{n \to \infty} \sum_{s = 1}^{r_n} \frac{1}{s^{2}_{n}} \int_{\{|X_{ns}| \geq \epsilon s_n\}} X^{2}_{ns} \;\mathrm{d}P = 0, \end{equation}\] holds for all \(\epsilon > 0\), then \(S_n / s_n \xrightarrow{\mathrm{d}} N(0,1)\).

Theorem 3.2 (Billingsley, 1995 Theorem 27.3) Let \(S_n\) be as before. Then if Lyapounov's condition, \[\begin{equation} \lim_{n \to \infty} \sum_{s = 1}^{r_n} \frac{1}{s_n^{2 + \delta}} \mathbb{E} \left( \left| X_{ns} \right|^{2 + \delta} \right) = 0 \tag{3.2} \end{equation}\] holds for some \(\delta > 0\), then the Lindeberg condition also holds. Therefore \(S_{n} / s_n \xrightarrow{\mathrm{d}} N(0,1)\).
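
As a worked example of how (3.2) is typically verified, and a preview of the arguments below, suppose the \(X_{ns}\) are uniformly bounded, say \(|X_{ns}| \leq M\) almost surely, and take \(\delta = 2\). Then \[\begin{equation*} \sum_{s = 1}^{r_n} \frac{1}{s_n^{4}} \mathbb{E}\left( X_{ns}^{4} \right) \leq \frac{M^2}{s_n^{4}} \sum_{s = 1}^{r_n} \mathbb{E}\left( X_{ns}^{2} \right) = \frac{M^2}{s_n^{2}}, \end{equation*}\] which tends to zero whenever \(s_n^{2} \to \infty\). The proofs of Theorems 3.7 and 3.8 follow exactly this pattern.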

Theorem 3.3 (Billingsley, 1995 Theorem 27.4) Suppose that \(X_1, X_2, \dots\) is stationary and \(\alpha\)-mixing with \(\alpha_n = O(n^{-5})\) and that \(\mathbb{E}(X_n) = 0\) and \(\mathbb{E}(X_n^{12}) < \infty\). Note that the condition on \(\alpha\) is stronger than what we require. Our \(X_n\) will be \(M\)-dependent, meaning that each \(X_n\) is independent of all \(X_m\) with \(|n - m| > M\); an \(M\)-dependent sequence is \(\alpha\)-mixing with \(\alpha_n = 0\) for all \(n > M\). Then, if \(S_n = X_1 + \dots + X_n\), we have \[\begin{equation} \frac{\operatorname{Var}(S_n)}{n} \to \sigma^2. \end{equation}\] Then, if \(\sigma > 0\), we have \(S_n / (\sigma \sqrt{n}) \xrightarrow{\mathrm{d}} N(0,1)\).
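
A quick numerical check of this theorem is easy to run. The sketch below (illustrative only; the moving-average construction is my own choice, not taken from the source) builds a stationary \(1\)-dependent sequence \(X_i = (\epsilon_i + \epsilon_{i+1})/2\) from i.i.d. standard normal \(\epsilon_i\), for which \(\sigma^2 = \operatorname{Var}(X_1) + 2\operatorname{Cov}(X_1, X_2) = 1\), and confirms that \(\operatorname{Var}(S_n)/n\) approaches \(\sigma^2\) and that the standardized sums are approximately standard normal.

```python
# Numerical illustration of Theorem 3.3 with a 1-dependent moving average.
import numpy as np

rng = np.random.default_rng(0)

def sample_sums(n, reps=5000):
    eps = rng.standard_normal((reps, n + 1))
    X = (eps[:, :-1] + eps[:, 1:]) / 2.0   # stationary, 1-dependent, mean zero
    return X.sum(axis=1)                   # S_n for each replication

for n in (50, 500, 2000):
    S = sample_sums(n)
    print(n, round(S.var() / n, 3))        # approaches sigma^2 = 1

# Standardized sums S_n / (sigma * sqrt(n)) should be close to N(0, 1):
S = sample_sums(2000)
print(round(np.mean(np.abs(S / np.sqrt(2000)) < 1.96), 3))  # roughly 0.95
```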

The final theorem is Slutsky’s theorem, a classic result of asymptotic theory in statistics.

Theorem 3.4 (Wasserman, 2004 Theorem 5.5) Let \(X_n, X, Y_n\) be random variables and let \(c\) be a constant. Then, if \(X_n \xrightarrow[]{\mathrm{d}} X\) and \(Y_n \xrightarrow[]{\mathrm{p}} c\), we have \(X_n + Y_n \xrightarrow[]{\mathrm{d}} X + c\) and \(X_n Y_n \xrightarrow[]{\mathrm{d}} cX\).

3.3 Consistency under sampling

With these in place, we attempt to extend a result about locally dependent ERGMs proven by Schweinberger & Handcock (2015) to locally dependent ERNMs. In short, this theorem states that the parameters estimated by modeling a small sample of a larger network can be generalized to the overall network. It was shown by Shalizi & Rinaldo (2013) that most useful formulations of ERGMs do not form projective exponential families in the sense that the distribution of a subgraph cannot be, in general, recovered by marginalizing the distribution of a larger graph with respect to the edge variables not included in the smaller graph. Hence, we are unable to generalize parameter estimates from the subnetwork to the total network.

To show that locally dependent ERNMs do form a projective family, let \(\mathbb{A}\) be a collection of sets \(\mathcal{A}\), where each \(\mathcal{A}\) is a finite collection of neighborhoods. Also, allow the set \(\mathbb{A}\) to be an ideal, so that if \(\mathcal{A} \in \mathbb{A}\), every subset of \(\mathcal{A}\) is also in \(\mathbb{A}\), and if \(\mathcal{A}, \mathcal{B} \in \mathbb{A}\), then \(\mathcal{A} \cup \mathcal{B} \in \mathbb{A}\). If \(\mathcal{A} \subset \mathcal{B}\), think of passing from the set \(\mathcal{A}\) to the set \(\mathcal{B}\) as taking a larger sample of the (possibly infinite) set of neighborhoods in the larger network. For each \(\mathcal{A} \in \mathbb{A}\), let \(\mathcal{P}_{\mathcal{A}, \Theta} = \{P_{\mathcal{A}, \theta}\}_{\theta \in \Theta}\) be the collection of ERNMs on the neighborhoods in \(\mathcal{A}\) with parameter \(\theta \in \Theta\), where \(\Theta \subset \mathbb{R}^{p}\) is open. Assume that each distribution in \(\mathcal{P}_{\mathcal{A}, \Theta}\) has the same support \(\mathcal{Z}_{\mathcal{A}}\) and that, whenever \(\mathcal{A} \subset \mathcal{B}\), we have \(\mathcal{Z}_{\mathcal{B}} = \mathcal{Z}_{\mathcal{A}} \times \mathcal{Z}_{\mathcal{B} \setminus \mathcal{A}}\). Then, the exponential family \(\{\mathcal{P}_{\mathcal{A}, \Theta}\}_{\mathcal{A} \in \mathbb{A}}\) is projective in the sense of Shalizi & Rinaldo (2013 Definition 1) precisely when Theorem 3.6 holds.

This follows from a specific case of the general definition given by Shalizi & Rinaldo (2013). There, for every pair \(\mathcal{A}\) and \(\mathcal{B}\) with \(\mathcal{A} \subset \mathcal{B}\), they define the natural projection mapping \(\pi_{\mathcal{B} \to \mathcal{A}}: \mathcal{Z_B} \to \mathcal{Z_A}\). Informally, this mapping projects the set \(\mathcal{Z_B}\) down to \(\mathcal{Z_A}\) by simply removing the extra data. For example if \(\mathcal{B} = \{A_1, A_2\}\) and \(\mathcal{A} = \{A_1\}\) as in Figure 3.1, then the mapping \(\pi_{\mathcal{B} \to \mathcal{A}}\) is shown in Figure 3.2.


Figure 3.2: The projection mapping from \(\mathcal{B} = \{A_1, A_2\}\) to \(\mathcal{A} = \{A_1\}\).
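
Operationally, the projection is nothing more than dropping the rows and columns of the adjacency matrix, and the rows of the attribute matrix, that belong to the discarded neighborhoods. The following minimal sketch makes this explicit (the six-node example and the neighborhood assignment are illustrative, not taken from the source).

```python
# A minimal sketch of the projection pi_{B -> A}: restrict a random network
# (adjacency matrix plus nodal attributes) to the nodes of the kept neighborhoods.
import numpy as np

def project(adjacency, attributes, keep_nodes):
    """Return the subnetwork induced by the node indices in keep_nodes."""
    idx = np.asarray(sorted(keep_nodes))
    return adjacency[np.ix_(idx, idx)], attributes[idx]

# Example with B = {A_1, A_2}: nodes 0-2 form A_1 and nodes 3-5 form A_2.
rng = np.random.default_rng(1)
Y = np.triu(rng.integers(0, 2, size=(6, 6)), 1)
Y = Y + Y.T                               # symmetric adjacency, no self-loops
X = rng.integers(0, 2, size=(6, 2))       # two binary nodal attributes
Y_A, X_A = project(Y, X, keep_nodes=range(3))   # pi_{B -> A} with A = {A_1}
```

In the notation above, \(\pi_{\mathcal{B} \to \mathcal{A}}\) acts on both the graph and the attribute matrix in exactly this way.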

This is desirable because Shalizi & Rinaldo (2013) have demonstrated the following theorem.

Theorem 3.5 (Shalizi & Rinaldo, 2013 Theorem 3) If the exponential model family \(\{\mathcal{P}_{\mathcal{A}, \Theta}\}_{\mathcal{A} \in \mathbb{A}}\) is projective and the log of the normalizing constant can be written as \[\begin{align} \begin{split} \log \left( C(\theta, \mathcal{Z}) \right) &= \log \left( \int_{\mathcal{Z}} e^{\theta \cdot g(z)} \;\mathrm{d}z \right) \\ &= r \left(\left| \mathcal{Z} \right| \right) a(\theta), \end{split} \end{align}\] where \(r\) is a positive, monotone increasing function of some positive measure on \(\mathcal{Z}\) and \(a\) is a differentiable function of \(\theta\), then the maximum likelihood estimator exists and is strongly consistent, meaning that \(\hat{\theta} \xrightarrow[]{\text{a.s.}} \theta\), where \(\theta\) is the unknown parameter being estimated.

This is trivially achieved by setting \(r = 1\) for all values of \(\left| \mathcal{Z} \right|\) and setting \(a(\theta) = \log(C(\theta, \mathcal{Z}))\). Differentiability of \(a\) with respect to \(\theta\) follows from the standard result on differentiating under the integral sign, which applies to the normalizing constant of an exponential family. From a practical perspective, this means that a researcher using this model can treat parameters estimated from samples of a large network as increasingly good approximations of the true parameter values as the sample size increases.

Theorem 3.6 Let \(A_{1}, A_{2}, \dots\) be a sequence of neighborhoods and define the sequence of domains \(\mathcal{N}_{K} = \bigcup_{k = 1}^{K} A_{k}\). Then let \(Z_{1}, Z_{2}, \dots\) be the sequence of locally dependent random networks on the \(\mathcal{N}_K\). For each \(Z_K\), there is the corresponding set of neighborhoods \(\mathcal{A}_K\). Let \(P_K\) be a generic probability distribution from the family \(\{\mathcal{P}_{K, \theta}\}_{\theta \in \Theta}\). Let \(Z_{K+1 \setminus K}\) denote the portion of \(Z_{K+1}\) removed by the projection \(\pi_{\mathcal{A}_{K+1} \to \mathcal{A}_K}\), so that \[\begin{equation*} \pi_{\mathcal{A}_{K+1} \to \mathcal{A}_K}(Z_{K+1}) = Z_{K}, \end{equation*}\] with corresponding distribution \(P_{K+1 \setminus K}\) for \(Z_{K+1 \setminus K}\). Then \[\begin{equation} P_{K}(Z_{K} \in \mathbf{Z}_K) = P_{K+1}(Z_{K} \in \mathbf{Z}_K, Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}), \end{equation}\] where \(\mathcal{Z}_K\) is the sample space of the distribution \(P_{K}\) and \(\mathbf{Z}_K \subset \mathcal{Z}_K\). This is a specific case of the definition of projectivity for a general exponential family given by Shalizi & Rinaldo (2013).

Proof. This follows from the definition of local dependence, in much the same way as the proof for ERGMs by Schweinberger & Handcock (2015) does. We have \[\begin{align*} P_{K+1}(Z_{K} \in \mathbf{Z}_K, Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}) &= P_{K+1}(Z_{K} \in \mathbf{Z}_K) P_{K+1 \setminus K}(Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}) \\ &= P_{K}(Z_{K} \in \mathbf{Z}_K) (1) \\ &= P_{K}(Z_{K} \in \mathbf{Z}_K), \end{align*}\] where the first equality uses the independence of the between-neighborhood ties, the second factor is one because \(\mathcal{Z}_{K+1 \setminus K}\) is the entire sample space of \(Z_{K+1 \setminus K}\), and the measure restricted to \(Z_{K}\) becomes \(P_K\) from the product definition of a locally dependent random network.

3.4 Asymptotic normality of statistics

In this section we will prove that certain classes of statistics of locally dependent random networks are asymptotically normally distributed as the number of neighborhoods tends to infinity. The statistics we consider can be classified into three types: first, statistics which depend only on the graph structure; second, statistics that depend on both the graph and the nodal variates; and third, statistics that depend only on the nodal variates. The first class of statistics has already been considered by Schweinberger & Handcock (2015), but we will reproduce the proof here, as the second proof is very similar. The third class of statistics becomes normal in the limit by a central limit theorem for \(M\)-dependent random variables in Billingsley (1995).

Before we begin to explicitly define each of these classes, we clarify the notation that will be used. A general statistic will be a function \[\begin{equation*} S:\mathcal{N}^d \to \mathbb{R}, \end{equation*}\] where \(\mathcal{N}^d\) is the \(d\)-fold Cartesian product of the set of nodes, \(\mathcal{N}\), with itself: \[\begin{equation*} \mathcal{N}^d = \underbrace{\mathcal{N} \times \dots \times \mathcal{N}}_{d \text{ times}}. \end{equation*}\]

Additionally, the statistic will often carry a subscript \(K\), indicating that the statistic is of the random network with \(K\) neighborhoods.

Formally, as explained in Schweinberger & Handcock (2015), the first class of statistics contains those that have the form \[\begin{equation*} S_{K} = \sum_{i \in \mathcal{N}^d} S_{Ki}, \end{equation*}\] where \[\begin{equation*} S_{Ki} = \prod_{l, p \in i} Y_{lp}, \end{equation*}\]

a product of at most \(q\) edge variables that captures the desired interaction. We will also make use of the set \(A_k^d\), which is a similar Cartesian product. When we write \(i \in A_k^d\), we mean that every component of the \(d\)-tuple \(i\) is an element of \(A_k\). Furthermore, by an abuse of notation, we will write \(l, p \in i\) to mean that \(l\) and \(p\) are vertices contained in the \(d\)-tuple \(i\). Now we are ready to prove the first case of the theorem.
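
Before giving the proof, a concrete instance of such a statistic may help fix ideas: the two-star statistic can be written as \[\begin{equation*} S_{K} = \sum_{\vec{i} \in \mathcal{N}^3} S_{K\vec{i}}, \qquad S_{K\vec{i}} = Y_{i_1 i_2} Y_{i_1 i_3}, \end{equation*}\] where the sum is restricted to \(3\)-tuples \(\vec{i} = (i_1, i_2, i_3)\) of distinct nodes and each summand is a product of \(q = 2\) edge variables.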

Theorem 3.7 Let \(A_1, A_2, \dots\) be a sequence of neighborhoods of size at most \(M\) and form the sequence of domains \(\mathcal{N}_{K} = \bigcup_{k = 1}^{K} A_{k}\). Then let \(Z_{1}, Z_{2}, \dots\) be the sequence of unweighted random networks on the \(\mathcal{N}_{K}\). Then, let the statistic \(S_{K}:\mathcal{N}_{K}^d \to \mathbb{R}\) be given. Furthermore, assume the statistic depends only on the graph variables of the \(Z_{K}\). We also assume that the \(Z_{K}\) satisfy the local dependence property and that they are \(\delta\)-sparse, for some \(\delta > d\). Finally, we require that \(\operatorname{Var}(W^{\ast}_{K}) \to \infty\), where \(W^{\ast}_{K}\) is defined in (3.3). Then \[\begin{equation} \frac{S_K - \mathbb{E}(S_K)}{\sqrt{\operatorname{Var}(S_{K})}} \xrightarrow[K \to \infty]{\mathrm{d}} N(0, 1). \end{equation}\]

Proof. As the networks \(Z_K\) are unweighted, all edge variables \(Y_{ij}\in \{0, 1\}\). Let \(\mu_{ij} = \mathbb{E}(Y_{ij})\). Then define \(V_{ij} = Y_{ij} - \mu_{ij}\). Therefore, without loss of generality, we may work with \(V_{ij}\), which has the convenient property that \(\mathbb{E}(V_{ij}) = 0\). This means that we can similarly shift our statistics of interest, \(S_K\). Therefore, call \(S_K^{\ast} = S_{K} - \mathbb{E}(S_K)\), so that \(\mathbb{E}(S_K^{\ast}) = 0\).

Note that we can write \[\begin{equation*} S^{\ast}_{K} = W^{\ast}_{K} + B^{\ast}_{K}, \end{equation*}\] with \[\begin{equation} W^{\ast}_{K} = \sum_{k = 1}^{K} W^{\ast}_{K,k} = \sum_{k = 1}^{K} \sum_{i \in \mathcal{N}_{K}^d} \mathbb{I}(i\in A_{k}^d) S^{\ast}_{Ki} \tag{3.3} \end{equation}\] and \[\begin{equation} B^{\ast}_{K} = \sum_{i \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) S^{\ast}_{Ki}, \end{equation}\]

where the indicator functions restrict the sums to within the \(k\)-th neighborhood and between neighborhoods of the graph, respectively. Specifically, \(\mathbb{I}_{B}(i) = 1\) when the \(d\)-tuple of nodes \(i\) contains nodes from different neighborhoods, or exactly when \(\mathbb{I}(i \in A_{k}^d) = 0\) for all neighborhoods \(k\). By splitting the statistic into the within and between neighborhood portions, we are able to make use of the independence relation between edges that connect neighborhoods. We also have \(\mathbb{E}(W^{\ast}_{K}) = 0\) and \(\mathbb{E}(B^{\ast}_{K}) = 0\), as each quantity is a sum of random variables with mean zero.

The idea of this proof is to gain control over the variances of \(B^{\ast}_{K}\) and of the elements of the sequence \(W^{\ast}_{K,k}\). We can then show that \(B^{\ast}_{K}\) converges in probability to zero and that the triangular array \(W^{\ast}_{K}\) satisfies Lyapounov's condition, and is thus asymptotically normal. Finally, Slutsky's theorem allows us to extend the result to \(S^{\ast}_{K}\).

To bound the variance of \(B^{\ast}_{K}\), note that \[\begin{equation*} \operatorname{Var}(B^{\ast}_{K}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) \mathbb{I}_{B}(j) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}). \end{equation*}\] Despite independence, some of these covariances may be nonzero if the two terms of the statistic both involve the same edge. For example, in Figure 3.1, a statistic that counted the number of edges between gray nodes plus the number of edges between diamond shaped nodes would have a nonzero covariance term because of the edge between the two nodes that are both gray and diamond shaped. To show that, in the limit, these covariances vanish, we need only concern ourselves with the nonzero terms in the sum. That is, only those terms where \(\mathbb{I}_{B}(i) \mathbb{I}_{B}(j) = 1\). This happens exactly when both \(S^{\ast}_{Ki}\) and \(S^{\ast}_{Kj}\) involve a between neighborhood edge variable. So, note that we have \[\begin{align} \begin{split} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}(S^{\ast}_{Ki} S^{\ast}_{Kj}) - \mathbb{E}(S^{\ast}_{Ki}) \mathbb{E}(S^{\ast}_{Kj}) \\ &= \mathbb{E}(S^{\ast}_{Ki}S^{\ast}_{Kj}), \end{split} \end{align}\] as the expectation of each term is zero. Next we take \(Y_{l_1 l_2}\) to be one of the (possibly many) between neighborhood edge variables in this product (such that \(\mathbb{I}_B(i) = 1\) where \(i\) is any tuple containing \(l_1\) and \(l_2\)) and \(V_{l_1 l_2}\) to be the recentered random variable corresponding to \(Y_{l_1 l_2}\). Then, \[\begin{align*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}\left( \prod_{m,n \in i} V_{mn} \prod_{m,n \in j} V_{mn} \right) \\ &= \mathbb{E}\left( V_{l_1 l_2}^p \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right), & (p = 1,2) \end{align*}\] where we must consider the case where \(p = 1\) to account for the covariance of \(S^{\ast}_{Ki}\) and \(S^{\ast}_{Kj}\) when \(i \neq j\) and the case where \(p = 2\) to account for the variance of \(S^{\ast}_{Ki}\), which is computed in the case where \(i = j\). So, if \(p = 1\), then \[\begin{align*} \mathbb{E}\left( V_{l_1 l_2} \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) &= \mathbb{E}\left( V_{l_1 l_2}\right) \mathbb{E}\left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) \\ &= 0, \end{align*}\] by the local dependence property and the fact that \(\mathbb{E}(V_{l_1 l_2}) = 0\). The local dependence property allows us to factor out the expectation of \(V_{l_1 l_2}\), as this edge is between neighborhoods, and therefore independent of every other edge in the graph. Now, if we have \(p = 2\), then, by sparsity and the fact that the product below is at most \(1\) in absolute value, \[\begin{equation*} \mathbb{E}\left( V_{l_1 l_2}^2\right) \mathbb{E}\left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) \leq DCn^{-\delta}, \end{equation*}\] where \(D\) is a constant that bounds the second expectation. Such a constant exists because each of the \(V_{mn}\) is bounded by definition, so a product of them is bounded. So as \(K\) grows large, the between neighborhood covariances all become asymptotically negligible. Therefore, we can conclude that \[\begin{equation} \operatorname{Var}(B^{\ast}_{K}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) \mathbb{I}_{B}(j) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \leq DCn^{2(d-\delta)}. \end{equation}\] So we have \[\begin{equation} \operatorname{Var}(B^{\ast}_{K}) \to 0. \end{equation}\] Then, for all \(\epsilon > 0\), Chebyshev's inequality gives us \[\begin{equation} \lim_{K \to \infty} P(|B^{\ast}_{K}| > \epsilon) \leq \lim_{K \to \infty} \frac{1}{\epsilon^2} \operatorname{Var}(B^{\ast}_{K}) = 0, \end{equation}\] so \[\begin{equation} B^{\ast}_{K} \xrightarrow[K \to \infty]{\text{p}} 0. \end{equation}\] Next, we bound the within neighborhood covariances, as we also have \[\begin{equation*} \operatorname{Var}(W^{\ast}_{K,k}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i, j \in A_{k}^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}). \end{equation*}\] As the covariance forms an inner product on the space of square integrable random variables, the Cauchy-Schwarz inequality gives us \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d) \sqrt{\operatorname{Var}(S^{\ast}_{Ki})} \sqrt{\operatorname{Var}(S^{\ast}_{Kj})}. \end{equation}\] Then, as each \(S^{\ast}_{Ki}\) has expectation zero, we know that \[\begin{equation} \operatorname{Var}(S^{\ast}_{Ki}) = \mathbb{E}(S^{\ast 2}_{Ki}) - \mathbb{E}(S^{\ast}_{Ki})^2 = \mathbb{E}(S^{\ast 2}_{Ki}). \end{equation}\] As \(S^{\ast 2}_{Ki} \leq 1\), we know \(\operatorname{Var}(S^{\ast}_{Ki}) \leq 1\) for all tuples \(i\), so we have the bound \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d). \end{equation}\] Now all that remains is to apply the Lindeberg-Feller central limit theorem to the triangular array \(W^{\ast}_{K} = {\sum_{k = 1}^{K} W^{\ast}_{K, k}}\). To that end, first note that, as each neighborhood contains at most a finite number of nodes, \(M\), there are at most \(M^d\) tuples in \(A_k^d\), so \[\begin{align} \begin{split} \operatorname{Var}(W^{\ast}_{K,k}) &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \\ &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \mathbb{E}(S^{\ast}_{Ki} S^{\ast}_{Kj}) \\ &\leq \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \\ &\leq M^{2d}. \end{split} \end{align}\] The same counting argument shows that \(|W^{\ast}_{K,k}| \leq M^{d}\), since \(W^{\ast}_{K,k}\) is a sum of at most \(M^d\) terms, each bounded by one in absolute value. Now we prove that Lyapounov's condition (3.2) holds for the constant in the exponent \(\delta = 2\). So \[\begin{align} \begin{split} \lim_{K \to \infty} \sum_{k = 1}^{K} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \mathbb{E}(|W^{\ast}_{K,k}|^{4}) &= \lim_{K \to \infty} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k} W^{\ast 2}_{K,k}) \\ &\leq \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \operatorname{Var}(W^{\ast}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})} \\ &= 0, \end{split} \end{align}\] where the sum of the \(\operatorname{Var}(W^{\ast}_{K,k})\) equals \(\operatorname{Var}(W^{\ast}_{K})\) because the \(W^{\ast}_{K,k}\) are independent across neighborhoods, and \(\operatorname{Var}(W^{\ast}_{K})\) tends to infinity by assumption. Therefore, Lyapounov's condition holds, and so by the Lindeberg-Feller central limit theorem, we have \[\begin{equation} \frac{W^{\ast}_K}{\sqrt{\operatorname{Var}(W^{\ast}_K)}} \xrightarrow[K \to \infty]{\text{d}} N(0, 1). \end{equation}\] Finally, since \(B^{\ast}_{K} \xrightarrow{\mathrm{p}} 0\) and \(\operatorname{Var}(W^{\ast}_{K}) \to \infty\), we have \(B^{\ast}_{K}/\sqrt{\operatorname{Var}(W^{\ast}_K)} \xrightarrow{\mathrm{p}} 0\) and \(\operatorname{Var}(S^{\ast}_K)/\operatorname{Var}(W^{\ast}_K) \to 1\), so Slutsky's theorem (Theorem 3.4) gives the final result for \(S^{\ast}_{K} = W^{\ast}_{K} + B^{\ast}_{K}\).
Then we have \[\begin{equation} \frac{S^{\ast}_K}{\sqrt{\operatorname{Var}(S^{\ast}_K)}} \xrightarrow[K \to \infty]{\text{d}} N(0, 1), \end{equation}\] as desired.
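
As an informal sanity check of Theorem 3.7 (purely illustrative; the generating model and all constants below are my own choices, not taken from the source), the following simulation draws locally dependent networks with \(K\) neighborhoods of size \(M = 5\), in which within-neighborhood edges are dependent through a shared neighborhood-level success probability and between-neighborhood edges are Bernoulli with probability \(n^{-3}\), so the network is sparse with \(\delta = 3\). The statistic is the edge count. As \(K\) grows, the standardized statistic behaves like a standard normal variable.

```python
# Monte Carlo illustration of Theorem 3.7 with a simple locally dependent model.
import numpy as np

rng = np.random.default_rng(42)
M = 5                              # neighborhood size

def edge_count(K):
    n = M * K
    # Within-neighborhood edges: dependent via a shared probability p_k.
    p_k = rng.beta(2.0, 2.0, size=K)
    within = rng.binomial(M * (M - 1) // 2, p_k).sum()
    # Between-neighborhood edges: Bernoulli(n^{-3}) on the remaining dyads.
    n_between = n * (n - 1) // 2 - K * M * (M - 1) // 2
    between = rng.binomial(n_between, n ** -3.0)
    return within + between

for K in (10, 100, 1000):
    S = np.array([edge_count(K) for _ in range(4000)])
    Z = (S - S.mean()) / S.std()
    # Fraction of standardized statistics inside +/- 1.96 (about 0.95 if normal).
    print(K, round(np.mean(np.abs(Z) < 1.96), 3))
```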

The second class consists of statistics that depend on both the graph and the nodal variates. These have a form very similar to that of the statistics considered above. Now we require that \[\begin{equation*} S_{K} = \sum_{i \in \mathcal{N}^d} S_{Ki}, \end{equation*}\] where \[\begin{equation*} S_{Ki} = \prod_{l,p \in i} Y_{lp} h(X_l, X_p), \end{equation*}\]

a product with at most \(q\) terms.
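
A familiar example of this form (illustrative, not drawn from the source) is the homophily statistic for a single categorical attribute, obtained by taking \(h(X_l, X_p) = \mathbb{I}(X_l = X_p)\): \[\begin{equation*} S_{K} = \sum_{(i, j) \in \mathcal{N}^2} Y_{ij}\, \mathbb{I}(X_i = X_j) \end{equation*}\] counts ordered ties between nodes that share the same attribute value, and here \(h\) is uniformly bounded with \(B = 1\).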

Theorem 3.8 If \(S_K\) is a statistic depending on both the random graph and the random nodal attributes, the sequence of random networks is as in Theorem 3.7, and the function \(h\) is uniformly bounded in the sense that, for all \(l\) and \(m\), there is some \(B\) such that \[\begin{equation*} P \left( |h(X_l, X_m)|^p > B \right) = 0, \qquad (p = 1, 2) \end{equation*}\] then we also have \[\begin{equation*} \frac{S_K - \mathbb{E}(S_K)}{\sqrt{\operatorname{Var}(S_{K})}} \xrightarrow[K \to \infty]{\mathrm{d}} N(0,1). \end{equation*}\]

Proof. This proof is very similar to the proof of Theorem 3.7. We write \[\begin{equation*} S^{\ast}_{K} = W^{\ast}_{K} + B^{\ast}_{K}, \end{equation*}\]

exactly as before, incorporating the function \(h\) into each \(S_{Ki}\) as we did above. Then the binary nature of the graph and the uniform boundedness of \(h\) allow us to once again recenter, meaning that we will work with \(V_{ij}h(X_i, X_j) := Y_{ij}h(X_i, X_j) - \mu_{ij}\), where now \(\mu_{ij} = \mathbb{E}\left( Y_{ij}h(X_i, X_j) \right)\). We then have \(\mathbb{E}(V_{ij} h(X_i, X_j)) = 0\), so \(\mathbb{E}(S^{\ast}_{Ki}) = 0\) as well.

For the between neighborhood covariances, we once again choose \(V_{l_1 l_2}\), a between neighborhood network variable. Then we once again write \[\begin{align*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}\left( \prod_{m,n \in i} V_{mn} h(X_m, X_n) \prod_{m,n \in j} V_{mn} h(X_m, X_n) \right) \\ &= \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right), & (p = 1,2) \\ &= \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \right) \mathbb{E} \left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right), \end{align*}\] by the local dependence property. Then, when \(p = 1\), we have \(\mathbb{E} ( V_{l_1 l_2} h(X_{l_1}, X_{l_2}) ) = 0\) by construction, so the covariance is identically zero. When \(p = 2\) we have \[\begin{equation*} \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^2 \right) \leq BCn^{-\delta} \end{equation*}\] by sparsity and uniform boundedness, and \[\begin{equation*} \mathbb{E} \left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right) \leq (2B)^{2q - 2} \end{equation*}\] almost surely, since this product has at most \(2q - 2\) recentered factors, each bounded in absolute value by \(2B\) (as \(|Y_{mn}h(X_m, X_n)| \leq B\) and \(|\mu_{mn}| \leq B\)). So \[\begin{equation*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \leq (2B)^{2q - 2} BCn^{-\delta}, \end{equation*}\] which tends to zero as \(K\) grows large. So, again by Chebyshev's inequality, we have \[\begin{equation*} B^{\ast}_{K} \xrightarrow[K \to \infty]{\mathrm{p}} 0. \end{equation*}\] Next we bound the within neighborhood covariances. Now, with each \(|S^{\ast}_{Ki}| \leq (2B)^{q}\), we have \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d) (2B)^{2q}. \end{equation}\] Now, we show that Lyapounov's condition (3.2) holds for the same \(\delta = 2\). Once again note that each neighborhood has at most \(M\) nodes, so \[\begin{align*} \operatorname{Var}(W^{\ast}_{K,k}) &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_{k}^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \\ &\leq \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_{k}^d) (2B)^{2q} \\ &\leq M^{2d}(2B)^{2q}, \end{align*}\] and, by the same counting argument, \(|W^{\ast}_{K,k}| \leq M^{d}(2B)^{q}\). Then Lyapounov's condition is \[\begin{align} \begin{split} \lim_{K \to \infty} \sum_{k = 1}^{K} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \mathbb{E}(|W^{\ast}_{K,k}|^{4}) &= \lim_{K \to \infty} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k} W^{\ast 2}_{K,k}) \\ &\leq \lim_{K \to \infty} \frac{M^{2d}(2B)^{2q}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}(2B)^{2q}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \operatorname{Var}(W^{\ast}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}(2B)^{2q}}{\operatorname{Var}(W^{\ast}_{K})} \\ &= 0. \end{split} \end{align}\] Therefore, by the Lindeberg-Feller central limit theorem and Slutsky's theorem, we have \[\begin{equation*} \frac{S_{K}^{\ast}}{\sqrt{\operatorname{Var}(S_{K}^{\ast})}} \xrightarrow[K \to \infty]{\mathrm{d}} N(0,1). \end{equation*}\]

Finally, the last class of statistics is that which depends only on the nodal variates. This result follows directly from a central limit theorem for \(M\)-dependent random variables, which can be found in Billingsley (1995, p. 364). Establishing this theorem requires that the statistic in question depend on only a single variable across nodes, so we assume throughout that the statistic depends on a single nodal covariate.

Theorem 3.9 Take the sequence \(Z_{K}\) as before, and let \(X_K\) be the vector of nodal variates for each \(Z_{K}\). Call each entry of this vector \(X_{Ki}\), the variate corresponding to node \(i\). Furthermore, we assume that \(\mathbb{E}(X_{Ki}^{12}) < \infty\) and \(\mathbb{E}(X_{Ki}) = 0\). Then, \[\begin{equation*} \lim_{K \to \infty}\frac{\operatorname{Var}(\sum_{i= 1}^{n} X_{Ki})}{n} = \sigma^2, \end{equation*}\] where \(n = |\mathcal{N}_K|\). Furthermore, if \(\sigma > 0\), then \[\begin{equation*} \frac{\sum_{i = 1}^{n} X_{Ki}}{\sqrt{\operatorname{Var}(\sum_{i= 1}^{n} X_{Ki})}} \xrightarrow[K \to \infty]{\mathrm{d}} N(0,1). \end{equation*}\]

Proof. Two random variables \(X_{Kl}\) and \(X_{Kp}\) are dependent only if \(l\) and \(p\) are in the same neighborhood. Without loss of generality, assume that the neighborhoods are such that all nodes within a neighborhood are indexed by consecutive integers. Then let \(M = \limsup_K |A_{K}|\), which is finite because the neighborhood sizes are bounded. The sequence \(X_{Kl}\) is then \(M\)-dependent, so the result follows by applying Theorem 27.4 in Billingsley (1995), stated above as Theorem 3.3.

In practice, the hypothesis that the twelfth moment exists is satisfied for most reasonable distributional assumptions about nodal covariates. Furthermore, the assumption that all nodal variates have expectation zero can easily be satisfied by recentering the observed data. Finally, the delta method gives an asymptotically normal distribution for a differentiable function of the nodal variate. The univariate nature of the statistic is a fundamental limitation of this approach; however, I was unable to find an analogous multivariate central limit theorem that would establish the asymptotic normality of a statistic of multiple nodal variates.
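
To spell out the delta-method step under the assumptions of Theorem 3.9, write \(\bar{X}_K = n^{-1}\sum_{i=1}^{n} X_{Ki}\) and let \(g\) be differentiable at \(0\) with \(g'(0) \neq 0\). Since Theorem 3.9 and Slutsky's theorem give \(\sqrt{n}\,\bar{X}_K \xrightarrow{\mathrm{d}} N(0, \sigma^2)\), the delta method yields \[\begin{equation*} \sqrt{n}\left( g(\bar{X}_K) - g(0) \right) \xrightarrow[K \to \infty]{\mathrm{d}} N\!\left(0, \sigma^{2} g'(0)^{2}\right), \end{equation*}\] so, for example, a smooth transformation of the mean of a recentered nodal covariate is also asymptotically normal.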

References

Fellows, I., & Handcock, M. S. (2012). Exponential-family random network models. ArXiv Preprint ArXiv:1208.0121.

Schweinberger, M., & Handcock, M. S. (2015). Local dependence in random graph models: Characterization, properties and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3), 647–676.

Billingsley, P. (1995). Probability and measure (3rd ed., authorized reprint). New Delhi: Wiley India.
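
Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.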

Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41(2), 508–535. http://doi.org/10.1214/12-AOS1044