Chapter 3 Locally dependent exponential random network models

Figure 3.1: A locally dependent random network with neighborhoods \(A_1, A_2, \dots, A_K\) and two binary node attributes, represented as gray or black and circle or diamond.

To begin, we define a random network, a concept developed by Fellows & Handcock (2012). By way of motivation, note that in the ERGM the nodal variates are fixed and enter the model as explanatory variables for inference about network structure. There is also a class of models, not discussed here, that treats the network as a fixed explanatory variable in modeling (random) nodal attributes. It is easy to imagine situations in which a researcher would like to model both the network and the node attributes jointly. We therefore define a class of networks in which both the network structure and the attributes of the individual nodes are modeled as random quantities.

Definition 3.1 (Random network) Let \(N\) be a countable collection of nodes (which we take to be a subset of \(\mathbb{N}\)). Let \(Y\) be the random graph on the nodes \(N\) with support \(\mathcal{Y}\). Then for each element \(n \in N\), let there be a corresponding random vector of node attributes \(X_n \in \mathbb{R}^q\), and collect these into the \(|N| \times q\) random matrix \(X\) with support \(\mathcal{X}\). The random network is the random variable \(Z = (Y, X)\) with support \(\mathcal{Z} = \mathcal{Y} \times \mathcal{X}\).
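As a concrete illustration (a minimal sketch, not taken from the text: the sampling scheme and dimensions are arbitrary choices), a realization \(z = (y, x)\) of a random network can be represented by an adjacency matrix together with an attribute matrix:

```python
import numpy as np

# A minimal sketch of Definition 3.1: one realization z = (y, x) of a
# random network on |N| = 5 nodes with q = 2 binary attributes.
# The uniform sampling here is purely illustrative, not a model.

rng = np.random.default_rng(0)
n, q = 5, 2

y = rng.integers(0, 2, size=(n, n))  # candidate adjacency matrix
y = np.triu(y, 1)                    # keep the upper triangle ...
y = y + y.T                          # ... and symmetrize: undirected, no loops

x = rng.integers(0, 2, size=(n, q))  # |N| x q node attribute matrix X

z = (y, x)                           # one realization of Z = (Y, X)
```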

Now we wish to model these objects, so we follow the ERGM and turn to the exponential family (Fellows & Handcock, 2012). We write \[P(Y = y, X = x \mid \eta) \propto e^{\eta \cdot g(y, x)}.\] This looks very similar to the ERGM, but note the explicit dependence on the quantity \(x\). More concretely, we can include terms that depend only on \(x\), which would have no place in an ERGM. We can further express the difference between the two models by rewriting the left hand side as \[P(X = x, Y = y \mid \eta) = P(Y = y \mid X = x, \eta)\, P(X = x \mid \eta),\] where the first term on the right hand side is the ERGM and the second term is \[P(X = x \mid \eta) = \frac{C(\mathcal{Z}, \eta, x)}{C(\mathcal{Z}, \eta)}, \qquad C(\mathcal{Z}, \eta, x) = \sum_{\{(v, u) \in \mathcal{Z} \,:\, u = x\}} e^{\eta \cdot g(v, u)}.\]

Roughly, this ratio is the proportion of the total sample space \(\mathcal{Z}\) that is possible with \(x\) fixed. It is not, in general, equal to one, so the ERNM does not reduce to the ERGM (Fellows & Handcock, 2012).
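On a small sample space this decomposition can be checked by brute force. The following sketch (illustrative throughout: the statistic \(g\) and parameter \(\eta\) are arbitrary choices, not from the text) enumerates \(\mathcal{Z} = \mathcal{Y} \times \mathcal{X}\) for three nodes and computes \(P(X = x \mid \eta)\) as the ratio of normalizing constants:

```python
import itertools
import math

import numpy as np

# Brute-force sketch of the ERNM normalizing constants C(Z, eta, x)
# and C(Z, eta) on a toy sample space: 3 nodes, undirected edges,
# one binary attribute per node. The statistic g and the parameter
# eta are arbitrary illustrative choices, not taken from the text.

nodes = [0, 1, 2]
pairs = list(itertools.combinations(nodes, 2))  # the three possible edges

def g(y, x):
    """Toy sufficient statistic: (edge count, attribute sum, homophilous edges)."""
    edges = sum(y)
    attr_sum = sum(x)
    homoph = sum(y[e] for e, (i, j) in enumerate(pairs) if x[i] == x[j])
    return np.array([edges, attr_sum, homoph])

eta = np.array([-0.5, 0.2, 0.8])  # illustrative parameter vector

C_total = 0.0
C_given_x = {}  # x -> C(Z, eta, x)
for x in itertools.product([0, 1], repeat=len(nodes)):
    c_x = sum(math.exp(eta @ g(y, x))
              for y in itertools.product([0, 1], repeat=len(pairs)))
    C_given_x[x] = c_x
    C_total += c_x

# P(X = x | eta) = C(Z, eta, x) / C(Z, eta), as in the text.
x0 = (1, 0, 1)
print("P(X = x0 | eta) =", C_given_x[x0] / C_total)
```

Because the enumeration grows exponentially in the number of dyads, this is feasible only for toy examples; in practice these constants are intractable and are handled by MCMC methods.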

3.1 Definitions and notation

We will consistently refer to a set of nodes, \(A_k\), as the \(k\)-th neighborhood, with an uppercase \(K\) representing the total number of neighborhoods and a lowercase \(k\) representing a specific neighborhood. The variable \(N\) will refer to the domain of a random network, usually the union of a collection of neighborhoods. Nodes within the network will be indexed by the variables \(i\) and \(j\), with \(Z_{ij} = (Y_{ij}, X_i, X_j)\), where \(Y_{ij}\) refers to the edge between nodes \(i\) and \(j\), and \(X_i\) and \(X_j\) refer to the random vectors of node attributes. Abstracting this further, \(i\) and \(j\) will also refer to tuples of nodes, so we will write \(i = (i_1, i_2, \dots, i_q) \in N \times N \times \dots \times N\). The variables \(Z\) and \(Y\) will also often carry a subscript of \(W\) or \(B\) (for example \(Y_{B,ij}\)), which emphasizes that the edge from \(i\) to \(j\) is within or between neighborhoods, respectively. Finally, for lack of a better notation, the indicator function \(I_B(i, j)\) (where \(B\) is for between) is one if \(i \in A_l\) and \(j \in A_p\) where \(l \neq p\), and zero otherwise.

Definition 3.2 (Local dependence property) Extending the definition in Schweinberger & Handcock (2015), a random network model satisfies the local dependence property if there is a partition of the node set \(N\) into neighborhoods \(A_1, A_2, \dots, A_K\) for \(K \ge 2\) such that the network variables \(Z_{ij}\) are dependent when \(i, j \in A_k\) for some \(k\) and independent otherwise. We also require that nodal attributes depend only on the attributes of nodes within the same neighborhood. Thus, the probability measure can be written as \[P(Z \in \mathcal{Z}) = \prod_{k=1}^{K} \left[ P_{kk}(Z_{kk} \in \mathcal{Z}_{kk}) \prod_{l=1}^{k-1} P_{kl}(Z_{kl} \in \mathcal{Z}_{kl}, Z_{lk} \in \mathcal{Z}_{lk}) \right],\] where \(Z_{mn}\) is the subnetwork consisting of the random graph ties from nodes in \(A_m\) to those in \(A_n\), together with the appropriate node variables, and \(\mathcal{Z}_{mn}\) is a subset of the sample space of \(Z_{mn}\). Furthermore, the measures \(P_{kk}\) can induce dependence between dyads, while the measures \(P_{kl}\) induce independence.
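To make the factorization concrete, consider the smallest case \(K = 2\), where the product reduces to three factors: \[P(Z \in \mathcal{Z}) = P_{11}(Z_{11} \in \mathcal{Z}_{11})\, P_{22}(Z_{22} \in \mathcal{Z}_{22})\, P_{21}(Z_{21} \in \mathcal{Z}_{21}, Z_{12} \in \mathcal{Z}_{12}).\] The two within-neighborhood factors may carry arbitrary dyadic dependence, while the single between-neighborhood factor is dyad-independent.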

Definition 3.3 (Sparsity) Also from Schweinberger & Handcock (2015), we say a locally dependent random network is \(\delta\)-sparse if there is some \(\delta > 0\) and some \(C > 0\) such that \[E(|Y_{B,ij}|^p) \le C n^{-\delta}, \qquad p = 1, 2,\] where \(n = |N|\) and \(Y_{B,ij}\) signifies the between-neighborhood tie from node \(i \in A_l\) to node \(j \in A_m\) where \(l \neq m\). For unweighted networks the edge variables are binary, so this simply requires \(P(Y_{B,ij} = 1) \le Cn^{-\delta}\): between-neighborhood ties become rare as the network grows.

3.2 Preliminary theorems

In proving our theorems, we will make use of several classical results, all of which can be found in Billingsley (1995). The first is the Lindeberg-Feller central limit theorem for triangular arrays. The second is Lyapounov's condition, a convenient sufficient condition for the Lindeberg condition. Finally, we make use of a central limit theorem for dependent random variables. For the sake of brevity, we state each of these without proof.

Theorem 3.1 (Billingsley, 1995, Theorem 27.2) For each \(n\), take \(X_{n1}, \dots, X_{nr_n}\) independent with \(E(X_{ns}) = 0\) for all \(n\) and \(s\) (where no generality is lost in this assumption). Then we have \(\sigma_{ns}^2 = Var(X_{ns}) = E(X_{ns}^2)\). Next, set \(s_n^2 = \sum_{s=1}^{r_n} \sigma_{ns}^2\) and \(S_n = X_{n1} + \dots + X_{nr_n}\). If the Lindeberg condition \[\lim_{n \to \infty} \sum_{s=1}^{r_n} \frac{1}{s_n^2} E\left( X_{ns}^2\, I\{|X_{ns}| \ge \epsilon s_n\} \right) = 0\] holds for all \(\epsilon > 0\), then \(S_n / s_n \xrightarrow{d} N(0, 1)\).

Theorem 3.2 (Billingsley, 1995, Theorem 27.3) Let \(S_n\) be as before. If Lyapounov's condition \[\lim_{n \to \infty} \sum_{s=1}^{r_n} \frac{1}{s_n^{2+\delta}} E\left( |X_{ns}|^{2+\delta} \right) = 0 \tag{3.2}\] holds for some \(\delta > 0\), then the Lindeberg condition also holds, and therefore \(S_n / s_n \xrightarrow{d} N(0, 1)\).
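To see why Lyapounov's condition implies the Lindeberg condition (a standard argument, sketched here for completeness), note that on the event \(\{|X_{ns}| \ge \epsilon s_n\}\) we have \(1 \le |X_{ns}|^{\delta} / (\epsilon s_n)^{\delta}\), so \[\frac{1}{s_n^2} E\left( X_{ns}^2\, I\{|X_{ns}| \ge \epsilon s_n\} \right) \le \frac{1}{\epsilon^{\delta} s_n^{2+\delta}} E\left( |X_{ns}|^{2+\delta} \right),\] and summing over \(s\) shows that the Lindeberg sum is at most \(\epsilon^{-\delta}\) times the Lyapounov sum.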

Theorem 3.3 (Billingsley, 1995, Theorem 27.4) Suppose that \(X_1, X_2, \dots\) is stationary and \(\alpha\)-mixing with \(\alpha_n = O(n^{-5})\), and that \(E(X_n) = 0\) and \(E(X_n^{12}) < \infty\). Note that the condition on \(\alpha\) is stronger than what we require: our \(X_n\) will be \(M\)-dependent, meaning that each \(X_n\) is independent of all \(X_m\) with \(|n - m| > M\), and an \(M\)-dependent sequence is \(\alpha\)-mixing with \(\alpha_n = 0\) for all \(n > M\). Then, with \(S_n = X_1 + \dots + X_n\), we have \(Var(S_n)/n \to \sigma^2\). If \(\sigma > 0\), then \(S_n / (\sigma \sqrt{n}) \xrightarrow{d} N(0, 1)\).
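As an informal numerical check of Theorem 3.3 (an illustrative simulation, not part of the formal development), consider the moving sum \(X_t = e_t + e_{t+1} + e_{t+2}\) of i.i.d. standard normal noise, a stationary 2-dependent sequence with \(\sigma^2 = 9\):

```python
import numpy as np

# Illustrative simulation: X_t = e_t + e_{t+1} + e_{t+2} is stationary
# and 2-dependent with mean zero and sigma^2 = Var(X_0) + 2(Cov(X_0,X_1)
# + Cov(X_0,X_2)) = 3 + 2(2 + 1) = 9, so Theorem 3.3 predicts that
# S_n / (3 sqrt(n)) is approximately standard normal.

rng = np.random.default_rng(1)
n, reps = 10_000, 2_000

z = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + 2)
    x = e[:n] + e[1:n + 1] + e[2:n + 2]   # the 2-dependent sequence
    z[r] = x.sum() / (3.0 * np.sqrt(n))   # normalized partial sum

print("mean ~ 0:", z.mean(), " var ~ 1:", z.var())
```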

The final theorem is Slutsky’s theorem, a classic result of asymptotic theory in statistics.

Theorem 3.4 (Wasserman, 2004, Theorem 5.5) Let \(X_n\), \(X\), and \(Y_n\) be random variables and let \(c\) be a constant. If \(X_n \xrightarrow{d} X\) and \(Y_n \xrightarrow{p} c\), then \(X_n + Y_n \xrightarrow{d} X + c\) and \(X_n Y_n \xrightarrow{d} cX\).

3.3 Consistency under sampling

With these in place, we attempt to extend a result about locally dependent ERGMs proven by Schweinberger & Handcock (2015) to locally dependent ERNMs. In short, this theorem states that parameters estimated from a small sample of a larger network generalize to the overall network. Shalizi & Rinaldo (2013) showed that most useful formulations of ERGMs do not form projective exponential families, in the sense that the distribution of a subgraph cannot, in general, be recovered by marginalizing the distribution of a larger graph with respect to the edge variables not included in the smaller graph. Hence, for such models we are unable to generalize parameter estimates from the subnetwork to the total network.

To show that locally dependent ERNMs do form a projective family, let \(\mathcal{A}\) be a collection of sets \(A\), where each \(A\) is a finite collection of neighborhoods. Also, let \(\mathcal{A}\) be an ideal, so that if \(A \in \mathcal{A}\), every subset of \(A\) is also in \(\mathcal{A}\), and if \(B \in \mathcal{A}\), then \(A \cup B \in \mathcal{A}\). If \(A \subseteq B\), think of passing from the set \(A\) to the set \(B\) as taking a larger sample of the (possibly infinite) set of neighborhoods in the larger network. For each \(A \in \mathcal{A}\), let \(\mathcal{P}_{A,\Theta} = \{P_{A,\theta}\}_{\theta \in \Theta}\) be the collection of ERNMs on the neighborhoods in \(A\) with parameter \(\theta \in \Theta\), where \(\Theta \subseteq \mathbb{R}^p\) is open. Assume that each distribution in \(\mathcal{P}_{A,\Theta}\) has the same support \(\mathcal{Z}_A\) and that \(A \subseteq B\) if and only if \(\mathcal{Z}_B = \mathcal{Z}_A \times \mathcal{Z}_{B \setminus A}\). Then the exponential family \(\{\mathcal{P}_{A,\Theta}\}_{A \in \mathcal{A}}\) is projective in the sense of Shalizi & Rinaldo (2013, Definition 1) precisely when Theorem 3.6 holds.

This follows from a specific case of the general definition given by Shalizi & Rinaldo (2013). There, for every pair \(A\) and \(B\) with \(A \subseteq B\), they define the natural projection mapping \(\pi_{B \to A} : \mathcal{Z}_B \to \mathcal{Z}_A\). Informally, this mapping projects the set \(\mathcal{Z}_B\) down to \(\mathcal{Z}_A\) by simply removing the extra data. For example, if \(B = \{A_1, A_2\}\) and \(A = \{A_1\}\) as in Figure 3.1, then the mapping \(\pi_{B \to A}\) is shown in Figure 3.2.

Figure 3.2: The projection mapping from \(B = \{A_1, A_2\}\) to \(A = \{A_1\}\).
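In code, the projection is nothing more than dropping the rows, columns, and attribute vectors of the nodes outside \(A\). A sketch (function and variable names are illustrative):

```python
import numpy as np

# A sketch of the projection pi_{B -> A}: restrict a realization (y, x)
# on the neighborhoods in B to the nodes of the neighborhoods in A by
# discarding the extra rows and columns.

def project(y, x, keep):
    """Restrict adjacency matrix y and attribute matrix x to the node
    indices in `keep` (the nodes of the neighborhoods in A)."""
    keep = np.asarray(keep)
    return y[np.ix_(keep, keep)], x[keep]

# Example: B = {A1, A2} with A1 = {0, 1, 2} and A2 = {3, 4}; A = {A1}.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=(5, 5))
y = np.triu(y, 1); y = y + y.T               # undirected, no loops
x = rng.integers(0, 2, size=(5, 2))

y_A, x_A = project(y, x, keep=[0, 1, 2])     # the image under pi_{B -> A}
```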

This is desirable because Shalizi & Rinaldo (2013) have demonstrated the following theorem.

Theorem 3.5 (Shalizi & Rinaldo, 2013, Theorem 3) If the exponential model family \(\{\mathcal{P}_{A,\Theta}\}_{A \in \mathcal{A}}\) is projective and the log of the normalizing constant can be written as \[\log(C(\theta, \mathcal{Z})) = \log\left( \int_{\mathcal{Z}} e^{\theta \cdot g(z)}\, dz \right) = r(|\mathcal{Z}|)\, a(\theta),\] where \(r\) is a positive, monotone increasing function of some positive measure on \(\mathcal{Z}\) and \(a\) is a differentiable function of \(\theta\), then the maximum likelihood estimator exists and is strongly consistent, meaning that the MLE \(\hat{\theta} \xrightarrow{a.s.} \theta\), where \(\theta\) is the unknown parameter being estimated.

This condition is trivially achieved by setting \(r = 1\) for all values of \(|\mathcal{Z}|\) and setting \(a(\theta) = \log(C(\theta, \mathcal{Z}))\). We have differentiability of \(a\) with respect to \(\theta\) by a standard result on differentiation under the integral sign, which follows from Fubini's theorem. From a practical perspective, this means that a researcher using this model can treat parameters estimated from samples of a large network as increasingly good approximations of the true parameter values as the sample size increases.

Theorem 3.6 Let \(A_1, A_2, \dots\) be a sequence of neighborhoods and define the sequence \(N_K = \bigcup_{k=1}^{K} A_k\). Then let \(Z_1, Z_2, \dots\) be the sequence of locally dependent random networks on the \(N_K\). For each \(Z_K\) there is the corresponding set of neighborhoods \(\mathcal{A}_K\). Let \(P_K\) be a generic probability distribution from the family \(\{P_{K,\theta}\}_{\theta \in \Theta}\). Let \(Z_{K+1 \setminus K}\) be the portion of \(Z_{K+1}\) removed by the projection \(\pi_{\mathcal{A}_{K+1} \to \mathcal{A}_K}\), with corresponding distribution \(P_{K+1 \setminus K}\). Then \[P_K(Z_K \in \mathcal{Z}_K) = P_{K+1}(Z_K \in \mathcal{Z}_K, Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}),\] where \(\mathcal{Z}_K\) is a subset of the sample space of the distribution \(P_K\) and \(\mathcal{Z}_{K+1 \setminus K}\) is the entire sample space of \(Z_{K+1 \setminus K}\). This is a specific case of the definition of projectivity for a general exponential family given by Shalizi & Rinaldo (2013).

Proof. This follows from the definition of local dependence, in much the same way as the proof for ERGMs by Schweinberger & Handcock (2015). We have \[P_{K+1}(Z_K \in \mathcal{Z}_K, Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}) = P_K(Z_K \in \mathcal{Z}_K)\, P_{K+1 \setminus K}(Z_{K+1 \setminus K} \in \mathcal{Z}_{K+1 \setminus K}) = P_K(Z_K \in \mathcal{Z}_K) \cdot (1) = P_K(Z_K \in \mathcal{Z}_K),\] where the measure factors into \(P_K\) and \(P_{K+1 \setminus K}\) by the product definition of a locally dependent random network, and the second factor equals one because \(\mathcal{Z}_{K+1 \setminus K}\) is the entire sample space of \(Z_{K+1 \setminus K}\).

3.4 Asymptotic normality of statistics

In this section we will prove that certain classes of statistics of locally dependent random networks are asymptotically normally distributed as the number of neighborhoods tends to infinity. The statistics we consider can be classified into three types: first, statistics which depend only on the graph structure; second, statistics that depend on both the graph and the nodal variates; and third, statistics that depend only on the nodal variates. The first class of statistics has already been considered by Schweinberger & Handcock (2015), but we will reproduce the proof here, as the second proof is very similar. The third class of statistics becomes normal in the limit by a central limit theorem for \(M\)-dependent random variables in Billingsley (1995).

Before we begin to explicitly define each of these classes, we clarify the notation that will be used. A general statistic will be a function \(S : N^d \to \mathbb{R}\), where \(N^d\) is the \(d\)-fold Cartesian product of the set of nodes, \(N\), with itself: \[N^d = \underbrace{N \times \dots \times N}_{d \text{ times}}.\]

Additionally, the statistic will often carry a subscript \(K\), indicating that the statistic is computed on the random network with \(K\) neighborhoods.

Formally, as explained in Schweinberger & Handcock (2015), the first class of statistics contains those that have the form \[S_K = \sum_{i \in N^d} S_{Ki}, \qquad S_{Ki} = \prod_{l, p \in i} Y_{lp},\]

a product of \(q\) edge variables that captures the desired interaction. We will also make use of the set \(A_k^d\), which is a similar Cartesian product. When we write \(i \in A_k^d\), we mean that every component of the \(d\)-tuple \(i\) is an element of \(A_k\). Furthermore, by a catachrestic abuse of notation, we will write \(l, p \in i\) to mean that \(l\) and \(p\) are vertices contained in the \(d\)-tuple \(i\). Now we are ready to prove the first case of the theorem.
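A brute-force sketch may help fix ideas before the proof (illustrative code; for simplicity we take \(d = 2\), so each \(S_{Ki}\) is a single edge variable, and the decomposition \(S_K = W_K + B_K\) anticipates the proof of Theorem 3.7 below):

```python
import itertools

import numpy as np

# Brute-force sketch (d = 2) of the statistic S_K = sum_i S_Ki and its
# decomposition S_K = W_K + B_K into within- and between-neighborhood
# sums, where W_K collects tuples i lying in some A_k^d and B_K the
# tuples with I_B(i) = 1.

def split_statistic(y, block):
    """y: adjacency matrix; block[v]: neighborhood index of node v."""
    n = y.shape[0]
    W = B = 0.0
    for l, p in itertools.product(range(n), repeat=2):  # d-tuples, d = 2
        if l == p:
            continue                    # skip degenerate tuples
        term = y[l, p]                  # S_Ki: the product of edge variables
        if block[l] == block[p]:
            W += term                   # i in A_k^d for some k
        else:
            B += term                   # I_B(i) = 1
    return W, B

y = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
block = [0, 0, 1, 1]                    # A_1 = {0, 1}, A_2 = {2, 3}
W, B = split_statistic(y, block)
print("S_K =", W + B, " W_K =", W, " B_K =", B)
```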

Theorem 3.7 Let \(A_1, A_2, \dots, A_K\) be a sequence of neighborhoods of size at most \(M\) and form the sequence of domains \(N_K = \bigcup_{k=1}^{K} A_k\). Then let \(Z_1, Z_2, \dots, Z_K\) be the sequence of unweighted random networks on the \(N_K\). Then, let the statistic \(S_K : N_K^d \to \mathbb{R}\) be given. Furthermore, assume the statistic depends only on the graph variables of the \(Z_K\). We also assume that the \(Z_K\) satisfy the local dependence property and that they are \(\delta\)-sparse for some \(\delta > d\). Finally, we require that \(Var(W_K) \to \infty\), where \(W_K\) is defined in (3.3). Then \[\frac{S_K - E(S_K)}{\sqrt{Var(S_K)}} \xrightarrow[K \to \infty]{d} N(0, 1).\]

Proof. As the networks \(Z_K\) are unweighted, all edge variables \(Y_{ij} \in \{0, 1\}\). Let \(\mu_{ij} = E(Y_{ij})\) and define \(V_{ij} = Y_{ij} - \mu_{ij}\). Therefore, without loss of generality, we may work with \(V_{ij}\), which has the convenient property that \(E(V_{ij}) = 0\). This means that we can similarly recenter our statistic of interest: we replace \(S_K\) by \(S_K - E(S_K)\), so that \(E(S_K) = 0\).

Note that we can write \(S_K = W_K + B_K\), with \[W_K = \sum_{k=1}^{K} W_{K,k} = \sum_{k=1}^{K} \sum_{i \in N_K^d} I(i \in A_k^d)\, S_{Ki} \quad \text{and} \quad B_K = \sum_{i \in N_K^d} I_B(i)\, S_{Ki}, \tag{3.3}\]

where the indicator functions restrict the sums to tuples within the \(k\)-th neighborhood and tuples between neighborhoods, respectively. Specifically, \(I_B(i) = 1\) when the \(d\)-tuple of nodes \(i\) contains nodes from different neighborhoods, or exactly when \(I(i \in A_k^d) = 0\) for all neighborhoods \(k\). By splitting the statistic into the within and between neighborhood portions, we are able to make use of the independence relation between edges that connect neighborhoods. We also have \(E(W_K) = 0\) and \(E(B_K) = 0\), as each quantity is a sum of random variables with mean zero.

The idea of this proof is to gain control over the variances of \(B_K\) and all the elements of the sequence \(W_{K,k}\). We can then show that \(B_K\) converges in probability to zero and that the triangular array \(W_K\) satisfies Lyapounov's condition, and is thus asymptotically normal. Finally, Slutsky's theorem allows us to extend the result to \(S_K\).

To bound the variance of \(B_K\), note that \[Var(B_K) = \sum_{i \in N_K^d} \sum_{j \in N_K^d} I_B(i) I_B(j)\, Cov(S_{Ki}, S_{Kj}).\] Despite independence, some of these covariances may be nonzero if the two terms of the statistic both involve the same edge. For example, in Figure 3.1, a statistic that counted the number of edges between gray nodes plus the number of edges between diamond shaped nodes would have a nonzero covariance term because of the edge between the two nodes that are both gray and diamond shaped. To show that, in the limit, these covariances vanish, we need only concern ourselves with the nonzero terms in the sum; that is, only those terms where \(I_B(i) I_B(j) = 1\). This happens exactly when both \(S_{Ki}\) and \(S_{Kj}\) involve a between-neighborhood edge variable. So, note that we have \[Cov(S_{Ki}, S_{Kj}) = E(S_{Ki} S_{Kj}) - E(S_{Ki}) E(S_{Kj}) = E(S_{Ki} S_{Kj}),\] as the expectation of each term is zero. Next we take \(Y_{l_1 l_2}\) to be one of the (possibly many) between-neighborhood edge variables in this product (so that \(I_B(i) = 1\) where \(i\) is any tuple containing \(l_1\) and \(l_2\)) and \(V_{l_1 l_2}\) to be the recentered random variable corresponding to \(Y_{l_1 l_2}\). Then \[Cov(S_{Ki}, S_{Kj}) = E\left( \prod_{m, n \in i} V_{mn} \prod_{m, n \in j} V_{mn} \right) = E\left( V_{l_1 l_2}^p \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right), \qquad p = 1, 2,\] where we must consider the case \(p = 1\) to account for the covariance of \(S_{Ki}\) and \(S_{Kj}\) when \(i \neq j\), and the case \(p = 2\) to account for the variance of \(S_{Ki}\), which is computed in the case where \(i = j\). So, if \(p = 1\), then \[E\left( V_{l_1 l_2} \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) = E(V_{l_1 l_2})\, E\left( \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) = 0,\] by the local dependence property and the assumption that \(E(V_{l_1 l_2}) = 0\). The local dependence property allows us to factor out the expectation of \(V_{l_1 l_2}\), as this edge is between neighborhoods, and therefore independent of every other edge in the graph. Now, if we have \(p = 2\), then, by sparsity and the fact that the product below is at most 1, \[E(V_{l_1 l_2}^2)\, E\left( \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) \le DCn^{-\delta},\] where \(D\) is a constant that bounds the expectation above. Such a constant exists because each of the \(V_{mn}\) is bounded by definition, so a product of them is bounded. So as \(K\) grows large, the between-neighborhood covariances all become asymptotically negligible. Therefore, we can conclude that \[Var(B_K) = \sum_{i \in N_K^d} \sum_{j \in N_K^d} I_B(i) I_B(j)\, Cov(S_{Ki}, S_{Kj}) \le DCn^{2(d - \delta)},\] so \(Var(B_K) \to 0\). Then, for all \(\epsilon > 0\), Chebyshev's inequality gives us \[\lim_{K \to \infty} P(|B_K| > \epsilon) \le \lim_{K \to \infty} \frac{1}{\epsilon^2} Var(B_K) = 0,\] so \(B_K \xrightarrow[K \to \infty]{p} 0\). Next, we bound the within-neighborhood covariances, as we also have \[Var(W_{K,k}) = \sum_{i \in N_K^d} \sum_{j \in N_K^d} I(i, j \in A_k^d)\, Cov(S_{Ki}, S_{Kj}).\] As the covariance forms an inner product on the space of square integrable random variables, the Cauchy-Schwarz inequality gives us \[I(i, j \in A_k^d)\, |Cov(S_{Ki}, S_{Kj})| \le I(i, j \in A_k^d) \sqrt{Var(S_{Ki})\, Var(S_{Kj})}.\] Then, as each \(S_{Ki}\) has expectation zero, we know that \(Var(S_{Ki}) = E(S_{Ki}^2) - E(S_{Ki})^2 = E(S_{Ki}^2)\). As \(S_{Ki}^2 \le 1\), we know \(Var(S_{Ki}) \le 1\) for all tuples \(i\), so we have the bound \[I(i, j \in A_k^d)\, |Cov(S_{Ki}, S_{Kj})| \le I(i, j \in A_k^d).\] Now all that remains is to apply the Lindeberg-Feller central limit theorem to the double sequence \(W_K = \sum_{k=1}^{K} W_{K,k}\). To that end, first note that, as each neighborhood contains at most a finite number of nodes, \(M\), we have \[Var(W_{K,k}) = \sum_{i \in N_K^d} \sum_{j \in N_K^d} I(i, j \in A_k^d)\, E(S_{Ki} S_{Kj}) \le \sum_{i \in N_K^d} \sum_{j \in N_K^d} I(i, j \in A_k^d) \le M^{2d}.\] Now we prove that Lyapounov's condition (3.2) holds for the constant in the exponent \(\delta = 2\). Since \(W_{K,k}\) is a sum of at most \(M^d\) terms each bounded by 1, we have \(W_{K,k}^2 \le M^{2d}\), so \[\lim_{K \to \infty} \sum_{k=1}^{K} \frac{1}{Var(W_K)^2} E(|W_{K,k}|^4) = \lim_{K \to \infty} \frac{1}{Var(W_K)^2} \sum_{k=1}^{K} E(W_{K,k}^2 W_{K,k}^2) \le \lim_{K \to \infty} \frac{M^{2d}}{Var(W_K)^2} \sum_{k=1}^{K} E(W_{K,k}^2) = \lim_{K \to \infty} \frac{M^{2d}}{Var(W_K)^2} \sum_{k=1}^{K} Var(W_{K,k}) = \lim_{K \to \infty} \frac{M^{2d}}{Var(W_K)} = 0,\] where \(Var(W_K)\) tends to infinity by assumption. Therefore, Lyapounov's condition holds, and so by the Lindeberg-Feller central limit theorem, we have \[\frac{W_K}{\sqrt{Var(W_K)}} \xrightarrow[K \to \infty]{d} N(0, 1).\] Slutsky's theorem (Theorem 3.4) gives the final result for \(S_K = W_K + B_K\): \[\frac{S_K}{\sqrt{Var(S_K)}} \xrightarrow[K \to \infty]{d} N(0, 1),\] as desired.

The second class of statistics are those that depend on both the graph and the nodal variates. These have a very similar form to the statistics previously considered. Now we require that \[S_K = \sum_{i \in N^d} S_{Ki}, \qquad S_{Ki} = \prod_{l, p \in i} Y_{lp}\, h(X_l, X_p),\]

a product with at most q terms.
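For example (an illustrative choice), taking \(d = 2\) and \(h(X_l, X_p) = I(X_l = X_p)\) gives \[S_K = \sum_{(l, p) \in N^2} Y_{lp}\, I(X_l = X_p),\] a count of the (ordered) edges joining nodes that agree on an attribute; this homophily statistic has \(h\) uniformly bounded with \(B = 1\).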

Theorem 3.8 If \(S_K\) is a statistic depending on both the random graph and the random nodal attributes, the sequence of random networks is as before, and the function \(h\) is uniformly bounded in the sense that, for all \(l\) and \(m\), there is some \(B\) such that \[P(|h(X_l, X_m)|^p > B) = 0, \qquad p = 1, 2,\] then we also have \[\frac{S_K - E(S_K)}{\sqrt{Var(S_K)}} \xrightarrow[K \to \infty]{d} N(0, 1).\]

Proof. This proof is very similar to the proof of Theorem 3.7. We write \[S_K = W_K + B_K,\] exactly as before, incorporating the function \(h\) into each \(S_{Ki}\) as we did above. Then the binary nature of the graph and the uniform boundedness of \(h\) allow us to once again recenter, meaning that we will work with \[V_{ij}\, h(X_i, X_j) = Y_{ij}\, h(X_i, X_j) - \mu_{ij},\] where now \(\mu_{ij} = E(Y_{ij}\, h(X_i, X_j))\). We then have \(E(V_{ij}\, h(X_i, X_j)) = 0\), so \(E(S_{Ki}) = 0\) as well.

For the between-neighborhood covariances, we once again choose \(V_{l_1 l_2}\), a between-neighborhood network variable. Then we once again write \[Cov(S_{Ki}, S_{Kj}) = E\left( \prod_{m, n \in i} V_{mn} h(X_m, X_n) \prod_{m, n \in j} V_{mn} h(X_m, X_n) \right) = E\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right) = E\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \right) E\left( \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right), \qquad p = 1, 2,\] by the local dependence property. Then, when \(p = 1\), we have \(E(V_{l_1 l_2} h(X_{l_1}, X_{l_2})) = 0\) by assumption, so the covariance is identically zero. When \(p = 2\), we have \(E((V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^2) \le B^2 C n^{-\delta}\) by sparsity and uniform boundedness, and \[E\left( \prod_{m, n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right) \le (DB)^{2q - 2}\] almost surely by uniform boundedness and the fact that this product has at most \(2q - 2\) terms. This follows from the fact that \(h\) is bounded by \(B\) and that \(V_{mn}\) is bounded by some constant \(D\), by definition. So \(Cov(S_{Ki}, S_{Kj}) \le B^2 (DB)^{2q - 2} C n^{-\delta}\), which tends to zero as \(K\) grows large. So, again by Chebyshev's inequality, we have \(B_K \xrightarrow[K \to \infty]{p} 0\). Next we bound the within-neighborhood covariances. Now, with each \(|S_{Ki}| \le B^q\), we have \[I(i, j \in A_k^d)\, |Cov(S_{Ki}, S_{Kj})| \le I(i, j \in A_k^d)\, B^{2q}.\] Now we show that Lyapounov's condition (3.2) holds for the same \(\delta = 2\). Once again note that each neighborhood has at most \(M\) nodes, so \[Var(W_{K,k}) = \sum_{i \in N_K^d} \sum_{j \in N_K^d} I(i, j \in A_k^d)\, Cov(S_{Ki}, S_{Kj}) \le \sum_{i \in N_K^d} \sum_{j \in N_K^d} I(i, j \in A_k^d)\, B^{2q} \le M^{2d} B^{2q}.\] Then Lyapounov's condition is \[\lim_{K \to \infty} \sum_{k=1}^{K} \frac{1}{Var(W_K)^2} E(|W_{K,k}|^4) = \lim_{K \to \infty} \frac{1}{Var(W_K)^2} \sum_{k=1}^{K} E(W_{K,k}^2 W_{K,k}^2) \le \lim_{K \to \infty} \frac{M^{2d} B^{2q}}{Var(W_K)^2} \sum_{k=1}^{K} E(W_{K,k}^2) = \lim_{K \to \infty} \frac{M^{2d} B^{2q}}{Var(W_K)^2} \sum_{k=1}^{K} Var(W_{K,k}) = \lim_{K \to \infty} \frac{M^{2d} B^{2q}}{Var(W_K)} = 0.\] Therefore, by the Lindeberg-Feller central limit theorem and Slutsky's theorem, we have \[\frac{S_K}{\sqrt{Var(S_K)}} \xrightarrow[K \to \infty]{d} N(0, 1).\]

Finally, the last class of statistics is that which depends only on the nodal variates. This result follows directly from a central limit theorem for \(M\)-dependent random variables, which can be found in Billingsley (1995, p. 364). Establishing this theorem requires us to assume that the statistic in question depends only on a single nodal covariate across nodes.

Theorem 3.9 Take the sequence \(Z_K\) as before, and let \(X_K\) be the vector of nodal variates for each \(Z_K\). Call each entry of this vector \(X_{Ki}\), the variate corresponding to node \(i\). Furthermore, we assume that \(E(X_{Ki}^{12}) < \infty\) and \(E(X_{Ki}) = 0\). Then \[\lim_{K \to \infty} \frac{Var\left( \sum_{i=1}^{n} X_{Ki} \right)}{n} = \sigma^2,\] where \(n = |N|\). Furthermore, if \(\sigma > 0\), then \[\frac{\sum_{i=1}^{n} X_{Ki}}{\sqrt{Var\left( \sum_{i=1}^{n} X_{Ki} \right)}} \xrightarrow[K \to \infty]{d} N(0, 1).\]

Proof. Two random variables \(X_{Kl}\) and \(X_{Kp}\) are dependent if and only if \(l\) and \(p\) are in the same neighborhood. Without loss of generality, assume that the neighborhoods are such that all nodes within a neighborhood are indexed by consecutive integers, and let \(M = \limsup_K |A_K|\). Then the sequence \(X_{Kl}\) is \(M\)-dependent, so the result follows by application of Theorem 27.4 in Billingsley (1995).

In practice, the hypothesis that the twelfth moment exists is satisfied for most reasonable distributional assumptions about nodal covariates. Furthermore, the assumption that all nodal variates have expectation zero can easily be satisfied by recentering the observed data. Finally, the delta method gives an asymptotically normal distribution for any differentiable statistic of the nodal variate. The univariate nature of the statistic is a fundamental limitation of this approach; however, I have been unable to find an analogous multidimensional central limit theorem that would establish the asymptotic normality of a statistic of multiple nodal variates.
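For instance (a standard application of the delta method, sketched under the assumptions of Theorem 3.9), if \(\bar{X}_n = n^{-1} \sum_{i=1}^{n} X_{Ki}\) and \(g\) is differentiable at \(0\) with \(g'(0) \neq 0\), then \[\sqrt{n}\left( g(\bar{X}_n) - g(0) \right) \xrightarrow{d} N\left( 0,\; g'(0)^2 \sigma^2 \right),\] since \(E(X_{Ki}) = 0\) centers the mean at zero.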

References

Billingsley, P. (1995). Probability and measure (3rd ed.). New Delhi: Wiley India.

Fellows, I., & Handcock, M. S. (2012). Exponential-family random network models. arXiv preprint arXiv:1208.0121.

Schweinberger, M., & Handcock, M. S. (2015). Local dependence in random graph models: Characterization, properties and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3), 647–676.

Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41(2), 508–535. http://doi.org/10.1214/12-AOS1044

Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.