Chapter 1 Introduction

1.1 Random graphs

An undirected graph on four vertices.

Figure 1.1: An undirected graph on four vertices.

The mathematical object we call a graph is a set of vertices and edges, so for a graph \(G\), we write \(G = (V, E)\) where \(V\) is the set of vertices and \(E\) are ordered or unordered pairs \((i, j)\) for \(i, j \in V\). These objects can be represented visually by pictures like Figure 1.1. If the pairs in \(E\) are ordered, then we say that \(G\) is directed (or sometimes that \(G\) is a digraph), otherwise we say that \(G\) is undirected. The example in Figure 1.1 is undirected; when drawing directed graph, we can indicate the direction of the the edge \((i, j)\) by drawing an arrow from node \(i\) to node \(j\), as in Figure 1.2. If we only consider the presence or absence of an edge between vertices, then we say that the graph \(G\) is unweighted. In a weighted graph, each edge is assigned a numeric value, called the weight. The case of an unweighted graph is equivalent a weighted graph where each edge weight \(w_{ij} \in \{0,1\} \).

A directed graph on four vertices

Figure 1.2: A directed graph on four vertices

Generally, the most convenient way to represent a graph is as an adjacency matrix. For a graph with \(n\) vertices, the adjacency matrix is the \(n \times n\) matrix with the weight of edge \((i,j)\) in the \(i, j\) position. Note that if a graph \(G\) is undirected, then its adjacency matrix is symmetric and if \(G\) is an unweighted graph, then all its entries are either \(1\) or \(0\). For example, the graph in Figure 1.1 has adjacency matrix \[\begin{equation} \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}. \tag{1.1} \end{equation}\]

Here, we have adopted the convention that vertices do not have an edge with themselves, so the diagonal elements of this matrix are all zero. Sometimes we will also represent a graph as an edge list, which is simply the set of pairs in \(E\) with their weights.

If we assume that a graph \(G\) is generated by some kind of stochastic process, then we consider the whole graph as a random variable. This is equivalent to considering the random matrix \(Y\), where \(Y\) is the adjacency matrix of \(G\). We can consider the \(n^2\) random variables \(Y_{ij}\) for \(i, j \in \{1, \dots, n\}\), taking values in the set of possible weights for the random graph in consideration. For example, if \(G\) is an undirected binary random graph, then this can be simplified to the \(n^2/2 - n\) Bernoulli random variables \(Y_{(i, j)}\) for \(1 \leq i \leq j \leq n\). By breaking the random graph into these simpler parts, we can begin to apply familiar statistical ideas to these complex objects.

1.2 Network modeling

Graphs are a convenient way to represent relational data, or data that contain information about a relationship between actors. This brings us to the area of network analysis, where random graphs are used to model many different kinds of relationships. For example, in sociology, networks are commonly used to model relationships among people, like marriages within a group of people. Political scientists use random graphs to represent things like the network of trade agreements between nations. Finally, in epidemiology, random networks can model an underlying contact network, say of people who shake hands or otherwise come into contact in a way that could spread a disease, over which a transmissible infection spreads. This allows the researcher to estimate how the medical features of the disease and the sociological features of the population interact to promote or inhibit the spread of disease. As an example of this, see Groendyke, Welch, & Hunter (2012), where, among other things, they examine if closing schools would have had an effect on the spread of an 1861 measles outbreak.

To formalize these notions, we begin with a random graph \(Y\) on \(n\) vertices (called nodes in the field of network analysis) considered as a random matrix with sample space \(\mathcal{Y},\) where \(\mathcal{Y}\) is the set of all possible graphs on \(n\) vertices. There may be some restrictions on \(\mathcal{Y}\), depending on the application being considered. Usually, we prohibit edges that connect a vertex to itself by forcing the diagonal entries of \(Y\) to be \(0\), as in (1.1). However, specific contexts may require nonobvious constraints put in place by the researcher; for example, when using a network to model the romantic relationships within a group of people, the researcher must exclude networks with ties between nodes that do not have the appropriate gender, depending on the respective preferences of each pair of nodes.

This thesis explores two classes of network models: the exponential random graph model (ERGM) and the exponential random network model (ERNM). These models are quite similar in that they both model the structural features of the network, such as node degree and triangle formation, as a function of nodal covariates, say age or gender. However, the ERNM also models the effect of network features on the nodes. For example, Fellows & Handcock (2012) show that teens with friends who smoke or drink are more likely to smoke or drink themselves.

The notion of local dependence was introduced by Schweinberger & Handcock (2015) in the context of ERGMs. Here we extend this idea to ERNMs and provide new proofs of their two main theorems in this context. The first theorem shows that the stochastic process of sampling smaller subnetworks from a large network satisfies a desirable consistency condition. This means that as we sample larger and larger subnetworks, the model parameter estimates become better and better approximations of the true parameter values. Shalizi & Rinaldo (2013) showed that this is not the case for most ERGMs. The second theorem shows that certain types of statistics of locally dependent random networks have an asymptotically normal distribution, which is also desirable from the point of view of making inferences based on these models.

References

Groendyke, C., Welch, D., & Hunter, D. R. (2012). A network-based analysis of the 1861 Hagelloch measles data. Biometrics, 68(3), 755–765. http://doi.org/10.1111/j.1541-0420.2012.01748.x

Fellows, I., & Handcock, M. S. (2012). Exponential-family random network models. ArXiv Preprint ArXiv:1208.0121.

Schweinberger, M., & Handcock, M. S. (2015). Local dependence in random graph models: Characterization, properties and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3), 647–676.

Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41(2), 508–535. http://doi.org/10.1214/12-AOS1044