Associative Arrays

Associative arrays generalize the more familiar notions of matrices, graphs, databases, and spreadsheets.

Formally, an associativey array is a map $\mathbf{A} \colon I \times J \to S$ where (i) $I$ and $J$ are arbitrary sets whose elements are called 'row indices' (or 'row labels') and 'column indices' (or 'column labels'), respectively, (ii) $S$ is the underlying set of a semiring $(S, \oplus, \otimes, 0, 1)$, and (iii) $\mathbf{A}(i, j) = 0$ for all but finitely many pairs $(i, j) \in I \times J$.

A semiring is a quintuple $(S, \oplus, \otimes, 0, 1)$ where $\oplus$ is a commutative ($a \oplus b = b \oplus a$) and associative ($a \oplus (b \oplus c) = (a \oplus b) \oplus c$) binary operation with identity $0$ ($a \oplus 0 = 0 \oplus a = a$) and $\otimes$ is an associative binary operation with identity $1$ which distributes over $\oplus$ ($a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$ and $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$) and which has $0$ as an annihilator ($a \otimes 0 = 0 \otimes a = 0$). The choice of semiring is usually fixed for a particular context, and includes the max-min algebra $(\mathbb{R} \cup \{\pm \infty\}, \max, \min, -\infty, \infty)$, the max-plus algebra $(\mathbb{R} \cup \{-\infty\}, \max, +, -\infty, 0)$, and the usual plus-times algebra $(\mathbb{R}, +, \cdot, 0, 1)$.

In comparison, an $m \times n$ matrix with real entries can be described as a map $\mathbf{A} \colon \{1, 2, \ldots, m\} \times \{1, 2, \ldots, n\} \to \mathbb{R}$, so associative arrays generalize matrices both by allowing for more general row and column indices and alternative values and algebraic operations associated with those values. Like matrices, associative arrays support their own versions of matrix addition, matrix multiplication, and element-wise multiplication—given associative arrays $\mathbf{A} \colon I_\mathbf{A} \times J_\mathbf{A} \to S$ and $\mathbf{B} \colon I_\mathbf{B} \times J_\mathbf{B} \to S$ then

$$\begin{align*}
(\mathbf{A} \oplus \mathbf{B})(i, j) & = \mathbf{A}(i, j) \oplus \mathbf{B}(i, j), & (\text{associative array addition}) \\
(\mathbf{A} \mathbin{{\oplus}.{\otimes}} \mathbf{B})(i, j) & = \bigoplus_k{\mathbf{A}(i,k) \otimes \mathbf{B}(k, j)} & (\text{associative array multiplication}) \\
(\mathbf{A} \otimes \mathbf{B})(i, j) & = \mathbf{A}(i, j) \otimes \mathbf{B}(i, j), & (\text{associative array element-wise multiplication})
\end{align*}$$

with the understanding that if $\mathbf{C} \colon I \times J \to S$ is an associative array then $\mathbf{C}(i, j) = 0$ whenever $(i, j) \notin I \times J$, leveraging the assumed algebraic properties of $S$. These operations can be shown to satisfy many natural algebraic properties—e.g., associative array multiplication is associative and distributes over associative array addition—allowing them to be used in any linear algreba application that normally applies matrices or any other object which matrices can model. Moreover, the added flexibility of associative arrays can lead to shortened and simpler code by keeping track of row and column labels along the way or by utilizing nonstandard semirings.Mathematics of Big Data

A standard reference for the theory and application of associative arrays is Mathematics of Big Data by Jeremy Kepner and myself, particularly covering the D4M (Dynamic Distributed Dimensional Data Model) technology architected by Jeremy Kepner implemented in Python, MATLAB, and Julia. The Python implementation, D4M.py, is written and maintained by me.