Tensor: vectorized computational graphs

Tensor: vectorized computational graphs#

Although Scalar provides a complete implementation for computational graphs, they are terribly hard to work with, not to mention their performance issues. As we’ve seen it earlier, training a simple neural net with a hidden layer of eight Scalar neurons takes a lot of time, and defining it without matrix multiplications is a pain.

Fortunately, we don’t have to be so granular when defining computational graphs. Via the magic of linear algebra, we can seriously cut down the number of nodes and edges in our graphs, resulting in a blazing increase in speed and notational simplicity.

To give you an example, consider the dot product operation, defined by

\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i, \quad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n. \]

This is a computational graph that contains \(4n - 1 \) nodes and \(4n - 2 \) edges. Here it is for three-dimensional vectors.

../_images/f662e663ee1a3d9b9672bb2045c4fa35882c77af7b097184e79d07b547b99996.svg

Consider doing a backward pass in this graph: you have to traverse every node, every edge, and execute functions there. It’s even worse for matrix multiplications, and the computational complexity piles up fast. For a basic one-layer network like \( \sigma(\mathrm{ReLU}(xA)B) \), we already have a ton of components. If the inputs are images, we are in the tens of thousands. That’s not going to work out in the long run.

What if we replace scalars with vectors and matrices in our computational graphs?

This is how the dot product would look like:

../_images/2ce0cd31c7d61453efe3550f9fe5f5cd060455d2242471e25420956ef6216e17.svg

Structurally, this is identical to the one of matrix multiplication:

../_images/8c3a839e511dbd9c6f2536d8c97ba7c1ae2e2e0cb8024b3b570407fa60d19baa.svg

This is what the Tensor class implements in mlfz. Let’s see what tensors are!

How to work with Tensor#

Similarly to Scalar, the Tensor class is a node in a computational graph. This time, instead of a scalar value, they represent NumPy arrays.

import numpy as np
from mlfz.nn.tensor import Tensor


x = Tensor.ones(3, 4)
y = Tensor.zeros_like(x)
x + y
Tensor([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

A Tensor can be initialized in multiple ways; the principal one is via a NumPy array.

Tensor(np.array([[1, 2], [3, 4]]))
Tensor([[1, 2],
       [3, 4]])
Tensor.from_random(2, 4)
Tensor([[0.66423909, 0.2134525 , 0.02665669, 0.63707883],
       [0.92479228, 0.93492116, 0.81375867, 0.33039595]])
Tensor.zeros(5, 2)
Tensor([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])
x = Tensor.zeros(3, 4)
Tensor.ones_like(x)
Tensor([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Just like a Scalar, Tensor instances hold three important attributes as well:

  • a tensor value,

  • the backwards gradient,

  • and the list of incoming edges.

x = Tensor.ones(3, 4)
y = Tensor.ones(3, 4)
z = x * y
z.backwards_grad    # this is a NumPy array
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
z.prevs   # and this is a list of Edges
[Edge(prev=Tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]), local_grad=array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]), backward_fn=<function _pointwise at 0x7776905d0c10>),
 Edge(prev=Tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]), local_grad=array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]), backward_fn=<function _pointwise at 0x7776905d0c10>)]

We’ll talk about the Edge object later, but if you are observant, you may have noticed that compared to the Scalar version, we have an additional backward_fn attribute. This is the price we pay for vectorization.

First, let’s see what Tensors can do.

Tensors and operations#

Tensors support quite a few operations:

  • pointwise addition +,

  • pointwise subtraction -,

  • pointwise multiplication *,

  • pointwise division /,

  • pointwise exponentiation **,

  • matrix multiplication @,

  • and matrix transposition T.

Thanks to the broadcasting magic of NumPy, the other operand need not be a Tensor, it can be a vanilla Python number. (We’ll open a whole other can of worms with broadcasting, but we’ll cross that bridge later.)

x = Tensor.ones(3, 4)
2 + x
Tensor([[3., 3., 3., 3.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

Just like Scalars, we can apply functions to Tensors.

from mlfz.nn.tensor.functional import exp

x = Tensor(value=np.array([[1, 2], [3, 4]]))
exp(x)
Tensor([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])

As a Tensor is essentially a wrapper over NumPy arrays, it has its own sum and mean methods, working identically to the original versions.

x = Tensor.ones(3, 4)
x.sum()
Tensor(12.)
x.sum(axis=0)
Tensor([3., 3., 3., 3.])
x.sum(axis=1)
Tensor([4., 4., 4.])
y = Tensor(np.array([[1, 2], [3, 4]]))
y.mean()
Tensor(2.5)
y.mean(axis=0)
Tensor([2., 3.])

Computational graphs with Tensors#

Just like most features, the graph-building is the same as well: it is dynamically built upon applying functions and operations.

from mlfz.nn.tensor.functional import sigmoid, tanh

x = Tensor.from_random(1, 3)
A = Tensor.from_random(3, 5)
B = Tensor.from_random(5, 1)

y = sigmoid(tanh(x @ A) @ B)
y
Tensor([[0.82274369]])
y.shape
(1, 1)

Again, the y.backward method calculates the gradient of y with respect to all nodes in the graph. As our nodes are tensors, the backwards gradient will be a tensor as well, with the shape matching the node’s shape.

y.backward()
x.backwards_grad
array([[0.09267923, 0.08430855, 0.09109362]])
A.backwards_grad
array([[0.01385932, 0.007838  , 0.00854733, 0.00027595, 0.02864256],
       [0.04586972, 0.02594114, 0.0282888 , 0.0009133 , 0.0947973 ],
       [0.02096891, 0.01185875, 0.01293196, 0.00041751, 0.04333568]])
B.backwards_grad
array([[0.04840085],
       [0.10909941],
       [0.12004132],
       [0.11893912],
       [0.07835384]])

In essence, tensors allow us to vectorize computational graphs, increasing the speed and simplicity like you wouldn’t believe. Let’s proceed to build one!