import torch
import numpy as np
a = torch.tensor(range(6))
a = a.reshape(2, 3)
a.shape, a.stride()
(torch.Size([2, 3]), (3, 1))
Tensors:
- We could define a tensor as a container for numerical data, arranged in a regular grid, with a defined shape and layout.
- Basically we have 4 levels of tensors:
  - Scalar: just a number, e.g. 8 ==> 0D
  - Vector: a 1D row of numbers, e.g. [1, 2]
  - Matrix: a 2D grid of rows & columns, e.g. [[1, 2], [4, 1]]
  - Tensor: any nD generalization of this (3D, 4D, ...)
# a scalar has 0 dimensions:
scalar = torch.tensor(8)
scalar.ndim
0
# a vector is a 1D row of numbers:
vec = torch.tensor([1, 2])
vec.ndim
1
# a matrix is a 2D grid:
mat = torch.tensor([[1, 2], [3, 4]])
mat.ndim
2
# a tensor is any shape of data that can be represented in 3 or more dimensions:
ten = torch.tensor(range(12))
ten = ten.reshape(2, 3, 2)
ten.ndim
3
Data BLOBs:
- The last tensor we created has a very interesting property: first we created the number of elements we want (12), then we reshaped it while respecting 2 rules:
  - the total number of elements should be exactly 12
  - these 12 elements should be distributed over 3 dimensions in order to call it a tensor
- We decided to go with (2, 3, 2), but we could go with any distribution as long as we respect the 2 rules.
- The 12 elements represent the data BLOB, while the distribution represents the metadata that tells us how the data is shaped.
- A data BLOB is a large, raw chunk of numerical data with no assumed structure until interpreted; it is shapeless until we attach metadata to it.
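- As a quick sketch (the names blob, v1, v2 are just illustrative), the same 12-element blob can carry any shape whose sizes multiply to 12, and the underlying memory does not move:
# the same 12-element blob under two different shapes
blob = torch.tensor(range(12))
v1 = blob.reshape(2, 3, 2)   # 2 * 3 * 2 = 12
v2 = blob.reshape(4, 3)      # 4 * 3 = 12, same data, different metadata
v1.data_ptr() == v2.data_ptr() == blob.data_ptr()
True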
- In the context of kernel engineering we are not working with well-defined tensor shapes, but with:
  - pointers to data blobs in memory
  - some metadata (shape, strides, dtype)
  - a set of indexing rules to access the correct slice
- So if we create a tensor:
x = torch.randn(3, 4, 5)
- Under the hood the data is stored in a single flat buffer of 60 floats.
- The shape tells us: this is 3 blocks of 4 rows of 5 elements.
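- A minimal check of that claim (using the x just created):
x.numel(), x.flatten()[0] == x[0, 0, 0]   # total elements in the flat buffer; its first element is x[0, 0, 0]
(60, tensor(True))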
Stride:
- In the context of kernel engineering the stride is the most important concept. Since data is stored in memory as blobs, the stride tells us how many elements to skip in memory to move to the next element along a specific dimension. Think of it as the “memory jump” for each axis.
z = torch.tensor(range(6))
z = z.reshape(3, 2)
z.shape, z.stride()
(torch.Size([3, 2]), (2, 1))
- The stride says:
  - to move one row: jump 2 elements (stride[0])
  - to move one column: jump 1 element (stride[1])
- So in our case the tensor z has a stride of (2, 1): 2 is the number of jumps needed to get to the next row, while 1 is the number of jumps needed to get to the next column.
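- To make the “memory jump” concrete, here is a small sketch (flat, i, j are illustrative names) that reads z[i][j] straight from the flat blob using the strides:
# read z[i][j] manually from the flat data using the strides
flat = z.flatten()   # the underlying 1D blob (z is contiguous)
i, j = 2, 1
flat[i * z.stride(0) + j * z.stride(1)] == z[i, j]
tensor(True)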
Transposed Stride:
- What if we transpose the tensor z? Will the stride remain the same?
y = z.t()
z.stride(), y.stride()
((2, 1), (1, 2))
- The transpose changed the stride but the data blob remains the same:
y.data_ptr() == z.data_ptr()
True
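- In fact we can rebuild the transposed view ourselves from the same blob, just by swapping the shape and the stride (a small sketch using torch.as_strided, which reinterprets existing memory):
# reinterpret z's memory with swapped shape and stride: this is exactly z.t()
manual_t = torch.as_strided(z, (2, 3), (1, 2))
torch.equal(manual_t, y)
True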
Stride exercises:
- Learn how stride works with some simple PyTorch examples.
Exercise 1: Basic 2D Tensor
- Create a 2D tensor and inspect its stride.
# x is a 2D tensor
x = torch.tensor(range(6)).reshape(2, 3)
# its shape:
x.shape
torch.Size([2, 3])
- How to think about its stride?
  - In order to move to the next row, how many elements should we pass? ==> 3
  - In order to get to the next column, how many elements do we need to jump? ==> 1
- So the stride is (3, 1)
x, x.stride()
(tensor([[0, 1, 2],
[3, 4, 5]]),
(3, 1))
Exercise 2: Transposed Tensor
- Transpose the tensor and observe how the stride changes.
y = x.t()
y, y.shape
(tensor([[0, 3],
[1, 4],
[2, 5]]),
torch.Size([3, 2]))
- In this case, since we reversed the shape, it is obvious that the stride will also be reversed: (1, 3)
- What’s important is that PyTorch doesn’t create a new copy of x when transposing; it only redefines the way the new tensor is viewed by creating a new shape and a new stride.
y.stride()
(1, 3)
Exercise 3: Unsqueezed Tensor
- Add a new dimension and understand how stride adjusts.
z = x.unsqueeze(0)
x.shape, z.shape
(torch.Size([2, 3]), torch.Size([1, 2, 3]))
- What happened here is that PyTorch pretends there is a new outer dim, so the shape changed from [2, 3] to [1, 2, 3].
- The new dim (dim[0]) gets a stride of 6, because in order to get to a new element along that dim (even though it only has one element) we would have to pass all the elements of dimensions [1] and [2], which together contain 2*3 = 6 elements.
- RULE: the stride of the new dim is the size times the stride of the dimension just inside it (here 2 * 3 = 6); since the new dim has size 1, this value never actually affects indexing.
- So the stride should be: (6, 3, 1)
z.stride()
(6, 3, 1)
d = torch.tensor(range(8)).reshape(2, 4)
d, d.shape
(tensor([[0, 1, 2, 3],
[4, 5, 6, 7]]),
torch.Size([2, 4]))
d.stride()
(4, 1)
d1 = d.unsqueeze(1)
d1
tensor([[[0, 1, 2, 3]],
[[4, 5, 6, 7]]])
d1.shape, d1.stride()
(torch.Size([2, 1, 4]), (4, 4, 1))
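- A quick check of the rule above (illustrative only; when the new dim is inserted last there is no inner dimension and PyTorch simply uses a stride of 1):
# the new dim's stride equals size * stride of the dimension just inside it
d.unsqueeze(0).stride(), d.unsqueeze(1).stride(), d.unsqueeze(2).stride()
((8, 4, 1), (4, 4, 1), (4, 1, 1))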
Exercise 4: Expanded Tensor
- Broadcast a tensor without copying memory.
a = torch.ones(1, 3)
b = a.expand(2, 3)
a.shape, b.shape
(torch.Size([1, 3]), torch.Size([2, 3]))
a.stride()
(3, 1)
- Here we have a tensor a of shape [1, 3], then we use expand to make a tensor b with shape [2, 3].
- The method expand doesn’t create a new copy of the original tensor; rather, it virtually expands a dimension by repeating it without changing the memory.
- In this case dim[0] will be virtually repeated 2 times.
- In the original tensor a we have a stride of (3, 1):
  - In order to get to the next element along dim=0 (rows), we have to move 3 steps in memory.
  - To move to the next element along dim=1 (columns), we step by 1 in memory.
- Now with tensor b, as we said, expand adds a virtual element to dim=0 (a new row), but nothing changes in memory. So to move to the next row we don’t have to step at all, and the stride at that dimension becomes 0.
- The other dim=1 remains the same: 1.
b.stride()
(0, 1)
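- Since the stride along the expanded dim is 0, both rows of b are literally the same memory (a small check):
# both rows of b point to the same underlying elements
b[0].data_ptr() == b[1].data_ptr()
True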
Exercise 5: Permuted Tensor
- Change dimension order and inspect stride layout.
x3 = torch.randn(2, 3, 4)
y3 = x3.permute(2, 0, 1)
x3, x3.shape
(tensor([[[-4.9521e-01, -1.5715e+00, 9.7796e-01, -2.6375e-01],
[ 1.0992e+00, 4.2912e-01, 7.5855e-02, 1.6052e+00],
[-7.1012e-01, 7.3460e-01, -3.9331e-01, 1.0008e+00]],
[[ 5.4850e-01, -1.6360e+00, 1.8978e-01, -1.3920e-01],
[ 1.4362e-01, 4.4029e-01, -2.0576e-01, -2.7227e-01],
[-1.2247e-03, 1.3967e+00, -5.3473e-01, -7.4465e-01]]]),
torch.Size([2, 3, 4]))
x3.stride()
(12, 4, 1)
y3, y3.shape
(tensor([[[-4.9521e-01, 1.0992e+00, -7.1012e-01],
[ 5.4850e-01, 1.4362e-01, -1.2247e-03]],
[[-1.5715e+00, 4.2912e-01, 7.3460e-01],
[-1.6360e+00, 4.4029e-01, 1.3967e+00]],
[[ 9.7796e-01, 7.5855e-02, -3.9331e-01],
[ 1.8978e-01, -2.0576e-01, -5.3473e-01]],
[[-2.6375e-01, 1.6052e+00, 1.0008e+00],
[-1.3920e-01, -2.7227e-01, -7.4465e-01]]]),
torch.Size([4, 2, 3]))
To move along the new dimension 0 (size 4, originally dim 2), you step by 1 in memory (same as original dim 2).
To move along the new dimension 1 (size 2, originally dim 0), you step by 12 in memory (same as original dim 0).
To move along the new dimension 2 (size 3, originally dim 1), you step by 4 in memory (same as original dim 1).
This shows that permutation changes the order of strides but not their values. The new strides correspond to the original strides in the permuted order.
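- We can confirm this directly (a quick check using the tensors above):
# y3's strides are x3's strides reordered by the permutation (2, 0, 1), and no copy is made
y3.stride(), y3.data_ptr() == x3.data_ptr()
((1, 12, 4), True)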
View:
In PyTorch, viewing a tensor refers to creating a new tensor that shares the same underlying data storage as the original tensor but with a different shape, stride, or metadata. This means the viewed tensor does not copy the data; instead, it provides an alternative way to interpret the existing data in memory.
1- Memory: views allow tensors to share memory. Modifying the viewed tensor also modifies the original tensor.
2- Shape and Stride adjustment: as we saw earlier, a view can reinterpret a tensor’s shape and stride without copying it or changing the memory.
3- Zero-Cost Operation: viewing is efficient because it does not allocate new memory or copy data.
Operations like view(), transpose(), permute(), expand(), and slicing often return views.
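A minimal sketch of the memory-sharing behavior (base and v are illustrative names):
# modifying a view writes through to the original tensor
base = torch.zeros(2, 3)
v = base.view(3, 2)   # same storage, new shape and stride
v[0, 0] = 7.0
base[0, 0], base.data_ptr() == v.data_ptr()
(tensor(7.), True)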
Broadcasting:
- Broadcasting automatically expands smaller tensors to match the shape of larger tensors for element-wise operations by following specific rules:
  - Tensors are aligned from right to left
  - If sizes are equal, they are compatible
  - If one tensor's size is 1, it is stretched to match the other
  - If one tensor is missing a dimension, it is treated as a size-1 dimension (then stretched to match)
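- To apply these rules without running the full operation, torch.broadcast_shapes computes the resulting shape (a quick sketch; it raises an error when the shapes are incompatible):
# align from the right: (1, 2) stretches to (2, 2); a missing dim is treated as size 1
torch.broadcast_shapes((2, 2), (1, 2)), torch.broadcast_shapes((2, 2), (2,))
(torch.Size([2, 2]), torch.Size([2, 2]))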
# adding a vector to a scalar
vec = torch.tensor([1, 2, 3])
scal = torch.tensor(5)
out = vec + scal
print(vec, vec.shape)
print(scal, scal.shape)
print(out, out.shape)
tensor([1, 2, 3]) torch.Size([3])
tensor(5) torch.Size([])
tensor([6, 7, 8]) torch.Size([3])
# tensor with a size-1 dimension
A = torch.tensor([[1, 2], [3, 4]])  # Shape (2, 2)
B = torch.tensor([[10, 20]])        # Shape (1, 2)
C = A + B
print(A, A.shape)
print(B, B.shape)
print(C, C.shape)
tensor([[1, 2],
[3, 4]]) torch.Size([2, 2])
tensor([[10, 20]]) torch.Size([1, 2])
tensor([[11, 22],
[13, 24]]) torch.Size([2, 2])
# tensor missing a dimension:
D = torch.tensor([[1, 2], [3, 4]])  # Shape (2, 2)
R = torch.tensor([10, 20])          # Shape (2,)
S = D + R
print(D, D.shape)
print(R, R.shape)
print(S, S.shape)
tensor([[1, 2],
[3, 4]]) torch.Size([2, 2])
tensor([10, 20]) torch.Size([2])
tensor([[11, 22],
[13, 24]]) torch.Size([2, 2])
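- Broadcasting relies on the same stride-0 trick we saw with expand: the stretched tensor is a view whose repeated dimension has stride 0 (a quick check using R from the last example):
# the broadcasted view of R repeats its row without copying: stride 0 along dim 0
R_b = torch.broadcast_to(R, (2, 2))
R_b.shape, R_b.stride()
(torch.Size([2, 2]), (0, 1))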