New Py5Vector Class

Over the past few months, one of the most requested py5 features has been a dedicated vector class. Given that level of interest, I built a draft version of a vector class, which I am now making available for discussion with the growing py5 community.

The vector class code associated with this blog post is available as a gist, but since this is an evolving process, the most current version will always be in the GitHub repo. I've also created a discussion on GitHub for feedback; please direct your comments and questions there. Feel free to comment on or question anything you like.

Background

The py5 library, as you know, is a Python version of Processing that uses the JPype bridge to connect Python to Processing's Java code. The functionality py5 provides therefore comes from actual Java objects that are created while programming in Python. Early in py5's development I had to choose which Java classes to include and which fields and methods to expose to Python. At that time I decided to leave out Processing's PVector and PMatrix classes, because it didn't make sense to me to call into Java for math calculations that could also be done with numpy arrays. Recall that the goal of py5 is to create a new version of Processing that is integrated into the Python ecosystem. I want to leverage Python's strengths whenever possible, while making intelligent choices about when to provide functionality with Processing's code and when to use other Python libraries instead.

Using Processing's PVector and PMatrix classes from Python would also have performance implications. A likely use case for these classes is doing many math operations in each call to draw(), which would mean a lot of back-and-forth between Python and Java. I don't want py5 to be slow for reasons that are hard for beginners to understand.

Using numpy arrays for vector calculations does work well, but it quickly became apparent that for some vector use cases, such as rotations, it can be a real drag. Can any of you type out the code for a rotation matrix to rotate one vector around another without looking it up? I can't. Does anyone want to stop and look that up while writing code? I don't. Instead, let's make our lives easier with a dedicated vector class that does this for us. The new class is backed by a numpy array and provides methods for common vector operations, so we get good performance without having to think so much about the details of the vector calculations.
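For a sense of the boilerplate a vector class can hide, here is a rotation of one vector around another written with plain numpy, using Rodrigues' rotation formula. This is a sketch, not Py5Vector's implementation; rotate_about_axis is a hypothetical helper name.

```python
import numpy as np

def rotate_about_axis(v, axis, angle):
    """Rotate vector v around axis by angle (radians) using Rodrigues' rotation formula."""
    v = np.asarray(v, dtype=float)
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)  # the formula needs a unit-length axis
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # v_rot = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
    return v * cos_a + np.cross(k, v) * sin_a + k * np.dot(k, v) * (1 - cos_a)

# A quarter turn of the x-axis around the z-axis lands (up to rounding) on the y-axis.
print(rotate_about_axis([1, 0, 0], [0, 0, 1], np.pi / 2))
```

Rotations preserve length, which makes a handy sanity check when writing code like this by hand.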

Goals

Before delving into the details, I want to list the goals, or guiding principles, that drove the design decisions. Feel free to comment on or question these as well.

  • Consistency with the programming conventions established by numpy, Processing, and Python in general. This will accomplish two things:
    1. The vector class will be more intuitive for folks who already have experience with numpy and Processing.
    2. The vector class will introduce concepts that beginner programmers will see again later in their learning paths. This will help build a foundation for intuition and help py5 become a useful teaching tool.
  • Leverage Python's strengths where it makes sense to do so.
  • This shouldn't expand into a scientific computing vector library. There are already other Python libraries that provide that functionality, and I know it would be a tremendous amount of work to try to duplicate that. I'm willing to sacrifice some performance in favor of simplicity and usability.
  • The vector class should work well with numpy functions so users are not limited to the mathematical operations built into the class. It shouldn't need to include every possible linear algebra function to be useful.
  • Calculations need to be correct, because anything else is misleading and a disservice to beginners.
  • Error messages should be appropriate and helpful. Any error message that falls short of that should be reported. (This applies to all of py5.)

Creating Vectors

Now let's get to the fun part: creating our first Py5Vector.

In [1]:
import numpy as np
from py5 import Py5Vector

v1 = Py5Vector(1, 2, 3)

v1
Out[1]:
Py5Vector3D([1., 2., 3.])

Observe that we used the Py5Vector constructor and got an instance of Py5Vector3D back. What's actually happening is that there are three vector classes, Py5Vector2D, Py5Vector3D, and Py5Vector4D, all of which inherit from Py5Vector and all of which can be constructed with Py5Vector. (This works because the parent class implements __new__ instead of __init__.) You'll only need the subclasses when you want to use isinstance.

In [2]:
from py5 import Py5Vector2D, Py5Vector3D, Py5Vector4D

print('is Py5Vector?', isinstance(v1, Py5Vector))
print('is Py5Vector2D?', isinstance(v1, Py5Vector2D))
print('is Py5Vector3D?', isinstance(v1, Py5Vector3D))
print('is Py5Vector4D?', isinstance(v1, Py5Vector4D))
is Py5Vector? True
is Py5Vector2D? False
is Py5Vector3D? True
is Py5Vector4D? False
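The __new__ trick can be sketched in a few lines. The class names Vec, Vec2D, Vec3D, and Vec4D below are hypothetical stand-ins; this is not py5's code, and the real constructor also handles dim, dtype, copy, and many more input types.

```python
import numpy as np

class Vec:
    """Hypothetical parent class that picks the concrete subclass inside __new__."""

    def __new__(cls, *args):
        data = np.array(args, dtype=float)
        # Choose the subclass that matches the number of components.
        subclass = {2: Vec2D, 3: Vec3D, 4: Vec4D}.get(data.size)
        if subclass is None:
            raise RuntimeError(f"vectors must be 2D, 3D, or 4D, not {data.size}D")
        # Calling a specific subclass directly only works for its own dimension.
        if cls is not Vec and cls is not subclass:
            raise RuntimeError(f"{cls.__name__} cannot hold {data.size} values")
        obj = object.__new__(subclass)
        obj.data = data
        return obj

    def __repr__(self):
        return f"{type(self).__name__}({self.data.tolist()})"

class Vec2D(Vec):
    pass

class Vec3D(Vec):
    pass

class Vec4D(Vec):
    pass

print(Vec(1, 2, 3))  # a Vec3D, even though we called Vec
```

Because every subclass inherits from the parent, isinstance checks against either the parent or the specific subclass behave like the py5 output above.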

The Py5Vector constructor is flexible about the inputs it accepts. It figures out the dimensionality of the inputs and creates an instance of the proper vector class. The other vector classes, like Py5Vector3D, work the same way except that each only accepts vectors of one specific dimension (Py5Vector3D only accepts 3D vectors, for example). Most of the time you'll only need Py5Vector.

In [3]:
Py5Vector3D(10, 20, 30)
Out[3]:
Py5Vector3D([10., 20., 30.])
In [4]:
try:
    Py5Vector2D(10, 20, 30)
except RuntimeError as e:
    print(e)
Error: dim parameter is 2 but Py5Vector values imply dimension of 3
In [5]:
Py5Vector2D(10, 20)
Out[5]:
Py5Vector2D([10., 20.])

You can create a vector using other inputs besides a sequence of numbers.

In [6]:
Py5Vector([2, 4, 6])
Out[6]:
Py5Vector3D([2., 4., 6.])
In [7]:
Py5Vector(np.arange(3))
Out[7]:
Py5Vector3D([0, 1, 2])

You can create a vector using another vector. In this example, we create a 4D vector from a 3D vector and an extra number (0) for the 4th dimension:

In [8]:
Py5Vector(v1, 0)
Out[8]:
Py5Vector4D([1., 2., 3., 0.])
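One way to accept mixed inputs like Py5Vector(v1, 0) is to flatten every argument and concatenate the pieces, then check the result against an optional dim. The helper below, flatten_args, is hypothetical and only mimics the behavior shown in the cells above; py5's actual parsing may differ.

```python
import numpy as np

def flatten_args(*args, dim=None):
    """Hypothetical helper: join numbers, sequences, and arrays into one flat float array."""
    if not args:
        if dim is None:
            raise RuntimeError("pass some values or a dim parameter")
        return np.zeros(dim)  # no values given: a vector of zeros
    # ravel() turns scalars and nested inputs into 1D pieces that can be joined.
    data = np.concatenate([np.asarray(a, dtype=float).ravel() for a in args])
    if dim is not None and data.size != dim:
        raise RuntimeError(f"dim parameter is {dim} but values imply dimension of {data.size}")
    return data

print(flatten_args([1, 2, 3], 0))  # 4 values, like Py5Vector(v1, 0)
print(flatten_args(dim=3))         # zeros, like Py5Vector(dim=3)
```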

Sometimes you need a vector of zeros. Either of the following will work:

In [9]:
Py5Vector(dim=3)
Out[9]:
Py5Vector3D([0., 0., 0.])
In [10]:
Py5Vector3D()
Out[10]:
Py5Vector3D([0., 0., 0.])

There are class methods for creating random vectors. Random vectors will have a magnitude of 1.

In [11]:
Py5Vector.random(dim=4)
Out[11]:
Py5Vector4D([-0.19608574, -0.03775483,  0.67372136,  0.71149454])
In [12]:
Py5Vector4D.random()
Out[12]:
Py5Vector4D([ 0.48237797,  0.02526025, -0.81360335, -0.32360935])
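A standard way to generate random unit vectors is to sample each component from a normal distribution and then normalize. Because the multivariate normal distribution is rotationally symmetric, every direction is equally likely. This is a sketch of the general technique, not necessarily how py5 does it; random_unit_vector is a hypothetical name.

```python
import numpy as np

def random_unit_vector(dim, rng=None):
    """Sample a direction uniformly at random in `dim` dimensions."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(dim)  # isotropic Gaussian: no direction is preferred
    return v / np.linalg.norm(v)  # scale to magnitude 1

print(random_unit_vector(4))
```

Sampling each component uniformly from [-1, 1] and normalizing would be biased toward the corners of the cube, which is why the Gaussian version is the usual choice.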

There is also a from_heading() class method for creating vectors with a specific heading:

In [13]:
Py5Vector.from_heading(np.pi / 4)
Out[13]:
Py5Vector2D([0.70710678, 0.70710678])

Implementing 3D headings caused some headaches. The 3D heading calculations are consistent with Wikipedia's Spherical Coordinate System article, which is also consistent with this Coding Train video. Note that neither will give the same results as p5's fromAngles() calculations because p5 measures angles relative to different axes.

In [14]:
Py5Vector.from_heading(0.1, 0.2)
Out[14]:
Py5Vector3D([0.0978434 , 0.01983384, 0.99500417])
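Working backward from Out[14] and Out[27], the two 3D heading angles appear to be the polar angle theta, measured from the positive z-axis, and the azimuthal angle phi, measured in the xy-plane from the positive x-axis. The functions below are a sketch of that spherical coordinate convention, with hypothetical names; they reproduce the values shown above.

```python
import numpy as np

def from_heading_3d(theta, phi):
    """Unit vector from polar angle theta (from +z) and azimuth phi (from +x in the xy-plane)."""
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])

def heading_3d(v):
    """Inverse: recover (theta, phi) from a nonzero 3D vector."""
    x, y, z = v
    r = np.linalg.norm(v)
    return np.arccos(z / r), np.arctan2(y, x)

print(from_heading_3d(0.1, 0.2))          # same values as Out[14]
print(heading_3d(np.array([10, 20, 33])))  # same values as Out[27]
```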

Implementing 4D headings caused even bigger headaches. The 4D heading calculations are consistent with Wikipedia's N-Sphere article.

In [15]:
Py5Vector.from_heading(0.1, 0.2, 0.3)
Out[15]:
Py5Vector4D([0.99500417, 0.0978434 , 0.01894799, 0.0058613 ])
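The n-sphere formula builds each coordinate from a running product of sines followed by one cosine, with the last coordinate using only sines. The sketch below, from_heading_nd, is a hypothetical name; with three angles it reproduces Out[15].

```python
import numpy as np

def from_heading_nd(*angles):
    """Unit vector in len(angles) + 1 dimensions from n-sphere angles."""
    coords = []
    sin_product = 1.0  # running product sin(a1) * ... * sin(ak)
    for a in angles:
        coords.append(sin_product * np.cos(a))
        sin_product *= np.sin(a)
    coords.append(sin_product)  # the final coordinate uses only sines
    return np.array(coords)

print(from_heading_nd(0.1, 0.2, 0.3))  # same values as Out[15]
```

With two angles this general formula gives (cos a1, sin a1 cos a2, sin a1 sin a2), putting the cosine term in x, whereas the 3D spherical convention puts it in z. The component ordering is a convention choice.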

Data Types

Like numpy arrays, Py5Vector instances have a data type (dtype).

In [16]:
v1
Out[16]:
Py5Vector3D([1., 2., 3.])
In [17]:
v1.dtype
Out[17]:
dtype('float64')

On 64-bit computers, vectors default to 64-bit floating point numbers.

If you like, you can specify a floating point data type of a different size. Only floating point types are allowed.

In [18]:
v2 = Py5Vector(1, 3, 5, dtype=np.float16)

v2
Out[18]:
Py5Vector3D([1., 3., 5.], dtype=float16)

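Much like numpy arrays, math operations follow numpy's type promotion rules, so combining a float16 vector with a float128 vector gives a float128 result. (Note that np.float128 is not available on every platform; Windows, for example, does not provide it.)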

In [19]:
v3 = Py5Vector(0.1, 0.2, 0.3, dtype=np.float128)

v2 + v3
Out[19]:
Py5Vector3D([1.1, 3.2, 5.3], dtype=float128)
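The promotion here is numpy's own behavior, which can be checked directly on plain arrays. This sketch uses float64 instead of float128 so it runs on every platform.

```python
import numpy as np

# Mixed-precision arithmetic promotes to the wider floating point type.
a = np.array([1, 3, 5], dtype=np.float16)
b = np.array([0.1, 0.2, 0.3], dtype=np.float64)
print((a + b).dtype)                           # float64

# np.result_type reports the promoted type without doing any arithmetic.
print(np.result_type(np.float16, np.float32))  # float32
```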

Accessing Vector Data

You can access the vector's data using array indexing or vector properties.

In [20]:
v1 = Py5Vector(1, 2, 3)

v1
Out[20]:
Py5Vector3D([1., 2., 3.])
In [21]:
v1.x, v1.y, v1.z
Out[21]:
(1.0, 2.0, 3.0)
In [22]:
v1[0], v1[1], v1[2]
Out[22]:
(1.0, 2.0, 3.0)

A 2D vector does not have the third z attribute. A 4D vector has a fourth w attribute.

All of these support assignment, including in-place operations.

In [23]:
v1.x = 10
v1[1] = 20
v1.z += 30

v1
Out[23]:
Py5Vector3D([10., 20., 33.])

The vector also has mag and heading properties, which can be read and assigned the same way:

In [24]:
v1.mag
Out[24]:
39.8622628559895
In [25]:
v1.mag = 3

v1
Out[25]:
Py5Vector3D([0.7525915 , 1.50518299, 2.48355193])
In [26]:
v1.mag
Out[26]:
3.0
In [27]:
v1.heading
Out[27]:
(0.5955311914005236, 1.1071487177940904)
In [28]:
v1.heading = np.pi / 4, np.pi / 4

v1
Out[28]:
Py5Vector3D([1.5       , 1.5       , 2.12132034])
In [29]:
v1.heading
Out[29]:
(0.7853981633974482, 0.7853981633974483)
In [30]:
v1.mag
Out[30]:
3.0
In [31]:
v1.mag_sq
Out[31]:
9.0
In [32]:
v1.mag_sq = 100

v1.mag
Out[32]:
10.0
In [33]:
v1
Out[33]:
Py5Vector3D([5.        , 5.        , 7.07106781])
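Setting a magnitude amounts to rescaling the vector by new_mag / current_mag, and setting the squared magnitude is the same as setting the magnitude to its square root. Here is that arithmetic with plain numpy, using hypothetical helper names; it reproduces the step from Out[28] to Out[33].

```python
import numpy as np

def with_mag(v, new_mag):
    """Return v rescaled so its magnitude is new_mag (v must be nonzero)."""
    return v * (new_mag / np.linalg.norm(v))

def with_mag_sq(v, new_mag_sq):
    """Setting mag_sq to m is the same as setting mag to sqrt(m)."""
    return with_mag(v, np.sqrt(new_mag_sq))

v = np.array([1.5, 1.5, 2.12132034])  # the vector from Out[28], magnitude 3
print(with_mag_sq(v, 100))            # scaled by 10 / 3, matching Out[33]
```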

There are also methods like set_mag(), set_mag_sq(), set_limit(), and set_heading() if you don't want to use the properties. Each of these will modify the vector in place and will return the vector itself to support method chaining.

In [34]:
v1 = Py5Vector.random(dim=3)

v1.set_mag(5)
Out[34]:
Py5Vector3D([-3.90580284,  2.76823014, -1.44277721])
In [35]:
v1
Out[35]:
Py5Vector3D([-3.90580284,  2.76823014, -1.44277721])
In [94]:
v1.set_mag(2).set_heading(np.pi / 4, np.pi / 4)
Out[94]:
Py5Vector3D([1.        , 1.        , 1.41421356])
In [95]:
v1
Out[95]:
Py5Vector3D([1.        , 1.        , 1.41421356])
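Method chaining works because each setter mutates the object and then returns self. Below is a sketch of that pattern with a hypothetical ChainVec class. It assumes set_limit() caps the magnitude the way Processing's PVector.limit() does; py5's exact semantics may differ.

```python
import numpy as np

class ChainVec:
    """Hypothetical minimal class showing the 'modify in place, return self' pattern."""

    def __init__(self, *values):
        self.data = np.array(values, dtype=float)

    def set_mag(self, mag):
        self.data *= mag / np.linalg.norm(self.data)
        return self  # returning self is what makes chaining possible

    def set_limit(self, max_mag):
        # Only shrink vectors that are too long; shorter vectors are left alone.
        current = np.linalg.norm(self.data)
        if current > max_mag:
            self.data *= max_mag / current
        return self

v = ChainVec(3, 4)
v.set_mag(10).set_limit(5)  # [6, 8], then capped back to length 5
print(v.data)
```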

Use normalize() to normalize the vector in place, setting its magnitude to 1.

In [96]:
v1.mag
Out[96]:
2.0
In [97]:
v1.normalize()

v1
Out[97]:
Py5Vector3D([0.5       , 0.5       , 0.70710678])
In [98]:
v1.mag
Out[98]:
1.0
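The difference between normalize() and the norm attribute is the usual in-place versus new-object distinction. The numpy sketch below shows both, with hypothetical helper names. The post doesn't say how py5 handles a zero vector; raising an error here is an assumption of this sketch.

```python
import numpy as np

def normalize_in_place(arr):
    """Scale arr to unit length in place, like normalize()."""
    mag = np.linalg.norm(arr)
    if mag == 0:
        raise ValueError("cannot normalize a zero vector")  # assumed behavior
    arr /= mag  # in-place division writes into the existing array
    return arr

def normalized_copy(arr):
    """Return a new unit-length array and leave arr untouched, like the norm attribute."""
    return arr / np.linalg.norm(arr)

a = np.array([5.0, 5.0, 7.07106781])
b = normalized_copy(a)  # a still has magnitude 10
normalize_in_place(a)   # now a itself has magnitude 1
```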

Each Py5Vector stores its vector data in a small numpy array. To access that, use the data attribute.

In [99]:
v1.data
Out[99]:
array([0.5       , 0.5       , 0.70710678])

You can also use the dim and dtype attributes to get the size and data type.

In [100]:
v1.dim
Out[100]:
3
In [101]:
v1.dtype
Out[101]:
dtype('float64')

Use the norm attribute to create a normalized copy of a vector:

In [102]:
v1 *= 10

v1
Out[102]:
Py5Vector3D([5.        , 5.        , 7.07106781])
In [103]:
v1.norm
Out[103]:
Py5Vector3D([0.5       , 0.5       , 0.70710678])

Use the copy attribute to create an independent copy of the vector:

In [104]:
v2 = v1.copy

v2.x = 42

v2
Out[104]:
Py5Vector3D([42.        ,  5.        ,  7.07106781])

Observe that v1 is unchanged.

In [105]:
v1
Out[105]:
Py5Vector3D([5.        , 5.        , 7.07106781])
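The same distinction applies to any mutable Python object: plain assignment such as v2 = v1 only creates a second name for the same object, while a copy gets its own memory. Here it is with plain numpy arrays.

```python
import numpy as np

a = np.array([5.0, 5.0, 7.07106781])
alias = a          # plain assignment: both names refer to the same array
clone = a.copy()   # an independent array with its own memory

alias[0] = 42      # changes a too
clone[1] = -1      # leaves a alone
print(a)
```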

Swizzling

Vector swizzling is a useful feature inspired by the vector types in OpenGL's shading language (GLSL). The basic idea is that you can compose new vectors by rearranging the components of other vectors. For example:

In [106]:
v1 = Py5Vector(1, 2, 3)

v1
Out[106]:
Py5Vector3D([1., 2., 3.])
In [107]:
v1.yx
Out[107]:
Py5Vector2D([2., 1.])
In [108]:
v1.xyzz
Out[108]:
Py5Vector4D([1., 2., 3., 3.])

Swizzles support assignment. You can assign constants as well as properly sized numpy arrays, Py5Vectors, and iterables.

In [109]:
v1.xy = 10, 20

v1
Out[109]:
Py5Vector3D([10., 20.,  3.])
In [110]:
v1.zx += 100

v1
Out[110]:
Py5Vector3D([110.,  20., 103.])

You can use x, y, z, and w to refer to the first, second, third, and fourth components. A "swizzle" can be up to 4 components in length. Using the same component multiple times is allowed when accessing data but not for assignments.
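Swizzling can be implemented with __getattr__ and __setattr__ plus numpy's fancy indexing. The SwizzleVec class below is a hypothetical sketch, not py5's implementation, but it follows the rules just described: up to four components, repeats allowed for reading, and repeats rejected for assignment.

```python
import numpy as np

class SwizzleVec:
    """Hypothetical minimal vector supporting swizzles like v.yx, v.xyzz, and v.zx = ..."""

    _index = {"x": 0, "y": 1, "z": 2, "w": 3}

    def __init__(self, *values):
        # Bypass the swizzle-aware __setattr__ for the internal data attribute.
        object.__setattr__(self, "data", np.array(values, dtype=float))

    def _indices(self, name):
        """Map a name like 'zx' to [2, 0], or return None if it isn't a valid swizzle."""
        if not 1 <= len(name) <= 4 or any(c not in self._index for c in name):
            return None
        idx = [self._index[c] for c in name]
        if max(idx) >= self.data.size:
            return None  # e.g. asking a 3D vector for w
        return idx

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        idx = self._indices(name)
        if idx is None:
            raise AttributeError(name)
        result = self.data[idx]  # fancy indexing returns a copy
        return result[0] if len(idx) == 1 else result

    def __setattr__(self, name, value):
        idx = self._indices(name)
        if idx is None:
            object.__setattr__(self, name, value)
            return
        if len(set(idx)) != len(idx):
            raise AttributeError(f"cannot assign to swizzle {name!r} with repeated components")
        self.data[idx] = value  # scalars broadcast; sequences must match in length

v = SwizzleVec(1, 2, 3)
v.xy = 10, 20
v.zx += 100  # reads [3, 10], adds 100, writes back to z and x
print(v.data)
```

Augmented assignment like v.zx += 100 works without extra code, because Python expands it into a read through __getattr__ followed by a write through __setattr__.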

Math Operations

You can do math operations on Py5Vectors. Operands can be constants or properly sized numpy arrays, Py5Vectors, or iterables.

In [111]:
v1 = Py5Vector(1, 2, 3)
v2 = Py5Vector(10, 20, 30)

v1 + 10
Out[111]:
Py5Vector3D([11., 12., 13.])
In [112]:
v1 + v2
Out[112]:
Py5Vector3D([11., 22., 33.])

Numpy array operands must be broadcastable to a shape that numpy can work with. If the operation's result is appropriate for a Py5Vector, the result will be a Py5Vector. Otherwise, it will be a numpy array. For example:

In [113]:
v1 + np.random.rand(3)
Out[113]:
Py5Vector3D([1.33601529, 2.54840901, 3.22092252])

Below, the numpy array is broadcastable because the size of the last dimension is 3, which matches the size of the vector v1. The result of the operation is a 2D array and cannot be a vector. This operation effectively adds v1 to each row of the numpy array.

In [114]:
v1 + np.random.rand(4, 3)
Out[114]:
array([[1.36492683, 2.534238  , 3.03382322],
       [1.98618679, 2.74935798, 3.80628041],
       [1.60972545, 2.29602945, 3.99954846],
       [1.11146893, 2.04949383, 3.30554232]])
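This is numpy's standard broadcasting: shapes are compared starting from the last dimension, and a missing leading dimension is treated as size 1. The plain numpy version below shows the same (3,) plus (4, 3) case.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
rows = np.zeros((4, 3))

result = v + rows                                  # (3,) broadcasts against (4, 3)
print(result.shape)                                # (4, 3): v was added to every row
print(np.broadcast_shapes(v.shape, rows.shape))    # (4, 3)
```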

Next, a 3D vector is matrix multiplied with a 3x2 array. The result of the calculation is an array with 2 elements, which will be returned as a 2D vector:

In [115]:
v1 @ np.random.rand(3, 2)
Out[115]:
Py5Vector2D([0.37575242, 2.22185454])

Note that if the operands are reversed (and the matrix shape is adjusted accordingly), the result is a numpy array, not a Py5Vector. That's because this calculation is done by numpy's matrix multiplication method, not py5's.

In [116]:
np.random.rand(2, 3) @ v1
Out[116]:
array([3.42223233, 1.87167682])

Doing a matrix multiplication with a 3x5 array creates a 5 element numpy array because py5 does not support 5D vectors:

In [117]:
v1 @ np.random.rand(3, 5)
Out[117]:
array([1.67496132, 3.05884219, 3.60909022, 2.8925475 , 4.47391309])
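The rule for when a result becomes a vector can be sketched as a small function. wrap_if_vector is hypothetical, and to stay self-contained it returns a tag instead of constructing a real Py5Vector; the assumption is that py5 applies something equivalent to the cases shown above.

```python
import numpy as np

def wrap_if_vector(result):
    """Hypothetical rule: a 1D result with 2, 3, or 4 elements becomes a vector."""
    result = np.asarray(result)
    if result.ndim == 1 and result.size in (2, 3, 4):
        return ("vector", result)  # stand-in for building a Py5Vector2D/3D/4D
    return ("array", result)

v = np.array([1.0, 2.0, 3.0])
print(wrap_if_vector(v @ np.ones((3, 2)))[0])  # vector: shape (2,)
print(wrap_if_vector(v @ np.ones((3, 5)))[0])  # array: 5D vectors aren't supported
print(wrap_if_vector(v + np.ones((4, 3)))[0])  # array: shape (4, 3)
```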

You can add or subtract Py5Vectors, like so:

In [118]:
v1 - v2
Out[118]:
Py5Vector3D([ -9., -18., -27.])

Other operations like multiplication, division, modular division, or power don't really make sense for two vectors and are not allowed. But that's ok because you can just use the vector's data attribute to access the vector's data as a numpy array. Any properly sized operation with a Py5Vector and a numpy array is always allowed:

In [119]:
v1 / v2.data
Out[119]:
Py5Vector3D([0.1, 0.1, 0.1])
In [120]:
v2 ** v1.data
Out[120]:
Py5Vector3D([1.0e+01, 4.0e+02, 2.7e+04])

You can do in-place operations on a Py5Vector:

In [121]:
v1 += v2

v1
Out[121]:
Py5Vector3D([11., 22., 33.])

In-place operations that would change the size or data type of the vector are not possible and therefore not allowed.

In [122]:
try:
    v1 += np.random.rand(4, 3)
except RuntimeError as e:
    print(e)
Unable to perform addition on a Py5Vector and a numpy array, probably because of a size mismatch. The error message is: non-broadcastable output operand with shape (3,) doesn't match the broadcast shape (4,3)
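Judging by the message above, py5 wraps numpy's own error with extra context. Plain numpy arrays refuse the same kinds of in-place operations: a result that doesn't fit the output's shape raises a ValueError, and one that can't be cast back to the output's dtype raises a TypeError subclass.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
try:
    v += np.ones((4, 3))  # the result would be (4, 3), but v can only hold 3 values
except ValueError as e:
    shape_error = e
    print("shape:", e)

ints = np.array([1, 2, 3])
try:
    ints += 0.5  # the result would be float64, which can't be written into an int array
except TypeError as e:  # numpy raises a TypeError subclass here
    type_error = e
    print("type:", e)
```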

Py5Vectors also work well with Python's built-in functions:

In [123]:
v1 = Py5Vector4D.random() - 5

v1
Out[123]:
Py5Vector4D([-4.34877488, -5.03780931, -4.95748879, -5.75674903])
In [124]:
round(v1)
Out[124]:
Py5Vector4D([-4., -5., -5., -6.])
In [125]:
abs(v1)
Out[125]:
Py5Vector4D([4.34877488, 5.03780931, 4.95748879, 5.75674903])
In [126]:
divmod(v1, [1, 2, 3, 4])
Out[126]:
(Py5Vector4D([-5., -3., -2., -2.]),
 Py5Vector4D([0.65122512, 0.96219069, 1.04251121, 2.24325097]))
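The divmod() results follow Python's floor division rules: the quotient is rounded toward negative infinity, and the remainder takes the sign of the divisor, so quotient * divisor + remainder gives back the original value. numpy's np.divmod uses the same convention, as this small example shows.

```python
import numpy as np

print(divmod(-4.5, 1))  # (-5.0, 0.5): floor, not truncation toward zero

a = np.array([-4.5, -5.0])
b = np.array([1.0, 2.0])
q, r = np.divmod(a, b)  # quotients -5 and -3, remainders 0.5 and 1.0
print(q, r)
```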

A Py5Vector will evaluate to True if it has at least one non-zero element.

In [127]:
bool(v1)
Out[127]:
True
In [128]:
v2 = Py5Vector3D()

v2
Out[128]:
Py5Vector3D([0., 0., 0.])
In [129]:
bool(v2)
Out[129]:
False
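A vector class has to define this truthiness rule explicitly, because numpy refuses to convert a multi-element array to a single bool. The sketch below shows the "any non-zero component" rule with a hypothetical helper, is_truthy.

```python
import numpy as np

def is_truthy(values):
    """True if any component is non-zero, like bool() on a Py5Vector."""
    return bool(np.any(np.asarray(values) != 0))

try:
    bool(np.array([1.0, 2.0]))
except ValueError as e:  # "The truth value of an array with more than one element is ambiguous..."
    print(e)

print(is_truthy([0.0, 0.0, 0.0]), is_truthy([0.0, -4.3, 0.0]))  # False True
```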

Other Math Functions

There is a lerp() method for doing linear interpolations between two vectors:

In [130]:
v1 = Py5Vector(10, 100)
v2 = Py5Vector(20, 200)

v1.lerp(v2, 0.1)
Out[130]:
Py5Vector2D([ 11., 110.])
In [131]:
v1.lerp(v2, 0.9)
Out[131]:
Py5Vector2D([ 19., 190.])
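Linear interpolation is a + (b - a) * t, so t = 0 gives the first vector and t = 1 gives the second. This plain numpy sketch reproduces Out[130] and Out[131].

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between a (t=0) and b (t=1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a + (b - a) * t

print(lerp([10, 100], [20, 200], 0.1))  # same values as Out[130]
print(lerp([10, 100], [20, 200], 0.9))  # same values as Out[131]
```

As written, this formula extrapolates for t outside 0 to 1; whether py5's lerp() clamps t isn't covered here.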

The dist() method calculates the distance between two vectors:

In [132]:
v1.dist(v2)
Out[132]:
100.4987562112089
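The distance is the magnitude of the difference vector. For the two vectors above, that is sqrt(10² + 100²) = sqrt(10100), as this check with plain numpy confirms.

```python
import numpy as np

a = np.array([10.0, 100.0])
b = np.array([20.0, 200.0])
dist = np.linalg.norm(b - a)  # sqrt(10**2 + 100**2) = sqrt(10100)
print(dist)
```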

The dot() method calculates the dot product of two vectors:

In [133]:
v1.dot(v2)
Out[133]:
20200.0
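The dot product is the sum of the component-wise products: 10 × 20 + 100 × 200 = 20200. Dividing by both magnitudes gives the cosine of the angle between the vectors, which is 1 here because the second vector is exactly twice the first.

```python
import numpy as np

a = np.array([10.0, 100.0])
b = np.array([20.0, 200.0])
dot = np.dot(a, b)  # 10*20 + 100*200 = 20200
cos_angle = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0: the vectors are parallel
print(dot, cos_angle)
```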

And finally, the cross() method. This one required a lot of thought. Technically the cross product is only defined for 3D vectors, but many vector implementations allow 2D vectors in cross product calculations. Unfortunately, there is little consistency in how 2D vectors are handled.

Processing always assumes that a 2D vector's z component is zero, and the calculation always returns a 3D vector. But then again, Processing has only one class for both 2D and 3D vectors, so what else can it do?

Numpy handles this differently. When calculating the cross product of a 2D and a 3D vector, numpy assumes that the 2D vector's z component is zero and proceeds accordingly. For two 2D vectors, it returns just the z value, which is sometimes called a "wedge product".

Personally, I am not a fan of assuming that the z component is zero, or of supporting 2D vectors at all, because I think it misleads people about what a cross product actually is. However, I felt that being inconsistent with np.cross() would be even more confusing, so I decided to build Py5Vector's cross implementation by following what numpy does.

In [134]:
v1 = Py5Vector3D.random()
v2 = Py5Vector3D.random()

Here is the cross product of two 3D vectors:

In [135]:
v1.cross(v2)
Out[135]:
Py5Vector3D([ 0.32218696,  0.93060695, -0.03344815])

The cross product of a 3D vector and a 2D vector:

In [136]:
v1.cross(v2.xy)
Out[136]:
Py5Vector3D([ 0.22244346,  0.6018577 , -0.03344815])

That calculation assumed that the z component was zero:

In [137]:
v1.cross(Py5Vector(v2.xy, 0))
Out[137]:
Py5Vector3D([ 0.22244346,  0.6018577 , -0.03344815])

Note that the values are the same as what np.cross() returns, which is important.

In [138]:
np.cross(v1, v2.xy)
Out[138]:
array([ 0.22244346,  0.6018577 , -0.03344815])

The cross product of two 2D vectors returns a scalar.

In [139]:
v1.xy.cross(v2.xy)
Out[139]:
-0.033448148822155965

This is also consistent with what np.cross() returns:

In [140]:
np.cross(v1.xy, v2.xy)
Out[140]:
array(-0.03344815)

I believe it would be a design mistake to throw an error or return different results than numpy.
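Both 2D conventions can be written out explicitly. NumPy 2.0 deprecated passing 2-element vectors to np.cross, so this sketch pads with z = 0 or computes the wedge product by hand instead. It also shows why all three outputs above share the same z value (-0.03344815): the z component of a cross product depends only on the x and y components of its inputs.

```python
import numpy as np

a = np.array([0.3, -0.8, 0.5])
b = np.array([0.6, 0.1, -0.2])

# 3D x 2D: pad the 2D vector with z = 0, then take an ordinary 3D cross product.
b_xy_padded = np.array([b[0], b[1], 0.0])
mixed = np.cross(a, b_xy_padded)

# 2D x 2D: the scalar "wedge product" a_x * b_y - a_y * b_x ...
wedge = a[0] * b[1] - a[1] * b[0]

# ... equals the z component of the full 3D cross product, with or without padding.
full = np.cross(a, b)
print(mixed, wedge, full[2])
```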

All numpy functions should accept Py5Vector instances as if they were any other iterable. This can be used to do calculations that the Py5Vector class does not directly support.

In [141]:
np.outer(v1, v2)
Out[141]:
array([[ 0.50532159, -0.18676422, -0.32874925],
       [-0.15331607,  0.05666482,  0.0997435 ],
       [ 0.6018577 , -0.22244346, -0.39155316]])
In [142]:
np.sin(v1)
Out[142]:
array([ 0.59004579, -0.19031434,  0.68286939])
In [143]:
np.ceil(v1)
Out[143]:
array([ 1., -0.,  1.])
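Numpy can convert any object that implements Python's sequence protocol (__len__ and __getitem__), even without the array interface. The sketch below assumes that is roughly how Py5Vector instances reach numpy functions; SeqVec is a hypothetical class. As in the cells above, the results come back as plain ndarrays.

```python
import numpy as np

class SeqVec:
    """Hypothetical vector exposing only the sequence protocol (no __array__)."""

    def __init__(self, *values):
        self._values = [float(v) for v in values]

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        return self._values[i]  # IndexError past the end stops iteration

v = SeqVec(0.5, -0.2, 1.0)
print(np.asarray(v))   # numpy copies the values into a new array
print(np.sin(v))       # ufuncs convert the sequence first, then return an ndarray
print(np.outer(v, v))  # works with non-ufunc numpy functions too
```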

Open Questions

Numpy's Array Interface

I made a deliberate choice not to implement numpy's array interface. Although I do believe that numpy's array interface and memory sharing are good features, I think they could be confusing to beginners, because memory sharing might lead to unexpected results. However, I don't fully understand the array interface and might be wrong about leaving it out.

In any case, note that if you pass the Py5Vector constructor a numpy array, you can explicitly tell it not to make its own copy of the data with the copy parameter.

In [144]:
numbers = np.random.rand(2, 3)

numbers
Out[144]:
array([[0.52302751, 0.49406047, 0.41204369],
       [0.05374364, 0.09111643, 0.36880163]])
In [145]:
v1 = Py5Vector(numbers[0], copy=False)
v2 = Py5Vector(numbers[1], copy=False)
In [146]:
v1
Out[146]:
Py5Vector3D([0.52302751, 0.49406047, 0.41204369])
In [147]:
v2
Out[147]:
Py5Vector3D([0.05374364, 0.09111643, 0.36880163])

Now changes to v1 and v2 will change the numbers array, and vice versa:

In [148]:
v1.x = 10
v2.x = 20
numbers[:, 2] = 42
In [149]:
v1
Out[149]:
Py5Vector3D([10.        ,  0.49406047, 42.        ])
In [150]:
v2
Out[150]:
Py5Vector3D([20.        ,  0.09111643, 42.        ])
In [151]:
numbers
Out[151]:
array([[10.        ,  0.49406047, 42.        ],
       [20.        ,  0.09111643, 42.        ]])

With this feature and a little thought, you could build something clever with a vector field.
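The copy=False behavior relies on numpy views: indexing a single row of a 2D array gives a view that shares memory with the original, while .copy() does not. This plain numpy sketch of a tiny vector field shows the mechanism; the assumption is that Py5Vector(numbers[0], copy=False) simply keeps such a view, which matches Out[149] to Out[151].

```python
import numpy as np

# A tiny "vector field": 4 rows of 2D vectors stored in one array.
field = np.zeros((4, 2))

row = field[1]            # basic indexing returns a view, not a copy
row[:] = [3.0, 4.0]       # writing through the view updates the field
copied = field[2].copy()  # an explicit copy no longer shares memory
copied[:] = 9.0           # the field is unaffected

print(np.shares_memory(row, field), np.shares_memory(copied, field))  # True False
```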

Vector Math Operations

Previously I said:

Other operations (besides addition and subtraction) like multiplication, division, modular division, or power don't really make sense for two vectors and are not allowed. But that's ok because you can just use the vector's data attribute to access the vector's data as a numpy array.

This makes sense to me, but I am not convinced it is the right decision. Should I allow element-wise calculations between two vectors, much like what can be done with two numpy arrays?

Feedback

Questions? Comments? Please direct your feedback to the GitHub discussion thread for the Py5Vector class.