Tomáš Kuzma


Numpy

numpy is a widely used Python library for numerical computation, centered on the ndarray class, which can represent vectors, matrices and arbitrary n-dimensional tensors. Compared to nested Python lists, it’s more convenient, more memory-efficient and much faster.¹

Installation

If numpy is not included in your Python installation, you can get it by opening a command prompt and entering²:

pip3 install numpy

(In Thonny, you can get the shell with the correct paths set using Tools/Open System shell.)

Usage

Many of the classes and functions that numpy provides share names with builtins (e.g. sum, min, max), so it’s unwise to import everything; i.e. don’t do:

from numpy import *

However, it’s convenient to use an alias:

import numpy as np
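As a small illustration of why the star import is risky (a sketch, using a made-up 2×2 array): the builtin sum and np.sum share a name but behave differently on a two-dimensional array, so silently replacing one with the other can change results.

```python
import builtins
import numpy as np

A = np.array([[1, 2], [3, 4]])

# The builtin sum iterates over the first axis, adding the rows together.
row_total = builtins.sum(A)   # array([4, 6])

# np.sum adds up every element by default.
grand_total = np.sum(A)       # 10

print(row_total, grand_total)
```

With the np alias, both functions stay available and it is always clear which one is being called.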

Multidimensional arrays

The central class of numpy is ndarray, representing an n-dimensional array of numbers.³ An ndarray can be easily constructed from a Python list using np.array(…):

>>> a = [1, 2, 3, 4, 5]
>>> x = np.array(a)
>>> x
array([1, 2, 3, 4, 5])

To obtain multiple dimensions, we can use a nested list:

>>> a = [1, 2, 3, 4]
>>> b = [5, 6, 7, 8]
>>> c = [a, b]
>>> c
[[1, 2, 3, 4], [5, 6, 7, 8]]
>>> A = np.array(c)
>>> A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

(Note that ndarrays get printed nicely: row-by-row, with columns aligned.)
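The element type is picked automatically from the list contents. As a minimal sketch of the dtype= argument mentioned in the footnote (the values are made up for illustration), the type can also be requested explicitly:

```python
import numpy as np

ints = np.array([1, 2, 3])                 # integer dtype inferred from the list
floats = np.array([1, 2, 3], dtype=float)  # same values stored as floats

print(ints.dtype.kind)   # 'i' (signed integer; the exact width depends on the platform)
print(floats.dtype)      # float64
print(floats)            # [1. 2. 3.]
```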

Or we can combine multiple ndarrays with np.stack(…) (or np.array(…)):

>>> x = np.array(a)
>>> y = np.array(b)
>>> np.array([a, b])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.stack([a, b])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.array([x, y])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.stack([x, y])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

(Most numpy functions coerce arguments to ndarrays, so you can mix lists and ndarrays.)
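A short sketch of that coercion, mixing a plain list with an ndarray in one call. The axis= argument of np.stack is not covered above; it is shown here only to illustrate that the inputs can also become columns instead of rows.

```python
import numpy as np

a = [1, 2, 3, 4]             # plain Python list
y = np.array([5, 6, 7, 8])   # ndarray

rows = np.stack([a, y])          # shape (2, 4): each input becomes a row
cols = np.stack([a, y], axis=1)  # shape (4, 2): each input becomes a column

print(rows.shape, cols.shape)
print(cols)
```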

Array attributes

ndarrays have some useful attributes:

>>> A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> A.ndim   # 0=scalar, 1=vector, 2=matrix, more=tensor
2
>>> A.shape  # tuple with the length of each axis
(2, 4)
>>> A.size   # number of elements
8
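The three attributes are related to each other, which can be checked directly. This is only a sanity-check sketch on the same matrix A as above:

```python
import math
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

assert A.ndim == len(A.shape)        # one shape entry per axis
assert A.size == math.prod(A.shape)  # 2 * 4 == 8 elements

# len() only counts along the first axis, so it differs from size here.
print(len(A), A.size)   # 2 8
```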

Array indexing

ndarrays support multi-axis indexing and “advanced” indexing.
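A minimal sketch of both kinds of indexing on the matrix A used earlier. The variable names are chosen only for this illustration:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# Multi-axis indexing: one index (or slice) per axis, separated by commas.
element = A[1, 2]    # row 1, column 2 -> 7
column = A[:, 1]     # every row, column 1 -> [2, 6]
block = A[0, 1:3]    # row 0, columns 1 and 2 -> [2, 3]

# "Advanced" indexing: index with lists/arrays of integers or booleans.
picked = A[[0, 1], [3, 0]]   # elements (0, 3) and (1, 0) -> [4, 5]
large = A[A > 5]             # boolean mask -> [6, 7, 8]

print(element, column, block, picked, large)
```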

Loading and saving

numpy also has extensive support for loading/saving numeric data from/to files in different formats.

Text files

The simplest method is loadtxt(…), which can load a matrix of numbers from a text file. The basic format stores an \(m \times n\) matrix as a file with \(m\) lines, each of them containing \(n\) numbers separated by any number of tabs and spaces. For example, the file iris.dat:

0.51 0.35 0.14 0.02 1
0.49 0.30 0.14 0.02 1
0.47 0.32 0.13 0.02 1
...

Can be loaded simply as:

>>> data = np.loadtxt('iris.dat')
>>> data.ndim
2
>>> data.shape
(150, 5)
>>> data
array([[ 0.51,  0.35,  0.14,  0.02,  1.  ],
       [ 0.49,  0.3 ,  0.14,  0.02,  1.  ],
       [ 0.47,  0.32,  0.13,  0.02,  1.  ],
       ...
       [ 0.65,  0.3 ,  0.52,  0.2 ,  3.  ],
       [ 0.62,  0.34,  0.54,  0.23,  3.  ],
       [ 0.59,  0.3 ,  0.51,  0.18,  3.  ]])

If the values are separated by a tab character, the format is called tab-separated values (.tsv). Possibly the most common format separates the values with commas, and so it’s called comma-separated values (.csv). Unfortunately, loading iris.csv results in an error:

>>> np.loadtxt('iris.csv')
...
ValueError: could not convert string to float: b'0.51,0.35,0.14,0.02,1'

loadtxt is trying to interpret the whole line as a single number and failing. For it to work, we need to specify that the separator, or delimiter, is a comma:

>>> data = np.loadtxt('iris.csv', delimiter=',')
>>> data
array([[ 0.51,  0.35,  0.14,  0.02,  1.  ],
       [ 0.49,  0.3 ,  0.14,  0.02,  1.  ],
       [ 0.47,  0.32,  0.13,  0.02,  1.  ],
       ...
       [ 0.65,  0.3 ,  0.52,  0.2 ,  3.  ],
       [ 0.62,  0.34,  0.54,  0.23,  3.  ],
       [ 0.59,  0.3 ,  0.51,  0.18,  3.  ]])
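To try this without the actual file, loadtxt also accepts a file-like object. The sketch below uses three of the rows shown above, held in memory, and then splits the result with slicing. It assumes that the first four columns are measurements and the last one is a class label, which the data suggests but the text does not state.

```python
import io
import numpy as np

# Three rows in the same layout as iris.csv, held in memory instead of a file.
csv_text = "0.51,0.35,0.14,0.02,1\n0.49,0.30,0.14,0.02,1\n0.65,0.30,0.52,0.20,3\n"

data = np.loadtxt(io.StringIO(csv_text), delimiter=',')

# Assumption: four measurement columns followed by one class-label column.
features = data[:, :4]
labels = data[:, 4].astype(int)

print(features.shape)   # (3, 4)
print(labels)           # [1 1 3]
```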

Binary files

The relevant functions are load, save, savez and savez_compressed.
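A minimal sketch of how these functions fit together, writing into a temporary directory so that nothing is left behind. The file names and the array x are made up for this example.

```python
import os
import tempfile
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
x = np.array([0.5, 1.5])

with tempfile.TemporaryDirectory() as tmp:
    # save/load: one array per .npy file
    npy_path = os.path.join(tmp, "A.npy")
    np.save(npy_path, A)
    A_loaded = np.load(npy_path)

    # savez: several named arrays in one .npz archive
    # (savez_compressed takes the same arguments but compresses the archive)
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, A=A, x=x)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        x_loaded = archive["x"]

print(np.array_equal(A, A_loaded), names, x_loaded)
```

Unlike the text formats, the binary files preserve the exact dtype and shape of each array.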

¹ Python, being a very high-level programming language, allows a very natural and concise expression of ideas, but pays a heavy price for it by being among the slowest programming languages in use.
² This only works if Python and pip are installed, and the environment is correctly set up, i.e. rarely.
³ Various numeric types, and even some non-numeric ones, are supported through dtype= arguments.