Python > Numpy Basics
(See also the official tutorial.)
Numpy
numpy is a widely used Python library for numerical computation, centered on the ndarray class, which can represent vectors, matrices and arbitrary n-dimensional tensors. Compared to nested Python lists, it is more convenient, more memory-efficient and much faster.[1]
Installation
If numpy is not included in your Python installation, you can get it by opening a command prompt and entering:[2]
pip3 install numpy
(In Thonny, you can get the shell with the correct paths set using Tools/Open System shell.)
Usage
Many of the classes and functions that numpy provides share names with builtins, so it’s unwise to import everything; i.e. don’t do:
>>> from numpy import *
However, it’s convenient to use an alias:
>>> import numpy as np
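As a rough illustration of the name clash described above, the sketch below compares the builtin any with np.any on the same nested list. The variable names are made up for this example, and which names a star import actually rebinds may depend on the numpy version, so treat it as a demonstration of the risk rather than a precise list.

```python
import numpy as np

nested = [[0, 0], [0, 0]]

# The builtin any() only looks at the items of the outer list.
# Each inner list is non-empty and therefore truthy.
builtin_result = any(nested)

# np.any() first converts its argument to an ndarray and then
# tests every individual element, all of which are zero here.
numpy_result = bool(np.any(nested))

print(builtin_result, numpy_result)  # True False
```

With the np alias both versions stay available, and it is always clear which one is being called.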
Multidimensional arrays
The central class for the numpy functionality is ndarray, representing an n-dimensional array of numbers.[3] ndarrays can be easily constructed from Python lists using np.array(…):
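A minimal sketch of such a construction, assuming the np alias from the Usage section. The names v and w are only illustrative, and the dtype= argument is the one mentioned in the footnote on numeric types.

```python
import numpy as np

# Build a one-dimensional ndarray (a vector) from a plain Python list.
v = np.array([1, 2, 3, 4])
print(v.shape)            # (4,)
print(type(v).__name__)   # ndarray

# An explicit dtype= requests a specific element type.
w = np.array([1, 2, 3, 4], dtype=float)
print(w.dtype)            # float64
```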
To obtain multiple dimensions, we can use a nested list:
>>> a = [1, 2, 3, 4]
>>> b = [5, 6, 7, 8]
>>> c = [a, b]
>>> c
[[1, 2, 3, 4], [5, 6, 7, 8]]
>>> A = np.array(c)
>>> A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
(Note that ndarrays get printed nicely: row by row, with columns aligned.)
Or we can combine multiple ndarrays with np.stack(…) (or np.array(…)):
>>> x = np.array(a)
>>> y = np.array(b)
>>> np.array([a, b])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.stack([a, b])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.array([x, y])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> np.stack([x, y])
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
(Most numpy functions coerce arguments to ndarrays, so you can mix lists and ndarrays.)
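To make the mixing concrete, here is a small sketch that stacks a plain list together with an ndarray. It also uses the axis= parameter of np.stack, which the text above does not cover, to place the inputs side by side as columns instead of rows.

```python
import numpy as np

a = [1, 2, 3, 4]             # a plain Python list
y = np.array([5, 6, 7, 8])   # an ndarray

# np.stack coerces the list to an ndarray before combining.
rows = np.stack([a, y])
print(rows.shape)   # (2, 4)

# With axis=1 the inputs become the columns of the result.
cols = np.stack([a, y], axis=1)
print(cols.shape)   # (4, 2)
```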
Array attributes
ndarrays have some useful attributes:
>>> A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> A.ndim # 0=scalar, 1=vector, 2=matrix, more=tensor
2
>>> A.shape # tuple with the length of each axis
(2, 4)
>>> A.size # number of elements
8
Array indexing
multi-axis indexing, “advanced” indexing
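The two topics named here can be sketched as follows, reusing the 2×4 matrix A from the earlier examples. This is only a short illustration under that assumption, not a full treatment of numpy indexing.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# Multi-axis indexing: one index or slice per axis, separated by commas.
elem = A[1, 2]       # row 1, column 2
col = A[:, 1]        # every row, column 1
block = A[:, 1:3]    # every row, columns 1 and 2

# "Advanced" indexing: index with integer lists or boolean masks.
picked = A[[0, 1], [3, 0]]   # the elements at (0, 3) and (1, 0)
big = A[A > 4]               # all elements greater than 4, flattened

print(elem, col, picked, big)
```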
Loading and saving
numpy also has extensive support for loading and saving numeric data from and to files in different formats.
Text files
The simplest method is loadtxt(…), which can load a matrix of numbers from a text file. The basic format stores an \(m \times n\) matrix as a file with \(m\) lines, each of them containing \(n\) numbers separated by any number of tabs and spaces. For example, the file iris.dat:
0.51 0.35 0.14 0.02 1
0.49 0.30 0.14 0.02 1
0.47 0.32 0.13 0.02 1
...
It can be loaded simply as:
>>> data = np.loadtxt('iris.dat')
>>> data.ndim
2
>>> data.shape
(150, 5)
>>> data
array([[ 0.51, 0.35, 0.14, 0.02, 1. ],
       [ 0.49, 0.3 , 0.14, 0.02, 1. ],
       [ 0.47, 0.32, 0.13, 0.02, 1. ],
       ...
       [ 0.65, 0.3 , 0.52, 0.2 , 3. ],
       [ 0.62, 0.34, 0.54, 0.23, 3. ],
       [ 0.59, 0.3 , 0.51, 0.18, 3. ]])
If the values are separated by a tab character, the format is called tab-separated values (.tsv). Possibly the most common format separates the values with commas, and so it’s called comma-separated values (.csv). Unfortunately, loading iris.csv results in an error:
>>> np.loadtxt('iris.csv')
...
ValueError: could not convert string to float: b'0.51,0.35,0.14,0.02,1'
By default, loadtxt splits each line on whitespace, so it tries to interpret the whole line as a single number and fails. For it to work, we need to specify that the separator, or delimiter, is a comma:
>>> data = np.loadtxt('iris.csv', delimiter=',')
>>> data
array([[ 0.51, 0.35, 0.14, 0.02, 1. ],
       [ 0.49, 0.3 , 0.14, 0.02, 1. ],
       [ 0.47, 0.32, 0.13, 0.02, 1. ],
       ...
       [ 0.65, 0.3 , 0.52, 0.2 , 3. ],
       [ 0.62, 0.34, 0.54, 0.23, 3. ],
       [ 0.59, 0.3 , 0.51, 0.18, 3. ]])
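For experimenting without the data files at hand, the following self-contained sketch feeds the first three rows shown above to loadtxt through an in-memory io.StringIO object, which loadtxt accepts in place of a file name. Splitting the last column off as integer labels is an assumption made for illustration; the text does not say what that column means.

```python
import io
import numpy as np

# The first three rows of the example data, written as comma-separated text.
csv_text = (
    "0.51,0.35,0.14,0.02,1\n"
    "0.49,0.30,0.14,0.02,1\n"
    "0.47,0.32,0.13,0.02,1\n"
)

# loadtxt accepts file-like objects as well as file names.
data = np.loadtxt(io.StringIO(csv_text), delimiter=',')
print(data.shape)   # (3, 5)

# Assumed layout: four measurement columns followed by a numeric label.
features = data[:, :4]
labels = data[:, 4].astype(int)
print(features.shape, labels)
```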
Binary files
load, save, savez, savez_compressed
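A hedged sketch of how the functions listed here fit together: np.save writes a single array to a .npy file, np.savez_compressed writes several named arrays to one compressed .npz archive (np.savez does the same without compression), and np.load reads both kinds back. The file names and the temporary directory are placeholders chosen for this example.

```python
import os
import tempfile
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
B = np.array([0.5, 1.5])

with tempfile.TemporaryDirectory() as tmp:
    # A single array goes into a .npy file.
    npy_path = os.path.join(tmp, "A.npy")
    np.save(npy_path, A)
    A_loaded = np.load(npy_path)

    # Several arrays go into one .npz archive, stored under keyword names.
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez_compressed(npz_path, A=A, B=B)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        B_loaded = archive["B"]

print(np.array_equal(A_loaded, A), names)
```

Unlike the text formats, the binary formats preserve the dtype and shape of each array exactly.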
[1] Python, being a very high-level programming language, allows a very natural and concise expression of ideas, but pays a heavy price for it in being among the slowest programming languages in use.
[2] This only works if Python and pip are installed and the environment is correctly set up, i.e. rarely.
[3] Various numeric types, and even some non-numeric ones, are supported through dtype= arguments.