NumPy By Example

A Beginner's Guide to Learning NumPy (updated for NumPy 2) (revised for NumPy 2)

DAT Linux

This book is available at https://leanpub.com/numpybyexample

This version was published on 2025-03-18

publisher's logo

*   *   *   *   *

*   *   *   *   *

© 2025 DAT Linux

Table of Contents

  1. Preface to the 2025 edition
  2. About the Book
  3. About DAT Linux
  4. Chapter 1. Introduction
    1. 1.1  What is NumPy?
    2. 1.2  Why NumPy is Important
    3. 1.3  NumPy as a Base or Integration Library
    4. 1.4  Why Learn NumPy?
  5. Chapter 2. Getting Started
    1. 2.1  Python 3
    2. 2.2  Using pip
    3. 2.3  Installing NumPy
    4. 2.4  Execution Environment
    5. 2.5  NumPy First Steps
  6. Chapter 3. Array Basics
    1. 3.1  Dimensions & Axes
    2. 3.2  Data Types
    3. 3.3  Compound Types
    4. 3.4  Mathematical Constants
    5. 3.5  Exercises
  7. Chapter 4. Array Creation
    1. 4.1  Create Arrays using Python Lists
    2. 4.2  Create Empty Arrays
    3. 4.3  Create Arrays Filled with Preferred Values
    4. 4.4  Create Arrays Filled with Incremental Sequences
    5. 4.5  Create Arrays Filled with Random Values using numpy.random
    6. 4.6  Array-like objects
    7. 4.7  Create Arrays from Other Arrays or Array-like Objects
    8. 4.8  Creating Common Matrices (2-D arrays)
    9. 4.9  Structured Arrays
    10. 4.10  Record arrays
    11. 4.11  Other Ways to Create Arrays
    12. 4.13  Exercises
  8. Chapter 5. Array Inspection
    1. 5.1  Shape & Size Information
    2. 5.2  Truth Evaluation
    3. 5.3  Type Properties
    4. 5.4  String Representation
    5. 5.5  Exercises
  9. Chapter 6. Input & Output
    1. 6.1  Persisting & Loading a Single Array
    2. 6.2  Persisting & Loading Multiple Arrays
    3. 6.3  Write Data to a CSV (text) File
    4. 6.4  A Note About File Paths
    5. 6.5  Exercises
  10. Chapter 7. Array Selection & Modification
    1. 7.1  Common Indexing & Slicing: 1-D Arrays
    2. 7.2  Common Indexing & Slicing: n-D Arrays
    3. 7.3  Fancy Indexing
    4. 7.4  Exercises
  11. Chapter 8. Array Computation
    1. 8.1  Unary Ufuncs — Operating on a Single Array
    2. 8.2  Binary Ufuncs
    3. 8.3  Broadcasting — Binary Operations with Arrays of Dissimilar Dimension
    4. 8.4  Matrix Operations
    5. 8.5  Set Operations
    6. 8.6  Other Logic Operations
    7. 8.7  Statistical Operations
    8. 8.8  Exercises
  12. Chapter 9. Array Transformation
    1. 9.1  Transposing
    2. 9.2  Reshaping
    3. 9.3  Flattening
    4. 9.4  Rotating
    5. 9.5  Combining & Splitting
    6. 9.6  Sorting
    7. 9.7  Exercises
  13. Chapter 10. String Arrays
    1. 10.1 Common String Processing Operations
    2. 10.2 Exercises
  14. Appendix A. Virtual Environments & Containers
    1. A.1  virtualenv
    2. A.2  Docker Containers
  15. Appendix B. Python Lists & array Vs NumPy Arrays
    1. B.1  Python Lists
    2. B.2  The array Array
    3. B.3  The Case for (or Against) NumPy Arrays
  16. Appendix C. NumPy Function & Property Reference
    1. C.1  numpy
    2. C.2  numpy.linalg
    3. C.3  numpy.fft
    4. C.4  numpy.random
    5. C.5  numpy.polynomial
    6. C.6  numpy.strings
  17. Appendix D. Solutions to Exercises
    1. D.1  Chapter 3
    2. D.2  Chapter 4
    3. D.3  Chapter 5
    4. D.4  Chapter 6
    5. D.5  Chapter 7
    6. D.6  Chapter 8
    7. D.7  Chapter 9
    8. D.8  Chapter 10

Guide

  1. Cover
NumPy logo (c) NumPy Developers.
Figure 1. NumPy logo (c) NumPy Developers.

Preface to the 2025 edition

NumPy 2 is the biggest major update to NumPy in almost a decade. And whilst many of the updates in the new major and minor releases to date are related to re-organisation and efficiency improvements, there are also a considerable number of removals, deprecations, and additions. With this book being an introductory level guide, the impacts were manageable, and revisions were made to bring the text up to date where relevant; but also to bring focus to important changes such as the new numpy.strings module - to which an additional chapter is dedicated.

Summary of revisions and updates:

  • Dropped features removed
  • Deprecated features marked or replaced
  • Examples tested on NumPy version 2.2.3, some code changes for suitability
  • Clean up of data types table with new, removed types accounted for
  • New chapter on string arrays
  • Revised and updated the reference appendix, with a new numpy.strings section
  • Some minor revisions to grammar and prose throughout.

For a fuller insight into the Numpy 2.x changes, the reader is encouraged to at least look over the Highlights sections of each 2.x release, starting with the NumPy 2.0 release notes.

About the Book

This book started out as a set of learning notes, which later turned into an (admittedly oversized) cheat-sheet of sorts. Converting those notes into a book format was a fairly natural step, as they were designed to introduce NumPy features in a structured way, with each chapter building on your knowledge incrementally.

You’ll soon discover that NumPy is about a few core concepts:

  • Arrays

  • Their types

  • The operations you perform on them.

The material is suitable for the NumPy beginner, although some basic programming knowledge — preferably Python — is assumed. It’s also expected that you have at least a rudimentary understanding of running shell commands from a terminal on your system of choice.

Many courses, especially in data science, have modules or bridging lessons for learning NumPy as a prerequisite. This book would certainly be suitable as an assigned text, or support material for this kind of introductory course. Chapters are concluded with a number of exercises (starting from chapter 3) to challenge the student. Solutions to the exercises are given in Appendix D. Solutions to Exercises.

The examples are concise (easy to reproduce by hand), designed to aid comprehension, and to capture the basics of a procedure or function that’s being introduced. We all have different learning styles, but with programming the goal is to write code, and code examples help express what a procedure does (programmers often say “the code is the ultimate documentation”). When you see the input, the code, and the output at once, then the purpose of a function is better revealed. “See also” sections will point the reader to functions that are related to the current topic, but not covered in detail.

We recommend reading the book from beginning to end for the first read. Even if you have some exposure to NumPy, you may still pick up some useful tidbits to fill the gaps in your knowledge.

As a note of caution, the code examples were written in an oversimplified way to assist learning, and should not be considered best practice in terms of code craft. Variables, for example, should always have meaningful names, and code generally adhere to a preferred style — or, as they call it in Python circles, “Pythonic”1. Some output has been formatted to improve readability, and may display a little differently to the format you see when running the example code yourself.

If you come across any issues with spelling, grammar, or code in the edition you’re reading, please send an email2. Errata will be added to a page on the companion website.

About DAT Linux

DAT Linux3 is a Linux distribution for data science. It brings together all your favourite open-source data science tools and apps into a ready-to-run desktop environment. It’s based on Ubuntu, so it’s easy to install and use. The custom DAT Linux Control Panel provides a centralised one-stop-shop for running and managing dozens of data science programs.


  1. Code Style. Real Python. https://docs.python-guide.org/writing/style/↩︎

  2. Errata email: info@datlinux.com↩︎

  3. DAT Linux: https://datlinux.com↩︎

Chapter 1. Introduction


“Commitment is an act, not a word.”

- Jean-Paul Sartre


Learning something new, especially in the applied sciences, can take a great deal of commitment. The goal of this introductory chapter is to help you understand what NumPy is, why it has become so important for data computation, and why taking time to learn and practice it can be a rewarding endeavour.

1.1  What is NumPy?

NumPy is a programming language library for scientific computing. It provides support for working with small or large multidimensional arrays & matrices, along with a host of mathematical functions to operate on them efficiently.

NumPy’s versatility and utility make it essential for many applications where numerical computing is required. Its widespread adoption and integration with other libraries means it has become an indispensable software tool across many scientific disciplines.

Although the Num is short for Numerical, NumPy is in fact pronounced: “Num” (as in number) — “Py” (as in python).

1.2  Why NumPy is Important

NumPy is in use across a wide range of scientific fields, some of these include:

Field Uses
Data Science & Analytics Data manipulation, cleaning, transformation, and analysis.
Machine Learning & Artificial Intelligence Implementing algorithms for model training and inference.
Scientific Computing Solving equations, performing Fourier analysis, signal processing.
Engineering Simulation and modelling.
Finance & Economics Analysing financial data, modelling economic systems, algorithms for forecasting and risk management.
Image & Signal Processing Image manipulation, filtering, and analysis.
Bio-informatics & Computational Biology Analysing genomic data, modelling biological systems, algorithms for sequence analysis.
Academic Research Frameworks for analysing research data.

1.3  NumPy as a Base or Integration Library

NumPy integrates with, or serves as a foundation library upon which many other scientific libraries are built. Here are a few examples:

Pandas

The central feature of Pandas is the tabular data-frame data structure. It adds functionality on top of NumPy arrays such as labelled indexing, time-series analysis, and data manipulation routines (querying, filtering, sub-setting).1

SciPy

SciPy relies on NumPy to build functionality such as modules for optimisation, interpolation, integration, linear algebra, signal processing, and more.2

scikit-learn

scikit-learn is a popular machine learning library. NumPy’s efficient array operations and memory management makes it well-suited for large-scale computations required in machine learning tasks such as regression and classification.3

Matplotlib and Seaborn

Matplotlib is a plotting library in Python that works with NumPy arrays to create static, interactive, and animated visualisations.4 Seaborn, another data visualisation library, also makes use of NumPy arrays for data input to plots.5

TensorFlow and PyTorch

NumPy serves as a bridge between deep learning frameworks like TensorFlow6 and PyTorch7 and the rest of the Python scientific computing ecosystem. These frameworks allow conversion between NumPy arrays and their own data objects.

PyArrow

PyArrow is the Python interface to the Apache Arrow columnar data format; a specialised data format for fast in-memory analytics. It integrates seamlessly with NumPy arrays.8

Statsmodels

Statsmodels is a library for statistical modelling and hypothesis testing in Python. It relies on NumPy arrays for data representation and manipulation, allowing users to perform various statistical analyses such as regression, time series analysis, and hypothesis testing, using familiar NumPy syntax.9

OpenCV

OpenCV is a computer vision library that utilises NumPy arrays for representing and processing image and video data, implementing processing tasks such as filtering, transformation, and feature extraction.10

1.4  Why Learn NumPy?

Learning NumPy — even if you rely more on higher level libraries in your daily work or projects — can still be highly beneficial for several reasons:

Understanding core concepts

NumPy introduces fundamental concepts in numerical computing such as array and matrix manipulation, vectorised operations, and broadcasting. Understanding these concepts can help you make better choices when solving data manipulation problems.

Transferable skills

Many concepts and techniques learned in NumPy are transferable to other libraries and languages. For example, understanding array operations in NumPy can make it easier to work with similar data structures in other languages like MATLAB or R.

Foundation for advanced topics

NumPy serves as a foundation for more advanced topics in numerical computing such as linear algebra, Fourier analysis, optimisation, and signal processing. Knowledge of these foundational concepts can be valuable in various fields.

Integration with other libraries

As we’ve seen, many other libraries and frameworks in the Python ecosystem build upon or interact with NumPy. Understanding NumPy can facilitate your use of these libraries.

Industry relevance

Proficiency in NumPy is often a requirement or desirable skill for roles in data science, machine learning, scientific computing, and related fields. Familiarity with it can enhance your employ-ability.

Community support and resources

NumPy has a large and active community, so there are ample resources available for learning and troubleshooting. You gain access to a wealth of tutorials, documentation, forums, and open-source projects to support learning and implementation.

A programmer’s perspective

Understanding NumPy provides a solid foundation for expanding into more advanced topics in data analysis and scientific computing, empowering data developers to tackle the complexity of data-driven applications.

Python itself does not have built-in support for arrays of more than basic utility (i.e. lists), which is where libraries like NumPy come in. The array object provides some extra capability around strict typing (this is discussed in more detail in Appendix B - Python Lists & array Vs NumPy Arrays) but they also fall seriously short of what NumPy can do.

Arguably, NumPy has become the de facto standard for array, matrix, and numerical computation in the scientific Python community.


  1. Pandas. pydata.org. https://pandas.pydata.org/↩︎

  2. SciPy. scipy.org. https://scipy.org/↩︎

  3. Scikit-Learn. scikit-learn.org. https://scikit-learn.org/stable/↩︎

  4. Matplotlib. matplotlib.org. https://matplotlib.org/↩︎

  5. Seaborn. pydata.org. https://seaborn.pydata.org/↩︎

  6. TensorFlow. tensorflow.org. https://www.tensorflow.org/↩︎

  7. PyTorch. pytorch.org. https://pytorch.org/↩︎

  8. PyArrow. scipy.org. https://arrow.apache.org/docs/python/↩︎

  9. Statsmodels. statsmodels.org. https://www.statsmodels.org/stable/index.html↩︎

  10. OpenCV. opencv.org. https://opencv.org/↩︎

Chapter 2. Getting Started


This chapter will help you get started with using NumPy if you’re a relative beginner. You can skip any of these sections depending on how advanced you are with your system set-up.

2.1  Python 3

Before you can use NumPy you need to have the Python (version 3) interpreter installed on your system. To check if you have Python installed, run the following command from a console:

    python --version

You should see something like the following displayed:

Python 3.10.12

If instead you see an error, then Python may not be installed. Sometimes the python command is not mapped to python3, so try again with the latter command instead.

Many Linux distributions have Python pre-installed via their package manager. If yours is a point or two behind the latest version, this is perfectly fine — a good distribution will ensure security patches are updated even if the latest versions of packages are not yet in a distribution’s “stable” branch.

Otherwise, the Python downloads1 web page is where you can find Python binaries for most operating systems. Choose the latest version that is available. Once installed, try the python --version command again.

2.2  Using pip

The pip command is used to install packages from the Python Package Index2. To ensure you have pip installed, run the following from the command line:

    pip --version

You should see output similar to this:

pip 22.0.2 /usr/lib/python3/dist-packages/pip (python 3.10)

If you get an error then pip may need to be installed manually. Please follow the instructions via the pip documentation web page3 and retry the above command.

2.3  Installing NumPy

NumPy installation is very simple, just run the following from the console and wait for the package manager to complete the install process:

    pip install numpy

The package installation should not take very long, and it will automatically install any other libraries that NumPy depends on (that aren’t already installed in the Python environment). You can easily get a snapshot of all Python packages that are currently installed in this Python environment using the following command:

    pip list
Package           Version
----------------- --------------
asttokens         2.4.1
contourpy         1.2.0
cycler            0.12.1
decorator         5.1.1
fonttools         4.50.0
ipython           8.22.2
jedi              0.19.1
kiwisolver        1.4.5
matplotlib        3.8.3
matplotlib-inline 0.1.6
numpy             2.2.3
packaging         24.0
pandas            2.2.1
parso             0.8.3
...

The package numpy will be listed here.

2.4  Execution Environment

Once you have Python and NumPy ready, you need a way to write and execute your code. This section will highlight three methods of achieving this.

Running Python scripts

Python is a scripting language, which means all you need is a script — a text file containing Python code — and a Python interpreter to execute the code. It will then output the results of your program, or any errors that are preventing the code from being executed correctly. There can be more than one script file for a program, allowing you to separate code into modules4. But for simplicity, begin with a single script file, name it anything you like, for example numpy_starter.py, and write the following line of code:

    print("Hello world!")

Save your file, and open a terminal console and change into the directory of the script. To execute, simply type in the following and press [Enter]:

    python numpy_starter.py

You should see 'Hello world!' echoed to the console.

There are many online resources for learning how to run Python scripts, including: How to Run Your Python Scripts and Code5.

Interacting with IPython

IPython is a special shell environment for interacting with Python in a REPL* fashion. It’s simple to install using pip:

    pip install ipython

Once installed, run the command ipython from your console and you’ll enter the IPython interactive environment. Here you can write Python code as you would in a script and execute that code with the [Enter] key. Here is an example interaction:

$ipython
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more 
information.
IPython 8.22.2 — An enhanced Interactive Python. Type '?' 
for help.

In [1]: print("Hello world!")
Hello world!

In [2]: 

As you can see, once you run the code (it can be over more than one line) it presents the results, then prompts you for more input — and, handily, it remembers variables or functions declared during a session. To learn more about IPython, go to the IPython website6.

* REPL stands for Run-Eval-Print-Loop. See the REPL Wikipedia entry7.

Jupyter Lab / Notebooks

Jupyter originated from IPython, expanding its capabilities within interactive scientific notebooks. The application is easily installed using pip:

    pip install jupyterlab

Jupyter runs as a web application on your local machine, and can be started with the following command:

    jupyter lab

To run system commands from within an IPython shell, you need to begin the command with an !:

    !jupyter lab

Once the web application launches, you can either open an existing notebook (they have the .ipynb file suffix), or create a new one — choosing the “Python kernel” option for your new notebook. You run code from within a notebook inside code “cells”, like so:


Jupyter markdown and code cells
Figure 2.1. Jupyter markdown and code cells

To learn more about installing and using Jupyter, visit the official Quick Start Guide8.

Using either IPython or Jupyter notebooks is a more engaging and convenient way to interact with and run Python code, but please experiment with different methods and find one that best suits your style. Refer to Appendix A for instructions on setting up virtual (sand-boxed) environments, which are a safer way to execute Python code and run applications.

2.5  NumPy First Steps

Import convention

Importing the NumPy library as np is considered a standard practice.

    import numpy as np 

NumPy version

The example code in this book was prepared and tested using NumPy version:

    np.__version__
2.2.3

Get information on an object

It’s easy to get in-line help information (often with example usage) on numpy objects such as classes, arrays, and functions:

    np.info(np.add)
add(x1, x2, /, out=None, *, where=True, 
    casting='same_kind', order='K', dtype=None, 
    subok=True[, signature, extobj])

Add arguments element-wise.

Parameters
----------
x1, x2 : array_like
    The arrays to be added.
    If `x1.shape != x2.shape`, they must be broadcast-able
    to a common shape (which becomes the shape of the 
    output).
out : ndarray, None, or tuple of ndarray and None, 
      optional
    A location into which the result is stored. If 
    provided, it  must have a shape that the inputs 
    broadcast to. If not  provided or None, a freshly-
    allocated array is returned. A tuple (possible only as
    a keyword argument) must have length equal to the 
    number of outputs.

Returns
-------
add : ndarray or scalar
    The sum of `x1` and `x2`, element-wise.

Examples
--------
> np.add(1.0, 4.0)
5.0

(truncated for brevity)

numpy.add is one of many array computation functions that NumPy offers. This, and many other array operations will be highlighted throughout the book.

Search the NumPy docs

You can use Python’s built-in help() function to search for documentation related to specific modules, classes, or functions. For example:

    help('numpy.random')
    help('numpy.random.normal')

If you’re looking for all the attributes and methods available in a module or object, dir() is helpful:

    import numpy as np
    dir(np)

Setting print options

You can programmatically control the way arrays and numbers are formatted for display. Here’s an example output with default display (no explicit setting):

    a = np.random.normal(0, 10, (3,4))
    print(a)
[[  6.80427875 -11.80632522  -4.17167948 -10.96244192]
 [-18.37081627  11.81332856  16.68953163  -9.13215022]
 [ -4.69665534   9.74299173   8.83553348   6.49425207]]

Don’t worry about np.random.normal for now, it’s a way of creating random arrays. This topic will be covered in detail in Chapter 4. Array Creation.

Here’s the same array output with a (temporary) custom display setting:

    with np.printoptions(precision=3):
        print(a)
[[  6.804 -11.806  -4.172 -10.962]
 [-18.371  11.813  16.69   -9.132]
 [ -4.697   9.743   8.836   6.494]]

If you expect to do this often, it’s a good idea to create a reusable function:

def cust_print(arr):  
    with np.printoptions(precision=3):  
        print(arr)  

Then you call it when needed:

cust_print(a)

  1. Python Releases. python.org. https://www.python.org/downloads↩︎

  2. PyPI. pypi.org. https://pypi.org↩︎

  3. Pip Installation. pip developers. https://pip.pypa.io/en/stable/installation/↩︎

  4. Modules. python.org. https://docs.python.org/3/tutorial/modules.html↩︎

  5. Run Python Scripts. Real Python. https://realpython.com/run-python-scripts↩︎

  6. IPython — Interactive Computing. IPython. https://ipython.org↩︎

  7. REPL. Wikipedia. https://en.wikipedia.org/wiki/Read-eval-print_loop↩︎

  8. Jupyter Quick Start. Antonino Ingargiola; contributors. https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest↩︎

Chapter 3. Array Basics


The multidimensional array is the central data structure in NumPy, consisting of a fixed-size grid of elements of the same type. The elements of each dimension can be indexed by a tuple of positive integers starting at zero, Boolean indexes, or other arrays. Dimensions can be referenced by their axis, also indexed from zero. The NumPy array class is the ndarray — of which all NumPy arrays are an instance.

3.1  Dimensions & Axes

The number of dimensions of an array indicates its “shape”. The shape of a NumPy array (ndarray.shape) is made up of a tuple of integers — each value indicating the size along that dimension, in order of axis.

1-D array 2-D array 3-D array N-D array*
Vector Matrix Cube Tensor
shape:(4,) shape:(2,4) shape:(3,4,2) shape:(a,b,c,d,..)

* Whilst it can be difficult to visualise dimensions of 4-D or greater, multidimensional arrays (cubes1 or tensors2) are an essential data representation structure used in business intelligence, machine learning, and deep learning applications.

Elements in a NumPy array are accessed using square brackets. Don’t forget that indexing in NumPy (as with Python lists) starts at zero, for example:

    ele_1 = arr[0]

There are many powerful ways to access data from arrays in NumpPy, examples of which are covered in more detail in Chapter 7. Array Selection & Modification.

Elements are identified by order of axis, and position along that axis. Take the following 2-D array, a = np.array([[1, 2, 3], [4, 5, 6]]):

    [[1, 2, 3], 
     [4, 5, 6]]

The position of a specific element is represented as a tuple (x,y), best illustrated as follows:

    [[(0,0), (0,1), (0,2)], 
     [(1,0), (1,1), (1,2)]]

Therefore the element with value 2 is referenced using a[(0,1)] (1st row, 2nd column). Note the parentheses are optional, so arr[0,1] also works.

The indexing with tuples approach might seem peculiar compared to what you may have encountered with other languages. For example the equivalent of NumPy’s a[(0,1)] would be a[0][1] in Java. It helps to think of the NumPy approach as a way of querying the data, rather than using reference points. This will become more apparent as you learn about indexing and sub-setting with NumPy in the following chapters.

3.2  Data Types

NumPy comes with a larger set of available data types than standard Python. It defines a host of array-scalar types3, as well as aliases for some. There’s a built-in numpy.dtype object associated with each of the array-scalar types. This table includes most of the NumPy data types that you’ll encounter or need.

Name (numpy.?) Python type * Alias ^ Short code
Signed byte int8 b
integer short int16 h
intc int32 i
int_, long int64 l
longlong q
Unsigned ubyte uint8 B
integer ushort uint16 H
uintc uint32 I
uint, ulong uint64 L
ulonglong Q
Float half float16 e
single float32 f
double * float float64 d
longdouble float128 g
Complex csingle complex64 F
cdouble * complex complex128 D
clongdouble complex256 G
String bytes_ S
str_, string_ * string U, <U? ~
Other bool, bool_ * bool ?
object_ O
datetime64 * datetime.datetime M
timedelta64 * datetime.timedelta m
void V

^ Aliases displayed based on a Linux x86_64 system. An alias is referenced using np.<alias> in the same way as numpy.<name>; or declared as either dtype="<name>" or dtype="<alias>" during array creation.

* These NumPy types are directly compatible with (drop-in replacements for) the given Python type. For example dtype="double", or its alias dtype="float64", may instead be declared as dtype=float.

~ When you specify a Unicode string type such as dtype="<U10", you’re setting a fixed size of 10 characters per string. There isn’t a strict upper limit on this number, but the size of the array (in terms of system resources) will impose practical limits.

The plethora of choices with NumPy data types may leave you wondering which to use, and when. A good “rule of thumb” is to choose the largest type that’s suitable for your purposes, but no more. A data field for a person’s “age” for example should require no more than a numpy.byte (maximum value of 127). Or if you know that a numeric field will always be a whole number, and can never be negative, then using an unsigned integer may be preferred. You should also consider the potential ranges you might need to store, especially if the data you ingest varies over time - you want to be sure to future-proof your code without going overboard on the data type.


Selecting the right data type can improve array processing performance and reduce the amount of computer memory required, but it can also be difficult to find the correct balance between size and performance. Which is when testing and comparing results using realistic data and hardware can be useful to do. But keep in mind the sentiment that

“premature optimization is the root of all evil..” [Donald Knuth]

and understand that often it may not matter too much. But if and when it does, then you are at least aware of the possible influences.

Example 3-1

dtype creation with a NumPy object, and its string (alternative) variation:

    np.dtype(np.int_)
    np.dtype('long')
    np.dtype('l')

Example 3-2

dtype creation using a sized alias:

    np.dtype(np.int64)
    np.dtype('int64')
    np.dtype('i8')

Example 3-3

dtype creation using Python built-in aliases:

    np.dtype(float)
    np.dtype('float')

The following chapter on Array Creation will demonstrate how explicitly referencing or creating dtype objects can be done during the array creation process.

3.3  Compound Types

These are special user-defined data types used with structured arrays. See 4.9 Structured Arrays in the next chapter for more on this topic. The following are a few examples of how they’re created.

Example 3-4

dtype creation, each field is assigned a name and type (like a key/value pair):

    np.dtype([('name1', np.float64), ('name2', np.int32)])  

Example 3-5

dtype creation, specify names and formats separately as lists:

    np.dtype({'names': ['name1', 'name2'], 
              'formats': ['f', 'i']})

Example 3-6

dtype creation, specify formats as strings alone (where names not required):

    np.dtype('f, i')

3.4  Mathematical Constants

NumPy includes some useful predefined constants, including:

  • numpy.nan
  • numpy.inf
  • numpy.e
  • numpy.pi
  • numpy.newaxis
  • numpy.euler-gamma

Example 3-7

Here’s the output for numpy.e:

    np.e
2.718281828459045

Example 3-8

And for numpy.pi:

    np.pi
3.141592653589793

See also:

Function Description
nan IEEE 754 floating point representation of Not a Number.
inf IEEE 754 floating point representation of (positive) infinity.
newaxis A convenient alias for None, useful for indexing arrays.
euler_gamma gamma = 0.5772156649015328606065120900824024310421….

3.5  Exercises

Exercise 3-1

Create a new 64 byte dtype of float type, using a sized alias, and assign it to the variable t. Print out the variable.

Exercise 3-2

Repeat the above dtype creation, but instead using an equivalent native Python type.

Exercise 3-3

Write a Python expression that calculates the area of a circle with radius of 30mm. (Result will be in units of mm2)

Exercise 3-4

Convert the result of the area of the circle to units of cm2. Print the result.


  1. OLAP Cube. Wikipedia. https://en.wikipedia.org/wiki/OLAP_cube↩︎

  2. Tensors. W3 Schools. https://www.w3schools.com/ai/ai_tensors.asp↩︎

  3. Scalars. numpy.org. https://numpy.org/doc/stable/reference/arrays.scalars.html↩︎

Chapter 4. Array Creation


NumPy arrays (of type numpy.ndarray) can be easily constructed using the numpy function array, but there are other convenient ways of creating pre-populated arrays. The following sections outline a number of these approaches. Always consult a function’s documentation to gain a deeper understanding of its inputs, outputs, and limitations.

4.1  Create Arrays using Python Lists

The simplest way to create a NumPy array is to pass a Python list to the array function.

Example  4-1

An array’s data type is determined by the provided list’s values (integers default to numpy.int64):

    a = np.array([1, 2, 3])
    print(a)
[1 2 3]

Example 4-2

Same as the previous example, but with a preassigned list:

    d = [1, 2, 3]
    a = np.array(d)
    print(a)
[1 2 3]

Example 4-3

Array values are implicitly promoted to the largest type (in this example cast to numpy.float32):

    # 2-D array..
    a = np.array([(1.0, 2), (3, 4)])
    print(a)
[[1. 2.]
 [3. 4.]]

Example 4-4

Explicitly force the type (values are promoted to numpy.float64):

    a = np.array([(1, 2, 3)], dtype='f')
    print(a)
[[1. 2. 3.]]

Example 4-5

Explicitly force the type (demote all values to numpy.int32):

    a = np.array([(1.1, 2, 3)], dtype='int32')
    print(a)
[[1 2 3]]

Example 4-6

Assign a dtype and pass it to array():

    # Declaring a type once and re-using it is more 
    # efficient if done often..
    t = np.dtype('float')
    a = np.array([(1, 2), (3, 4)], dtype=t)
    print(a)
[[1. 2.]
 [3. 4.]]

4.2  Create Empty Arrays

If you want an array with a certain shape but need to defer filling it with values to a later time, then you can create an “empty” array.

Example 4-7

Fill an array (of this shape) with meaningless values, requiring correct initialisation later:

    a = np.empty([2, 3])
    print(a)
[[4.83367796e-310 0.00000000e+000 1.33554846e+185]
 [4.98131536e+151 4.63456076e+228 4.94065646e-322]]

You might expect an “empty” array to be filled with zeros or None (Python’s version of Null) values. np.empty should therefore be used with caution, as whilst the values it generates are meaningless, they may introduce invalid computation results in your code until populated with meaningful values.

4.3  Create Arrays Filled with Preferred Values

You can fill a newly created array with specific preferred values based on your needs.

Example 4-8

Just some zeros please:

    a = np.zeros(4)
    print(a)
[0. 0. 0. 0.]

Example 4-9

Nothing but ones:

    a = np.ones((2,2))
    print(a)
[[1. 1.]
 [1. 1.]]

Example 4-10

Fill / pad a new larger array with a smaller list:

    # First argument is shape, second contains the values 
    # to fill with..
    a = np.full((2,2), (3,4))
    print(a)
[[3 4]
 [3 4]]

4.4  Create Arrays Filled with Incremental Sequences

Creating new arrays filled with sequences is surprisingly convenient and powerful.

Example 4-11

Get a spread of values from within the range {0 < 12} in increments of 2:

    a = np.arange(0., 12, 2)
    print(a)
[ 0.  2.  4.  6.  8. 10.]

Example 4-12

Get four values equally spaced within the range {0 <= 6} (inclusive):

    a = np.linspace(0, 6, 4)
    print(a)
[0. 2. 4. 6.]

Example 4-13

Get four values spaced evenly on a base 10 log scale between {0 <= 2}:

    # Each random value ^10..
    a = np.logspace(0, 2, 4)
    print(a)
[1.  4.64158883  21.5443469  100.]

4.5  Create Arrays Filled with Random Values using numpy.random

The results you get from executing the following code will differ from the output shown, unless you run the same seed call before each execution.

Example 4-14

Use a seed to ensure the same values are reproduced, if required:

    np.random.seed(42)

42 was chosen arbitrarily, but it can be any other number of your choosing.

Example 4-15

Create a 2x1 array containing random float values in the range {0. <= 1}:

    a = np.random.random((2,1))
    print(a)
[[0.37454012]
 [0.95071431]]

Example 4-16

3x4 array of random int values in the range {–2 <= 10}:

    a = np.random.randint(-2, 10, (3,4))
    print(a)
[[ 0  3  5 -2]
 [ 0  3  3  2]
 [ 9  4  6  5]]

Example 4-17

Modify a sequence in-place by shuffling its contents (only shuffles the array along the first axis):

    np.random.shuffle(a)
    print(a)
[[ 9  4  6  5]
 [ 0  3  3  2]
 [ 0  3  5 -2]]

Example 4-18

Randomly permute a sequence:

    a = [1, 2, 3, 4, 5, 6]
    np.random.permutation(a)
    print(a)
array([5, 6, 2, 4, 3, 1])

Example 4-19

3x4 array of random, normally distributed values in the range {0. <= 10.}:

    a = np.random.normal(0, 10, (3,4))
    print(a)
[[ 3.9257975  -9.29184667  0.79831812 -1.59516502]
 [ 0.22221827 -4.27792914 -5.3181741  -1.17475502]
 [ 2.22078902 -7.67976502  1.42464602 -0.34652184]]

Example 4-20

3x4 array of random values in the standard normal distribution range:

    a = np.random.standard_normal((3,4))
    print(a)
[[ 1.13433927 -0.10474555 -0.52512285  1.91277127]
 [-2.02671962  1.11942361  0.77919263 -1.10109776]
 [ 1.13022819  0.37311891 -0.38647295 -1.15877024]]

See also:

Function * Description
numpy.random.binomial Draw samples from a binomial distribution
numpy.random.chisquare Draw samples from a chi-square distribution
numpy.random.gamma Draw samples from a Gamma distribution
numpy.random.uniform Draw samples from a uniform distribution

* “See also” short descriptions and links, courtesy of numpy.org.

4.6  Array-like objects

Array-like objects are data structures that may be used as inputs to a wide variety of NumPy array creation functions. This book will only touch on the most common ones. Consult a function’s documentation to understand its acceptable input types.

Common array-like objects include:

  • Other NumPy arrays ( of type numpy.ndarray )
  • Python scalars or lists ( 1. or [1, 2, 3] )
  • Python tuples ( (1, 2, 3) )
  • Parse-able sequences such as strings ( '1;2;3' )
  • Buffers and raw memory

4.7  Create Arrays from Other Arrays or Array-like Objects

These functions create copies of an array, therefore changes to the copy will not effect the original array. We’ll be using the array a, a 3 x 4 matrix, based on the result of the last operation:

    print(a)
[[ 1.13433927 -0.10474555 -0.52512285  1.91277127]
 [-2.02671962  1.11942361  0.77919263 -1.10109776]
 [ 1.13022819  0.37311891 -0.38647295 -1.15877024]]

Example 4-21

Create array from an array-like object (creates a copy if a is an ndarray):

    b = np.asarray(a)
    print(b)
[[ 1.13433927 -0.10474555 -0.52512285  1.91277127]
 [-2.02671962  1.11942361  0.77919263 -1.10109776]
 [ 1.13022819  0.37311891 -0.38647295 -1.15877024]]

Example 4-22

Array of ones with the same dimensions as the array-like input:

    b = np.ones_like(a)
    print(b)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Example 4-23

Array of zeros with the same dimensions as the array-like input:

    b = np.zeros_like(a)
    print(b)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Example 4-24

Array of preferred values with the same dimensions as the array-like input:

    b = np.full_like(a, 2.)
    print(b)
[[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]

Example 4-25

Empty (meaningless values) array with the same dimensions as array-like input:

    b = np.empty_like(a)
    print(b)
[[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]

Example 4-26

Cast to different type (creates a copy of the array):

    b = a.astype(np.int_)
    print(b)
[[ 1  0  0  1]
 [-2  1  0 -1]
 [ 1  0  0 -1]]

Example 4-27

Explicitly copy an array:

    b = a.copy() # OR: b = np.copy(a) 
    print(b)
[[ 1.13433927 -0.10474555 -0.52512285  1.91277127]
 [-2.02671962  1.11942361  0.77919263 -1.10109776]
 [ 1.13022819  0.37311891 -0.38647295 -1.15877024]]

Example 4-28

copy also works with sub-setting:

    b = a[1:].copy()
    print(b)
[[-2.02671962  1.11942361  0.77919263 -1.10109776]
 [ 1.13022819  0.37311891 -0.38647295 -1.15877024]]

Sub-setting is a feature of NumPy that allows you to extract precise subsets of an array. Chapter 7. Array Selection & Modification covers this topic in detail, along with other powerful ways to access array data.

Example 4-29

Create an array from an array-like string:

    s = '1,2,3.2'
    a = np.fromstring(s, sep=',')
    print(a)
[1.  2.  3.2]

4.8  Creating Common Matrices (2-D arrays)

NumPy offers a range of functions for creating matrices. Matrix operations will be covered in Chapter 8. Array Computation.

Example 4-30

Identity matrix of dimension n x n:

    a = np.identity(n=3, dtype=int)
    print(a)
[[1 0 0]
 [0 1 0]
 [0 0 1]]

Example 4-31

Matrix of suitable dimension given diagonal:

    a = np.diag([1, 2, 3])
    print(a)
[[1 0 0]
 [0 2 0]
 [0 0 3]]

Example 4-32

Identity matrix of dimension NxN shifted by a diagonal offset (k):

    a = np.eye(N=3, k=1)
    print(a)
[[0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 0.]]

Example 4-33

Generate a Vandermonde matrix:

    a = np.vander([1, 2, 3])
    print(a)
[[1 1 1]
 [4 2 1]
 [9 3 1]]

Example 4-34

Lower triangle of an array, with zeros above:

    b = np.tril(a)
    print(b)
[[1 0 0]
 [4 2 0]
 [9 3 1]]

See also:

Function Description
numpy.asmatrix Interpret the array-like input as a matrix
numpy.diagflat Create a 2-D array with the flattened input as a diagonal

4.9  Structured Arrays

Structured data types were mentioned in the previous chapter, but here we show how an array can be created using them.

Example 4-35

Structured arrays are useful for when you need to work with heterogeneous data:

    # Compound type..
    t = np.dtype([('name', 'U20'), ('age', int)]) 
    a = np.array([('Alice', 25), ('Bob', 30)], dtype=t)
    print(a)
    print(a[0])
    print(a['name'][1])
[('Alice', 25) ('Bob', 30)]

('Alice', 25)

Bob

Structured arrays are a way of implementing mixed types in NumPy. They may affect the performance and utility of your array, so use them with caution.

4.10  Record arrays

Record arrays are a special kind of structured array that permits field access using the form x.y. Unlike regular NumPy arrays (of type numpy.ndtype) record arrays are instances of numpy.recarray. Record arrays can be created explicitly, or converted from regular arrays.

Example 4-36

Create a regular array with named compound type, and convert it to a record array:

    a = np.array([(1, 2.), (3, 4.)], 
                 dtype=[('x', '<i2'), ('y', '<f2')])
    b = a.view(np.recarray)
    b.y
array([2., 4.], dtype=float16)

Confirm this is a recarray object:

    type(b)
<class 'numpy.recarray'>

Example 4-37

Create an uninitialised record array of shape (2,2) and initialise the array:

    a = np.recarray((2,2), 
                    dtype=[('x', '<i2'), ('y', '<f2')])
    a.x = [1, 2.]
    a.y = [3, 4.]
    a
rec.array([[(1, 3.), (2, 4.)],
           [(1, 3.), (2, 4.)]],
           dtype=[('x', '<i2'), 
                  ('y', '<f2')])

Notice that the data type is an instance of record:

    a.dtype
dtype((numpy.record, [('x', '<i2'), 
                      ('y', '<f2')]))

4.11  Other Ways to Create Arrays

Output is not shown for the following examples. The reader is encouraged to experiment with these functions in their Python environment of choice.

Example 4-38

Create an array from a text or binary file:

    # File f typically created with ndarray.tofile()..
    a = np.fromfile(f)

Example 4-39

Load an array from a text (commonly CSV) file:

    a = np.loadtxt(f, delimiter=',')

Example 4-40

Load an array from a Python pickle object:

    a = np.load(p)

Example 4-41

Load text into an array with tidying capabilities:

    a = np.genfromtxt(a, ...)

See also:

Function Description
numpy.asanyarray Convert input to an ndarray, except pass ndarray through
numpy.bmat Build matrix object from string, nested sequence, or array
numpy.ascontiguousarray Return a contiguous array (ndim >= 1) in memory
numpy.choose Create an array, by ele index, from a choice of arrays
numpy.fromfunction Construct an array by executing a function over each coordinate
numpy.fromiter Create a new 1-dimensional array from an iterable object
numpy.frombuffer Interpret a buffer as a 1-dimensional array
numpy.fromregex Construct an array from a text file, using regular expression parsing
numpy.geomspace Get numbers spaced evenly on a log scale (geometric progression)
numpy.mat Interpret the input as a matrix (no copy)
numpy.meshgrid Return a list of coordinate matrices from coordinate vectors
numpy.tri An array with ones at and below the given diagonal and zeros elsewhere
numpy.triu Upper triangle of an array (zeros below)

Refer to Appendix C. NumPy Function & Property Reference (or the online documentation1) for a full list of array creation functions available in the numpy module.

4.13  Exercises

The following table shows data relating to three students’ test scores (0 — 100%) over four different tests.

Student No. Test #1 Test #2 Test #3 Test #4
1 63.5 56. 68 73.5
2 53 77.5 61 83
3 59 79 67.5 70

Exercise 4-1

Create a 2-D Python list of the test scores, with the student scores as “rows” in test order. Assign this list to the variable student_scores_list. Print out this list.

Exercise 4-2

Using the list from Exercise 4-1, create a NumPy array assigned to student_scores_arr, and explicitly assign it an appropriate floating point dtype. Print out the array.

Exercise 4-3

Create a copy of student_scores_arr whilst assigning it to a new variable. Change the type of the copied array to a suitable integer. What do the scores look like now? What effect did the conversion have on the values?

Exercise 4-4

Create a new array filled with ones, of the same dimensions as student_scores_arr. Print the array. What is the dtype of this array?

Exercise 4-5

Create an identity matrix of 4x4 size, and print the result.

Exercise 4-6

Design a suitable named compound type for student_scores_arr (using sensible names without spaces), where the student id is an integer, and the scores are a floating point. Recreate the scores array to use this compound type. Print out the dtype for the array. (Hint: use tuples as rows.)

Exercise 4-7

Convert the structured array created in Exercise 4-6 to a recarray array. Print out the 2nd field (column) of the record array by name.


  1. NumPy Reference. numpy.org. https://numpy.org/doc/stable/reference↩︎

Chapter 5. Array Inspection


Inspecting a NumPy array involves examining its properties and attributes to gain a better understanding of its characteristics and contents. The following examples rely on the array defined here:

    a = np.array([(1., 2.), (3., 4.)]) 
    print(a)
[[1. 2.]
 [3. 4.]]

5.1  Shape & Size Information

Example 5-1

Get the shape of the array — a tuple indicating the length of each dimension:

    a.shape
(2, 2)

In “regular” Python scripting you would need to use print() to output an expression. However, in an IPython (including Jupyter) environment there’s no need to use print() explicitly (although you can) to see output — expressions are automatically evaluated and output to the console or adjacent to the Jupyter cell.

Example 5-2

The number of dimensions:

    a.ndim
2

Example 5-3

The total number of elements in the array:

    a.size
4

Example 5-4

Length of an element, in bytes:

    # This is directly associated with the array's type..
    a.itemsize
8

Example 5-5

Information about the memory layout of the array:

    a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False

5.2  Truth Evaluation

An array value will evaluate to True if it represents anything other than zero.

Example 5-6

Test if all elements in an array evaluate to True:

    np.all(a)
True

Example 5-7

Test if any (at least a single) element evaluates to True:

    np.any(a)
True

5.3  Type Properties

Example 5-8

The array’s data type (dtype):

    a.dtype  # OR: np.dtype(a)
dtype('float64')

Example 5-9

Name of the array’s data type:

    a.dtype.name
'float64'

Example 5-10

Character code of the data type:

    a.dtype.char
'd'

Example 5-11

Get the unique number for this data type:

    a.dtype.num
12

5.4  String Representation

Example 5-12

Get a printable string of an array’s contents:

    np.array2string(a)
'[[1. 2.]\n [3. 4.]]'

Example 5-13

Get a string of an array plus info about its type:

    a = np.array([(1, 2), (3, 4)], np.int32) 
    np.array_repr(a)
'array([[1, 2],\n [3, 4]], dtype=int32)'

See also:

Function Description
numpy.dtype.byteorder A character indicating the byte-order of a dtype object
numpy.dtype.fields Dictionary of names defined for this data type, or None
numpy.dtype.flags Bit-flags describing how this data type is to be interpreted
numpy.dtype.isbuiltin Integer indicating how this dtype relates to the built-in dtypes
numpy.dtype.isnative Boolean if the byte order of dtype is native to the platform
numpy.dtype.kind A character code (one of ‘biufcmMOSUV’) identifying the kind of data
numpy.ndarray.strides Tuple of bytes to step in each dimension when traversing an array
numpy.ndarray.nbytes Total bytes consumed by the elements of the array
numpy.nonzero Return the indices of the elements that are non-zero

5.5  Exercises

These exercises refer to the following 2-D array:

    a = np.array([(1, 2, 3), (4, 5, 6)]) 

Exercise 5-1

What is the shape of the above NumPy array? Use the array’s shape property to confirm your conclusion.

Exercise 5-2

What is the size of each element in this array, in bytes?

Exercise 5-3

Use the appropriate inspection property to find the total bytes consumed by the array. How does this compare to the multiple of the previous exercise’s result by the total count of elements?

Exercise 5-4

What is the string name of the data type of this array?

Exercise 5-5

If a third row containing the elements [7 8 9] was added to the array, what would be the number of dimensions?

Chapter 6. Input & Output


Input and output (I/O) operations in NumPy primarily involve fetching data from external sources into NumPy arrays, and saving NumPy arrays to external files. We’ve already seen input with numpy.fromfile(), numpy.loadtxt(), and numpy.load(), but persisting to, and retrieving arrays from, various file formats is also possible and simple to do.

6.1  Persisting & Loading a Single Array

Example 6-1

Persist a single array to a file in binary format:

    a = np.array([(1., 2.), (3., 4.)])
    np.save('file.npy', a)

Example 6-2

Read the file back into an array variable:

    b = np.load('file.npy')
    print(b)
[[1. 2.]
 [3. 4.]]

6.2  Persisting & Loading Multiple Arrays

You can just as easily persist (and therefore retrieve) multiple arrays.

Example 6-3

Persist multiple arrays to an archive file:

    # 'a', 'b' are keys for retrieval (can be any valid 
    # python variable name)..
    np.savez('file.npz', a=a, b=b)

Example 6-4

Load the archive back into the array variables:

    c = np.load('file.npz')
    a = c['a']
    b = c['b']
    print(a)
    print(b)
[[1. 2.]
 [3. 4.]]
    
[[1. 2.]
 [3. 4.]]

An alternative to savez is savez_compressed, which is used exactly the same way except with compression to reduce the size of the file. The data is uncompressed on load in the same way as before using np.load(), and nothing special needs to be done. Be aware that extra compress or uncompress steps can affect program performance with larger files.

6.3  Write Data to a CSV (text) File

CSV (comma separated values) is a common format for storing tabular data1 in text files.

Example 6-5

Here we see how simple it is to save an array to CSV:

    np.savetxt('file.csv', a, delimiter=',')

The contents of the CSV file will look like this:

1.,2.
3.,4.

A CSV file very often comes with a header row that represent a name for each “column”, for example:

length,width
1.,2.
3.,4.

The skiprows argument can be used to ignore the header when reading the file into an array:

    a = np.loadtxt(f, skiprows=1)
    print(a)
1.,2.
3.,4.

Instead of np.load(), read a CSV file using loadtxt() or numpy.genfromtxt() to deal with missing values. The reader is encouraged to experiment with loading CSV files, as they are one of the most common data formats for dataset storage and sharing.

6.4  A Note About File Paths

In Python, file paths can be specified as relative — to the ‘current’ directory, e.g.:

dir/file.npy

or absolute — from the ‘root’ directory, e.g.:

/home/dir/file.npy

Relative paths depend on where the application was launched (and you may not know this reliably). So if in doubt prefer absolute paths.

If you are developing a data solution that needs to run on multiple platforms, you should use the os.path.sep constant to ensure file and directory paths are system-independent:

    import os
    path = 'dir' + os.path.sep + 'file.csv'
    np.savetxt(path, a, delimiter=',')

6.5  Exercises

Exercise 6-1

Create a file called student_scores.csv and save it to your computer, adding to it the following contents:

student_no,test_1,test_2,test_3,test_3
1,63.5,56.,68,73.5   
2,53,77.5,61,83     
3,59,79,67.5,70     

Load the file into a NumPy array, excluding the header, and print the array.

Exercise 6-2

Load the array instead using the compound type:

    t = np.dtype([('student_no', 'int'), 
                  ('test_1', float), 
                  ('test_2', float), 
                  ('test_3', float)])

Print the new array.

Hint: Try using genfromtext with the names=True attribute.

Exercise 6-3

Convert the array you created in Exercise 6-2 to a recarray, and print out a list of the student numbers using the array.field notation.


  1. CSV. Wikipedia. https://en.wikipedia.org/wiki/Comma-separated_values↩︎

Chapter 7. Array Selection & Modification


Array selection in NumPy relates to the activity of locating and extracting specific elements, or sub-sets, from an array. NumPy has flexible options for selecting array data based on the principle of indexing (using bracket notation []) that can hold: scalars; tuples; slices; or Boolean expressions, used to identify and locate elements as sub-sets of interest.

Modifying values goes hand in hand with sub-setting, however slicing creates a view that shares memory with the original array where modifications to the sub-set will change the original array. On the other hand simple indexing returns a scalar value, whilst fancy- or boolean- expression indexing creates a copy of the data — and changes to the new array are therefore not volatile to the original array. To explicitly copy a sub-set, use the ndarray.copy() function.

7.1  Common Indexing & Slicing: 1-D Arrays

Selecting arrays

Indexing on 1-D arrays (vectors) is similar to indexing with Python lists. The following examples use the array a = np.array([1, 2, 3, 4]), depicted as:

1 2 3 4

Remember that Python indexing is zero-based, so the index at value 1 is [0], value 2 is [1], and so on. Let’s look at some common techniques for indexing simple NumPy arrays:

Method Example- Sub-set Comment
a[m] * 7.1 a[0] 1 _ _ _ First element of the vector array
7.2 a[2] _ _ 3 _ Third element
7.3 a[-1] _ _ _ 4 Last element
a[m:n] ^ 7.4 a[0:3] 1 2 3 _ From index 0 < index 3
7.5 a[1:-2] _ 2 _ _ From index 1 < index at len-2
a[:] 7.6 a[:] 1 2 3 4 Select all elements
a[m:] 7.7 a[2:] _ _ 3 4 From index 2 up to last element
a[:n] 7.8 a[:2] 1 2 _ _ From index 0 up to < index 2
a[m:n:p] 7.9 a[0:3:2] 1 _ 3 _ Using from:to:step-by notation
7.10 a[::-1] 4 3 2 1 Reverse the array

* The first method uses simple indexing, but the rest are slicing operations — which create a volatile “view” of the underlying array.

With slicing operations the left index is inclusive but the right index is exclusive. So a[0:3] reads as: get the sub-set starting at the value at index 0, up to but not including the value at index 3. If it helps understand it better, think of m:n as where to put a cursor — you put the first cursor to the left of index m, and the second cursor to the left of index n. In the case of a[0:3], like so: |0 1 2 |3. The indices between the cursors define the sub-set of interest.

The output style (i.e. _ _ 3 4 ) shown in the sub-set column is for illustration purposes, to help compare the result against the original array. The actual resulting array does not have place-holders as this might suggest, but only the values matched via the indexing operation. For example the slice a[2:] results in the smaller NumPy array: array([3, 4]). This applies to similar array depictions in some of the output that follows.

Modifying arrays

Array modification is also possible using indexing or slicing, but the data being targeted must match the dimensions of the slice, or be a scalar. (This is an early peak into broadcasting, see the section Chapter 8. Array Computation for more on this topic).

The following examples show the array a = np.array([1, 2, 3, 4])being progressively modified:

Method Example- Result Comment
a[m] = p 7.11 a[1] = 9 1 9 3 4 Second element modified
a[m:n] = [q,r,..] 7.12 a[0:2] = [7, 8] 7 8 3 4 Lengths must match
a[m:n] = p 7.13 a[:] = 9 9 9 9 9 Replace all with scalar
7.14 a[2:] = 8 9 9 8 8 Replace from idx 2 to last
a[m:n:p] = [q,r..] 7.15 a[::2] = [1, 2] 1 9 2 8 Step-wise modification
7.16 a[:2] = [1, 2, 3] ValueError! Lengths incompatible

Recall that common indexing or slicing creates a “view” of the underlying array, meaning changes to the sub-set will affect the original array it was sliced from. As a demonstration:

    a = np.array([1, 2, 3, 4])
    b = a[0:2]
    b[1] = 9
    print(b)
    print(a)
[1 9]

[1 9 3 4]    

7.2  Common Indexing & Slicing: n-D Arrays

2-D (matrices) or n-D (multidimensional) array selection permits indexing and slicing in a similar way, using a tuple of expressions applied to each axis. The following examples use the array:

a = np.array([[1, 2, 3], [4, 5, 6]])

a 2x3 matrix depicted as:

1 2 3

4 5 6

Example 7-17

Select all rows, 2nd column:

    a[:,1]

_ 2 _

_ 5 _

Example 7-18

Select 2nd row, all columns:

a[1,:]

_ _ _

4 5 6

Example 7-19

Select all rows, every 2nd column:

    a[:,::2]

1 _ 3

4 _ 6

Example 7-20

Single element (2nd row, 3rd column):

a[1,2] # Or a[(1,2)] — optional braces

_ _ _

_ _ 6

Example 7-21

Modify the last column:

    a[:,2] = [8,9]

1 2 8

4 5 9

7.3  Fancy Indexing

Fancy indexing allows you to access and manipulate specific elements or slices of an array with the use of arrays, lists, expressions, or Boolean lists to locate target elements. Fancy indexing creates a copy of the data, so there’s no danger of modifying the original array.

Using lists or integer sequences

The following examples use the arrays:  

1 2 3 4

and:

1 2

    a = np.array([1, 2, 3, 4])
    b = np.array([1, 2])

Example 7-22

Pass a list of indexes to match:

    a[[0, 1, 2]]

1 2 3 _

Example 7-23

Pass an array of indexes to match:

    a[b]

_ 2 3 _

The following examples use the array:

    a = np.array([[1, 2, 3], 
                  [4, 5, 6]])

Depicted as:

1 2 3

4 5 6

Example 7-24

Mixed mode (slicing & fancy indexing):

    # All rows, and these columns..
    a[:,[1,2]]

_ 2 3

_ 5 6

Recall that fancy indexing creates a copy of the array, meaning changes to the sub-set won’t affect the original array. As a demonstration:

    b = a[:,[1,2]]
    b[0,1] = 9
    print(b) # Value at [0,1] = 9
    print(a) # Value at [0,1] unchanged
[[2. 9.]
 [5. 6.]]

[[1. 2. 3.]
 [4. 5. 6.]]

Boolean-expression indexing

Boolean indexing with NumPy allows you to select elements from an array based on a condition, targeting the array’s values at indexes that meet the condition (is True). Consider the following examples, where:

    a = np.array([1., 2., 3., 4.])
    b = (a < 2)

First, let’s see what b evaluates to:

    print(b)
[ True False False False]

b is assigned the result of an element-wise conditional evaluation; in this case returning True where each value, v, in a, meets the condition v < 2, otherwise returning False. So to proceed, we have the variables a and b to work with:

a = 1 2 3 4 — a NumPy array.

b = True False False False — a simple Python list.

Example 7-25

Explicit boolean list selection:

    a[[True, False, False, True]]

1 _ _ 3

Example 7-26

Variable boolean list selection:

    a[b]

1 _ _ _

Example 7-27

Negated boolean list selection:

    # Negation converts all False values to True, and True
    # to False..
    a[~b]

_ 2 3 4

Example 7-28

Combining arithmetic with a boolean expression:

    # For each value in a, True if odd number else False
    a[a%2 == 1]

1 _ 3 _

7.4  Exercises

The exercises will refer to the following NumPy array:

    student_scores_list = [
        [1, 63.5, 56.0, 68.0, 73.5],
        [2, 53.0, 77.5, 61.0, 83.0],
        [3, 59.0, 79.0, 67.5, 70.0]
    ]
    scores_array = np.array(student_scores_list)

Note: answers to array selection questions will be affected by any previous question’s modifications.

Exercise 7-1

Get a list of (only) the scores of the 1st student.

Exercise 7-2

Print out student IDs of the 1st two students.

Exercise 7-3

Modify the score of the 2nd student in the second test to 87.

Exercise 7-4

Print the scores of all students in the 3rd test.

Exercise 7-5

Modify the scores of all students in the 4th test to 75.

Exercise 7-6

Print the scores of the 2nd student in the last two tests.

Exercise 7-7

Modify the scores of all students in the 1st test to 70.

Exercise 7-8

Print the ID and scores of the last student.

Exercise 7-9

Repeat the selection from Example 7-8 but assign it to the variable sub_arr. Change the first score of sub_arr to 77. Is this sub-selection a view? Confirm by printing both sub_arr and scores_array to see if the original array has also been modified.

Exercise 7-10

Retrieve an array of the scores only (no student id) and apply a conditional expression to return True|False for any scores over 80.

Chapter 8. Array Computation


Array computation in NumPy is about performing efficient and versatile mathematical operations and data manipulations on multidimensional arrays. These include arithmetic, logical, matrix, set, and statistical operations. Ufuncs (universal functions) allow you to perform operations in an element-wise fashion, while broadcasting allows arrays of different shapes to be combined and operated on by automatically adjusting the dimensions of a smaller array to match that of a larger one.

Performing element-wise operations on lists or arrays is also know as vectorisation. Aside from being syntactically more concise than loops, very often the underlying implementation of a vectorisation operation can lead to dramatic improvements in processing times.

8.1  Unary Ufuncs — Operating on a Single Array

Where required, the examples below use the array:

    a = np.array([1, 2, 3])

Example 8-1

Element-wise absolute value for int or float:

Functions Example Result
abs, fabs np.abs(np.array([-1, 2, -3.1])) [1, 2, 3]

Example 8-2

Ceiling or floor of each element:

Functions Example Result
ceil, floor np.floor(np.array([-1.3, 2, 3.8])) [-2., 2., 3.]

Example 8-3

Round to nearest int or decimal:

Functions Example Result
rint, round np.rint(np.array([-1.3, 2, 3.8])) [-1., 2., 4.]

Example 8-4

Calculate the square root or square:

Functions Example Result
sqrt, square np.sqrt(np.array([9, 16, 4])) [3., 4., 2.]

Example 8-5

Exponentiation — (ex, 2x, 1/x):

Functions Example Result
exp, exp2, reciprocal np.exp(np.array([1, 2])) [2.71828183, 7.3890561]

Example 8-6

Natural log, base 10 log, base 2, log(1+x):

Functions Example Result
log, log10, log2, log1p np.log10(np.array([100, 1000])) [2., 3.]

Example 8-7

Get the sign of each number (1 or -1):

Functions Example Result
sign np.sign(np.array([-2, 2.5])) [-1., 1.]

Example 8-8

Split an array into [fraction, integral] parts:

Functions Example Result
modf np.modf(np.array([1, -2.1])) [ 0. , -0.1], [ 1., -2.]

Example 8-9

Test for NaN, +/-infinity, finiteness:

Functions Example Result
isnan, isinf, isfinite np.modf(np.array([1, -2.1])) [ 0. , -0.1], [ 1., -2.]

Example 8-10

Trigonometric functions and their hyperbolic and inverse relations:

Functions Example Result
cos, sin, tan np.sin(np.array([0, 1])) [0., 0.84147098]
cosh, sinh, tanh np.sinh(np.array([0, 1])) [0., 1.17520119]
arccos, arccosh, arcsin np.arcsin(np.array([0, 1])) [0., 1.57079633]
arcsinh, arctan, arctanh np.arcsinh(np.array([0, 1])) [0., 0.88137359]

Example 8-11

Evaluate this if condition met, otherwise that:

Functions Example Result
where np.where(a < 3, a , a * 3) [1, 2, 9]

Example 8-12

Get the truth values of a negation:

Functions Example Result
logical_not np.logical_not(a<2) [False, True, True]
~ (operator) ~(a<2) [False, True, True]

Example 8-13

Real or imaginary component of imaginary numbers:

Functions Example Result
real np.real(np.array([1+2j, 3+4j])) [1., 3.]
imaginary np.imag(np.array([1+2j, 3+4j])) [2., 4.]

8.2  Binary Ufuncs

Basic math and logic functions have corresponding operators that can be used interchangeably, whilst math operators can also be used in the operator-assignment style (i.e. +=). Where required, the following examples use the arrays:

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    c = a.copy()

In the following tables, some functions display the function name paired with its equivalent math operator.

Example 8-14

Common arithmetic functions and their operator shortcuts:

Functions Example Result
add (+), subtract (-) np.add(a, b) [5, 7, 9]
multiply (*), divide (/) c *= b [4, 10, 18]
floor_divide (//) np.floor_divide(a, b), or (a // b) [0, 0, 0]
remainder, modulus np.remainder(b, a) [0, 1, 0]
divmod np.divmod(b, a) [4, 2, 2], [0, 1, 0]
power (**) np.power(a, b), or a ** b [1, 32, 729]
minimum, maximum np.maximum(a, b) [4, 5, 6]
copysign np.copysign(a, [-4, 5, 6]) [-1., 2., 3.]

Example 8-15

Element-wise equality tests:

Functions Example Result
greater (>), greater_equal (>=) np.greater(b, a) [True, True, True]
less (<), less_equal (<=) np.less(b, a) [False, False, False]
equal (==), not_equal (!=) np.equal(a, [1, 4, 5]) [True, False, False]

8.3  Broadcasting — Binary Operations with Arrays of Dissimilar Dimension

Broadcasting allows arrays with different shapes to be combined in operations, but restricted to arrays with compatible dimensions – the axes of the trailing dimensions are equal or either array has a dimension of 1.

Example 8-16

Scalar operations with 1-D arrays:

Where: a = np.array([1, 2, 3])

Example Broadcast (intermediate step) Result
a * 2 a * [2,2,2] [2,4,6]

Example 8-17

Scalar operations with 2-D arrays:

Where: a = np.array([[1,2,3], [4,5,6]])

Example Broadcast (intermediate step) Result
a * 2 a * [[2,2,2], [2,2,2]] [[2,4,6], [8,10,12]]

Example 8-18

Operations between 1-D and 2-D arrays:

Where: a = np.array([[1,2,3], [4,5,6]])

Where: b = np.array([[2.], [3.]])

Example Broadcast (intermediate step) Result
np.multiply(a, b) a * [[2.,2.,2.], [3.,3.,3.]] [[2.,4.,6.], [12.,15.,18.]]

8.4  Matrix Operations

Prefer nump.dot for matrix products, instead of the * (arithmetic) operator. The dot matrix operation results in a scalar value for vectors, or an m x p matrix (from m x n . n x p), whereas * attempts to perform an element-wise arithmetic operation.

Example 8-19

Perform a ‘dot’ product on two matrices:

    a = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3 matrix
    b = np.array([1, 2, 3])               # 3x1 matrix
    np.dot(a, b)

[14 32]

With matrix multiplication (where at least one array is 2-D), the number of columns of the first matrix must match the number of rows of the second matrix (where A is an m x n matrix, and B is n x p). The resulting matrix is of dimension m x p. Here is the above example in matrix notation:

math

Example 8-20

The dot product will behave differently if one side is a scalar. Let’s compare a scalar vs a 1x1 array dot product with a 2-D array:

    # A 2 x 3 matrix:
    a = np.array([[1, 2, 3], [4, 5, 6]]) 

    # A scalar:
    b = 2    

    # A 1 x 1 matrix:                            
    c = np.array([2])      

    # Applies element-wise multiplication:
    np.dot(a, b)

[2  4 6

8  10 12]

The dot operation falls back to a “scalar * array” operation. Whereas an explicit 1x1 array fails, as it will attempt to perform the “matrix dot matrix” product operation with incompatible dimensions:

    np.dot(a, c)  # Matrix dimensions incompatible!
ValueError: shapes (2,3) and (1,) not aligned: 3 (dim 1)
!= 1 (dim 0)

See also:

Function Description
cross(a,b) Return the cross product of two (arrays of) vectors
inner(a,b) Ordinary inner product 1-D arrays or sum product over the last axes
kron(a,b) Kronecker product of two arrays
outer(a,b) Compute the outer product of two vectors
tensordot(a,b) Compute tensor dot product along specified axes

8.5  Set Operations

A set, by definition, is an unordered collection of unique objects. Set functions can be unary (e.g. unique) or binary (e.g. union1d).

Example 8-21

Where: a = np.array([[1,2,2], [4,4,6]]), get the unique values:

Example Result
np.unique(a) [[1,2,4,6]]

Example 8-22

Find the union of two arrays:

Example Result
np.union([-1,2,3], [1,3,5]) [[-1,1,2,3,5]]

Example 8-23

Test whether each element of a 1-D array is also present in a second array:

Example Result
np.isin([-1,2,3], [1,3,5]) [False, False, True]

See also:

Function Description
intersect1d(a,b) Find the intersection of two arrays
isin(a,b) Boolean array of size a, where elements of a exist in b
setdiff1d(a,b) Find the set difference of two arrays
setxor1d(a,b) Find the set exclusive-or of two arrays

8.6  Other Logic Operations

Example 8-24

Logic tests. Where a = np.array([[1, 2], [3, 4]]):

Function Example Result
array_equal(a,b) np.array_equal([1,1],[1]) False
array_equiv(a,b) np.array_equiv([1,1],[1]) True
select(a,b) np.select([a>1], [a*2], 99) [[99, 4], [6, 8]]

See also:

Function Description
all(a) Test whether all elements along a given axis evaluate to True
allclose(a,b) Test if two arrays are element-wise equal within a tolerance
any(a) Test whether any array element along a given axis evaluates to True
nonzero(a) Return the indices of the elements that are non-zero

8.7  Statistical Operations

Statistical functions are typically unary, but can be made to operate along individual axes.

Example 8-25

Stats operations. Where: a = np.array([[1, 2], [3, 4]]):

Function Example Result
mean np.mean(a) 2.5
max np.max(a, axis=1) [2, 4]
cumsum np.cumsum(a) [ 1, 3, 6, 10]

See also:

Function Description
argmax Returns the indices of the maximum values along an axis
argmin Returns the indices of the minimum values along an axis
cumprod Return the cumulative product of elements along a given axis
min Return the minimum of an array or minimum along an axis
std Compute the standard deviation along the specified axis
sum Sum of array elements over a given axis
var Compute the variance along the specified axis

8.8  Exercises

The first five exercises will refer to the following NumPy arrays:

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

Exercise 8-1

Perform element-wise addition (+) and multiplication (*) of the two arrays and print the results.

Exercise 8-2

Does the matrix dot operation between the arrays result in a valid array? Print the result. Can you explain how the result was generated?

Exercise 8-3

Perform element-wise comparison (greater than, less than, equal to) between the elements of these arrays. Print the result of each comparison.

Exercise 8-4

Perform scalar division on the array a with the value 5. Print the resulting array. What is the dtype of the result?

Exercise 8-5

Create a larger NumPy array b with dimensions (3, 3) containing random values. Then, add array a to b. Print the resulting array.

Exercise 8-6

Given the following NumPy array:

    student_scores_list = [
        [1, 63.5, 56.0, 68.0, 73.5],
        [2, 53.0, 77.5, 61.0, 83.0],
        [3, 59.0, 79.0, 67.5, 70.0]
    ]
    scores_array = np.array(student_scores_list)

What percentage of student scores are over 80?

Hint: think about summation of a Boolean array from the result of an element-wise math expression (True evaluates to 1, False to 0).

Exercise 8-7

Compute the mean and standard deviation of the NumPy array arr:

    arr = np.array([1, 2, 3, 4, 5])

Exercise 8-8

Compute and print out the median and quartiles (25th and 75th percentiles) of the following NumPy array arr:

    arr = np.array([10, 20, 30, 40, 50])

Exercise 8-9

Compute and print out the correlation coefficient between the two NumPy arrays x and y:

    x = np.array([1.1, 2.3, 3.3, 4.1, 5.6])
    y = np.array([5.8, 4, 3.4, 2.1, 1.05])

Would you say these arrays are strongly correlated, weakly correlated, or not correlated? If correlated, in what direction?

Chapter 9. Array Transformation


NumPy arrays can be transformed in many ways, generally these include transposing, reshaping, combining, splitting, rotating, and sorting. The following examples use the following array, a 2x3 matrix:

    a = np.array([[1, 2, 3], [4, 5, 6]])

Depicted as:

1 2 3

4 5 6

9.1  Transposing

The transpose of a 1-D array is unchanged, otherwise swap rows and columns.

Example 9-1

Transpose a 2-D array:

    np.transpose(a)

1 4

2 5

3 6

Transposing returns a view of the array. a.swapaxes(0,1) achieves the same result.

9.2  Reshaping

Arrays can be reshaped to a desired (but compatible) shape.

Example 9-2

Reshape a 2x3 array to 3x2:

    np.reshape(a, (3,2))  # Or: a.reshape(3,2)

1 2

3 4

5 6

New and old shapes must be compatible. Eg a.reshape(2,2) would result in an error. But np.reshape(a, (1,6)) yields: [1, 2, 3, 4, 5, 6].

9.3  Flattening

Flattening reduces the dimension of an array.

Example 9-3

Flatten an array into a vector:

    np.ravel(a)  # Creates a view

1 2 3 4 5 6

Example 9-4

Flattening with order:

    np.ravel(a, order='F')  # Creates a view

1 4 2 5 3 6

See the API docs for various Flip and reverse ordering options.

a.reshape(-1) yields: [1, 2, 3, 4, 5, 6]. Use np.flatten(a) to create a copy of the array instead.

9.4  Rotating

Rotations can be performed at the element level or upon an axis.

Example 9-5

Rotate by flip & reverse:

    np.flip(a)

6 5 4

3 2 1

Example 9-6

Rotate by flipping on axis:

    np.flip(a, 0)

4 5 6

1 2 3

Flip and rotate operations behave as you’d expect, with variations depending on optional arguments. Note flip(a,0) ~= flipud(a) and flip(a,1) ~= fliplr(a).

Example 9-7

Rotate an array by 90 degrees in the plane specified by axes:

    a = np.array([[1, 2, 3], [4, 5, 6]])
    np.rot90(a, 1)

3 6

2 5

1 4

See also:

Function Description
roll Roll array elements along a given axis.

9.5  Combining & Splitting

The following examples will make use of the arrays:

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9]])

Depicted as:

1 2 3

4 5 6

and

7 8 9

Example 9-8

Join arrays in sequence along an axis:

    np.concatenate((a, b))  # Default axis is 0

1 2 3

4 5 6

7 8 9

Arrays must have the same shape, except in the dimension corresponding to the given (or default) axis.

Example 9-9

Combine & flatten into a vector:

    np.concatenate((a, b), axis=None) 

1 2 3 4 5 6 7 8 9

The concatenate function creates a copy of the data.

Example 9-10

Split into multiple arrays:

The following example will make use of the 1D array, a = np.array([1, 2, 3, 4, 5, 6]):

1 2 3 4 5 6

    np.split(a, 2)  # Returns an array of arrays

[ [ 1 2 3 ], [ 4 5 6 ] ]

See also:

Function Description
hsplit Split an array into multiple sub-arrays horizontally (column-wise)
dsplit Split array into multiple sub-arrays along the 3rd axis (depth)
vsplit Split an array into multiple sub-arrays vertically (row-wise)

The following examples will make use of the arrays:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])`

Depicted as:

1 2 3

and:

4 5 6

Example 9-11

Stacking arrays by row:

    np.stack((a, b))

1 2 3

4 5 6

This operation is equivalent to np.vstack((a, b))

Example 9-12

Stacking arrays by column:

    np.stack((a, b), axis=-1)

1 4

2 5

3 6

See also:

Function Description
hstack Stack arrays in sequence horizontally (column wise)
dstack Stack arrays in sequence depth wise (along third axis)
vstack Stack arrays in sequence vertically (row wise)

9.6  Sorting

The following examples use the array:

a = np.array([[4, 2, 1], [3, 6, 5]])

Depicted as:

4 2 1

3 6 5

Example 9-13

Sort by ‘row’:

    np.sort(a)

1 2 4

3 5 6

Example 9-14

Sort by ‘row’, but output the index:

    np.argsort(a)

2 1 0

0 2 1

Example 9-15

Flatten, then sort:

    np.sort(a, axis=None)

1 2 3 4 5 6

Example 9-16

Sort by ‘column’:

    np.sort(a, axis=0)  # Or.. a.sort() 

3 2 1

4 6 5

See also:

Function Description
argsort Returns the indices that would sort an array

9.7  Exercises

Exercise 9-1

Convert the array a = np.array([[1, 2], [3, 4]]):

1 2

3 4

So that it looks like this:

1 3

2 4

Exercise 9-2

Is it possible to reshape the 2x2 array a = np.array([[1, 2], [3, 4]]) into a 4x1 array? If so what is the procedure? Assign this new array to variable b.

Exercise 9-3

Flatten the arrays a and b from exercise 9-2, and combine them into a single 2x4 array assigned to c, resulting in:

    array([[1, 2, 3, 4],
           [1, 2, 3, 4]])

Exercise 9-4

Use a rotate operation upon c to convert it into the 2x4 array:

    array([[4, 4],
           [3, 3],
           [2, 2],
           [1, 1]])

Exercise 9-5

Convert the 1-D array a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) into a 3x3 array sorted in reverse order, so that the result is:

    array([[9, 8, 7],
           [6, 5, 4],
           [3, 2, 1]])

Chapter 10. String Arrays


Arrays made up of string values can be created, copied, and otherwise manipulated just like any NumPy array. However, given that the values are non-numeric, many, if not most of the numeric operations we’ve seen throughout this book will not be applicable.

The new numpy.strings module (as of NumPy 2.0) provides a set of functions and utilities for the manipulation of string-based NumPy arrays. It’s designed to handle string data efficiently, offering vectorised operations that can be applied to entire arrays of strings. There are operations for string splitting, case conversion, substring searching, and more - optimised for performance on large datasets compared to standard Python string methods.

String-based array processing in NumPy can be useful for:

  • Text Processing: Manipulating small or large datasets of text as arrays, in an element-wise fashion.
  • Pattern Matching: Searching for patterns or substrings across large collections of strings.
  • Data Cleaning: Normalising, formatting, or splitting text data (e.g., removing whitespace, converting to lowercase).
  • Tokenization: Breaking down text into smaller components for NLP tasks like word or sentence tokenization.
  • Feature Extraction: Extracting features for machine learning tasks, such as counting word frequencies or similarities.
  • Vectorization: Converting text into numerical representations, like bag-of-words or TF-IDF, for use in machine learning models.

10.1 Common String Processing Operations

The following examples rely on the array, a, defined as follows:

    import numpy as np
    a = np.array(["Apple", "Banana", "Cherry", "date", "", 
                  "42"], dtype="str_")

Example 10-1

String arrays can be tested for any; if at least one element evaluates to True:

    np.any(a) 
True

Example 10-2

String arrays can also be tested for all; if every element evaluates to True:

    np.all(a)
False

A string will always evalute to True - unless it’s empty ("") or None; this includes the values "False" and "0", both of which evaluate to True. Both single ('Hi there.') and double ("Hi there.") quoted strings are acceptable when generating data. If your strings include single quotes (and the quotes are not escaped), then use double quotes ("I'm here!"). If they include double quotes, then wrap them in single quotes ('"Hi", she shouted'). It’s always good practice to be consistent and stick to a standard approach if possible.

Example 10-3

String arrays can be easily converted to lower or upper-case, or even capitalised:

    np.strings.lower(a)
    np.strings.upper(a)
    np.strings.capitalize(a)
array(['apple', 'banana', 'cherry', 'date', '', '42'], 
      dtype='<U6')
array(['APPLE', 'BANANA', 'CHERRY', 'DATE', '', '42'], 
      dtype='<U6')
array(['Apple', 'Banana', 'Cherry', 'Date', '', '42'], 
      dtype='<U6')

Performing element-wise string operations creates a new array, and you will often assign a variable when doing so (for example b = np.strings.upper(a)). You might have noticed the type of the new array in the previous example is <U6 - the conversion assigned it the most appropriate type, in this case a Unicode string of length 6 characters.

Example 10-4

String array values may be self-concatenated to produce repeating values:

    np.strings.multiply(a, 3)
array(['AppleAppleApple', 'BananaBananaBanana', 
       'CherryCherryCherry', 'datedatedate', '', 
       '424242'], dtype='<U18')

Example 10-4

Replacing text inside strings is straighforward:

    np.strings.replace(a, 'Ba', 'XE')
array(['Apple', 'XEnana', 'Cherry', 'date', '', 
       '42'], dtype='<U6')

Example 10-5

Compare the equality of string arrays to return an array of True | False values:

    b = np.array(["Apple", "Orange", "Cherry", "date", "", 
                  "97"], dtype="str_")
    np.strings.equal(a, b)
array([ True, False,  True,  True,  True, False])

Example 10-6

Find the occurances of a string in the array:

    np.strings.find(a, "ang")
# Returns index of first occurance, or -1 if not found
array([-1, -1,  2, -1, -1, -1])

Example 10-6

Check whether any values are numeric (the entire value represents a number):

    np.strings.isnumeric(a)
array([False, False, False, False, False,  True])

Example 10-6

Test whether any value starts with, or ends with a sub-string:

    np.strings.startswith(a, "Ap")
    np.strings.endswith(a, "e")
array([ True, False, False, False, False, False])
array([ True, False, False,  True, False, False])

See also:

Function Description
strip Remove leading and trailing characters (defaults to whitespace)
zfill Return a numeric string left-filled with zeros
not_equal Return (x1 != x2) element-wise
greater Return the truth value of (x1 > x2) element-wise
less Return the truth value of (x1 < x2) element-wise
isspace Return the truth value if there are only whitespace characters
isalpha Return the truth value if there are only alphabeitc characters
str_len Returns the length of each element
swapcase Uppercase characters converted to lowercase and vice versa

10.2 Exercises

Exercise 10-1

Swap the case of the following array so that uppercase letters are converted to lowercase, and lower to upper:

['Hello', 'WORLD', 'FROM', 'Python!']

Exercise 10-2

Pad the following array of numeric strings with zeros, up to a width of 4:

['42', '97', '2005', '0025']

Exercise 10-3

Test whether the values in the following array consist only of alphabetic characters:

['Los', 'Angeles', 'Year 2019']

Exercise 10-4

Find the length of each value in the previous array.

Appendix A. Virtual Environments & Containers


When you install a lot of software packages over time, your system can become bloated and can run the risk of programs failing. This can be due to conflicting package versions, or packages that were removed inadvertently that a program might rely on. Or maybe you’ve tried to run different versions of the same application (with potentially conflicting dependencies) for testing.

A solution to address these problems is to create virtual environments – isolated, self-contained ‘sand-boxes’ that are perfect for protecting your applications from potential dependency and usage conflicts.

A.1  virtualenv

Install virtualenv

One of the most popular programs for creating virtual environments is virtualenv, itself a package that you install via PyPI:

    pip install virtualenv

Once installed you’re ready to create virtual environments. It’s a good idea to have a top-level folder under which all your virtual environments reside. From now on we’ll refer to a virtual environment simply as a “venv”.

Another benefit of venvs is reproducibility. You can create a sand-box with the exact software, packages, and package versions you require to reflect another environment, such as what you may have in production. This ensures every contributor to a project has an identical set-up.

Create and activate a virtual env’t

To create a venv, change into the directory of the parent location for your venvs. Existing venvs will reside here as sub-directories. To create a venv run the following command:

    python -m venv my-venv

The name of the venv can be anything you like, but of course uniquely named under this location. To enter a venv you need to ‘activate’ it, for example:

    source my-venv/bin/activate

You can do this from anywhere by passing the fully qualified path to the venv folder:

    source /path/to/my-venv/bin/activate

In windows this might look like this (a . (dot) operator is equivalent to source):

    . C:\path\to\my-venv\bin\activate

When you enter the venv, you’ll notice a special prompt that informs you that you’re inside a venv. For example:

(my-venv) user@host:~/:$

Once you’re in the venv, you can execute python scripts or install packages as you would normally.

    python --version
'Python 3.10.12'
    pip install numpy jupyterlab

(Output not shown)

Packages will be installed for this venv without knowledge of other venvs. You can then run your programs as you would in a global environment:

    jupyter lab  
    # [Ctrl+c] to stop Jupyter..

Exit the virtual env’t

To exit out of a venv, shut down any running programs and type:

    deactivate

and you’ll be returned to a regular console prompt.

There’s a lot more to virtual environments in Python (including the ability to enter/execute/exit them via shell scripts), so consult the online documentation or follow a good tutorial.

A.2  Docker Containers

Docker is a modern, small footprint alternative to hardware virtualisation1. Instead of running an entire guest operating system as a virtual machine (composed of the full complement of an operating system’s disk, memory, and processing designated in advance), Docker containers are light-weight system shells that install the bare minimum that’s required to run an application.

A container then relies on the host’s infrastructure for access to the operating system kernel and interfaces. This means you can run multiple Docker containers on commodity hardware. Containers are sand-boxed environments that isolate your apps, inside what looks like — to them — a stand alone operating system, of which there are many variants, most commonly Linux based.

Docker containers can be created from scratch, but there are also images available that you can point to that are already set up with all or most of an application’s needs. There’s even a Jupyter image which you can use to launch a Jupyter-ready container after Docker is installed2.

The easiest way to install and get Docker running is to use Docker Desktop3. Once you have it installed, start Docker Desktop, and this will also start the Docker engine. You can find and install images via the “Images — Hub” area in the Docker Desktop application, otherwise it’s also very easy to do via a command line:

    docker pull quay.io/jupyter/scipy-notebook

Once this completes, you can run Jupyter as follows:

    docker run -p 10000:8888 quay.io/jupyter/scipy-notebook

This will launch Jupyter in a web browser at the localhost:8888/ URL, and you can start working with Python notebooks.

The Docker Desktop app also allows you to start and stop, or manage Docker containers that were installed either via the application interface or from the command line (it will detect them). It’s a good idea to become familiar with Docker Desktop’s features using the ‘help’ documentation.


  1. Docker. https://www.docker.com.↩︎

  2. Jupyter image. https://jupyter-docker-stacks.readthedocs.io/en/latest.↩︎

  3. Docker Desktop https://docs.docker.com/desktop.↩︎

Appendix B. Python Lists & array Vs NumPy Arrays


Python lists are a native language feature that let you easily create array-like data sequences, using bracket ([]) indexing to access the data. Lists are also expandable — you can add and remove items at any time. Whilst flexible, they have limitations, and the array object is available to use if lists don’t meet your engineering goals. For advanced data processing, however, the NumPy array (ndarray) may be the superior choice.

When you think of arrays, you typically expect them to have the following characteristics:

  1. Be of a fixed size

  2. Be of the same type

  3. Have efficient indexing

  4. Ability to efficiently operate on them

  5. Multidimensional structure.

We’ll discuss lists, the array module, and NumPy arrays in context of these features, and highlight how they differ and when you might prefer one over the others.

B.1  Python Lists

Python lists are simple to create, for example:

    items = [1, 2.5, 'N/A', [555, 'FILK']]
    items
[1, 2.5, 'N/A', [555, 'FILK']]

This is a perfectly valid Python list, but you can immediately see that a list can be made up of any type. Lists, therefore, put the responsibility on the engineer to enforce the rules of typing. And whilst it’s perfectly acceptable to stick to a convention, the lack of language-level enforcement can lead to problems simply because nothing prevents the list from being created with the wrong types in the first place.

Lists are also expandable, and therefore don’t have a fixed size:

    items.pop(3)
    items
[1, 2.5, 'N/A']

So far we’ve broken the first two “rules” of the array test. But what about indexing? Lists are easily indexed, which meets goal 3. But without enforcement, you may not know what type you’re getting for certain. On top of that, in order to operate on a list in an element-wise fashion, you need to iterate over the entire list, for example:

    for i in items:
        print(i)
1
2.5
N/A

You can be a little more expressive with list comprehensions:

    [i for i in items]
[1, 2.5, 'N/A']

But at the end of the day, it’s just a more concise way of looping over a list.

Furthermore, if you want to operate on a list you need to be sure to handle potential problems with types, as they may not be compatible with the operation, for example:

    [i * 2 for i in items]
[2, 5.0, 'N/AN/A']

This example ostensibly performs a multiplication operation on the data, and this time gets a result without failing; but the resulting value may vary unexpectedly depending on an element’s type (* is an overloaded operator and works with strings differently to numbers) — or it may fail entirely.

Given that lists can be made up of other lists, you can design them to have a multidimensional structure:

multidim = [[1,2,3], [4,5,6]]

And this looks a lot like a NumPy array — in fact you could use this array-like object to create a NumPy array, as we’ve seen before in this book. But relying on lists for n-D arrays adds to the complexity of enforcing types and managing dimension sizes. They soon become inefficient (multiple loops and complex logic) and unreliable.

B.2  The array Array

The array module ships with Python as a core library feature, so you don’t have to install it using pip, but you do need to import it:

    import array

To create an array you are required to specify the type, and there are thirteen available type codes defined. Here are a few of them:

  • 'b' — a char
  • 'i' — a signed int of 2 bytes
  • 'l' — a signed long of 4 bytes
  • 'f' — a float of 4 bytes
  • 'd' — a double of 8 bytes

You can also run array.typecodes to get a quick print out of all codes:

    array.typecodes
'bBuhHiIlLqQfd'

Use the handy help() feature to get useful information on any function. help(array) will output information about array usage, along with a full listing of type codes with more detail.

help(array)
NAME
    array

DESCRIPTION
    This module defines an object type which can 
    efficiently represent an array of basic values: 
    characters, integers, floating point numbers. 
    Arrays are sequence types and behave very much
    like lists, except that the type of objects 
    stored in them is constrained.

CLASSES
    builtins.object
        array
    
    ArrayType = class array(builtins.object)
    | array(typecode [, initializer]) -> array
    |  
    | Return a new array whose items are restricted by 
    | typecode, and initialized from the optional 
    | initializer value, which must be a list, string or 
    | iterable over elements of the appropriate type.
    |  
    | Arrays represent basic values and behave very much 
    | like lists, except the type of objects stored in 
    | them is constrained. The type is specified at object
    | creation time by using a type code, which is a 
    | single character.
    |
    | The following type codes are defined:
    |  
    |      Type code   C Type             Min size (bytes)
    |      'b'         signed integer     1
    |      'B'         unsigned integer   1
    |      'u'         Unicode character  2 (see note)
    |      'h'         signed integer     2
    |      'H'         unsigned integer   2
    |      'i'         signed integer     2
    |      'I'         unsigned integer   2
    |      'l'         signed integer     4
    |      'L'         unsigned integer   4
    |      'q'         signed integer     8 (see note)
    |      'Q'         unsigned integer   8 (see note)
    |      'f'         floating point     4
    |      'd'         floating point     8

(Output truncated)

Creating an array is done as follows (first argument is the type code):

    a = array('i', [1, 2, 3, 4, 5])
    a
array('i', [1, 2, 3, 4, 5])

You have overcome the problem of loose typing with array arrays, and have at least created a structure of a certain size up front. But the array is not designed to be multidimensional — unless you create a list of array arrays:

    b = array('i', [1, 2, 3])
    c = array('f', [2., 3., 4.])
    a = [b, c];
    a
[array('i', [1, 2, 3]), 
 array('f', [2.0, 3.0, 4.0])]

Overcoming one problem, however, just re-introduces the problem that you had to begin with — lists. And you now have two different structures to deal with when processing.

In terms of resource efficiency, an array is better at memory allocation, because it’s given the type (and therefore the size of memory to allocate to an element), and the size of the array — so it knows the amount of memory to assign to the entire array object. Where a list may perform better than an array array is with adding and removing elements, simply because a list will over-allocate memory and this may provide a processing efficiency benefit on average.

However, if you expect to be doing a lot of insert, append, or remove operations, then neither lists nor arrays may be the best option. In that case you should look at more advanced structures such as hash tables, or use an appropriate database. A library like Pandas is also an excellent choice if you need to manipulate table-like DataFrames.

It’s not entirely true that array is of a fixed size, because it does have append and pop functions that will resize it. In this sense, it’s more like a Java ArrayList, for those who are familiar with Java programming.

B.3  The Case for (or Against) NumPy Arrays

The entire book has been dedicated to explaining the importance and use of ndarray arrays in NumPy, so there’s no need to re-state the case at any length. And given the limitations of lists or array arrays, it should now be clearer if and when you might decide to upgrade from one approach to the next.

If you need a simple data structure that’s easy to create, is mutable, that you can take responsibility over typing, and you won’t be performing very complex operations on, then stick with lists. Lists are highly inter-operable, and are easy to work with especially along with list comprehensions or generators.

An array is something of an improvement on lists — you could call them a wrapper of simple lists that enforces type checking. Another reason to prefer an array over a list is when you might need to persist simple data structures to file storage. It would be prudent to have type safety in this case, especially if the stored data will be shared among software components.

For everything else, use NumPy. Or, find a library such as SciPy or Pandas that builds on NumPy to provide the more specialised capabilities you’re after.

Appendix C. NumPy Function & Property Reference


Member information listed here (grouped by modules) was extracted from the NumPy source code ‘docstring’, as of NumPy version: 2.2.3. Some modules have been excluded, for example numpy.ctypeslib and numpy.testing. numpy.matrix is no longer recommended and stands to be deprecated.

Listings of other sub-classes such as numpy.chararray and numpy.bmat were also excluded for brevity. You can see a full schedule of classes at the NumPy documentation web page1.

Entries shown with a strike-out have been deprecated since NumPy 2.0. Those that have been removed altogether have also been removed from this reference.

C.1  numpy

This section lists only the immediate members of this top-level NumPy class. Relevant sub-classes are listed separately.

numpy A — F

Member Description
absolute Calculate the absolute value element-wise.
add Add arguments element-wise.
add_docstring Add a docstring to a built-in obj if possible.
add_newdoc Add documentation to an existing object, typically one defined in C
add_newdoc_ufunc Replace the docstring for a ufunc with new_docstring.
all Test whether all array elements along a given axis evaluate to True.
allclose Returns True if two arrays are element-wise equal within a tolerance.
alltrue Check if all elements of input array are true.
amax Return the maximum of an array or maximum along an axis.
amin Return the minimum of an array or minimum along an axis.
angle Return the angle of the complex argument.
any Test whether any array element along a given axis evaluates to True.
append Append values to the end of an array.
apply_along_axis Apply a function to 1-D slices along the given axis.
apply_over_axes Apply a function repeatedly over multiple axes.
arange Return evenly spaced values within a given interval.
arccos Trigonometric inverse cosine, element-wise.
arccosh Inverse hyperbolic cosine, element-wise.
arcsin Inverse sine, element-wise.
arcsinh Inverse hyperbolic sine element-wise.
arctan Trigonometric inverse tangent, element-wise.
arctan2 Element-wise arc tangent of ‘x1/x2’ choosing the quadrant correctly.
arctanh Inverse hyperbolic tangent element-wise.
argmax Returns the indices of the maximum values along an axis.
argmin Returns the indices of the minimum values along an axis.
argpartition Perform an indirect partition along the given axis using the algorithm specified by the ‘kind’ keyword.
argsort Returns the indices that would sort an array.
argwhere Find the indices of array elements that are non-zero, grouped by element.
around Round an array to the given number of decimals.
array Create an array.
array2string Return a string representation of an array.
array_equal True if two arrays have the same shape and elements, False otherwise.
array_equiv Returns True if input arrays are shape consistent and all elements equal.
astype Copies an array to a specified data type..
atleast_2d View inputs as arrays with at least two dimensions.
atleast_3d View inputs as arrays with at least three dimensions.
average Compute the weighted average along the specified axis.
bartlett Return the Bartlett window.
base_repr Return a string representation of a number in the given base system.
binary_repr Return the binary representation of the input number as a string.
bincount Count number of occurrences of each value in array of non-negative ints.
bitwise_and Compute the bit-wise AND of two arrays element-wise.
bitwise_count Computes the number of 1-bits in the absolute value of x.
bitwise_not Compute bit-wise inversion, or bit-wise NOT, element-wise.
bitwise_or Compute the bit-wise OR of two arrays element-wise.
bitwise_xor Compute the bit-wise XOR of two arrays element-wise.
blackman Return the Blackman window.
block Assemble an nd-array from nested lists of blocks.
bmat Build a matrix object from a string, nested sequence, or array.
bool_ Boolean type (True or False), stored as a byte.
broadcast Produce an object that mimics broadcasting.
broadcast_arrays Broadcast any number of arrays against each other.
broadcast_shapes Broadcast the input shapes into a single shape.
broadcast_to Broadcast an array to a new shape.
busday_count Counts the number of valid days between ‘begindates’ and ‘enddates’, not including the day of ‘enddates’.
busday_offset First adjusts the date to fall on a valid day according to the ‘roll’ rule, then applies offsets to the given dates.
busdaycalendar A business day calendar object that efficiently stores information
byte Signed integer type, compatible with C ‘char’.
byte_bounds Returns pointers to the end-points of an array.
bytes_ A byte string.
c_ Translates slice objects to concatenation along the second axis.
can_cast Returns True if cast between data types can occur according to the casting rule.
cbrt Return the cube-root of an array, element-wise.
cdouble Complex number type composed of two double-precision floating-point
ceil Return the ceiling of the input, element-wise.
char This module contains a set of functions for vectorized string operations and methods.
character Abstract base class of all character string scalar types. (To be deprecated)
choose Construct an array from an index array and a list of arrays to choose from.
clip Given an interval, values outside the interval are clipped to the interval edges.
clongdouble Complex number type composed of two extended-precision floating-point numbers.
column_stack Stack 1-D arrays as columns into a 2-D array.
common_type Return a scalar type which is common to the input arrays.
compat Compatibility module.
complex128 Complex number type composed of two double-precision floating-point numbers, compatible with Python ‘complex’.
complex256 Complex number type composed of two extended-precision floating-point numbers.
complex64 Complex number type composed of two single-precision floating-point numbers.
complexfloating Abstract base class of all complex number scalar types that are made up of floating-point numbers.
compress Return selected slices of an array along given axis.
concatenate Join a sequence of arrays along an existing axis.
conj Return the complex conjugate, element-wise.
conjugate Return the complex conjugate, element-wise.
convolve Returns the discrete, linear convolution of two one-dimensional sequences.
copy Return an array copy of the given object.
copysign Change the sign of x1 to that of x2, element-wise.
copyto Copies values from one array to another, broadcasting as necessary.
core Contains the core of NumPy: ndarray, ufuncs, dtypes, etc.
corrcoef Return Pearson product-moment correlation coefficients.
correlate Cross-correlation of two 1-dimensional sequences.
cos Cosine element-wise.
cosh Hyperbolic cosine, element-wise.
count_nonzero Counts the number of non-zero values in the array ‘a’.
cov Estimate a covariance matrix, given data and weights.
cross Return the cross product of two (arrays of) vectors.
csingle Complex number type composed of two single-precision floating-point numbers.
cumprod Return the cumulative product of elements along a given axis.
cumulative_prod Compatible alternatives for cumprod.
cumproduct Return the cumulative product over the given axis.
cumsum Return the cumulative sum of the elements along a given axis.
cumulative_sum Compatible alternatives for cumsum.
datetime64 If created from a 64-bit integer, it represents an offset from ‘1970-01-01T00:00:00’.
datetime_as_string Convert an array of datetimes into an array of strings.
datetime_data Get information about the step size of a date or time type.
deg2rad Convert angles from degrees to radians.
degrees Convert angles from radians to degrees.
delete Return a new array with sub-arrays along an axis deleted.
diag Extract a diagonal or construct a diagonal array.
diag_indices Return the indices to access the main diagonal of an array.
diag_indices_from Return the indices to access the main diagonal of an n-dimensional array.
diagflat Create a two-dimensional array with the flattened input as a diagonal.
diagonal Return specified diagonals.
diff Calculate the n-th discrete difference along the given axis.
digitize Return the indices of the bins to which each value in input array belongs.
disp Display a message on a device.
divide Divide arguments element-wise.
divmod Return element-wise quotient and remainder simultaneously.
dot Dot product of two arrays. Specifically,
double Double-precision floating-point number type, compatible with Python ‘float’ and C ‘double’.
dsplit Split array into multiple sub-arrays along the 3rd axis (depth).
dstack Stack arrays in sequence depth wise (along third axis).
e Convert a string or number to a floating point number, if possible.
ediff1d The differences between consecutive elements of an array.
einsum Evaluates the Einstein summation convention on the operands.
einsum_path Evaluates the lowest cost contraction order for an einsum expression by considering the creation of intermediate arrays.
emath Wrapper functions to more user-friendly calling of certain math functions.
empty Return a new array of given shape and type, without initializing entries.
empty_like Return a new array with the same shape and type as a given array.
equal Return (x1 == x2) element-wise.
errstate Context manager for floating-point error handling.
euler_gamma Convert a string or number to a floating point number, if possible.
exp Calculate the exponential of all elements in the input array.
exp2 Calculate ‘2^p’ for all ‘p’ in the input array.
expand_dims Insert a new axis that will appear at the ‘axis’ position in the expanded array shape.
expm1 Calculate ‘exp(x) — 1’ for all elements in the array.
extract Return the elements of an array that satisfy some condition.
eye Return a 2-D array with ones on the diagonal and zeros elsewhere.
fabs Compute the absolute values element-wise.
fill_diagonal Fill the main diagonal of the given array of any dimensionality.
finfo Machine limits for floating point types.
fix Round to nearest integer towards zero.
flatiter Flat iterator object to iterate over arrays.
flatnonzero Return indices that are non-zero in the flattened version of a.
flexible Abstract base class of all scalar types without predefined length.
flip Reverse the order of elements in an array along the given axis.
fliplr Reverse the order of elements along axis 1 (left/right).
flipud Reverse the order of elements along axis 0 (up/down).
float128 Extended-precision floating-point number type, compatible with C ‘long double’ but not necessarily with IEEE 754 quadruple-precision.
float16 Half-precision floating-point number type.
float32 Single-precision floating-point number type, compatible with C ‘float’.
float64 Double-precision floating-point number type, compatible with Python ‘float’ and C ‘double’.
float_power First array elements raised to powers from second array, element-wise.
floating Abstract base class of all floating-point scalar types.
floor Return the floor of the input, element-wise.
floor_divide Return the largest integer smaller or equal to the division of the inputs.
fmax Element-wise maximum of array elements.
fmin Element-wise minimum of array elements.
fmod Returns the element-wise remainder of division.
format_float_positional Format a floating-point scalar as a decimal string in positional notation.
format_float_scientific Format a floating-point scalar as a decimal string in scientific notation.
format_parser Class to convert formats, names, titles description to a dtype.
frexp Decompose the elements of x into mantissa and twos exponent.
frombuffer Interpret a buffer as a 1-dimensional array.
fromfile Construct an array from data in a text or binary file.
fromfunction Construct an array by executing a function over each coordinate.
fromiter Create a new 1-dimensional array from an iterable object.
frompyfunc Takes an arbitrary Python function and returns a NumPy ufunc.
fromregex Construct an array from a text file, using regular expression parsing.
fromstring A new 1-D array initialized from text data in a string.
full Return a new array of given shape and type, filled with ‘fill_value’.
full_like Return a full array with the same shape and type as a given array.

numpy G — O

Member Description
gcd Returns the greatest common divisor of ‘x1’ and ‘x2’
generic Base class for numpy scalar types.
genfromtxt Load data from a text file, with missing values handled as specified.
geomspace Return numbers spaced evenly on a log scale (a geometric progression).
get_array_wrap Find the wrapper for the array with the highest priority.
get_include Return the directory that contains the NumPy *.h header files.
get_printoptions Return the current print options.
getbufsize Return the size of the buffer used in ufuncs.
geterr Get the current way of handling floating-point errors.
geterrcall Return the current callback function used on floating-point errors.
gradient Return the gradient of an N-dimensional array.
greater Return the truth value of `` element-wise.
greater_equal Return the truth value of `` element-wise.
half Half-precision floating-point number type.
hamming Return the Hamming window.
hanning Return the Hanning window.
heaviside Compute the Heaviside step function.
histogram Compute the histogram of a dataset.
histogram2d Compute the bi-dimensional histogram of two data samples.
histogram_bin_edges Function to calculate only the edges of the bins used by the ‘histogram’ function.
histogramdd Compute the multidimensional histogram of some data.
hsplit Split an array into multiple sub-arrays horizontally (column-wise).
hstack Stack arrays in sequence horizontally (column wise).
hypot Given the “legs” of a right triangle, return its hypotenuse.
i0 Modified Bessel function of the first kind, order 0.
identity Return the identity array.
iinfo Machine limits for integer types.
imag Return the imaginary part of the complex argument.
in1d Test whether each element of a 1-D array is also present in a second array.
index_exp A nicer way to build up index tuples for arrays.
indices Return an array representing the indices of a grid.
inexact Abstract base class of all numeric scalar types with a (potentially) inexact representation of the values in its range.
inf Convert a string or number to a floating point number, if possible.
info Get help information for an array, function, class, or module.
inner Inner product of two arrays.
insert Insert values along the given axis before the given indices.
int16 Signed integer type, compatible with C ‘short’.
int32 Signed integer type, compatible with C ‘int’.
int64 Signed integer type, compatible with Python ‘int’ and C ‘long’.
int8 Signed integer type, compatible with C ‘char’.
int_ Signed integer type, compatible with Python ‘int’ and C ‘long’.
intc Signed integer type, compatible with C ‘int’.
integer Abstract base class of all integer scalar types.
interp One-dimensional linear interpolation for monotonically increasing sample points.
intersect1d Find the intersection of two arrays.
intp Signed integer type, compatible with Python ‘int’ and C ‘long’.
invert Compute bit-wise inversion, or bit-wise NOT, element-wise.
is_busday Calculates which of the given dates are valid days, and which are not.
isclose Returns a boolean array where two arrays are element-wise equal within a tolerance.
iscomplex Returns a bool array, where True if input element is complex.
iscomplexobj Check for a complex type or an array of complex numbers.
isdtype Determine if a provided dtype is of a specified data type kind.
isfinite Test element-wise for finiteness (not infinity and not Not a Number).
isfortran Check if the array is Fortran contiguous but not C contiguous.
isin Calculates ‘element in test_elements’, broadcasting over ‘element’ only.
isinf Test element-wise for positive or negative infinity.
isnan Test element-wise for NaN and return result as a boolean array.
isnat Test element-wise for NaT (not a time) and return result as a boolean array.
isneginf Test element-wise for negative infinity, return result as bool array.
isposinf Test element-wise for positive infinity, return result as bool array.
isreal Returns a bool array, where True if input element is real.
isrealobj Return True if x is a not complex type or an array of complex numbers.
isscalar Returns True if the type of ‘element’ is a scalar type.
issubdtype Returns True if first argument is a typecode lower/equal in type hierarchy.
issubsctype Determine if the first argument is a subclass of the second argument.
iterable Check whether or not an object can be iterated over.
ix_ Construct an open mesh from multiple sequences.
kaiser Return the Kaiser window.
kernel_version Built-in immutable sequence.
kron Kronecker product of two arrays.
lcm Returns the lowest common multiple of ‘x1’ and ‘x2’
ldexp Returns x1 * 2^x2, element-wise.
left_shift Shift the bits of an integer to the left.
less Return the truth value of `` element-wise.
less_equal Return the truth value of `` element-wise.
lexsort Perform an indirect stable sort using a sequence of keys.
lib Note: almost all functions in the ‘numpy.lib’ namespace
linspace Return evenly spaced numbers over a specified interval.
little_endian bool(x) -> bool
load Load arrays or pickled objects from ‘.npy’, ‘.npz’ or pickled files.
loadtxt Load data from a text file.
log Natural logarithm, element-wise.
log10 Return the base 10 logarithm of the input array, element-wise.
log1p Return the natural logarithm of one plus the input array, element-wise.
log2 Base-2 logarithm of ‘x’.
logaddexp Logarithm of the sum of exponentiations of the inputs.
logaddexp2 Logarithm of the sum of exponentiations of the inputs in base-2.
logical_and Compute the truth value of x1 AND x2 element-wise.
logical_not Compute the truth value of NOT x element-wise.
logical_or Compute the truth value of x1 OR x2 element-wise.
logical_xor Compute the truth value of x1 XOR x2, element-wise.
logspace Return numbers spaced evenly on a log scale.
longdouble Extended-precision floating-point number type, compatible with C ‘long double’ but not necessarily with IEEE 754 quadruple-precision.
longlong Signed integer type, compatible with C ‘long long’.
lookfor Do a keyword search on docstrings.
mask_indices Return the indices to access (n, n) arrays, given a masking function.
math This module provides access to the mathematical functions defined by the C standard.
matrix_transpose Transposes a matrix (or a stack of matrices) x.
matmul Matrix product of two arrays.
matvec Matrix-vector dot product of two arrays.
max Return the maximum of an array or maximum along an axis.
maximum Element-wise maximum of array elements.
may_share_memory Determine if two arrays might share memory
mean Compute the arithmetic mean along the specified axis.
median Compute the median along the specified axis.
memmap Create a memory-map to an array stored in a binary file on disk.
meshgrid Return a list of coordinate matrices from coordinate vectors.
mgrid An instance which returns a dense multi-dimensional “meshgrid”.
min Parameters
min_scalar_type For scalar ‘a’, returns the data type with the smallest size and smallest scalar kind which can hold its value.
minimum Element-wise minimum of array elements.
mintypecode Return the character for the minimum-size type to which given types can be safely cast.
mod Returns the element-wise remainder of division.
modf Return the fractional and integral parts of an array, element-wise.
moveaxis Other axes remain in their original order.
multiply Multiply arguments element-wise.
nan Convert a string or number to a floating point number, if possible.
nan_to_num Replace NaN with zero and infinity with large finite numbers (default behaviour).
nanargmax Return the indices of the maximum values in the specified axis ignoring NaNs.
nanargmin Return the indices of the minimum values in the specified axis ignoring NaNs.
nancumprod Return the cumulative product of array elements over a given axis treating Not a Numbers (NaNs) as one.
nancumsum Return the cumulative sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
nanmax Return the maximum of an array or maximum along an axis, ignoring any NaNs.
nanmean Compute the arithmetic mean along the specified axis, ignoring NaNs.
nanmedian Compute the median along the specified axis, while ignoring NaNs.
nanmin Return minimum of an array or minimum along an axis, ignoring any NaNs.
nanpercentile Compute the qth percentile of the data along the specified axis, while ignoring nan values.
nanprod Return the product of array elements over a given axis treating Not a Numbers (NaNs) as ones.
nanquantile Compute the qth quantile of the data along the specified axis, while ignoring nan values.
nanstd Compute the standard deviation along the specified axis, while ignoring NaNs.
nansum Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
nanvar Compute the variance along the specified axis, while ignoring NaNs.
ndenumerate Multidimensional index iterator.
ndim Return the number of dimensions of an array.
ndindex An N-dimensional iterator object to index arrays.
nditer Efficient multi-dimensional iterator object to iterate over arrays.
negative Numerical negative, element-wise.
nested_iters Create nditers for use in nested loops
nextafter Return the next floating-point value after x1 towards x2, element-wise.
nonzero Return the indices of the elements that are non-zero.
not_equal Return (x1 != x2) element-wise.
numarray Help for removed not found.
number Abstract base class of all numeric scalar types.
object_ Any Python object.
ogrid An instance which returns an open multi-dimensional “meshgrid”.
oldnumeric Help for removed not found.
ones Return a new array of given shape and type, filled with ones.
ones_like Return array of ones with the same shape and type as given array.
outer Compute the outer product of two vectors.

numpy P — Z

Member Description
packbits Packs the elements of a binary-valued array into bits in a uint8 array.
pad Pad an array.
partition Return a partitioned copy of an array.
percentile Compute the q-th percentile of the data along the specified axis.
pi Convert a string or number to a floating point number, if possible.
piecewise Evaluate a piecewise-defined function.
place Change elements of an array based on conditional and input values.
poly Find the coefficients of a polynomial with the given sequence of roots.
poly1d A one-dimensional polynomial class.
polyadd Find the sum of two polynomials.
polyder Return the derivative of the specified order of a polynomial.
polydiv Returns the quotient and remainder of polynomial division.
polyfit Least squares polynomial fit.
polyint Return an antiderivative (indefinite integral) of a polynomial.
polymul Find the product of two polynomials.
polynomial A sub-package for efficiently dealing with polynomials.
polysub Difference (subtraction) of two polynomials.
polyval Evaluate a polynomial at specific values.
positive Numerical positive, element-wise.
power First array elements raised to powers from second array, element-wise.
printoptions Context manager for setting print options.
prod Return the product of array elements over a given axis.
product Return the product of array elements over a given axis.
promote_types Returns the data type with the smallest size and smallest scalar kind to which both ‘type1’ and ‘type2’ may be safely cast.
ptp Range of values (maximum — minimum) along an axis.
put Replaces specified elements of an array with given values.
put_along_axis Put values into the destination array by matching 1d index and data slices.
putmask Changes elements of an array based on conditional and input values.
quantile Compute the q-th quantile of the data along the specified axis.
r_ Translates slice objects to concatenation along the first axis.
rad2deg Convert angles from radians to degrees.
radians Convert angles from degrees to radians.
ravel Return a contiguous flattened array.
ravel_multi_index Converts a tuple of index arrays into an array of flat indices, applying boundary modes to the multi-index.
real Return the real part of the complex argument.
real_if_close If input is complex with all imaginary parts close to zero, return real parts.
rec Record Arrays
recarray Construct an ndarray that allows field access using attributes.
recfromtxt Load ASCII data from a file and return it in a record array.
reciprocal Return the reciprocal of the argument, element-wise.
record A data-type scalar that allows field access as attribute lookup.
remainder Returns the element-wise remainder of division.
repeat Repeat each element of an array after themselves
require Return an ndarray of the provided type that satisfies requirements.
reshape Gives a new shape to an array without changing its data.
resize Return a new array with the specified shape.
result_type Returns the type that results from applying the NumPy
right_shift Shift the bits of an integer to the right.
rint Round elements of the array to the nearest integer.
roll Roll array elements along a given axis.
rollaxis Roll the specified axis backwards, until it lies in a given position.
roots Return the roots of a polynomial with coefficients given in p.
rot90 Rotate an array by 90 degrees in the plane specified by axes.
round Evenly round to the given number of decimals.
row_stack Stack arrays in sequence vertically (row wise).
s_ A nicer way to build up index tuples for arrays.
save Save an array to a binary file in NumPy ‘.npy’ format.
savetxt Save an array to a text file.
savez Save several arrays into a single file in uncompressed ‘.npz’ format.
savez_compressed Save several arrays into a single file in compressed ‘.nfastputmaskpz’ format.
sctypeDict dict() -> new empty dictionary
searchsorted Find indices where elements should be inserted to maintain order.
select Return an array drawn from elements in choicelist, depending on conditions.
set_printoptions Set printing options.
setbufsize Set the size of the buffer used in ufuncs.
setdiff1d Find the set difference of two arrays.
seterr Set how floating-point errors are handled.
seterrcall Set the floating-point error callback function or log object.
setxor1d Find the set exclusive-or of two arrays.
shape Return the shape of an array.
shares_memory Determine if two arrays share memory.
short Signed integer type, compatible with C ‘short’.
show_config Show libraries and system information on which NumPy was built and is being used
sign Returns an element-wise indication of the sign of a number.
signbit Returns element-wise True where signbit is set (less than zero).
signedinteger Abstract base class of all signed integer scalar types.
sin Trigonometric sine, element-wise.
sinc Return the normalized sinc function.
single Single-precision floating-point number type, compatible with C ‘float’.
sinh Hyperbolic sine, element-wise.
size Return the number of elements along a given axis.
sometrue Check whether some values are true.
sort Return a sorted copy of an array.
sort_complex Sort a complex array using the real part first, then the imaginary part.
spacing Return the distance between x and the nearest adjacent number.
split Split an array into multiple sub-arrays as views into ‘ary’.
sqrt Return the non-negative square-root of an array, element-wise.
square Return the element-wise square of the input.
squeeze Remove axes of length one from ‘a’.
stack Join a sequence of arrays along a new axis.
std Compute the standard deviation along the specified axis.
str_ A unicode string.
subtract Subtract arguments, element-wise.
sum Sum of array elements over a given axis.
swapaxes Interchange two axes of an array.
take Take elements from an array along an axis.
take_along_axis Take values from the input array by matching 1d index and data slices.
tan Compute tangent element-wise.
tanh Compute hyperbolic tangent element-wise.
tensordot Compute tensor dot product along specified axes.
tile Construct an array by repeating A the number of times given by reps.
timedelta64 A timedelta stored as a 64-bit integer.
trace Return the sum along diagonals of the array.
transpose Returns an array with axes transposed.
trapz Integrate along the given axis using the composite trapezoidal rule.
tri An array with ones at and below the given diagonal and zeros elsewhere.
tril Lower triangle of an array.
tril_indices Return the indices for the lower-triangle of an (n, m) array.
tril_indices_from Return the indices for the lower-triangle of arr.
trim_zeros Trim the leading and/or trailing zeros from a 1-D array or sequence.
triu Upper triangle of an array.
triu_indices Return the indices for the upper-triangle of an (n, m) array.
triu_indices_from Return the indices for the upper-triangle of arr.
true_divide Divide arguments element-wise.
trunc Return the truncated value of the input, element-wise.
typecodes dict() -> new empty dictionary
typename Return a description for the given data type code.
ubyte Unsigned integer type, compatible with C ‘unsigned char’.
ufunc Functions that operate element by element on whole arrays.
uint Unsigned integer type, compatible with C ‘unsigned long’.
uint16 Unsigned integer type, compatible with C ‘unsigned short’.
uint32 Unsigned integer type, compatible with C ‘unsigned int’.
uint64 Unsigned integer type, compatible with C ‘unsigned long’.
uint8 Unsigned integer type, compatible with C ‘unsigned char’.
uintc Unsigned integer type, compatible with C ‘unsigned int’.
uintp Unsigned integer type, compatible with C ‘unsigned long’.
ulonglong Signed integer type, compatible with C ‘unsigned long long’.
union1d Find the union of two arrays.
unique Find the unique elements of an array.
unique_all Find the unique elements of an array, and counts, inverse, and indices.
unique_counts Find the unique elements and counts of an input array x.
unique_inverse Find the unique elements of x and indices to reconstruct x.
unique_values Returns the unique elements of an input array x.
unpackbits Unpacks elements of a uint8 array into a binary-valued output array.
unravel_index Converts a flat index or array of flat indices into a tuple of coordinate arrays.
unsignedinteger Abstract base class of all unsigned integer scalar types.
unstack Split an array into a sequence of arrays along the given axis.
unwrap Unwrap by taking the complement of large deltas with respect to the period.
ushort Unsigned integer type, compatible with C ‘unsigned short’.
vander Generate a Vandermonde matrix.
var Compute the variance along the specified axis.
vdot Return the dot product of two vectors.
vecdot Vector dot product of two arrays.
vecmat Vector-matrix dot product of two arrays.
vectorize Returns an object that acts like pyfunc, but takes arrays as input.
void Create a new structured or unstructured void scalar.
vsplit Split an array into multiple sub-arrays vertically (row-wise).
vstack Stack arrays in sequence vertically (row wise).
where Return elements chosen from ‘x’ or ‘y’ depending on ‘condition’.
zeros Return a new array of given shape and type, filled with zeros.
zeros_like Return an array of zeros with the same shape and type as a given array.

C.1.1  numpy.ndarray

ndarray(shape, dtype=float, buffer=None, offset=0, 
        strides=None, order=None)

An array object represents a multidimensional, homogeneous 
array of fixed-size items. An associated data-type object 
describes the format of each element in the array (its 
byte-order, how many bytes it occupies in memory, whether 
it is an integer, a floating point number, or something 
else, etc.)

Arrays should be constructed using `array`, `zeros` or 
`empty`. The parameters given here refer to a low-level 
method `) for instantiating an array.

Parameters
----------
shape : tuple of ints
    Shape of created array.
dtype : data-type, optional
    Any object that can be interpreted as numpy data type.
buffer : object exposing buffer interface, optional
    Used to fill the array with data.
offset : int, optional
    Offset of array data in buffer.
strides : tuple of ints, optional
    Strides of data in memory.
order : {'C', 'F'}, optional
    Row-major (C-style) or column-major order.
Member Description
alignment The required alignment (bytes) of this data-type according to the compiler.
base Returns dtype for the base element of the subarrays, regardless of their dimension or shape.
byteorder Character indicating the byte-order of this dtype object.
char A unique character code for each of the built-in types.
descr ‘__array_interface__’ description of the data-type.
fields Dictionary of named fields defined for this type or None.
flags Bit-flags describing how this data type is to be interpreted.
hasobject Boolean indicating whether this dtype contains any reference-counted objects in any fields or sub-dtypes.
isalignedstruct Boolean indicating whether the dtype is a struct which maintains field alignment.
isbuiltin Integer indicating how this dtype relates to built-in dtypes.
isnative Boolean indicating whether the byte order of this dtype is native to the platform.
itemsize The element size of this data-type object.
kind A character code (one of ‘biufcmMOSUV’) identifying the general kind of data.
metadata None, or readonly dict of metadata (mappingproxy).
name A bit-width name for this data-type.
names Ordered list of field names, or ‘None’ if there are no fields.
ndim Number of dimensions of the sub-array if this data type describes a sub-array, and ‘0’ otherwise.
newbyteorder Return a new dtype with a different byte order.
num A unique number for each of the 21 different built-in types.
shape Shape tuple of the sub-array if this data type describes a sub-array, and ‘()’ otherwise.
str The array-protocol typestring of this data-type object.
subdtype Tuple ‘(item_dtype, shape)’ if this ‘dtype’ describes a sub-array, and None otherwise.
to_device Tuple ‘(item_dtype, shape)’ if this ‘dtype’ describes a sub-array, and None otherwise.

C.1.2  numpy.dtype

dtype(dtype, align=False, copy=False, [metadata])

Create a data type object. 

A numpy array is homogeneous, and contains elements 
described by a dtype object. A dtype object can be 
constructed from different combinations of fundamental 
numeric types.

Parameters
----------
dtype
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler 
    would output for a similar C-struct. Can be 'True' 
    only if `obj` is a dictionary or a comma-separated 
    string. If a struct dtype is being created, this also 
    sets a sticky alignment flag 'isalignedstruct'.
copy : bool, optional
    Make a new copy of the data-type object. If 'False', 
    the result may just be a reference to a built-in 
    data-type object.
metadata : dict, optional
    An optional dictionary with dtype metadata.
Member Description
alignment The required alignment (bytes) of this data-type according to the compiler.
base Returns dtype for the base element of the subarrays, regardless of their dimension or shape.
byteorder A character indicating the byte-order of this data-type object.
char A unique character code for each of the 21 different built-in types.
descr ‘__array_interface__’ description of the data-type.
fields Dictionary of named fields for this type or None.
flags Bit-flags describing how this data type is to be interpreted.
hasobject Boolean indicating whether this dtype contains any reference-counted objects in any fields or sub-dtypes.
isalignedstruct Boolean indicating whether the dtype is a struct which maintains field alignment.
isbuiltin Integer indicating how this dtype relates to the built-in dtypes.
isnative Boolean indicating whether the byte order of this dtype is native to the platform.
itemsize The element size of this data-type object.
kind A character code (one of ‘biufcmMOSUV’) identifying the general kind of data.
metadata Either ‘None’ or a readonly dictionary of metadata.
name A bit-width name for this data-type.
names Ordered list of field names, or ‘None’ if there are no fields.
ndim Number of dimensions of the sub-array if this data type describes a sub-array, and ‘0’ otherwise.
newbyteorder Return a new dtype with a different byte order.
num A unique number for each of the 21 different built-in types.
shape Shape tuple of the sub-array if this data type describes a sub-array, and ‘()’ otherwise.
str The array-protocol typestring of this data-type object.
subdtype Tuple ‘(item_dtype, shape)’ if this ‘dtype’ describes a sub-array, and None otherwise.

C.2  numpy.linalg

The NumPy linear algebra functions rely on BLAS and 
LAPACK to provide efficient low level implementations 
of standard linear algebra algorithms. Those libraries 
may be provided by NumPy itself using C versions of a 
subset of their reference implementations but, when 
possible, highly optimized libraries that take 
advantage of specialized processor functionality are 
preferred. Examples of such libraries are OpenBLAS, 
MKL (TM), and ATLAS. Because those libraries are 
multithreaded and processor dependent, environmental 
variables and external packages such as threadpoolctl 
may be needed to control the number of threads or 
specify the processor architecture. 

- OpenBLAS: https://www.openblas.net/
- threadpoolctl: https://github.com/joblib/threadpoolctl

Please note that the most-used linear algebra functions 
in NumPy are present in the main 'numpy' namespace 
rather than in 'numpy.linalg'. There are: 'dot', 
'vdot', 'inner', 'outer', 'matmul', 'tensordot', 
'einsum', 'einsum_path' and 'kron'.
Member Description
cholesky Cholesky decomposition.
cond Compute the condition number of a matrix.
det Compute the determinant of an array.
diagonal Returns specified diagonals of a matrix (or a stack of matrices) x.
eig Compute eigenvalues & right eigenvectors of a square array.
eigh Return the eigenvalues and eigenvectors of a complex Hermitian (conjugate symmetric) or a real symmetric matrix.
eigvals Compute the eigenvalues of a general matrix.
eigvalsh Compute the eigenvalues of a complex Hermitian or real symmetric matrix.
inv Compute the (multiplicative) inverse of a matrix.
lstsq Return the least-squares solution to a linear matrix equation.
matrix_power Raise a square matrix to the (integer) power ‘n’.
matrix_rank Return matrix rank of array using SVD method.
matrix_norm Computes the matrix norm of a matrix (or a stack of matrices) x.
matrix_transpose Transposes a matrix (or a stack of matrices) x.
multi_dot Compute the dot product of two or more arrays in a single function call, while automatically selecting the fastest evaluation order.
norm Matrix or vector norm.
pinv Compute the (Moore-Penrose) pseudo-inverse of a matrix.
qr Compute the qr factorization of a matrix.
svdvals Returns the singular values of a matrix (or a stack of matrices) x.
slogdet Compute the sign and (natural) logarithm of the determinant of an array.
solve Solve a linear matrix equation, or system of linear scalar equations.
svd Singular Value Decomposition.
tensorinv Compute the ‘inverse’ of an N-dimensional array.
tensorsolve Solve the tensor equation ‘a x = b’ for x.
trace Returns the sum along the specified diagonals of a matrix (or a stack of matrices) x.
vecdot Computes the vector dot product.
vector_norm Computes the vector norm of a vector (or batch of vectors) x.

C.3  numpy.fft

The SciPy module 'scipy.fft' is a more comprehensive 
superset of 'numpy.fft', which includes only a basic 
set of routines. 
Member Description
fft Compute the one-dimensional discrete Fourier Transform.
fft2 Compute the 2-dimensional discrete Fourier Transform.
fftfreq Return the Discrete Fourier Transform sample frequencies.
fftn Compute the N-dimensional discrete Fourier Transform.
fftshift Shift the zero-frequency component to the center of the spectrum.
helper Discrete Fourier Transforms — helper.py
hfft Compute the FFT of a signal that has Hermitian symmetry, i.e., a real spectrum.
ifft Compute the one-dimensional inverse discrete Fourier Transform.
ifft2 Compute the 2-dimensional inverse discrete Fourier Transform.
ifftn Compute the N-dimensional inverse discrete Fourier Transform.
ifftshift The inverse of ‘fftshift’. Although identical for even-length ‘x’, the functions differ by one sample for odd-length ‘x’.
ihfft Compute the inverse FFT of a signal that has Hermitian symmetry.
irfft Computes the inverse of ‘rfft’.
irfft2 Computes the inverse of ‘rfft2’.
irfftn Computes the inverse of ‘rfftn’.
rfft Compute the one-dimensional discrete Fourier Transform for real input.
rfft2 Compute the 2-dimensional FFT of a real array.
rfftfreq Return the Discrete Fourier Transform sample frequencies (for usage with rfft, irfft).
rfftn Compute the N-dimensional discrete Fourier Transform for real input.

C.4  numpy.random

The numpy.random module is a NumPy sub-package, primarily used for generating random numbers and performing various statistical operations. The module provides a suite of functions that support many aspects of randomisation and probability distributions.

Member Description
beta Draw samples from a Beta distribution.
binomial Draw samples from a binomial distribution.
bit_generator BitGenerator base class and SeedSequence used to seed the BitGenerators.
bytes Return random bytes.
chisquare Draw samples from a chi-square distribution.
choice Generates a random sample from a given 1-D array
default_rng Construct a new Generator with the default BitGenerator (PCG64).
dirichlet Draw samples from the Dirichlet distribution.
exponential Draw samples from an exponential distribution.
f Draw samples from an F distribution.
gamma Draw samples from a Gamma distribution.
geometric Draw samples from the geometric distribution.
get_state Return a tuple representing the internal state of the generator.
gumbel Draw samples from a Gumbel distribution.
hypergeometric Draw samples from a Hypergeometric distribution.
laplace Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay).
logistic Draw samples from a logistic distribution.
lognormal Draw samples from a log-normal distribution.
logseries Draw samples from a logarithmic series distribution.
multinomial Draw samples from a multinomial distribution.
multivariate_normal Draw random samples from a multivariate normal distribution.
negative_binomial Draw samples from a negative binomial distribution.
noncentral_chisquare Draw samples from a noncentral chi-square distribution.
noncentral_f Draw samples from the noncentral F distribution.
normal Draw random samples from a normal (Gaussian) distribution.
pareto Draw samples from a Pareto II or Lomax distribution with specified shape.
permutation Randomly permute a sequence, or return a permuted range.
poisson Draw samples from a Poisson distribution.
power Draws samples in [0, 1] from a power distribution with positive exponent a — 1.
rand Random values in a given shape.
randint Return random integers from ‘low’ (inclusive) to ‘high’ (exclusive).
randn Return a sample (or samples) from the “standard normal” distribution.
random Return random floats in the half-open interval [0.0, 1.0).
random_integers Random integers of type ‘np.int_’ between ‘low’ and ‘high’, inclusive.
random_sample Return random floats in the half-open interval [0.0, 1.0).
ranf This is an alias of ‘random_sample’. See ‘random_sample’ for the complete documentation.
rayleigh Draw samples from a Rayleigh distribution.
sample This is an alias of ‘random_sample’. See ‘random_sample’ for the complete documentation.
seed Reseed the singleton RandomState instance.
set_state Set the internal state of the generator from a tuple.
shuffle Modify a sequence in-place by shuffling its contents.
standard_cauchy Draw samples from a standard Cauchy distribution with mode = 0.
standard_exponential Draw samples from the standard exponential distribution.
standard_gamma Draw samples from a standard Gamma distribution.
standard_normal Draw samples from a standard Normal distribution (mean=0, stdev=1).
standard_t Draw samples from a standard Student’s t distribution with ‘df’ degrees of freedom.
triangular Draw samples from the triangular distribution over the interval ‘[left, right]’.
uniform Draw samples from a uniform distribution.
vonmises Draw samples from a von Mises distribution.
wald Draw samples from a Wald, or inverse Gaussian, distribution.
weibull Draw samples from a Weibull distribution.
zipf Draw samples from a Zipf distribution.

C.5  numpy.polynomial

A sub-package for efficiently dealing with polynomials.

Member Description
Polynomial Power series
Chebyshev Chebyshev series
Legendre Legendre series
Laguerre Laguerre series
Hermite Hermite series
HermiteE HermiteE series

C.6  numpy.strings

The numpy.strings module provides a set of universal functions operating on arrays of type numpy.str_ or numpy.bytes_.

String operations Description
add Add arguments element-wise.
center Return a copy of a with its elements centered in a string of length width.
capitalize Return a copy of a with only the first character of each element capitalized.
decode Calls bytes.decode element-wise.
encode Calls str.encode element-wise.
expandtabs Return a copy of each string element where all tab characters are replaced by one or more spaces.
ljust Return an array with the elements of a left-justified in a string of length width.
lower Return an array with the elements converted to lowercase.
lstrip For each element in a, return a copy with the leading characters removed.
mod Return (a % i), that is pre-Python 2.6 string formatting (interpolation), element-wise for a pair of array_likes of str or unicode.
multiply Return (a * i), that is string multiple concatenation, element-wise.
partition Partition each element in a around sep.
replace For each element in a, return a copy of the string with occurrences of substring old replaced by new.
rjust Return an array with the elements of a right-justified in a string of length width.
rpartition Partition (split) each element around the right-most separator.
rstrip For each element in a, return a copy with the trailing characters removed.
strip For each element in a, return a copy with the leading and trailing characters removed.
swapcase Return element-wise a copy of the string with uppercase characters converted to lowercase and vice versa.
title Return element-wise title cased version of string or unicode.
translate For each element in a, return a copy of the string where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table.
upper Return an array with the elements converted to uppercase.
zfill Return the numeric string left-filled with zeros.
Comparisons Description
equal Return (x1 == x2) element-wise.
not_equal Return (x1 != x2) element-wise.
greater_equal Return the truth value of (x1 >= x2) element-wise.
less_equal Return the truth value of (x1 <= x2) element-wise.
greater Return the truth value of (x1 > x2) element-wise.
less Return the truth value of (x1 < x2) element-wise.
String information Description
count Returns an array with the number of non-overlapping occurrences of substring sub in the range [start, end).
endswith Returns a boolean array which is True where the string element in a ends with suffix, otherwise False.
find For each element, return the lowest index in the string where substring sub is found, such that sub is contained in the range [start, end).
index Like find, but raises ValueError when the substring is not found.
isalnum Returns true for each element if all characters in the string are alphanumeric and there is at least one character, false otherwise.
isalpha Returns true for each element if all characters in the data interpreted as a string are alphabetic and there is at least one character, false otherwise.
isdecimal For each element, return True if there are only decimal characters in the element.
isdigit Returns true for each element if all characters in the string are digits and there is at least one character, false otherwise.
islower Returns true for each element if all cased characters in the string are lowercase and there is at least one cased character, false otherwise.
isnumeric For each element, return True if there are only numeric characters in the element.
isspace Returns true for each element if there are only whitespace characters in the string and there is at least one character, false otherwise.
istitle Returns true for each element if the element is a titlecased string and there is at least one character, false otherwise.
isupper Return true for each element if all cased characters in the string are uppercase and there is at least one character, false otherwise.
rfind For each element, return the highest index in the string where substring sub is found, such that sub is contained in the range [start, end).
rindex Like rfind, but raises ValueError when the substring sub is not found.
startswith Returns a boolean array which is True where the string element in a starts with prefix, otherwise False.
str_len Returns the length of each element.

  1. NumPy docs. numpy.org. numpy.org/doc/stable/reference/arrays.classes.html↩︎

Appendix D. Solutions to Exercises


Answers to exercises assume, where required, that numpy has been imported:

    import numpy as np

Please note that some of the output and results have been formatted for better display.


D.1  Chapter 3

Exercise 3-1

Create a new 64 byte dtype of float type, using a sized alias, and assign it to the variable t. Print out the variable.

    t = np.dtype('float64')
    print(t)
float64

Exercise 3-2

Repeat the above dtype creation, but instead using an equivalent native Python type.

    t = np.dtype(float)
    print(t)
float64

Exercise 3-3

Write a Python expression that calculates the area of a circle with radius of 30mm.

    # mm units
    radius = 30
    # Formula: a = pi * r * r
    area_mm_sq = np.pi * (radius**2)
    # In units of mm-squared
    print(area_mm_sq)
2827.4333882308138

Exercise 3-4

Convert the result of the area of the circle to units of cm2. Print the result.

    area_cm_sq = area_mm_sq / (10**2)
    print(area_cm)
28.274333882308138

D.2  Chapter 4

Exercise 4-1

Create a 2-D Python list of the test scores, with the student scores as “rows” in test order. Assign this list to the variable student_scores_list. Print out this list.

    student_scores_list = [
        [1, 63.5, 56.0, 68.0, 73.5],
        [2, 53.0, 77.5, 61.0, 83.0],
        [3, 59.0, 79.0, 67.5, 70.0]
    ]
    print(student_scores_list)
[[1, 63.5, 56.0, 68.0, 73.5], 
 [2, 53.0, 77.5, 61.0, 83.0], 
 [3, 59.0, 79.0, 67.5, 70.0]]

Exercise 4-2

Using the list from Exercise 4-1, create a NumPy array assigned to student_scores_arr, and explicitly assign it an appropriate floating point dtype. Print out the array.

    student_scores_arr = np.array(student_scores_list, 
                                  dtype=float)
    print(student_scores_arr)
[[ 1.  63.5 56.  68.  73.5]
 [ 2.  53.  77.5 61.  83. ]
 [ 3.  59.  79.  67.5 70. ]]

NumPy arrays must be of the same type, in this case whole numbers (integers) were promoted to floating point values.

Exercise 4-3

Create a copy of student_scores_arr whilst assigning it to a new variable. Change the type of the copied array to a suitable integer. What do the scores look like now? What effect did the conversion have on the values?

    scores_arr_cp = scores_arr.astype(np.int_)
    print(scores_arr_cp)
[[ 1 63 56 68 73]
 [ 2 53 77 61 83]
 [ 3 59 79 67 70]]

The effect of the conversion was to round down (take the floor of) the values.

Exercise 4-4

Create a new array filled with ones, of the same dimensions as student_scores_arr. Print the array. What is the dtype of this array?

    arr_ones = np.ones_like(student_scores_arr)
    print(arr_ones)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

Exercise 4-5

Create an identity matrix of 4x4 size, and print the result.

    arr_ident = np.identity(n=4, dtype=int)
    print(arr_ident)
[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]

Exercise 4-6

Design a suitable named compound type for student_scores_arr (using sensible names without spaces), where the student id is an integer, and the scores are a floating point. Recreate the scores array to use this compound type. Print out the dtype for the array. (Hint: use tuples as rows.)

    t = np.dtype([
        ('student', 'int'), 
        ('test1', float), 
        ('test2', float), 
        ('test3', float)]) 
    # Note the use of tuples as rows..
    student_scores_arr = np.array([
        (1, 85.5, 90.0, 78.5, 92.0),  
        (2, 79.0, 88.5, 95.5, 87.0),
        (3, 92.5, 87.0, 89.5, 91.0)], 
        dtype=t)
    print(student_scores_arr.dtype)
dtype([('student', '<i8'), 
       ('test1', '<f8'), 
       ('test2', '<f8'), 
       ('test3', '<f8')])

Exercise 4-7

Convert the structured array created in Exercise 4-6 to a recarray array. Print out the 2nd column of the record array by name.

    scores_rec = student_scores_arr.view(np.recarray)
    scores_rec.test1
array([85.5, 79. , 92.5])

D.3  Chapter 5

Exercise 5-1

What is the shape of the NumPy array? Use the array’s shape property to confirm your conclusion.

This is a 2x3 array. This can be confirmed using:

    a.shape
(2, 3)

Exercise 5-2

What is the size of each element in this array, in bytes?

    i = a.itemsize
    print(i)
8

Exercise 5-3

Use the appropriate inspection property to find the total bytes consumed by the array. How does this compare to the multiple of the previous exercise’s result by the total count of elements?

    b = a.nbytes 
    print(b)
48

size gives you the number of elements, therefore:

    b = i * a.size 
    print(b)
48

Exercise 5-4

What is the string name of the data type of this array?

    a.dtype.name
'int64'

Exercise 5-5

If a third row containing the elements [7 8 9] was added to the array, what would be the number of dimensions?

Adding a new row simply increases the size of the 0th dimension, so the number of dimensions remains unchanged at 2. The shape, however, becomes (3,3).

    a = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)]) 
    a.ndim
2
    a.shape
(3, 3)

D.4  Chapter 6

Exercise 6-1

Load the file into a NumPy array, excluding the header, and print the array.

    # Set f to the file's path on your computer:
    f = '/home/student_scores.csv'
    a = np.loadtxt(f, delimiter=',', skiprows=1)
    print(a)
array([[ 1. , 63.5, 56. , 68. , 73.5],
       [ 2. , 53. , 77.5, 61. , 83. ],
       [ 3. , 59. , 79. , 67.5, 70. ]])

Exercise 6-2

Load the array instead using the compound type. Print the new array.

    from numpy import genfromtxt
    r = genfromtxt(f, delimiter=',', dtype=t, names=True)
    print(r)
array([(1, 63.5, 56. , 68. ), 
       (2, 53. , 77.5, 61. ),
       (3, 59. , 79. , 67.5)],
       dtype=[('student_no', '<i8'), 
              ('test_1', '<f8'), 
              ('test_2', '<f8'), 
              ('test_3', '<f8')])

Exercise 6-3

Convert the array you created in Exercise 6-2 to a recarray, and print out a list of the student numbers using the ‘array.field’ notation.

    r = r.view(np.recarray)
    r.student_no
array([1, 2, 3])

D.5  Chapter 7

Exercise 7-1

Get a list of (only) the scores of the 1st student.

    scores_array[0, 1:]
array([63.5, 56. , 68. , 73.5])

Exercise 7-2

Print out student IDs of the 1st two students.

    scores_array[:2, 0]
array([1., 2.])

Exercise 7-3

Modify the score of the 2nd student in the second test to 87.

    scores_array[1, 2] = 87
    print(scores_array[1,:])
 [ 2.  53.  87.  61.  83. ]

Exercise 7-4

Print the scores of all students in the 3rd test.

    scores_array[:, 3]
array([68. , 61. , 67.5])

Exercise 7-5

Modify the scores of all students in the 4th test to 75.

    scores_array[:, 4] = 75
    print(scores_array)
[[ 1.  63.5 56.  68.  75. ]
 [ 2.  53.  87.  61.  75. ]
 [ 3.  59.  79.  67.5 75. ]]

Exercise 7-6

Print the scores of the 2nd student in the last two tests.

    scores_array[1, -2:]
array([61., 75.])

Exercise 7-7

Modify the scores of all students in the 1st test to 70.

    scores_array[:, 1] = 70
    print(scores_array)
[[ 1.  70.  56.  68.  75. ]
 [ 2.  70.  87.  61.  75. ]
 [ 3.  70.  79.  67.5 75. ]]

Exercise 7-8

Print the ID and scores of the last student.

    scores_array[-1,0]
    scores_array[-1, 1:]
3.0
array([70. , 79. , 67.5, 75. ])

Exercise 7-9

Repeat the selection from Example 7-8 but assign it to the variable sub_arr. Change the first score of sub_arr to 77. Is this sub-selection a view? Confirm by printing both sub_arr and scores_array to see if the original array has also been modified.

    sub_arr = scores_array[-1, 1:]
    sub_arr[0] = 77
    print(sub_arr)
    print(scores_array)

sub_arr is a view, you can see that both arrays have been modified:

array([77. , 79. , 67.5, 75. ])

array([[ 1. , 70. , 56. , 68. , 75. ],
       [ 2. , 70. , 87. , 61. , 75. ],
       [ 3. , 77. , 79. , 67.5, 75. ]])

Exercise 7-10

Retrieve an array of the scores only (no student id) and apply a conditional expression to return True|False for any scores over 80.

    scores_only = scores_array[:, 1:]
    scores_over_80 = scores_only > 80
    print(scores_over_80)
[[False False False False]
 [False False False  True]
 [False False False False]]

D.6  Chapter 8

Exercise 8-1

Perform element-wise addition (+) and multiplication (*) of the two arrays and print the results.

    # Element-wise addition using ufunc
    addition_result = np.add(a, b)
    print("Addition Result:", addition_result)
    # Element-wise multiplication using ufunc
    multiplication_result = np.multiply(a, b)
    print("Multiplication Result:", multiplication_result)
Addition Result: [5 7 9]

Multiplication Result: [ 4 10 18]

Exercise 8-2

Does the matrix dot operation between the arrays result in a valid array? Print the result. Can you explain how the result was generated?

    result = a.dot(b)
32

No. A dot operation on a 1x3 matrix with another 1x3 matrix results in a scalar value. This can be depicted as follows:

math


Using dot on two equal size 1-D arrays (n-vectors) performs a matrix dot-product (or scalar-product) operation. (This should not be confused with “scalar * array” or “array * array” arithmetic multiplication.) However, if one side of the dot operation in NumPy is a scalar value, then it will revert to “scalar * array”, or “array * array” multiplication if the vectors are of different size.

Exercise 8-3

Perform element-wise comparison (greater than, less than, equal to) between the elements of these arrays. Print the result of each comparison.

    greater_than_result = np.greater(a, b)
    less_than_result = np.less(a, b)
    equal_to_result = np.equal(a, b)
    print("Greater Than Result:", greater_than_result)
    print("Less Than Result:", less_than_result)
    print("Equal To Result:", equal_to_result)
Greater Than Result: [False False False]

Less Than Result: [ True  True  True]

Equal To Result: [False False False]

Exercise 8-4

Perform scalar division on the array a with the value 5. Print the resulting array. What is the dtype of the result?

    result = a / 5
    print(result)
    print(result.dtype)
array([0.2, 0.4, 0.6])

dtype('float64')

Exercise 8-5

Create a larger NumPy array b with dimensions (3, 3) containing random values. Then, add array a to b. Print the resulting array.

    b = np.random.randint(0, 10, size=(3, 3))
    result = a + b
    print(b)
    print(result)
[[8 7 6]
 [0 8 8]
 [1 0 0]]

[[ 9  9  9]
 [ 1 10 11]
 [ 2  2  3]]

Note, your final result will vary depending on the random values generated in b. The result is determined by adding a ([1, 2, 3]) to each row of b.

Exercise 8-6

What percentage of student scores are over 80?

    scores_only = scores_array[:, 1:]
    scores_over_80 = scores_only > 80
    num_scores_over_80 = np.sum(scores_over_80)
    perc_over_80 = (num_scores_over_80 / scores_only.size) 
                   * 100
    print(scores_over_80)
    print("Percentage of student scores over 80:", 
          perc_over_80, "%")
[[False False False False]
 [False False False  True]
 [False False False False]]

Percentage of student scores over 80: 8.333333333333332 %

Exercise 8-7

Compute the mean and standard deviation of the NumPy array arr = np.array([1, 2, 3, 4, 5]).

    mean_value = np.mean(arr)
    std_deviation = np.std(arr)
    print("Mean:", mean_value)
    print("Standard Deviation:", std_deviation)
Mean: 3.0

Standard Deviation: 1.4142135623730951

Exercise 8-8

Compute and print out the median and quartiles (25th and 75th percentiles) of the following NumPy array arr = np.array([10, 20, 30, 40, 50]).

    median_value = np.median(data)
    quartiles = np.percentile(data, [25, 75])
    print("Median:", median_value)
    print("25th Percentile (Q1):", quartiles[0])
    print("75th Percentile (Q3):", quartiles[1])
Median: 30.0

25th Percentile (Q1): 20.0

75th Percentile (Q3): 40.0

Exercise 8-9

Compute and print out the correlation coefficient between the two NumPy arrays x and y.

    corr_coefficient = np.corrcoef(x, y)[0, 1]
    print("Correlation coefficient:", corr_coefficient)
Correlation coefficient: -0.9881395367446534

The correlation coefficient is very near to -1, therefore there is a strong negative correlation; as x increases, y decreases linearly.

D.7  Chapter 9

Exercise 9-1

Convert the array a = np.array([[1, 2], [3, 4]])

    a = np.array([[1, 2], [3, 4]])
    a.transpose()
array([[1, 3],
       [2, 4]])

Exercise 9-2

Is it possible to reshape the 2x2 array a = np.array([[1, 2], [3, 4]]) into a 4x1 array? If so what is the procedure? Assign this new array to variable b.

    a = np.array([[1, 2], [3, 4]])
    b = np.reshape(a, (4,1))
    print(b)
array([[1],
       [2],
       [3],
       [4]])

Exercise 9-3

Flatten the arrays a and b from exercise 2, and combine them into a single 2x4 array assigned to c.

    a = np.ravel(a)
    b = np.ravel(b)
    c = np.vstack((a, b))
array([[1, 2, 3, 4],
       [1, 2, 3, 4]])

Exercise 9-4

Use a rotate operation upon c to convert it into a 2x4 array.

    # Using `c` from previous exercise..
    np.rot90(c, 1)
array([[4, 4],
       [3, 3],
       [2, 2],
       [1, 1]])

Exercise 9-5

Convert the 1-D array a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) into a 3x3 array sorted in reverse order.

The key is to reverse the array first:

   a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
   a = np.flip(a)
   np.split(a, 3)
array([[9, 8, 7],
       [6, 5, 4],
       [3, 2, 1]])

D.8  Chapter 10

Exercise 10-1

Swap the case of the array ['Hello', 'WORLD', 'FROM', 'Python'] so that uppercase letters are converted to lowercase, and lower to upper.

    a = np.array(['Hello', 'WORLD', 'FROM', 'Python'])
    np.strings.swapcase(a)
array(['hELLO', 'world', 'from', 'pYTHON!'], dtype='<U6')

Exercise 10-2

Pad the array of numeric strings ['42', '97', '2005', '025'] with zeros, up to a width of 4.

    a = np.array(['42', '97', '2005', '025'])
    np.strings.zfill(a, 4)
array(['0042', '0097', '2005', '0025'], dtype='<U4')

Exercise 10-3

Test whether the values in the array ['Los', 'Angeles', 'Year 2019'] consist only of alphabetic characters.

    a = np.array(['Los', 'Angeles', 'Year 2019'])
    np.strings.isalpha(a)
array([ True,  True, False])

Exercise 10-4

Find the length of each value in the previous array.

    np.strings.str_len(a)
array([3, 7, 9])