Basic Containers and Packages#

Python Lists#

Lists are a data structure provided as part of the Python language. See the Python tutorial sections on lists and more on lists.

A list is a compound data type that is a mutable, indexed, and ordered collection of data.

Lists are often constructed using square brackets: [...]

x0 = [1, 2, 3] # list of numbers
print(x0)
print(type(x0))
[1, 2, 3]
<class 'list'>
x1 = ['hello', 'world'] # list of strings
print(x1)
print(type(x1))
['hello', 'world']
<class 'list'>
x = [x0, x1] # list of lists
print(x)
print(type(x))
[[1, 2, 3], ['hello', 'world']]
<class 'list'>

Usually lists will be generated programmatically. One way you can do this is by using the append method:

x = [] # empty list
for i in range(5):
    x.append(i)
x
[0, 1, 2, 3, 4]

or the extend method, which appends all elements of another list:

x = [1,2,3]
y = [4,5,6]
x.extend(y) # extends x by y
x
[1, 2, 3, 4, 5, 6]

You can also concatenate lists using the + operator, which returns a new list:

[1,2,3] + [4,5,6]
[1, 2, 3, 4, 5, 6]
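The difference between extend and + is easy to miss, so here is a minimal sketch of it. The variable names (alias, y, z) are illustrative only: extend changes the existing list in place, while + leaves both operands alone and builds a new list.

```python
# extend() mutates the existing list; + builds a new list object
x = [1, 2, 3]
alias = x            # second name for the same list object
x.extend([4, 5, 6])  # in-place: alias sees the change
print(alias)         # [1, 2, 3, 4, 5, 6]

y = [1, 2, 3]
z = y + [4, 5, 6]    # new list: y is left untouched
print(y)             # [1, 2, 3]
print(z)             # [1, 2, 3, 4, 5, 6]
print(z is y)        # False
```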

You can also generate lists using list comprehensions. Comprehensions are “Pythonic”, which is a vague term roughly meaning “something a Python programmer would write”.

x = [i for i in range(5)]
x
[0, 1, 2, 3, 4]
x = [i * i for i in range(5)]
x
[0, 1, 4, 9, 16]

Generally, comprehensions consist of [expression loop conditional], where the conditional is optional.

This looks a lot like set notation in mathematics. E.g. for the set $y = \{i \mid i \in x, i \ne 4\}$ we compute

y = [i for i in x if i != 4]
y
[0, 1, 9, 16]
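To make the [expression loop conditional] pattern concrete, the sketch below writes the same filter as an explicit loop. The name y_loop is introduced here only for the comparison.

```python
x = [i * i for i in range(5)]   # [0, 1, 4, 9, 16]

# comprehension form: [expression loop conditional]
y = [i for i in x if i != 4]

# equivalent explicit loop
y_loop = []
for i in x:                 # loop
    if i != 4:              # conditional
        y_loop.append(i)    # expression

print(y == y_loop)  # True
```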

Indexing#

Python is 0-indexed (like C, unlike Fortran/Matlab). This means a list of length n will have indices that start at 0 and end at n-1.

This is the reason why range(n) iterates through the range 0,...,n-1.

words = ["dog", "cat", "house"]
print(words[0])
print(words[1])
dog
cat

You can access elements starting at the back of the list using negative integers. A good way to think of this is that the index -1 translates to n-1, -2 to n-2, and so on.

print(words[-1])
print(words[-2])
house
cat

Slicing: you can use the colon character : to slice a list. The syntax is start:end:stride, where the element at index end is not included.

x = [i for i in range(10)]
print(x)
print(x[:])
print(x[2:4])
print(x[2:9:3])
print(x[-3:-1])
print(x[::2])
print(x[::-1])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 3]
[2, 5, 8]
[7, 8]
[0, 2, 4, 6, 8]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
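One way to read the slicing rules is by comparison with range. The following sketch assumes non-negative, in-bounds values for start and end, and checks that a slice visits the same indices as range(start, end, stride). It also checks the negative-index rule.

```python
x = list(range(10))
n = len(x)

# negative indices count from the end: x[-k] is x[n - k]
assert x[-1] == x[n - 1]
assert x[-3] == x[n - 3]

# x[start:end:stride] visits the same indices as range(start, end, stride);
# the end index itself is excluded
start, end, stride = 2, 9, 3
via_slice = x[start:end:stride]
via_range = [x[i] for i in range(start, end, stride)]
print(via_slice, via_range)  # [2, 5, 8] [2, 5, 8]
```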

Lists are mutable, which means you can change elements

words = ["dog", "cat", "house"]
print(words)

words[0] = "mouse"
print(words)
['dog', 'cat', 'house']
['mouse', 'cat', 'house']
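Mutability has a consequence worth seeing once: assigning a list to a second name does not copy it. In this small sketch, alias and snapshot are illustrative names. A full slice is one way to take a (shallow) copy that later changes do not affect.

```python
words = ["dog", "cat", "house"]
alias = words          # no copy: both names refer to the same list
snapshot = words[:]    # a full slice makes a (shallow) copy

words[0] = "mouse"
print(alias)     # ['mouse', 'cat', 'house']
print(snapshot)  # ['dog', 'cat', 'house']
```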

Other Python Collections#

There are other collections you might use in Python:

  • Tuples (...) are ordered, indexed, and immutable

  • Sets {...} are unordered, unindexed, and mutable

  • Dictionaries {...} are indexed by key and mutable (they preserve insertion order since Python 3.7)

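Sets and dictionaries also support comprehensions. Tuples do not have a comprehension syntax of their own, but you can pass a generator expression to tuple().

You can find additional types of collections in the collections module.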

x = (1,2,3) # tuple
print(x)
x = (i for i in range(1,4)) # generator expression (not a tuple comprehension)
print(tuple(x))
(1, 2, 3)
(1, 2, 3)
s = {1,2,3} # set
1 in s
True
s = {i for i in range(1,4)}
s
{1, 2, 3}
d = {'hello' : 0, 'goodbye': 1} # dictionary
print(type(d))
d['hello']
<class 'dict'>
0
d = {key: val for val, key in enumerate(['hello', 'goodbye'])}
d
{'hello': 0, 'goodbye': 1}
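The properties listed above can be checked directly. The sketch below also shows two types from the standard collections module, Counter and namedtuple. The name Point is purely illustrative.

```python
from collections import Counter, namedtuple

# tuples are immutable: item assignment raises TypeError
t = (1, 2, 3)
try:
    t[0] = 10
except TypeError as err:
    print("tuples are immutable:", err)

# sets keep only distinct elements
s = set([1, 2, 2, 3, 3, 3])
print(s == {1, 2, 3})  # True

# Counter is a dict subclass that counts hashable items
counts = Counter(["dog", "cat", "dog"])
print(counts["dog"], counts["cat"])  # 2 1

# namedtuple: a tuple whose fields can also be accessed by name
Point = namedtuple("Point", ["x", "y"])  # Point is an illustrative name
p = Point(1.0, 2.0)
print(p.x, p[1])  # 1.0 2.0
```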

Numpy#

If you haven’t already:

conda install numpy

Numpy is perhaps the fundamental scientific computing package for Python - just about every other package for scientific computing uses it.

Numpy basically provides an ndarray type (n-dimensional array), and provides fast operations for arrays (i.e. implemented in compiled C or Fortran).

We’ll do some deeper dives into numpy in future lectures. For now, we’ll cover some basics. For those who want to dive in now, the numpy documentation includes tutorials and lots of additional information.

import numpy as np # import numpy into the np namespace

You can easily generate numpy arrays from list data

x = np.array([1,2,3])
print(x)
print(type(x))
[1 2 3]
<class 'numpy.ndarray'>

A 2-dimensional array can be generated by lists of lists

x = np.array([[1,2,3], [4,5,6]])
print(x)
[[1 2 3]
 [4 5 6]]

A few attributes of the array:

print(x.ndim) # number of dimensions
print(x.shape) # shape of array
print(x.size) # total number of elements in array
print(x.dtype) # data type
print(x.itemsize) # number of bytes for data type
print(x.data) # buffer location in memory
print(x.flags) # some flags
2
(2, 3)
6
int64
8
<memory at 0x7fa68019e0c0>
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Other ways of obtaining numpy arrays:

# array from a range
a = np.arange(3)
print(a)

# an array of ones
a = np.ones((3,2), dtype=float)
print(a)

# just an array with no initialization - WARNING: data can be anything
a = np.empty((3,2), dtype=float)
print(a)

# random normal data
a = np.random.normal(size=(3,2))
print(a)
[0 1 2]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[-0.93526172  1.13533343]
 [-0.42972551 -2.13619811]
 [-0.53513088  0.01674356]]
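A few more constructors are commonly used alongside the ones above. This is a short sketch, not an exhaustive list. It uses np.zeros, np.linspace, and a seeded generator from np.random.default_rng so that random data is reproducible. The seed value 0 is arbitrary.

```python
import numpy as np

z = np.zeros((2, 3))            # array of zeros, float64 by default
pts = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points, endpoints included
print(pts)                      # [0.   0.25 0.5  0.75 1.  ]

# seeded generator: the same seed reproduces the same samples
rng1 = np.random.default_rng(0)
rng2 = np.random.default_rng(0)
a = rng1.normal(size=(3, 2))
b = rng2.normal(size=(3, 2))
print(np.array_equal(a, b))     # True
```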

Indexing#

1-dimensional arrays are indexed in the same way as lists (0-indexed, can use slices, etc)

a = np.arange(4)
print(a[:2]) 
[0 1]

You can also index using lists of indices:

inds = [0,2]
a[inds] 
array([0, 2])

2-dimensional arrays are a bit different from lists of lists:

a = [[0,1],[2,3]] # list of lists
print(a)
a[1][1] # like indexing in C
[[0, 1], [2, 3]]
3
anp = np.array(a) # 2-dimensional array
anp[1,1] # like indexing in Matlab, Julia
3

You can also use slices, index sets, etc. in multi-dimensional arrays.

If you only provide 1 index, you’ll get the corresponding row (or set of rows if slicing)

anp[0]
array([0, 1])
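As a sketch of the slices and index sets mentioned above, the following selects a column, a row, a pair of individual elements, and the elements matching a boolean mask. The mask example goes slightly beyond the text, and the name mask is illustrative.

```python
import numpy as np

anp = np.array([[0, 1], [2, 3]])
print(anp[:, 0])            # first column: [0 2]
print(anp[1, :])            # second row:   [2 3]
print(anp[[0, 1], [1, 0]])  # index lists pair up: elements (0,1) and (1,0) -> [1 2]

mask = anp > 1              # boolean array with the same shape as anp
print(anp[mask])            # elements where the mask is True: [2 3]
```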

Arithmetic#

Numpy arrays support basic element-wise arithmetic, assuming arrays are the same shape.

Note: there are more complicated broadcasting rules for different-shaped arrays, which we’ll cover some other time.

x = np.arange(4)
print(x)
print(x + x)
print(x * x)
print(x**3)

def f(x):
    return x**2 - 2*x + 1

print(f(x))
[0 1 2 3]
[0 2 4 6]
[0 1 4 9]
[ 0  1  8 27]
[1 0 1 4]
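As a small preview of the broadcasting rules mentioned in the note above, here are the two simplest cases: a scalar combined with an array, and a column combined with a row. The names col, row, and table are illustrative.

```python
import numpy as np

x = np.arange(4)
print(x + 1)        # the scalar is applied to every element: [1 2 3 4]

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)
table = col * row                  # the shapes broadcast to (3, 4)
print(table.shape)  # (3, 4)
print(table)        # entry [i, j] equals i * j
```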

Warning: The * operator applied to 2-dimensional arrays is not the same as matrix-matrix multiplication. It will perform element-wise multiplication instead.

Numpy arrays support the @ operator for matrix multiplication. You can also use the np.matmul() function or the dot() method, which agree with @ for 2-dimensional arrays.

x = np.arange(4).reshape(2,2)
print(x)
print(x*x)
print(x @ x)
print(x.dot(x))
print(np.matmul(x,x))
[[0 1]
 [2 3]]
[[0 1]
 [4 9]]
[[ 2  3]
 [ 6 11]]
[[ 2  3]
 [ 6 11]]
[[ 2  3]
 [ 6 11]]
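To see what @ computes, the sketch below writes out the matrix product from its definition with explicit loops and compares it with both @ and the element-wise product. The loop version is only for illustration. It is not how you would compute products in practice.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)

# matrix product by definition: C[i, j] = sum_k A[i, k] * B[k, j]
c = np.zeros((2, 2), dtype=x.dtype)
for i in range(2):
    for j in range(2):
        for k in range(2):
            c[i, j] += x[i, k] * x[k, j]

print(np.array_equal(c, x @ x))   # True
print(np.array_equal(c, x * x))   # False: * is element-wise
```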

Numpy provides a variety of mathematical functions that you can use with numpy arrays. These functions are vectorized, meaning that it is typically much faster to perform array operations than to use explicit for loops. This should be a familiar concept to Matlab users.

np.sin(x)
array([[0.        , 0.84147098],
       [0.90929743, 0.14112001]])
import time
n = 1_000_000
# list data
x = [i/n for i in range(n)]
# numpy array
xnp = np.array(x)

# square elements in-place
t0 = time.monotonic()
for i in range(n):
    x[i] = x[i] * x[i]
t1 = time.monotonic()
print("time for loop over list: {:.3} sec.".format(t1 - t0))

t0 = time.monotonic()
xnp = xnp * xnp
t1 = time.monotonic()
print("time for numpy vectorization: {:.3} sec.".format(t1 - t0))
time for loop over list: 0.0946 sec.
time for numpy vectorization: 0.00229 sec.
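A single timing like the one above can be noisy. As an alternative sketch, the standard timeit module repeats each measurement, and taking the minimum gives a steadier estimate. The problem size and repeat counts here are arbitrary choices, and actual timings will vary by machine.

```python
import timeit
import numpy as np

n = 100_000
x = [i / n for i in range(n)]
xnp = np.array(x)

# repeat each measurement and keep the best time to reduce noise
t_list = min(timeit.repeat(lambda: [v * v for v in x], number=10, repeat=3))
t_np = min(timeit.repeat(lambda: xnp * xnp, number=10, repeat=3))
print("list comprehension: {:.3} sec.".format(t_list))
print("numpy vectorization: {:.3} sec.".format(t_np))

# both approaches compute the same values
same = np.allclose([v * v for v in x], xnp * xnp)
print(same)  # True
```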

PyPlot#

PyPlot (the matplotlib.pyplot module) is a go-to plotting tool for Python. It works directly with numpy arrays.

conda install matplotlib
import matplotlib.pyplot as plt
# plot a single function
x = np.linspace(-1,1,100)
y = x * x
plt.plot(x,y)
plt.show()
[Figure: plot of y = x * x on the interval [-1, 1]]
# plot multiple functions
x = np.linspace(-1,1,100)
for n in range(5):
    plt.plot(x,x**n, label=f"x^{n}")
plt.legend()
plt.xlabel("x")
plt.title("Simple polynomials")
plt.show()
[Figure: "Simple polynomials", curves x^0 through x^4 on the interval [-1, 1], with legend]
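When working outside a notebook, you may want to write a plot to a file instead of showing it. The sketch below redraws the polynomial plot using the figure/axes interface and saves it with savefig. The Agg backend, the temporary output directory, the file name, and the dpi value are all assumptions made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 100)
fig, ax = plt.subplots()
for n in range(5):
    ax.plot(x, x**n, label=f"x^{n}")
ax.legend()
ax.set_xlabel("x")
ax.set_title("Simple polynomials")

# the output path is an arbitrary choice for this sketch
out_path = os.path.join(tempfile.mkdtemp(), "polynomials.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(os.path.exists(out_path))  # True
```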

CSV files, Pandas#

The *.csv extension is typically used to denote a “comma separated value” file. These types of files are often used to store arrays in human-readable plain text.

Here’s an example:

0, 1, 2, 3
4, 5, 6, 7
...

You can save numpy arrays to files using np.savetxt()

# generates example.csv
n = 1000
x = np.arange(4*n).reshape(-1,4)
np.savetxt("example.csv", x, fmt="%d", delimiter=',')

Files can be loaded using np.loadtxt()

y = np.loadtxt('example.csv', dtype=np.int32, delimiter=',')
y
array([[   0,    1,    2,    3],
       [   4,    5,    6,    7],
       [   8,    9,   10,   11],
       ...,
       [3988, 3989, 3990, 3991],
       [3992, 3993, 3994, 3995],
       [3996, 3997, 3998, 3999]], dtype=int32)

Often, scientific data has some meaning associated with numbers. In this case, the csv file might have a header, and every row is a different data point.

temperature, density, width, length
0, 1, 2, 3
4, 5, 6, 7
...

You can still load using numpy, but it is easy to lose track of what the different columns of the array mean.
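To illustrate the bookkeeping involved, here is a minimal sketch of writing and reading a file with a header using only numpy. The file name and the temporary directory are made up for the example. The header and comments arguments of np.savetxt and the skiprows argument of np.loadtxt do the work.

```python
import os
import tempfile

import numpy as np

x = np.arange(8).reshape(-1, 4)
path = os.path.join(tempfile.mkdtemp(), "with_header.csv")  # illustrative file name

# comments="" stops savetxt from prefixing the header line with "# "
np.savetxt(path, x, fmt="%d", delimiter=",",
           header="temperature,density,width,length", comments="")

# skiprows=1 skips the header; the column names are then lost
y = np.loadtxt(path, dtype=np.int32, delimiter=",", skiprows=1)
print(y)

# keeping the names means tracking them by hand
with open(path) as f:
    names = f.readline().strip().split(",")
density = y[:, names.index("density")]
print(density)  # [1 5]
```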

The solution for this sort of data is to use a Pandas dataframe

conda install pandas
import pandas as pd
data = pd.read_csv('example.csv', header=None, sep=',')
data
        0     1     2     3
0       0     1     2     3
1       4     5     6     7
2       8     9    10    11
3      12    13    14    15
4      16    17    18    19
...   ...   ...   ...   ...
995  3980  3981  3982  3983
996  3984  3985  3986  3987
997  3988  3989  3990  3991
998  3992  3993  3994  3995
999  3996  3997  3998  3999

1000 rows × 4 columns

# this sets the column names when reading, then saves the file with a header row
data = pd.read_csv('example.csv', header=None, names=["temperature", "density", "width", "length"])
data.to_csv("example2.csv", index=False) # writes to csv with headers
data
     temperature  density  width  length
0              0        1      2       3
1              4        5      6       7
2              8        9     10      11
3             12       13     14      15
4             16       17     18      19
...          ...      ...    ...     ...
995         3980     3981   3982    3983
996         3984     3985   3986    3987
997         3988     3989   3990    3991
998         3992     3993   3994    3995
999         3996     3997   3998    3999

1000 rows × 4 columns

data2 = pd.read_csv("example2.csv") # read csv with headers
data2
     temperature  density  width  length
0              0        1      2       3
1              4        5      6       7
2              8        9     10      11
3             12       13     14      15
4             16       17     18      19
...          ...      ...    ...     ...
995         3980     3981   3982    3983
996         3984     3985   3986    3987
997         3988     3989   3990    3991
998         3992     3993   3994    3995
999         3996     3997   3998    3999

1000 rows × 4 columns

You can get columns of a dataframe by using the column label

data2['temperature']
0         0
1         4
2         8
3        12
4        16
       ... 
995    3980
996    3984
997    3988
998    3992
999    3996
Name: temperature, Length: 1000, dtype: int64

To get rows by position, use the iloc indexer:

data2.iloc[1:3]
   temperature  density  width  length
1            4        5      6       7
2            8        9     10      11
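Besides iloc, a few other selection patterns come up often. The sketch below builds a small stand-in dataframe in memory (5 rows rather than the 1000 in example2.csv) and shows position-based rows, label-based rows with loc, selecting several columns, and filtering rows by a condition.

```python
import numpy as np
import pandas as pd

# small stand-in for example2.csv, built in memory
data = pd.DataFrame(np.arange(20).reshape(-1, 4),
                    columns=["temperature", "density", "width", "length"])

print(data.iloc[1:3])                   # rows by position (end excluded)
print(data.loc[1:3])                    # rows by label (end included)
print(data[["width", "length"]])        # several columns at once
print(data[data["temperature"] > 8])    # rows satisfying a condition
print(data.iloc[2, 1])                  # single element: row 2, column 1 -> 9
```

Note that iloc slices exclude the end position, like list slices, while loc slices include the end label.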

You can easily plot labeled columns

data.plot('temperature')
plt.show()
[Figure: output of data.plot('temperature')]