Bits, Bytes, and Numbers#

In order to do math on a computer, you should have some idea of how computers represent numbers and do computations.

Unlike in statically typed languages (e.g. C), you don’t have to worry much about types if you’re just scripting. However, you will have to think about them for many algorithms in scientific computing.

x = 5 # an integer type
print(type(x))
x = 5.0 # float type
print(type(x))
<class 'int'>
<class 'float'>
print(4/3)  # division of ints -> float
print(4//3) # integer division -> int
1.3333333333333333
1

Bits and Bytes#

A bit is a 0/1 value, and a byte is 8 bits. Most modern computers are 64-bit architectures, on which fixed-width numeric types (such as NumPy’s default integer) are 64 bits wide. Some computers may be 32-bit architectures, where such types default to 32 bits - beware!

You can represent strings of bits using the 0b prefix. By default, these will be interpreted as integers written in base-2. For example, on a 32-bit system,

0b101 = 00000000000000000000000000000101 (base 2) = 5 (base 10)
x = 0b101
print(x)
5

It is often easier to deal with hexadecimal (base-16), denoted with the 0x prefix:

x = 0xF
print(x)
15

Let’s count in hexadecimal and binary:

print("hex\tbin\tdec")
for x in range(0x11):
    print("{:x}\t{:b}\t{}".format(x, x, x))
hex	bin	dec
0	0	0
1	1	1
2	10	2
3	11	3
4	100	4
5	101	5
6	110	6
7	111	7
8	1000	8
9	1001	9
a	1010	10
b	1011	11
c	1100	12
d	1101	13
e	1110	14
f	1111	15
10	10000	16
0xF == 0b1111 == 15
True

By using the 0o prefix, you can also use octal (base 8) if you’d like:

x = 0o10
print(x)
8

Hexadecimal is often used because it breaks bits into blocks of 4 (16 = 2^4), so a 64-bit type can be written as a length-16 string in hexadecimal.
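For example, the largest unsigned 64-bit value is a string of 16 f’s in hexadecimal (a quick check using Python’s built-in hex function):

print(hex(2**64 - 1)) # 16 hex digits after the 0x prefix
0xffffffffffffffff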

Integers#

Integers are represented in base-2 using bits. A machine integer on a modern 64-bit architecture is 64 bits wide, but unlike many languages, Python 3 uses arbitrary-precision integers, so you don’t run into overflow errors:

print(2**65) # ** for exponentiation
36893488147419103232
import sys
sys.getsizeof(2**256) # size in bytes
60
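The size in bytes grows with the magnitude of the integer, since Python allocates as many bits as it needs (exact sizes are implementation- and platform-dependent):

import sys
print(sys.getsizeof(2**256))  # size of a 256-bit integer in bytes
print(sys.getsizeof(2**2560)) # a 2560-bit integer needs more bytes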

However, when you call code written in C/C++ or Fortran, such as NumPy, you can run into overflow issues:

import numpy as np
x = np.int64(2)
x ** 63
-9223372036854775808
x**65
0

Bitwise operations#

You can perform operations on bit strings in Python. To start, recall the operations of boolean algebra: & (and), | (or), ^ (xor), and ~ (not).

Here’s the truth table for &:

for i in [0,1]:
    for j in [0,1]:
        print("{} & {} = {}".format(i,j,i&j))
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1

Performing an operation bitwise just applies the operation to each pair of bits grouped by position (i.e. you can read off each column of the example below using the table above):

  0b1100
& 0b0101
--------
  0b0100
def binprint(n):
    """
    print the integer n as a 4-digit binary string,
    zero-padded on the left
    """
    print("{:04b}".format(n)) # 4 digits, padded by 0 in the front
    

print("the binprint function simply formats a number in binary and zero-pads on the left up to 4 digits")
binprint(1)
binprint(2)
binprint(4)
binprint(8)
print("\n")

print("bitwise operations")
binprint(0b1100 & 0b0101) # bitwise and
binprint(0b1100 | 0b0101) # bitwise or
binprint(0b1100 ^ 0b0101) # bitwise xor
binprint(~0b1100) # negation
the binprint function simply formats a number in binary and zero-pads on the left up to 4 digits
0001
0010
0100
1000


bitwise operations
0100
1101
1001
-1101

Note that & and and are not equivalent. and does not operate bitwise - it interprets both operands as logical values (where 0 is False and any other number is True) and returns the last operand it evaluates, which is why 0b1100 and 0b0101 below evaluates to 0b0101.

binprint(0b1100 & 0b0101)
binprint(0b1100 and 0b0101)
0100
0101

Two’s Complement#

You’ll notice above that ~ has a potentially unexpected output. We might have expected something like the following:

binprint(~0b1100 & 0b1111) # negation, with bitmask for last 4 digits
0011

To explain why, we need to understand how negative integers are represented. Let’s consider a signed 8-bit integer. The first bit is the sign bit (0 indicates the number is positive, and 1 indicates the number is negative).

import bitstring

def print_int8(a):
    """
    print a as if it were an 8-bit signed integer
    """
    # for printing color
    CRED    = '\33[31m'
    CBLUE   = '\33[34m'
    CEND      = '\33[0m'
    
    b = bitstring.BitArray(int=a, length=8)
    bs = str(b.bin)
    print(CBLUE + bs[0] + CEND + CRED + bs[1:] + CEND)
    
print_int8(1)
print_int8(8)
print_int8(64)
print_int8(127)
00000001
00001000
01000000
01111111

What about a negative number?

print_int8(-1)
print_int8(-127)
11111111
10000001

Naively, if we ignore the sign bit, -1 looks like “-”127, and -127 looks like “-”1. This is because negative integers are represented using two’s complement, so you can’t just read off the number by ignoring the sign bit in the way you might expect.

You can compute the two’s complement of a number by inverting bits using a bit-wise not operation and adding 1 (ignoring integer overflow).

print_int8(~4 + 1)
print_int8(-4)
11111100
11111100

The two’s complement operation is its own inverse.

def twos_complement(n):
    """
    compute the two's complement of an integer n
    """
    return ~n + 1

n = 4
print_int8(4)
print_int8(twos_complement(4))
print_int8(twos_complement(twos_complement(4)))
00000100
11111100
00000100

Note that positive numbers can go up to 127, but negative numbers go down to -128.

print_int8(0)
print_int8(127)
print_int8(-128)
00000000
01111111
10000000
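These limits matter because fixed-width integers wrap around. As a quick demonstration (depending on your NumPy version, this may also emit a RuntimeWarning about overflow), adding 1 to the largest int8 wraps to the smallest:

x = np.int8(127)
print(x + np.int8(1)) # wraps around to -128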

Why use two’s complement? Because with this representation, the same hardware circuits for addition, subtraction, and multiplication work for negative integers just as they do for positive integers.
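Here is a minimal sketch of this idea (the helpers add8 and as_int8 are hypothetical names, just for illustration): if we emulate an 8-bit adder by adding raw bit patterns and keeping the low 8 bits, the same operation is correct whether the operands are positive or negative.

def add8(a, b):
    """
    emulate an 8-bit adder: add the raw bit patterns, keep the low 8 bits
    """
    return ((a & 0xFF) + (b & 0xFF)) & 0xFF

def as_int8(pattern):
    """
    interpret an 8-bit pattern as a signed (two's complement) integer
    """
    return pattern - 256 if pattern >= 128 else pattern

print(as_int8(add8(5, -3)))  # 2
print(as_int8(add8(-5, -3))) # -8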

Floating Point Numbers#

Real numbers are typically represented as floating point numbers on a computer. Almost all real numbers must be approximated, which means you can’t always test for exact equality:

1.0/3.0
0.3333333333333333
1.2 - 1.0
0.19999999999999996
1.2 - 1.0 == 0.2
False
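Instead of testing for exact equality, compare floats up to a tolerance - for example, using math.isclose from the standard library:

import math
print(math.isclose(1.2 - 1.0, 0.2)) # True: equal up to a small relative tolerance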

The approximation error is called machine precision, typically denoted \(\epsilon\):

import numpy as np
print(np.finfo(np.float16).eps) # 16-bit float
print(np.finfo(np.float32).eps) # 32-bit float
print(np.finfo(np.float64).eps) # 64-bit float
print(np.finfo(np.float128).eps) # 128-bit float
0.000977
1.1920929e-07
2.220446049250313e-16
1.084202172485504434e-19

32-bit floating point numbers correspond to a float in C, and are also known as single precision numbers. 64-bit floating point numbers correspond to a double in C, and are also known as double precision numbers. 16-bit floats are half-precision, and 128-bit floats are quad-precision.

Double precision is the standard for many numerical codes. Quad- (or higher) precision is sometimes useful. A big trend in deep learning is to use lower-precision formats.

Floating point numbers are numbers written in scientific notation (in base-2). E.g. \(1.1 \times 2^{10}\) in base-2 is \(1.5 \times 2^2 = 6\) in base-10. They contain a sign bit, a set of bits for the fractional part (called the significand or mantissa), and a set of bits for the exponent.

For example, float32 has 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.

(figure: bit layout of a float32 - sign, exponent, and mantissa fields)

For further reading on potential considerations with floating point numbers, see the Python documentation.

You can inspect the bits used in a floating point number in Python using the bitstring package, which you can install with conda:

(pycourse) $ conda install bitstring -c conda-forge
import bitstring
def fmt_float32_bits(a):
    """
    prints bits of a 32-bit float
    """
    # for printing color
    CRED    = '\33[31m'
    CGREEN  = '\33[32m'
    CBLUE   = '\33[34m'
    CEND      = '\33[0m'
    
    b = bitstring.BitArray(float=a, length=32)
    bs = str(b.bin)
    return CBLUE + bs[0] + CEND + CGREEN + bs[1:8] + CEND + CRED + bs[8:] + CEND
    
print("{: .2e} = {}".format(1.0, fmt_float32_bits(1.0)))
print("{: .2e} = {}".format(-1.0, fmt_float32_bits(-1.0)))
print("{: .2e} = {}".format(1/(2**126), fmt_float32_bits(1/(2**126))))
print("{: .2e} = {}".format((2**126),fmt_float32_bits((2**126))))
 1.00e+00 = 00111111100000000000000000000000
-1.00e+00 = 10111111100000000000000000000000
 1.18e-38 = 00000000100000000000000000000000
 8.51e+37 = 01111110100000000000000000000000
print("{: .2e} = {}".format((np.pi),fmt_float32_bits((np.pi))))
 3.14e+00 = 01000000010010010000111111011011

As a result, you can’t count on floating point numbers to count:

import numpy as np
a = np.float32(2**24 + 1)
b = np.float32(2**24)
print(a - b)
0.0

You must also be aware of accumulating rounding errors. Let’s test this by averaging many copies of 1/3.

floattype = np.float32
x = floattype(0)
samp = floattype(1./3.) # be sure that samples are in the same precision
nsamp = 10**6 # how many samples
for i in range(nsamp):
    x = x + samp
x = x/nsamp
print("Error in the mean is {:.2e}".format(x-samp))
Error in the mean is 4.97e-04

There are other ways to compute means, but they don’t do much better:

x = floattype(0)
for i in range(nsamp):
    x = x*floattype(i/(i+1)) + samp/floattype(i+1)
print("Error in the mean is {:.2e}".format(x-samp))
Error in the mean is 1.91e-04
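For reference, compensated (Kahan) summation - a technique not covered above - carries a correction term for the low-order bits lost in each addition, and typically reduces this accumulated error substantially. A minimal sketch in the same float32 setting:

x = floattype(0)
c = floattype(0)    # running compensation for lost low-order bits
for i in range(nsamp):
    y = samp - c    # apply the correction from the previous step
    t = x + y       # big + small: low-order bits of y are lost here
    c = (t - x) - y # algebraically zero; recovers the lost bits
    x = t
x = x/nsamp
print("Error in the mean is {:.2e}".format(x - samp))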
print("a = {}".format(fmt_float32_bits(a)))
print("b = {}".format(fmt_float32_bits(b)))
a = 01001011100000000000000000000000
b = 01001011100000000000000000000000

For floats, there are two equivalent zeros

print(" 0.0 = {}".format(fmt_float32_bits(0.0)))
print("-0.0 = {}".format(fmt_float32_bits(-0.0)))
0.0 == -0.0
 0.0 = 00000000000000000000000000000000
-0.0 = 10000000000000000000000000000000
True
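Although the two zeros compare equal, their bit patterns differ in the sign bit, and you can tell them apart with math.copysign from the standard library:

import math
print(math.copysign(1.0, -0.0)) # -1.0: the sign of -0.0 is negative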

Converting From Floating Point#

If you want to convert the binary floating point representation to a decimal, you can read off the sign, exponent, and significand independently.

The sign bit is straightforward (0 is positive, 1 is negative)

print(" 0.0 = {}".format(fmt_float32_bits(0.0)))
print("-0.0 = {}".format(fmt_float32_bits(-0.0)))
 0.0 = 00000000000000000000000000000000
-0.0 = 10000000000000000000000000000000

The exponent is stored with a “bias” so that negative exponents can be represented: for k exponent bits, the bias is \(2^{k-1} - 1\). So to get the value of float32’s 8-bit exponent, we subtract 2**7 - 1 = 127 = 0b01111111.

for x in [2**-126, 1e0, 2**126]:
    print( "{:.2e} = {}".format(x,fmt_float32_bits(x)))
1.18e-38 = 00000000100000000000000000000000
1.00e+00 = 00111111100000000000000000000000
8.51e+37 = 01111110100000000000000000000000

You obtain the number by multiplying the significand (with its implicit leading 1) by 2 raised to the unbiased exponent.
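Putting the pieces together, here is a minimal sketch of a decoder for the bit strings printed above (the helper decode_float32 is just for illustration, and ignores special cases such as subnormals, inf, and nan):

def decode_float32(bs):
    """
    decode a length-32 string of 0s and 1s as a float32:
    sign * (1 + fraction) * 2**(exponent - 127)
    """
    sign = -1 if bs[0] == "1" else 1
    exponent = int(bs[1:9], 2) - 127   # remove the bias
    fraction = int(bs[9:], 2) / 2**23  # 23 stored mantissa bits
    return sign * (1 + fraction) * 2**exponent

print(decode_float32("01000000010010010000111111011011")) # the bits of pi from above: ~3.1415927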

Exercise#

  1. What is \(\log_2(\epsilon)\) for 32 and 64-bit floats? (hint: use np.log2)

  2. How many bits do you think are used to represent the decimal part of a number in both these formats?

  3. If you take into account a sign bit, how many bits are left for the exponent?

  4. What is the largest exponent you can have for 32- and 64-bit floats? Keep in mind that you should have an equal number of positive and negative exponents.

  5. Design an experiment to check your answer to part 4.

Check your answer with e.g. np.finfo(np.float32).max

# Your code here
# Part 1
np.log2(np.finfo(np.float32).eps)
-23.0
# Part 2
-np.log2(np.finfo(np.float32).eps)
23.0
# Part 3
32 + np.log2(np.finfo(np.float32).eps) - 1
8.0
# Part 4
print(0b11111111 // 2) # bit string of 8 1s in binary
print(0xFF // 2)       # same bit string, but in hex
127
127
# Part 5
print(np.float32(1.0 *2**127))
print(np.float32(1.0 *2**128))
1.7014118e+38
inf
/tmp/ipykernel_679/167000260.py:3: RuntimeWarning: overflow encountered in cast
  print(np.float32(1.0 *2**128))
np.float32(1.0 *2**128)
/tmp/ipykernel_679/1218727782.py:1: RuntimeWarning: overflow encountered in cast
  np.float32(1.0 *2**128)
inf