
Machine Learning with Python

Lecture 2. Python types: mutable and immutable


Alexander Avdiushenko
October 10, 2023
import sys

def pretty_sizeof(x):
    return f'{x} — {sys.getsizeof(x)} bytes'

pretty_sizeof(1)

print(type(1))
print([method for method in dir(1) if not method.startswith('__')])
help((1).to_bytes)
print('\n'.join(
    ['Size of objects in Python:']
    + [pretty_sizeof(x) for x in (0.0, 1.0)] + ['']
    + [pretty_sizeof(x) for x in ("", "a", "ab")] + ['']
    + [pretty_sizeof(x) for x in ([], ["a"], ["a", "aaa"], ["a", "aaa", 1.0])] + ['']
))

print('\n'.join(
    ['Size of objects in Python:']
    + [pretty_sizeof(x) for x in ((), ("a",), ("a", "aaa"))] + ['']
    + [pretty_sizeof(x) for x in (set(), {"a"})] + ['']
    + [pretty_sizeof(x) for x in ({}, {1: "a"}, {1: "a", 2: "b"})] + ['']
))
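The sizes above suggest that a tuple carries less overhead than a list with the same elements. A minimal sketch comparing the two directly, assuming CPython (exact byte counts differ between versions and platforms, so only the relation matters):

```python
import sys

items = range(10)
as_list = list(items)
as_tuple = tuple(items)

# A tuple stores its element pointers inline and never over-allocates,
# while a list keeps a separately allocated, growable pointer array
print("list :", sys.getsizeof(as_list), "bytes")
print("tuple:", sys.getsizeof(as_tuple), "bytes")
```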

Python Properties:

  • Multiparadigm (object-oriented, functional, ...)
  • "Batteries included" (rich standard library)
  • PEP (Python Enhancement Proposal); see "What's New in Python 3.12"
  • Strong (strict) dynamic typing

About Life

  • Python is a universal glue for APIs/libraries/frameworks/distributed systems
  • Comes in handy every day
  • But it's not worth writing projects with serious infrastructure in Python

Data type: a set of values together with the operations on these values (IEEE Std 1320.2-1998) and their representation in memory.

Types help programmers find errors in code.

x = 0
5 / x  # ZeroDivisionError: division by zero

Dynamic (duck) typing — "If it looks like a duck, swims like a duck and quacks like a duck, then it probably is a duck."
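A minimal duck-typing sketch: the function below never checks the type of its argument, it only calls a method. The classes `Duck` and `Robot` and the function `make_it_quack` are hypothetical names chosen for illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"

class Robot:
    # Not related to Duck by inheritance, but has a method with the same name
    def quack(self):
        return "Beep-quack!"

def make_it_quack(thing):
    # No isinstance() check: any object with a .quack() method is accepted
    return thing.quack()

print(make_it_quack(Duck()))
print(make_it_quack(Robot()))

try:
    make_it_quack(42)
except AttributeError as err:
    # The mismatch is only detected at run time, when the method is looked up
    print("Not a duck:", err)
```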


Types in Python

a = 2
print(type(a))
a = True
print(type(a))
# https://en.wikipedia.org/wiki/George_Boole
# «In 1847 Boole published the pamphlet Mathematical Analysis of Logic»

max([1, 2]), max({1, 2}), max(1, 2, 3)

help(max)

Strong (strict) typing: the language guarantees type-consistency safety and memory-access safety.

In Python, there is (almost) no implicit type conversion.

2 + "1.0"  # TypeError: unsupported operand type(s) for +: 'int' and 'str'

2 + 1.0  # 3.0: int is promoted to float
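A short sketch of what "(almost)" means in practice: mixing a number and a string needs an explicit conversion, while numeric types are still promoted implicitly, and `bool` is a subclass of `int`.

```python
# Explicit conversion is required when mixing numbers and strings
print(2 + float("1.0"))   # 3.0
print(str(2) + "1.0")     # 21.0 (string concatenation)

# The "almost": numeric types are promoted implicitly...
print(2 + 1.0)            # 3.0 (int -> float)
print(1 + 2j)             # (1+2j) (int -> complex)

# ...and bool is a subclass of int, so it takes part in arithmetic
print(isinstance(True, int), True + True)  # True 2
```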

Float

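a = 1.7976931348623157e+308 + 10009090
b = 1.7976931348623157e+308 + 2323
print(a)
# (True, False) when the statements run one by one, e.g. in Jupyter;
# in a single script CPython may merge equal constants and give (True, True)
a == b, id(a) == id(b)

import sys
sys.float_info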
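Why do `a` and `b` compare equal? Near the top of the range, neighbouring floats are very far apart, so small additions are rounded away. A minimal sketch of this and two related surprises (`math.ulp` assumes Python 3.9+):

```python
import math
import sys

big = sys.float_info.max

# Spacing between adjacent floats at this magnitude: 2**971
print(math.ulp(big))
# Adding a "small" number is lost to rounding
print(big + 10009090 == big)          # True

# Float overflow in arithmetic does not raise: the result is infinity
print(big * 2)                        # inf

# Classic rounding surprise and the recommended way to compare floats
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True
```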

From 1991 to 2018, Guido van Rossum was the "Benevolent Dictator For Life" (BDFL) of the Python language.

import this

(Im)mutability

a = 1.0
b = 2.0
c = b
id_b = id(b)
b = b + a  # the same result is with b += a
id(b) == id_b, id(c) == id_b
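lst_of_lst = [[1]] * 2
lst_of_lst[1][0] = 2
print(lst_of_lst)  # [[2], [2]]: both elements are the same inner list

# that's why it is better to use an immutable tuple
lst_of_tpl = [(1,)] * 2
lst_of_tpl[1][0] = 2  # TypeError: 'tuple' object does not support item assignment
print(lst_of_tpl)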
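When independent mutable rows are actually needed, a list comprehension creates a fresh inner list on every iteration. A minimal sketch:

```python
# [[1]] * 2 copies the *reference* to one inner list twice
shared = [[1]] * 2
print(shared[0] is shared[1])            # True

# The comprehension evaluates [1] anew on each iteration,
# so every row is an independent list object
independent = [[1] for _ in range(2)]
independent[1][0] = 2
print(independent)                       # [[1], [2]]
print(independent[0] is independent[1])  # False
```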

String (str)

a = "hello"
b = "hello"
id(a) == id("hello"), id(a) == id(b)

a = "a" * 100500
b = "a" * 100500
id(a) == id("a" * 100500), id(a) == id(b)

import sys
a = sys.intern("a" * 100500)
b = sys.intern("a" * 100500)
id(a) == id("a" * 100500), id(a) == id(b)
help(sys.intern)

Python str algorithmic complexities

  • Inserting a character at the beginning/middle/end of a string? O(n) (strings are immutable, so a new string is built)
  • Getting the length of a string? O(1)
  • Searching for a character in a string? O(n)
  • Deleting a character from a string? O(n)
  • Adding a string to another string (i.e. concatenation)? O(m + n)
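Because every concatenation may produce a new string, building a long string with `+=` in a loop can cost O(n²) in total, while `str.join` copies each piece once. A minimal sketch; note that CPython has an in-place optimization that sometimes makes `+=` fast, but that is an implementation detail and not guaranteed:

```python
words = ["spam"] * 1000

# Each += may allocate a new string and copy everything built so far
result = ""
for w in words:
    result += w

# str.join computes the final length once and copies each piece once: O(n)
joined = "".join(words)

print(result == joined, len(joined))  # True 4000
```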

list

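[1, "2", [3]]  # a list can hold objects of different types

a = 2; b = 3; list_ = [2, 4]
print(f" a -> {id(a)}\n b -> {id(b)}\n list_ -> {id(list_)}")
for i, el in enumerate(list_):
    # list_[0] has the same id as a: the list stores references, and CPython caches small ints
    print(f"list_[{i}] -> {id(el)}")

a = [1]
b = [1]
id(a) == id([1]), id(a) == id(b)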
a = [1]
b = [2]
c = b
old_id_b = id(b)
b = b + a
a, b, c, id(b) == old_id_b, id(c) == old_id_b

a = [1]
b = [2]
c = b
old_id_b = id(b)
b += a  # same result with b.extend(a) and b.append(1)
a, b, c, id(b) == old_id_b, id(c) == old_id_b
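Since `c = b` only creates a second name for the same list, an independent object requires an explicit copy. A minimal sketch with the standard `copy` module, showing how far each kind of copy goes:

```python
import copy

original = [[1, 2], [3]]
alias = original                 # same object, no copy at all
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0].append(99)

print(alias[0])    # [1, 2, 99]
print(shallow[0])  # [1, 2, 99]: the inner list is shared
print(deep[0])     # [1, 2]: fully independent
```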

The joy of using C-typed arrays directly in Python

from array import array
arr = array('H', range(1000))  # 'H': C unsigned short, typically 2 bytes per element
arr[3]
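A sketch of what the C type buys and costs: the array is far more compact than a list, and the element type is enforced on every write. It assumes a typical 64-bit CPython, where a list stores 8-byte pointers:

```python
import sys
from array import array

arr = array('H', range(1000))   # 'H': C unsigned short
lst = list(range(1000))

# Elements are stored as raw C values of fixed width (itemsize bytes each)
print(arr.itemsize)
# The array is much smaller than the list's pointer array alone,
# even before counting the int objects those pointers refer to
print(sys.getsizeof(arr), sys.getsizeof(lst))

# The element type is checked on every write
errors = []
for bad in (70000, -1, 1.5):
    try:
        arr[0] = bad
    except (OverflowError, TypeError) as err:
        errors.append(type(err).__name__)
print(errors)  # ['OverflowError', 'OverflowError', 'TypeError']
print(arr[0])  # 0: failed writes leave the array unchanged
```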

list is a mutable data type in Python; the complexities of its operations are:

  • Updating an item in the list by index? O(1)
  • Inserting into a list? O(n)
  • Deleting from a list? O(n)
  • Appending to the end of a list? O(1) amortized
  • Concatenating two lists? O(n+m)
  • Getting the length of a list? O(1)
  • Searching in a list? O(n)
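Appending is O(1) only on average because CPython over-allocates spare slots: most appends fill a free slot, and only occasional ones trigger a resize. A minimal sketch that makes this visible through `sys.getsizeof` (the exact growth steps are a CPython implementation detail):

```python
import sys

lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size grows in jumps, not on every append
print(sorted(set(sizes)))
```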

tuple

a = (1,)
b = (2,)
c = b
old_id_b = id(b)
b += a
a, b, c, id(b) == old_id_b, id(c) == old_id_b

# what is the final value of tuple_ here and why?
tuple_ = (0, 1, [])
tuple_[2].append(8)
tuple_
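A related, well-known puzzle: `+=` on a list stored inside a tuple both succeeds and fails. The list is extended in place (allowed, it is mutable), and then storing the result back into the tuple slot raises an error (forbidden, the tuple is immutable). Such a tuple is also unhashable. A minimal sketch:

```python
tuple_ = (0, 1, [])
raised = False
try:
    tuple_[2] += [8]
except TypeError as err:
    raised = True
    print("TypeError:", err)   # 'tuple' object does not support item assignment
print(tuple_)                  # (0, 1, [8]): the list changed despite the error

# A tuple is hashable only if all of its elements are hashable
try:
    hash(tuple_)
    hashable = True
except TypeError:
    hashable = False
print(hashable)                # False: it contains a list
```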

tuple is an immutable data type in Python, so:

  • Updating an item in a tuple by index? -- (not supported)
  • Inserting into a tuple? -- (not supported)
  • Deleting from a tuple? -- (not supported)
  • Appending to the end of a tuple? -- (not supported)
  • Concatenating two tuples? O(n + m)
  • Getting the length of a tuple? O(1)
  • Searching in a tuple? O(n)

dict — dictionaries (associative arrays)

a = {1: 2}
b = dict([(1, 2)])
b

a[2] = 3; a[2] = 4
a

a = {1: 2}
a.get(1), a.get(3), a.get(3, 0)

a = {1: 2}
a.update({2: 3, 1: 4})
a
d = {1: 2, 3: 4}
# newbie
for k in d.keys():
    print(k)
# pro
for k in d:
    print(k)
d = {1: 2, 3: 4}
for k, v in d.items():
    print(k, v)
for v in d.values():
    print(v)
d = {1: 2, 3: 4}
for k in d:
    del d[k]  # RuntimeError: dictionary changed size during iteration
d
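Two common ways to delete keys safely: iterate over a snapshot of the keys, or build a new dict. A minimal sketch; the filtering condition is an arbitrary example:

```python
d = {1: 2, 3: 4, 5: 6}

# list(d) copies the keys first, so the dict itself may change in the loop
for k in list(d):
    if k > 1:
        del d[k]
print(d)  # {1: 2}

# Often clearer: build a new dict with a comprehension
d2 = {1: 2, 3: 4, 5: 6}
kept = {k: v for k, v in d2.items() if k <= 1}
print(kept)  # {1: 2}
```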
a = {1: 2}
b = {2: 3, 1: 4}
c = b
old_id_b = id(b)
b.update(a)  # the same as b[1] = 2
a, b, c, id(b) == old_id_b, id(c) == old_id_b
d = {{1: 2}: 3}  # TypeError: unhashable type: 'dict'
d = {(1, 2): 3}
d
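Dictionary keys must be hashable, which in practice means immutable built-in types (and tuples or frozensets built only from hashable items). A minimal sketch that simply tries `hash()` on a few built-in objects:

```python
candidates = [1, 1.5, "str", (1, 2), frozenset({1}), [1, 2], {1: 2}, {1}]

statuses = {}
for obj in candidates:
    try:
        hash(obj)
        status = "hashable"
    except TypeError:
        status = "unhashable"
    statuses[type(obj).__name__] = status
    print(f"{type(obj).__name__:>9}: {status}")

# A frozenset (immutable set) can therefore be a dict key, while a set cannot
d = {frozenset({1, 2}): "ok"}
print(d[frozenset({2, 1})])  # ok
```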
import sys
d = {}  # {} is an empty dict, not an empty set (use set() for that)
print(sys.getsizeof(d), d.__sizeof__())
d.clear()
print("After clear(): ", sys.getsizeof(d), d.__sizeof__())

From the documentation of sys.getsizeof()

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
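`sys.getsizeof()` is shallow: it counts the container, not the objects it refers to. A rough recursive total can be sketched as below; `deep_sizeof` is a hypothetical helper, not a standard function, and it only descends into built-in containers (list, tuple, set, frozenset, dict):

```python
import sys

def deep_sizeof(obj, seen=None):
    """Hypothetical helper: approximate memory of obj plus everything it
    references through built-in containers. Shared objects are counted once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

data = ["a" * 1000, ["b" * 1000]]
print(sys.getsizeof(data))   # only the outer list: the strings are not counted
print(deep_sizeof(data))     # outer list + nested list + both strings
```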

Quine

A computer program that outputs an exact copy of its source code.

Programs that use external data (reading program text from a file, input from the keyboard, etc.) are not considered quines.

_='_=%r;print(_%%_)';print(_%_)
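How the one-liner works: `%r` inserts the repr of the template (quotes included) and `%%` turns into a single `%`, so `_ % _` rebuilds the whole line. A small harness to check this, running the program with `exec()` and capturing its output; the harness is only a testing device, not part of the quine:

```python
import io
from contextlib import redirect_stdout

source = "_='_=%r;print(_%%_)';print(_%_)"

buffer = io.StringIO()
with redirect_stdout(buffer):
    exec(source, {})   # run the program in a fresh namespace

output = buffer.getvalue().rstrip("\n")
print(output == source)  # True: the program printed its own source
```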