
Python

Numeric in Python


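Python has 5 main numeric types: int (integers), Fraction (rationals), float (reals), Decimal, and complex.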

int

Integers are represented internally using base-2 (binary) digits, not base-10 (decimal).

For example, 10011 (base-2) is 19 (base-10) and requires 5 bits.

What range of integers can be represented using 8 bits?

11111111 (base-2) uses all 8 bits and represents 255, the largest value 8 bits can hold without a sign (2^8 - 1).

If one of the 8 bits is reserved for the sign, the range is [-127, 127]. Since 0 does not require a sign (there is no need for both +0 and -0), we can squeeze out an extra number, giving [-128, 127].
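A quick check of the 8-bit arithmetic above, as a sketch using only built-in int methods. It assumes `int.to_bytes(..., signed=True)` as the signed encoding; that method uses two's complement, which is the scheme that yields the extra value -128:

```python
bits = 8

unsigned_max = 2**bits - 1      # 255, i.e. 0b11111111
signed_min = -(2**(bits - 1))   # -128
signed_max = 2**(bits - 1) - 1  # 127

print(int("10011", 2), (19).bit_length())  # 19 5
print(unsigned_max == 0b11111111)          # True
print(signed_min, signed_max)              # -128 127

# -128 still fits in one signed byte (two's complement)...
print((-128).to_bytes(1, "big", signed=True))  # b'\x80'

# ...but 128 does not
try:
    (128).to_bytes(1, "big", signed=True)
except OverflowError as exc:
    print("OverflowError:", exc)
```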

How large an integer can be depends on how many bits are used. Some languages (Java, C) provide multiple distinct integer types, each using a fixed number of bits. Python doesn't work that way: the int object uses a variable number of bits (32, 64, 96, etc.) and grows as needed.
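The variable size can be observed with `int.bit_length()` and `sys.getsizeof()`. This is a sketch; the byte counts printed are CPython implementation details and differ between versions and platforms, so only their growth matters:

```python
import sys

# bit_length(): bits needed for the value; getsizeof(): object size in bytes
for value in (1, 2**30, 2**60, 2**100, 2**1000):
    print(value.bit_length(), sys.getsizeof(value))

# No overflow: the int simply uses more storage
big = 2**1000
print(big.bit_length())  # 1001
```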

int arithmetic
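  • int +, -, *, //, % int => int
  • int ** int => int (for a non-negative exponent; a negative exponent gives a float, e.g. 2 ** -1 => 0.5)
  • int / int => float (even when the division is exact, e.g. 6 / 2 => 3.0)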
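The result types can be checked directly. The last line also shows a detail worth knowing: `//` floors toward negative infinity, and `%` is defined so that `(a // b) * b + a % b == a`:

```python
print(type(7 + 2), type(7 // 2), type(7 % 2))  # all <class 'int'>
print(6 / 2)            # 3.0: true division always returns a float
print(2 ** 10)          # 1024
print(2 ** -1)          # 0.5: a negative exponent gives a float
print(-7 // 2, -7 % 2)  # -4 1: // floors toward negative infinity
```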

integer base change algorithm
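The heading names the algorithm without spelling it out. A common approach, assumed here rather than taken from the original post, is repeated division by the base with `divmod()`, collecting remainders as digits from least to most significant. `to_base_digits` and `encode` are hypothetical helper names:

```python
from typing import List

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # enough symbols for bases 2..36


def to_base_digits(n: int, base: int) -> List[int]:
    """Return the digits of non-negative n in `base`, most significant first."""
    if base < 2:
        raise ValueError("base must be >= 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # remainder is the next (lowest) digit
        digits.append(remainder)
    return digits[::-1]  # remainders come out least significant first


def encode(n: int, base: int) -> str:
    """Map each digit to a character, e.g. 15 -> 'F' in base 16."""
    if base > len(ALPHABET):
        raise ValueError("base must be <= 36")
    return "".join(ALPHABET[d] for d in to_base_digits(n, base))


print(to_base_digits(19, 2))         # [1, 0, 0, 1, 1]
print(encode(255, 16))               # FF
print(encode(19, 2) == bin(19)[2:])  # True
```

Because the built-in `int(text, base)` accepts bases up to 36, it can be used to check the round trip.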


Fraction

Represents rational numbers exactly, as a numerator/denominator pair (fractions.Fraction).
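A few basic Fraction operations, as a short sketch using the standard `fractions` module. Arithmetic stays exact, results are reduced automatically, and `limit_denominator()` finds a nearby fraction with a small denominator:

```python
import math
from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(2, 3)
print(a + b)                        # 1: exact, no rounding error
print(Fraction(6, 8))               # 3/4: automatically reduced
print(Fraction("0.125"))            # 1/8: strings are parsed exactly
print(Fraction(-1, 3).denominator)  # 3: the sign is kept on the numerator

# Closest fraction to pi with a denominator of at most 10
print(Fraction(math.pi).limit_denominator(10))  # 22/7
```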


float

Represents real numbers.

CPython's float is implemented using the C double type, which (on virtually all platforms) implements the IEEE 754 double-precision binary floating-point format, also called binary64.

Unlike int, float uses a fixed number of bytes: in standard CPython 3.6 on a 64-bit platform, 8 bytes (64 bits).
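The fixed size can be inspected with `struct` and `sys.float_info`. This sketch assumes an IEEE 754 platform, where a C double is 8 bytes with a 53-bit significand:

```python
import struct
import sys

print(struct.calcsize("d"))     # 8: size of a C double in bytes
print(sys.float_info.mant_dig)  # 53: significand bits in binary64
print(sys.float_info.max)       # 1.7976931348623157e+308

# Every float object has the same size, unlike ints
print(sys.getsizeof(1.0) == sys.getsizeof(1e300))  # True
```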

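representation: a fractional number can be written as a sum of powers of 10 or of 2
  • decimal: 0.75 = 7/10 + 5/100
  • binary: 0.75 = 1/2 + 1/4 (0.11 in base-2)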

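Problem: some numbers do not have a finite representation

  • 1/3 cannot be represented exactly by a finite decimal expansion (0.333…)
  • 1/10 cannot be represented exactly by a finite binary expansion (0.0001100110011… in base-2), so the float 0.1 is only an approximation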
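The difference between an exact and an approximate float can be made visible with `Decimal` and `Fraction`, which convert a float to the exact value it stores. A sketch; `math.isclose()` is one standard way to compare floats with a tolerance:

```python
import math
from decimal import Decimal
from fractions import Fraction

# 0.75 = 1/2 + 1/4 has a finite binary expansion, so it is stored exactly
print(Fraction(0.75))  # 3/4

# 0.1 does not; the stored value is the nearest binary64 number
print(Decimal(0.1))    # 0.1000000000000000055511151231257827021181583404541015625
print(Fraction(0.1) == Fraction(1, 10))  # False

# The tiny errors can surface in comparisons
print(0.1 + 0.1 + 0.1 == 0.3)              # False
print(math.isclose(0.1 + 0.1 + 0.1, 0.3))  # True
```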

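Coercing a float to an integer -> data loss (the fractional part must be dropped or rounded)

truncation: math.trunc() drops the fractional part (rounds toward 0); the int() constructor also uses truncation

floor: math.floor() returns the largest integer <= x (differs from truncation for negative numbers)

ceiling: math.ceil() returns the smallest integer >= x

rounding: round(x) returns the nearest integer; round(x, n) rounds to n digits after the decimal point

Banker’s Rounding: when x is exactly halfway between two integers, round() picks the even one, e.g. round(0.5) => 0, round(1.5) => 2, round(2.5) => 2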
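Comparing the strategies on a positive and a negative value shows where they differ; the last line demonstrates Banker's rounding on ties:

```python
import math

for x in (3.7, -3.7):
    print(x, math.trunc(x), int(x), math.floor(x), math.ceil(x), round(x))
# 3.7 3 3 3 4 4
# -3.7 -3 -3 -4 -3 -4

# Banker's rounding: ties go to the even neighbour
print(round(0.5), round(1.5), round(2.5), round(3.5))  # 0 2 2 4
```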


Python Optimization

Interpreters
  • CPython: the standard python implementation written in C
  • Jython: written in Java
  • IronPython: written in C#
  • PyPy: written in RPython (a restricted, statically-typed subset of Python designed for writing interpreters)

Interning

Reusing existing objects on demand instead of creating new ones

Integer

At startup, CPython pre-loads (caches) a global list of integers in the range [-5, 256]. Any time an integer in that range is referenced, Python uses the cached version of that object instead of creating a new one.

Optimization strategy: small integers show up often, so sharing one object per value saves memory and allocation time.
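The cache can be observed by creating equal integers at runtime and comparing identities. This relies on CPython implementation details (other interpreters may behave differently); `add` is a hypothetical helper used so the values are computed at runtime rather than shared as compile-time constants:

```python
def add(x: int, y: int) -> int:
    # Computed at runtime, so the compiler cannot share one constant object
    return x + y


small_a, small_b = add(250, 6), add(250, 6)              # 256: inside [-5, 256]
big_a, big_b = add(999_990, 10), add(999_990, 10)        # 1000000: outside

print(small_a is small_b)  # True in CPython: both come from the small-int cache
print(big_a is big_b)      # False in CPython: two separate int objects
print(big_a == big_b)      # True: equal values regardless of identity
```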

string

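Not all strings are automatically interned.

As Python code is compiled, identifiers (variable, function, and class names, etc.) are interned. Some string literals that look like identifiers may also be interned, but this is an implementation detail you should not rely on.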
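Strings can also be interned explicitly with `sys.intern()`, after which equal strings share one object and can be compared by identity. A sketch; `build` is a hypothetical helper, and the `False` result before interning is CPython behaviour, not a language guarantee:

```python
import sys


def build(word: str, count: int) -> str:
    # Built at runtime, so the compiler cannot reuse a shared constant
    return " ".join([word] * count)


raw_a = build("hello", 2)
raw_b = build("hello", 2)
print(raw_a == raw_b)  # True: same characters
print(raw_a is raw_b)  # False in CPython: two separate objects

a = sys.intern(raw_a)
b = sys.intern(raw_b)
print(a is b)          # True: both names refer to the single interned copy
```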

peephole

An optimization that occurs at compile time.

Pre-calculates constant expressions:

  • numeric calculation: 24 * 60 (stores 1440)
  • short sequences of length < 20: ‘abc’ * 3 (stores ‘abcabcabc’); the limit of 20 is from CPython 3.6, later versions use different limits
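One way to see constant folding is to inspect a function's `co_consts`. This is a CPython implementation detail, so the exact tuple differs between versions (3.6 may still list 24 and 60 next to 1440); only the presence of the folded value is expected:

```python
def minutes_per_day():
    return 24 * 60


def repeated():
    return "abc" * 3


# The folded results are stored as constants of the compiled code objects
print(minutes_per_day.__code__.co_consts)  # contains 1440
print(repeated.__code__.co_consts)         # contains 'abcabcabc'
```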

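Membership tests (e.g. if e in [1, 2, 3]: …): constant mutable literals are replaced by immutable ones

  • list -> tuple
  • set -> frozenset
  • set membership tests are much faster than list or tuple membership tests (a hash lookup instead of a linear scan), so prefer a set literal: if e in {1, 2, 3}: …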
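A sketch for exploring both points: the `co_consts` prints show the tuple/frozenset constants CPython substitutes (version-dependent detail), and `timeit` compares list and set lookups. The timings depend on the machine, so no output is claimed:

```python
import timeit


def in_list(e):
    return e in [1, 2, 3]


def in_set(e):
    return e in {1, 2, 3}


# Expect a tuple (1, 2, 3) and a frozenset({1, 2, 3}) among the constants
print(in_list.__code__.co_consts)
print(in_set.__code__.co_consts)

data_list = list(range(10_000))
data_set = set(data_list)
t_list = timeit.timeit(lambda: 9_999 in data_list, number=1_000)  # linear scan
t_set = timeit.timeit(lambda: 9_999 in data_set, number=1_000)    # hash lookup
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```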

Object Mutability in Python

Internal state

Changing the data inside an object is called modifying its internal state: the state (data) changes, but the memory address does not.

Mutable

An object whose internal state can be changed.

  • Lists, Sets, Dictionaries, User-defined Classes (by default)

Immutable

An object whose internal state cannot be changed.

  • Numbers (int, float, bool), Strings, Tuples, Frozen Sets, User-defined Classes (when designed to be immutable)
  • Variable re-assignment changes which object the variable references, not the internal value of the original object
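The difference between mutation and re-assignment shows up in `id()`. A short sketch:

```python
my_list = [1, 2, 3]
address = id(my_list)
my_list.append(4)               # modifies the internal state in place
print(id(my_list) == address)   # True: same object, same address

my_int = 10
address = id(my_int)
my_int = my_int + 1             # re-assignment: the name is bound to a new int
print(id(my_int) == address)    # False: the name now references another object

my_tuple = (1, 2)
try:
    my_tuple[0] = 99            # tuples do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)
```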


Variables in Python

Variables are memory references: a variable is not the object itself but a reference (alias) to the object in memory.

To find the memory address a variable references, use id() (in CPython, id() returns the object's memory address as an int).

reference counting

After an object is created in memory, Python keeps track of the number of references to it; as soon as that count drops to 0, the Python memory manager destroys the object and reclaims its memory.

To find out an object's reference count:

  • using sys.getrefcount(obj) (passing the object as an argument temporarily adds one reference)
  • using ctypes.c_long.from_address(id(obj)).value (CPython only)

circular references

With circular references, the reference counts can never drop to 0 on their own, which would leak memory; the garbage collector is needed to detect and clean up such cycles.

garbage collection

Garbage collection can be controlled programmatically using the gc module. It is turned on by default; be careful if you turn it off. In Python < 3.4, if even one of the objects in a circular reference has a destructor (__del__), the destruction order may matter, but the GC does not know what the order should be, so the objects in the cycle are marked as uncollectable, causing a memory leak.
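A sketch of a cycle that reference counting cannot free but the collector can. It uses `weakref.ref` (which does not raise the reference count) to check whether the object still exists; `Node` is a hypothetical class. `gc.disable()` only turns off automatic collection, so `gc.collect()` still works:

```python
import gc
import weakref


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.partner = None


gc.disable()
try:
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a   # circular reference
    probe = weakref.ref(a)        # observes a without keeping it alive

    del a, b                      # each node is still referenced by the other
    alive_after_del = probe() is not None
    print(alive_after_del)        # True: the counts never reached 0

    gc.collect()                  # the cycle detector finds and frees the pair
    collected = probe() is None
    print(collected)              # True
finally:
    gc.enable()
```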

dynamic typing

A Python variable name has no type of its own; when we call type(), Python looks up the object being referenced and returns the type of that object.
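The same name can be re-bound to objects of different types; `type()` reports the type of whatever object the name currently references:

```python
a = "hello"
first_type = type(a)
print(first_type)   # <class 'str'>

a = 10              # same name, now bound to an int object
second_type = type(a)
print(second_type)  # <class 'int'>
```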

variable equality

  • identity operator (var_a is var_b) compares memory addresses (whether both names reference the same object)
  • equality operator (var_a == var_b) compares object state (contents)
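Two lists with the same contents are equal but not identical, while an alias is both. `None` is a singleton, which is why `is None` is the idiomatic check:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True: same contents (state)
print(a is b)  # False: two different list objects
print(a is c)  # True: c is another name for the same object

x = None
print(x is None)  # True
```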

Python Name Conventions

Names must start with an underscore (_) or a letter (a-z, A-Z), followed by any number of underscores, letters (a-z, A-Z), or digits (0-9), and cannot be reserved words. (Python 3 also accepts many non-ASCII letters.)
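These rules can be checked with `str.isidentifier()` plus `keyword.iskeyword()`; `isidentifier()` alone accepts reserved words such as `class`. `is_valid_name` is a hypothetical helper:

```python
import keyword


def is_valid_name(name: str) -> bool:
    # isidentifier() checks the lexical rules; iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("my_var", "_private", "var2", "2var", "my-var", "class"):
    print(candidate, is_valid_name(candidate))
# my_var True, _private True, var2 True, 2var False, my-var False, class False
```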

Conventions

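_my_var: indicates an “internal use” or “private” object; such names are not imported by from module import *

__my_var: triggers name mangling of class attributes (renamed to _ClassName__my_var), useful in an inheritance chain to avoid clashes with subclass attributes

__my_var__: system-defined names (e.g. __init__, __repr__); don't invent your own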
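A sketch of all three conventions in one hypothetical class hierarchy (`Account`, `SavingsAccount`): the single underscore is only a convention, the double underscore is mangled per class so the subclass attribute does not overwrite the parent's, and `__repr__` is a system-defined method:

```python
class Account:
    def __init__(self, balance: int) -> None:
        self._currency = "EUR"    # convention only: "internal use"
        self.__balance = balance  # stored as _Account__balance

    def __repr__(self) -> str:    # system-defined (dunder) method
        return f"Account({self.__balance})"


class SavingsAccount(Account):
    def __init__(self, balance: int) -> None:
        super().__init__(balance)
        self.__balance = 0        # stored as _SavingsAccount__balance: no clash


acct = SavingsAccount(100)
print(vars(acct))
# {'_currency': 'EUR', '_Account__balance': 100, '_SavingsAccount__balance': 0}
print(acct)  # Account(100)
```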

See the PEP 8 style guide for the full set of naming conventions.

Python Multi-line Statements

How Python turns source code into executable statements:
  1. Python program
  2. physical lines of code (each ends with a physical newline character, created by pressing Enter)
  3. logical lines of code (each ends with a logical NEWLINE token)
  4. tokenized
  5. executed
physical newlines vs. logical newlines

Sometimes physical newlines are ignored in order to combine multiple physical lines into a single logical line.

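break implicitly: expressions inside brackets [], parentheses (), or braces {} can span multiple physical lines

break explicitly: end a physical line with a backslash (\) to continue the statement on the next line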
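Both kinds of line joining in one sketch. Comments are allowed inside implicit continuations, but nothing may follow a continuation backslash on its line:

```python
# Implicit: a newline inside (), [] or {} does not end the logical line
numbers = [
    1,  # comments are allowed here
    2,
    3,
]


def add(a,
        b):
    return a + b


# Explicit: a trailing backslash joins the next physical line
total = 1 + \
        2 + \
        3

if total > 5 \
        and add(1, 2) == 3:
    print("both conditions hold")
```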

multi-line strings

Multi-line (triple-quoted) strings are regular strings, not comments, although they can be used as docstrings.

Escaped characters (\n, \t) and non-visible characters (actual newlines, tabs, indentation) inside a multi-line string are all part of the string; printing the string renders them as real line breaks and tabs.
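A sketch showing that the physical newlines and indentation typed inside a triple-quoted string are kept, and that a triple-quoted string placed first in a function body becomes its docstring (`greet` is a hypothetical example function):

```python
text = """first line
    indented line\tafter a tab
last line"""

print(repr(text))
# 'first line\n    indented line\tafter a tab\nlast line'
print(text)  # shows the real line breaks, indentation and tab


def greet():
    """Return a greeting.

    The first string literal in the body becomes the docstring.
    """
    return "hi"


print(greet.__doc__.splitlines()[0])  # Return a greeting.
```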

ref:

https://github.com/fbaptiste/python-deepdive/blob/master/Part%201/Section%2002%20-%20A%20Quick%20Refresher/01%20-%20Multi-Line%20Statements%20and%20Strings.ipynb