Python Internals Series: Int (Long) Object
10 Jan 2021
In this article, we will read and discuss the implementation details of ints in CPython.
Although using integers in Python is fairly easy, the implementation file contains more than 5000 lines of code.
Since the file contains roughly 116 functions/macros, I’ll probably skip some (most?) of them.
Without further ado, let’s dive into the code!
First we need to find a starting point. We know that all Python objects are stored in
Unfortunately, we don’t seem to have an intobject.c file. That’s because the underlying C object for Python integers is
The file we’re interested in is:
From now on I will refer to Python’s integer implementation as either integers or longs, which is not technically accurate since in C those are different types.
For the rest of this article, we’ll revisit some of the most used operations/functions related to integers.
The comment is pretty self-explanatory: the macro converts a given long to an sdigit. But what’s an sdigit? Well, it depends:
PYLONG_BITS_IN_DIGIT is defined either at configure time or in
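On a given build, the digit width that was picked can be inspected from Python through sys.int_info, which is part of the standard library. A minimal check, assuming a CPython interpreter (the variable names are only illustrative):

```python
import sys

# bits_per_digit reports how many bits each internal digit holds (15 or 30);
# sizeof_digit is the size in bytes of the C type used to store one digit.
bits = sys.int_info.bits_per_digit
size = sys.int_info.sizeof_digit

print(f"bits per digit: {bits}, bytes per digit: {size}")
```

Most 64-bit builds are expected to report 30 bits per digit, but the value is a build-time choice, so it is worth checking rather than assuming.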
We have an assert to protect us from accidentally casting an integer that is too big to fit in an sdigit, which would lose information and could cause issues that are hard to detect.
Curiously, the assert checks that the size of x is at least -1, but how can a size be less than 0? My guess is that the size is signed so that it represents both the size and the sign of the integer, for example:
Py_SIZE(-15) = -1 (a single digit, with a negative sign).
This seems to be confirmed by Py_SIZE(x) < 0 ? -(sdigit)(x)->ob_digit[0].
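To make the "signed size" idea concrete, here is a small Python model of it. This is only a sketch under stated assumptions: it assumes 30-bit digits, and to_size_and_digits and medium_value are hypothetical helper names invented for this illustration, not CPython functions.

```python
SHIFT = 30                    # assumption: 30-bit digits
MASK = (1 << SHIFT) - 1


def to_size_and_digits(n):
    """Split n into (signed size, little-endian list of digits)."""
    magnitude = abs(n)
    digits = []
    while magnitude:
        digits.append(magnitude & MASK)
        magnitude >>= SHIFT
    # The sign of n is carried by the sign of the size, not by the digits.
    size = -len(digits) if n < 0 else len(digits)
    return size, digits


def medium_value(size, digits):
    """Toy counterpart of MEDIUM_VALUE: only valid when |size| <= 1."""
    assert -1 <= size <= 1
    if size == 0:
        return 0
    return -digits[0] if size < 0 else digits[0]


print(to_size_and_digits(-5))          # (-1, [5])
print(to_size_and_digits(-(2 ** 30)))  # (-2, [0, 1])
print(medium_value(*to_size_and_digits(-5)))  # -5
```

In this model, zero has size 0 and an empty digit list, which is why the macro needs a separate branch for it.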
ob_digit looks like an array containing our integer, which can be confirmed in the file
One curious fact that I can’t explain is the small size of the array (one element?). Maybe it’s somehow changed at runtime.
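One way to probe this from Python is sys.getsizeof. The sketch below, assuming a CPython interpreter, builds integers that need two, three and four digits and prints how much memory each one takes. If the size grows by one digit's worth of bytes each time, that suggests the array is sized when the object is allocated rather than being fixed at one element.

```python
import sys

bits = sys.int_info.bits_per_digit

# 2**bits no longer fits in one digit, 2**(2*bits) no longer fits in two, etc.
two_digits = 2 ** bits
three_digits = 2 ** (2 * bits)
four_digits = 2 ** (3 * bits)

for value in (two_digits, three_digits, four_digits):
    print(value.bit_length(), "bits ->", sys.getsizeof(value), "bytes")
```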
Pretty straightforward function.
NSMALLNEGINTS is defined in the pycore_interp.h file:
To understand why those 2 magic numbers were chosen and where this macro is used, let’s dive into the next function.
static PyObject * get_small_int(sdigit ival)
This function implements what is commonly known as the Flyweight pattern, as explained here: https://python-patterns.guide/gang-of-four/flyweight/. The flyweight pattern is used in Python to create, at the initialisation phase, all the integers in the range [-5, 256]. At runtime, whenever you ask for a number in this range, you’ll always get the same object.
Which can be easily checked:
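A minimal check, assuming a CPython interpreter. The helper same_object is a hypothetical name for this illustration; it builds the value twice from a string so that the compiler cannot merge two identical literals into one constant, which would hide the effect.

```python
def same_object(n):
    """Build n twice at runtime and report whether both results are one object."""
    first = int(str(n))
    second = int(str(n))
    return first is second


for n in (-6, -5, 0, 256, 257):
    print(n, same_object(n))
```

On the CPython versions this article targets, the values from -5 to 256 are expected to report True and the values just outside that range False. This is an implementation detail, so code should never rely on it.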
Those integer objects are stored in the interpreter state.
An interpreter state is a group of threads along with the data specific to this group. Threads share such things as loaded modules (sys.modules), builtins (builtins.__dict__) and the import system (importlib).
Apparently flyweight integers are stored there.
We also use Py_INCREF to increment the reference count of the returned integer. Recall that reference counting is what the CPython garbage collector uses to detect which objects to free (well, not just that, since reference counting alone doesn’t resolve circular references, but we’ll discuss the gc in more detail in future articles).
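Reference counts can be observed from Python with sys.getrefcount. The sketch below uses an ordinary object rather than a small integer, on the assumption that cached objects may report a count that is shared with the rest of the interpreter and therefore hard to read. Payload is a hypothetical placeholder class.

```python
import sys


class Payload:
    """Placeholder object; any ordinary heap object would do."""


obj = Payload()
before = sys.getrefcount(obj)

holders = [obj, obj, obj]      # the list stores three extra references
after = sys.getrefcount(obj)

del holders                    # dropping the list releases them again
released = sys.getrefcount(obj)

print(before, after, released)
```

The absolute numbers include a temporary reference held by the call itself, so only the differences are meaningful.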
Straightforward function: we check whether a given integer is small enough to fit into an sdigit, and if so we downcast it to an sdigit using the MEDIUM_VALUE macro. After downcasting, we check whether the integer is in the flyweight range [-5, 256]. If it is, we decrement the reference count of the freshly created object (since we don’t need two PyLong objects) and return a reference to the already allocated number.
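The idea behind these two functions can be modelled in a few lines of Python. This is a toy sketch, not the C code: Boxed is a hypothetical stand-in for a heap-allocated integer object, maybe_small is an illustrative name, and the two constants are simply derived from the range [-5, 256] quoted above.

```python
NSMALLNEGINTS = 5       # assumption: derived from the range [-5, 256]
NSMALLPOSINTS = 257


class Boxed:
    """Hypothetical stand-in for a heap-allocated integer object."""

    def __init__(self, value):
        self.value = value


# Created once, up front, like a table of preallocated small integers.
_SMALL = [Boxed(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def is_small_int(value):
    return -NSMALLNEGINTS <= value < NSMALLPOSINTS


def get_small_int(value):
    # Shift the index so that -5 lands on slot 0.
    return _SMALL[value + NSMALLNEGINTS]


def maybe_small(boxed):
    """Swap a freshly built object for the shared one when it is in range."""
    if is_small_int(boxed.value):
        return get_small_int(boxed.value)
    return boxed
```

In Python the discarded Boxed instance is simply garbage collected; the C code has to drop its reference explicitly, which is the decrement described above.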
How are longs created
Longs are created using the function PyLongObject * _PyLong_New(Py_ssize_t size), where size refers to the number of digits of the target long.
Well, it looks like we can’t have an indefinitely big integer, but how big can our integers be?
If you don’t already know it, offsetof is a C macro which basically returns the offset, in bytes, of a member (ob_digit) from the start of the structure (PyLongObject). If you recall correctly, the only member of our structure besides the VAR_HEAD is digit ob_digit, so the offset is the memory taken by the VAR_HEAD.
So an integer has a maximal size of roughly PY_SSIZE_T_MAX.
According to this answer: https://stackoverflow.com/a/42777910/14517936, PY_SSIZE_T_MAX = sys.maxsize, which is, according to the official documentation (https://docs.python.org/3/library/sys.html#sys.maxsize), equal to 2**31 - 1 on a 32-bit machine or 2**63 - 1 on a 64-bit machine.
2**63 - 1 is a huge number of digits.
You can check this limit yourself by doing:
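A minimal way to look at the value, assuming a CPython interpreter. Only sys.maxsize is inspected here; actually trying to build an integer near the limit would exhaust memory long before reaching it.

```python
import sys

print(sys.maxsize)
print(sys.maxsize == 2 ** 63 - 1)   # expected to be True on a 64-bit build
print(sys.maxsize == 2 ** 31 - 1)   # expected to be True on a 32-bit build
```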
If the size is fine, we allocate enough memory to store our integer:
As a good practice you should ALWAYS check that an allocation has been performed correctly, which may not be the case if you don’t have enough memory, for example. Your future self will thank you.
Adding two integers
First we check that the two integers are valid.
One curious thing you may have noticed is the:
Which seems exactly the same as simply:
The purpose of doing this is that macros in C are not really smart: the preprocessor simply does a find/replace and hopes for the best. The problem can be illustrated with the following macro:
But what happens if you do:
The preprocessor will replace this with:
which is not what you would expect, since in C, if you omit the braces, the if body consists only of the first statement.
The do while solves this by enclosing all our statements in one scope.
But why not just use:
As far as I know this is mainly a syntactic choice, since writing MACRO(); is more natural than MACRO() (note that in the second case we have no semicolon). It also keeps if/else safe: with bare braces, the semicolon written after MACRO() would terminate the if statement and orphan a following else.
The rest is:
- if both a and b are negative, we return -(|a| + |b|)
- if a is negative but not b, we compute |b| - |a|
- if b is negative and a is not, we compute |a| - |b|
- if neither is negative, we compute |a| + |b|
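The four cases can be sketched in Python as follows. This is a model of the sign dispatch only: x_add and x_sub here are stand-ins that lean on Python's own arithmetic and are assumed to receive magnitudes (absolute values), they are not the C routines.

```python
def x_add(a_mag, b_mag):
    """Stand-in for x_add: sum of two magnitudes."""
    return a_mag + b_mag


def x_sub(a_mag, b_mag):
    """Stand-in for x_sub: difference of two magnitudes, possibly negative."""
    return a_mag - b_mag


def long_add(a, b):
    if a < 0:
        if b < 0:
            return -x_add(-a, -b)   # both negative: -(|a| + |b|)
        return x_sub(b, -a)         # only a negative: |b| - |a|
    if b < 0:
        return x_sub(a, -b)         # only b negative: |a| - |b|
    return x_add(a, b)              # neither negative: |a| + |b|


print(long_add(-7, 3), long_add(7, -3), long_add(-7, -3), long_add(7, 3))
```

The point of the split is that the helpers never see a sign, so each of them can be written for non-negative inputs only.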
This looks fairly complicated for a simple a + b, so why all the burden?
Well, recall that in Python integers can be very large (2^63 digits on 64-bit machines). This is achieved by storing the integers in arrays (each element representing a digit). The helpers work on those arrays of magnitudes, so the sign has to be handled separately.
If we dive into the x_add code, we see:
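As a rough Python model of adding two magnitudes stored as arrays of digits (a sketch of the schoolbook carry technique, not a transcription of the C source): it assumes 30-bit digits stored least significant first, and to_digits and from_digits are hypothetical helpers added only to make the example runnable.

```python
SHIFT = 30                     # assumption: 30-bit digits
MASK = (1 << SHIFT) - 1


def to_digits(n):
    """Little-endian base-2**SHIFT digits of a non-negative int."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits


def from_digits(digits):
    """Inverse of to_digits."""
    value = 0
    for digit in reversed(digits):
        value = (value << SHIFT) | digit
    return value


def x_add(a, b):
    """Add two magnitudes given as digit lists, propagating the carry."""
    if len(a) < len(b):
        a, b = b, a                       # make a the longer operand
    result = []
    carry = 0
    for i, digit in enumerate(a):
        carry += digit + (b[i] if i < len(b) else 0)
        result.append(carry & MASK)       # low SHIFT bits become the digit
        carry >>= SHIFT                   # the overflow moves to the next digit
    if carry:
        result.append(carry)
    return result


x, y = 2 ** 30 - 1, 1
print(x_add(to_digits(x), to_digits(y)))  # [0, 1]
```

The example at the end shows a carry rippling out of the lowest digit: the largest one-digit value plus one becomes a two-digit number.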
And that’s it. long_sub is fairly similar, and you can always (I encourage you to do so) check the multiplication and division code.
We rarely stop to think about basic operations like integer arithmetic. The longobject file shows us how complex and optimised the long/integer implementation is in Python.