# Python Internals Serie : Int (Long) Object

10 Jan 2021In this article, we will read and discuss the implementation details of *ints* in *CPython*.

Altough using integers in python is fairly easy :

The implementation file contains more than 5000 code lines.

Since the file contains roughly 116 functions/macros, I’ll probably skip some (most of ?) functions.

Without further ado, let’s dive into the code !

### longobject.c

First we need to find a starting point, we know that all pythons objects are stored in `cpython/Objects`

.

Unfortunetly, we don’t seem to have an `intobject.c`

file. That’s because the underlying C object for python integers is `PyLongObject`

.

The file we’re interessted in is : `longobject.c`

From now one I will refer to python’s integers implementation both using *integers* or *longs*, which is not technically accurate since in *C* those are different types.

For the rest of this article, we’ll revisit some of the most used operations/functions related to integers.

#### MEDIUM_VALUE(x)

The comment is pretty explanatory, the macro convert a given *long* to an *sdigit*. But what’s an sdigit ? Well it depends :

The `PYLONG_BITS_IN_DIGIT`

is defined either at configure time or in `pyport.h`

.

We have an assert to protect us from accidently casting a big integer, which is not small enough to fit in an sdigit, to an sdigit, which may result in a loss of information, therefore potentiely issues which can be hard to detect.

Curiously, the assert check that the *size* of x is bigger than -1, but can be less than 0 ? My guess is that size is unsigned to represent both the size and the sign of the integer, for exemple : `PY_SIZE(-15) = -2`

.
This seems to be confirmed with `Py_SIZE(x) < 0 ? -(sdigit)(x)->ob_digit[0]`

.

`ob_digit`

looks like an array containing our integer. which can be confirmed in the file `longintrepr.h`

:

One curious fact, that I can’t explain, is the small size of the array (one element ?), maybe it’s somehow changed at runtime.

#### IS_SMALL_INT(ival)

Pretty straightforward function.

`NSMALLPOSINTS`

and `NSMALLNEGINTS`

is defined in the `pycore_interp.h`

file :

To undesrtand why those 2 magic numbers and where this macro is used, let’s dive into the next function.

#### static PyObject * get_small_int(sdigit ival)

This function implements what is commonly known as the *Flyweight pattern*, as explained here : https://python-patterns.guide/gang-of-four/flyweight/, the flyweight pattern is used in python to create at the initialisation phase all the integers in the range [-5, 256]. At the runtime, whenever you ask for a number in this range, you’ll always get the *same* number.

Which can be easily checked :

Those integers objects are stored in the interpreter state.

According to https://tenthousandmeters.com/blog/python-behind-the-scenes-1-how-the-cpython-vm-works/ :

An interpreter state is a group of threads along with the data specific to this group. Threads share such things as loaded modules (sys.modules), builtins (builtins.

dict) and the import system (importlib).

Apparently flyweight integers are stored there.

We also use `Py_INCREF`

to increment the reference count of the returned integer, recall that reference counting is what is used by the CPython garbage collector to detect which objects to free (Well not just that, since reference counting alone doesn’t resolve circular references, but we’ll discuss the *gc* in more details in future articles).

#### maybe_small_long

Straitforward function, we check a given integer is small enough to fit into an sdigit, if it’s the case we downcast it to an sdigit using the `MEDIUM_VALUE`

. After downcasting we check if the integer is in the flyweight range *[-5, 257]*, if it’s the case we decrement the referece counting (since we don’t need two PyLong objects) and we return a reference on the already allocated number.

### How are longs created

Longs are created using the function `PyLongObject * _PyLong_New(Py_ssize_t size)`

, size here refer to the number of digit of the target long.

Well, looks like we can’t have an indefinely big integer, but how big can our integers be ?

If you don’t already know it, *offsetof* is a *C* function wich will basicaly return an offset of the member (ob_digit) from the structure (PyLongObject), if you recall correctly, since our struture only contains `PyObject_VAR_HEAD`

and `digit ob_digit`

. so the offset is the memorry taken with VAR_HEAD.

So a integer has a maximal size of roughly Py_SSIZE_T_MAx.

According to this answer : https://stackoverflow.com/a/42777910/14517936 `Py_SSIZE_T_MAX = sys.maxsize`

, which is according to the official documentation ( https://docs.python.org/3/library/sys.html#sys.maxsize ) equal to `2**31 - 1`

on 32 bits machine or `2**63 - 1`

.

`2**63 - 1`

is a huge number of digits.

You can check this limit yourself by doing :

If the size is fine, we allocate enough memory to store our integer :

As a good practice you should ALWAYS check that an allocation had correctly been performed, which may not be the case if you don’t have enough memory for example, your futur self will thank you.

### Adding two integers

First we check that the two integers are valid.

One curious thing you may have noticed is the :

Which seems exactly the same as simply :

The purpose of doing this is macros in C are not really smart, the *preprocessor* will simply do a find/replace and hope for the best. The problem with this can be illustrated with the following macro :

But what happens if you do :

the preprocessor will replace this with :

which is not valid *C*, since in *C* if you omit the braces, the if body will only consists of the first instruction which is not what you would expect.

The do while solve this by enclosing all our instructions in one scope.

But why not just use :

As far as i know this is simply a syntaxic choice, since writing `MACRO();`

is more natural than `MACRO()`

(not that in the second case we have no semicolon).

The rest is :

- if both a and b are negative we return -|a + b|
- if a is negative but not b we compute b - a
- if b is negative and a is positive we compute a - b
- if both are positive we compute |a + b|

This looks fairly complicated for a simple a + b, so why all the burden ?

Well recall that in `Python`

integers can be very large (2^63 digits on 64 bits machines), this is achieved by storing the integers in arrays (each element representing a digit).

We dive in the `x_add`

code we see :

## Conclusion

And that’s it, the `long_sub`

is fairly similar, and you can always (i encourage you to do so) check multiplication and division code.

We rarely stop to think about basic operations like integer operations, the longobject file shows use how complex and optimised long/integers implementation is in python.