Building automatic inplace in Python
28 Jan 2023

When using data-related Python libraries (pandas, numpy, etc.), many functions accept an inplace parameter. Setting it to True changes the source object without creating an additional copy, which in some cases can have a significant memory impact.
In this article I’ll show that we can automatically detect some patterns where using inplace=True has no downsides.
To do this we need to solve two main issues:
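First, we need to be sure that there are no other references to the source object. If df is the only name bound to the object, changing it in place is a valid use case. If a second name such as other_df is bound to the same object, it is not, as we’ll end up also changing other_df, which may not be intended by the user.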
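As a minimal sketch of these two cases, here is a plain Python list standing in for a DataFrame, so that the snippet has no dependencies. The names data and other_data are illustrative, and list.sort plays the role of an inplace=True call:

```python
# Case 1: only one name refers to the object, so changing it in place is safe.
data = [3, 1, 2]
data.sort()  # in-place operation, no copy is made

# Case 2: a second name refers to the same object.
data = [3, 1, 2]
other_data = data  # no copy: both names point at the same list
data.sort()        # the in-place change is visible through other_data too

print(other_data is data)  # True
print(other_data)          # [1, 2, 3]
```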
Second, we need to make sure that the variable passed as the argument is also the assignment target. Assigning the result back to the same variable is valid. Assigning it to a different variable is not, as the caller does not expect the source df to change.
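The same distinction can be sketched with built-ins only, assuming sorted stands for the copy-based call and list.sort for its in-place counterpart (the variable names are illustrative):

```python
# Valid: the result replaces the variable that was passed in, so the
# original unsorted list is discarded either way.
data = [3, 1, 2]
data = sorted(data)      # copy-based form
# data.sort() would reach the same end state without the extra copy.

# Not valid: the caller keeps using `source` after the call.
source = [3, 1, 2]
result = sorted(source)  # copy-based form leaves source untouched
print(source)            # [3, 1, 2]
# Rewriting this call as source.sort() would silently reorder source.
```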
Throughout the rest of the article, we’ll use one function as a running example.
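The original listing is not reproduced here. A minimal stand-in, assuming a list-based helper named add_one with an inplace parameter (both the name and the behaviour are hypothetical), could look like this:

```python
def add_one(l, inplace=False):
    """Hypothetical running example: add 1 to every element of a list."""
    target = l if inplace else list(l)  # work on the original or on a copy
    for i, value in enumerate(target):
        target[i] = value + 1
    return target

values = [1, 2, 3]
copied = add_one(values)              # default: values is left untouched
same = add_one(values, inplace=True)  # explicit opt-in: values itself changes
print(copied, same is values)         # [2, 3, 4] True
```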
Reference counting
To make sure that we don’t accidentally change another object by automatically setting inplace=True, we need to use reference counting.
Basically, we should only have 2 references to our source object: the variable in the calling code and the parameter inside the function.
To count the references, we can use sys.getrefcount(l).
If you try this, you may be surprised that the value is 4, not 2.
This is because sys.getrefcount also creates a temporary reference to its argument.
The second extra reference is explained here: https://stackoverflow.com/a/46146772. It appears to be an implementation detail of the way CPython has passed arguments since version 3.6, so the exact count may differ between CPython versions.
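A small experiment makes this concrete. The sketch below assumes CPython and uses a hypothetical helper named count_refs. The absolute numbers are deliberately not shown because they depend on the interpreter version, but the difference between the two measurements is the one extra name bound to the list:

```python
import sys

def count_refs(l):
    # The result includes temporary references created by the call itself
    # and by sys.getrefcount, so it is larger than the number of names.
    return sys.getrefcount(l)

data = [1, 2, 3]
single = count_refs(data)  # only `data` refers to the list

alias = data
shared = count_refs(data)  # `data` and `alias` refer to the list

print(single, shared)      # exact values depend on the CPython version
print(shared - single)     # 1: the extra reference held by alias
```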
Our code now checks the reference count before switching to in-place mode.
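Here is a minimal sketch of the example function with the reference-count check, assuming a hypothetical list-based add_one. Instead of hard-coding the 4 mentioned above, the sketch measures the baseline once at import time, because the value is CPython-specific. That calibration step is an addition for portability. Note that this version only handles the first issue: it does not yet look at the assignment target.

```python
import sys

def _count_refs(l):
    # Same shape as the real function: the object arrives as a plain parameter.
    return sys.getrefcount(l)

def _count_refs_from_local():
    probe = []  # the only reference a caller would hold
    return _count_refs(probe)

_probe = []
# Smallest count observed when the caller holds exactly one reference
# (4 in the text above; measured here because it is CPython-specific).
SOLE_OWNER_REFS = min(_count_refs(_probe), _count_refs_from_local())

def add_one(l, inplace=False):
    """Hypothetical example function: add 1 to every element of a list."""
    if not inplace and sys.getrefcount(l) <= SOLE_OWNER_REFS:
        inplace = True  # no other reference can observe the change
    target = l if inplace else list(l)
    for i, value in enumerate(target):
        target[i] = value + 1
    return target

data = [1, 2, 3]
alias = data            # a second reference: the copy path must be taken
result = add_one(data)
print(alias)            # [1, 2, 3]
print(result)           # [2, 3, 4]
```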
Confirming the source and target objects are the same
To solve the second issue, we’ll:
- Retrieve information on the outer frame. Since we’re already inside the function, the outer frame is the one from which the function was called.
- Retrieve the portion of code containing the function call.
- Use that code to build a small AST and check that the first argument of the call has the same name as the assignment target.
To retrieve the outer frame, we can use the inspect module; for instance, inspect.stack()[1] describes the caller.
We can then get the code using frame.code_context.
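A minimal sketch of these two steps, assuming inspect.stack() is used to reach the caller (the helper name calling_line is hypothetical). code_context is None when the source is not available, for example for code run through exec, so the sketch handles that case explicitly:

```python
import inspect

def calling_line():
    """Return the source line that called this function, if available."""
    caller = inspect.stack()[1]  # [0] is this frame, [1] is the caller's
    if not caller.code_context:
        return None  # no source available, e.g. code run through exec()
    return caller.code_context[0].strip()

line = calling_line()
print(line)
```

When run from a file, this should print the text of the line containing the call.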
Once we have the code, we can use the ast module to check whether we’re in the valid use case.
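One possible way to write that check is sketched below. The function name reassigns_first_arg is hypothetical, and the sketch only recognises the simplest shape, a single assignment to a plain name. It does not verify which function is being called.

```python
import ast

def reassigns_first_arg(source_line):
    """Return True for code shaped like `x = f(x, ...)`."""
    try:
        tree = ast.parse(source_line.strip())
    except SyntaxError:
        return False  # e.g. only part of a multi-line statement
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    stmt = tree.body[0]
    if len(stmt.targets) != 1 or not isinstance(stmt.targets[0], ast.Name):
        return False
    call = stmt.value
    if not isinstance(call, ast.Call) or not call.args:
        return False
    first = call.args[0]
    return isinstance(first, ast.Name) and first.id == stmt.targets[0].id

print(reassigns_first_arg("df = f(df)"))   # True
print(reassigns_first_arg("df2 = f(df)"))  # False
```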
Wrapping up
We can offer an easy way to add automatic inplace to existing functions by packaging both checks in a decorator. Any function that offers an inplace parameter can then simply be decorated.
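A sketch of such a decorator, under several assumptions: the names auto_inplace and add_one are hypothetical, the source object is the first positional argument, inplace is passed by keyword if at all, and the reference-count baseline is measured at import time rather than hard-coded because it is CPython-specific.

```python
import ast
import functools
import inspect
import sys

def _count_refs(source, *args, **kwargs):
    # Mirrors the wrapper's signature so the call overhead is comparable.
    return sys.getrefcount(source)

def _count_refs_from_local():
    probe = []
    return _count_refs(probe)

_probe = []
# Count seen when the caller holds a single reference (measured, not assumed).
_SOLE_OWNER_REFS = min(_count_refs(_probe), _count_refs_from_local())

def _reassigns_first_arg(frame_info):
    """True if the calling line looks like `x = f(x, ...)`."""
    if not frame_info.code_context:
        return False  # source unavailable (exec, some REPLs)
    try:
        tree = ast.parse(frame_info.code_context[0].strip())
    except SyntaxError:
        return False  # e.g. one line of a multi-line call
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    stmt = tree.body[0]
    if len(stmt.targets) != 1 or not isinstance(stmt.targets[0], ast.Name):
        return False
    call = stmt.value
    if not isinstance(call, ast.Call) or not call.args:
        return False
    first = call.args[0]
    return isinstance(first, ast.Name) and first.id == stmt.targets[0].id

def auto_inplace(func):
    """Hypothetical decorator: pass inplace=True when both checks succeed."""
    @functools.wraps(func)
    def wrapper(source, *args, **kwargs):
        if "inplace" not in kwargs:
            # Count references first, before inspect creates any new objects.
            sole_owner = sys.getrefcount(source) <= _SOLE_OWNER_REFS
            if sole_owner and _reassigns_first_arg(inspect.stack()[1]):
                kwargs["inplace"] = True
        return func(source, *args, **kwargs)
    return wrapper

@auto_inplace
def add_one(l, inplace=False):
    """Hypothetical example function: add 1 to every element of a list."""
    target = l if inplace else list(l)
    for i, value in enumerate(target):
        target[i] = value + 1
    return target

data = [1, 2, 3]
data = add_one(data)  # eligible for the in-place path
print(data)           # [2, 3, 4]
```

Whenever either check is inconclusive, the wrapper leaves inplace untouched, so the worst case is a missed optimisation, not an unexpected mutation. Calling inspect.stack() on every call is not free, so this is better suited to functions that process large objects.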
Limitations
There are some limitations with the presented approach:
Our decorator expects the source object to be the first argument and expects the function to return exactly one value. This is the most common case, and we can easily extend the decorator to other usages by adding parameters.
The second limitation is that frame.code_context only returns one line of code. This may not sound like an issue, but it makes it hard to detect calls that are split across several lines.
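To illustrate, take a call written over three lines (add_one is a hypothetical function name). Whichever single line is captured, it is not valid Python on its own, so the ast check cannot be applied to it:

```python
import ast

full_call = "df = add_one(\n    df,\n)"  # the statement as written in the file
first_line = "df = add_one("            # what a one-line context may contain

def parses(source):
    """Return True if the text is syntactically complete Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(parses(full_call))   # True
print(parses(first_line))  # False
```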
This is a little trickier to solve and may cause some missed auto inplace opportunities.