Building automatic inplace in Python
28 Jan 2023
When using data-related Python libraries (pandas, numpy, etc.), many of the functions accept an inplace parameter. Setting it to True changes the source object without creating an additional copy, which in some cases can have a significant memory impact.
In this article I’ll show that we can automatically detect some patterns where using inplace=True has no downsides.
To do this we need to solve two main issues:
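First, we need to be sure that there are no other references to the source object. Calling the function on a DataFrame that is bound to a single name is a valid use case. Calling it on a DataFrame that is also bound to a second name, such as other_df, is not: we would end up changing other_df as well, which may not be intended by the user.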
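As a rough illustration of these two cases, here is a sketch that uses a plain list in place of a DataFrame, so that it runs with the standard library only. The fill_missing helper is hypothetical and is not part of any real library.

```python
def fill_missing(values, default, inplace=False):
    """Hypothetical helper: replace None entries with `default`."""
    target = values if inplace else list(values)
    target[:] = [default if v is None else v for v in target]
    return target


# Valid: `df` is the only name bound to the data, so changing it in place
# cannot surprise anyone.
df = [1, None, 3]
df = fill_missing(df, 0, inplace=True)

# Not valid: `other_df` is bound to the same object and changes as well.
source = [1, None, 3]
other_df = source
source = fill_missing(source, 0, inplace=True)
print(other_df)  # [1, 0, 3]
```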
Second, we need to make sure that the variable passed as the argument is also the one the result is assigned to. Assigning the result back to the same name is valid. Assigning it to a different name is not, as the caller does not expect the source df to change.
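The difference between the two call shapes can be sketched as follows, again with a plain list standing in for a DataFrame and a hypothetical fill_missing helper.

```python
def fill_missing(values, default, inplace=False):
    """Hypothetical helper: replace None entries with `default`."""
    target = values if inplace else list(values)
    target[:] = [default if v is None else v for v in target]
    return target


# Valid: the result replaces `df`, so the old content is never read again.
df = [1, None, 3]
df = fill_missing(df, 0)

# Not a candidate for automatic inplace: the caller keeps `df` around and
# expects it to be untouched after the call.
df = [1, None, 3]
new_df = fill_missing(df, 0)
print(df)      # [1, None, 3]
print(new_df)  # [1, 0, 3]
```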
In the rest of the article, we’ll use the following function as an example:
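A minimal stand-in for such a function could look like this. It is only an assumption for illustration: fill_missing is a made-up name, loosely modeled on pandas functions that take an inplace flag, and it works on a list so that the snippet has no dependencies.

```python
def fill_missing(values, default, inplace=False):
    """Replace every None in `values` with `default`.

    Hypothetical running example: with inplace=True the input list is
    modified and returned, otherwise a modified copy is returned.
    """
    target = values if inplace else list(values)
    for index, item in enumerate(target):
        if item is None:
            target[index] = default
    return target


df = [1, None, 3]
df = fill_missing(df, 0)
print(df)  # [1, 0, 3]
```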
To make sure that we don’t accidentally change another object by automatically setting inplace=True, we need to use reference counting.
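Basically, we should only have 2 references to our source object: the variable in the calling code and the parameter inside the function.
To count the references we can use sys.getrefcount.
If you try this, you may be surprised that the value is 4 rather than 2.
One of the extra references is due to sys.getrefcount also creating a reference to its argument.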
The second additional reference is explained here: https://stackoverflow.com/a/46146772. It looks like an implementation detail of the way CPython has handled argument passing since 3.6.
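Since the absolute number is a CPython implementation detail and can differ between versions, the sketch below only compares two counts taken the same way: once with a single name bound to the object, once with an extra alias. The refs_seen_inside helper is a hypothetical name.

```python
import sys


def refs_seen_inside(obj):
    # The raw value usually includes temporary references created by the
    # call itself, so it is higher than the number of names bound to obj.
    return sys.getrefcount(obj)


data = [1, None, 3]
without_alias = refs_seen_inside(data)

alias = data  # one more reference to the same object
with_alias = refs_seen_inside(data)

print(with_alias - without_alias)  # 1
```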
Our code now is:
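One way the reference check could be wired into the function is sketched below. The threshold EXPECTED_REFS = 4 is an assumption taken from the discussion above (2 expected references plus 2 extra ones) and may need adjusting for other CPython versions. fill_missing is a hypothetical function operating on a list.

```python
import sys

# Assumption: 2 "real" references plus the 2 extra ones discussed above.
# This is a CPython implementation detail and may vary between versions.
EXPECTED_REFS = 4


def fill_missing(values, default, inplace=False):
    """Replace None with `default`, switching to inplace when it looks safe."""
    if not inplace and sys.getrefcount(values) <= EXPECTED_REFS:
        # Nobody else seems to hold the object, so skip the copy.
        inplace = True
    target = values if inplace else list(values)
    for index, item in enumerate(target):
        if item is None:
            target[index] = default
    return target


# With many other references around, the function must fall back to a copy.
data = [1, None, 3]
holders = [data] * 10
out = fill_missing(data, 0)
print(data)  # [1, None, 3]
print(out)   # [1, 0, 3]
```

This version only covers the first issue: it does not yet check that the result is assigned back to the same variable.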
Confirming the source and target objects are the same
To solve the second issue, we’ll:
- Retrieve information on the outer frame. Since we’re already inside the function, the outer frame is the one from which the function was called.
- Retrieve the code portion referring to the function call.
- Use the code to build a small AST and check that the first argument of the call has the same name as the target.
To retrieve the outer frame we can use the inspect module.
We can then get the code using frame.code_context.
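A small sketch of these two steps, using inspect.getouterframes and the code_context attribute of the returned FrameInfo. The calling_line helper is a hypothetical name, and the function returns None when the source of the caller is not available (for example in some interactive sessions).

```python
import inspect


def calling_line():
    """Return the source line of the call site, or None if unavailable."""
    # Index 0 is this function's own frame, index 1 is the caller's frame.
    frames = inspect.getouterframes(inspect.currentframe())
    if len(frames) < 2:
        return None
    frame = frames[1]
    if not frame.code_context:
        return None
    # code_context is a list of source lines; index points at the current one.
    return frame.code_context[frame.index].strip()


line = calling_line()
print(line)
```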
Once we have the code, we can use the ast module to check whether we’re in the valid use case.
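A possible version of this check is sketched below. The function name assigns_back_to_source is hypothetical, and the check is deliberately narrow: it only accepts a single assignment of the form name = func(name, ...) and treats everything else, including lines that fail to parse, as not valid.

```python
import ast


def assigns_back_to_source(line):
    """Return True if `line` looks like `name = func(name, ...)`."""
    try:
        body = ast.parse(line.strip()).body
    except SyntaxError:
        return False  # e.g. one line of a statement that spans several lines
    if len(body) != 1 or not isinstance(body[0], ast.Assign):
        return False
    statement = body[0]
    if len(statement.targets) != 1 or not isinstance(statement.targets[0], ast.Name):
        return False
    call = statement.value
    if not isinstance(call, ast.Call) or not call.args:
        return False
    first_arg = call.args[0]
    return isinstance(first_arg, ast.Name) and first_arg.id == statement.targets[0].id


print(assigns_back_to_source("df = fill_missing(df, 0)"))      # True
print(assigns_back_to_source("new_df = fill_missing(df, 0)"))  # False
```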
We can offer an easy way to add automatic inplace to existing functions using a decorator.
This way we only need to decorate the functions that offer an inplace parameter.
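Putting the pieces together, a decorator along these lines could look as follows. This is a sketch under several assumptions: auto_inplace and fill_missing are hypothetical names, EXPECTED_REFS = 4 comes from the reference counting discussion and is CPython-version dependent, inplace is assumed to be passed by keyword, and a list stands in for a DataFrame.

```python
import ast
import functools
import inspect
import sys

EXPECTED_REFS = 4  # assumption: 2 expected references + 2 extra ones


def _assigns_back_to_source(line):
    """Return True if `line` looks like `name = func(name, ...)`."""
    try:
        body = ast.parse(line.strip()).body
    except SyntaxError:
        return False
    if len(body) != 1 or not isinstance(body[0], ast.Assign):
        return False
    targets, call = body[0].targets, body[0].value
    return (
        len(targets) == 1
        and isinstance(targets[0], ast.Name)
        and isinstance(call, ast.Call)
        and bool(call.args)
        and isinstance(call.args[0], ast.Name)
        and call.args[0].id == targets[0].id
    )


def auto_inplace(func):
    @functools.wraps(func)
    def wrapper(source, *args, **kwargs):
        auto = False
        # Only step in when the caller did not choose `inplace` explicitly.
        if "inplace" not in kwargs and sys.getrefcount(source) <= EXPECTED_REFS:
            caller = inspect.getouterframes(inspect.currentframe())[1]
            context = caller.code_context
            if context and _assigns_back_to_source(context[caller.index]):
                kwargs["inplace"] = True
                auto = True
        result = func(source, *args, **kwargs)
        # pandas-style functions return None when inplace=True, so hand
        # back the mutated source in that case.
        return source if (auto and result is None) else result

    return wrapper


@auto_inplace
def fill_missing(values, default, inplace=False):
    target = values if inplace else list(values)
    target[:] = [default if v is None else v for v in target]
    return target


df = [1, None, 3]
df = fill_missing(df, 0)
print(df)  # [1, 0, 3]
```

Whether the call above actually runs in place depends on the reference count and on the source line being available. The result is the same either way, which is the point of the approach.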
There are some limitations with the presented approach:
First, our decorator expects the source object to be the first argument, and expects the function to return exactly one value. This is the most common case, but we can easily extend our decorator to more usages by adding parameters.
The second limitation is that frame.code_context only returns one line of the code. This may not sound like an issue, but it makes it hard to detect use cases where the call spans several lines.
This is a little trickier to solve and may cause some missed auto inplace opportunities.
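To make this limitation concrete, the sketch below takes a call written over several lines and tries to parse each physical line on its own. No single line is recognized as an assignment, whichever one code_context happens to return. The is_assignment helper and the fill_missing call are hypothetical.

```python
import ast

statement = """df = fill_missing(
    df,
    0,
)"""


def is_assignment(fragment):
    """Return True if `fragment` parses as exactly one assignment."""
    try:
        body = ast.parse(fragment.strip()).body
    except SyntaxError:
        return False
    return len(body) == 1 and isinstance(body[0], ast.Assign)


print(is_assignment(statement))  # True
print([is_assignment(line) for line in statement.splitlines()])
# [False, False, False, False]
```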