Demystifying Python’s Descriptor Protocol
A lot of modern frameworks and libraries use the "descriptor" protocol to make the process of creating APIs for end-users neat and simple. Let's discuss how the behavior of Python's builtins like property, staticmethod and classmethod can be imitated using the descriptor protocol.
Consider the following example class:
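A minimal sketch of such a class, using the names that appear in the rest of this post (Person, foo, first_name, full_name, _full_name_setter). The last_name attribute, the _full_name_getter name and the method bodies are assumptions made for illustration.

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _full_name_getter(self):
        # Computed on every access; nothing is stored on the instance.
        return f"{self.first_name} {self.last_name}".title()

    def _full_name_setter(self, value):
        # Split "first last" and store the two parts separately.
        self.first_name, self.last_name = value.split(maxsplit=1)

    # full_name lives on the class, not on the instance.
    full_name = property(fget=_full_name_getter, fset=_full_name_setter)


foo = Person("foo", "bar")
print(foo.first_name)   # foo
print(foo.full_name)    # Foo Bar
print(vars(foo))        # {'first_name': 'foo', 'last_name': 'bar'}
```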
Whenever we access one of foo's attributes, say foo.first_name, then first_name is looked up in the following places, in order, until it is found:
- foo.__dict__
- type(foo).__dict__
- the __dict__ of each of type(foo)'s base classes, in MRO[1] order (metaclasses are not searched).
Notice that the attribute full_name isn't in foo.__dict__. Huh, where did it come from?
Well, the attribute access mechanism that we just discussed was incomplete. Before completing it, let's take a detour and look at the descriptor protocol, on which property, classmethod and staticmethod are built.
What's the descriptor protocol?
Any object that defines at least one of the methods __get__, __set__ or __delete__ is called a descriptor. The signatures of these methods are:
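As a sketch, the three methods have the shapes below. The Verbose and Demo classes are hypothetical and exist only to show when each method is called; the parameter names (instance, owner, value) are a common convention rather than a requirement.

```python
class Verbose:
    """Hypothetical descriptor that implements all three methods."""

    def __init__(self, name):
        self.name = name  # key used in the instance __dict__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        print(f"get {self.name}")
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        print(f"set {self.name}")
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        print(f"delete {self.name}")
        del instance.__dict__[self.name]


class Demo:
    x = Verbose("x")


d = Demo()
d.x = 1    # calls Verbose.__set__(descriptor, d, 1)
d.x        # calls Verbose.__get__(descriptor, d, Demo)
del d.x    # calls Verbose.__delete__(descriptor, d)
```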
There are two types of descriptors: data descriptors and non-data descriptors. An object that defines __set__ or __delete__ is a data descriptor. A non-data descriptor, therefore, defines only __get__ among these three methods. Data and non-data descriptors have different precedence in the attribute lookup chain (more on this later).
In the Person class, the class attribute full_name is a descriptor. When foo.full_name is accessed, Person.full_name.__get__(foo, Person) is called, which in turn calls the function that we passed to property as the fget keyword argument.
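One way to check which kind a given object is, sketched with the standard library's inspect.isdatadescriptor. The Data and NonData classes are hypothetical.

```python
import inspect


class NonData:
    def __get__(self, instance, owner=None):
        return "from __get__"


class Data:
    def __get__(self, instance, owner=None):
        return "from __get__"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


print(inspect.isdatadescriptor(Data()))             # True
print(inspect.isdatadescriptor(NonData()))          # False
print(inspect.isdatadescriptor(property()))         # True
print(inspect.isdatadescriptor(staticmethod(len)))  # False
```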
So the attribute access mechanism now is:
- Check whether type(foo), or one of its base classes in the MRO[1], has a first_name attribute that is a data descriptor. If so, the result of calling its __get__(foo, type(foo)) is returned.
- If not, first_name is looked up in foo.__dict__.
- If it isn't there either, the class-level first_name is used: if it is a non-data descriptor, the result of calling its __get__(foo, type(foo)) is returned; a plain class attribute is returned as is.
Note that the first and third steps are nearly identical. The difference is precedence: a data descriptor takes priority over the instance's __dict__, whereas the instance's __dict__ takes priority over a non-data descriptor. We'll see how the cached property relies on this later in the post.
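The precedence rule can be observed directly. In this sketch the DataDesc, NonDataDesc and Demo classes are hypothetical, and the instance __dict__ is written to directly so that both descriptors compete with an instance entry of the same name.

```python
class DataDesc:
    def __get__(self, instance, owner=None):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    def __get__(self, instance, owner=None):
        return "non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()


d = Demo()
# Write straight into the instance __dict__, bypassing __set__.
vars(d)["a"] = "instance value"
vars(d)["b"] = "instance value"

print(d.a)  # data descriptor  (the descriptor wins over the instance __dict__)
print(d.b)  # instance value   (the instance __dict__ wins over the descriptor)
```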
You might be wondering what orchestrates this lookup mechanism. It's __getattribute__ (not to be confused with __getattr__): when we look up foo.full_name, foo.__getattribute__('full_name') is called, which handles the lookup according to the attribute access mechanism we just described.
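A small sketch of the difference between the two hooks. The Person class here is a cut-down hypothetical version, and nickname is an attribute that deliberately does not exist.

```python
class Person:
    def __init__(self, first_name):
        self.first_name = first_name

    def __getattr__(self, name):
        # Reached only when the normal lookup raises AttributeError.
        return f"<no attribute {name!r}>"


foo = Person("foo")

# Dotted access and the explicit call go through the same machinery.
print(foo.first_name)                                   # foo
print(foo.__getattribute__("first_name"))               # foo
print(object.__getattribute__(foo, "first_name"))       # foo

# __getattribute__ fails for a missing name, so __getattr__ is the fallback.
print(foo.nickname)                                     # <no attribute 'nickname'>
```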
It is also important to understand the attribute setting mechanism. Consider the statement foo.age = 32:
- If age is a data descriptor, type(foo).__dict__['age'].__set__(foo, 32) is called. If the descriptor does not support setting (for example, a property created without fset), AttributeError is raised.
- Otherwise, including when age is a non-data descriptor, an entry is created in foo's __dict__, i.e. foo.__dict__['age'] = 32.
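Both branches can be sketched as follows. The NonData class, the nickname attribute and the read-only age property are hypothetical; the exact text of the AttributeError message varies between Python versions, so it is not shown.

```python
class NonData:
    def __get__(self, instance, owner=None):
        return "from descriptor"


class Person:
    nickname = NonData()
    age = property(lambda self: 30)  # data descriptor without a setter


foo = Person()

print(foo.nickname)    # from descriptor
foo.nickname = "neo"   # no __set__, so the value goes into foo.__dict__
print(vars(foo))       # {'nickname': 'neo'}
print(foo.nickname)    # neo

try:
    foo.age = 32       # property.__set__ runs and finds no fset
except AttributeError as exc:
    print("AttributeError:", exc)
```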
How does the property builtin work?
Let's first see the signature of property.
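The documented signature is property(fget=None, fset=None, fdel=None, doc=None). A short sketch that inspects the builtin:

```python
# Documented signature: property(fget=None, fset=None, fdel=None, doc=None)
p = property()

print(type(property))           # <class 'type'>
print(p.fget, p.fset, p.fdel)   # None None None

# The three descriptor methods are all defined on the class.
print(all(hasattr(property, name)
          for name in ("__get__", "__set__", "__delete__")))  # True
```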
Although it looks like a function, it's actually a class, and its instances are descriptors because the class defines __get__, __set__ and __delete__.
We know that when an attribute that is a descriptor is accessed on an object, say foo, its __get__ method is called with the object and the object's class as arguments, i.e. type(foo).__dict__['attr_name'].__get__(foo, type(foo)). Similarly, when the attribute is set, its __set__ method is called with the object and the new value, i.e. type(foo).__dict__['attr_name'].__set__(foo, value).
Continuing with the opening example:
Note that when we set foo.full_name = 'keanu reeves', the full_name property's __set__ is called, which in turn calls the _full_name_setter that we passed to property as the fset argument.
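A sketch of that assignment, with the Person class repeated so the snippet runs on its own. As before, last_name, _full_name_getter and the method bodies are assumptions; the second name used at the end is arbitrary.

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _full_name_getter(self):
        return f"{self.first_name} {self.last_name}".title()

    def _full_name_setter(self, value):
        self.first_name, self.last_name = value.split(maxsplit=1)

    full_name = property(fget=_full_name_getter, fset=_full_name_setter)


foo = Person("foo", "bar")
foo.full_name = "keanu reeves"
print(foo.first_name, foo.last_name)   # keanu reeves
print(foo.full_name)                   # Keanu Reeves

# The assignment above is equivalent to calling __set__ explicitly.
type(foo).__dict__["full_name"].__set__(foo, "thomas anderson")
print(vars(foo))   # {'first_name': 'thomas', 'last_name': 'anderson'}
```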
We can mimic the property behavior with the following implementation:
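A simplified pure-Python sketch, not the builtin's actual (C) implementation. The class name Property, the error messages and the Circle example are illustrative assumptions; only getter/setter/deleter dispatch and the setter decorator are covered.

```python
class Property:
    """Simplified pure-Python stand-in for the builtin property."""

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc if doc is not None else getattr(fget, "__doc__", None)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)

    def __delete__(self, instance):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(instance)

    def setter(self, fset):
        # Return a new Property so that @name.setter works as a decorator.
        return type(self)(self.fget, fset, self.fdel, self.__doc__)


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @Property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


c = Circle(2)
c.radius = 3
print(c.radius)  # 3
```

Because Property defines __set__, it is a data descriptor, so it keeps control of the attribute even if the instance __dict__ gains an entry with the same name.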
How does a cached property work?
The expected behavior of a cached property is that its value is calculated only if it hasn't been calculated already; after the calculation, the value is stored ('cached') so that it can be returned quickly on subsequent accesses.
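One way to get this behavior is a non-data descriptor that writes its result into the instance __dict__ under its own name. The sketch below uses the class name CachedProperty; the attribute names func and name are assumptions.

```python
class CachedProperty:
    """Non-data descriptor that caches its result in the instance __dict__."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class
        value = self.func(instance)
        # Stored under the attribute's own name: later lookups find this
        # entry in the instance __dict__ and never reach __get__ again.
        instance.__dict__[self.name] = value
        return value
```

Leaving out __set__ and __delete__ is deliberate here: a data descriptor would take precedence over the instance __dict__ and the cached entry would never be used.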
Let's now use it
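A minimal usage sketch. So that the snippet runs on its own, it uses functools.cached_property from the standard library (Python 3.8+) as a stand-in for a hand-written cached property; it also stores the result in the instance __dict__. The body of score and the value 9000 are placeholders.

```python
from functools import cached_property  # stand-in, Python 3.8+


class Person:
    @cached_property
    def score(self):
        print("doing some time-consuming calculations")
        return 9000  # arbitrary placeholder value


foo = Person()
print(vars(foo))   # {}
print(foo.score)   # prints the message, then 9000
print(vars(foo))   # {'score': 9000}
print(foo.score)   # 9000, with no message this time
```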
Observe that when we first accessed the score attribute on foo, it printed "doing some time-consuming calculations". After foo.score was accessed once, foo.__dict__ was populated with a new entry with the key score. If we access foo.score a second time, nothing is printed; vars(foo)['score'] is returned instead.
Why did that happen?
To answer this, it's time to recall the attribute access machinery. When score was accessed for the first time:
- It was checked whether score was a data descriptor. It was not.
- Next, foo.__dict__ was checked. It had no 'score' key yet.
- Next, it was checked whether score was a non-data descriptor. It was, so type(foo).__dict__['score'].__get__(foo, type(foo)) was called, which stored and returned the result.
When score is accessed from the second time onward:
- Check if score is a data descriptor — It's not.
- 'score' key is then looked up in foo.__dict__, where it was inserted when score was accessed for the first time. foo.__dict__['score'] is returned.
A cached property is particularly useful when, for example, a Django model class defines a property that runs a time-consuming query. True to its "batteries included" philosophy, Django provides django.utils.functional.cached_property for this use case.
How do staticmethod and classmethod work?
staticmethod converts a function into a static method: a method that does not receive an implicit first argument. Let's implement it using the descriptor protocol:
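A minimal sketch, assuming the class name StaticMethod; the Math class and its add function are hypothetical.

```python
class StaticMethod:
    """Simplified pure-Python stand-in for the builtin staticmethod."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        # Return the function untouched: no instance or class is bound.
        return self.func


class Math:
    @StaticMethod
    def add(a, b):
        return a + b


print(Math.add(1, 2))     # 3
print(Math().add(1, 2))   # 3
```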
Similarly, the descriptor API can be used to implement classmethod, whose decorated methods receive the class object as the first argument, as follows:
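A minimal sketch, assuming the class name ClassMethod and using types.MethodType to bind the function to the class. The from_full_name constructor is a hypothetical example.

```python
from types import MethodType


class ClassMethod:
    """Simplified pure-Python stand-in for the builtin classmethod."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Bind the function to the class rather than to the instance.
        return MethodType(self.func, owner)


class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @ClassMethod
    def from_full_name(cls, full_name):
        # cls is the class the method was looked up on.
        return cls(*full_name.split(maxsplit=1))


foo = Person.from_full_name("keanu reeves")
print(type(foo).__name__, foo.first_name)   # Person keanu
```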
We've used the descriptor protocol to understand how builtins like staticmethod, classmethod and property work, and how we can implement something like CachedProperty ourselves. Note that the CachedProperty that we implemented is not a hack: Python 3 provides these APIs so that developers can customize attribute access when needed.
Helpful links:
- http://dabeaz.com/py3meta/
- https://docs.python.org/3/howto/descriptor.html
- https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.functional.cached_property
[1] MRO: Method Resolution Order