Last week a friend asked me about the 0x80 bit that is sometimes set in the marshalled object data of Python *.pyc files.
So I decided to find out what kind of flag it is and why it is used.
This is FLAG_REF, a special flag that was added in Python 3.4 and is used in conjunction with the TYPE_REF type code value. For the source code, see:
https://github.com/python/cpython/blob/3.4/Python/marshal.c.
In the current version (Python 3.7) everything is the same.
TYPE_REF = 'r'
FLAG_REF = 0x80
All currently used type code values are < 0x80 (the highest type code is TYPE_DICT = '{', i.e. 0x7b), so the flag is stored directly in the highest bit of the type code byte.
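A minimal sketch of that packing, using plain integer arithmetic. The helper name with_ref_flag is made up for illustration and does not exist in marshal.c:

```python
FLAG_REF = 0x80
TYPE_REF = ord('r')    # 0x72
TYPE_DICT = ord('{')   # 0x7b, the highest type code mentioned above


def with_ref_flag(type_code):
    """Hypothetical helper: set the highest bit of a 7-bit type code."""
    return type_code | FLAG_REF


# Every type code fits in the low 7 bits, so the top bit is free for the flag.
flagged_dict = with_ref_flag(TYPE_DICT)   # 0x7b | 0x80 == 0xfb
```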
Now let's see how this flag works.
When marshal_load reads a *.pyc file with the read_object function, it calls r_object to read the marshalled binary data object by object.
To determine the type of the object to read, a single byte, code, is read first.
This byte is then split into two parts:
flag = code & FLAG_REF tells whether FLAG_REF is set.
type = code & ~FLAG_REF keeps only the low 7 bits (as all type code values are < 0x80).
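The same split can be tried on real marshal output. This is only a sketch: split_code is a made-up helper, and the byte offsets assume the layout CPython 3.4+ writes for a list (one type byte followed by a 4-byte length) holding a short tuple:

```python
import marshal

FLAG_REF = 0x80


def split_code(code):
    """Hypothetical helper mirroring the two expressions above."""
    flag = code & FLAG_REF
    type_code = code & ~FLAG_REF
    return bool(flag), chr(type_code)


value = (1.5, 2.5)
holder = [value, value]        # the same tuple object is referenced twice
data = marshal.dumps(holder)

outer = split_code(data[0])    # type byte of the list itself
inner = split_code(data[5])    # type byte of the first element, after the 4-byte length
```

Here outer[1] is '[' (TYPE_LIST), and inner is a flagged ')' (TYPE_SMALL_TUPLE), because the tuple is referenced more than once.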
Also, a special macro, R_REF, is defined. This macro adds the freshly unmarshalled object to a special refs list, using the r_ref function, but only if FLAG_REF is set.
NB: on write (for marshalling), a hashtable is used instead of a list.
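To illustrate the write side, here is a toy writer, not the real algorithm. It assumes a plain dict keyed by id() as a stand-in for the hashtable, handles only floats, lists and tuples shorter than 256 items, and flags every object it writes, which is a simplification of what marshal.c does. The name toy_dumps is made up:

```python
import marshal
import struct

FLAG_REF = 0x80


def toy_dumps(obj):
    """Toy writer: marshal floats, lists and short tuples with references."""
    seen = {}          # id(object) -> reference index (stand-in for the hashtable)
    out = bytearray()

    def w_object(o):
        index = seen.get(id(o))
        if index is not None:
            # Already written once: emit TYPE_REF and the 4-byte index only.
            out.append(ord('r'))
            out.extend(struct.pack('<i', index))
            return
        seen[id(o)] = len(seen)
        if isinstance(o, float):
            out.append(ord('g') | FLAG_REF)      # TYPE_BINARY_FLOAT
            out.extend(struct.pack('<d', o))
        elif isinstance(o, list):
            out.append(ord('[') | FLAG_REF)      # TYPE_LIST
            out.extend(struct.pack('<i', len(o)))
            for item in o:
                w_object(item)
        elif isinstance(o, tuple) and len(o) < 256:
            out.append(ord(')') | FLAG_REF)      # TYPE_SMALL_TUPLE
            out.append(len(o))
            for item in o:
                w_object(item)
        else:
            raise TypeError('toy_dumps handles only floats, lists and short tuples')

    w_object(obj)
    return bytes(out)


value = (1.5, 2.5)
holder = [value, value]
restored = marshal.loads(toy_dumps(holder))   # the real loader reads the toy output
```

The second occurrence of value is written as TYPE_REF plus an index, so restored[0] and restored[1] come back as one and the same object.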
So when type is equal to TYPE_REF, only the reference index is read and the object stored at that index in the refs list is returned, eliminating the need to unmarshal the already unmarshalled object again.
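The reading side can be sketched as a toy r_object in Python. It is a simplification that assumes only four type codes appear in the data (TYPE_LIST, TYPE_SMALL_TUPLE, TYPE_BINARY_FLOAT and TYPE_REF); the name toy_loads is made up, and the real logic lives in marshal.c:

```python
import marshal
import struct

FLAG_REF = 0x80


def toy_loads(data):
    """Toy reader for lists, short tuples, binary floats and references."""
    refs = []          # plays the role of the refs list
    pos = 0

    def read(n):
        nonlocal pos
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    def r_object():
        code = read(1)[0]
        flag = code & FLAG_REF
        type_code = chr(code & ~FLAG_REF)

        if type_code == 'r':                     # TYPE_REF: just look it up
            (index,) = struct.unpack('<i', read(4))
            return refs[index]
        if type_code == 'g':                     # TYPE_BINARY_FLOAT
            (obj,) = struct.unpack('<d', read(8))
            if flag:
                refs.append(obj)
            return obj
        if type_code == '[':                     # TYPE_LIST
            (n,) = struct.unpack('<i', read(4))
            obj = []
            if flag:
                refs.append(obj)                 # registered before its items
            for _ in range(n):
                obj.append(r_object())
            return obj
        if type_code == ')':                     # TYPE_SMALL_TUPLE
            n = read(1)[0]
            slot = None
            if flag:
                slot = len(refs)
                refs.append(None)                # reserve the index first
            obj = tuple(r_object() for _ in range(n))
            if slot is not None:
                refs[slot] = obj
            return obj
        raise ValueError('unsupported type code %r' % type_code)

    return r_object()


value = (1.5, 2.5)
holder = [value, value]
result = toy_loads(marshal.dumps(holder))
```

Since the second element is stored as a reference, result[0] and result[1] are the same tuple object rather than two equal copies.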