Tuesday, May 1, 2018


Last week a friend asked me about the 0x80 bit that is sometimes set in the marshalled object data of Python *.pyc files.
So I decided to find out what kind of flag it is and why it is used.

This is FLAG_REF, a special flag that was added in Python 3.4 and is used in conjunction with the TYPE_REF type code. For the source code, see: https://github.com/python/cpython/blob/3.4/Python/marshal.c.
In the current version (Python 3.7) everything is the same.

TYPE_REF = 'r'
FLAG_REF = 0x80

All type codes currently in use are below 0x80 (the highest one is TYPE_DICT = '{', i.e. 0x7b), so the flag is stored directly in the highest bit of the type code byte.

Now let's see how this flag works.

When marshal_load reads a *.pyc file through the read_object function, r_object is called to read the marshalled binary data object by object.
To determine the type of the object to read, a single code byte is read first. This byte is then split into two parts:

flag = code & FLAG_REF tells whether FLAG_REF is set.
type = code & ~FLAG_REF keeps only the lower 7 bits (all type code values are < 0x80).
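
The split can be tried out in a few lines of Python. This is only an illustrative sketch: split_code is a hypothetical helper name, not something from marshal.c, and the two input bytes are simply TYPE_DICT ('{') without and with the flag.

```python
FLAG_REF = 0x80  # same value as in marshal.c


def split_code(code):
    """Split a marshal code byte into (is FLAG_REF set?, type code character)."""
    flag = code & FLAG_REF    # non-zero only if the highest bit is set
    type_ = code & ~FLAG_REF  # the remaining 7 bits hold the actual type code
    return bool(flag), chr(type_)


print(split_code(0x7b))  # (False, '{')
print(split_code(0xfb))  # (True, '{')
```

Because no type code uses the highest bit, setting the flag and clearing it again always gives back the original type code.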

A special macro, R_REF, is also defined. It appends the freshly unmarshalled object to a special refs list with the r_ref function, but only if FLAG_REF is set.
NB: on write (marshalling), a hashtable is used instead of a list.
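
To make the role of the refs list more concrete, here is a toy reader in Python. It is a simplified model of the idea, not a port of r_object: the name toy_read is made up for this post, it handles only three type codes (TYPE_SHORT_ASCII, TYPE_LIST and TYPE_REF), and it does no error checking on the reference index.

```python
import io
import struct

FLAG_REF = 0x80
TYPE_REF = ord('r')
TYPE_SHORT_ASCII = ord('z')
TYPE_LIST = ord('[')


def toy_read(stream, refs):
    """Read one object from stream, remembering flagged objects in refs."""
    code = stream.read(1)[0]
    flag = code & FLAG_REF
    type_ = code & ~FLAG_REF

    if type_ == TYPE_REF:
        # Only a 4-byte little-endian index follows; reuse the stored object.
        (index,) = struct.unpack('<i', stream.read(4))
        return refs[index]

    if type_ == TYPE_SHORT_ASCII:
        size = stream.read(1)[0]
        obj = stream.read(size).decode('ascii')
        if flag:
            refs.append(obj)  # the toy equivalent of R_REF
        return obj

    if type_ == TYPE_LIST:
        (size,) = struct.unpack('<i', stream.read(4))
        obj = []
        if flag:
            refs.append(obj)  # registered before its items are read
        for _ in range(size):
            obj.append(toy_read(stream, refs))
        return obj

    raise ValueError('type code not handled by this toy reader: %r' % chr(type_))


# A list of two items: "hi" with FLAG_REF set, then a reference to index 0.
data = b'[' + struct.pack('<i', 2) + b'\xfa\x02hi' + b'r' + struct.pack('<i', 0)
refs = []
result = toy_read(io.BytesIO(data), refs)
print(result, result[0] is result[1], refs)  # ['hi', 'hi'] True ['hi']
```

The list itself is written without the flag here, so the only entry in refs is the string, at index 0.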

So when the type is TYPE_REF, only a reference index is read and the object is taken from the refs list at that index, which eliminates the need to unmarshal an already unmarshalled object again.
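
The effect can be observed with the real marshal module by feeding it a few hand-written bytes. This is a small experiment that assumes CPython 3.4 or later and the marshal format described above; the byte string is written by hand for illustration and is not taken from an actual *.pyc file.

```python
import marshal

data = (
    b')\x02'              # TYPE_SMALL_TUPLE with 2 items, FLAG_REF not set
    b'\xfa\x02hi'         # TYPE_SHORT_ASCII | FLAG_REF, length 2, "hi" -> refs[0]
    b'r\x00\x00\x00\x00'  # TYPE_REF followed by a 4-byte index (0)
)

pair = marshal.loads(data)
print(pair)                # ('hi', 'hi')
print(pair[0] is pair[1])  # True: the second item is the very same object
```

The second item costs only 5 bytes in the stream (the type code plus the index), regardless of how large the referenced object is.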
