I tried to understand how a CPython "long" value is stored in memory, using WinDbg to inspect it.
PyLongObject is declared as follows:
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};
which expands to:
struct _longobject {
Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;
Py_ssize_t ob_size;
digit ob_digit[1];
};
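To make the field offsets concrete, here is a minimal sketch that mirrors the expanded struct with Python's ctypes module. The class name LongObjectLayout is hypothetical, and the 32-bit digit type assumes the 30-bit-digit (x64) build; a 15-bit-digit build would use an unsigned short instead. It only models the layout and does not read a live object.

```python
import ctypes

class LongObjectLayout(ctypes.Structure):
    """Hypothetical mirror of the Python 2.7 struct _longobject (30-bit digit build)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),    # reference count
        ("ob_type", ctypes.c_void_p),       # pointer to the type object
        ("ob_size", ctypes.c_ssize_t),      # signed number of digits
        ("ob_digit", ctypes.c_uint32 * 1),  # first digit; the others follow in memory
    ]

# Print the byte offset of every field for the interpreter running this code.
for name, _ in LongObjectLayout._fields_:
    print(name, getattr(LongObjectLayout, name).offset)
```

On a 64-bit interpreter the offsets come out as 0, 8, 16 and 24, which lines up with the addresses ending in 60, 68, 70 and 78 in the x64 WinDbg dump.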
I used the following long value as a test:
0xaaaabbbbccccddddeeeeffff00001111 in hexadecimal, or 226855257439031502727993705501399453969 in decimal.
On Windows x64, the PyLongObject has the following memory representation:
In byte format:
00000000`0034ec60 02 00 00 00 00 00 00 00 e0 65 29 1e 00 00 00 00
00000000`0034ec70 05 00 00 00 00 00 00 00 11 11 00 00 fc ff bb 3b
00000000`0034ec80 de dd cd 0c f3 ee ae 2a aa 00 00 00 00 f0 ad ba
Or using WinDbg's pointer and symbol memory display format:
00000000`0034ec60 0000000000000002
00000000`0034ec68 000000001e2965e0 python27!PyLong_Type
00000000`0034ec70 0000000000000005
00000000`0034ec78 3bbbfffc00001111
00000000`0034ec80 2aaeeef30ccdddde
00000000`0034ec88 baadf000000000aa
On the x86 version of Windows, the memory representation is as follows:
022bb300 00000002
022bb304 1e1f25e0 python27!PyLong_Type
022bb308 00000009
022bb30c 00001111
022bb310 77777ffc
022bb314 199b5dde
022bb318 555d6ef3
022bb31c baad00aa
The digit type depends on the build. On the x86 version:
typedef unsigned short digit;
#define PyLong_SHIFT 15
and on the x64 version:
typedef PY_UINT32_T digit;
#define PyLong_SHIFT 30
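The two shift values explain why ob_size differs between the dumps: the same 128-bit number needs more 15-bit digits than 30-bit ones. A small sketch of that arithmetic follows. It is written for Python 3 (unlike the Python 2.7 session in this post), and the helper name digit_count is hypothetical.

```python
VALUE = 0xaaaabbbbccccddddeeeeffff00001111

def digit_count(value, shift):
    """Number of `shift`-bit digits needed to hold abs(value)."""
    bits = abs(value).bit_length()
    # Ceiling division: a partially filled top digit still takes a whole slot.
    return (bits + shift - 1) // shift

print(digit_count(VALUE, 30))  # x64 build, 30-bit digits
print(digit_count(VALUE, 15))  # x86 build, 15-bit digits
```

The two results, 5 and 9, are the ob_size values visible in the x64 and x86 dumps respectively.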
Following the structure, let's walk through the contents of the x64 memory:
0000000000000002 - ob_refcnt
000000001e2965e0 - ob_type
The type value is easy to check in the Python interpreter:
>>> hex(id(long))
'0x1e2965e0L'
0000000000000005 - ob_size, the number of digits (on x64 one digit is a uint32).
3bbbfffc00001111 - ob_digit[0] = 00001111 and ob_digit[1] = 3bbbfffc
2aaeeef30ccdddde - ob_digit[2] = 0ccdddde and ob_digit[3] = 2aaeeef3
baadf000000000aa - ob_digit[4] = 000000aa, followed by 4 unused bytes
Here the value baadf000 (or baad in the x86 example above) is not part of the number: it is uninitialized allocated heap memory that follows the last digit.
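The same walk through the x64 dump can be sketched with the struct module. This is only an illustration under a few assumptions: it targets Python 3, the bytes are copied by hand from the WinDbg output, and the field widths are those of the x64 build (8-byte header fields, 4-byte digits).

```python
import struct

# The 48 bytes of the x64 dump, copied from the WinDbg byte display.
raw = bytes.fromhex(
    "02 00 00 00 00 00 00 00 e0 65 29 1e 00 00 00 00 "
    "05 00 00 00 00 00 00 00 11 11 00 00 fc ff bb 3b "
    "de dd cd 0c f3 ee ae 2a aa 00 00 00 00 f0 ad ba "
)

# Header: ob_refcnt (signed 64-bit), ob_type (pointer), ob_size (signed 64-bit).
ob_refcnt, ob_type, ob_size = struct.unpack_from("<qQq", raw, 0)

# Digits: abs(ob_size) little-endian 32-bit words starting at offset 24.
digits = struct.unpack_from("<%dI" % abs(ob_size), raw, 24)

# Whatever follows the last digit is not part of the number.
leftover = struct.unpack_from("<I", raw, 24 + 4 * abs(ob_size))[0]

print(ob_refcnt, hex(ob_type), ob_size)
print([hex(d) for d in digits])
print(hex(leftover))
```

Each 64-bit word shown by WinDbg holds two little-endian digits, which is why the lower half of a displayed word is the digit with the lower index.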
Now the main question is how to restore the original long integer value from memory. Fortunately, I found the answer in the CPython sources:
Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
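The formula can be tried out in both directions with a short sketch. It is written for Python 3, and the helper names to_digits and from_digits are hypothetical; they are not CPython functions. The sign is passed separately, standing in for the sign of ob_size.

```python
def to_digits(value, shift):
    """Split abs(value) into base-2**shift digits, least significant first."""
    mask = (1 << shift) - 1
    n = abs(value)
    digits = []
    while n:
        digits.append(n & mask)
        n >>= shift
    return digits  # empty list for zero, matching ob_size == 0

def from_digits(digits, shift, negative=False):
    """Evaluate SUM(digits[i] * 2**(shift*i)) and apply the sign."""
    total = sum(d << (shift * i) for i, d in enumerate(digits))
    return -total if negative else total

VALUE = 0xaaaabbbbccccddddeeeeffff00001111
d30 = to_digits(VALUE, 30)
print([hex(d) for d in d30])
print(from_digits(d30, 30) == VALUE)
```

With a shift of 30 the digits are 0x1111, 0x3bbbfffc, 0xccdddde, 0x2aaeeef3 and 0xaa, the same values stored in ob_digit in the x64 dump.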
Knowing that, I wrote a function to restore the long value. Here is the script, with the bytes from the memory dumps above hard-coded:
import struct
import platform

is_x64 = platform.architecture()[0] == '64bit'
if is_x64:
    SHIFT = 30
    digit_size = struct.calcsize('I')
else:
    SHIFT = 15
    digit_size = struct.calcsize('H')


def restore_python_long():
    """
    Restore python long integer value from memory
    """
    # hard coded data bytes from memory
    if is_x64:
        ob_size_data_str = '05 00 00 00 00 00 00 00'
        ob_digit_data_str = '11 11 00 00 fc ff bb 3b de dd cd 0c f3 ee ae 2a aa 00 00 00'
    else:
        ob_size_data_str = '09 00 00 00'
        ob_digit_data_str = '11 11 00 00 fc 7f 77 77 de 5d 9b 19 f3 6e 5d 55 aa 00'
    ob_size_data = ob_size_data_str.replace(' ', '').decode('hex')
    ob_digit_data = ob_digit_data_str.replace(' ', '').decode('hex')

    # get ob_size
    if is_x64:
        ob_size = struct.unpack('q', ob_size_data)[0]
    else:
        ob_size = struct.unpack('i', ob_size_data)[0]
    if ob_size == 0:
        return 0L

    # get digits
    digits = []
    for i in xrange(0, abs(ob_size)):
        digit_value = ob_digit_data[i * digit_size: i * digit_size + digit_size]
        if is_x64:
            digits.append(struct.unpack('I', digit_value)[0])
        else:
            digits.append(struct.unpack('H', digit_value)[0])

    # restore long
    value = 0L
    for i in xrange(0, abs(ob_size)):
        value += digits[i] * 2 ** (SHIFT * i)
    if ob_size < 0:
        value = -value
    return value


value = restore_python_long()
print hex(value)
print value
And the result is the same as the input:
0xaaaabbbbccccddddeeeeffff00001111L
226855257439031502727993705501399453969