The Detect Methods of Python’s Dill Package

[In Progress]

Possibly the greatest thing about open-source projects is that they combine democracy, communism, consensus and cooperation. Python, for example, was created by a man named Guido, but has been developed by dozens if not hundreds of others for free use and modification by any and all.

The particular package this article will explore is called Dill (named for the Pickle module it interfaces with). Both Dill and Pickle are mostly used to serialize objects so that they can be transferred between “processes”, but serialization is also necessary if one wants to “save” an object for later use. Matt Rocklin compared the use of serialization to the Star Trek transport method.

But the Detect module of dill is useful even in simply exploring the make-up of objects and their relationships.

Below are listed the methods of Dill’s “Detect” module.

Convenience functions included with Dill are listed at the bottom of the article.

dill.detect.children

children(obj, objtype, depth=1, ignore=())
"""Find the chain of referrers for obj. Chain will start with obj.

objtype: an object type or tuple of types to search for
depth: search depth (e.g. depth=2 is 'grandchildren')
ignore: an object or tuple of objects to ignore in the search

NOTE: a common thing to ignore is all globals, 'ignore=(globals(),)'

NOTE: repeated calls may yield different results, as python stores
the last value in the special variable '_'; thus, it is often good
to execute something to replace '_' (e.g. >>> 1+1).

"""

Detect.children tells us which objects refer to obj, directly and indirectly.

Note that ‘ignore’ expects an object or a tuple of objects, so call it with ignore=(item1, item2).
It’s often useful to ignore=(globals(),) (note the comma!).

If this is at all confusing, consider this read.
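To make the idea of “referrers” concrete, here is a minimal sketch that uses only the standard library’s gc module. It assumes that a search like detect.children is built on something like gc.get_referrers; the helper name direct_referrers is hypothetical and is not part of dill, and it only looks one level deep.

```python
import gc

def direct_referrers(obj, objtype, ignore=()):
    """Hypothetical helper: objects of type objtype that refer directly to obj."""
    ignored_ids = [id(item) for item in ignore]
    return [ref for ref in gc.get_referrers(obj)
            if isinstance(ref, objtype) and id(ref) not in ignored_ids]

inner = [1, 2, 3]
outer = {"payload": inner}   # a dict that refers to inner
other = [inner]              # a list that also refers to inner

# Ask only for dicts, and skip the module namespace (note the comma!)
found = direct_referrers(inner, dict, ignore=(globals(),))
print(found)
```

Without the ignore argument, the module’s globals dict would normally show up as well, since the name ‘inner’ lives there.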

dill.detect.iscode

Some of the dill methods are calls to functions of Python’s inspect module. Python module of the week has a good explanation of the inspect module:

“The inspect module provides functions for learning about live objects, including modules, classes, instances, functions, and methods. You can use functions in this module to retrieve the original source code for a function, look at the arguments to a method on the stack, and extract the sort of information useful for producing library documentation for your source code.”

Here inspect.iscode uses the built-in isinstance to return True if the object is a code object.


"""Return true if the object is a code object.
 
Code objects provide these attributes:
    co_argcount     number of arguments (not including * or ** args)
    co_code         string of raw compiled bytecode
    co_consts       tuple of constants used in the bytecode
    co_filename     name of file in which this code object was created
    co_firstlineno  number of first line in Python source code
    co_flags        bitmap: 1=optimized | 2=newlocals | 4=*arg | 8=**arg
    co_lnotab       encoded mapping of line numbers to bytecode indices
    co_name         name with which this code object was defined
    co_names        tuple of names of local variables
    co_nlocals      number of local variables
    co_stacksize    virtual machine stack space required
    co_varnames     tuple of names of arguments and local variables"""
return isinstance(object, types.CodeType)

Read more about Python CodeObjects.
Here is an example:

In [1]: def return42(): return 42

In [2]: return42.__code__
Out[2]: <code object return42 at 0x101e08230, file "", line 1>

In [3]: dill.detect.iscode(return42.__code__)
Out[3]: True
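A few of the attributes from the docstring above can be read straight off a code object. This is a small sketch using only the standard library (inspect.iscode, which the article says dill calls); the function ‘add’ is just an illustration.

```python
import inspect
import types

def add(a, b):
    total = a + b
    return total

code_obj = add.__code__

print(inspect.iscode(code_obj))   # True: this is a code object
print(inspect.iscode(add))        # False: the function itself is not
print(code_obj.co_name)           # add
print(code_obj.co_argcount)       # 2
print(code_obj.co_varnames)       # ('a', 'b', 'total')
```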

dill.detect.nested


nested(func):
    """get any functions inside of func (e.g. inner functions in a closure)

    NOTE: results may differ if the function has been executed or not.
    If len(nestedcode(func)) > len(nested(func)), try calling func().
    If possible, python builds code objects, but delays building functions
    until func() is called.
    """

If you don’t understand the difference between ‘code objects’ and ‘function objects’, here’s a read.

In this case ‘nested’ refers to functions built from code objects that are nested within the func.func_code object (func.__code__ in Python 3).
In the following Python 2 example from S.O., a new function object is created using types.CodeType (from the types module) to replace an entry in func.func_code.co_consts with the code object of another function. First we see ‘foo’ with no nested functions detected, then with one.


Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import types
>>> # Here's the original function
... def foo():
...   def bar():
...     print("    In bar orig")
...   def baz():
...     print("  Calling bar from baz")
...     bar()
...   print("Foo calling bar:")
...   bar()
...   print("Foo calling baz:")
...   baz()
... 
>>> dill.detect.nested(foo)
[]
>>> # This is our replacement function
... def my_bar():
...   print("   Woo hoo I'm the bar override")
... 
>>> # This creates a new code object used by our new foo function 
... # based on the old foo functions code object.
... foocode = types.CodeType(
...     foo.func_code.co_argcount,
...     foo.func_code.co_nlocals,
...     foo.func_code.co_stacksize,
...     foo.func_code.co_flags,
...     foo.func_code.co_code,
...     # This tuple is a new version of foo.func_code.co_consts
...     # NOTE: Don't get this wrong or you will crash python.
...     ( 
...        foo.func_code.co_consts[0],
...        my_bar.func_code,
...        foo.func_code.co_consts[2],
...        foo.func_code.co_consts[3],
...        foo.func_code.co_consts[4]
...     ),
...     foo.func_code.co_names,
...     foo.func_code.co_varnames,
...     foo.func_code.co_filename,
...     foo.func_code.co_name,
...     foo.func_code.co_firstlineno,
...     foo.func_code.co_lnotab,
...     foo.func_code.co_freevars,
...     foo.func_code.co_cellvars )
>>> 
>>> # This is the new function we're replacing foo with
... # using our new code.
... foo = types.FunctionType( foocode , {})
>>> dill.detect.nested(foo)
[<function my_bar at 0x1023bade8>]
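The session above is Python 2 (func_code, and the long positional types.CodeType call). As a rough Python 3 counterpart, the sketch below assumes Python 3.8 or later, where code objects have a replace() method. It is a simplified illustration of the same trick, not the original S.O. code, and the names new_consts and new_foo are made up for this example.

```python
import types

def foo():
    def bar():
        return "original bar"
    return bar()

def my_bar():
    return "bar override"

# Copy foo's constants, swapping bar's code object for my_bar's.
new_consts = tuple(
    my_bar.__code__
    if isinstance(const, types.CodeType) and const.co_name == "bar"
    else const
    for const in foo.__code__.co_consts
)

# Build a new code object and wrap it in a new function (Python 3.8+).
new_code = foo.__code__.replace(co_consts=new_consts)
new_foo = types.FunctionType(new_code, globals())

print(foo())      # original bar
print(new_foo())  # bar override
```

The same warning applies as in the original: the replacement has to be a compatible code object (here, one that takes no arguments and has no closure variables), otherwise the interpreter may misbehave.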

dill.detect.nestedcode


nestedcode(func):
"""get the code objects for any nested functions (e.g. in a closure)"""

Whereas detect.nested returns function objects, detect.nestedcode returns the code objects for the nested functions. So:


>>> dill.detect.nestedcode(foo)
[<code object my_bar at 0x1023bb5b0, file "", line 2>, <code object baz at 0x100436e30, file "", line 5>]

It’s basically looping through foo.func_code.co_consts, skipping any value that is ‘None‘, and returning a list of the entries that are code objects.
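That loop can be sketched in a few lines of Python 3 (using __code__ rather than func_code). The helper nested_code_objects is hypothetical, looks only one level deep, and is meant as an approximation of the idea rather than dill’s actual implementation.

```python
import types

def nested_code_objects(func):
    """Hypothetical helper: code objects stored in func's constants."""
    return [const for const in func.__code__.co_consts
            if isinstance(const, types.CodeType)]

def foo():
    def bar():
        return "bar"
    def baz():
        return bar()
    return baz()

names = sorted(code.co_name for code in nested_code_objects(foo))
print(names)  # ['bar', 'baz']
```

Note that the code objects exist as soon as foo is compiled, even though foo has never been called. That is the difference between nestedcode and nested that the docstring above hints at.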

dill.detect.trace


'print a trace through the stack when pickling; useful for debugging'

Simple example:


import dill

class GrandParent(object):
    
    lastname="Smith"
        
dill.detect.trace(True)

with open('my_object.pkl', 'wb') as f:
    dill.dump(GrandParent, f)

Result:


T2: <class '__main__.GrandParent'>
F2: <function _create_type at 0x102299f50>
T1: <type 'type'>
F2: <function _load_type at 0x102299ed8>
T1: <type 'object'>
D2: <dict object at 0x1022b15c8>

dill.detect.baditems

dill.detect.baditems is supposed to check for “bad items” inside of an object (i.e. check what is inside of an object that doesn’t pickle). Here’s an example of an object without any unpicklable items, followed by an example using dill to examine globals().


>>> x = iter([1,2,3,4,5])
>>> x

>>> import dill
>>> # everything inside a listiterator is serializable
>>> dill.detect.baditems(x)
[]
>>> # however, not everything in globals is serializable
>>> dill.detect.baditems(globals())
[<module '__builtin__' (built-in)>, ]
# All items returned by globals() except the above two are picklable
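The general idea of hunting for “bad items” can be approximated with the standard pickle module: try to serialize each item and collect the ones that fail. The helper unpicklable_items below is hypothetical, and it uses pickle rather than dill, so it is stricter than baditems would be (dill can serialize lambdas, for instance, while pickle cannot).

```python
import pickle

def unpicklable_items(mapping):
    """Hypothetical helper: values in a dict that pickle.dumps rejects."""
    bad = []
    for value in mapping.values():
        try:
            pickle.dumps(value)
        except Exception:
            # Any failure to serialize counts as a "bad item" here
            bad.append(value)
    return bad

sample = {"number": 42, "text": "dill", "func": lambda x: x + 1}
bad = unpicklable_items(sample)
print(len(bad))  # 1 (the lambda)
```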

dill.detect.errors

dill.detect.isfunction

dill.detect.outermost

dill.detect.varnames

dill.detect.badobjects

dill.detect.freevars

dill.detect.ismethod

dill.detect.parent


parent(obj, objtype, ignore=())
"""
>>> listiter = iter([4,5,6,7])
>>> obj = parent(listiter, list)
>>> obj == [4,5,6,7]  # actually 'is', but don't have handle any longer
True

NOTE: objtype can be a single type (e.g. int or list) or a tuple of types.

WARNING: if obj is a sequence (e.g. list), may produce unexpected results.
Parent finds *one* parent (e.g. the last member of the sequence).
    """

Detect.parent simply returns the last object returned by detect.parents.
(Note: ‘is’ returns True if two variables point to the same object, ‘==’ if the objects referred to by the variables are equal. In the above case ‘obj’ is the very list that was passed to iter(), but the literal ‘[4,5,6,7]’ in the comparison creates a new, equal list, so only ‘==’ can confirm the match.)
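The difference between ‘is’ and ‘==’ is easy to see with plain lists (the variable names here are only for illustration):

```python
original = [4, 5, 6, 7]
alias = original          # a second name for the same object
lookalike = [4, 5, 6, 7]  # a new list with equal contents

print(alias is original)      # True: same object
print(lookalike == original)  # True: equal contents
print(lookalike is original)  # False: different objects
```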

dill.detect.badtypes

dill.detect.globalvars

dill.detect.istraceback

dill.detect.parents

parents(obj, objtype, depth=1, ignore=())
"""Find the chain of referents for obj. Chain will end with obj.

    objtype: an object type or tuple of types to search for
    depth: search depth (e.g. depth=2 is 'grandparents')
    ignore: an object or tuple of objects to ignore in the search
"""

Detect.parents shows us what objects are reachable from obj, directly and indirectly.

(See ‘ignore’ note under detect.children above).
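The listiter example from detect.parent can be reproduced with the standard library alone. This sketch assumes that the parents search rests on something like gc.get_referents (the objects that obj refers to); it inspects a single level and is not dill’s implementation.

```python
import gc

numbers = [4, 5, 6, 7]
listiter = iter(numbers)

# Objects the iterator refers to directly, filtered by type
referents = [ref for ref in gc.get_referents(listiter)
             if isinstance(ref, list)]

print(referents[0] is numbers)  # True: the iterator holds the original list
```

Because we kept a handle on ‘numbers’ here, we can check the result with ‘is’ rather than ‘==’.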

Convenience Functions:

dill.detect.PY3

PY3 is actually a variable, not a function. It is True if the script is being run in Python version >= 3, otherwise False.

dill.detect.PY3 # notice no '()' - it's not a method
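A flag like this is typically a one-line check against the interpreter version. The sketch below uses sys.version_info; it is an assumption about how such a flag could be computed, not a copy of dill’s source.

```python
import sys

# True on Python 3 and later, False on Python 2
PY3 = sys.version_info[0] >= 3

print(PY3)
```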

dill.detect.reference

This is actually a call to _proxy_helper
(from .dill import _proxy_helper as reference)


def _proxy_helper(obj):
    """get memory address of proxy's reference object"""

References increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of code. Pointers, due to their intimate relationship with the underlying hardware, are one of the most powerful and efficient types of references. However, also due to this relationship, pointers require a strong understanding by the programmer of the details of memory architecture.
Example:


>>> class Parent(object):
...     name="Big Papa"
...     def get_hitched(self, partner):
...             return self.name + " + " + partner + " TLFE"
... 
>>> johnny = Parent()
>>> johnny.get_hitched("Mary")
'Big Papa + Mary TLFE'
>>> billy = johnny.get_hitched
>>> billy("Junebug")
'Big Papa + Junebug TLFE'
>>> dill.detect.reference(billy)
4299844816
>>> dill.detect.reference(johnny.get_hitched)
4299844816
>>> dill.detect.reference(johnny)
4299844816
>>> dill.detect.reference(johnny.name)
4332529152
>>> dill.detect.reference(Parent)
4332953328

 

dill.detect.at

This is also a call to another function:
(from .dill import _locate_object as at)


def _locate_object(address, module=None):
    """get object located at the given memory address (inverse of id(obj))"""

>>> dill.detect.at(4332953328)
<class '__main__.Parent'>
>>> dill.detect.at(4299844816)
<__main__.Parent object at 0x1004a6cd0>

Detect.at() takes a memory address (as returned by id() or detect.reference) and, if the object at that address is tracked by Python’s garbage collector (the gc module), returns that object. Otherwise it raises a ReferenceError containing the hex value of the address.
For example, a string such as johnny.name is not tracked by the gc module, because a string cannot hold references to other objects:


>>> dill.detect.reference(johnny.name)
4332529152
>>> dill.detect.at(4332529152)
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/mikekilmer/Envs/GLITCH/lib/python2.7/site-packages/dill/dill.py", line 738, in _locate_object
    raise ReferenceError("Cannot reference object at '%s'" % address)
ReferenceError: Cannot reference object at '0x1023d2600'
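The behaviour described above can be imitated with gc.get_objects(), which lists the objects the garbage collector is tracking. The function locate below is a hypothetical stand-in written for this article, on the assumption that detect.at searches gc-tracked objects in a similar way.

```python
import gc

def locate(address):
    """Hypothetical helper: find a gc-tracked object by its id()."""
    for obj in gc.get_objects():
        if id(obj) == address:
            return obj
    raise ReferenceError("Cannot reference object at '%s'" % hex(address))

class Parent(object):
    name = "Big Papa"

johnny = Parent()

# Instances are tracked by the collector, so this lookup succeeds
print(locate(id(johnny)) is johnny)

# Strings are not tracked, so this lookup fails
try:
    locate(id(johnny.name))
except ReferenceError as err:
    print(err)
```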

dill.detect.code

code(func):
    '''get the code object for the given function or method

    NOTE: use dill.source.getsource(CODEOBJ) to get the source code
    '''

Returns the code object for the function or method provided, similar to ‘return42.__code__’.

dill.detect.isframe

Call to inspect.isframe().


'''Return true if the object is a frame object.
'''

Frame objects represent execution frames. They may occur in traceback objects. Python creates a frame for every new call you make. You can think of a frame object as being like an instance of a function. Read about frame objects in the Python data model reference.

The following example code is lifted from FrameHacks:


import sys
import dill

def one():
    two()

def two():
    three()

def three():
    for num in range(3):
        frame = sys._getframe(num)
        show_frame(num, frame)

def show_frame(num, frame):
    print(frame)
    # Use the frame that was passed in; calling sys._getframe(num) here
    # would count from show_frame() itself and be off by one.
    print("  frame     = {}".format(frame))
    print("sys._getframe(frame {}) is a frame object: {}".format(num, dill.detect.isframe(frame)))
    print("  function  = {}()".format(frame.f_code.co_name))
    # co_name is just a string, so isframe() is False for it
    function_is_frame = dill.detect.isframe(frame.f_code.co_name)
    print("frame.f_code.co_name Is a frame object: {}".format(function_is_frame))

one()

It returns:


  frame     = 
sys._getframe(frame 0) is a frame object: True
  function  = three()
frame.f_code.co_name Is a frame object: False

  frame     = 
sys._getframe(frame 1) is a frame object: True
  function  = two()
frame.f_code.co_name Is a frame object: False

  frame     = 
sys._getframe(frame 2) is a frame object: True
  function  = one()
frame.f_code.co_name Is a frame object: False
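Since frame objects may also occur in traceback objects, here is a short sketch of pulling one out of a caught exception. It uses inspect.isframe and inspect.istraceback from the standard library (the article notes that detect.isframe is a call to inspect.isframe); the function ‘fail’ is just an illustration.

```python
import inspect
import sys

def fail():
    raise ValueError("boom")

try:
    fail()
except ValueError:
    tb = sys.exc_info()[2]   # the traceback of the exception being handled

print(inspect.istraceback(tb))  # True

# Walk to the innermost traceback entry, where the error was raised
while tb.tb_next is not None:
    tb = tb.tb_next

frame = tb.tb_frame
print(inspect.isframe(frame))   # True
print(frame.f_code.co_name)     # fail
```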

https://docs.python.org/2/library/types.html