-
-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options to exclude objects from dump_session() #475
base: master
Are you sure you want to change the base?
Conversation
26c7b72
to
298a099
Compare
@mmckerns The only new feature I'd like to see in dill 0.3.6 is this one (not necessarily exactly like it is now, I'm open to implementation changes). By the way, I'm thinking if the hard support drop for Python 2.7-3.6 wouldn't deserve a major version bump to 1.0. (One dot oh, yay! 🥳) It would be a clear signaling for the users. At least I would go read the release notes if one of my dependencies rolled out a new major release. |
~It would be good for this PR and #424 to have the same interface. Is there a reason that you only filter on name, regex, and type as opposed to filtering using more general functions. In psuedocode, a list of filters could be represented like: def has_name(name):
def _filter(obj):
return name == obj.__name__
def has_type(t):
def _filter(obj):
return isinstance(obj, t)
[
(ACCEPT, has_name("x")),
(REJECT, has_type(int))
] Since one argument is already known to be the object itself, we could get rid of the currying and just do this: def has_name(obj, name):
return name == obj.__name__
def has_type(obj, t):
return isinstance(obj, t)
[
(ACCEPT, has_name, "x"),
(REJECT, has_type, int)
] The benefit of the first option is that we can overload the This allows people to make custom filters based on the contents of the object as opposed to just the name and the type. The names Should we go in this direction?~ Ah I see. That is fine I guess. Why not put the filters and other excluding options under the same name? |
Uhmm... I wrote this code a month ago and I can't remember why I kept filters separated from the rest. I think that I was concerned about both types and filters being callable and that they couldn't be distinguished safely. However, only types are, well, I'll will unify the parameters and also rewrite the commits so that the code moving to There are more things to discuss from your comment... |
About the API in general, accepting only "exclude" filters makes the implementation simpler (but not simple at all). Adding "include" filters have two major implications:
import dill, re
from dill.session import include
dill.dump_session(excludes=[re.compile(r'prefix_.+'), include('prefix_keep')]) Internally, this argument would be converted to something like [
(EXCLUDE, re.compile(r'prefix_.+'),
(INCLUDE, 'prefix_keep'),
] and processed together with the rules from settings. I agree that this PR and #424 could use the same logic/code. |
I meant to strikethrough the comment I made (the part in ~.) I guess strikethrough doesn't work across lines. The include might be useful in some situations, but it makes things more confusing and won't help most people probably and maybe is not worth the added complexity. I didn’t mean change the interface. I meant change the names a little so it is like main_globals = dill.settings.main_globals
main_globals.exclude_ids.add(var) # stores id(var)
main_globals.exclude_regex.add(r"__.+__") # stores re.compile(r"__.+__")
main_globals.exclude_types.add("Function") # stores types.FunctionType
main_globals.exclude_regex.discard(r"__.+__") # does what is expected
main_globals.exclude_types.clear() # empties the types list
main_globals.filter # filter part of main_globals
Also, maybe |
@anivegesana, I got the attempted strikethrough. It seems to work on single paragraphs, you need to add the "~" to each one... So, I reached a compromise: exclude and include rules, yes, but all the include rules run after the exclude rules. This simplifies a lot the implementation, besides making it more efficient, but still adds some flexibility. As a bonus, while I was finishing the main filtering procedure, I noticed that the include rules could be easily treated as an allowlist if there are no exclude rules. It still needs tests, but the API is working like this: import dill, re
from dill.session import ExcludeRules, EXCLUDE, INCLUDE, ipython_filter # or just import *
dill.settings.session_exclude = ExcludeRules([
(EXCLUDE, 'some_name'), # a rule can be as simple as a 2-tuple with the rule type and a single filter...
(EXCLUDE, ['another_name', '__.+__', dict]), # ...or it can have a sequence of filters, even of different types
(INCLUDE, '__doc__'), # the names refer to "exclude", but also accept include rules, in any order
(EXCLUDE, re.compile(r'ascii_\w+', re.ASCII)), # regular expressions can be specified by strings or re.Pattern's
(EXCLUDE, (lambda obj: obj.name.istitle())), # filter functions receive a namedtuple with 'name' and 'value' attributes
])
# The convenience of adding types from the module types is now in a separated method:
rules = dill.settings.session_exclude
rules.exclude.add_type('Function')
# Clear in different levels (also update()):
rules.include.ids.clear()
rules.include.clear()
rules.clear()
# Filters passed to dump_session() are *added* to the settings' rules:
dill.dump_session(exclude=ipython_filter(), include=['spam', 'eggs']) # arguments can be a single filter or a sequence of filters
# If settings.exclude_rules is empty and we only pass 'include' rules, just the matched variables will be saved:
dill.dump_session(include=['save', 'only', 'these']) About naming... I liked
Of course, if you are using it more than once you can alias it with I'm not sure about the classes names by the way. |
I'll start a review shortly. Note that CI is failing for PyPy-3.7. |
Yup... And the error message doesn't help. I'll try to install a more recent version of PyPy and debug it. |
Note that the most recent version that Travis has is pypy3.7-7.3.5, while the latest version released (and tested on my dev boxes) is pypy3.7-7.3.9. |
I got |
@mmckerns, it seems that PyPy changed its types and I need these to make uqfoundation#475 pass on Travis.
@mmckerns, it seems that PyPy changed its types and I need these to make uqfoundation#475 pass on Travis.
2ea720a
to
3c87ae9
Compare
Opening the file diffs and the changelog side-by-side may help (that's how I wrote the changelog, following the order of changes in the files). |
Results of trying to pickle and unpickle all Python StdLib modules on my machine:
|
Damn keyboard shortcuts! |
Any way we can get this merged to unblock the 0.3.6 release? |
Any way we can help unblock this ? python 3.11 release is approaching fast now and we need a release of dill to be able to use python 3.11 in pylint and prospector. |
665bcab
to
78f5e2d
Compare
* minor changes to the codebase from #475 * roll back some changes (moved to the refonfail PR) * add _session to Pickler --------- Co-authored-by: Leonardo Gama <[email protected]>
Additions and changes in this PR by file
__init__.py
._dill
is_pickled_module()
todill
s namespacelogger
tologging
import reload
to inside the function where it's used to clear it from the global namespace_dill.py
from __future__ import annotations
dill.logging
in place of the stdliblogging
module due to the newTRACE
logging level__init__.py
dataclasses
up, close to other imports, and moved the definition ofEXTENSION_SUFFIXES
down, close to a newPYTHONPATH_PREFIXES
constant and the definition of_is_builtin_module()
, which uses both.singletontypes
toIPYTHON_SINGLETONS
, a more descriptive namepickles()
into a new constantUNPICKLEABLE_ERRORS
, which is also used elsewhere forrefonfail
_getopt()
to get an option from a named argument, falling back to settings, to reduce the clutterPickler
:dispatch
attribute and a Sphinx directive to hide the giant and uninformative representation of the dictionary that was included in the documentation page_refimported
,_refonfail
and_first_pass
with default (False
) values to thePickler
class (should this remove the need to test forpickler._session
before testing for them? not related tois_dill(pickler)
tests)_file
attribute ofPickler
instances by a more specific_file_tell
, used by the logger to compute the partial pickle sizes —following theself.file_write
pattern from thepickle
module (should this also be public?)Pickler.save()
:Numpy*Type
in eachif
test with a previous test ofNumpyArrayType
refonfail
__main__
or in a module-type object raises an exceptionbytes('0', 'UTF-8')
instances by thepickle.POP
constant_module_map()
and_lookup_module()
back here fromsession.py
, as they are also used by therefonfail
feature_module_map()
now receives a module as argument and doesn't include it in the module map, instead of making_lookup_module()
to test if the found object's module is not main_lookup_module()
:not_found
_global_string()
helper function_save_module_dict()
function, currently called bysave_module_dict()
for the main module at session saving whenrefonfail
is set. Uses the recursive strategy ofPickler.save()
to try to save a variable or any of its attributes by reference if there are pickling errors, and use_lookup_module()
as a last resort for the whole object. Follows the stratified rules discussed at Session: improvements to documentation + handle unpickleable objects #527.save_module_dict()
:refonfail
GLOBAL
and bytes literals in opcode strings@_weak_cache
decorator that implements a simple cache for results with aweakref.WeakKeyDictionary
, to not keep references to the modules passed as argument to the functions that use it_is_builtin_module()
:_lookup_module()
and_save_module_dict()
, possibly for the same few modules@_weak_cache
to speed things up a bitPYTHONPATH_PREFIXES
outside the functionFalse
for__main__
(and__mp_main__
), which is necessary for the correct behavior ofrefonfail
. It should not affect any existing code (I can't imagine a real world example that would be changed by this)_is_stdlib_module()
, which is also called multiple times and thus uses@_weak_cache
for performance._module_package()
that returns the top-level package of a module orNone
if it's not part of a package. Maybe this and the function above should be merged, returning'stdlib'
or'__stdlib__'
as the "package" of Standard Library modulessave_module()
:main
module at session saving with therefonfail
option set__module__
before removing thempickler._main_dict_copy
attribute, which is used bysave_module_dict()
latersave_type()
: usestacklevel=3
for warnings to reduce clutter in repeated warning messagessave_function()
: removed the_main_modified
flag;_original_main
is set conditionally_utils.py
dill.session
, but which may be used for other tasks in the future:FilterRules
,FilterSet
,NamedObject
size_filter
EXCLUDE
,INCLUDE
Filter
,FilterFunction
,Rule
,RuleType
__all__
?):_format_bytes_size
,_open
_open()
: function that returns a file/stream context manager and takes care of assuring that it has the necessary methods (e.g.peek
orseek
)logging.py
(renamed fromlogger.py
)INFO
) and thegetLogger()
function from the Standard Library modulelogging
, so that this submodule can be used in place of that in dill's modulesTRACE
betweenINFO
andDEBUG
, used by the pickling trace functionsINFO
logging level can be used to display only informative messages, for example, about variables filtered out bydump_module()
refonfail
featurelogger.trace(pickler, "X: ...", ...)
call now must receive the object being pickled as either 1. the last positional argument, or 2. theobj
keyword argument. Added examplesroll_back()
method ofTraceAdapter
_format_bytes_size()
function in_utils.py
, which is also used bysize_filter()
trace()
now accepts a string representation of a logging level (e.g."INFO"
) as argumentsession.py
_module_map()
and_lookup_module()
to_dill.py
BUILTIN_CONSTANTS
, used by_stash_modules()
_stash_modules()
:__loader__
attribute to a new function_fix_module_namespace()
IMPORTED_AS_TYPES
orIMPORTED_AS_MODULES
refimported
feature_filter_vars()
to apply the exclude/include filters passed as arguments todump_module()
or set indill.session.settings
_discard_added_variables
to remove empty variables possibly added byModuleType()
_fix_module_namespace()
joining code that was spread across multiple functionsdump_module()
:refonfail
,exclude
,include
andbase_rules
; maderefimported
keyword-only too_open()
as context manager in place of the try-except-finally blockPickler._main_modified
attribute,pickler._original_main
is set conditionally, if the main module was modified by_filter_vars()
or_stash_modules()
refonfail
isTrue
, set the attributes_refonfail
,_fille_seek
and_file_truncate
to allowPickler.save()
to roll back the stream if pickling any object failsrefonfail
feature_PeekableReader
class to_utils.py
_identify_module()
now uses a more strict test to determine if a pickle stream is a module pickleis_pickled_module()
that tests whether a pickle file was created bydump_module()
and if it's module is importable or not —if theidentify
argument isTrue
, return the module name instead ofTrue
load_module()
:module
parameter and the return value_open()
like indump_module()
Unpickler._main
, the only use of this was removed in a previous commitload_module_as_dict()
:update
parmeter_open()
context managerModuleFilter
derived fromFilterRules
, that can hold filters for specific modulesdill.session.settings
with settings specific for the moduleipython_filter()
for excluding IPython hidden variables from__main__
tests/
test_filtering.py
for testing the different of types of filters (input and usage); exclude and include filters interaction; plusfilter_size()
test_session.py
:_lookup_module()
prioritization when multiple versions of an object are found in different modulesrefonfail
behavior according to the proposed stratification of cases between "builtin" (installed) and "user" (not installed) modulesipython_filter()
using a simulated IPython interactive namespacedocs/source/
conf.py
:automodule
directives indill.rst
exclude-members
attributes that are either implementation details or are redundant and just clutter the documentation page:__dict__, __slots__, __module__, __annotations__, __dataclass_fields__ and _abc_impl
dill.rst
: removedautomodule
options set as default inconf.py
Linked issues
exclude
andinclude
indump_module()
*dill.session.ipython_filter()
dill.session.size_filter()
(*) renamed from
dump_session()
Remaining tasks to reach the 0.3.6 milestone
refonfail
configparser
(reference:logging.config
) — moved to Config file reader and format specification #545Originally in #527
Documentation:
module
argument toload_module()
load_module()
is_pickled_module(filename)
(what about its name and signature?)dill.dump(module, file)
anddill.dump_module(file, module)
, and the assumptions and use cases for each alternative — probably in the package's main documentation, or in a separatesession.py
module's docstringCode:
is_pickled_module()
returnFalse
for modules saved by name (and for modules saved by value inside other objects) — saved an empty dict as 'state' for modules saved by reference byrefonfail
, which also allowsload_module_asdict()
to read such files.Review
_save_module_dict()
)Original post
This is a proposed feature for excluding objects from the namespace to be saved with
dill.dump_session()
.Currently implemented are exclusion rules that discard global variables when the variable:
re.search('^%s$' % regex, name)
matches)True
when passed to a filter functionIt adds two new parameters to
dump_session()
:exclude
andfilters
. Theexclude
argument can be a mixed list of variable names, regular expressions, object ids and type objects, in any order. Thefilters
argument can be either a single filter function that receives an object and returnsTrue
if it should be discarded, or a list of such functions.For example, if the user has a variable
my_pi = 3.14
in the global namespace, any of the following calls would not savemy_pi
:Beyond that, there are two new entries in the
settings
dictionary calledsession_exclude
andsession_filters
, in the case the user wants to have the same rules applied to multiple calls todump_session()
. Thesession_exclude
entry is itself a dictionary with four keys:ids
,names
,regex
andtypes
. Its values, and alsosession_filters
, are instances of a subclass ofset
with an associated constructor function that can transform the object upon addition or do some checks before adding. This is a facility for including/excluding non trivial rules like regular expressions and abstract types.Important: Rules from
settings
are merged with those fromdump_session()
arguments, not overwritten by them.Here are examples of usage:
Finally, if
trace(True)
is set,dump_session()
will list the names and types of the variables excluded before the pickling trace.If accepted, this feature will serve as the base to implement better support for saving IPython sessions (possibly with a filter function that the user can pass to the
filters
parameter or add tosettings
).Related issues: #4, #66, #345
Quote from #4 (this is old! 😄):
P.S.: I chose "exclude" because "ignore" was already taken —and it's more faithful to the feature, as "ignore" is more faithful to ignoring objects that fail pickling.