-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Graduate to v1.0 API #442
Comments
@lidatong @s-vitaliy @matt035343 @healthmatrice @artificial-aidan and many others please let us know your thoughts on this one :) |
Updated to make code look more like its actual python and not Java lol. Re how we will, if agreed, do this - by creating a new "release" v1 branch and working there so people can test the lib from git commit before we even consider pushing to PyPI. |
I probably will hate the metaclass implementation. I used to use lots of metaclasses in one of my project. But in the end we almost ditched all the metaclasses and I never used them ever in other projects. Because it is still a class factory which can create lots of headache for type checkers. I think mypy is still struggling python/mypy#2653 https://github.com/python/mypy/labels/topic-metaclasses |
This is good to know! If mypy has issues with those, then the metaclass approach doesn't seem like a viable alternative. |
Mostly agreed. Furthermore, if anyone does already use metaclasses, their code would become even more broken because all metaclasses should derive from the same parent which is not always the case. @george-zubrienko personally I like the idea of going forward, but I think this changeset is too breaking to be a root cause of dependency management mess. The reason being is the fact DCJ is often used in the dependency libraries or frameworks, and since there's no way for pip to provide different versions for each of them, if any of them wants to update, every other one should update too. IMAO, we should keep the backwards compatibility while changing the core logic behind AND throwing a ton of DepreciationWarnings in the process. |
The real improvement I can ask for is to allow user to use precompilation step for (de)serializer (similarly to how DCs constructor work) and use those instead of relying on the meta fields. This change could be breaking and/or not working in existing user cases, thus I suggest hiding it behind either enable optimization Python flag or some DCJ setting (either per-class or global). |
Poetry solves this issue, but if we bump to 1.0 and cut out current functionality, users wont be able to mix those anyway. Worst case people will (they really should though) pin major to 0.x until their lib is ready to upgrade.
One way could be by making v1 API opt-in instead of forced-use and as you said, get some deprecation warnings in. But then how long v0 API should stay in code, and why can't we rely on people pinning major? |
I tend to get rid of annotation usage all together to be fair, so in your case this would go to a separate configuration/serializer class like |
The second part of my message has nothing to do with annotations. Imagine this code:
Which internally checks if there is a compiled serializer and, if not, compiles one so the futher runs would be faster. |
How would you achieve such behavior? Usually generics won't provide their own type variables (unless |
Makes sense, I was thinking of having something like So |
You can check for the actual type provided by class-scoped type variable and then try find a configured (compiled) serializer for it. Small correction, code will look like
|
is there any plan to support discriminator? https://github.com/Fatal1ty/mashumaro#discriminator-config-option |
I don't think we will need that at all with the new API. As I mentioned, class hierarchies should be ser-desered naturally in v1, at least that would be the goal. Re other libraries, I don't think taking functionality from there is a healthy thing. DCJ has a great core and what we seek is to improve that, not port features from another libs, especially if they won't be needed (hopefully). |
hey thanks so much @george-zubrienko for writing this up. i see that there is a lot of good thinking here and triggered a nice discussion :) thanks especially for synthesizing the user issues and pain points in general i would strongly prefer not to introduce such a major breaking change. technically, keeping the library pre-1.0 "reserves that right" but there are enough users of the library and downstream libraries that a breaking change to the core API would be suboptimal (https://github.com/lidatong/dataclasses-json/network/dependents) (reading above basically echoing @USSX-Hares) if we do decide to do breaking changes, i would really like them to be contained (e.g. refactoring the way unrecognized fields are handled is one thing on my mind) or perhaps gating them behind "optional" features like enabling a more optimized build of the library (i've been floating the idea of re-writing the parser core in C) however, i do see a lot of value in what you wrote about introducing the top-level (side note: can it just be so basically i think we can have our cake and eat it too by introducing additive API changes. so users that want the quality of life improvements that come with top level let me know if that makes sense to you? and thanks again this writeup is much appreciated |
quick sketch: to_json(person) # more typing-friendly, does not modify your user type, can be done on third-party types
person.to_json() # existing API, schema pre-compiled at module load time so maybe faster (doesn't do this currently) |
Hey thanks for kind words, I do believe something good will come out of this :) Re
I think it makes sense not only to me, but also to other contributors, as well as users. I tried to lay out different options so people can iterate a bit in their heads and comment on the solution they would like/support most. I think this is exactly what is happening right now, and it is good to see people being more or less aligned on how we should implement changes. I expect a bit more reactions here next week so we have a full picture.
|
Sort of related, but coming up with a way to do CI testing for typing on the new APIs. Typing has sort of become a requirement in python recently (from what I'm seeing) and making sure that all of the type checkers handle the new APIs would be great. My brief glance over the current CI is that it just runs mypy on the codebase (which isn't a bad thing), but a set of test files exercising the different corner cases for typing would be ideal. And maybe this just looks like running the type checkers on the test files 🤷♀️ |
Actually, we could do something like that by manually defining |
@artificial-aidan if you came up with these new test cases it would be great. |
@george-zubrienko, something like that: A lot of codefrom dataclasses import dataclass
from typing import *
T = TypeVar('T')
JsonPlain = str | bool | int | float | None
Json = JsonPlain | List['Json'] | Dict[str, 'Json']
class JsonSerializerImpl(Generic[T]):
generic_over: Type[T]
def __init__(self, generic_over: Type[T]):
self.generic_over = generic_over
def __repr__(self):
return f'{JsonSerializerImpl.__qualname__}[{self.generic_over.__qualname__}]'
def to_json(self, data: T) -> Json:
print(f"> {self!r}.{self.to_json.__qualname__}({data=!r})")
# Actual implementation here...
pass
return None
def __call__(self, data: T) -> Json:
return self.to_json(data)
class JsonSerializerGeneric(Generic[T]):
def __getitem__(self, item: Type[T]) -> JsonSerializerImpl[T]:
return JsonSerializerImpl(item)
def to_json(self, data: T) -> T:
warnings.warn("Please, use JsonSerializer[ClassRef].to_json(data); otherwise the method can go wrong.", DeprecationWarning, 2)
return self[type(data)].to_json(data)
def __call__(self, data: T) -> T:
warnings.warn("Please, use to_json[ClassRef](data); otherwise the method can go wrong.", DeprecationWarning, 2)
return self[type(data)](data)
to_json = JsonSerializer = JsonSerializerGeneric()
__all__ = \
[
'JsonSerializer',
'Json',
'to_json',
] And test it with the following:from dataclasses import dataclass
from typing import *
from method_getitem.definitions import JsonSerializer, to_json
@dataclass
class MyClass:
a: str
b: int
c = MyClass('field', 8)
print("JsonSerializer:")
JsonSerializer[MyClass].to_json(c)
JsonSerializer[MyClass].to_json('15')
JsonSerializer[str].to_json(c)
JsonSerializer[str].to_json('15')
JsonSerializer.to_json(c)
print()
print("to_json:")
to_json[MyClass](c)
to_json[MyClass]('15')
to_json[str](c)
to_json[str]('15')
to_json(c)
print() Execution result:
... I was about to write here, but then noticed a really strange behaviour of PyCharm -- when I call these test code from the file I defined the This may be related to this PyCharm issue I've reported a while ago. JsonSerializer: JsonSerializerGeneric = JsonSerializerGeneric()
to_json = JsonSerializer UpdateI've found the more proper solution that works with type checkers but requires metaclasses. An updated exampleReplace the class JsonSerializerMeta(type):
def __getitem__(self, item: Type[T]) -> JsonSerializerImpl[T]:
return JsonSerializerImpl(item)
def to_json(self, data: T) -> T:
warnings.warn("Please, use JsonSerializer[ClassRef].to_json(data); otherwise the method can go wrong.", DeprecationWarning, 2)
return self[type(data)].to_json(data)
def __call__(cls, data: T) -> T:
warnings.warn("Please, use to_json[ClassRef](data); otherwise the method can go wrong.", DeprecationWarning, 2)
return cls[type(data)](data)
class JsonSerializer(metaclass=JsonSerializerMeta):
pass
to_json = JsonSerializer |
Looks great, I'll have a more thorough look later this week. Re metaclasses, I think given the concerns expressed about them, maybe we should consider that as a last resort option, but good that it works :) Re the |
Unfortunately, GH doesn't support custom emoji reactions :c. I prefer option 2 because it's more clean. Alternatively, we can use the following construction (or any similar):
|
I like this one, but I'd also like to hear what other think, so let's see what pops up during the week! |
Voting sectionOn this message, and messages directly below, vote with 👍 or 👎 . Any variations are allowed. Suggestions, as well as new options, are allowed. Voting Option 1
(with parenthesis) |
Voting Option 2
(No parenthesis) |
Voting Option 3
(Class passed as argument) |
@USSX-Hares, I would prefer Option 2, but why do you need the generic type for converting to JSON at all? I suppose, we should have a generic from_json method that returns T and untyped to_json that accepts any dataclass. |
Also #31 should be solved in v1 API more or less - I'm switching pins so this issue gets a bit more attention when people come in :) |
@USSX-Hares, To make two first options work, there is needed some magic code that (imo) reduces the transparency of this library. What about a fourth option: Voting Option 4
Basically using the constructer to initialize instead of the custom getter? |
I'm actually in favor of option 4, as it is more pythonic and involves less magic method juggling. |
A side note: we should allow users to force data type for Also, I still recommend the option 3 since it is still Pythonic and avoids confusion like "should I instantiate JsonSerializer for this particular case?" unless you can both always do that and pass additional arguments to the constructor which were per-call parameters earlier. |
@george-zubrienko and the others, I am continuing my little interrogation.
|
Great questions :) below are my thoughts
I think most of those features should be part of "base serializer case", so we just build them into the Maybe we can also consider exposing those via toml file config section? Like this:
IMO that would be cool and clear and no need for runtime code config.
Excellent question, so I'd vote for this:
If it does not, we inherit settings from the parent, if not from parent, then from global
I'm actually so in favor of the toml file, I'd suggest instead of python code registration, we move this to toml file like i did in the example. We can have smth like
Registration reads this and adds them all to "current thread" cache. Note that "current thread" may be a bit of wishful thinking on my end re thread safety, so take that with a bit of salt :)
When
If we have such serializer, we apply the respecitve Now for the settings like letter etc, maybe this? Then each serializer instance can read them from conf easy?
Looking at toml, just follow the levels, if not defined for this serializer, take from global.
Yes, if we allow
That would be hard to implement, but even it was easy, I think stuff like this can lead to user code bases losing in maintainability and readability. Depends on the feature, however. Like, conditions to run or not run the specific ser behaviour can be controlled by class structure/class serializer settings. Generics, on the other hand, idk :) |
Alright starting the work on this one finally |
Description
Based on some discussions we see in issues around like #433 , #243, #139, #106 and probably more, I can pinpoint a few common pains:
@dataclass_json
or the subclass approach withDataClassJsonMixin
, the latter being similar to whatmashumaro
does. There are also a few things that can only be done from the annotation or global/field level config, most common example beingletter_case=...
.pylint
hates it so in our projects we always use both, which adds boilerplate and confusion, especially for new developers. Now we can also see from both recent and past issues that people in general have to be creative to make their static code analysis not hate the lib. This is a usability problem and I think it must be addressed one way or another.I'd like to propose a breaking API change in 1.0 release to get rid of those issues, and potentially greatly improve user experience.
Possible solution
Consider current quickstart:
Change it to this:
Now, this removes the need to have either annotation OR the subclass and simply relies on the class def supplied by generic/type arg (subject for discussion) and generates the ser-deser boilerplate under the hood. This approach formalizes the return type of all ser-deser operations, thus making static code analysis happy, fixing autocompletion AND removes the mutation of usercode.
Note this also allows to supply encoder-decoder configuration on either global or method level via
dataclasses_json.cfg.global_config.encoders
,dataclasses_json.cfg.global_config.decoders
. For method level we can allow to override decoders/encoders viaencoders=...
anddecoders=...
argument.For class level, we can do the same via
dataclasses_json.cfg.global_config.encoders[DataPerson]
. For class field level this will be a problem. Alternative to all this can be ditchingdataclasses_json.cfg.global_config.encoders/decoders
and instead doing this:Then users can create a simple initialization file for their code where this is all set up once and for the lifetime of the app. Other big plus of this approach is that we essentially lock ser-deser pair for the user and will be able to provide clear and understandable error message if things go sideways.
Alternatives
As an alternative, we can only ditch annotation and instead convert
mixin
to a metaclass, which will allow us to mutate user code as we see fit to achieve proper ser-deser behaviours. This would allow us to achieve similar results by rewriting less code, but will require people to have a bit higher level knowledge of python in order to understand how the lib works and how to use it for their (potentially) complicated use case. Also metaclasses is one of the areas that has seen a breaking change not so long ago.Context
For context, some thoughts from community members on this:
The text was updated successfully, but these errors were encountered: