MongoArchiver issues #6

NicolaDonelli · 2021-09-06T18:21:23Z

There are a few possible issues with cgnal.data.layer.mongo.archivers.MongoArchiver class.

the signature of "retrieve" method is not compatible with the one of Archiver, since Archiver.retrieve requires *args and **kwargs. I'd propose to change the signature of the ABC making it compatible with our requirements for Pandas and Mongo Archivers
to make archiveOne and archiveMany signature coherent, so that archive could have a simpler output type, wouldn't it be better for this method to return self.archiveMany([obj]) (like we did with PandasArchiver)?
'archive' method's output type is not consistent with its' ancestor's return type (that should be 'MongoArchiver'). I'd propose to fix this returning self instead of those UpdateResults

deusebio · 2021-09-14T13:51:18Z

Being the "archive" a side-effect probably makes also sense to return None (and this would also give the same output between archiveOne and archiveMany, which should always be None). Another possibility (but I would do it later in the future when we integrate Mondas nicely in the package) would be to return an Either, in order to also handle errors functionally.

deusebio · 2021-09-14T14:02:07Z

about the arguments of retrieve it is actually quite a pain. Unfortunately we don't have an abstraction for "Query" yet, that is an object that represents the filters one wants to apply when querying persistence layers that is agnostic from the actual implementation (SQL wants strings, Pandas we could use functions, MongoDB we use dicts). I agree, this is really bothering me, but unfortunately, I don't quite have an elegant solution for handling this.

What we started to do in CIB News is to extend from the Archiver, implement methods that do the querying and call the retrieve with the generic output, e.g.

class DocumentsArchiver(Archiver):
    @abstractmethod
    def byPublishDate(publishDate: str): ...

class MongoDocumentsArchiver(MongoArchiver, DocumentsArchiver):
    
    def  byPublishDate(publishDate: str):
        return self.retrieve({"publishDate": publishDate})

class PandasDocumentsArchiver(PandasArchiver, DocumentsArchiver):
    
    def  byPublishDate(publishDate: str):
        return self.retrieve(lambda df: df[df["publishDate"] == publishDate])

In the code, we never use retrieve directly, but we generally use the custom retrieving methods, such as byPublishDate

But maybe, here, @nromeoarena can advise what it is best to do

memoclaudio · 2021-10-01T08:59:35Z

In my opinion, we should avoid function returning Node. For functions like archive, I do agree with @deusebio that Either could be the best solution. It will at least return the status of the operation in order to track errors.

memoclaudio · 2021-10-01T09:09:07Z

As an input object for the archive, I also agree with @deusebio and we should think more about a "Query" abstract object that will implement query in a specific language like QueryMongo, QueryPandas, QuerySQL. We can then create a factory that actually builds specific queries instead of putting the logic of a query inside the Archiver. As example

class ClientQueryBuilder(MongoQueryBuilder): def retrieveByDate(data :str) -> MongoQuery: def retrieveClientByRegion(region: str) -> MongoQuery:

the archiver will then contain only the archive method def archive(query: Query) -> Either and def retrieve(query: Query) -> Either, according to this refactoring the archiver will only contain the "connection to the database" logic.

Moreover if we centralize the querybulder/queryfactory we have a centralized point where we actually build the query and we do not need to write the same query in different part of the code without touching the archiver that will be finally independent form business logic.

NicolaDonelli added the good first issue Good for newcomers label Sep 7, 2021

deusebio added avanced issue and removed good first issue Good for newcomers labels Jan 31, 2022

NicolaDonelli transferred this issue from CGnal/cgnal-data-ingestion Feb 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MongoArchiver issues #6

MongoArchiver issues #6

NicolaDonelli commented Sep 6, 2021 •

edited

Loading

deusebio commented Sep 14, 2021

deusebio commented Sep 14, 2021 •

edited

Loading

memoclaudio commented Oct 1, 2021

memoclaudio commented Oct 1, 2021 •

edited

Loading

MongoArchiver issues #6

MongoArchiver issues #6

Comments

NicolaDonelli commented Sep 6, 2021 • edited Loading

deusebio commented Sep 14, 2021

deusebio commented Sep 14, 2021 • edited Loading

memoclaudio commented Oct 1, 2021

memoclaudio commented Oct 1, 2021 • edited Loading

NicolaDonelli commented Sep 6, 2021 •

edited

Loading

deusebio commented Sep 14, 2021 •

edited

Loading

memoclaudio commented Oct 1, 2021 •

edited

Loading