unflatten with lists #8

Open
benbowen opened this issue Jun 5, 2019 · 17 comments

benbowen commented Jun 5, 2019

Flattening a nested dict that contains lists works great, but unflatten makes dicts instead of lists when a key is a list index. I rewrote part of your lib to unflatten for my needs and thought you might want to integrate it into your unflatten.

I'm worried that my changes aren't generic enough to work for all kinds of mixed lists and dicts.

Here is how I did the unflattening. The only function I changed is this one:

def nested_set_dict(d, keys, value):
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

    # the next key is an int, so make sure a list exists at `key`
    if type(keys[1]) == int:
        if key in d:
            pass
        else:
            d[key] = []
        d = d[key]
    elif type(key)==int:
        if (key+1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)

Testing it out:

d1 = {'a':{'b':[{'c1':'nested1!','d1':[{'e1':'so_nested1!!!'}]},
               {'c2':'nested2!','d2':[{'e2':'so_nested2!!!'}]},
               {'c3':'nested3!','d3':[{'e3':'so_nested3!!!'}]},
               {'c4':'nested4!','d4':[{'e4':'so_nested4a!!!'},
                                      {'e4':'so_nested4b!!!'},
                                      {'e4':'so_nested4c!!!'},
                                      {'e4':'so_nested4d!!!'},
                                      {'e4':'so_nested4e!!!'}]}]}}    

Flatten works great for this out of the box:

df = mzm.flatten(d1,enumerate_types=(list,))
kv = sorted([(k,v) for (k,v) in df.items()])

(('a', 'b', 0, 'c1'), 'nested1!')
(('a', 'b', 0, 'd1', 0, 'e1'), 'so_nested1!!!')
(('a', 'b', 1, 'c2'), 'nested2!')
(('a', 'b', 1, 'd2', 0, 'e2'), 'so_nested2!!!')
(('a', 'b', 2, 'c3'), 'nested3!')
(('a', 'b', 2, 'd3', 0, 'e3'), 'so_nested3!!!')
(('a', 'b', 3, 'c4'), 'nested4!')
(('a', 'b', 3, 'd4', 0, 'e4'), 'so_nested4a!!!')
(('a', 'b', 3, 'd4', 1, 'e4'), 'so_nested4b!!!')
(('a', 'b', 3, 'd4', 2, 'e4'), 'so_nested4c!!!')
(('a', 'b', 3, 'd4', 3, 'e4'), 'so_nested4d!!!')
(('a', 'b', 3, 'd4', 4, 'e4'), 'so_nested4e!!!')

d2 = {}
for key_value in kv:
    k = key_value[0]
    v = key_value[1]
    nested_set_dict(d2,k,v)

Gives

d1 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

d2 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}
@ianlini
Owner

ianlini commented Sep 13, 2019

Thanks for the advice.
This is doable, but we need some design to make it general and intuitive enough.
My first thought is to add a parameter list_index_types that defines when to create a list.
If the splitter function generates a tuple containing an element whose type is in list_index_types, then we can treat that element as a list index.
If an index i doesn't exist but some index bigger than i does, I think filling that element with None is better.
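
A minimal sketch of how that proposal could behave, assuming tuple keys and a dict at the top level. The function name `unflatten_with_lists` and its helpers are hypothetical and are not part of flatten-dict; only the `list_index_types` parameter name and the None-padding rule come from the comment above.

```python
def unflatten_with_lists(flat, list_index_types=(int,)):
    """Rebuild nested dicts/lists from a {tuple_key: value} mapping.

    Hypothetical helper: a key element whose type is in list_index_types
    is treated as a list index, and skipped indices are padded with None.
    """
    index_types = tuple(list_index_types)

    def is_index(key):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(key, index_types) and not isinstance(key, bool)

    def put(container, key, value):
        if isinstance(container, list):
            # Pad with None up to the requested index (no-op if long enough).
            container.extend([None] * (key + 1 - len(container)))
        container[key] = value

    def child(container, key, next_key):
        if isinstance(container, list):
            missing = key >= len(container) or container[key] is None
        else:
            missing = key not in container
        if missing:
            # The type of the *next* key decides which container to create.
            put(container, key, [] if is_index(next_key) else {})
        return container[key]

    root = {}
    for keys, value in flat.items():
        node = root
        for key, next_key in zip(keys, keys[1:]):
            node = child(node, key, next_key)
        put(node, keys[-1], value)
    return root


flat = {("a", "b", 0, "c"): 1, ("a", "b", 2, "c"): 3}
print(unflatten_with_lists(flat))
# {'a': {'b': [{'c': 1}, None, {'c': 3}]}}
```

Because missing slots are padded rather than appended, the result does not depend on the order in which keys are visited. Passing `list_index_types=()` disables list creation, so integer keys stay dict keys.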

AlexTelon added a commit to AlexTelon/flatten-dict that referenced this issue Oct 25, 2019
unflatten does not support lists, the github user benbowen created an
issue: ianlini#8 and described a
solution that worked for him.
@KoreyPeters

This would be a useful feature to me as well. It seems natural that the flatten/unflatten process should produce the same output as input, but that is not the case now.

@ysfchn

ysfchn commented Aug 1, 2020

Sorry for reviving the issue, but are there any updates? I'm also flattening a dictionary that contains arrays, and unflattening it turns the list indices into dictionary keys.

@ianlini
Owner

ianlini commented Aug 1, 2020

Because there is no further feedback about the design, and it seems to be the most requested feature, I will implement it according to my last comment. Not sure about the timing, maybe in 1 or 2 months.

I would like to emphasize this again: I'm not expecting flatten() and unflatten() to be invertible. I really want to make them invertible, but I couldn't figure out a way. If you think it's possible, please share the idea. Otherwise, you should expect that a may not equal unflatten(flatten(a)) unless a satisfies some constraints and you pass the correct arguments to flatten() and unflatten().
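
To illustrate why invertibility is hard in general, here is a small sketch using a hand-written dot-joining flattener. `flatten_dot` is a stand-in written for this example, not the library's implementation; it only shows that different inputs can collapse to the same flat dict, so no unflatten function could recover all of them.

```python
def flatten_dot(d, parent=()):
    """Hypothetical stand-in: join nested dict keys with '.'."""
    flat = {}
    for key, value in d.items():
        path = parent + (str(key),)
        if isinstance(value, dict):
            flat.update(flatten_dot(value, path))
        else:
            flat[".".join(path)] = value
    return flat


# A nested key and a key that already contains the delimiter collide.
print(flatten_dot({"x": {"y": 1}}))  # {'x.y': 1}
print(flatten_dot({"x.y": 1}))       # {'x.y': 1}

# The int key 1 and the str key "1" collide once joined into a string.
print(flatten_dot({"x": {1: "v"}}) == flatten_dot({"x": {"1": "v"}}))  # True

# An empty nested dict leaves no trace at all.
print(flatten_dot({"x": {}}))  # {}
```

Each collision corresponds to one of the constraints mentioned above: keys must not contain the delimiter, key types must be recoverable, and empty containers need special handling.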

@ysfchn

ysfchn commented Aug 1, 2020

/test/0/example

If one of the keys in the path contains only a number, it can be treated as a list index and the object inserted at that index (0 in this case). But yes, it could also be a dictionary key. Maybe flatten() could mark list indices with a different character, like /test/[0]/example, so that unflattening can tell whether it is a list. But this would also affect keys that contain [ and/or ].

Then the only choice would be to make flatten() and unflatten() methods of a class, with keypath objects (each holding the key path, with its own properties and methods such as getvalue() to get the value at that path), so it might be easier for you to implement new features. Because class objects have their own properties, the flatten and unflatten operations would be much easier and more readable (for us and for you).

Sorry, I'm not very experienced with nested dictionaries and recursive stuff, and I know how hard they are to deal with, so this is all I can say.

@ianlini
Owner

ianlini commented Aug 3, 2020

@ysfchn, thanks for your suggestion. I have considered making the key a special object. It is one of the most feasible ideas in my mind. It's great to see that you have a similar idea.

I have also considered making flatten() and unflatten() methods of some class. I think that is not related to the keypath idea because the two can be done separately. The benefit of making a class is that we don't need to worry about passing corresponding arguments when calling unflatten() after flatten().

Anyway, one of the difficulties is that I don't really know how people use this library. I guess people use it very differently, and I actually only use this library in some simple way.

For example, if we make the key a special object, then {"test/0": 1} cannot be unflattened to {"test": [1]} because "test/0" is not our special object. Users would have to transform the dict into something like {KeyPath("test", ListIndex(0)): 1}. This design is very useful when the dict we want to unflatten() is always generated by flatten(), but I don't know whether unflattening {"test/0": 1} is also important. To be a general library, we might need to support both ways without making things complicated.
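
A rough sketch of what the special-key design might look like. `KeyPath` and `ListIndex` are the names used in the comment above, but their definitions here, along with `flatten_keypaths` and `unflatten_keypaths`, are assumptions made for illustration and do not exist in flatten-dict. The sketch assumes a dict at the top level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListIndex:
    """Hypothetical marker: this path element is a list position."""
    value: int


class KeyPath(tuple):
    """Hypothetical key object: a tuple of dict keys and ListIndex markers."""
    def __new__(cls, *parts):
        return super().__new__(cls, parts)


def flatten_keypaths(obj, parent=()):
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    else:
        items = ((ListIndex(i), v) for i, v in enumerate(obj))
    for key, value in items:
        path = parent + (key,)
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten_keypaths(value, path))
        else:
            # Scalars and empty containers are stored as leaves.
            flat[KeyPath(*path)] = value
    return flat


def _assign(node, key, value):
    if isinstance(key, ListIndex):
        node.extend([None] * (key.value + 1 - len(node)))
        node[key.value] = value
    else:
        node[key] = value


def _descend(node, key, next_key):
    if isinstance(key, ListIndex):
        exists = key.value < len(node) and node[key.value] is not None
    else:
        exists = key in node
    if not exists:
        _assign(node, key, [] if isinstance(next_key, ListIndex) else {})
    return node[key.value] if isinstance(key, ListIndex) else node[key]


def unflatten_keypaths(flat):
    root = {}
    for path, value in flat.items():
        node = root
        for key, next_key in zip(path, path[1:]):
            node = _descend(node, key, next_key)
        _assign(node, path[-1], value)
    return root


original = {"test": [1], "d": {0: "an int dict key"}}
flat = flatten_keypaths(original)
print(flat == {KeyPath("test", ListIndex(0)): 1, KeyPath("d", 0): "an int dict key"})  # True
print(unflatten_keypaths(flat) == original)  # True
```

Since `ListIndex(0)` and the plain int `0` are distinct, list positions and integer dict keys no longer collide. The trade-off described above remains: a plain string key such as "test/0" carries no marker, so it would need a separate parsing step before it could be unflattened into a list.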

@aneuway2

Hi! I'm investigating switching to using this project from another dict flattening library and this is one of the missing features that I would need.

I was able to easily switch this out using the code that @benbowen provided, but it looks like 2 other use cases are missing:

  • nested lists, e.g. example.0.0.world, which we use in some of our API requests/responses
  • list indices are not always ints; sometimes they are digit strings that can be cast to int

def nested_set_dict(d, keys, value):
    # https://github.com/ianlini/flatten-dict/issues/8
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

    # convert to int if it is a string digit
    if isinstance(keys[1], str) and keys[1].isdigit():
        keys[1] = int(keys[1])

    # the next key is an int, so make sure a list exists at `key`
    if type(keys[1]) == int:
        if key in d:
            pass
        elif type(d) == list and type(key) == int:
            if not d:
                d.append([])
            if key == len(d):
                d.append([])
        else:
            d[key] = []
        d = d[key]
    elif type(key) == int:
        if (key + 1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)


# Replace the library's nested_set_dict with the version above.
flatten_dict.nested_set_dict = nested_set_dict

Example to unflatten:

{
    "hello.world.0.item.inside.0.0.again": False,
    "hello.world.0.item.inside.0.1.again": True,
    "hello.world.0.item.inside.1.0.andagain": 1,
    "hello.world.0.item.inside.1.1.andagain": 2,
}

Example to flatten:

{
    "data": [
        {
            "active": True,
            "conditions": {
                "field": "segment_group",
                "operator": "and",
                "value": [
                    [
                        {
                            "action": "include",
                            "segment_id": 94427
                        },
                        {
                            "action": "include",
                            "segment_id": 94431
                        }
                    ]
                ]
            },
        }
    ]
}

@benbowen
Author

I just saw this in my email thanks to the at-mention. I'm so sorry for a year of silence, but I'm excited that others are looking at and working on this. Flatten/unflatten is an essential step in a pipeline that I have to run. I'm transferring the "battle code" I wrote for this pipeline about a year ago to another developer, and by then, if not sooner, we will check out your commits and others' suggestions.

@ianlini
Owner

ianlini commented Sep 25, 2020

@benbowen I'm planning to implement this next week.

@whardier

whardier commented Dec 11, 2020

May I propose using Ellipsis (Python 2.7) or ... (Python 3+) in place of array indexes? That would indicate (at least with tuple formatting) that this is part of a list.

{'roles': [
    {'uuid': {'$uuid': '55e119ce-3b4f-11eb-adc7-00163e0987ed'}},
    {'uuid': {'$uuid': '55e11cee-3b4f-11eb-adc7-00163e0987ed'}}
]}
[(('roles', ..., 'uuid', '$uuid'), '55e119ce-3b4f-11eb-adc7-00163e0987ed'),
 (('roles', ..., 'uuid', '$uuid'), '55e11cee-3b4f-11eb-adc7-00163e0987ed')]

Or just use a type:

>>> (1,2,3,list,4,5,6)
(1, 2, 3, <class 'list'>, 4, 5, 6)
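
One reading of the Ellipsis proposal, sketched under the assumption that every list position is replaced by `...` and the result is kept as an ordered list of pairs (a dict would not work, because sibling elements produce identical keys). `flatten_with_ellipsis` is a hypothetical helper written for this example.

```python
def flatten_with_ellipsis(obj, parent=()):
    """Hypothetical: emit (path, value) pairs, marking list levels with ..."""
    pairs = []
    if isinstance(obj, dict):
        items = obj.items()
    else:
        items = ((..., v) for v in obj)
    for key, value in items:
        path = parent + (key,)
        if isinstance(value, (dict, list)):
            pairs.extend(flatten_with_ellipsis(value, path))
        else:
            pairs.append((path, value))
    return pairs


one_element = {"x": [{"a": 1, "b": 2}]}
two_elements = {"x": [{"a": 1}, {"b": 2}]}

print(flatten_with_ellipsis(one_element))
# [(('x', Ellipsis, 'a'), 1), (('x', Ellipsis, 'b'), 2)]
print(flatten_with_ellipsis(one_element) == flatten_with_ellipsis(two_elements))
# True
```

The marker does record that a level is a list, but without the index the two inputs above flatten to the same pairs, so an unflatten step would need some extra rule to decide where one list element ends and the next begins.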

@shivam-gaur-mox

Hey! Thanks for creating this library 🥇

Would like to 1 up this issue as well. I'm looking into using this for a project which could benefit from dict flattening / unflattening - but unfortunately I would need this feature (i.e. given a dictionary which contains arrays in one or more values - flattening then unflattening results should result in arrays not being converted to dictionaries).

@ori-levi

ori-levi commented Jan 12, 2021

Hey @ianlini,
Sorry for bringing this up.
I think I've found a solution inspired by JsonPath.

You can represent the dict key split by any delimiter, but when it comes to lists, append the index to the key.
Something like this:

{
    "data": [
        {
            "active": True,
        },
        {
            "active": False,
        }
    ],
    "another-dict": {
        1: "a",
        2: "b"
    }
}

Flattened result:

{
    "data[0].active": True,
    "data[1].active": False,
    "another-dict.1": "a",
    "another-dict.2": "b",
}

What do you think of this solution?

I might implement this later today and open a pull request to your library.
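
A self-contained sketch of the bracket format proposed above, written as a standalone flatten/unflatten pair rather than as a flatten-dict reducer and splitter. All function names are hypothetical. It assumes a dict at the top level and keys that contain none of `.`, `[` or `]`.

```python
import re

# Either a bracketed list index "[3]" or a plain key segment.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def flatten_brackets(obj, prefix=""):
    flat = {}
    if isinstance(obj, dict):
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items())
    else:
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(obj))
    for path, value in items:
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten_brackets(value, path))
        else:
            flat[path] = value
    return flat


def split_brackets(path):
    # "data[0].active" -> ["data", 0, "active"]
    return [int(index) if index else name for index, name in _TOKEN.findall(path)]


def _ensure_slot(node, key):
    # For lists, pad with None so that node[key] is assignable.
    if isinstance(node, list):
        node.extend([None] * (key + 1 - len(node)))


def unflatten_brackets(flat):
    root = {}
    for path, value in flat.items():
        keys = split_brackets(path)
        node = root
        for key, next_key in zip(keys, keys[1:]):
            _ensure_slot(node, key)
            missing = node[key] is None if isinstance(node, list) else key not in node
            if missing:
                node[key] = [] if isinstance(next_key, int) else {}
            node = node[key]
        _ensure_slot(node, keys[-1])
        node[keys[-1]] = value
    return root


data = {
    "data": [{"active": True}, {"active": False}],
    "another-dict": {1: "a", 2: "b"},
}
flat = flatten_brackets(data)
print(flat)
# {'data[0].active': True, 'data[1].active': False, 'another-dict.1': 'a', 'another-dict.2': 'b'}
print(unflatten_brackets(flat))
# {'data': [{'active': True}, {'active': False}], 'another-dict': {'1': 'a', '2': 'b'}}
```

The lists survive the round trip, including nested ones such as `value[0][1].x`. The integer dict keys 1 and 2 come back as the strings "1" and "2", because the flat string form does not record the original key type.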

@ianlini
Owner

ianlini commented Jan 12, 2021

Thanks @ori-levi, This might be a good starting point.

I tried to implement a general version for all kinds of splitters a few months ago, but I found that there are so many edge cases and different behaviors to decide on. The edge cases make the behavior less intuitive and less general no matter how I design it. After thinking a lot about those cases, I had a concrete idea of the requirements, but I became very busy before finishing the implementation.

I have known JsonPath for a long time and use it a lot, but I seldom use flatten-dict. I am actually very curious about why people don't simply use JsonPath to access their dict if they only want to use a string as a key to access it, so I didn't think in that direction. Anyway, I will be very happy if we can first have a reducer and splitter pair that can flatten a dict into your JsonPath format and unflatten it back. I will dig into making it more general or customizable in the future.

@ori-levi

@ianlini I just finished developing the suggested solution with JSONPath.
Note that only the JSONPath format is reversible.

I will reformat my code, write some tests, and open a pull request for this.
I hope to do this before Sunday.

@HoernchenJ

@ianlini & @ori-levi Hello, are there any updates on this planned feature?

@transfluxus

Is there a fork, which does it?

@andchir

andchir commented Dec 7, 2024

Solution: flatten_json
https://github.com/amirziai/flatten
