unflatten with lists #8

benbowen · 2019-06-05T12:14:51Z

Flattening a nested dict that contains lists works great, but unflatten creates dicts instead of lists when a key is a list index. I rewrote part of your lib to unflatten for my needs and thought you might want to integrate it into your unflatten.

I'm worried that my changes aren't generic enough to work for all kinds of mixed lists and dicts.

Here is how I did the unflattening. The only function I changed is this one:

def nested_set_dict(d, keys, value):
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

    # if the next key is an int, make a list if none exists; otherwise make a dict
    if type(keys[1]) == int:
        if key in d:
            pass
        else:
            d[key] = []
        d = d[key]
    elif type(key)==int:
        if (key+1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)

Testing it out:

d1 = {'a':{'b':[{'c1':'nested1!','d1':[{'e1':'so_nested1!!!'}]},
               {'c2':'nested2!','d2':[{'e2':'so_nested2!!!'}]},
               {'c3':'nested3!','d3':[{'e3':'so_nested3!!!'}]},
               {'c4':'nested4!','d4':[{'e4':'so_nested4a!!!'},
                                      {'e4':'so_nested4b!!!'},
                                      {'e4':'so_nested4c!!!'},
                                      {'e4':'so_nested4d!!!'},
                                      {'e4':'so_nested4e!!!'}]}]}}

Flatten works great for this out of the box

df = mzm.flatten(d1,enumerate_types=(list,))
kv = sorted([(k,v) for (k,v) in df.items()])

(('a', 'b', 0, 'c1'), 'nested1!')
(('a', 'b', 0, 'd1', 0, 'e1'), 'so_nested1!!!')
(('a', 'b', 1, 'c2'), 'nested2!')
(('a', 'b', 1, 'd2', 0, 'e2'), 'so_nested2!!!')
(('a', 'b', 2, 'c3'), 'nested3!')
(('a', 'b', 2, 'd3', 0, 'e3'), 'so_nested3!!!')
(('a', 'b', 3, 'c4'), 'nested4!')
(('a', 'b', 3, 'd4', 0, 'e4'), 'so_nested4a!!!')
(('a', 'b', 3, 'd4', 1, 'e4'), 'so_nested4b!!!')
(('a', 'b', 3, 'd4', 2, 'e4'), 'so_nested4c!!!')
(('a', 'b', 3, 'd4', 3, 'e4'), 'so_nested4d!!!')
(('a', 'b', 3, 'd4', 4, 'e4'), 'so_nested4e!!!')

d2 = {}
for key_value in kv:
    k = key_value[0]
    v = key_value[1]
    nested_set_dict(d2,k,v)

Gives

d1 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

d2 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}
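The generality concern raised above can be made concrete. The following is a minimal sketch, assuming the nested_set_dict from this comment is used as posted (it is condensed here, with behavior unchanged). It shows that leaf list items are appended in call order and the index itself is ignored, so the result is only correct when the flat keys are applied in sorted order, as in the test above.

```python
def nested_set_dict(d, keys, value):
    """Condensed copy of the function posted above (behavior unchanged)."""
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)  # note: the index stored in `key` is ignored
        else:
            d[key] = value
        return
    if type(keys[1]) == int:
        if key not in d:
            d[key] = []
        d = d[key]
    elif type(key) == int:
        if (key + 1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)


# Keys applied in index order: the list comes out as expected.
in_order = {}
nested_set_dict(in_order, ("a", 0), "first")
nested_set_dict(in_order, ("a", 1), "second")

# Same keys applied in reverse: items land in call order, not index order.
out_of_order = {}
nested_set_dict(out_of_order, ("a", 1), "second")
nested_set_dict(out_of_order, ("a", 0), "first")

print(in_order)      # {'a': ['first', 'second']}
print(out_of_order)  # {'a': ['second', 'first']}
```

Sorting the flat items before unflattening, as the test above does, avoids this for contiguous indices.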

ianlini · 2019-09-13T06:04:15Z

Thanks for the advice.
This is doable, but it needs some design work to be general and intuitive enough.
My first thought is to add a parameter list_index_types that defines when to create a list.
If the splitter function generates a tuple containing an element whose type is in list_index_types, then we can treat that element as a list index.
If an index i doesn't exist and there is some index bigger than i, I think setting that element to None is better.
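The proposal above could look roughly like the sketch below. It is an illustration only: unflatten_with_lists is a hypothetical helper, not part of flatten-dict, and it assumes tuple keys, a dict at the top level, and list indices that are non-negative ints.

```python
def unflatten_with_lists(flat, list_index_types=(int,)):
    """Hypothetical helper: rebuild nested dicts/lists from tuple keys.

    A path element whose type is in `list_index_types` is treated as a
    list index; missing positions below it are filled with None.
    """
    root = {}
    for path, value in flat.items():
        node = root
        for i, key in enumerate(path):
            is_last = i == len(path) - 1
            if isinstance(node, list):
                # Pad with None so that node[key] is a valid position.
                while len(node) <= key:
                    node.append(None)
            if is_last:
                node[key] = value
                break
            # The container created here depends on the *next* key's type.
            next_is_index = isinstance(path[i + 1], list_index_types)
            if isinstance(node, list):
                if node[key] is None:
                    node[key] = [] if next_is_index else {}
            elif key not in node:
                node[key] = [] if next_is_index else {}
            node = node[key]
    return root


result = unflatten_with_lists({("a", 2, "x"): 1, ("a", 0): "z"})
print(result)  # {'a': ['z', None, {'x': 1}]}
```

Index 1 never appears in the input, so it is left as None, and the order in which the keys arrive does not matter.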

KoreyPeters · 2020-02-19T22:57:28Z

This would be a useful feature to me as well. It seems natural that the flatten/unflatten process should produce the same output as input, but that is not the case now.

ysfchn · 2020-08-01T12:47:04Z

Sorry for reviving the issue, but are there any updates on it? I'm also flattening a dictionary that contains arrays, but unflattening it results in list indices being converted to dictionary keys.

ianlini · 2020-08-01T16:32:56Z

Because there is no further feedback about the design, and it seems to be the most requested feature, I will implement it according to my last comment. Not sure about the timing, maybe in 1 or 2 months.

I would like to emphasize this again: I'm not expecting flatten() and unflatten() to be invertible. I really want to make them invertible, but I couldn't figure out a way. If you think it's possible, then please kindly share the idea. Otherwise, you should expect that a may not equal unflatten(flatten(a)) unless a satisfies some constraints and you use the correct arguments for flatten() and unflatten().
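One way to see why invertibility is hard: once list positions become plain ints in the key, a list and a dict with int keys flatten to the same result. The sketch below uses toy_flatten, a hypothetical stand-in written for this illustration, not the library's flatten().

```python
def toy_flatten(obj, prefix=()):
    """Toy tuple-key flatten that enumerates lists (illustration only)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        flat.update(toy_flatten(value, prefix + (key,)))
    return flat


with_list = {"a": ["x", "y"]}
with_int_keys = {"a": {0: "x", 1: "y"}}

# Two different inputs, one flat representation: no unflatten can
# recover both without extra information about which keys were indices.
same = toy_flatten(with_list) == toy_flatten(with_int_keys)
print(same)  # True
```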

ysfchn · 2020-08-01T16:55:48Z

/test/0/example

If one of the keys in the path contains only a number, then it can be treated as a list index and the object inserted at that index (in this case 0), but yes, it could be a dictionary key too. Then maybe the flatten() method could wrap list indices in a different character, like this: /test/[0]/example, so that unflattening can tell whether it is a list or not. But this will also affect keys that contain [ and/or ]

Then the only choice would be to turn flatten() and unflatten() into class objects that work with keypath objects (each holding the key path, with its own properties and methods such as getvalue() to get the value at the key path), which might make it easier for you to implement new features. Because class objects have their own properties, the flatten and unflatten operations would be much easier and more readable (for us and for you).

Sorry, I'm not very experienced with nested dictionaries and recursive code, and I know how hard they are to deal with, so this is all I can offer.

ianlini · 2020-08-03T16:44:51Z

@ysfchn, thanks for your suggestion. I have considered making the key a special object. It is one of the most feasible ideas in my mind. It's great to see that you have a similar idea.

I have also considered making flatten() and unflatten() methods of some class. I think it's not related to the keypath idea because the two can be done separately. The benefit of making a class is that we don't need to worry about passing corresponding arguments when calling unflatten() after flatten().

Anyway, one of the difficulties is that I don't really know how people use this library. I guess people use it very differently, and I actually only use this library in some simple ways.

For example, if we make the key a special object, then {"test/0": 1} cannot be unflattened to {"test": [1]} because "test/0" is not our special object. Users would have to transform the dict into something like {KeyPath("test", ListIndex(0)): 1}. This design is very useful when the dict we want to unflatten() is always generated by flatten(), but I don't know whether unflattening {"test/0": 1} is also important. To be a general library, we might need to support both ways without making things complicated.
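The special-object idea can be sketched as follows. Everything here is hypothetical: ListIndex, flatten_marked, and unflatten_marked are names invented for this illustration, a plain tuple stands in for KeyPath, and the top level is assumed to be a dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListIndex:
    """Hypothetical marker for a key that came from a list position."""
    index: int


def flatten_marked(obj, prefix=()):
    """Flatten to tuple keys, wrapping list positions in ListIndex."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = [(ListIndex(i), v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        flat.update(flatten_marked(value, prefix + (key,)))
    return flat


def unflatten_marked(flat):
    """Rebuild the structure; only ListIndex keys create lists."""
    root = {}
    for path, value in flat.items():
        node = root
        for i, key in enumerate(path):
            last = i == len(path) - 1
            if last:
                child = value
            else:
                # Container type is decided by the next key's marker.
                child = [] if isinstance(path[i + 1], ListIndex) else {}
            if isinstance(key, ListIndex):
                while len(node) <= key.index:
                    node.append(None)
                if last or node[key.index] is None:
                    node[key.index] = child
                node = node[key.index]
            else:
                if last or key not in node:
                    node[key] = child
                node = node[key]
    return root


original = {"test": [1, {0: "int key"}], "d": {0: "x"}}
flat = flatten_marked(original)
restored = unflatten_marked(flat)
print(restored == original)  # True
```

Because a ListIndex(0) key is distinct from the int 0, a list and a dict with int keys no longer collide. As noted above, the cost is that a plain string key such as "test/0" carries no marker and cannot be unflattened into a list this way.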

aneuway2 · 2020-09-24T02:06:46Z

Hi! I'm investigating switching to using this project from another dict flattening library and this is one of the missing features that I would need.

I was able to easily switch this out using the code that @benbowen provided, but it looks like 2 other use cases are missing:

nested lists e.g. example.0.0.world which we use in some of our API requests/responses
list indices are not always ints; sometimes they are strings that can be cast to int

def nested_set_dict(d, keys, value):
    # https://github.com/ianlini/flatten-dict/issues/8
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

    # convert to int if it is a string digit
    if isinstance(keys[1], str) and keys[1].isdigit():
        keys[1] = int(keys[1])

    # if the next key is an int, make a list if none exists; otherwise make a dict
    if type(keys[1]) == int:
        if key in d:
            pass
        elif type(d) == list and type(key) == int:
            if not d:
                d.append([])
            if key == len(d):
                d.append([])
        else:
            d[key] = []
        d = d[key]
    elif type(key) == int:
        if (key + 1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)
flatten_dict.nested_set_dict = nested_set_dict

Example to unflatten:

{
    "hello.world.0.item.inside.0.0.again": False,
    "hello.world.0.item.inside.0.1.again": True,
    "hello.world.0.item.inside.1.0.andagain": 1,
    "hello.world.0.item.inside.1.1.andagain": 2,
}

Example to flatten:

{
    "data": [
        {
            "active": True,
            "conditions": {
                "field": "segment_group",
                "operator": "and",
                "value": [
                    [
                        {
                            "action": "include",
                            "segment_id": 94427
                        },
                        {
                            "action": "include",
                            "segment_id": 94431
                        }
                    ]
                ]
            },
        }
    ]
}

benbowen · 2020-09-24T18:34:40Z

I just saw this in my email with the at-mention. I'm so sorry for 1 year of silence, but I'm excited that others are looking at and working on this. The flatten/unflatten is an essential step in a pipeline that I have to run. I'm transferring the "battle code" I wrote for this pipeline about a year ago to another developer, and by then, if not sooner, we will check out your commits and others' suggestions.

ianlini · 2020-09-25T09:55:35Z

@benbowen I'm planning to implement this next week.

whardier · 2020-12-11T01:55:31Z

If I may, I propose using Ellipsis (Python 2.7) or ... (Python 3+) in place of array indexes. That would indicate (at least with tuple formatting) that this element is part of a list.

{'roles': [
    {'uuid': {'$uuid': '55e119ce-3b4f-11eb-adc7-00163e0987ed'}},
    {'uuid': {'$uuid': '55e11cee-3b4f-11eb-adc7-00163e0987ed'}}
]}

[(('roles', ..., 'uuid', '$uuid'), '55e119ce-3b4f-11eb-adc7-00163e0987ed'),
 (('roles', ..., 'uuid', '$uuid'), '55e11cee-3b4f-11eb-adc7-00163e0987ed')]

Or just use a type:

>>> (1,2,3,list,4,5,6)
(1, 2, 3, <class 'list'>, 4, 5, 6)
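A small sketch of the Ellipsis idea, under the assumption that the flat result is kept as an ordered list of pairs as in the example above. flatten_pairs is a hypothetical helper written for this illustration. Since every list item gets the same marker, items of one list share a key, so position has to be carried by order and the pairs cannot be stored in a dict without losing entries.

```python
def flatten_pairs(obj, prefix=()):
    """Yield (key_tuple, value) pairs, using ... for every list position."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten_pairs(value, prefix + (key,))
    elif isinstance(obj, list):
        for value in obj:
            yield from flatten_pairs(value, prefix + (...,))
    else:
        yield prefix, obj


roles = {"roles": [
    {"uuid": {"$uuid": "55e119ce-3b4f-11eb-adc7-00163e0987ed"}},
    {"uuid": {"$uuid": "55e11cee-3b4f-11eb-adc7-00163e0987ed"}},
]}

pairs = list(flatten_pairs(roles))
print(len(pairs))        # 2
print(len(dict(pairs)))  # 1, because both pairs share the same key
```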

shivam-gaur-mox · 2020-12-22T05:09:36Z

Hey! Thanks for creating this library 🥇

I would like to +1 this issue as well. I'm looking into using this for a project that could benefit from dict flattening / unflattening, but unfortunately I would need this feature (i.e. given a dictionary that contains arrays in one or more values, flattening and then unflattening should not convert the arrays to dictionaries).

ori-levi · 2021-01-12T08:44:54Z

Hey @ianlini,
Sorry for bringing this up.
I think I've found a solution inspired by JsonPath.

You can represent the dict key split by any delimiter, but when it comes to lists, append the index to the key.
Something like this:

{
    "data": [
        {
            "active": True,
        },
        {
            "active": False,
        }
    ],
    "another-dict": {
        1: "a",
        2: "b"
    }
}

flatten_dict

{
    "data[0].active": True,
    "data[1].active": False,
    "another-dict.1": "a",
    "another-dict.2": "b",
}

What do you think of this solution?

I might implement this later today and open a pull request to your library.
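The bracketed format above can be sketched as a flatten/unflatten pair. This is an illustration under stated assumptions, not the implementation discussed in this thread: flatten_bracketed, parse_path, and unflatten_bracketed are hypothetical names, "." is the delimiter, and dict keys are assumed not to contain ".", "[" or "]".

```python
import re

# Either a bracketed list index like [0], or a run of key characters.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def flatten_bracketed(obj, prefix=""):
    """Flatten to string keys such as 'data[0].active'."""
    if isinstance(obj, dict):
        flat = {}
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_bracketed(value, path))
        return flat
    if isinstance(obj, list):
        flat = {}
        for i, value in enumerate(obj):
            flat.update(flatten_bracketed(value, f"{prefix}[{i}]"))
        return flat
    return {prefix: obj}


def parse_path(path):
    """Split a path into str dict keys and int list indices."""
    return [int(idx) if idx else name for idx, name in _TOKEN.findall(path)]


def unflatten_bracketed(flat):
    """Rebuild the structure; ints come only from bracketed indices."""
    root = {}
    for path, value in flat.items():
        keys = parse_path(path)
        node = root
        for i, key in enumerate(keys):
            last = i == len(keys) - 1
            child = value if last else ([] if isinstance(keys[i + 1], int) else {})
            if isinstance(node, list):
                while len(node) <= key:
                    node.append(None)
                if last or node[key] is None:
                    node[key] = child
            elif last or key not in node:
                node[key] = child
            node = node[key]
    return root


source = {
    "data": [{"active": True}, {"active": False}],
    "another-dict": {1: "a", 2: "b"},
}
flat = flatten_bracketed(source)
restored = unflatten_bracketed(flat)
print(restored["data"] == source["data"])  # True
print(restored["another-dict"])            # {'1': 'a', '2': 'b'}
```

The lists round-trip, but the int dict keys 1 and 2 come back as the strings "1" and "2", since a string path cannot record the original key type.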

ianlini · 2021-01-12T14:50:03Z

Thanks @ori-levi, this might be a good starting point.

I tried to implement a general version for all kinds of splitters a few months ago, but I found that there are so many edge cases and different behaviors to decide on. The edge cases make the behavior less intuitive and less general no matter how I design it. After thinking a lot about those cases, I had a concrete idea of the requirements, but I became very busy before finishing the implementation.

I have known JsonPath for a long time and use it a lot, but I seldom use flatten-dict. I am actually very curious about why people don't simply use JsonPath to access their dict if they only want to use a string as the key to access it, so I didn't think in that direction. Anyway, I will be very happy if we can first have a reducer and splitter pair that can flatten a dict into your JsonPath format and unflatten it back. I will dig into making it more general or customizable in the future.

ori-levi · 2021-01-14T19:32:24Z

@ianlini I just finished developing the suggested solution with JSONPath.
Note that only JSONPath is reversible.

I will reformat my code, write some tests, and open a pull request for this.
I hope to do this before Sunday.

HoernchenJ · 2021-09-03T04:06:58Z

@ianlini & @ori-levi Hello, are there any updates on this planned feature?

transfluxus · 2024-04-09T10:04:32Z

Is there a fork that does this?

andchir · 2024-12-07T21:27:39Z

Solution: flatten_json
https://github.com/amirziai/flatten

AlexTelon added a commit to AlexTelon/flatten-dict that referenced this issue Oct 25, 2019

Unflatten for by benbowen

ff0166c

unflatten does not support lists, the github user benbowen created an issue: ianlini#8 and described a solution that worked for him.

AlexTelon added a commit to AlexTelon/flatten-dict that referenced this issue Oct 25, 2019

Unflatten for lists by benbowen

37fa60b

unflatten does not support lists, the github user benbowen created an issue: ianlini#8 and described a solution that worked for him.

aianta mentioned this issue Jul 7, 2021

implemented decode->flatten->filter->unflatten->encode enhancement cioos-atlantic/ckanext-vitality#8

Merged
