Python 2 error - to_ipaddress #32

Open
siyer32 opened this issue May 24, 2018 · 7 comments

Comments

@siyer32

siyer32 commented May 24, 2018

I'm not sure whether this is supported in Python 2.7.13; I got the error below. It works fine in Python 3.6.5.

appdata['sa'] = cypd.to_ipaddress(appdata['sa'])

    163             '%r does not appear to be an IPv4 or IPv6 address. '
    164             'Did you pass in a bytes (str in Python 2) instead of'
--> 165             ' a unicode object?' % address)
    166
    167     raise ValueError('%r does not appear to be an IPv4 or IPv6 address' %

AddressValueError: '10.44.129.135' does not appear to be an IPv4 or IPv6 address. Did you pass in a bytes (str in Python 2) instead of a unicode object?

@siyer32
Author

siyer32 commented May 24, 2018

Here is Python 3.6.5 output:
appdata['sa'] = cypd.to_ipaddress(appdata['sa'])
appdata.dtypes
sa                       ip
da                   object
sp                    int64
dp                    int64
ipkt                  int64
ibyt                  int64
Application Label    object

@TomAugspurger
Contributor

We either need to make a better error message here, or break with Python 2's ipaddress module.

In Python 2, the ipaddress module expects a unicode object when parsing a string IP address like '192.168.1.1'. See https://cyberpandas.readthedocs.io/en/latest/usage.html#parsing

In [13]: import pandas as pd

In [14]: from cyberpandas import to_ipaddress

In [15]: df = pd.DataFrame({"addr": ['192.168.1.1', '192.168.1.2']})

In [16]: to_ipaddress(df.addr)
---------------------------------------------------------------------------
AddressValueError                         Traceback (most recent call last)
<ipython-input-16-1f1c4ac488eb> in <module>()
----> 1 to_ipaddress(df.addr)

/Users/taugspurger/sandbox/cyberpandas/cyberpandas/parser.py in to_ipaddress(values)
     40         values = [values]
     41
---> 42     return IPArray(_to_ip_array(values))
     43
     44

/Users/taugspurger/sandbox/cyberpandas/cyberpandas/parser.py in _to_ip_array(values)
     59     elif not (isinstance(values, np.ndarray) and
     60               values.dtype == IPType._record_type):
---> 61         values = _to_int_pairs(values)
     62     return np.atleast_1d(np.asarray(values, dtype=IPType._record_type))
     63

/Users/taugspurger/sandbox/cyberpandas/cyberpandas/parser.py in _to_int_pairs(values)
     79         pass
     80     else:
---> 81         values = [ipaddress.ip_address(v)._ip for v in values]
     82         values = [unpack(pack(v)) for v in values]
     83     return values

/Users/taugspurger/miniconda3/envs/py27-ipaddr/lib/python2.7/site-packages/ipaddress.pyc in ip_address(address)
    163             '%r does not appear to be an IPv4 or IPv6 address. '
    164             'Did you pass in a bytes (str in Python 2) instead of'
--> 165             ' a unicode object?' % address)
    166
    167     raise ValueError('%r does not appear to be an IPv4 or IPv6 address' %

AddressValueError: '192.168.1.1' does not appear to be an IPv4 or IPv6 address. Did you pass in a bytes (str in Python 2) instead of a unicode object?

In [17]: to_ipaddress(df.addr.astype(unicode))
Out[17]: IPArray([u'192.168.1.1', u'192.168.1.2'])

So in literal code it should be u'192.168.1.1' instead of '192.168.1.1'. The current way is pretty unfriendly :/

@seibert
Collaborator

seibert commented May 25, 2018

By definition, IP address strings have to be ASCII (unlike hostnames), so I don't see a problem with to_ipaddress silently decoding Python 2 str to unicode assuming it is ASCII. Does that seem reasonable?
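A minimal sketch of that decoding step, not cyberpandas's actual code. The helper names `_as_text` and `parse_ip` are hypothetical, and only the standard `ipaddress` module is used. It is written to run on Python 3, where `bytes` plays the role of Python 2's `str`:

```python
import ipaddress


def _as_text(value):
    """Decode ASCII byte strings to text; pass every other type through.

    Hypothetical helper for illustration only. It assumes any bytes value
    is a textual address such as b'192.168.1.1'; packed 4-byte addresses
    would need a separate code path.
    """
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError on non-ASCII input, which is the
        # desired failure mode because IP address strings are ASCII.
        return value.decode("ascii")
    return value


def parse_ip(value):
    """Parse a text, bytes, or integer address with the stdlib parser."""
    return ipaddress.ip_address(_as_text(value))


print(parse_ip(b"192.168.1.1"))
print(parse_ip(u"10.44.129.135"))
```

With a step like this in front of the parser, a Python 2 `str` column would no longer need an explicit `astype(unicode)` before calling `to_ipaddress`.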

@TomAugspurger
Contributor

> Does that seem reasonable?

Yeah, I think so.

@siyer32
Author

siyer32 commented May 25, 2018

Does this mean the IP addresses passed have to be strings? Most data captured from devices (like the data I tested) are not strings.

@seibert
Collaborator

seibert commented May 25, 2018

There are other input methods described in the docs: Python integers, or raw addresses in byte form (see IPArray.from_bytes).

@TomAugspurger
Contributor

To clarify things, let's use Python 3's terminology. "string" is a unicode string, and "bytes" is a bytestring.

The valid options are

  • A string in decimal-dot notation, consisting of four decimal integers in the inclusive range 0–255, separated by dots (e.g. 192.168.0.1). Each integer represents an octet (byte) in the address. Leading zeroes are tolerated only for values less than 8 (as there is no ambiguity between the decimal and octal interpretations of such strings).
  • An integer that fits into 32 bits.
  • An integer packed into a bytes object of length 4 (most significant octet first).
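As a rough illustration of those three forms, here is a small sketch using the standard library's `ipaddress.IPv4Address` on Python 3 (it does not go through cyberpandas itself). All three inputs describe the same address:

```python
import ipaddress

# 1. A (unicode) string in decimal-dot notation.
from_text = ipaddress.IPv4Address("192.168.0.1")

# 2. An integer that fits into 32 bits.
from_int = ipaddress.IPv4Address(3232235521)

# 3. The same integer packed into 4 bytes, most significant octet first.
from_packed = ipaddress.IPv4Address(b"\xc0\xa8\x00\x01")

# The three constructions compare equal.
assert from_text == from_int == from_packed

# Converting back out to the integer and packed forms.
print(int(from_text))
print(from_text.packed)
```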

> Most data (like the one I tested) that are captured from the devices are not strings.

What does the raw data look like for you? If performance is a concern, the absolute fastest way is https://cyberpandas.readthedocs.io/en/latest/api.html#cyberpandas.IPArray.from_bytes


3 participants