Incorrect data in network protocol address lists. #2150

Open
vel21ripn opened this issue Nov 16, 2023 · 8 comments

@vel21ripn
Contributor

Describe the bug

Some networks appear in the address lists of more than one protocol.

Another problem is the lack of subnet aggregation.

To solve both problems, we need to stop using "include inc_generation/*.c.inc" and switch to automated construction of the subnet lists.
It also makes sense to stop loading the address lists separately: we need a mask of the loaded lists and a single list of addresses.
If I'm not mistaken, I proposed storing the address lists in .yaml files and building an optimized list of addresses from them, but for some reason that implementation was not included in nDPI. The format of the file with the list of addresses is not significant. Using '.c' files to store the address lists is also a good option.
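One possible reading of "a mask of loaded lists and one list of addresses" is sketched below: every prefix lives in a single table tagged with its source list, and a bitmask selects which lists are active at lookup time. The names `LIST_BITS`, `ADDRESSES` and `lookup` are hypothetical and are not part of nDPI; the linear scan stands in for whatever tree structure a real implementation would use.

```python
import ipaddress

# Hypothetical bit assignment: one bit per address list.
LIST_BITS = {"GOTO": 1 << 0, "FACEBOOK": 1 << 1, "PROTONVPN": 1 << 2}

# A single combined table: (network, name of the list it came from).
ADDRESSES = [
    (ipaddress.ip_network("23.239.227.0/24"), "GOTO"),
    (ipaddress.ip_network("31.13.64.0/18"), "FACEBOOK"),
    (ipaddress.ip_network("89.187.171.248/32"), "PROTONVPN"),
]

def lookup(addr, enabled_mask):
    """Return the list name of the longest matching prefix among enabled lists."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, owner in ADDRESSES:
        if not (LIST_BITS[owner] & enabled_mask):
            continue  # this list is not enabled in the mask
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, owner)
    return best[1] if best else None

if __name__ == "__main__":
    everything = sum(LIST_BITS.values())
    print(lookup("31.13.64.10", everything))
    print(lookup("31.13.64.10", LIST_BITS["GOTO"]))
```

With this layout, enabling or disabling a list only changes the mask; the table itself is built once.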

The only difficulty in solving these problems is the lack of protocol names.
I use a non-cross-platform solution: a Perl script that generates the necessary data from the ndpi_protocol_ids.h file.
I don't know how acceptable this is in the nDPI project.

I can offer a PR.

Ambiguous addresses:

IPv4

23.239.227.0/24    GOTO                  != 23.239.227.0/24    CITRIX
67.217.68.0/24     GOTO                  != 67.217.68.0/24     CITRIX
67.217.70.0/23     GOTO                  != 67.217.70.0/23     CITRIX
67.217.72.0/24     GOTO                  != 67.217.72.0/24     CITRIX
67.217.75.0/24     GOTO                  != 67.217.75.0/24     CITRIX
67.217.76.0/23     GOTO                  != 67.217.76.0/23     CITRIX
67.217.78.0/24     GOTO                  != 67.217.78.0/24     CITRIX
67.217.80.0/23     GOTO                  != 67.217.80.0/23     CITRIX
67.217.82.0/24     GOTO                  != 67.217.82.0/24     CITRIX
67.217.84.0/24     GOTO                  != 67.217.84.0/24     CITRIX
67.217.86.0/24     GOTO                  != 67.217.86.0/24     CITRIX
67.217.88.0/24     GOTO                  != 67.217.88.0/24     CITRIX
67.217.90.0/23     GOTO                  != 67.217.90.0/23     CITRIX
67.217.92.0/24     GOTO                  != 67.217.92.0/24     CITRIX
67.217.94.0/23     GOTO                  != 67.217.94.0/23     CITRIX
68.64.8.0/23       GOTO                  != 68.64.8.0/23       CITRIX
68.64.10.0/24      GOTO                  != 68.64.10.0/24      CITRIX
68.64.12.0/24      GOTO                  != 68.64.12.0/24      CITRIX
68.64.14.0/24      GOTO                  != 68.64.14.0/24      CITRIX
68.64.17.0/24      GOTO                  != 68.64.17.0/24      CITRIX
68.64.18.0/23      GOTO                  != 68.64.18.0/23      CITRIX
68.64.20.0/24      GOTO                  != 68.64.20.0/24      CITRIX
68.64.22.0/23      GOTO                  != 68.64.22.0/23      CITRIX
68.64.24.0/23      GOTO                  != 68.64.24.0/23      CITRIX
68.64.27.0/24      GOTO                  != 68.64.27.0/24      CITRIX
68.64.28.0/23      GOTO                  != 68.64.28.0/23      CITRIX
68.64.30.0/24      GOTO                  != 68.64.30.0/24      CITRIX
78.108.116.0/22    GOTO                  != 78.108.116.0/22    CITRIX
78.108.120.0/23    GOTO                  != 78.108.120.0/23    CITRIX
78.108.126.0/23    GOTO                  != 78.108.126.0/23    CITRIX
173.199.0.0/22     GOTO                  != 173.199.0.0/22     CITRIX
173.199.12.0/23    GOTO                  != 173.199.12.0/23    CITRIX
173.199.15.0/24    GOTO                  != 173.199.15.0/24    CITRIX
173.199.17.0/24    GOTO                  != 173.199.17.0/24    CITRIX
173.199.18.0/23    GOTO                  != 173.199.18.0/23    CITRIX
173.199.20.0/24    GOTO                  != 173.199.20.0/24    CITRIX
173.199.23.0/24    GOTO                  != 173.199.23.0/24    CITRIX
173.199.26.0/23    GOTO                  != 173.199.26.0/23    CITRIX
173.199.30.0/23    GOTO                  != 173.199.30.0/23    CITRIX
173.199.43.0/24    GOTO                  != 173.199.43.0/24    CITRIX
173.199.44.0/22    GOTO                  != 173.199.44.0/22    CITRIX
173.199.50.0/23    GOTO                  != 173.199.50.0/23    CITRIX
173.199.52.0/22    GOTO                  != 173.199.52.0/22    CITRIX
173.199.60.0/22    GOTO                  != 173.199.60.0/22    CITRIX
188.66.43.0/24     GOTO                  != 188.66.43.0/24     CITRIX
202.173.25.0/24    GOTO                  != 202.173.25.0/24    CITRIX
216.115.208.0/24   GOTO                  != 216.115.208.0/24   CITRIX
216.115.210.0/23   GOTO                  != 216.115.210.0/23   CITRIX
216.115.213.0/24   GOTO                  != 216.115.213.0/24   CITRIX
216.115.214.0/23   GOTO                  != 216.115.214.0/23   CITRIX
216.115.217.0/24   GOTO                  != 216.115.217.0/24   CITRIX
216.115.218.0/24   GOTO                  != 216.115.218.0/24   CITRIX
216.115.221.0/24   GOTO                  != 216.115.221.0/24   CITRIX
216.115.222.0/23   GOTO                  != 216.115.222.0/23   CITRIX
216.219.114.0/23   GOTO                  != 216.219.114.0/23   CITRIX
216.219.116.0/24   GOTO                  != 216.219.116.0/24   CITRIX
216.219.119.0/24   GOTO                  != 216.219.119.0/24   CITRIX
216.219.120.0/22   GOTO                  != 216.219.120.0/22   CITRIX
157.55.39.0/24     MODBUS                != 157.55.39.0/24     MICROSOFT_AZURE
207.46.13.0/24     MODBUS                != 207.46.13.0/24     MICROSOFT_AZURE
40.77.167.0/24     MODBUS                != 40.77.167.0/24     MICROSOFT_AZURE
40.77.188.0/22     MODBUS                != 40.77.188.0/22     MICROSOFT_AZURE
65.55.210.0/24     MODBUS                != 65.55.210.0/24     MICROSOFT_AZURE
199.30.24.0/23     MODBUS                != 199.30.24.0/23     MICROSOFT_AZURE
40.77.202.0/24     MODBUS                != 40.77.202.0/24     MICROSOFT_AZURE
40.77.139.0/25     MODBUS                != 40.77.139.0/25     MICROSOFT_AZURE
69.63.176.0/20     MODBUS                != 69.63.176.0/20     FACEBOOK
66.220.144.0/20    MODBUS                != 66.220.144.0/20    FACEBOOK
74.119.76.0/22     MODBUS                != 74.119.76.0/22     FACEBOOK
173.252.64.0/18    MODBUS                != 173.252.64.0/18    FACEBOOK
69.171.224.0/19    MODBUS                != 69.171.224.0/19    FACEBOOK
103.4.96.0/22      MODBUS                != 103.4.96.0/22      FACEBOOK
31.13.64.0/18      MODBUS                != 31.13.64.0/18      FACEBOOK
31.13.24.0/21      MODBUS                != 31.13.24.0/21      FACEBOOK
179.60.192.0/22    MODBUS                != 179.60.192.0/22    FACEBOOK
185.60.216.0/22    MODBUS                != 185.60.216.0/22    FACEBOOK
45.64.40.0/22      MODBUS                != 45.64.40.0/22      FACEBOOK
157.240.0.0/17     MODBUS                != 157.240.0.0/17     FACEBOOK
204.15.20.0/22     MODBUS                != 204.15.20.0/22     FACEBOOK
102.132.96.0/20    MODBUS                != 102.132.96.0/20    FACEBOOK
157.240.192.0/18   MODBUS                != 157.240.192.0/18   FACEBOOK
129.134.0.0/17     MODBUS                != 129.134.0.0/17     FACEBOOK
163.70.128.0/17    MODBUS                != 163.70.128.0/17    FACEBOOK
185.89.216.0/22    MODBUS                != 185.89.216.0/22    FACEBOOK
20.190.128.0/18    MICROSOFT_365         != 20.190.128.0/18    MICROSOFT_AZURE
20.20.32.0/19      MICROSOFT_365         != 20.20.32.0/19      MICROSOFT_AZURE
20.231.128.0/19    MICROSOFT_365         != 20.231.128.0/19    MICROSOFT_AZURE
40.126.0.0/18      MICROSOFT_365         != 40.126.0.0/18      MICROSOFT_AZURE
104.47.0.0/17      MS_OUTLOOK            != 104.47.0.0/17      MICROSOFT_AZURE
13.107.64.0/18     SKYPE_TEAMS           != 13.107.64.0/18     MICROSOFT_AZURE
89.187.171.248     WHATSAPP_CALL         != 89.187.171.248     PROTONVPN
178.249.214.65     WHATSAPP_CALL         != 178.249.214.65     PROTONVPN

IPv6

2620:0:1c00::/40   MODBUS                != 2620:0:1c00::/40   FACEBOOK
2a03:2880::/32     MODBUS                != 2a03:2880::/32     FACEBOOK
2603:1006:2000::/48 MICROSOFT_365        != 2603:1006:2000::/48 MICROSOFT_AZURE
2603:1007:200::/48 MICROSOFT_365         != 2603:1007:200::/48 MICROSOFT_AZURE
2603:1016:1400::/48 MICROSOFT_365        != 2603:1016:1400::/48 MICROSOFT_AZURE
2603:1017::/48     MICROSOFT_365         != 2603:1017::/48     MICROSOFT_AZURE
2603:1026:3000::/48 MICROSOFT_365        != 2603:1026:3000::/48 MICROSOFT_AZURE
2603:1027:1::/48   MICROSOFT_365         != 2603:1027:1::/48   MICROSOFT_AZURE
2603:1036:3000::/48 MICROSOFT_365        != 2603:1036:3000::/48 MICROSOFT_AZURE
2603:1037:1::/48   MICROSOFT_365         != 2603:1037:1::/48   MICROSOFT_AZURE
2603:1046:2000::/48 MICROSOFT_365        != 2603:1046:2000::/48 MICROSOFT_AZURE
2603:1047:1::/48   MICROSOFT_365         != 2603:1047:1::/48   MICROSOFT_AZURE
2603:1056:2000::/48 MICROSOFT_365        != 2603:1056:2000::/48 MICROSOFT_AZURE
2603:1057:2::/48   MICROSOFT_365         != 2603:1057:2::/48   MICROSOFT_AZURE
2a01:111:f403::/48 MS_OUTLOOK            != 2a01:111:f403::/48 MICROSOFT_AZURE
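
A report like the one above could be produced along these lines. This is a minimal sketch, not the script actually used for this issue: it assumes the lists are already available as `(prefix, protocol)` pairs, and `find_ambiguous` is a hypothetical helper name. It only reports identical prefixes claimed by more than one protocol; partially overlapping prefixes would need an additional containment check.

```python
import ipaddress
from collections import defaultdict

def find_ambiguous(entries):
    """Map each network to its protocols, keeping only networks with more than one."""
    owners = defaultdict(set)
    for prefix, proto in entries:
        # ip_network() turns a bare host address into a /32 (or /128) network.
        owners[ipaddress.ip_network(prefix)].add(proto)
    return {net: sorted(protos) for net, protos in owners.items() if len(protos) > 1}

# A few rows taken from the report above, plus one unambiguous entry.
SAMPLE = [
    ("23.239.227.0/24", "GOTO"),
    ("23.239.227.0/24", "CITRIX"),
    ("89.187.171.248", "WHATSAPP_CALL"),
    ("89.187.171.248", "PROTONVPN"),
    ("2a03:2880::/32", "FACEBOOK"),
]

if __name__ == "__main__":
    for net, protos in find_ambiguous(SAMPLE).items():
        print(net, " != ".join(protos))
```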
@vel21ripn vel21ripn added the bug label Nov 16, 2023
@IvanNardi
Collaborator

IvanNardi commented Nov 16, 2023

Hi @vel21ripn, thanks for these interesting inputs!
We are having some internal discussions about how to improve these lists, so any feedback is welcome!

Let's start with the real bug: the overlapping addresses. There are a few different cases:

  1. Goto/Citrix: we are importing the same list twice! Nice catch. I am going to remove one of them.

  2. MICROSOFT_AZURE vs MICROSOFT_365: these addresses are present in both of the original lists (Azure and MS365) explicitly provided by Microsoft itself, so I am not sure what we should do here.

  3. We don't have a MODBUS list. There are two logically separate lists in inc/generation: one with the addresses used for protocol classification (usually matched against the server address: FB, Telegram, WhatsApp, ...) and one used for flow risk detection (matched against the client address: iCloudPrivateRelay, ProtonVPN exit nodes, and crawlers). It is definitely possible for some addresses to appear in both logical lists.
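
To illustrate the third case, the sketch below keeps the two logical lists in separate tables and consults each with a different endpoint of the flow. It is an illustration only, not nDPI code: `CLASSIFICATION`, `RISK`, `match` and `inspect_flow` are hypothetical names, and the sample entries are taken from the report in the first comment.

```python
import ipaddress

# Table used for protocol classification (matched against the server address).
CLASSIFICATION = {
    ipaddress.ip_network("31.13.64.0/18"): "FACEBOOK",
    ipaddress.ip_network("89.187.171.248/32"): "WHATSAPP_CALL",
}

# Table used for flow risk detection (matched against the client address).
RISK = {
    ipaddress.ip_network("89.187.171.248/32"): "PROTONVPN",
}

def match(table, addr):
    """Longest-prefix match of addr against one table, or None."""
    ip = ipaddress.ip_address(addr)
    hits = [(net.prefixlen, label) for net, label in table.items() if ip in net]
    return max(hits)[1] if hits else None

def inspect_flow(client, server):
    """Classify by server address and flag risk by client address."""
    return {"protocol": match(CLASSIFICATION, server), "risk": match(RISK, client)}

if __name__ == "__main__":
    print(inspect_flow(client="89.187.171.248", server="31.13.64.1"))
    print(inspect_flow(client="10.0.0.1", server="89.187.171.248"))
```

Because each table is consulted with a different endpoint, the same address can sit in both without the two results conflicting.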

IvanNardi added a commit to IvanNardi/nDPI that referenced this issue Nov 16, 2023
We are loading the same AS list as GOTO
See ntop#2150
IvanNardi added a commit that referenced this issue Nov 16, 2023
We are loading the same AS list as GOTO
See #2150
@IvanNardi
Collaborator

Another topic: aggregation.
We already have a function (mergeipaddrlist.py) to aggregate addresses, but we don't use it everywhere. We should improve that...
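
For readers unfamiliar with subnet aggregation, here is a minimal sketch using Python's standard `ipaddress.collapse_addresses`. It is not a description of what mergeipaddrlist.py does internally; `aggregate` is a hypothetical helper and the prefixes are documentation ranges chosen for illustration.

```python
import ipaddress

def aggregate(prefixes):
    """Merge adjacent prefixes and drop prefixes covered by a larger one."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses() rejects mixed IP versions, so handle them separately.
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))

if __name__ == "__main__":
    sample = [
        "192.0.2.0/25", "192.0.2.128/25",       # two halves of one /24
        "198.51.100.0/24", "198.51.100.16/28",  # the /28 is inside the /24
        "2001:db8::/33", "2001:db8:8000::/33",  # two halves of one /32
    ]
    for net in aggregate(sample):
        print(net)
```

Aggregation of this kind would have to run per protocol list: merging prefixes that belong to different protocols would lose the label.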

@vel21ripn
Contributor Author

Thanks for the clarification regarding the MODBUS, iCloudPrivateRelay, ProtonVPN lists.

Information on the benefits of address aggregation.
We have 40700 entries for ipv4 and 12397 entries for ipv6.
After aggregation, we get 27811 records for ipv4 and 8216 records for ipv6.
IMHO aggregation is useful.
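
As a quick check of what these figures mean in relative terms, the snippet below computes the percentage of entries removed by aggregation. `reduction_percent` is a hypothetical helper; the counts are the ones quoted in this comment.

```python
def reduction_percent(before, after):
    """Percentage of entries removed, rounded to one decimal place."""
    return round((1 - after / before) * 100, 1)

if __name__ == "__main__":
    print("ipv4:", reduction_percent(40700, 27811), "% fewer entries")
    print("ipv6:", reduction_percent(12397, 8216), "% fewer entries")
```

Both address families shrink by roughly a third.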

@IvanNardi
Collaborator

> Thanks for the clarification regarding the MODBUS, iCloudPrivateRelay, ProtonVPN lists.
>
> Information on the benefits of address aggregation. We have 40700 entries for ipv4 and 12397 entries for ipv6. After aggregation, we get 27811 records for ipv4 and 8216 records for ipv6. IMHO aggregation is useful.

@vel21ripn, could you check if bdb73db fixes the aggregation issue, please?

@vel21ripn
Contributor Author

There is a very big difference in the address lists between commits bdb73db and 6c9571d.
Before this commit there were 40700 ipv4 addresses, but now there are 7679.

The TOR and MULLVAD address lists are not aggregated.
TOR 1327 -> 896
MULLVAD 643 -> 537

@vel21ripn
Contributor Author

vel21ripn commented Nov 17, 2023

Thank you.
Reducing the number of networks by more than 4 times is a very good result.

There is one more question: if the lists are generated by a script, then what is the point of storing ipv6 addresses as a string?
The sum of the lengths of all 2980 lines with ipv6 addresses is equal to 50386 bytes, and 2980*16 is equal to 47680.
So, if we use a binary representation for storage, this will also reduce the required amount of memory and reduce the cost of initializing address lists.

@IvanNardi
Collaborator

> The TOR and MULLVAD address list is not aggregated.

Done in 5566439

@IvanNardi
Collaborator

> There is one more question: if the lists are generated by a script, then what is the point of storing ipv6 addresses as a string?

No specific reasons: it was the simplest implementation...

> The sum of the lengths of all 2980 lines with ipv6 addresses is equal to 50386 bytes, and 2980*16 is equal to 47680.

You need to take into account at least one byte for the prefix length: 2980 * (16 + 1) = 50660 > 50386. So, I don't think we would get any space benefit from the binary format.
The startup might be faster, though. We might look into that...
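
The size comparison can be reproduced with a small sketch. It assumes one possible binary layout, 16 address bytes followed by one prefix-length byte, which is an illustration rather than a format nDPI uses; `pack_v6` and `unpack_v6` are hypothetical helpers.

```python
import ipaddress

RECORD_SIZE = 16 + 1  # 16 address bytes plus 1 byte for the prefix length

def pack_v6(prefix):
    """Encode an IPv6 prefix as a fixed 17-byte record."""
    net = ipaddress.IPv6Network(prefix)
    return net.network_address.packed + bytes([net.prefixlen])

def unpack_v6(record):
    """Decode a 17-byte record back into an IPv6Network."""
    addr = ipaddress.IPv6Address(record[:16])
    return ipaddress.IPv6Network((addr, record[16]))

if __name__ == "__main__":
    rec = pack_v6("2a03:2880::/32")
    print(len(rec), unpack_v6(rec))
    entries = 2980
    print("addresses only:", entries * 16)                 # 47680
    print("with prefix length:", entries * RECORD_SIZE)    # 50660
    print("string form (figure from the thread):", 50386)
```

A fixed-size record avoids parsing text at startup, which is where any gain would come from; the total size is about the same as the string form.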
