New version has lower performance #1337

Closed
xiaotudoubaba opened this issue Mar 18, 2023 · 38 comments

@xiaotudoubaba

xiaotudoubaba commented Mar 18, 2023

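Symptom
The new version performs worse.

Test procedure

  1. Use the load-testing tool queryperf to put smartdns under load.
  2. While the load test is running, query smartdns with nslookup to check whether any lookups fail.

Environment
CentOS 7.6

smartdns version 1: smartdns.1.2020.09.08-2235.x86-linux-all.tar.gz
smartdns version 2: smartdns.1.2023.03.04-1125.x86_64-linux-all.tar.gz

Same configuration and same load-testing tool (queryperf). During the load test, nslookup continuously queries sites through the smartdns service to check whether lookups still succeed under load.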

server-name smartdns1
bind :53
cache-size 4096
log-level debug
log-size 100M
#  Upstream DNS resolvers
server 61.132.163.68  -group china
server 202.96.128.86 -group china
server 202.96.134.133 -group china
server 202.101.224.69 -group china
server 222.172.200.68 -group china
server 202.106.0.20 -group china
server 210.22.70.3 -group china
# Domestic (China) DNS
server 223.5.5.5 -group alibaba -exclude-default-group
server 223.6.6.6 -group ailbaba  -exclude-default-group
server 119.29.29.29  -group tencent  -exclude-default-group
server 180.76.76.76  -group baidu  -exclude-default-group
# Foreign DNS
server 1.1.1.1 -blacklist-ip -group foreign -exclude-default-group
server 80.80.80.80 -blacklist-ip -group foreign -exclude-default-group
server 8.8.8.8 -blacklist-ip -group foreign -exclude-default-group
# Pin a domain to a specific IP address
#address /www.baidu.com/14.215.177.39
audit-enable yes

Test results

smartdns.1.2020.09.08-2235.x86-linux-all.tar.gz version

{'qps': '21304.715020'}
{'totaltime': '188.249924329', 'success': '820', 'fail': '0', 'sucrate': '100.0%', 'rate': '4.355911445504249%'}
{'cpu_avg_used': 72.62}

nslookup: number of runs and total elapsed time during the load test.
Elapsed time: 188.249924329 s
Commands succeeded: 820
Commands failed: 0


smartdns.1.2023.03.04-1125.x86_64-linux-all.tar.gz version

{'qps': '4889.277531'}
{'totaltime': '780.141572664', 'success': '87460', 'fail': '0', 'sucrate': '100.0%', 'rate': '112.10785716923746%'}
{'cpu_avg_used': 20.36}

nslookup: number of runs and total elapsed time during the load test.
Elapsed time: 780.141572664 s
Commands succeeded: 87460
Commands failed: 0


As the numbers show, smartdns.1.2023.03.04-1125.x86_64-linux-all.tar.gz delivers far fewer qps than smartdns.1.2020.09.08-2235.x86-linux-all.tar.gz (4889.277531 < 21304.715020). Does the new version need some configuration change?
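For reference, the size of the gap can be worked out from the figures reported above. This is a minimal sketch that assumes awk is available; the helper name `ratio` is hypothetical. The second and third calls also suggest that the 'rate' field in the results is nslookup commands per second (success divided by totaltime) rather than a percentage, which is an inference from the numbers and not something the report states.

```bash
#!/bin/bash
# ratio A B: print A / B to two decimal places (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# queryperf throughput: old build divided by new build
ratio 21304.715020 4889.277531
# nslookup commands completed per second under load: old build, then new build
ratio 820 188.249924329
ratio 87460 780.141572664
```

By these figures the new build serves queryperf roughly 4.4 times slower, while the nslookup loop completes about 112 commands per second instead of about 4.4.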

A large number of "RTT is out of range" warnings appear:
Warning: RTT is out of range: 17.498381 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.498380 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.498381 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.498379 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.491195 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.491202 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.491204 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.491220 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 17.491259 [query=www.baidu.com/1, rcode=0]

@xiaotudoubaba xiaotudoubaba changed the title from "Performance got lower in the new version." to "New version has lower performance" Mar 18, 2023
@pymumu
Owner

pymumu commented Mar 18, 2023

Statistics:

  Parse input file:     once
  Ended due to:         reaching end of file

  Queries sent:         400000 queries
  Queries completed:    400000 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:              1.611810 sec
  RTT min:              0.001360 sec
  RTT average:          0.005006 sec
  RTT std deviation:    0.005153 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Sun Mar 19 00:20:21 2023
  Finished at:          Sun Mar 19 00:20:32 2023
  Ran for:              11.214739 seconds

  Queries per second:   35667.348121 qps

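I don't see any problem. There is a drop compared with the earlier version, but nowhere near 10x. I don't know what is different about your test method.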

@pymumu
Owner

pymumu commented Mar 18, 2023

During the load test, a single nslookup query for a domestic site also completes within 30 milliseconds.

@pymumu
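Owner

pymumu commented Mar 18, 2023

It is probably still a configuration difference. Check whether dual-stack IP selection is the cause: in the 2020 version it is off by default unless the configuration file explicitly enables it.

In the 2022 version, dual-stack IP selection is enabled by default. With it enabled, queries get slower because smartdns has to wait for the speed-check comparison.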

@xiaotudoubaba
Author

Which configuration option controls dual-stack IP selection? I will set it and see.

@pymumu
Owner

pymumu commented Mar 18, 2023

dualstack-ip-selection no

@pymumu
Owner

pymumu commented Mar 18, 2023

@xiaotudoubaba
Author

xiaotudoubaba commented Mar 18, 2023

With smartdns.1.2023.03.04-1125.x86_64-linux-all.tar.gz,
running queryperf on its own holds at a bit over 20,000 qps (the normal case).

queryperf -p 53 -s 172.16.0.200 -d ./data/querytest_400W.txt -T 50000 > ret_172.16.0.200_2023-03-19-00-42-37_50000.txt

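But while it is running, if I open another terminal and run nslookup in a loop, a large number of timeouts appear. You can copy the script below and run it directly.
I have just set dualstack-ip-selection no, and it does not seem to have any effect.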

#!/bin/bash
VIP=172.16.0.200 # change this to the IP address under test
PORT=53


# counters for successful and failed lookups
success_count=0
failure_count=0

echo "Running commands..."

chinalist=( \
    www.baidu.com \
    www.163.com \
    www.sina.com.cn \
    www.csdn.net \
    www.huaweicloud.com \
)


while true; do
    for element in "${chinalist[@]}"
    do
        echo "$element"
        nslookup -port="${PORT}" "${element}" "${VIP}"
        if [ $? -ne 0 ]; then
            failure_count=$((failure_count+1))
        else
            success_count=$((success_count+1))
        fi
    done
done
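The loop above runs until it is interrupted and never reports its counters. A bounded variant that prints the totals at the end could look roughly like the sketch below. The names `run_lookups` and `lookup` are hypothetical, and the sketch assumes, as the script above does, that nslookup exits non-zero when a lookup fails.

```bash
#!/bin/bash
# Sketch only: a bounded variant of the lookup loop that prints totals.
VIP=${VIP:-172.16.0.200}   # address of the smartdns server under test
PORT=${PORT:-53}

# One lookup, wrapped in a function so the resolver command is easy to swap.
lookup() {
    nslookup -port="${PORT}" "$1" "${VIP}" > /dev/null 2>&1
}

# run_lookups MAX DOMAIN...: cycle through the domains until MAX lookups are done.
run_lookups() {
    local max=$1
    shift
    [ "$#" -gt 0 ] || { echo "no domains given" >&2; return 1; }
    local success=0 failure=0 domain
    while [ $((success + failure)) -lt "$max" ]; do
        for domain in "$@"; do
            if lookup "$domain"; then
                success=$((success + 1))
            else
                failure=$((failure + 1))
            fi
            if [ $((success + failure)) -ge "$max" ]; then
                break
            fi
        done
    done
    echo "success_count=${success} failure_count=${failure}"
}

# Example (not executed here):
#   run_lookups 5000 www.baidu.com www.163.com www.sina.com.cn
```

Stopping after a fixed number of lookups makes two runs directly comparable, because both the elapsed time and the failure count then refer to the same amount of work.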

@pymumu
Owner

pymumu commented Mar 19, 2023

I don't see any obvious problem in my tests. Results with debug logging turned off:

Statistics:

  Parse input file:     once
  Ended due to:         reaching end of file

  Queries sent:         2688270 queries
  Queries completed:    2688270 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:              30.257200 sec
  RTT min:              0.001086 sec
  RTT average:          0.002150 sec
  RTT std deviation:    0.009496 sec
  RTT out of range:     164 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Sun Mar 19 09:37:05 2023
  Finished at:          Sun Mar 19 09:37:58 2023
  Ran for:              53.058199 seconds

  Queries per second:   50666.438942 qps

The nslookup loop had no failures:

success_count=3795
failure_count=0
success_count=3840
failure_count=0
success_count=3885
failure_count=0
success_count=3935
failure_count=0
success_count=3980

@pymumu
Owner

pymumu commented Mar 19, 2023

You could try increasing cache-size and changing the log level to info or error.

@xiaotudoubaba
Author

xiaotudoubaba commented Mar 19, 2023

This is my configuration:

bind :53
cache-size 4096
log-level error
log-size 100M
#  Upstream DNS resolvers
server 61.132.163.68  -group china
server 202.96.128.86 -group china
server 202.96.134.133 -group china
server 202.101.224.69 -group china
server 222.172.200.68 -group china
server 202.106.0.20 -group china
server 210.22.70.3 -group china
# Domestic (China) DNS
server 223.5.5.5 -group alibaba -exclude-default-group
server 223.6.6.6 -group ailbaba  -exclude-default-group
server 119.29.29.29  -group tencent  -exclude-default-group
server 180.76.76.76  -group baidu  -exclude-default-group
# Foreign DNS
server 1.1.1.1 -blacklist-ip -group foreign -exclude-default-group
server 80.80.80.80 -blacklist-ip -group foreign -exclude-default-group
server 8.8.8.8 -blacklist-ip -group foreign -exclude-default-group
# Pin a domain to a specific IP address
#address /www.baidu.com/14.215.177.39
audit-enable yes
dualstack-ip-selection no   # -------- added this line.

After restarting smartdns and testing again, a large number of timeouts appear as soon as the nslookup loop is added:

Warning: RTT is out of range: 28.638424 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.638405 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.638170 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.637758 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643495 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643497 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643448 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643759 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643651 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643528 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643519 [query=www.baidu.com/1, rcode=0]
Warning: RTT is out of range: 28.643510 [query=www.baidu.com/1, rcode=0]

The nslookup loop has no failures; the qps just drops several-fold.

@pymumu
Owner

pymumu commented Mar 19, 2023

Did you restart smartdns while the queries were running?

@xiaotudoubaba
Author

xiaotudoubaba commented Mar 19, 2023

smartdns was not restarted during the load test. I restarted it once right after changing the configuration, then started the load test plus the nslookup loop.

@pymumu
Owner

pymumu commented Mar 19, 2023

Try increasing cache-size and see.

@PikuZheng
Contributor

Could you compare 41.rc3 (20230223) with the current version?

@xiaotudoubaba
Author

Setting cache-size had no effect either. I set it to:

cache-size 104096

@xiaotudoubaba
Author

Could you compare 41.rc3 (20230223) with the current version?

I will give it a try.

@pymumu
Owner

pymumu commented Mar 19, 2023

How many distinct domains are in the test domain list? What is the repetition rate?
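One way to answer that question is to count distinct entries in the query file. The sketch below assumes the file holds one "domain qtype" pair per line (the layout the project's own perf test writes, for example "example.com A"); `dup_stats` is a hypothetical helper name.

```bash
#!/bin/bash
# dup_stats FILE: count total and distinct "name type" entries in a query list.
# Names are compared case-insensitively, since DNS names are case-insensitive.
dup_stats() {
    awk '
        NF == 0 || /^[#;]/ { next }               # skip blank lines and comments
        { total++; if (!seen[tolower($1) " " $2]++) unique++ }
        END {
            if (total > 0)
                printf "total=%d unique=%d repeat_rate=%.2f%%\n",
                       total, unique, 100 * (total - unique) / total
        }
    ' "$1"
}

# Example (not executed here):
#   dup_stats ./data/querytest_400W.txt
```

A repeat rate close to 100% would mean that nearly every query is answered from the smartdns cache, so the run measures cache throughput rather than upstream resolution.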

@pymumu
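Owner

pymumu commented Mar 19, 2023

I have added a performance test case to the code. It randomly generates 100K domains, and the upstream is a mock server on the local machine.

Current results on an RK3588 board: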

rock@rock-5b:~/code/smartdns/test$ make all -j8 && ./test.bin --gtest_filter=Perf.*
make: Nothing to be done for 'all'.
Note: Google Test filter = Perf.*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Perf
[ RUN      ] Perf.no_speed_check
DNS Performance Testing Tool
Version 2.9.0

[Status] Command line: dnsperf -p 60053 -d /tmp/smartdns-perftest-domain.listfa37J
[Status] Sending queries (to 127.0.0.1:60053)
[Status] Started at: Sun Mar 19 14:08:59 2023
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         100000
  Queries completed:    100000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 100000 (100.00%)
  Average packet size:  request 32, response 54
  Run time (s):         1.979820
  Queries per second:   50509.642291

  Average Latency (s):  0.001945 (min 0.000070, max 0.017268)
  Latency StdDev (s):   0.001110

[       OK ] Perf.no_speed_check (2334 ms)
[----------] 1 test from Perf (2334 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2334 ms total)
[  PASSED  ] 1 test.

The dnsperf command needs to be installed:

apt install dnsperf

Judging from the test, there is basically no major problem. The difference from the earliest test results is small.

Sample test code:

TEST(Perf, no_speed_check)
{
    smartdns::MockServer server_upstream;
    smartdns::Server server;

    if (smartdns::IsCommandExists("dnsperf") == false) {
        printf("dnsperf not found, skip test, please install dnsperf first.\n");
        GTEST_SKIP();
    }

    server_upstream.Start("udp://0.0.0.0:61053", [](struct smartdns::ServerRequestContext *request) {
        std::string domain = request->domain;
        if (request->domain.length() == 0) {
            return smartdns::SERVER_REQUEST_ERROR;
        }

        if (request->qtype == DNS_T_A) {
            unsigned char addr[4] = {1, 2, 3, 4};
            dns_add_A(request->response_packet, DNS_RRS_AN, domain.c_str(), 61, addr);
        } else if (request->qtype == DNS_T_AAAA) {
            unsigned char addr[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
            dns_add_AAAA(request->response_packet, DNS_RRS_AN, domain.c_str(), 61, addr);
        } else {
            return smartdns::SERVER_REQUEST_ERROR;
        }

        request->response_packet->head.rcode = DNS_RC_NOERROR;
        return smartdns::SERVER_REQUEST_OK;
    });

    server.Start(R"""(bind [::]:60053
server 127.0.0.1:61053
log-num 0
log-console yes
speed-check-mode none
log-level error
cache-persist no)""");

    std::string file = "/tmp/smartdns-perftest-domain.list" + smartdns::GenerateRandomString(5);
    std::string cmd = "dnsperf -p 60053";
    cmd += " -d ";
    cmd += file;
    std::ofstream ofs(file);
    ASSERT_TRUE(ofs.is_open());
    Defer
    {
        ofs.close();
        unlink(file.c_str());
    };

    for (int i = 0; i < 100000; i++) {
        std::string domain = smartdns::GenerateRandomString(10);
        domain += ".";
        domain += smartdns::GenerateRandomString(3);
        if (random() % 2 == 0) {
            domain += " A";
        } else {
            domain += " AAAA";
        }
        domain += "\n";
        ofs.write(domain.c_str(), domain.length());
        ofs.flush();
    }

    system(cmd.c_str());
}

@yyysuo
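
yyysuo commented Mar 19, 2023

Here is a case I ran into, for reference. The DNS topology is shown in the diagram below. Because both mosdns instances are fed by the same AdGuard Home, identical DNS requests from the two mosdns instances reach smartdns at almost the same moment. Whenever there are DNS requests, both mosdns instances log a large number of errors: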

2023-03-17T10:10:23.418+0800 WARN forward_local upstream error {"uqid": 103, "qname": "www.sonystyle.com.cn.", "qtype": 28, "qclass": 1, "upstream": "10.10.10.1:8055", "error": "context deadline exceeded"}
2023-03-17T10:10:23.418+0800 WARN udp_server entry err {"query": {"uqid": 103, "client": "::ffff:10.10.10.1", "qname": "www.sonystyle.com.cn.", "qtype": 28, "qclass": 1, "elapsed": "5.000205624s"}, "error": "exchange: context deadline exceeded"}
2023-03-17T10:10:26.396+0800 WARN forward_local upstream error {"uqid": 106, "qname": "api.miwifi.com.", "qtype": 28, "qclass": 1, "upstream": "10.10.10.1:8055", "error": "context deadline exceeded"}
2023-03-17T10:10:26.396+0800 WARN udp_server entry err {"query": {"uqid": 106, "client": "::ffff:10.10.10.1", "qname": "api.miwifi.com.", "qtype": 28, "qclass": 1, "elapsed": "5.002534806s"}, "error": "exchange: context deadline exceeded"}
2023-03-17T10:10:26.508+0800 WARN forward_local upstream error {"uqid": 107, "qname": "newstreamcdncnc.inter.ptqy.gitv.tv.", "qtype": 28, "qclass": 1, "upstream": "10.10.10.1:8055", "error": "context deadline exceeded"}
2023-03-17T10:10:26.508+0800 WARN udp_server entry err {"query": {"uqid": 107, "client": "::ffff:10.10.10.1", "qname": "newstreamcdncnc.inter.ptqy.gitv.tv.", "qtype": 28, "qclass": 1, "elapsed": "5.000910025s"}, "error": "exchange: context deadline exceeded"}

After removing one of the mosdns instances, the remaining one logs no errors at all. The mosdns FAQ explains this error as the upstream DNS being unavailable:
IrineSistiana/mosdns#98

image

@xiaotudoubaba
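Author

test.tar.gz
Hello, this is my test tooling.

  1. Edit confi.sh to set the test IP.
  2. Run sh runquerypref.sh to start the load test.
  3. While the load test is running, run sh testnslookup.sh to start nslookup, loop over the queries, and check whether they return normally.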

@PikuZheng
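Contributor

PikuZheng commented Mar 19, 2023

test.tar.gz Hello, this is my test tooling.

  1. Edit confi.sh to set the test IP.
  2. Run sh runquerypref.sh to start the load test.
  3. While the load test is running, run sh testnslookup.sh to start nslookup, loop over the queries, and check whether they return normally.

The system is Alpine. Since it has no queryperf, I switched to dnsperf.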

# bash ./runquerypref.sh &
DNS Performance Testing Tool
Version 2.11.1

Warning: requested number of threads (-T 50000) exceeds number of clients (-c 1), lowering number of threads

[Status] Command line: dnsperf -p 53 -s 172.19.0.7 -d ./data/querytest_400W.txt -T 50000
[Status] Sending queries (to 172.19.0.7:53)
[Status] Started at: Mon Mar 20 07:09:39 2023
[Status] Stopping after 1 run through file
....after a while
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         4000000
  Queries completed:    4000000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 4000000 (100.00%)
  Average packet size:  request 31, response 87
  Run time (s):         1662.559869
  Queries per second:   2405.928397

  Average Latency (s):  0.041442 (min 0.000110, max 0.680308)
  Latency StdDev (s):   0.057275

Meanwhile I ran testnslookup.sh in a separate shell; by eye, no lookups failed.
I tried twice and found that the CPU was exhausted during the test.
Exported.

@PikuZheng
Contributor

smartdns.1.2020.09.08-2235.x86-linux-all.tar.gz version

{'qps': '21304.715020'} {'totaltime': '188.249924329', 'success': '820', 'fail': '0', 'sucrate': '100.0%', 'rate': '4.355911445504249%'} {'cpu_avg_used': 72.62}

smartdns.1.2023.03.04-1125.x86_64-linux-all.tar.gz version

{'qps': '4889.277531'} {'totaltime': '780.141572664', 'success': '87460', 'fail': '0', 'sucrate': '100.0%', 'rate': '112.10785716923746%'} {'cpu_avg_used': 20.36}

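This CPU usage is what catches my attention: why is the new version's CPU usage so much lower?

Also, all of your upstream connections are UDP. Given the ping used by the default speed check, it is easy to exceed the system's connection limit.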

@pymumu
Owner

pymumu commented Mar 20, 2023

@PikuZheng
Use the latest code and run the perf test case on your machine.

cd test
make all -j4
./test.bin --gtest_filter=Perf.*

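Your throughput is only about 2,400. After swapping in the dnsperf command, you need to drop the -T argument; -T means something different there.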
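The outputs in this thread show the difference: queryperf reports "Total QPS/target", so its -T is a target query rate, while dnsperf warns that "-T 50000" exceeds the number of clients, so its -T is a thread count. A rough sketch of carrying the queryperf settings over follows. It assumes that dnsperf's -Q limits queries per second and -q limits outstanding queries, which should be confirmed with `dnsperf -h` for the installed version; the function name is hypothetical.

```bash
#!/bin/bash
# queryperf_to_dnsperf SERVER PORT DATAFILE TARGET_QPS [OUTSTANDING]
# Prints (does not run) a dnsperf command that approximates
#   queryperf -s SERVER -p PORT -d DATAFILE -T TARGET_QPS -q OUTSTANDING
# Assumption: dnsperf -Q caps queries per second, -q caps outstanding queries.
queryperf_to_dnsperf() {
    local server=$1 port=$2 datafile=$3 target_qps=$4 outstanding=${5:-20}
    echo "dnsperf -s ${server} -p ${port} -d ${datafile} -Q ${target_qps} -q ${outstanding}"
}

queryperf_to_dnsperf 172.16.0.200 53 ./data/querytest_400W.txt 50000
```

Printing the command instead of running it keeps the mapping easy to review before any load is sent to the server.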

@pymumu
Owner

pymumu commented Mar 20, 2023

@yyysuo Please open a new issue and provide the smartdns debug log covering the time the problem occurred.

@xiaotudoubaba
Author

xiaotudoubaba commented Mar 20, 2023

@PikuZheng Where did you set up your environment, and how? I will build the same environment and test again.

@PikuZheng
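Contributor

@PikuZheng Where did you set up your environment, and how? I will build the same environment and test again.

I run smartdns in an Alpine container, built automatically from the latest code, with my everyday configuration (about 6,000+ domain-rule entries, TCP-only upstreams, ping speed check for domestic domains, cache enabled), and I run dnsperf on the host (outside the container). The CPU is an i3-4010U. My guess is that log I/O is what maxes out the CPU.

Downgraded to 37.2.10 (the first version that supports domain-rule):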

localhost:~/test# bash ./runquerypref.sh
DNS Performance Testing Tool
Version 2.11.1

Warning: requested number of threads (-T 50000) exceeds number of clients (-c 1), lowering number of threads

[Status] Command line: dnsperf -p 53 -s 172.19.0.4 -d ./data/querytest_400W.txt -T 50000
[Status] Sending queries (to 172.19.0.4:53)
[Status] Started at: Mon Mar 20 09:29:15 2023
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         4000000
  Queries completed:    4000000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 4000000 (100.00%)
  Average packet size:  request 31, response 87
  Run time (s):         1134.481978
  Queries per second:   3525.838292

  Average Latency (s):  0.028291 (min 0.000130, max 0.169304)
  Latency StdDev (s):   0.017118

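During this run I did 5,000 nslookups. Response speed dropped noticeably, but none failed.

After swapping in the dnsperf command, you need to drop the -T argument; -T means something different there.

After removing -T, there is no noticeable change: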

localhost:~/test# bash ./runquerypref.sh
DNS Performance Testing Tool
Version 2.11.1

[Status] Command line: dnsperf -p 53 -s 172.19.0.4 -d ./data/querytest_400W.txt
[Status] Sending queries (to 172.19.0.4:53)
[Status] Started at: Mon Mar 20 09:57:01 2023
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         4000000
  Queries completed:    4000000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 4000000 (100.00%)
  Average packet size:  request 31, response 89
  Run time (s):         1127.979465
  Queries per second:   3546.163848

  Average Latency (s):  0.028139 (min 0.000130, max 0.165441)
  Latency StdDev (s):   0.016704

@PikuZheng
Contributor

Checking the configuration, I found the log level was debug. After changing it to error (but with the audit log still on):

localhost:~/test# bash ./runquerypref.sh
DNS Performance Testing Tool
Version 2.11.1

[Status] Command line: dnsperf -p 53 -s 172.19.0.4 -d ./data/querytest_400W.txt
[Status] Sending queries (to 172.19.0.4:53)
[Status] Started at: Mon Mar 20 10:20:13 2023
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         4000000
  Queries completed:    4000000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 4000000 (100.00%)
  Average packet size:  request 31, response 89
  Run time (s):         197.626107
  Queries per second:   20240.240830

  Average Latency (s):  0.004923 (min 0.000057, max 0.140882)
  Latency StdDev (s):   0.001567

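This result looks about the same as what the original poster got with the old version. The CPU was still maxed out during the run.

Running the nslookup loop 5,000 times alongside runquerypref.sh (testnslookup.sh changed to while [[ $success_count -le 5000 ]]): 0 failures, but Queries per second dropped to 11868.327517.

Overall, the original poster's results for both the old and the new version may be off. The old version is limited by the x86 build, so its real performance could be better. The new-version result is simply implausible; something must have gone wrong along the way.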

@xiaotudoubaba
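Author

xiaotudoubaba commented Mar 20, 2023

Running the nslookup loop 5,000 times alongside runquerypref.sh (testnslookup.sh changed to while [[ $success_count -le 5000 ]]): 0 failures, but Queries per second dropped to 11868.327517.

Yes. Running the runquerypref.sh load test on its own, performance is acceptable.
But as soon as the nslookup loop is added during the load test, Queries per second drops immediately.
The old version smartdns.1.2020.09.08-2235.x86-linux-all.tar.gz does not have this problem.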

@yyysuo
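
yyysuo commented Mar 20, 2023

@yyysuo Please open a new issue and provide the smartdns debug log covering the time the problem occurred.

I opened an issue. I found that the main router previously had the bind-to-device switch turned on. After turning it off the error logs decreased a lot, but there are still some IPv6-related error logs. With a single mosdns there are no error logs at all.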

#1340

@pymumu
Owner

pymumu commented Mar 20, 2023

@xiaotudoubaba @PikuZheng Please run the performance test in the smartdns source tree and see.

cd test
make all -j4
./test.bin --gtest_filter=Perf.*

@PikuZheng
Contributor

./test.bin --gtest_filter=Perf.*

Compilation fails:

g++ -g -I./ -I../src -I../src/include   -c -o server.o server.cc
In file included from /usr/include/fortify/unistd.h:22,
                 from server.h:27,
                 from server.cc:19:
server.cc: In static member function 'static bool smartdns::MockServer::GetAddr(const string&, std::string, int, int, sockaddr_storage*, socklen_t*)':
server.cc:234:16: error: converting to 'bool' from 'std::nullptr_t' requires direct-initialization [-fpermissive]
  234 |         return NULL;
      |                ^~~~
make: *** [<builtin>: server.o] Error 1

After ignoring the error with -fpermissive:

/smartdns/test # ./test.bin --gtest_filter=Perf.*
Note: Google Test filter = Perf.*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Perf
[ RUN      ] Perf.no_speed_check
DNS Performance Testing Tool
Version 2.11.1

[Status] Command line: dnsperf -p 60053 -d /tmp/smartdns-perftest-domain.list0Gp3i
[Status] Sending queries (to 127.0.0.1:60053)
[Status] Started at: Mon Mar 20 08:34:50 2023
[Status] Stopping after 1 run through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         100000
  Queries completed:    100000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 100000 (100.00%)
  Average packet size:  request 32, response 53
  Run time (s):         7.549946
  Queries per second:   13245.127846

  Average Latency (s):  0.007517 (min 0.000188, max 0.525216)
  Latency StdDev (s):   0.002211

[       OK ] Perf.no_speed_check (8064 ms)
[----------] 1 test from Perf (8065 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (8065 ms total)
[  PASSED  ] 1 test.

@pymumu
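Owner

pymumu commented Mar 20, 2023

@xiaotudoubaba With your test method the domains are repeated, so essentially every query is served from cache. The queries are never sent upstream.

I also verified with your test here: running testnslookup.sh alongside runquerypref.sh shows no performance drop.
With your scripts, the 2023 version uses 5% more CPU than the 2020 version. Throughput is on par, with CPU usage around 33% during the test.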

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 192.168.59.2)
[Status] Testing complete

Statistics:

  Parse input file:     once
  Ended due to:         reaching end of file

  Queries sent:         4000000 queries
  Queries completed:    4000000 queries
  Queries lost:         0 queries
  Queries delayed(?):   0 queries

  RTT max:              0.015537 sec
  RTT min:              0.001030 sec
  RTT average:          0.001227 sec
  RTT std deviation:    0.000352 sec
  RTT out of range:     0 queries

  Percentage completed: 100.00%
  Percentage lost:        0.00%

  Started at:           Mon Mar 20 17:44:14 2023
  Finished at:          Mon Mar 20 17:48:07 2023
  Ran for:              232.778404 seconds

  Queries per second:   17183.724655 qps
  Total QPS/target:     17183.806079/50000 qps

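Test machine:
Machine type: x86-64 virtual machine.
CPU: Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz
CPU cores: 1
Memory: 1GB

The ./runquerypref.sh script uses the default concurrency, which is 20. You can use the -q option to raise the concurrency to 120 and push CPU usage above 90%.
With -q 80 here, CPU is at 95% and throughput is close to the configured 50K qps target. There is some packet loss, though, presumably because the CPU cannot keep up.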

DNS Query Performance Testing Tool
Version: $Id: queryperf.c,v 1.12 2007/09/05 07:36:04 marka Exp $

[Status] Processing input data
[Status] Sending queries (beginning with 192.168.59.2)
[Status] Testing complete

Statistics:

  Parse input file:     once
  Ended due to:         reaching end of file

  Queries sent:         4000000 queries
  Queries completed:    3848144 queries
  Queries lost:         151856 queries
  Queries delayed(?):   0 queries

  RTT max:              1.317070 sec
  RTT min:              0.001054 sec
  RTT average:          0.016324 sec
  RTT std deviation:    0.128972 sec
  RTT out of range:     0 queries

  Percentage completed:  96.20%
  Percentage lost:        3.80%

  Started at:           Mon Mar 20 17:53:42 2023
  Finished at:          Mon Mar 20 17:55:02 2023
  Ran for:              80.001606 seconds

  Queries per second:   48100.834376 qps
  Total QPS/target:     49999.933125/50000 qps

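So far I have found no problem on x86 or ARM machines.
As for the test method, all my tests connect directly to smartdns, using your scripts.
The test case in my code has essentially no cache hits, so its numbers are closer to real-world use. Yours hits the cache every time, so performance can only be better.
So for the sharp drop you describe, the first step is to rule out a problem in your network, for example whether a load balancer is involved.

@PikuZheng Your machine looks about as fast as my virtual machine. You could run ./runquerypref.sh and testnslookup.sh from a different machine and see.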

@PikuZheng
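Contributor

@PikuZheng Your machine looks about as fast as my virtual machine. You could run ./runquerypref.sh and testnslookup.sh from a different machine and see.

I found a Xeon Silver 4210. dnsperf -p 53 -s 127.0.0.1 -d ./data/querytest_400W.txt -q 100 also gives a bit over 20,000, with CPU above 90%. So should the expected figure be around 50,000?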

@pymumu
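Owner

pymumu commented Mar 22, 2023

A 2 GHz CPU should manage at least 40,000.

dnsperf has no -q option; don't mix up the two tools.

For dnsperf it seems to be -c, or -t.

But with the performance test case in the code, the CPU on my rock5b reaches 50,000 with default parameters.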

@xiaotudoubaba
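Author

A 2 GHz CPU should manage at least 40,000.

dnsperf has no -q option; don't mix up the two tools.

For dnsperf it seems to be -c, or -t.

But with the performance test case in the code, the CPU on my rock5b reaches 50,000 with default parameters.

May I ask how you set up your test environment? Are there steps I can follow? I would like to match your environment and test again. Thanks.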

@PikuZheng
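Contributor

This is my real-world configuration: 6,000+ domain-set entries, equivalent to 20,000+ lines of matching rules. It is probably bottlenecked on CPU performance.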

@pymumu
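Owner

pymumu commented Mar 22, 2023

A 2 GHz CPU should manage at least 40,000.
dnsperf has no -q option; don't mix up the two tools.
For dnsperf it seems to be -c, or -t.
But with the performance test case in the code, the CPU on my rock5b reaches 50,000 with default parameters.

May I ask how you set up your test environment? Are there steps I can follow? I would like to match your environment and test again. Thanks.

Just download the code and use the performance test case in the test directory.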

cd test
make all -j4
./test.bin --gtest_filter=Perf.*

dnsperf needs to be installed.

@pymumu
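Owner

pymumu commented Mar 22, 2023

This is my real-world configuration: 6,000+ domain-set entries, equivalent to 20,000+ lines of matching rules. It is probably bottlenecked on CPU performance.

Rule matching has O(1) time complexity, so it does not cost much performance. That should not be the problem.

You can test with the rules disabled and see.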


4 participants