core: fix configuration type cast issue on big endian systems #8904

Open
wants to merge 1 commit into
base: master
from

Conversation

@rightblank rightblank commented Jun 2, 2024

Addresses #8828

This PR fixes two types of configuration failure that appear only on big-endian systems:

  • Plugin options with predefined int values are always zero, e.g. rotate_wait for the tail plugin.
  • Boolean plugin options cannot be turned on in configuration files, e.g. lowercase in the systemd plugin.

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
[INPUT]
    Name             tail
    Path             /var/log/dpkg.log
    Read_from_Head   True

[OUTPUT]
    Name   stdout
    Match  *
  • Debug log output from testing the change
fluent-bit/bin# ./fluent-bit -c fluent.conf 
Fluent Bit v3.0.4
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________  
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \ 
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  < 
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/ 

[2024/06/13 07:49:45] [ info] [fluent bit] version=3.0.4, commit=, pid=36844
[2024/06/13 07:49:45] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/06/13 07:49:45] [ info] [cmetrics] version=0.9.0
[2024/06/13 07:49:45] [ info] [ctraces ] version=0.5.1
[2024/06/13 07:49:45] [ info] [input:tail:tail.0] initializing
[2024/06/13 07:49:45] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/06/13 07:49:45] [ info] [output:stdout:stdout.0] worker #0 started
[2024/06/13 07:49:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=27934660 watch_fd=1 name=/var/log/dpkg.log
[0] tail.0: [[1718264985.477004907, {}], {"log"=>"2023-10-04 02:05:44 startup archives install"}]
[1] tail.0: [[1718264985.477014492, {}], {"log"=>"2023-10-04 02:05:44 install base-passwd:s390x <none> 3.5.52build1"}]
[2] tail.0: [[1718264985.477016162, {}], {"log"=>"2023-10-04 02:05:44 status half-installed base-passwd:s390x 3.5.52build1"}]
[3] tail.0: [[1718264985.477017612, {}], {"log"=>"2023-10-04 02:05:44 status unpacked base-passwd:s390x 3.5.52build1"}]
[4] tail.0: [[1718264985.477018979, {}], {"log"=>"2023-10-04 02:05:44 configure base-passwd:s390x 3.5.52build1 3.5.52build1"}]
[5] tail.0: [[1718264985.477020322, {}], {"log"=>"2023-10-04 02:05:44 status half-configured base-passwd:s390x 3.5.52build1"}]
[6] tail.0: [[1718264985.477021651, {}], {"log"=>"2023-10-04 02:05:44 status installed base-passwd:s390x 3.5.52build1"}]
[7] tail.0: [[1718264985.477022937, {}], {"log"=>"2023-10-04 02:05:44 startup archives install"}]
[8] tail.0: [[1718264985.477024258, {}], {"log"=>"2023-10-04 02:05:44 install base-files:s390x <none> 12ubuntu4"}]

[7175] tail.0: [[1718264985.486439385, {}], {"log"=>"2024-06-13 07:33:40 status installed librdkafka-dev:s390x 1.8.0-1build1"}]
[7176] tail.0: [[1718264985.486440608, {}], {"log"=>"2024-06-13 07:33:40 trigproc libc-bin:s390x 2.35-0ubuntu3.4 <none>"}]
[7177] tail.0: [[1718264985.486441821, {}], {"log"=>"2024-06-13 07:33:40 status half-configured libc-bin:s390x 2.35-0ubuntu3.4"}]
[7178] tail.0: [[1718264985.486443057, {}], {"log"=>"2024-06-13 07:33:40 status installed libc-bin:s390x 2.35-0ubuntu3.4"}]

....

^C[2024/06/13 07:50:42] [engine] caught signal (SIGINT)
[2024/06/13 07:50:42] [ warn] [engine] service will shutdown in max 5 seconds
[2024/06/13 07:50:42] [ info] [input] pausing tail.0
[2024/06/13 07:50:43] [ info] [engine] service has stopped (0 pending tasks)
[2024/06/13 07:50:43] [ info] [input] pausing tail.0
[2024/06/13 07:50:43] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/06/13 07:50:43] [ info] [output:stdout:stdout.0] thread worker #0 stopped

  • Attached Valgrind output that shows no leaks or memory corruption was found
fluent-bit/bin# valgrind ./fluent-bit -c fluent.conf 
==36999== Memcheck, a memory error detector
==36999== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==36999== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==36999== Command: ./fluent-bit -c fluent.conf
==36999== 
Fluent Bit v3.0.4-ibm
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________  
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \ 
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  < 
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/ 

[2024/06/13 07:52:43] [ info] [fluent bit] version=3.0.4, commit=, pid=36999
[2024/06/13 07:52:43] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/06/13 07:52:43] [ info] [cmetrics] version=0.9.0
[2024/06/13 07:52:43] [ info] [ctraces ] version=0.5.1
[2024/06/13 07:52:43] [ info] [input:tail:tail.0] initializing
[2024/06/13 07:52:43] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/06/13 07:52:43] [ info] [output:stdout:stdout.0] worker #0 started
==36999== Warning: client switching stacks?  SP change: 0x77f80c8 --> 0x53f5ad8
==36999==          to suppress, use: --max-stackframe=37758448 or greater
==36999== Thread 4 flb-out-stdout.0:
==36999== Invalid write of size 8
==36999==    at 0x2A6680: output_pre_cb_flush (flb_output.h:552)
==36999==    by 0xF4FC15: springboard (sjlj.c:36)
==36999==    by 0x8000C796F: ??? (in /usr/local/libexec/valgrind/memcheck-s390x-linux)
==36999==  Address 0x53f5b08 is 21,448 bytes inside a block of size 24,576 alloc'd
==36999==    at 0x48404E0: malloc (vg_replace_malloc.c:446)
==36999==    by 0xF4FDD3: co_create (sjlj.c:61)
==36999==    by 0x2A8799: flb_output_flush_create (flb_output.h:888)
==36999==    by 0x2A8799: output_thread (flb_output_thread.c:290)
==36999==    by 0x303E49: step_callback (flb_worker.c:43)
==36999==    by 0x4F35295: start_thread (pthread_create.c:442)
==36999==    by 0x4FAEF8D: ??? (clone.S:66)
==36999==    by 0xFFFFFFFFFFFFFFFF: ???
==36999== 
==36999== Warning: client switching stacks?  SP change: 0x53f5830 --> 0x77f8168
==36999==          to suppress, use: --max-stackframe=37759288 or greater
==36999== Warning: client switching stacks?  SP change: 0x77f80c8 --> 0x53f58d0
==36999==          to suppress, use: --max-stackframe=37758968 or greater
==36999==          further instances of this message will not be shown.
[0] tail.0: [[1718265163.960652883, {}], {"log"=>"2023-10-04 02:05:44 startup archives install"}]
[1] tail.0: [[1718265163.984352851, {}], {"log"=>"2023-10-04 02:05:44 install base-passwd:s390x <none> 3.5.52build1"}]
[2] tail.0: [[1718265163.984560019, {}], {"log"=>"2023-10-04 02:05:44 status half-installed base-passwd:s390x 3.5.52build1"}]
[3] tail.0: [[1718265163.984657398, {}], {"log"=>"2023-10-04 02:05:44 status unpacked base-passwd:s390x 3.5.52build1"}]
[4] tail.0: [[1718265163.984752655, {}], {"log"=>"2023-10-04 02:05:44 configure base-passwd:s390x 3.5.52build1 3.5.52build1"}]
[5] tail.0: [[1718265163.984847608, {}], {"log"=>"2023-10-04 02:05:44 status half-configured base-passwd:s390x 3.5.52build1"}]
[6] tail.0: [[1718265163.984941561, {}], {"log"=>"2023-10-04 02:05:44 status installed base-passwd:s390x 3.5.52build1"}]
[7] tail.0: [[1718265163.985035225, {}], {"log"=>"2023-10-04 02:05:44 startup archives install"}]
[8] tail.0: [[1718265163.985125880, {}], {"log"=>"2023-10-04 02:05:44 install base-files:s390x <none> 12ubuntu4"}]
[9] tail.0: [[1718265163.985220335, {}], {"log"=>"2023-10-04 02:05:44 status half-installed base-files:s390x 12ubuntu4"}]
[10] tail.0: [[1718265163.985315302, {}], {"log"=>"2023-10-04 02:05:44 status unpacked base-files:s390x 12ubuntu4"}]
[11] tail.0: [[1718265163.985408302, {}], {"log"=>"2023-10-04 02:05:44 configure base-files:s390x 12ubuntu4 12ubuntu4"}]
[12] tail.0: [[1718265163.985501991, {}], {"log"=>"2023-10-04 02:05:44 status half-configured base-files:s390x 12ubuntu4"}]
[13] tail.0: [[1718265163.985595795, {}], {"log"=>"2023-10-04 02:05:44 status installed base-files:s390x 12ubuntu4"}]

......

[6739] tail.0: [[1718265164.926488724, {}], {"log"=>"2024-06-13 07:52:33 trigproc man-db:s390x 2.10.2-1 <none>"}]
[6740] tail.0: [[1718265164.926581033, {}], {"log"=>"2024-06-13 07:52:33 status half-configured man-db:s390x 2.10.2-1"}]
[6741] tail.0: [[1718265164.926674522, {}], {"log"=>"2024-06-13 07:52:33 status installed man-db:s390x 2.10.2-1"}]
[6742] tail.0: [[1718265164.926768316, {}], {"log"=>"2024-06-13 07:52:33 trigproc libc-bin:s390x 2.35-0ubuntu3.4 <none>"}]
[6743] tail.0: [[1718265164.926861738, {}], {"log"=>"2024-06-13 07:52:33 status half-configured libc-bin:s390x 2.35-0ubuntu3.4"}]
[6744] tail.0: [[1718265164.926955747, {}], {"log"=>"2024-06-13 07:52:33 status installed libc-bin:s390x 2.35-0ubuntu3.4"}]
^C[2024/06/13 07:52:50] [engine] caught signal (SIGINT)
[2024/06/13 07:52:50] [ warn] [engine] service will shutdown in max 5 seconds
[2024/06/13 07:52:50] [ info] [input] pausing tail.0
[2024/06/13 07:52:51] [ info] [engine] service has stopped (0 pending tasks)
[2024/06/13 07:52:51] [ info] [input] pausing tail.0
[2024/06/13 07:52:51] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2024/06/13 07:52:51] [ info] [output:stdout:stdout.0] thread worker #0 stopped
[2024/06/13 07:52:51] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=27934660 watch_fd=1
==36999== 
==36999== HEAP SUMMARY:
==36999==     in use at exit: 0 bytes in 0 blocks
==36999==   total heap usage: 45,140 allocs, 45,140 frees, 67,221,223 bytes allocated
==36999== 
==36999== All heap blocks were freed -- no leaks are possible
==36999== 
==36999== For lists of detected and suppressed errors, rerun with: -s
==36999== ERROR SUMMARY: 20 errors from 1 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@rightblank
Author

rightblank commented Jun 13, 2024

Hi @cosmo0920 @edsiper, could you please review this PR? It fixes the issues reported in #8828.

@rightblank rightblank changed the title core: fix configuration type cast issue on s390x core: fix configuration type cast issue on big endian systems Jun 14, 2024
@@ -649,10 +649,10 @@ int flb_config_map_set(struct mk_list *properties, struct mk_list *map, void *co
         }
         else if (m->type == FLB_CONFIG_MAP_TIME) {
             m_i_num = (int *) (base + m->offset);
-            *m_i_num = m->value.val.s_num;
+            *m_i_num = m->value.val.i_num;
Author

@rightblank rightblank Jun 17, 2024

The value is stored in the first 4 bytes of val. Reading it through s_num also pulls in the 4 zero bytes at the next higher addresses, which gives different results on big-endian and little-endian systems. The code below demonstrates this:

#include <stdio.h>
#include <stddef.h> // Include this header for size_t

int main() {
    printf("Size of size_t: %zu bytes\n", sizeof(size_t));    // 8
    printf("Size of int:    %zu bytes\n\n", sizeof(int));     // 4

    int a[2] = {1, 0};

    int *i_num = (int *)a;
    printf("Value of i_num: %d\n\n", *i_num);

    size_t *s_num = (size_t *)a;
    printf("Value of s_num: %zu\n", *s_num);
    printf("Cast s_num as int: %d\n", (int)(*s_num));

    return 0;
}

Output on x86 (little-endian):

Size of size_t: 8 bytes
Size of int:    4 bytes

Value of i_num: 1

Value of s_num: 1
Cast s_num as int: 1

Output on s390x (big-endian):

Size of size_t: 8 bytes
Size of int:    4 bytes

Value of i_num: 1

Value of s_num: 4294967296
Cast s_num as int: 0

Author

@rightblank rightblank Jun 17, 2024

The s_num value 4294967296 in the big-endian output equals 1<<32.

@@ -776,7 +776,7 @@ int flb_config_map_set(struct mk_list *properties, struct mk_list *map, void *co
             *m_d_num = atof(kv->val);
         }
         else if (m->type == FLB_CONFIG_MAP_BOOL) {
-            m_bool = (char *) (base + m->offset);
+            m_bool = (int *) (base + m->offset);
Author

@rightblank rightblank Jun 17, 2024

Assigning the value 1 to the single byte at base + m->offset and then reading it together with the 3 following bytes as an int gives different results on big-endian and little-endian systems.
On big-endian systems the 4-byte number |01|00|00|00| equals 16777216 (1<<24); on little-endian systems it equals 1.
This can be verified with the code snippet below:

#include <stdio.h>
#include <stdbool.h>

int main() {
    int a = 0;
    char *p = (char*)(&a);
    *p = 1;

    printf("The value of a is now: %d\n", a);

    int b;
    int *p1 = (int*)(&b);
    *p1 = 1;
    printf("The value of b is now: %d\n", b);
    return 0;
}

Output on x86 (little-endian):

The value of a is now: 1
The value of b is now: 1

Output on s390x (big-endian):

The value of a is now: 16777216
The value of b is now: 1

@rightblank
Author

rightblank commented Jun 17, 2024

Hi @cosmo0920 @edsiper, could you please review this PR? I added two code snippets to demonstrate the issues.

This PR fixes two types of configuration failure that appear only on big-endian systems:

  • Plugin options with predefined int values are always zero, e.g. rotate_wait for the tail plugin.
  • Boolean plugin options cannot be turned on in configuration files, e.g. lowercase in the systemd plugin.

@cosmo0920
Contributor

Could you ensure DCO, that is, adding Signed-off ... line in your commit?

@rightblank
Author

Could you ensure DCO, that is, adding Signed-off ... line in your commit?

Hi @cosmo0920, could you please review again? I am from IBM and can confirm that the DCO requirement is met; the sign-off has been added to the commit.

@rightblank
Author

Hi, @cosmo0920, could you help to get this PR merged?

Contributor

@cosmo0920 cosmo0920 left a comment

I believe this change is good. However, I don't have any big-endian systems. All of my platforms are little-endian: x86_64 (Windows, Linux), aarch64 Linux on Chromebook, Windows on ARM64, and Linux on riscv64. Before merging, should we confirm the effect of this PR on an s390x system somewhere else, such as in the cloud?

@rightblank
Author

rightblank commented Jul 2, 2024

I believe this change is good. However, I don't have any big-endian systems. All of my platforms are little-endian: x86_64 (Windows, Linux), aarch64 Linux on Chromebook, Windows on ARM64, and Linux on riscv64. Before merging, should we confirm the effect of this PR on an s390x system somewhere else, such as in the cloud?

Hi @cosmo0920, the test log and the Valgrind output in the PR description confirm that this was tested on s390x:

==36999== by 0x8000C796F: ??? (in /usr/local/libexec/valgrind/memcheck-s390x-linux)

but if you really want to, you can use the LinuxONE Community Cloud. The usage is free for Open-Source Developers:
https://developer.ibm.com/components/ibm-linuxone/gettingstarted/

@ScarletTanager

ScarletTanager commented Jul 3, 2024

@cosmo0920 This is a really important change for us, so we'd like to see it merged as soon as feasibly possible. Do you think the test log and valgrind output supplied by @rightblank will be sufficient assurance that this has actually been tested on a BE system?

/cc @edsiper @fujimotos @koleini @leonardo-albertovich @agup006

@rightblank
Author

but if you really want to, you can use the LinuxONE Community Cloud. The usage is free for Open-Source Developers:
https://developer.ibm.com/components/ibm-linuxone/gettingstarted/

@cosmo0920, did you get access to an s390x environment and try the PR? Is there a chance we can get this fix into v3.0.8?
