-
Notifications
You must be signed in to change notification settings - Fork 131
Uninitialized data in the TokuDB recovery log
A TokuDB MySQL test run with valgrind reported an uninitialized data error when writing into the TokuDB recovery log.
==1032== Syscall param write(buf) points to uninitialised byte(s)
==1032== at 0x3EFA60E4ED: ??? (in /lib64/libpthread-2.12.so)
==1032== by 0xB894038: toku_os_full_write(int, void const*, unsigned long) (file.cc:249)
==1032== by 0xB83248A: write_outbuf_to_logfile(tokulogger*, __toku_lsn*) (logger.cc:513)
==1032== by 0xB83326C: toku_logger_maybe_fsync(tokulogger*, __toku_lsn, int, bool) (logger.cc:836)
==1032== by 0xB8327DE: toku_logger_fsync_if_lsn_not_fsynced(tokulogger*, __toku_lsn) (logger.cc:586)
==1032== by 0xB8493E6: toku_txn_maybe_fsync_log(tokulogger*, __toku_lsn, bool) (txn.cc:600)
==1032== by 0xB7B4EBB: toku_txn_commit(__toku_db_txn*, unsigned int, void (*)(__toku_txn_progress*, void*), void*, bool, bool) (ydb_txn.cc:198)
==1032== by 0xB7B55E9: locked_txn_commit_with_progress(__toku_db_txn*, unsigned int, void (*)(__toku_txn_progress*, void*), void*) (ydb_txn.cc:338)
==1032== by 0xB798B04: commit_txn_with_progress(__toku_db_txn*, unsigned int, THD*) (hatoku_hton.cc:638)
==1032== by 0xB798E82: tokudb_commit(handlerton*, THD*, bool) (hatoku_hton.cc:679)
==1032== by 0x655597: ha_commit_low(THD*, bool, bool) (handler.cc:1496)
==1032== by 0x726681: TC_LOG_MMAP::commit(THD*, bool) (log.cc:2642)
==1032== by 0x6553F0: ha_commit_trans(THD*, bool, bool) (handler.cc:1439)
==1032== by 0x8A825F: trans_commit_stmt(THD*) (transaction.cc:434)
==1032== by 0x7E95E2: mysql_execute_command(THD*) (sql_parse.cc:4997)
TokuDB should not be storing uninitialized data into its files, so I need to find the the code that neglected to initialize the memory that eventually got written to the recovery log. An update SQL statement was being committed at the time of the valgrind error, so I started searching for the error in the TokuDB update logic.
Updates in MySQL are typically use a read, modify, write algorithm. I discovered that the value being written into the fractal tree by the update function had one uninitialized byte. This byte contained the null bits. I further discovered that the value read from the fractal tree by the TokuDB update function contained this uninitialized byte. So, the input to the update statement was uninitialized. Some previous SQL statement must have written an uninitialized null byte.
I found a previous alter table statement that added a column to the table. In addition, I found that the column addition update callback was the source of the uninitialized null byte.
The column addition update callback sets or clears a bit in the null bytes array of the new row for every nullable column in the row. The initial state of the new null bytes array is random since it mallocs some memory but does not zero it. The column addition algorithm sets or clears the set of bits corresponding to the nullable columns, but it does not set all of the bits in the null bytes array since it ignores those bits that do not have a corresponding nullable column. Valgrind detected that not all of the bits were set and reported the uninitialized data error.
So, no logic error here. I changed the column addition callback to zero the initial state of the null bits array prior to the same code setting or clearing the nullable column's null bit. The end result is that all the bits in the null bytes array are initialized and valgrind no longer complains.
I did not realize that valgrind tracks the state of each memory bit. Here is a simple program that shows this.
1 #include <stdio.h>
2 #include <assert.h>
3 #include <stdlib.h>
4 #include <unistd.h>
5 #include <memory.h>
6 static void setbits(unsigned char *b, unsigned max_bits) {
7 for (unsigned j = 0; j < max_bits; j++) {
8 if (j & 1)
9 *b |= 1<<j;
10 else
11 *b &= ~(1<<j);
12 }
13 }
14 int main(int argc, char *argv[]) {
15 assert(argc == 2);
16 unsigned max_bits = atoi(argv[1]);
17 unsigned char *b = new unsigned char[1];
18 setbits(b, max_bits);
19 write(1000, b, 1);
20 delete [] b;
21 return 0;
22 }
If we only write the first 4 bits, then valgrind reports an uninitialized data issue.
$ valgrind ./setbits 4
$ valgrind ./setbits 4
==26368== Memcheck, a memory error detector
==26368== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26368== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26368== Command: ./setbits 4
==26368==
==26368== Syscall param write(buf) points to uninitialised byte(s)
==26368== at 0x3EF9ADB690: __write_nocancel (in /lib64/libc-2.12.so)
==26368== by 0x400781: main (setbits.cc:19)
==26368== Address 0x4c21040 is 0 bytes inside a block of size 1 alloc'd
==26368== at 0x4A07152: operator new[](unsigned long) (vg_replace_malloc.c:363)
==26368== by 0x400756: main (setbits.cc:17)
==26368==
==26368==
==26368== HEAP SUMMARY:
==26368== in use at exit: 0 bytes in 0 blocks
==26368== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
==26368==
==26368== All heap blocks were freed -- no leaks are possible
==26368==
==26368== For counts of detected and suppressed errors, rerun with: -v
==26368== Use --track-origins=yes to see where uninitialised values come from
==26368== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
If we write all 8 bits, then valgrind is silent as it should be.
$ valgrind ./setbits 8
==26288== Memcheck, a memory error detector
==26288== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26288== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26288== Command: ./setbits 8
==26288==
==26288==
==26288== HEAP SUMMARY:
==26288== in use at exit: 0 bytes in 0 blocks
==26288== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
==26288==
==26288== All heap blocks were freed -- no leaks are possible
==26288==
==26288== For counts of detected and suppressed errors, rerun with: -v
==26288== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)