-
Notifications
You must be signed in to change notification settings - Fork 131
Why TokuDB does not use the 'uint3korr' function
The 'uint3korr' function inside of the mysqld server extracts a 3 byte unsigned integer from a memory buffer. One use is for 'mediumint' columns which encode their value in 3 bytes. MySQL 5.6 and MariaDB 10.0 claims to have optimized this function for x86 and x86_64 processors. There is a big comment that says:
Attention: Please, note, uint3korr reads 4 bytes (not 3)!
It means, that you have to provide enough allocated space.
The 'uint3korr' optimization may be fast, but it is not valgrind safe. Here is an example where valgrind detects TokuDB reading beyond the end of a buffer when it uses the 'uint3korr' function.
==3899== Thread 36:
==3899== Invalid read of size 4
==3899== at 0xB76C089: tokudb_uint3korr(unsigned char const*) (hatoku_defines.h:533)
==3899== by 0xB795C5E: cmp_toku_int(unsigned char*, unsigned char*, bool, unsigned int) (hatoku_cmp.cc:431)
==3899== by 0xB797211: compare_toku_field(unsigned char*, unsigned char*, unsigned char*, unsigned int*, unsigned int*, unsigned int*, bool*) (hatoku_cmp.cc:1291)
==3899== by 0xB797D9A: tokudb_compare_two_keys(void const*, unsigned int, void const*, unsigned int, void const*, unsigned int, bool, bool*) (hatoku_cmp.cc:1652)
...
==3899== by 0xB826B4B: ft_search_basement_node(ftnode_leaf_basement_node*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool*, ft_cursor*, bool) (ft-o\
ps.cc:4980)
==3899== by 0xB827B0E: ft_search_node(ft_handle*, ftnode*, ft_search*, int, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool*, ft_cursor*, unlockers*, ancestors*\
, pivot_bounds const*, bool) (ft-ops.cc:5371)
==3899== by 0xB827E75: toku_ft_search(ft_handle*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, ft_cursor*, bool) (ft-ops.cc:5498)
==3899== by 0xB8282C3: ft_cursor_search(ft_cursor*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool) (ft-ops.cc:5563)
==3899== Address 0x264c03b1 is 1 bytes inside a block of size 4 alloc'd
==3899== at 0x4A0693D: malloc (vg_replace_malloc.c:271)
==3899== by 0xB8A3BA5: os_malloc(unsigned long) (os_malloc.cc:267)
==3899== by 0xB8A2EA9: toku_xmalloc(unsigned long) (memory.cc:332)
==3899== by 0xB8A3934: toku_xmemdup(void const*, unsigned long) (memory.cc:395)
==3899== by 0xB860BFB: toku_memdup_dbt(__toku_dbt*, void const*, unsigned long) (ybt.cc:193)
==3899== by 0xB826E02: ft_search_basement_node(ftnode_leaf_basement_node*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool*, ft_cursor*, bool) (ft-o\
ps.cc:5053)
==3899== by 0xB827B0E: ft_search_node(ft_handle*, ftnode*, ft_search*, int, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool*, ft_cursor*, unlockers*, ancestors*\
, pivot_bounds const*, bool) (ft-ops.cc:5371)
==3899== by 0xB827E75: toku_ft_search(ft_handle*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, ft_cursor*, bool) (ft-ops.cc:5498)
==3899== by 0xB8282C3: ft_cursor_search(ft_cursor*, ft_search*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*, bool) (ft-ops.cc:5563)
==3899== by 0xB828625: toku_ft_cursor_first(ft_cursor*, int (*)(unsigned int, void const*, unsigned int, void const*, void*, bool), void*) (ft-ops.cc:5626)
==3899== by 0xB7B2717: c_getf_first(__toku_dbc*, unsigned int, int (*)(__toku_dbt const*, __toku_dbt const*, void*), void*) (ydb_cursor.cc:280)
==3899== by 0xB7B2C74: c_getf_next(__toku_dbc*, unsigned int, int (*)(__toku_dbt const*, __toku_dbt const*, void*), void*) (ydb_cursor.cc:379)
==3899== by 0xB77CF2A: ha_tokudb::get_next(unsigned char*, int, __toku_dbt*, bool) (ha_tokudb.cc:5339)
==3899== by 0xB77DF30: ha_tokudb::rnd_next(unsigned char*) (ha_tokudb.cc:5578)
==3899== by 0x658127: handler::ha_rnd_next(unsigned char*) (handler.cc:2690)
==3899== by 0x97FE49: rr_sequential(READ_RECORD*) (records.cc:496)
It is OK to read beyond the end of a malloc'ed buffer since the 'uint3korr' function throws away the extra byte. But what if the extra byte is not mapped into mysqld's address space? If this is the case, then mysqld will crash.
Since 'uint3korr' is not safe unless one does a bunch of memory accounting, TokuDB no longer uses the 'uint3korr' function.
The code generated for the following test program
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#if HAVE_VALGRIND
#define uint3korr(A) (uint32_t) (((uint32_t) ((uint8_t) (A)[0])) +\
(((uint32_t) ((uint8_t) (A)[1])) << 8) +\
(((uint32_t) ((uint8_t) (A)[2])) << 16))
#else
#define uint3korr(A) (uint32_t) (*((unsigned int *) (A)) & 0xFFFFFF)
#endif
int main(void) {
uint8_t *p = (uint8_t *) calloc(1, 3);
uint32_t x = uint3korr(p);
printf("%d\n", x);
free(p);
return 0;
}
consists of a 32 bit mov instruction followed by a 24 bit mask instruction
4005eb: 8b 00 mov (%rax),%eax
4005ed: 25 ff ff ff 00 and $0xffffff,%eax
which demonstrates the problem at the instruction level for x86 processors.