Is it possible to make tiny-utf8 case-insensitive? #64

Open
rdev1983 opened this issue Sep 16, 2021 · 4 comments

Comments

@rdev1983

rdev1983 commented Sep 16, 2021

With std::string, we can define a separate typedef with custom character traits so that the resulting string class compares case-insensitively. I tried the same approach with the tiny-utf8 class and got a lot of errors. The following code makes a std::basic_string specialization behave case-insensitively.
Any clue?


#include <cctype>
#include <cstddef>
#include <string>

struct ci_char_traits : public std::char_traits<char> {
    // Fold via unsigned char: passing a negative char to toupper is undefined behavior.
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char c1, char c2) { return up(c1) == up(c2); }
    static bool ne(char c1, char c2) { return up(c1) != up(c2); }
    static bool lt(char c1, char c2) { return up(c1) <  up(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        while( n-- != 0 ) {
            if( up(*s1) < up(*s2) ) return -1;
            if( up(*s1) > up(*s2) ) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    // char_traits::find has to return a null pointer when the character is not found.
    static const char* find(const char* s, std::size_t n, char a) {
        while( n-- != 0 ) {
            if( up(*s) == up(a) ) return s;
            ++s;
        }
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;
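
A minimal usage sketch of the typedef above, assuming ASCII-only input. The traits are repeated in condensed form so that the snippet compiles on its own; `same_ignoring_case` is a hypothetical helper name and is not part of tiny-utf8 or the standard library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Condensed copy of the traits from the post, repeated so this snippet compiles alone.
struct ci_char_traits : public std::char_traits<char> {
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (; n != 0; --n, ++s1, ++s2) {
            if (up(*s1) < up(*s2)) return -1;
            if (up(*s1) > up(*s2)) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (; n != 0; --n, ++s) {
            if (up(*s) == up(a)) return s;
        }
        return nullptr;  // not found
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Hypothetical helper: compares two ordinary std::string values ignoring ASCII case.
inline bool same_ignoring_case(const std::string& a, const std::string& b) {
    return ci_string(a.data(), a.size()) == ci_string(b.data(), b.size());
}
```

Because `std::toupper` is applied to one `char` at a time, this only folds single-byte characters. In UTF-8 text a letter such as `é` spans several bytes, so a per-byte trait like this cannot fold it, which is one reason the approach does not carry over directly to a UTF-8 string class.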

@DuffsDevice
Owner

Hey, I will figure out what I can do for you! I might need some time to reply, but I won't forget.

@vadim-berman

Hi Jakob,

I happen to have a lookup table, including the accented characters, the funky variants, and exceptions like the Turkish dotless i. I can email you the source code, if you like.

@DuffsDevice
Owner

DuffsDevice commented Nov 15, 2021

Hi Vadim, sure! I will have a look at it and see how it might be included. What license is it under?

@vadim-berman

Thanks, Jakob.

As a compilation, it's ours (traversal of Unicode tables + manual changes and proofreading), so whatever license you prefer, I guess :) . Nothing fancy, it looks like this:

    LOAD_LATIN_LETTER_PAIR(u8"F", u8"f");
    LOAD_LATIN_LETTER_PAIR(u8"G", u8"g");
    LOAD_LATIN_LETTER_PAIR(u8"H", u8"h");
    if (_standardCode == "tr") {
        LOAD_LATIN_LETTER_PAIR(u8"İ", u8"i");
        LOAD_LATIN_LETTER_PAIR(u8"I", u8"ı");
    } else {
        LOAD_LATIN_LETTER_PAIR(u8"I", u8"i");
    }
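
One way to picture how a table like this could be used at lookup time is sketched below. This is only an illustration under assumptions: `LowerMap`, `build_lower_map`, `to_lower`, and the choice of `char32_t` code points are hypothetical and are not taken from the excerpt above or from tiny-utf8, and only the pairs shown in the excerpt are loaded.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical table type: maps an uppercase code point to its lowercase partner.
using LowerMap = std::unordered_map<char32_t, char32_t>;

// Builds the table for a language code such as "en" or "tr".
inline LowerMap build_lower_map(const std::string& language) {
    LowerMap map = {
        {U'F', U'f'},
        {U'G', U'g'},
        {U'H', U'h'},
    };
    if (language == "tr") {
        map[U'\u0130'] = U'i';      // capital I with dot above -> i
        map[U'I']      = U'\u0131'; // I -> dotless i
    } else {
        map[U'I'] = U'i';
    }
    return map;
}

// Lowercases one code point; code points missing from the table are returned unchanged.
inline char32_t to_lower(const LowerMap& map, char32_t cp) {
    auto it = map.find(cp);
    return it == map.end() ? cp : it->second;
}

// Lowercases a whole UTF-32 string using the table.
inline std::u32string to_lower(const LowerMap& map, const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t cp : text) out.push_back(to_lower(map, cp));
    return out;
}
```

Under this sketch, two strings could be compared case-insensitively by lowering both with the same table and comparing the results, and the Turkish exception only changes which table is built.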


3 participants