Make sequence more abstract, so it can be anything, not just array of chars. #90
So far there have been 3 issues asking for multibyte support, so I assigned the "important" label to this feature, as it clearly matters to users.
This is a workaround until #90 is implemented. If either query or target contains non-ASCII values, they are mapped into an ASCII alphabet and the resulting byte sequences are used for doing the alignment. This works only if the whole alphabet has no more than 256 characters.
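To illustrate the idea behind this workaround, here is a minimal sketch of such a mapping. It is not the code from the actual commit: the function name `map_to_bytes` is hypothetical, and it maps symbols to arbitrary byte values 0..255 rather than to a specific ASCII alphabet.

```python
from itertools import chain


def map_to_bytes(query, target):
    """Map each distinct symbol in query/target to one byte value.

    Hypothetical helper, only a sketch of the workaround described above.
    Symbols must be hashable (e.g. characters of a str).
    """
    code = {}
    for symbol in chain(query, target):
        # Assign the next free byte value the first time a symbol is seen.
        code.setdefault(symbol, len(code))
    if len(code) > 256:
        raise ValueError("alphabet has more than 256 distinct symbols")
    return (bytes(code[s] for s in query),
            bytes(code[s] for s in target))


# Both sequences share one mapping, so equal symbols stay equal after mapping.
query_bytes, target_bytes = map_to_bytes("žaba", "šaba")
```

The resulting byte sequences could then be passed to the aligner in place of the original strings. Because only equality between symbols matters for the alignment, which byte each symbol receives is irrelevant as long as the mapping is shared by both sequences.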
With @jbaiter's addition to the Python version of Edlib this issue is less pressing, but it should still be the next one to do.
This is also linked to #141 (Unicode support in Python edlib).
@masri2019 has been working on this for some time now with a little bit of my guidance, so I will document here what has been done and what is yet to be done to call this feature complete!
We are using "big" feature branch Additional ideas/considerations:
Hey @masri2019, how are you doing? We made great progress with this one and then stopped. Are you still interested in possibly continuing with it, and how are you with time?
Hi Martin!
Thanks for asking. Yes, I'm definitely interested in finishing what we have started. I have been busy with some other projects, but I can plan to dedicate some time to edlib.
Based on what you sent, the next step is updating the README. I'll create a pull request for that.
-Mobin
@masri2019 that is awesome :)!! I will also do my best to help you. I believe the two of us can finish it together; if needed I can involve myself more, and I should also be able to carve out some time. Yes, the next step is the README, based on the checklist I created above (which I am now really happy I made, because I would have no idea where we stopped otherwise :D). And then the Python bindings. I am sure we can get both of those done. Next will be the discussion about the C wrapper; that might be a bit harder, but it is also doable. And then final polishing! All together it sounds like we (you) did the hardest part already, so I am really looking forward to this. Although, you know how they say: the last 20% takes 80% of the time. But let's hope in this case the percentages will be gentle to us.
@masri2019 I am guessing it might be a bit hard getting back into it after so much time, so I would advise you to do what you can, and if you get stuck somewhere, no worries: make a draft PR and I can jump in, and we will figure it out together. I also forget a lot of things, but I am sure we will remember it relatively quickly, since we were writing pretty nice code.
Multiple people were asking about support for multibyte characters (Unicode).
One way to provide that and even more is by making a sequence not an array of chars, but instead an array of objects that satisfy the condition that they have equality operator defined over them.
What would the impact on speed be in this case? I think it would not be a big impact, since the elements are only used to calculate Peq anyway, and after that only Peq is used.
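To make the Peq argument concrete, here is a rough sketch in Python (not edlib's actual C++ code). It builds Peq for a query of arbitrary symbols and then runs a Myers/Hyyrö-style bit-vector computation of global edit distance that only ever touches Peq. The names `build_peq` and `edit_distance` are hypothetical, and the sketch assumes symbols are hashable in addition to supporting equality, so that a dict can be used for the lookup.

```python
def build_peq(query):
    """Peq[symbol] has bit i set iff query[i] == symbol.

    This is the only place where the symbols themselves are inspected.
    Assumes symbols are hashable (an assumption of this sketch).
    """
    peq = {}
    for i, symbol in enumerate(query):
        peq[symbol] = peq.get(symbol, 0) | (1 << i)
    return peq


def edit_distance(query, target):
    """Global (Levenshtein) edit distance via bit-vectors, one big block."""
    m = len(query)
    if m == 0:
        return len(target)
    peq = build_peq(query)
    mask = (1 << m) - 1      # keep Python's unbounded ints to m bits
    high = 1 << (m - 1)      # bit of the last query position
    pv, mv, score = mask, 0, m
    for symbol in target:
        eq = peq.get(symbol, 0)  # symbols absent from the query match nothing
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask
        mh = pv & xh
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) | 1   # top row grows by 1 per column (global alignment)
        mh = mh << 1
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv & mask
    return score


# Works for strings with non-ASCII characters and for non-string sequences.
d1 = edit_distance("žaba", "šaba")
d2 = edit_distance([10, 20, 30], [10, 30])
```

In this sketch the per-symbol cost is paid once while building Peq plus one lookup per target element, which supports the expectation that the main loop would not slow down much. How well that carries over to edlib depends on how Peq would be stored for large alphabets, which this sketch does not address.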
Would it make it harder to use edlib for the usual cases? Would it become too general and hard to use for strings? How could we make sure it is still easy to use while offering flexibility?
Finally, this might be easier to implement if I first decide to go with just a C++ interface, so I should think about that before starting.