Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add main encoding detection functions #9

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nalimilan
Copy link
Member

I had wrapped some functions to convert from one encoding to another, and then I realized you had just done the same a few days ago. ;-) So I went on and wrote wrappers for encoding detection functions, which seem to work relatively well in the few tests I did.

Something I'm not sure I did right is what to do when ucsdet_detect returns NULL. I could return C_NULL or nothing, which would break type stability. So for now I return an object with a null pointer, and I handle this in ucsdet_getName, by returning an empty encoding name.

@nalimilan
Copy link
Member Author

Actually, I'd now be inclined to throwing an exception when ucsdet_detect returns NULL. That would allow using try, which is more natural than checking for a NULL pointer or an empty string. What do you think?

@nolta
Copy link
Collaborator

nolta commented Apr 21, 2014

Thanks! Yes, having uscdet_detect throw an error sounds better.

@nalimilan
Copy link
Member Author

Here's a new version. What do you think of the name of the exception?

@nolta
Copy link
Collaborator

nolta commented Apr 22, 2014

I'm having second thoughts about the exception. Is there a scenario where both ucsdet_detect returns NULL and U_SUCCESS(err[1]) is true?

(Ptr{Void},Ptr{Int32},Ptr{UErrorCode}),
csd.p, n, err)
U_SUCCESS(err[1]) || error("ICU: could not detect encoding")
n[1] > 0 || throw(UNoMatchingEncoding())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return an empty array instead of throwing an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? I first wanted to do that, but I though for consistency with ucsdet_detect it would be better to throw an exception. As you like.

@nalimilan
Copy link
Member Author

Actually, when no encoding matches, U_SUCCESS(err[1]) is true, but the function returns NULL. U_SUCCESS(err[1]) will only fail under exceptional circumstances (i.e. a bug somewhere), and I think it's right to distinguish this total failure from "normal" failure to find a matching encoding.

@nalimilan
Copy link
Member Author

Bump! I have rebased this old branch.

@ScottPJones
Copy link
Contributor

Could this be merged? We have a need for it also. Thanks!

@nalimilan
Copy link
Member Author

Re-bump!

@ScottPJones
Copy link
Contributor

You might want to use https://github.com/JuliaString/ICU.jl instead, which supports the character detection API, fixes a bunch of errors with leaking memory (by adding finalizers and keeping a pointer to the other Julia object to prevent premature GC), and is tested (I am working on getting it to 100% coverage)

@nalimilan
Copy link
Member Author

Ah, cool. I guess it would make sense to redirect to your fork and register it in METADATA if you plan to maintain it in the future?

@ScottPJones
Copy link
Contributor

Yes, that's what I was hoping for. I never did hear back from @nolta about that. If you get a chance, I'd love your comments on it, constructive criticism is very very much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants