Low-fat word parts missing #116
Comments
This is an issue I am working on now. I have recently found ~900 places where words are missing or in the wrong place in the current Hebrew lowfat XML. I will update here when I have found and fixed the problem. These words are present in the nodes representation and in the TSV.
Yes, it seems that the TSV has it correctly. (Presumably the TSV is made from the nodes representation? Would you say that the TSV has more or less information in it than the low-fat XML? Maybe I should just switch to using the TSV if it's got everything I need?)
Ah, the TSV lacks the valuable role info (which can occur in w fields but is often in the parent wg fields). Oh well... I'll go back to waiting for the repaired lowfat XML.
Ah, thanks for the update. What does 'c' stand for here?
EDIT: Ah, I guess it stands for "compound". But somehow all glossing type fields were lost??? Should they have been on the compound word?
Closing this issue because I think all the Hebrew word parts are now there. Created a new issue for the 'c' problem: #121
From 01-Gen-003-lowfat.xml (the first occurrence of this systematic problem):
<wg type="conjuncted-wg" class="cjp" rule="cj2cjp"> <w xml:id="o010030050101" morph="C" ref="GEN 3:5!10" lemma="וְ">וִ</w>
After this entry (which I abbreviated) for Gen 3:5 word 10, the next word is word 11. AFAICS, the rest of word 10 (the main portion 'הְיִיתֶם֙') is missing; word 11 seems correct with its two morphemes.
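For anyone wanting to check for this kind of gap themselves, here is a minimal sketch, not the maintainers' actual tooling. It assumes you already have a list of the word-part ids you expect (for example, pulled from the TSV or the nodes representation) and that the lowfat `<w>` elements carry no element namespace. The function names and the second id in `expected` are hypothetical, chosen only to illustrate a missing second part of word 10.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def lowfat_word_ids(xml_text):
    """Return the set of xml:id values found on <w> elements."""
    root = ET.fromstring(xml_text)
    return {w.get(XML_ID) for w in root.iter("w") if w.get(XML_ID)}


def missing_parts(expected_ids, xml_text):
    """Return the expected ids that have no <w> element in the lowfat XML."""
    return sorted(set(expected_ids) - lowfat_word_ids(xml_text))


# Abbreviated fragment from this issue, closed so that it parses.
sample = """<wg type="conjuncted-wg" class="cjp" rule="cj2cjp">
  <w xml:id="o010030050101" morph="C" ref="GEN 3:5!10" lemma="וְ">וִ</w>
</wg>"""

# The second id is a hypothetical id for the missing main portion of word 10.
expected = ["o010030050101", "o010030050102"]

print(missing_parts(expected, sample))
```

Run over a whole lowfat file with the full id list from the TSV, this would list every word part that is absent from the XML, though it would not catch parts that are present but in the wrong place.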