Low-fat word parts missing #116

Closed
RobH123 opened this issue Apr 3, 2024 · 5 comments

Comments

RobH123 commented Apr 3, 2024

From 01-Gen-003-lowfat.xml (the first occurrence of this systematic problem):

<wg type="conjuncted-wg" class="cjp" rule="cj2cjp"> <w xml:id="o010030050101" morph="C" ref="GEN 3:5!10" lemma="וְ">וִ</w>

After this entry (abbreviated here) for Gen 3:5 word 10, the next entry is word 11. As far as I can see, the rest of word 10 (the main portion 'הְיִיתֶם֙') is missing; word 11 itself seems correct, with two morphemes.

Contributor

jonathanrobie commented Apr 3, 2024

I am working on this issue now. I recently found ~900 places where words are missing or in the wrong place in the current Hebrew lowfat. I will update here once I have found and fixed the problem.

These words are present in the nodes representation and in the TSV.
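
Since the TSV still carries these words, one way to locate the gaps is to diff word identifiers between the two files. Below is a minimal Python sketch of that idea, not the actual repair tooling. It assumes the TSV has a column holding the same identifiers as the lowfat `xml:id` attribute; the column name `xml:id` and all function names are hypothetical and would need adjusting to the real files.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def lowfat_ids(xml_text):
    """Return the set of xml:id values found on all <w> elements."""
    root = ET.fromstring(xml_text)
    return {w.get(XML_ID) for w in root.iter("w") if w.get(XML_ID)}


def tsv_ids(tsv_text, column="xml:id"):
    """Return the set of ids in one TSV column (the column name is an assumption)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[column] for row in reader if row.get(column)}


def missing_from_lowfat(xml_text, tsv_text, column="xml:id"):
    """Ids present in the TSV but absent from the lowfat XML, sorted."""
    return sorted(tsv_ids(tsv_text, column) - lowfat_ids(xml_text))
```

Running `missing_from_lowfat` on one chapter's lowfat file and the matching TSV rows would list the word parts that were dropped, assuming both files really do share identifiers.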

Author

RobH123 commented Apr 3, 2024

Yes, it seems that the TSV has it correctly. (Presumably the TSV is made from the nodes representation? Would you say that the TSV has more or less information in it than the low-fat XML? Maybe I should just switch to using the TSV if it's got everything I need?)

Author

RobH123 commented Apr 4, 2024

Ah, the TSV lacks the valuable role info (which can appear on w elements but is often on the parent wg elements). Oh well... I'll go back to waiting for the repaired lowfat XML.
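
For anyone needing the role alongside each word, here is a rough Python sketch of that lookup. It assumes the role that applies to a `w` is either its own non-empty `role` attribute or that of its nearest enclosing element that has one. That inheritance rule and the function name `word_roles` are assumptions for illustration, not documented lowfat semantics.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def word_roles(xml_text):
    """Map each <w> xml:id to its own role or the nearest ancestor's role."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: elem for elem in root.iter() for child in elem}
    roles = {}
    for w in root.iter("w"):
        node = w
        role = None
        while node is not None:
            role = node.get("role") or None  # treat role="" as absent
            if role:
                break
            node = parent.get(node)
        roles[w.get(XML_ID)] = role
    return roles
```

Words with no role anywhere up the tree map to `None`, which makes it easy to see how much role coverage the file actually has.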

Author

RobH123 commented Apr 10, 2024

Ah, thanks for the update. What does 'c' stand for here?

           <wg class="np" rule="Np-Appos" head="true">
              <c role="">
                 <w xml:id="o010040220061"
                    morph="Np"
                    pos="noun"
                    after=" "
                    type="proper"
                    ref="GEN 4:22!6"
                    sdbh="007730001001000"
                    stronglemma="תּוּבַל־קַ֫יִן"
                    lexdomain="003001007"
                    coredomain=""
                    unicode="תּ֣וּבַל קַ֔יִן"
                    class="noun"
                    lang="H"
                    lemma="תּוּבַל־קַ֫יִן">תּ֣וּבַל</w>
                 <w xml:id="o010040220071"
                    morph="Np"
                    pos="noun"
                    after=" "
                    type="proper"
                    ref="GEN 4:22!7"
                    sdbh="007730001001000"
                    stronglemma="תּוּבַל־קַ֫יִן"
                    lexdomain="003001007"
                    coredomain=""
                    unicode="תּ֣וּבַל קַ֔יִן"
                    class="noun"
                    lang="H"
                    lemma="תּוּבַל־קַ֫יִן">קַ֔יִן</w>
              </c>

EDIT: Ah, I guess it stands for "compound". But somehow all the gloss-type fields were lost? Should they have been on the compound word?
EDIT2: Yes, I rebuilt my literal OT and everything is MUCH better now, except for those missing compound glosses. Thanks to @jacobwegner and @jonathanrobie.
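
For reference, the parts inside a `c` element can be stitched back into one surface form. This is a minimal Python sketch, assuming `c` wraps the `w` parts of a single compound and that each part's `after` attribute holds its trailing separator, as in the excerpt above. The function name `compounds` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def compounds(xml_text):
    """Return (list of part ids, joined surface text) for every <c> element."""
    root = ET.fromstring(xml_text)
    result = []
    for c in root.iter("c"):
        parts = list(c.iter("w"))
        ids = [w.get(XML_ID) for w in parts]
        # Join each part's text with its trailing separator, then trim the end.
        text = "".join((w.text or "") + w.get("after", "") for w in parts).rstrip()
        result.append((ids, text))
    return result
```

For the Gen 4:22 excerpt this would pair the two part ids with the two-word name, which gives a place to hang a compound-level gloss once one is available.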

Author

RobH123 commented Apr 11, 2024

Closing this issue because I think all the Hebrew word parts are now there.

Created a new issue for the 'c' problem: #121

RobH123 closed this as completed Apr 11, 2024