Low-fat word parts missing #116

RobH123 · 2024-04-03T20:13:17Z

From 01-Gen-003-lowfat.xml: (the first occurrence of this systematic problem)

<wg type="conjuncted-wg" class="cjp" rule="cj2cjp"> <w xml:id="o010030050101" morph="C" ref="GEN 3:5!10" lemma="וְ">וִ</w>

After this entry (which I abbreviated) for Gen 3:5 word 10, the next word is word 11. As far as I can see, the rest of word 10 (the main portion 'הְיִיתֶם֙') is missing; word 11 looks correct, with its two morphemes.
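One way to spot cases like this is to list which parts of each word are actually present in a lowfat file and compare that against another representation. The sketch below assumes the `xml:id` layout is `o` + book (2 digits) + chapter (3) + verse (3) + word (3) + part (1); that layout is only inferred from the two ids quoted in this thread, not from any documentation. It also assumes the `w` elements are not in an XML namespace. `decode_id` and `parts_by_word` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# ASSUMPTION: id layout inferred from the ids quoted in this thread:
# 'o' + book(2) + chapter(3) + verse(3) + word(3) + part(1)
ID_PATTERN = re.compile(r"^o(\d{2})(\d{3})(\d{3})(\d{3})(\d)$")


def decode_id(xml_id):
    """Split an id such as 'o010030050101' into integer components."""
    match = ID_PATTERN.match(xml_id)
    if match is None:
        raise ValueError(f"unexpected xml:id layout: {xml_id!r}")
    book, chapter, verse, word, part = (int(g) for g in match.groups())
    return book, chapter, verse, word, part


def parts_by_word(root):
    """Map (book, chapter, verse, word) to the sorted part numbers present."""
    found = defaultdict(list)
    for w in root.iter("w"):
        xml_id = w.get(XML_ID)
        if xml_id is None:
            continue  # skip any <w> without an xml:id
        *word_key, part = decode_id(xml_id)
        found[tuple(word_key)].append(part)
    return {key: sorted(parts) for key, parts in found.items()}


# The abbreviated entry from above, closed so that it parses on its own.
sample = (
    '<wg type="conjuncted-wg" class="cjp" rule="cj2cjp">'
    '<w xml:id="o010030050101" morph="C" ref="GEN 3:5!10" lemma="וְ">וִ</w>'
    "</wg>"
)
print(parts_by_word(ET.fromstring(sample)))  # prints {(1, 3, 5, 10): [1]}
```

This only reports what is present. Here part 1 of word 10 exists and the later part is absent, so there is no gap in the numbering to detect; the listing still has to be compared against a source that has the full word.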

jonathanrobie · 2024-04-03T20:17:22Z

This is an issue I am working on now. I recently found about 900 places where words are missing or in the wrong place in the current Hebrew lowfat. I will update here when I have found and fixed the problem.

These words are present in the nodes representation and in the TSV.

RobH123 · 2024-04-03T20:18:15Z

Yes, it seems that the TSV has it correctly. (Presumably the TSV is made from the nodes representation? Would you say that the TSV has more or less information in it than the low-fat XML? Maybe I should just switch to using the TSV if it's got everything I need?)

RobH123 · 2024-04-04T01:25:22Z

Ah, TSV lacks the valuable role info (which can occur in w fields but is often in the parent wg fields). Oh well...I'll go back to waiting for the repaired lowfat XML.
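Since the role can sit on the `w` itself or on a parent `wg`, reading it from the lowfat XML means walking upward from each word. The sketch below is one way to do that with the standard library. It assumes that the nearest ancestor with a non-empty `role` is the one that applies to the word, which may be too generous for deeply nested groups. The sample fragment, its role values and the helper names `effective_role` and `roles_for_words` are hypothetical.

```python
import xml.etree.ElementTree as ET


def effective_role(word, parent_of):
    """Return the first non-empty role found on the word or its ancestors."""
    node = word
    while node is not None:
        role = node.get("role")
        if role:  # skips both a missing attribute and role=""
            return role
        node = parent_of.get(node)
    return None


def roles_for_words(root):
    """Pair each word's text with its own or inherited role."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [(w.text, effective_role(w, parent_of)) for w in root.iter("w")]


# Hypothetical fragment: one word inherits the role from its <wg>,
# the other carries a role of its own.
sample = """
<wg role="s">
  <w role="">alpha</w>
  <wg>
    <w role="o">beta</w>
  </wg>
</wg>
"""
print(roles_for_words(ET.fromstring(sample)))
# prints [('alpha', 's'), ('beta', 'o')]
```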

RobH123 · 2024-04-10T21:07:37Z

Ah, thanks for the update. What does 'c' stand for here?

           <wg class="np" rule="Np-Appos" head="true">
              <c role="">
                 <w xml:id="o010040220061"
                    morph="Np"
                    pos="noun"
                    after=" "
                    type="proper"
                    ref="GEN 4:22!6"
                    sdbh="007730001001000"
                    stronglemma="תּוּבַל־קַ֫יִן"
                    lexdomain="003001007"
                    coredomain=""
                    unicode="תּ֣וּבַל קַ֔יִן"
                    class="noun"
                    lang="H"
                    lemma="תּוּבַל־קַ֫יִן">תּ֣וּבַל</w>
                 <w xml:id="o010040220071"
                    morph="Np"
                    pos="noun"
                    after=" "
                    type="proper"
                    ref="GEN 4:22!7"
                    sdbh="007730001001000"
                    stronglemma="תּוּבַל־קַ֫יִן"
                    lexdomain="003001007"
                    coredomain=""
                    unicode="תּ֣וּבַל קַ֔יִן"
                    class="noun"
                    lang="H"
                    lemma="תּוּבַל־קַ֫יִן">קַ֔יִן</w>
              </c>

EDIT: Ah, I guess it stands for "compound". But somehow all the gloss-type fields were lost. Should they have been on the compound word?
EDIT2: Yes, I rebuilt my literal OT and everything is MUCH better now except for those missing compound glosses. Thanks to @jacobwegner and @jonathanrobie.
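For anyone consuming these `c` elements, a minimal sketch of treating one as a single compound word follows: it joins the child `w` texts, using each word's `after` attribute as the separator between parts. Reading `c` as "compound" is only the guess made above, and `compound_surface` and `compounds` are hypothetical helper names. The sample is the fragment quoted above, trimmed to the attributes the code uses.

```python
import xml.etree.ElementTree as ET


def compound_surface(c):
    """Join the <w> children of a <c>, separated by each word's 'after' text."""
    words = list(c.iter("w"))
    pieces = []
    for index, w in enumerate(words):
        pieces.append(w.text or "")
        if index < len(words) - 1:
            # 'after' holds what follows the word; omit it after the last part.
            pieces.append(w.get("after", ""))
    return "".join(pieces)


def compounds(root):
    """List (surface form, refs of the parts) for every <c> under root."""
    return [
        (compound_surface(c), [w.get("ref") for w in c.iter("w")])
        for c in root.iter("c")
    ]


# Trimmed from the Gen 4:22 fragment above, with the <wg> closed.
sample = (
    '<wg class="np" rule="Np-Appos" head="true"><c role="">'
    '<w ref="GEN 4:22!6" after=" ">תּ֣וּבַל</w>'
    '<w ref="GEN 4:22!7" after=" ">קַ֔יִן</w>'
    "</c></wg>"
)
for surface, refs in compounds(ET.fromstring(sample)):
    print(surface, refs)
```

The joined form can then be checked against the `unicode` attribute that both parts carry, which appears to hold the whole compound.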

RobH123 · 2024-04-11T19:56:19Z

Closing this issue because I think all the Hebrew word parts are now there.

Created a new issue for the 'c' problem: #121

RobH123 closed this as completed Apr 11, 2024
