Open work items for 5.12.5 #1454

Open
linas opened this issue Feb 24, 2023 · 12 comments

Comments

@linas
Member

linas commented Feb 24, 2023

See comment in #1446 (comment) for pending work items for 5.12.1

I think it makes sense to also start a 5.13.0 branch that will include proposals #1450, #1452 and #1453, and maybe #1449, depending on how that goes. And if #1449 can happen easily, then it would be version 6.0 instead.

@linas
Member Author

linas commented Feb 24, 2023

The emscripten issues are in #1361 #1374 and #1377

@ampli
Member

ampli commented Feb 24, 2023

For 6.00, I have many PRs, at least some of which I would like to include:

  1. Dict token insertion (need to find the issue number).
  2. Tokenization drastic speed improvements.
  3. Generator drastic speedup.
  4. Generator API.
  5. Cross-links implementation (I need your answers to my old questions + more discussion, in order to complete it).
  6. Implement power-prune for expressions in order to make power_prune() much faster.
  7. Simplify expressions before converting them to disjuncts (it speeds up building the disjuncts).
    (The code was ready for PR but then I changed Exp_struct before I sent it, and its conversion to the new struct turned out to be buggy so I need to work on it some more...).
  8. More power-pruning! It removed an additional ~5% of the disjuncts. (This new power pruning had worked but then I introduced a bug without committing the working code..., so again I need to continue debugging...).
  9. Rewritten post-processing, for drastic postprocessing speedup and drastically increasing the number of good linkages per linkage_limit.
  10. Tests for link-parser.
  11. Graphical link-parser (Python).
  12. Local hard costs (we need to discuss this).
  13. Segmentation according to the dict.
  14. Partial parsing infrastructure.
  15. Phantom word handling.
  16. Capitalization handling by dict definitions.

@linas
Member Author

linas commented Feb 28, 2023

Re tokenization speed: In one of my Atomese use-cases, on an older, slower machine, I see the following per-sentence performance:

  • 500 millisecs tokenization
  • 42 millisecs prepare-to-parse
  • 400 millisecs count
  • 1200 millisecs extract linkages

The above was obtained using sentences that are all exactly 12 words long. Dictionary lookup times are not included in the tokenization figure. Linkage limit = 15K.
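
For a rough sense of where the time goes, the four stage timings above can be summed and converted to shares of the total. This is only arithmetic on the quoted figures; the dictionary keys are shortened labels for the stages, not identifiers from the library.

```python
# Per-sentence stage timings quoted above, in milliseconds.
stage_ms = {
    "tokenization": 500,
    "prepare-to-parse": 42,
    "count": 400,
    "extract linkages": 1200,
}

total_ms = sum(stage_ms.values())

# Share of the total for each stage, as a percentage.
share_pct = {name: 100 * ms / total_ms for name, ms in stage_ms.items()}

for name, ms in stage_ms.items():
    print(f"{name:18s} {ms:5d} ms  {share_pct[name]:5.1f}%")
print(f"{'total':18s} {total_ms:5d} ms")
```

On these numbers, tokenization is a bit under a quarter of the roughly 2.1 sec spent per sentence, and linkage extraction is over half.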

@linas
Member Author

linas commented Mar 2, 2023

More about tokenization. With the Atomese dicts, the dict can grow after every sentence. Thus, I call condesc_setup(dict); after tokenization, before parsing. It took me two days to discover that this call takes about 1 sec at first, growing to 10 sec after a while. Thus, it accounts for 1/3 of the grand-total sentence time at first, rising to 80% after a while.

I need to find some way of doing what it does incrementally. Possibly by telling it exactly what expressions were added. -- fixed in #1459
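
A quick consistency check on those fractions: if the setup call takes t seconds and everything else for the sentence takes r seconds, its share is t / (t + r). The sketch below assumes r is about 2.1 sec, i.e. the sum of the stage timings from the earlier comment. That is an assumption, since the two measurements may not come from the same run, and `setup_share` is a hypothetical helper name.

```python
def setup_share(setup_sec: float, rest_sec: float) -> float:
    """Fraction of total sentence time spent in the setup call."""
    return setup_sec / (setup_sec + rest_sec)

# Assumed time for all other stages (500 + 42 + 400 + 1200 ms).
REST_SEC = 2.142

early = setup_share(1.0, REST_SEC)   # setup takes about 1 sec at first
late = setup_share(10.0, REST_SEC)   # and about 10 sec after a while

print(f"early: {early:.0%}, late: {late:.0%}")
```

Under that assumption the shares come out at about 32% and 82%, in line with the "1/3" and "80%" quoted above.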

linas changed the title from "Open work items for 5.12.1" to "Open work items for 5.12.2" on Mar 5, 2023
@linas
Member Author

linas commented Mar 5, 2023

I published version 5.12.1 -- I couldn't wait, certain automation scripts depend on the published tarballs.

@SoapGentoo
Contributor

hi @linas
I tried updating to 5.12.2 in Gentoo but am getting build failures:

In file included from /var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.cpp:1:
/var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.hpp:23:83: error: 'X_node' does not name a type
   23 |                     const std::vector<int>& er, const std::vector<int>& el, const X_node *w_xnode, Parse_Options opts)
      |                                                                                   ^~~~~~
In file included from /var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.cpp:1:
/var/tmp/portage/dev-libs/link-grammar-5.12.2/work/link-grammar-5.12.2/link-grammar/sat-solver/word-tag.hpp:82:9: error: 'X_node' does not name a type
   82 |   const X_node *word_xnode;
      |         ^~~~~~

which we haven't seen in 5.12.0

linas added a commit that referenced this issue Mar 11, 2023
@linas
Member Author

linas commented Mar 11, 2023

build failures:

I'm looking. Recommended fix is to disable the build of the sat-solver code. Since it's disabled by default, your build scripts must have turned it on. (Just run ../configure without any options.)

The recommendation is to disable, because the SAT parser is slower, in all situations, than the regular parser; in some cases, it is 10x or 20x slower. I've been considering deleting it permanently, although Amir convinced me that it can be fixed up. And so ... it's in limbo ...

@SoapGentoo If you are willing to carry patches, I just pushed a fix here: ffdf5d8

Otherwise, wait for 5.12.3 ... which might appear in a few weeks? (I have plans for "urgent" Atomese fixes which necessitate an LG release.)

linas changed the title from "Open work items for 5.12.2" to "Open work items for 5.12.3" on Mar 11, 2023
linas changed the title from "Open work items for 5.12.3" to "Open work items for 5.12.4" on Mar 24, 2023
@linas
Member Author

linas commented Mar 24, 2023

@SoapGentoo Version 5.12.3 is now out, with the fix you reported above.

@SoapGentoo
Contributor

@linas after confirming that 5.12.3 does indeed work, I proceeded to pass --disable-sat-solver to ./configure to disable the SAT solver, as per your recommendation. Thanks 👍

@linas
Member Author

linas commented Mar 25, 2023

Cool. OK. FWIW, the SAT solver is already disabled by default (configure.ac lines 365ff), so if it was on for you, then somehow you were carrying a config setting from long ago? Keep in mind that ./configure does not start with a clean state; it remembers flags from prior invocations. (This also reveals that my testing is incomplete.)

@SoapGentoo
Contributor

In general, we like to specify all options to ./configure, since it makes our configuration more robust to changes in default settings. In this case, --enable-sat-solver=bundled was added due to a conflict with the system minisat: https://bugs.gentoo.org/593662

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Mar 26, 2023
* Upstream recommends not using the sat solver anymore:
  opencog/link-grammar#1454 (comment)

Bug: https://bugs.gentoo.org/593662
Signed-off-by: David Seifert <[email protected]>
@linas
Member Author

linas commented Mar 26, 2023

Hm. OK. SAT was disabled to discourage its use. It is slower in all situations, sometimes by factors of 10x or 100x. Amir says that, in fact, this can be fixed up and repaired, which might make SAT faster than the regular parser, maybe.

Whether this is worth the effort or not depends mostly on future applications, rather than on the current situation. For the present English, Russian, Thai, etc. dictionaries, reviving SAT seems pointless: the current parser is good enough. However, I'm working with brand-new dicts which have a radically different structure and different performance profiles, and which make different demands on the parser. And for those, maybe the SAT parser could be faster or more space-efficient. Maybe, or maybe not. Unexplored.

linas pinned this issue on Apr 25, 2024
linas changed the title from "Open work items for 5.12.4" to "Open work items for 5.12.5" on Apr 25, 2024