KonText is an advanced corpus query interface for the Manatee-open corpus search engine. It builds on the core server-side libraries of NoSketchEngine, and the two applications are also data-compatible. Development is maintained by the Institute of the Czech National Corpus.

Main features:
- fully editable query chain
  - any operation in a user-defined sequence (e.g. query -> filter -> sample -> sort) can be changed, after which the whole sequence is re-executed
- support for spoken corpora
  - defined concordance segments can be played back as audio
  - the KWIC detail view provides a custom rendering in which individual speeches are easy to distinguish
- support for user-defined line groups
  - users can attach custom numeric tags to concordance lines, filter out lines outside the selected groups and review group ratios
- improved subcorpus creation
  - users can explore the corpus structure by selecting some text types and seeing how the availability of other text type attributes changes ("which publishers are there if only fiction is selected?")
  - a custom text type ratio can be defined ("give me 20% fiction and 80% journalism")
  - a subcorpus can be created from a custom CQL expression
  - subcorpora are backed up as CQL queries, which makes further modification or restoring possible
- frequency distribution
  - 2-dimensional frequency distribution for both positional and structural attributes
  - result caching reduces the time needed to navigate between pages
  - on the multilevel frequency distribution page, a starting word can be specified for multi-word KWICs
- persistent URLs for large queries: a link can be shared even if the query itself is megabytes in size
- access to previous queries and to named queries
- access to favorite corpora (subcorpora, aligned corpora)
- interactive PoS tag tool: for positional PoS tag formats, an interactive tool helps compose tag queries
- a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
- for ad-hoc subcorpora, a correct i.p.m. (i.e. one calculated only from the selected text types) can be computed on demand
- result shuffling can be pre-set
- fewer full page reloads
- server-side code rewritten as a WSGI application (Bonito-open is CGI-based)
- completely rewritten client-side code (React+Flux architecture, TypeScript + ES6, modularized)
- modular code design with dynamically loadable plug-ins providing custom implementations of specific functionality (e.g. database adapters, authentication methods, corpus listing widgets, HTTP session management)
- fully decoupled background concordance/frequency/collocation calculation based on the Celery task queue (alternatively, Python's multiprocessing package can be used)
- improved logging, error processing and debugging support
- improved code documentation
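The fully editable query chain can be thought of as an ordered list of operations that is replayed from the start whenever one step changes. The following is a minimal Python sketch of that idea, not KonText's actual implementation: `Operation`, `run_chain()` and `replace_step()` are hypothetical names, and concordance lines are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a concordance is simply a list of text lines.
Lines = List[str]


@dataclass
class Operation:
    name: str                        # e.g. "query", "filter", "sample", "sort"
    apply: Callable[[Lines], Lines]  # transforms the result of the previous step


def run_chain(corpus: Lines, chain: List[Operation]) -> Lines:
    """Re-execute every operation in order, starting from the raw data."""
    result = corpus
    for op in chain:
        result = op.apply(result)
    return result


def replace_step(chain: List[Operation], index: int, new_op: Operation) -> List[Operation]:
    """Return a copy of the chain with one step swapped for another."""
    return chain[:index] + [new_op] + chain[index + 1:]


corpus = ["the cat sat", "a dog ran", "a bird flew", "the dog sat", "a cat ran"]
chain = [
    Operation("query", lambda ls: [line for line in ls if "cat" in line or "dog" in line]),
    Operation("filter", lambda ls: [line for line in ls if "sat" in line]),
    Operation("sort", sorted),
]
print(run_chain(corpus, chain))   # ['the cat sat', 'the dog sat']

# Edit only the filter step; the whole chain is then replayed.
edited = replace_step(chain, 1, Operation("filter", lambda ls: [line for line in ls if "ran" in line]))
print(run_chain(corpus, edited))  # ['a cat ran', 'a dog ran']
```

Replaying the whole chain is the simplest strategy to reason about; caching intermediate results would be an optimisation on top of it.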
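User-defined line groups essentially attach a numeric tag to each concordance line. The hypothetical helpers below (illustrative names, not KonText's API) sketch two of the operations the feature list mentions: keeping only the lines from selected groups and reviewing the ratio of each group.

```python
from collections import Counter
from typing import Dict, Iterable, List


def filter_groups(groups: Dict[int, int], keep: Iterable[int]) -> List[int]:
    """Return IDs of lines whose group tag is in `keep`; all other lines are dropped."""
    wanted = set(keep)
    return sorted(line_id for line_id, tag in groups.items() if tag in wanted)


def group_ratios(groups: Dict[int, int]) -> Dict[int, float]:
    """Return the share of tagged lines belonging to each group."""
    counts = Counter(groups.values())
    total = sum(counts.values())
    return {tag: n / total for tag, n in sorted(counts.items())}


# Line ID -> numeric group tag chosen by the user (made-up data).
groups = {0: 1, 1: 2, 2: 1, 3: 3, 4: 1}
print(filter_groups(groups, [1]))  # [0, 2, 4]
print(group_ratios(groups))        # {1: 0.6, 2: 0.2, 3: 0.2}
```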
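A custom text type ratio such as "20% fiction and 80% journalism" has to be turned into concrete sizes. The sketch below shows one simple way to do that, assuming the only constraint is the number of tokens available per text type; `mix_budgets()` is a hypothetical name, and KonText's own subcorpus creation may use a different method. Integer percentages keep the arithmetic exact.

```python
from typing import Dict


def mix_budgets(available: Dict[str, int], percent: Dict[str, int]) -> Dict[str, int]:
    """Largest per-text-type token counts that follow the requested percentages.

    `available` maps a text type value to the number of tokens it contains,
    `percent` maps the same values to the requested share (summing to 100).
    """
    if sum(percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # The text type that is scarcest relative to its share limits the total size.
    total = min(available[tt] * 100 // share for tt, share in percent.items() if share > 0)
    return {tt: total * share // 100 for tt, share in percent.items()}


available = {"fiction": 50_000, "journalism": 120_000}  # made-up token counts
print(mix_budgets(available, {"fiction": 20, "journalism": 80}))
# {'fiction': 30000, 'journalism': 120000}
```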
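A 2-dimensional frequency distribution is a cross-tabulation of two attributes over all hits, for example a positional attribute such as a lemma against a structural attribute such as a document genre. A minimal sketch with made-up data (the attribute values and helper names are illustrative only):

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Made-up hits: (positional attribute value, structural attribute value),
# e.g. a lemma and the genre of the document the hit occurs in.
hits: List[Pair] = [
    ("cat", "fiction"), ("cat", "news"), ("dog", "fiction"),
    ("cat", "fiction"), ("dog", "news"), ("dog", "news"),
]


def freq_2d(pairs: List[Pair]) -> Dict[Pair, int]:
    """Count how often each (row value, column value) combination occurs."""
    return dict(Counter(pairs))


def as_table(counts: Dict[Pair, int], rows: List[str], cols: List[str]) -> List[List[int]]:
    """Lay the counts out as a rows x cols matrix, filling gaps with zero."""
    return [[counts.get((r, c), 0) for c in cols] for r in rows]


counts = freq_2d(hits)
rows = sorted({r for r, _ in counts})
cols = sorted({c for _, c in counts})
print("", *cols, sep="\t")
for r, line in zip(rows, as_table(counts, rows, cols)):
    print(r, *line, sep="\t")
```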
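Persistent URLs for large queries are possible when the link carries only a short key under which the query is stored on the server, not the query itself. The sketch below illustrates that idea under stated assumptions: a Python dict stands in for a key-value storage, and the URL layout, the `q` parameter and the helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse

# A plain dict stands in for a key-value storage.
store: Dict[str, str] = {}


def save_query(query: dict) -> str:
    """Persist a query of any size and return a short key derived from its content."""
    payload = json.dumps(query, sort_keys=True)
    # Truncating the digest keeps links short at the cost of a small collision risk.
    key = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    store[key] = payload
    return key


def make_url(key: str) -> str:
    """Only the short key travels in the link (hypothetical URL layout)."""
    return "https://kontext.example.org/view?" + urlencode({"q": key})


def load_query(url: str) -> dict:
    """Resolve a link back to the full stored query."""
    key = parse_qs(urlparse(url).query)["q"][0]
    return json.loads(store[key])


big_query = {"corpus": "example_corpus", "cql": '[word="a"]' * 100_000}
url = make_url(save_query(big_query))
print(len(url), len(big_query["cql"]))  # a short URL versus a ~1 MB query
```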
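For positional tag formats, where each character position of a tag encodes one grammatical category (as in Czech tags such as `NNFS1-----A----`), an interactive tag tool essentially turns per-position selections into a regular expression for the tag attribute. The hypothetical helper below sketches that step; it is not KonText's code, the attribute name `tag` is an assumption, and the selected values are arbitrary.

```python
import re
from typing import Dict, Iterable


def build_tag_regex(length: int, selection: Dict[int, Iterable[str]]) -> str:
    """Build a regex matching positional tags of the given length.

    `selection` maps a 0-based position to the allowed single-character values;
    positions without a selection match any character.
    """
    parts = []
    for pos in range(length):
        values = sorted(set(selection.get(pos, ())))
        if any(len(v) != 1 or not v.isalnum() for v in values):
            raise ValueError(f"position {pos}: values must be single alphanumeric characters")
        if not values:
            parts.append(".")
        elif len(values) == 1:
            parts.append(values[0])
        else:
            parts.append("[" + "".join(values) + "]")
    return "".join(parts)


# Hypothetical selection: "N" at position 0 and either "1" or "4" at position 4.
pattern = build_tag_regex(15, {0: ["N"], 4: ["1", "4"]})
print(f'[tag="{pattern}"]')                            # a CQL query using the pattern
print(bool(re.fullmatch(pattern, "NNFS1-----A----")))  # True
```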
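The i.p.m. (instances per million) value is the number of hits divided by the size of the searched data, multiplied by one million. For an ad-hoc subcorpus, the correct denominator is the size of the selected text types rather than the size of the whole corpus. A small sketch with made-up sizes shows how much the two can differ:

```python
def ipm(hits: int, size_in_tokens: int) -> float:
    """Instances per million tokens; multiplying first avoids an extra rounding step."""
    return hits * 1_000_000 / size_in_tokens


hits = 150                  # hits found within the selected text types
corpus_size = 100_000_000   # whole corpus (made-up size)
selection_size = 2_000_000  # only the selected text types (made-up size)

print(ipm(hits, corpus_size))     # 1.5 (misleading: divides by the whole corpus)
print(ipm(hits, selection_size))  # 75.0 (relative to the ad-hoc subcorpus)
```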
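The plug-in design means that concrete implementations (database adapters, authentication and so on) are selected by configuration and imported at run time. Below is a minimal sketch of that pattern with Python's `importlib`, assuming each plug-in module exposes a factory function; the `create_instance` name, the configuration format and the demo module are hypothetical and do not describe KonText's actual plug-in contract.

```python
import importlib
import sys
import types
from typing import Any, Dict


def load_plugins(conf: Dict[str, str]) -> Dict[str, Any]:
    """Import each configured module and call its (hypothetical) create_instance() factory."""
    instances: Dict[str, Any] = {}
    for slot, module_path in conf.items():
        module = importlib.import_module(module_path)
        instances[slot] = module.create_instance(slot)
    return instances


class DummyAuth:
    """Toy authentication plug-in used only for this demo."""

    def __init__(self, slot: str) -> None:
        self.slot = slot

    def authenticate(self, user: str) -> bool:
        return user == "guest"


# Register an in-memory module so the example runs without plug-in files on disk.
demo_module = types.ModuleType("demo_auth")
demo_module.create_instance = DummyAuth  # factory: slot name -> plug-in object
sys.modules["demo_auth"] = demo_module

plugins = load_plugins({"auth": "demo_auth"})
print(plugins["auth"].authenticate("guest"))  # True
```

Swapping an implementation then only requires pointing the configuration at a different module.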

Requirements:

- a WSGI-compatible server
- Python 2.7 and:
  - the Manatee corpus search engine
    - versions 2.83.3 to 2.150 are supported (the latest version is highly recommended); newer versions should work too unless Manatee introduces an incompatible change
  - a key-value storage
  - (optional) the Celery task queue for (asynchronous) background calculations and maintenance tasks
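The supported Manatee range (2.83.3 to 2.150) is easy to check incorrectly, because comparing version strings character by character puts "2.150" before "2.83.3". A hypothetical helper that compares the numeric components instead; how the installed version string is obtained is not shown and is assumed to be known:

```python
from typing import Tuple

MIN_SUPPORTED = "2.83.3"
MAX_LISTED = "2.150"


def parse_version(version: str) -> Tuple[int, ...]:
    """'2.83.3' -> (2, 83, 3), so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_manatee(version: str) -> str:
    v = parse_version(version)
    if v < parse_version(MIN_SUPPORTED):
        return "unsupported"
    if v <= parse_version(MAX_LISTED):
        return "supported"
    return "newer than listed; should work unless Manatee changed incompatibly"


print("2.150" < "2.83.3")       # True: plain string comparison gets this wrong
print(check_manatee("2.150"))   # supported
print(check_manatee("2.83.2"))  # unsupported
```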

For installation details, please refer to the doc/INSTALL.md file.

For further documentation, please refer to our Wiki.