GitHub - petrduda/kontext: An alternative web front-end for the Manatee-open corpus search engine

Introduction

KonText is an advanced corpus query interface for the Manatee-open corpus search engine. It builds on core server-side libraries from NoSketchEngine, and the two applications are data-compatible. Development is maintained by the Institute of the Czech National Corpus.

Features

new features

fully editable query chain
- any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed; the whole sequence is then re-executed
support for spoken corpora
- defined concordance segments can be played back as audio
- KWIC detail provides a custom rendering with easily distinguishable speeches
support for user-defined line groups
- users can attach custom numeric tags to concordance lines, filter out other lines, and review the ratios between groups
improved subcorpus creation
- users can examine corpus structure by selecting some text types and seeing how the availability of other text type attributes changes ("which publishers are there in case only fiction is selected?")
- a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
- a sub-corpus can be created by a custom CQL expression
- subcorpora are backed up as CQL queries which makes further modification/restoring possible
frequency distribution
- 2-dimensional frequency distribution for both positional and structural attributes
- result caching decreases time required to navigate between pages
- on the multilevel frequency distribution page, the starting word can be specified for multi-word KWICs
persistent URLs for large queries - you can send a link to someone even if the query is megabytes in size
access to previous queries, named queries
access to favorite corpora (subcorpora, aligned corpora)
interactive PoS tag tool - for positional PoS tag formats, an interactive tool can be used to write tag queries
a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
a correct i.p.m. (i.e. one calculated only from the selected text types) can be calculated on demand for ad-hoc subcorpora
result shuffling can be pre-set
fewer full page reloads
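
To illustrate why the i.p.m. (instances per million tokens) mentioned above depends on the selected text types, here is a minimal sketch. It is not KonText code; the function name `ipm` and all the numbers are made up for the example. The only assumption is the usual definition of i.p.m. as the absolute frequency scaled to a corpus of one million tokens.

```python
def ipm(abs_freq, num_tokens):
    """
    Return instances per million tokens.

    abs_freq   -- number of hits (absolute frequency)
    num_tokens -- size, in tokens, of the text the hits were found in
    """
    if num_tokens <= 0:
        raise ValueError('num_tokens must be positive')
    # multiply first so that the division is the only rounding step
    return abs_freq * 1e6 / num_tokens


# Hypothetical figures: 150 hits found in an ad-hoc selection of text types
hits = 150
whole_corpus_size = 100000000   # tokens in the whole corpus (made up)
selection_size = 2000000        # tokens in the selected text types (made up)

# Dividing by the whole corpus understates the relative frequency ...
ipm_whole = ipm(hits, whole_corpus_size)
# ... while dividing by the selection size gives the "correct" value
ipm_selection = ipm(hits, selection_size)
```

With these made-up figures the two values differ by a factor of 50, which is the ratio between the whole corpus and the selection.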

internal changes

server-side rewritten as a WSGI application (Bonito-open is CGI-based)
completely rewritten client-side code (React+Flux architecture, TypeScript + ES6, modularized)
modular code design with dynamically loadable plug-ins providing custom functionality implementation (e.g. custom database adapters, authentication method, corpus listing widgets, HTTP session management)
fully decoupled background concordance/frequency/collocation calculation based on the Celery task queue (alternatively, Python's multiprocessing package can be used)
improved logging, error processing and debugging support
improved code documentation
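
The idea of dynamically loadable plug-ins can be sketched as follows. This is an illustration only, not the actual KonText loader: the function `load_plugin`, the factory name `create_instance`, the class `DummyAuth` and the module name `demo_auth_plugin` are all assumptions made for the example. The code is written to run on both Python 2.7 and Python 3.

```python
import importlib
import sys
import types


def load_plugin(module_name, conf):
    """
    Import a plug-in module by its name and build the plug-in object.

    Assumption (hypothetical convention): each plug-in module exposes
    a 'create_instance(conf)' factory function.
    """
    module = importlib.import_module(module_name)
    factory = getattr(module, 'create_instance', None)
    if not callable(factory):
        raise ImportError(
            '%s does not define create_instance()' % module_name)
    return factory(conf)


# --- demo: a throw-away in-memory module acting as a plug-in ---

class DummyAuth(object):
    """Hypothetical authentication plug-in used only for the demo."""

    def __init__(self, conf):
        self.anonymous_user = conf.get('anonymous_user', 'guest')


demo_module = types.ModuleType('demo_auth_plugin')
demo_module.create_instance = lambda conf: DummyAuth(conf)
# registering the module makes it importable by name
sys.modules['demo_auth_plugin'] = demo_module

# the module name would normally come from a configuration file
auth = load_plugin('demo_auth_plugin', {'anonymous_user': 'anonymous'})
```

Because the application refers to a plug-in only by a module name taken from configuration, an implementation (e.g. a different database adapter) can be swapped without touching the calling code.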

Requirements

a WSGI-compatible server
- recommended setup: Gunicorn + a reverse proxy (e.g. Nginx or Apache2)
- supported setup: Apache2 with mod_wsgi
Python 2.7 and:
- Cheetah Template Engine
- lxml library
- werkzeug library (provides WSGI middleware)
- PyICU library (optional but preferred)
- markdown library (optional, for formatted corpora references)
- openpyxl library (optional, for XLSX export)
corpus search engine Manatee
- versions from 2.83.3 to 2.150 are supported (the latest one is highly recommended); unless there is an incompatible change in Manatee, newer versions should work too
a key-value storage
- any custom implementation (Redis and SQLite backends are available by default)
(optional) Celery task queue for asynchronous background calculations and maintenance tasks
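
To give an idea of what a custom key-value storage implementation might involve, here is a minimal sketch built on Python's standard `sqlite3` module. It does not reproduce the interface KonText actually expects from a storage backend: the class name `SqliteKeyValueStore`, its methods, the table layout and the sample keys are assumptions made for the example.

```python
import json
import sqlite3


class SqliteKeyValueStore(object):
    """Hypothetical key-value store backed by a single SQLite table."""

    def __init__(self, path=':memory:'):
        # ':memory:' keeps the demo self-contained; a real setup
        # would point to a database file
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS data '
            '(key TEXT PRIMARY KEY, value TEXT)')

    def set(self, key, value):
        # values are serialized to JSON so dicts and lists can be stored
        self._conn.execute(
            'INSERT OR REPLACE INTO data (key, value) VALUES (?, ?)',
            (key, json.dumps(value)))
        self._conn.commit()

    def get(self, key, default=None):
        row = self._conn.execute(
            'SELECT value FROM data WHERE key = ?', (key,)).fetchone()
        return json.loads(row[0]) if row is not None else default

    def remove(self, key):
        self._conn.execute('DELETE FROM data WHERE key = ?', (key,))
        self._conn.commit()


store = SqliteKeyValueStore()
# made-up key and value
store.set('favorites:user1', {'corpora': ['corpus_a', 'corpus_b']})
favorites = store.get('favorites:user1')
```

A Redis-based variant could offer the same three methods, so the rest of the application would not need to know which backend is in use.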
