20 Dec 20:32

javanna

8849540

10.1.0 Latest

New Features

Added IndexInput::isLoaded to determine whether the contents of an input are resident in physical memory.
FeatureField now supports storing term vectors.

Improvements

TieredMergePolicy now allows merging up to maxMergeAtOnce segments for merges below the floor segment size, even if maxMergeAtOnce is greater than segmentsPerTier. This makes it more efficient to configure TieredMergePolicy to merge segments aggressively by configuring a high value of floorSegmentSize (e.g. 64MB), a low value of segmentsPerTier (e.g. 4) and a high value of maxMergeAtOnce (e.g. 32).

Optimizations

Many speedups to top-k query evaluation, in particular: top-level disjunctions, filtered disjunctions, conjunctions, DisjunctionMaxQuery.
Speedup to exhaustive evaluation of conjunctive queries by vectorizing the intersection of postings lists.
Reduced contention for top-k query evaluation when IndexSearcher is configured with an executor.

13 Dec 11:23

ChrisHegarty

releases/lucene/9.12.1

7a97a05

9.12.1

Improvements

Allow easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for.
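
As a minimal sketch of the setting above, the property can be supplied on the command line (`-Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=<version>`) or set programmatically. The helper class below is hypothetical and uses only the JDK. It assumes the property has to be in place before any Lucene class that reads it is initialized, and that the running JVM's feature version is the value you want; neither is confirmed by the release notes.

```java
/** Hypothetical helper: raises the Panama vectorization version ceiling to the running JVM. */
final class VectorizationProperty {
    // Property name as given in the 9.12.1 release notes.
    static final String UPPER_JAVA_FEATURE_VERSION =
        "org.apache.lucene.vectorization.upperJavaFeatureVersion";

    /** Sets the property to the feature version of the running JVM and returns the value set. */
    static String allowCurrentJvm() {
        // Runtime.version().feature() is the major Java version, e.g. 21 or 23.
        String feature = Integer.toString(Runtime.version().feature());
        System.setProperty(UPPER_JAVA_FEATURE_VERSION, feature);
        return feature;
    }

    public static void main(String[] args) {
        // Assumption: call this before touching any Lucene class.
        System.out.println(UPPER_JAVA_FEATURE_VERSION + "=" + allowCurrentJvm());
    }
}
```

Passing the flag at JVM startup is likely the safer route, since it does not depend on class initialization order.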

Bug fixes

Fixed a backwards compatibility bug that caused sparse KNN indices (where not all documents have a vector) written with 9.0.0 to silently return very poor recall, without throwing an exception, when searched by any 9.x release.
Improved Tessellator logic for the case where two holes share the same vertex with the polygon, which previously caused failures on valid polygons.
Fixed a backwards compatibility bug that caused 9.12.0 to incorrectly throw IllegalStateException when opening an IndexReader on an index created with quantized (int4, int7, int8) KNN vectors using Lucene99HnswScalarQuantizedVectorsFormat.

14 Oct 13:02

javanna

releases/lucene/10.0.0

eadc07c

10.0.0

System requirements

Lucene 10.0 requires JDK 21 or newer

API changes

KNN vector values now have a random-access API.
Deprecated APIs have been removed and a number of API changes have been made. Please consult the migration guide for an extensive list of changes and the actions required to migrate to 10.0.

New Features

A new IndexInput#prefetch API has been added, allowing query evaluation logic to let the Directory know about regions of data that are about to be read. This helps perform I/O concurrently under the hood. MMapDirectory implements this API using the madvise system call and the MADV_WILLNEED flag on Linux and Mac OS.
Lucene now supports sparse indexing on doc values via FieldType#setDocValuesSkipIndexType. The sparse index will record the minimum and maximum values per block of doc IDs. Used in conjunction with index sorting to cluster similar documents together, this allows for very space-efficient and CPU-efficient filtering.
Search concurrency is now decoupled from the index geometry, so that an index can be searched using any number of threads, regardless of its number of segments.
K-means clustering on vectors has been added.
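
To illustrate why the sparse doc-values index described above pairs well with index sorting, here is a toy sketch of a per-block min/max index. It is not Lucene's implementation or on-disk format: the class name, the fixed block size, and the in-memory arrays are all assumptions made for the example. When similar values are clustered together, most blocks fall entirely outside or entirely inside the query range, so their values never need to be read.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy per-block min/max index over one numeric value per doc ID (hypothetical, not Lucene's format). */
final class MinMaxSkipIndex {
    private final long[] values;   // values[doc] is the doc value of document `doc`
    private final int blockSize;   // number of consecutive doc IDs per block
    private final long[] blockMin;
    private final long[] blockMax;
    /** Number of blocks whose individual values were read by the last call to range(). */
    int blocksScanned;

    MinMaxSkipIndex(long[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        blockMin = new long[numBlocks];
        blockMax = new long[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                min = Math.min(min, values[doc]);
                max = Math.max(max, values[doc]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    /** Returns the doc IDs whose value lies in [lo, hi]. */
    List<Integer> range(long lo, long hi) {
        List<Integer> hits = new ArrayList<>();
        blocksScanned = 0;
        for (int b = 0; b < blockMin.length; b++) {
            if (blockMax[b] < lo || blockMin[b] > hi) {
                continue; // no document in this block can match: skip it entirely
            }
            int start = b * blockSize;
            int end = Math.min(values.length, start + blockSize);
            // If the whole block lies inside the range, every doc matches without reading values.
            boolean allMatch = blockMin[b] >= lo && blockMax[b] <= hi;
            if (!allMatch) {
                blocksScanned++;
            }
            for (int doc = start; doc < end; doc++) {
                if (allMatch || (values[doc] >= lo && values[doc] <= hi)) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }
}
```

With sorted input such as `{1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23}` and a block size of 4, a query for `[10, 13]` is answered from the block summaries alone.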

Improvements

Lucene now opens files with the MADV_RANDOM advice by default on Linux and Mac OS. This results in better efficiency for indexes that exceed the size of the page cache, but can make it slower to load indexes into the page cache. To revert to MADV_NORMAL as the default read advice, pass -Dorg.apache.lucene.store.defaultReadAdvice=NORMAL as a JVM startup flag.
Snowball dictionaries have been upgraded, resulting in improved tokenization. This may require reindexing to ensure consistency of search results with pre-10.0 indexes.
The expressions module now uses MethodHandles and Dynamic Class-File Constants (JEP 309) in combination with hidden classes (JEP 371) to implement strict and type-safe calls to external functions. This makes it easier to extend expressions with custom functions in a secure way, because runtime linking of custom functions is no longer the responsibility of the expressions scripting engine. In addition, the hidden classes created by the expressions engine no longer suffer from global classloader locks.

... plus a multitude of helpful bug fixes!

28 Sep 20:19

ChrisHegarty

releases/lucene/9.12.0

e913796

9.12.0

Security Fixes

Deserialization of Untrusted Data vulnerability in Apache Lucene Replicator - CVE-2024-45772

New Features

Improve intra-merge parallelism for many value types. (Ben Trent)
Add support for JDK 23 to the Panama Vectorization Provider. (Chris Hegarty)

Improvements

Add Intervals.regexp and Intervals.range methods to produce IntervalsSource for regexp and range queries. (Mayya Sharipova)
Remove support for writing 8 bit scalar vector quantization. 4 and 7 bit quantization are still supported. (Michael McCandless)

Optimizations

Inline postings skip data to improve performance of queries that need skipping such as conjunctions. (Adrien Grand)
Optimizations to the decoding logic of blocks of postings. (Adrien Grand, Uwe Schindler, Greg Miller)
Avoid performance degradation when closing shared mapped segment data. (Chris Hegarty, Michael Gibney, Uwe Schindler)

... plus a multitude of helpful bug fixes!

27 Jun 13:46

iverase

releases/lucene/9.11.1

0c087df

9.11.1

Bug Fixes

Fix performance regression in NumericComparator.
Remove intra-merge parallelism for everything except HNSW graph merges.
Fix bug that prevented adding a parent field to an index with no fields.
Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by unordered matches.
StringValueFacetCounts stops throwing NPE when faceting over an empty match-set.

06 Jun 14:29

benwtrent

releases/lucene/9.11.0

d433394

9.11.0

New features

Add support for posix_madvise to MMapDirectory: when running on Linux/macOS with Java 21 or later, MMapDirectory uses IOContext to pass suitable MADV flags to the operating system kernel. This may improve paging behavior, especially when working with large indexes under memory pressure.
Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.
Recursive graph bisection is now supported on indexes that have blocks.
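
The 50% figure quoted above for compressed 4-bit vectors follows from storing two 4-bit values in each byte instead of one. The sketch below shows that arithmetic with a hypothetical `Int4Packing` class. Placing even-indexed values in the high nibble is an assumption chosen for readability, and Lucene's actual layout may differ.

```java
/** Hypothetical sketch: packs two 4-bit quantized values into each byte. */
final class Int4Packing {

    /** Packs values in [0, 15] two per byte; an odd-length input leaves the last low nibble as 0. */
    static byte[] pack(byte[] quantized) {
        byte[] packed = new byte[(quantized.length + 1) / 2];
        for (int i = 0; i < quantized.length; i++) {
            int v = quantized[i];
            if (v < 0 || v > 15) {
                throw new IllegalArgumentException("value out of 4-bit range: " + v);
            }
            // Even indexes go to the high nibble, odd indexes to the low nibble.
            int shift = (i & 1) == 0 ? 4 : 0;
            packed[i >> 1] |= (byte) (v << shift);
        }
        return packed;
    }

    /** Restores `length` 4-bit values from a packed array. */
    static byte[] unpack(byte[] packed, int length) {
        byte[] quantized = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = packed[i >> 1] & 0xFF; // read the byte as unsigned
            quantized[i] = (byte) ((i & 1) == 0 ? b >>> 4 : b & 0x0F);
        }
        return quantized;
    }
}
```

A 1024-dimension vector quantized to 4 bits therefore occupies 512 bytes when packed, rather than 1024 bytes with one value per byte.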

Improvements

MergeScheduler can now provide an executor for intra-merge parallelism. The first implementation is the ConcurrentMergeScheduler.
Upgrade icu4j to version 74.2.

Optimizations

Use RWLock to access LRUQueryCache to reduce contention.
Speedup multi-segment HNSW graph search for diversifying child kNN queries.
Add a MemorySegment Vector scorer - for scoring without copying on-heap. This can improve search latency by almost 2x for byte vectors.
Switch to using optimized, primitive collections where possible to improve performance and heap utilization.

Full Changelog: releases/lucene/9.10.0...releases/lucene/9.11.0

20 Feb 17:21

jpountz

releases/lucene/9.10.0

695c0ac

9.10.0

New Features

Support for similarity-based vector searches, i.e., finding all neighbors whose similarity to a query vector is greater than a configured threshold. See [Byte|Float]VectorSimilarityQuery.
Index sorting is now compatible with block joins. See IndexWriterConfig#setParentField.
MMapDirectory now takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later). This was only supported with Java 19 to 21 until now.
SIMD vectorization now takes advantage of JDK vector incubator on Java 22. This was only supported with Java 20 or 21 until now.
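
To make the semantics of the similarity-based search in the first item concrete: unlike top-k search, the number of hits is not fixed in advance, because every vector above the threshold is returned. The brute-force sketch below only illustrates that contract. It is not how the Lucene queries are implemented, and the class name and the choice of dot product as the similarity are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force illustration of threshold-based vector search. */
final class ThresholdVectorSearch {

    /** Dot product of two vectors of equal length. */
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns the indexes of all vectors whose similarity to the query is greater than the threshold. */
    static List<Integer> search(float[][] vectors, float[] query, float threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            if (dot(vectors[doc], query) > threshold) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Raising the threshold shrinks the result set, possibly to empty, which is the main behavioral difference from a k-nearest-neighbor query.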

Optimizations

Tail postings are now encoded using group-varint. This yielded speedups on queries that match lots of terms that have short postings lists in Lucene's nightly benchmarks.
Range queries on points now exit earlier when evaluating a segment that has no matches. This will improve performance when intersected with other queries that have a high up-front cost such as multi-term queries.
BooleanQueries that mix SHOULD and FILTER clauses now propagate minimum competitive scores to the SHOULD clauses, yielding significant speedups for top-k queries sorted by descending score.
IndexSearcher#count has been optimized on pure disjunctions of two term queries.
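
For readers unfamiliar with group-varint, mentioned in the first optimization above, the idea is to encode integers four at a time: one flag byte holds the byte length of each of the four values (2 bits each), followed by the values themselves with leading zero bytes dropped. Decoding avoids the per-byte continuation check of a classic varint. The sketch below is a generic illustration with a hypothetical `GroupVarint` class. The flag bit order and little-endian value bytes are assumptions, and it does not reproduce Lucene's postings format.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of group-varint encoding for groups of four ints. */
final class GroupVarint {

    /** Number of bytes (1 to 4) needed to store v, treating it as unsigned. */
    static int byteLength(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    /** Encodes values in groups of four; values.length must be a multiple of 4. */
    static byte[] encode(int[] values) {
        if (values.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i += 4) {
            // Flag byte: (length - 1) of each value in 2 bits, first value in the top bits.
            int flag = 0;
            for (int j = 0; j < 4; j++) {
                flag |= (byteLength(values[i + j]) - 1) << (6 - 2 * j);
            }
            out.write(flag);
            // Value bytes, least significant byte first.
            for (int j = 0; j < 4; j++) {
                int v = values[i + j];
                int n = byteLength(v);
                for (int k = 0; k < n; k++) {
                    out.write(v & 0xFF);
                    v >>>= 8;
                }
            }
        }
        return out.toByteArray();
    }

    /** Decodes `count` values (a multiple of 4) produced by encode(). */
    static int[] decode(byte[] in, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i += 4) {
            int flag = in[pos++] & 0xFF;
            for (int j = 0; j < 4; j++) {
                int n = ((flag >>> (6 - 2 * j)) & 3) + 1;
                int v = 0;
                for (int k = 0; k < n; k++) {
                    v |= (in[pos++] & 0xFF) << (8 * k);
                }
                values[i + j] = v;
            }
        }
        return values;
    }
}
```

Small values dominate the savings: a group of four values that each fit in one byte takes 5 bytes instead of 16.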

29 Jan 15:33

ChrisHegarty

releases/lucene/9.9.2

a293978

9.9.2

Lucene 9.9.2 release

16 Dec 23:00

ChrisHegarty

releases/lucene/9.9.1

eee32cb

9.9.1

Lucene 9.9.1 release

04 Dec 14:42

ChrisHegarty

releases/lucene/9.9.0

06070c0

9.9.0

Lucene 9.9.0 release

Full Changelog: releases/lucene/9.8.0...releases/lucene/9.9.0

Releases: apache/lucene
