
Releases: apache/lucene

10.1.0

20 Dec 20:32

New Features

  • Add IndexInput::isLoaded to determine whether the contents of an input are resident in physical memory.
  • FeatureField now supports storing term vectors.

Improvements

  • TieredMergePolicy now allows merging up to maxMergeAtOnce segments for merges below the floor segment size, even if maxMergeAtOnce is greater than segmentsPerTier. This makes it more efficient to configure TieredMergePolicy to merge segments aggressively by configuring a high value of floorSegmentSize (e.g. 64MB), a low value of segmentsPerTier (e.g. 4) and a high value of maxMergeAtOnce (e.g. 32).

Optimizations

  • Many speedups to top-k query evaluation, in particular for top-level disjunctions, filtered disjunctions, conjunctions, and DisjunctionMaxQuery.
  • Speedup to exhaustive evaluation of conjunctive queries by vectorizing the intersection of postings lists.
  • Reduced contention for top-k query evaluation when IndexSearcher is configured with an executor.
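The conjunction speedup above concerns intersecting sorted postings lists. As a point of reference, a plain scalar two-pointer intersection of two sorted, duplicate-free doc ID arrays might look like the sketch below. It is not Lucene's vectorized implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Scalar sketch of intersecting two sorted postings lists (hypothetical names). */
final class PostingsIntersection {

    /** Returns the doc IDs present in both sorted, duplicate-free arrays. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i]; // doc matches both terms
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++; // advance whichever list is behind
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] term1 = {1, 3, 5, 8, 13, 21};
        int[] term2 = {2, 3, 8, 13, 34};
        System.out.println(Arrays.toString(intersect(term1, term2))); // prints [3, 8, 13]
    }
}
```

Each comparison advances at least one pointer, so the work is linear in the combined list lengths; vectorization speeds up this inner comparison loop rather than changing its result.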

9.12.1

13 Dec 11:23

Improvements

  • Allow easier configuration of the Panama vectorization provider with newer Java versions. Set the org.apache.lucene.vectorization.upperJavaFeatureVersion system property to increase the set of Java versions that Panama vectorization will provide optimized implementations for.

Bug Fixes

  • Fixed a backwards compatibility bug that caused sparse KNN indices (where not all documents have a vector) written with 9.0.0 to silently return terrible recall, with no exception, when searched by any 9.x release.
  • Improve Tessellator logic when two holes share the same vertex with the polygon, which was causing failures on valid polygons.
  • Fix a backwards compatibility bug that caused 9.12.0 to incorrectly throw IllegalStateException when trying to open an IndexReader on an index created with quantized (int4, int7, int8) KNN vectors using Lucene99HnswScalarQuantizedVectorsFormat.

10.0.0

14 Oct 13:02

System requirements

  • Lucene 10.0 requires JDK 21 or newer

API changes

  • KNN vector values now have a random-access API.
  • Deprecated APIs have been removed and a number of API changes have been made. Please consult the migration guide for an extensive list of changes and the actions to take to migrate to 10.0.

New Features

  • A new IndexInput#prefetch API has been added, allowing query evaluation logic to let the Directory know about regions of data that are about to be read. This helps perform I/O concurrently under the hood. MMapDirectory implements this API using the madvise system call and the MADV_WILLNEED flag on Linux and Mac OS.
  • Lucene now supports sparse indexing on doc values via FieldType#setDocValuesSkipIndexType. The sparse index will record the minimum and maximum values per block of doc IDs. Used in conjunction with index sorting to cluster similar documents together, this allows for very space-efficient and CPU-efficient filtering.
  • Search concurrency is now decoupled from the index geometry, so that an index can be searched using any number of threads, regardless of its number of segments.
  • K-means clustering on vectors.
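The sparse doc-values index can be pictured with a toy model: recording the minimum and maximum value per block of doc IDs lets a range filter skip whole blocks that cannot match, which pays off most when index sorting clusters similar values together. The block size, class, and method names below are assumptions for illustration, not Lucene's on-disk format or API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy per-block min/max "skip index" over a dense array of doc values (hypothetical names). */
final class SkipIndexSketch {
    private final long[] values;   // values[doc] = doc value of that doc ID
    private final int blockSize;   // docs per block (an assumption, not Lucene's setting)
    private final long[] blockMin;
    private final long[] blockMax;
    private int lastSkippedBlocks;

    SkipIndexSketch(long[] values, int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.values = values;
        this.blockSize = blockSize;
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        this.blockMin = new long[numBlocks];
        this.blockMax = new long[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int doc = b * blockSize; doc < end(b); doc++) {
                min = Math.min(min, values[doc]);
                max = Math.max(max, values[doc]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    /** Exclusive end doc ID of a block (the last block may be partial). */
    private int end(int block) {
        return Math.min(values.length, (block + 1) * blockSize);
    }

    /** Returns doc IDs whose value is in [lo, hi], skipping blocks whose [min, max] misses the range. */
    List<Integer> rangeFilter(long lo, long hi) {
        List<Integer> hits = new ArrayList<>();
        lastSkippedBlocks = 0;
        for (int b = 0; b < blockMin.length; b++) {
            if (blockMax[b] < lo || blockMin[b] > hi) {
                lastSkippedBlocks++; // no doc in this block can match
                continue;
            }
            for (int doc = b * blockSize; doc < end(b); doc++) {
                if (values[doc] >= lo && values[doc] <= hi) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }

    /** Number of blocks skipped by the most recent rangeFilter call. */
    int lastSkippedBlocks() {
        return lastSkippedBlocks;
    }
}
```

With values {1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23} (already sorted) and a block size of 4, rangeFilter(10, 12) scans only the middle block, skips the other two, and returns doc IDs [4, 5, 6].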

Improvements

  • Lucene now opens files with the MADV_RANDOM advice by default on Linux and Mac OS. This results in better efficiency for indexes that exceed the size of the page cache, but can make it slower to load indexes into the page cache. The previous MADV_NORMAL default read advice can be restored by passing -Dorg.apache.lucene.store.defaultReadAdvice=NORMAL as a JVM startup flag.
  • Snowball dictionaries have been upgraded, resulting in improved tokenization. This may require reindexing to ensure consistency of search results with pre-10.0 indexes.
  • The expressions module now uses MethodHandles and Dynamic Class-File Constants (JEP 309) in combination with hidden classes (JEP 371) to implement strict and type-safe calls to external functions. This makes it easier to extend expressions with custom functions in a secure way, because runtime linking of custom functions is no longer the responsibility of the expressions scripting engine. In addition, the hidden classes created by the expressions engine no longer suffer from global classloader locks.

... plus a multitude of helpful bug fixes!

9.12.0

28 Sep 20:19

Security Fixes

  • Deserialization of Untrusted Data vulnerability in Apache Lucene Replicator - CVE-2024-45772

New Features

  • Improve intra-merge parallelism for many value types. (Ben Trent)
  • Add support for JDK 23 to the Panama Vectorization Provider. (Chris Hegarty)

Improvements

  • Add Intervals.regexp and Intervals.range methods to produce IntervalsSource for regexp and range queries. (Mayya Sharipova)
  • Remove support for writing 8-bit scalar vector quantization. 4-bit and 7-bit quantization are still supported. (Michael McCandless)

Optimizations

  • Inline postings skip data to improve performance of queries that need skipping such as conjunctions. (Adrien Grand)
  • Optimizations to the decoding logic of blocks of postings. (Adrien Grand, Uwe Schindler, Greg Miller)
  • Avoid performance degradation when closing shared mapped segment data. (Chris Hegarty, Michael Gibney, Uwe Schindler)

... plus a multitude of helpful bug fixes!

9.11.1

27 Jun 13:46

Bug Fixes

  • Fix performance regression in NumericComparator.
  • Remove intra-merge parallelism for everything except HNSW graph merges.
  • Fix bug that prevented adding a parent field to an index with no fields.
  • Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by unordered matches.
  • StringValueFacetCounts stops throwing NPE when faceting over an empty match-set.

9.11.0

06 Jun 14:29

New Features

  • Add support for posix_madvise to MMapDirectory: if running on Linux/macOS and Java 21 or later, MMapDirectory uses IOContext to pass suitable MADV flags to the operating system kernel. This may improve paging logic, especially when working with large indexes under memory pressure.
  • Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.
  • Recursive graph bisection is now supported on indexes that have blocks.
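The 50% memory reduction for compressed 4-bit vectors follows from storing two 4-bit values in each byte instead of one. A minimal sketch, assuming a simple layout where adjacent dimensions share a byte (low nibble first); Lucene's actual packed layout may differ, and the class and method names are hypothetical.

```java
/** Packs 4-bit values two per byte (hypothetical names; layout is an assumption). */
final class NibblePacking {

    /** Packs values in 0..15 into ceil(n / 2) bytes: even index = low nibble, odd index = high nibble. */
    static byte[] pack(byte[] nibbles) {
        byte[] packed = new byte[(nibbles.length + 1) / 2];
        for (int i = 0; i < nibbles.length; i++) {
            int v = nibbles[i];
            if (v < 0 || v > 15) throw new IllegalArgumentException("not a 4-bit value: " + v);
            if ((i & 1) == 0) {
                packed[i >> 1] = (byte) v;           // low nibble
            } else {
                packed[i >> 1] |= (byte) (v << 4);   // high nibble
            }
        }
        return packed;
    }

    /** Reverses pack(); count is the number of original 4-bit values. */
    static byte[] unpack(byte[] packed, int count) {
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            int b = packed[i >> 1] & 0xFF; // treat the byte as unsigned
            out[i] = (byte) (((i & 1) == 0) ? (b & 0x0F) : (b >>> 4));
        }
        return out;
    }
}
```

For example, a 768-dimension vector of 4-bit values occupies 768 bytes at one value per byte but 384 bytes when packed.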

Improvements

  • MergeScheduler can now provide an executor for intra-merge parallelism. The first implementation is the ConcurrentMergeScheduler.
  • Upgrade icu4j to version 74.2.

Optimizations

  • Use RWLock to access LRUQueryCache to reduce contention.
  • Speedup multi-segment HNSW graph search for diversifying child kNN queries.
  • Add a MemorySegment vector scorer for scoring without copying data on-heap. This can improve search latency by almost 2x for byte vectors.
  • Switch to using optimized, primitive collections where possible to improve performance and heap utilization.

Full Changelog: releases/lucene/9.10.0...releases/lucene/9.11.0

9.10.0

20 Feb 17:21
695c0ac

New Features

  • Support for similarity-based vector searches, i.e. finding all nearest neighbors of a query vector whose similarity to it is greater than a configured threshold. See [Byte|Float]VectorSimilarityQuery.
  • Index sorting is now compatible with block joins. See IndexWriterConfig#setParentField.
  • MMapDirectory now takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later). This was only supported with Java 19 to 21 until now.
  • SIMD vectorization now takes advantage of the JDK Vector API incubator module on Java 22. This was only supported with Java 20 or 21 until now.
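A similarity-threshold search returns every vector whose similarity to the query exceeds a threshold, rather than a fixed top k. The sketch below illustrates only those semantics, using raw cosine similarity and a brute-force linear scan; it is not how [Byte|Float]VectorSimilarityQuery is implemented or scored, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force similarity-threshold search over float vectors (hypothetical names). */
final class SimilarityThresholdSearch {

    /** Raw cosine similarity in [-1, 1]; assumes neither vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the IDs of all vectors whose similarity to the query is greater than threshold. */
    static List<Integer> search(float[][] vectors, float[] query, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < vectors.length; id++) {
            if (cosine(vectors[id], query) > threshold) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

Unlike a top-k search, the number of results here depends on the data: a tight threshold may return nothing, while a loose one may return most of the index.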

Optimizations

  • Tail postings are now encoded using group-varint. This yielded speedups on queries that match lots of terms that have short postings lists in Lucene's nightly benchmarks.
  • Range queries on points now exit earlier when evaluating a segment that has no matches. This will improve performance when intersected with other queries that have a high up-front cost such as multi-term queries.
  • BooleanQueries that mix SHOULD and FILTER clauses now propagate minimum competitive scores to the SHOULD clauses, yielding significant speedups for top-k queries sorted by descending score.
  • IndexSearcher#count has been optimized on pure disjunctions of two term queries.
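Group-varint encodes integers in groups of four: one tag byte stores each value's byte length (minus one) in 2 bits, followed by the values' bytes, so a decoder learns all four lengths from a single byte. The sketch below assumes non-negative (or unsigned) 32-bit values, little-endian value bytes, and zero-padding of a final partial group; Lucene's exact handling of partial groups may differ, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Group-varint encoder/decoder sketch for unsigned 32-bit ints (hypothetical names). */
final class GroupVarint {

    /** Bytes needed to store v as an unsigned value: 1 to 4. */
    static int numBytes(int v) {
        return Math.max(1, (Integer.SIZE - Integer.numberOfLeadingZeros(v) + 7) / 8);
    }

    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int g = 0; g < values.length; g += 4) {
            int[] group = new int[4]; // a final partial group is padded with zeros
            for (int k = 0; k < 4 && g + k < values.length; k++) {
                group[k] = values[g + k];
            }
            int tag = 0;
            for (int k = 0; k < 4; k++) {
                tag |= (numBytes(group[k]) - 1) << (k * 2); // 2 bits of length per value
            }
            out.write(tag);
            for (int v : group) {
                for (int b = 0; b < numBytes(v); b++) {
                    out.write(v >>> (8 * b)); // little-endian; write() keeps the low 8 bits
                }
            }
        }
        return out.toByteArray();
    }

    /** Decodes count values from bytes produced by encode(). */
    static int[] decode(byte[] in, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int g = 0; g < count; g += 4) {
            int tag = in[pos++] & 0xFF;
            for (int k = 0; k < 4; k++) {
                int len = ((tag >>> (k * 2)) & 0x3) + 1;
                int v = 0;
                for (int b = 0; b < len; b++) {
                    v |= (in[pos++] & 0xFF) << (8 * b);
                }
                if (g + k < count) {
                    out[g + k] = v; // padding values are read but dropped
                }
            }
        }
        return out;
    }
}
```

For example, encoding {1, 300, 70000, 5, 2} takes 13 bytes: an 8-byte first group (tag plus 1 + 2 + 3 + 1 value bytes) and a 5-byte padded second group.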

9.9.2

29 Jan 15:33

Lucene 9.9.2 release

9.9.1

16 Dec 23:00

Lucene 9.9.1 release

9.9.0

04 Dec 14:42

Lucene 9.9.0 release

Full Changelog: releases/lucene/9.8.0...releases/lucene/9.9.0