-
Notifications
You must be signed in to change notification settings - Fork 3
/
CHANGES
500 lines (364 loc) · 15.2 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
big Changelog
=============
Here you can see the full list of changes between each big release.
Version 0.9.1
-------------
Released on ...
Changed
- TODO
Version 0.9.0
-------------
Released on Jul 11th 2019
Changed
- `BedEntry.unpack` logic has changed significantly. It now throws on missing fields. The exception can be
used to determine the offending field and adjust the BED format correspondingly.
- `BedEntry.unpack` now controls the extra fields parsing with a flag instead of a number. The old signature
remains available for compatibility sake, but is deprecated.
New
- `BedEntryUnpackException` is thrown when `BedEntry.unpack` fails to parse an entry. The exception properties
provide the reason and the number of the field which caused parsing to fail.
Fixed
- `ExtendedBedEntry.score` is now Int instead of Short. The reason for this is that many
BED file providers (e.g. MACS2, SICER) don't respect the UCSC standard which limits the score
to 0..1000 range, and we want to be able to parse those files.
Version 0.8.4
-------------
Released on Feb 12th 2019
New
- `ExtendedBedEntry.getField` is a public method that returns the `i`th field of the entry
as an instance of a correct type, whatever that might be, with no need for conversion
to and from `String`.
Version 0.8.3
-------------
Released on Oct 24th 2018
New
- `ExtendedBedEntry.rest` is a public method to return the list of fields except for
the first three obligatory BED columns. Is used during `ExtendedBedEntry.pack`.
It could be convenient when we want to iterate through optional fields. For simplicity
and to avoid code duplication the method returns a string representation of extended
bed entry fields (the same values as joined by `pack` method.) instead of original values.
Such behavior is consistent with values which user will see in a BED/BigBed file.
Version 0.8.2
-------------
Released on June 26th 2018
Fixed
- `BigWig.read` & `BigBed.read` should close factory in case of exception
while creating big file
- Default buffer size for `EndianSynchronizedBufferFactory` and `EndianBufferFactory` fixed
to `BetterSeekableBufferedStream.DEFAULT_BUFFER_SIZE`
Version 0.8.1
-------------
Released on June 20th 2018
New
- `BigFile.source` public property added
Changed
- Report stream source in exception messages from `BetterSeekableBufferedStream`
Version 0.8.0
-------------
Released on April 29th 2018
New
- HTTP/HTTPS urls support for BigWig, BigBed, TDF files. Server is
supposed to support accept-ranges http header
- New buffer factories for files/urls reading based on **hts-jdk**
`SeekableStream`:
* Without concurrent BigFile access support: `EndianBufferFactory`
* With concurrent BigFile access support: `EndianSynchronizedBufferFactory`,
`EndianThreadSafeBufferFactory`
- Big files `read` and `write` methods accepts `cancelledChecker` closure to
abort current operation, e.g. if you need to render another range.
Just throw exception in this closure.
- 'BigFile#BigFile.determineFileType(src)' returns src file type, could be
`BigFile.Type.BIGBED`, `BigFile.Type.BIGWIG` or `null`.
Changed
- Show 'Header extensions are unsupported' only in debug log
- Use `RomBufferFactory` to detect file endianness
- Changed `prefetch` option to int value in `BigFile.read()`,
`BigWigFile.read()`, `BigBedFile.read()`. Supported values are:
* `BigFile.PREFETCH_LEVEL_OFF` : do not prefetch
* `BigFile.PREFETCH_LEVEL_FAST` : prefetch zoom level indexes tree,
chromosomes list
* `BigFile.PREFETCH_LEVEL_DETAILED` : prefetch w/o zoom level data tree
indexes up to chromosomes level
Larger prefetch values increase big file opening time and memory
consumption but provides faster data access with fewer i/o operations
Removed:
- `RandomAccessFile` based factory and rom buffer
removed. Please use `EndianBufferFactory`, `EndianThreadSafeBufferFactory`
or `EndianSynchronizedBufferFactory` factories instead.
Version 0.7.1
-------------
Released on March 31th 2018
Changed
- Use maxHeapSize = "1024m" option for tests. Seems by default on
windows heap size in about 256m and it isn't enough for tests
- `RAFBufferFactory`, `RAFBuffer`, `RomBufferFactory` moved to
`org.jetbrains.bio` package
- `RAFBufferFactory` supports buffer size option for underlying
random access file
- By default `RAFBufferFactory` uses default Random Access File
buffers size (8092 bytes) instead of 125 kb.
Version 0.7.0
-------------
Released on March 30th 2018
Fixed
- RandomAccessFile (https://www.unidata.ucar.edu) copied to project and
rewritten in Kotlin so as not to depend on heavy netcdf-java library.
For simplicity writing operations were removed from file.
- Windows support #36, using RandomAccessFile
(https://www.unidata.ucar.edu/software/thredds/v4.3/netcdf-java project)
instead of memory mapped buffer. The previous impl (works only on
Mac/Linux) is available using `MMBRomBufferFactory` in `Big*.read()`.
Improved
- Show less warnings like "R+ tree leaves are overlapping", see issue #35
Changed
- BedFile was made closeable
- Default `name` field value in `ExtendedBedEntry` was changed to ".".
also while parsing empty name value is converted to ".".
- Bed entry pack/unpack methods now supports custom delimiter and
- `ExtendedBedEntry.unpack()` method supports `omitEmptyStrings` option,
which treats several consecutive separators as one, e.g. if TAB
delimiter was converted by error to sequence of whitespace characters.
- Score range [0..1000] check disable because MACS2 output may contain
scores > 1000
- `ExtendedBedEntry.unpack()` parses '.' values as fields default values
Version 0.6.0
-------------
Released on December 13th 2017
Fixed
- Issue #33: Encode *.bigBed files: NumberFormatException: For input
string: "5.30862042004096
Changed
- `org.jetbrains.bio.big.BedEntry` now is minimal bed entry impl and
represents same info as records in bed file. Obligatory fields are
chromosome, start, end, all other fields stored as '\t' separated string.
`BedEntry.unpack()` allow to parse as BED+ format to `ExtendedBedEntry`
which contain all BED12 fields plus optional extended fields.
'ExtendedBedEntry.pack()' allows to pack back to minimal
representation.
- Switched to Kotlin 1.2
- Source compatibility updated to Java 8
Removed:
- In class `org.jetbrains.bio.big.BedEntry` removed all fields except
'chrom', 'start', 'end', 'rest'
Version 0.5.3
-------------
Released on March 13th 2018
Improved
- Show less warnings like "R+ tree leaves are overlapping", see issue #35
Version 0.5.2
-------------
Released on November 21th 2017
Fixed
- "java.lang.IllegalArgumentException: n must be >0", see issue #34
Version 0.5.1
-------------
Released on November 10th 2017
Changed
- "AutoSQL queries are unsupported" warning turn off, see issue #7
Version 0.5.0
-------------
Released on November 2nd 2017
Changed
- Parse 'thickStart', 'thickEnd', 'itemRgb' bed columns make them accessible
using corresponding `BedEntry` fields
- Select number of bed columns when writing big bed file, the number is supposed
to be in 3..12 range.
Version 0.4.2
-------------
Bugfix release, released on September 13th 2017
Fixed:
- Memory leak in compressed (method: DEFLATE) files fixed.
Changed:
- Internal caching added for big bed files. Now both big wig and big bed
files will work faster if subsequent queries belongs to same block.
You can see cache misses info in logger TRACE mode after file has been
closed. Cached decompressed block will be different for each thread
(impl based on ThreadLocal storage)
Version 0.4.1
-------------
Bugfix release, released on September 8th 2017
Fixed
- Memory leak in BigFile. `BigFile.close()` should close internal
MMapBuffer
Version 0.4.0
-------------
Bugfix release, released on September 4th 2017
Changed
- Switch to Kotlin 1.1.x
- Concurrent access to big bed/wig/tdf files improved.
- Big bed serialization fixed.
Fixed
- Issue #30: IllegalStateException: impossible "browser" section type
in *.wig file.
- Issue #29: IllegalArgumentException in BedEntry.<init>.
- Issue #28: Support big files larger than 2G (issue #28) using
`com.indeed:util-mmap` (https://github.com/indeedeng/util/tree/master/mmap)
memory mapping library
Removed
- BigFile/TdfFile.duplicate() call not required any more for concurrent access.
Method removed.
Version 0.3.6
-------------
Released on August 23th 2017
- Added 'RomBuffer' caching for internal querying: Now if
'BigWig.summarizeInternal' of nearby regions are called one by one,
'RTreeIndexLeaf' which contains these regions wouldn't be
decompressed multiple times, it will be taken from cache.
- Added loop stop after section match in 'BigWig.queryInternal': It's
not necessary to go through all 'RTreeIndexLeaf' regions if we
already found ones which are needed.
Version 0.3.5
-------------
Bugfix release, released on March 28th 2017
- Fixed summary semantics: Now for each intersected interval summary ‘sum’
value is increased in "intersected nucleotides number" * "interval value"
instead of "intersection %" * "interval value".
Idea is that interval summary = "nucleotide number" * "interval value"
thus intersected sum should be increased in "intersection %" * "interval summary"
i.e. "intersected nucleotides number" * "interval value".
Old semantics was incorrect, difference was visible at 1-bp resolution.
- More informative error message if actual magic doesn’t match
expected guess
Version 0.3.4
-------------
Bugfix release, released on April 28th 2016
- Added a fast-path to 'VariableStepSection.set'.
- Sped up 'WigFile' and reduced allocation rate.
- Fixed unbounded looping in 'BigWigFile.write' and 'BigBedFile.write'
in case the data contains items with unknown chromosomes.
- Changed 'WigSection.size' to be a property.
Version 0.3.3
-------------
Bugfix release, released on April 14th 2016
- Fixed 'BigWigFile.write' and 'BigBedFile.write' in case the data
contains items with unknown chromosomes.
Version 0.3.2
-------------
Bugfix release, released on April 6th, 2016
- Renamed 'WigParser' to 'WigFile' which is initialized with a file to
prevent possible use-after-exhausted bugs.
- Allowed gzipped and zipped input in 'WigFile' and 'BedFile'.
Version 0.3.1
-------------
Bugfix release, released on April 5th, 2016
- Fixed 'BigWigFile.write' in the case of no-data chromosomes.
- Fixed offset tracking in 'BigWigSummary' and 'BigBedSummary' which
did not support multiple chromosomes.
Version 0.3.0
-------------
Released on February 17th, 2016
- Switched to mmapped files for reading 'Big*' and 'Tdf*' data.
- Introduced a custom Big format version using Snappy instead of
DEFLATE for data block compression.
- Added 'BigFile#duplicate' and 'TdfFile#duplicate' for creating an
independent view of the file data for parallel access.
- Fixed a bug in 'BedGraphTrack#span' which was always zero, see issue
#25 on GitHub for details.
- B+ tree keys should be byte arrays and *not* NUL-terminated strings.
See 'bPlusTree.c' in UCSC sources for complete spec.
- Changed 'BPlusTree#write' to use a smaller block size if a B+ tree
fits in a single block.
- Changed 'BigWigFile.write' and 'BigBedFile.write' *not* to load all
of the data in memory at once.
Version 0.2.5
-------------
Technical release, released on February 3rd 2016
- Migrated to Kotlin 1.0.0-rc.
Version 0.2.4
-------------
Technical release, released on February 1st 2016
Version 0.2.3
-------------
Released on December 10th 2015
- Cleaned up 'Tdf*' API and made it more consistent with the rest of
the codebase.
- Serializing 'TdfFile' access is now a responsibility of the caller.
See issue #22 on GitHub.
Version 0.2.2
-------------
Released on November 19th 2015
- Added initial support for TDF format, see 'TDFReader'. The API is
unstable at the moment, it might change in future releases.
Version 0.2.1
-------------
Bugfix release, released on November 2nd 2015
- Allowed creating empty 'BigWigFile' and 'BigBedFile'.
Version 0.2.0
--------------
Technical release, released on October 22th 2015
- Migrated to Kotlin 1.0.0-beta.
- Added reasonable defaults to 'BigFile#summarize'.
Version 0.1.9
-------------
Bugfix release, released on September 16th 2015
- Fixed a bug in 'WigSection#query', which threw if the callee was empty.
- Migrated to Kotlin M13.
Version 0.1.8
-------------
Released on September 11th 2015
- Added support for reading and writing BedGraph sections to 'BigWigFile'.
Version 0.1.7
-------------
Bugfix release, released on August 24th 2015
- Fixed 'BigFile.Post.Zoom' which produced degenerate R+ trees with
all leaves covering the same interval.
- Updated 'WigParser' to handle NaN and infinities.
Version 0.1.6
-------------
Released on August 24th 2015
- Fixed WIG section re-alignment in 'BigWigFile#queryInternal': queries
for overlapping intervals were incorrectly shifted to the right.
- Changed 'BigFile.Post.Zoom' to start with initial reduction estimated
from mean interval size in the data.
- Changed 'Big*.write' to accept chromosome names and sizes explicitly.
Version 0.1.5
-------------
Released on August 20th 2015
- Enabled buffering in 'SeekableDataOutput'.
- Fixed a bug in 'RTreeIndex.write' which sometimes produced invalid
offsets for internal R+ tree nodes.
- 'BigWigFile#queryInternal' now correctly filters variable step sections
in the case when a section overlaps the query interval.
- 'BigFile#query' now allows to query for items overlapping or completely
contained within the query.
- Changed 'BigFile.Post.Zoom' to aggregate summaries into slots prior
to building R+ tree index.
Version 0.1.4
-------------
Bugfix release, released on August 17th 2015
- Added total summary block which is required for version>=4
- Changed 'Big*.write' to skip empty chromosomes when pre-computing
zoom levels.
- Fixed interval alignment for partial 'BigWigFile' queries. In the
case when WIG section was overlapping the query the section was
truncated to the query start instead of the first section interval
contained in the query. The bug did not affect variable step sections.
Version 0.1.3
-------------
Released on August 12th 2015
- Enabled data section compression by default for both BigWIG and BigBED.
- Extended the API to allow non-standard byte order to be used when
creating BigWIG and BigBED files.
- Changed 'Big*.write' to pre-compute zoom levels. The current implementation
is far from being optimal, but it is, however, easy to comprehend.
- Implemented summarisation for BigWIG files.
- Changed 'BigWigFile.write' to splice WIG sections larger than
'Short.MAX_VALUE'.
- Fixed 'BigWigFile#query' which sometimes returned values outside
of a given query interval.
Version 0.1.2
-------------
Bugfix release, released on August 6th 2015
- Changed 'RTreeIndex' not to enforce the leaves to only contain items
from a single chromosome.
- Fixed 'RTreeIndex.write' for the case of a few items to be indexed.
- Fixed 'FixedStepSection#query' for the case of overlapping intervals.
Version 0.1.1
-------------
Released on July 29th 2015
- Added basic support for summarising BigBED data. The implementation
is only capable of using pre-computed zoom levels, i.e. 'BigBedFile.write'
does not build zoomed indices.
Version 0.1.0
-------------
Initial release, released on July 24th 2015