Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

universal in-place decompression interface #132

Open
Moelf opened this issue Feb 26, 2023 · 9 comments
Open

universal in-place decompression interface #132

Moelf opened this issue Feb 26, 2023 · 9 comments

Comments

@Moelf
Copy link

Moelf commented Feb 26, 2023

by "in-place" I mean the user would pre-allocate a Vector{UInt8} or something as the sink

sometimes decompression is needed for low-level stuff, such as handling "buffers" in a file spec, and multiple decompression together assemble the entire data blob.

it would be nice if there's a in-place interface that support multiple algorithms through this central package.

@Moelf Moelf changed the title universal in-place interface universal in-place decompression interface Feb 26, 2023
@svilupp
Copy link

svilupp commented Mar 9, 2023

I've drafted a PR around Buffers to support faster decompression when loading Arrow files.

Would that support your use case as well? Or would you need the sink to be ByteData (instead of buffer)?

mkitti added a commit that referenced this issue Apr 9, 2023
…transcode (#136)

* allow users to optionally provide an output buffer when calling transcode

* add tests and refactor docstring

* nicer formating

Co-authored-by: Mark Kittisopikul <[email protected]>

* new copydata! method

* reassociate docstring with correct method

* Expand ByteData

Co-authored-by: Mark Kittisopikul <[email protected]>

* Expand ByteData

Co-authored-by: Mark Kittisopikul <[email protected]>

* Expand ByteData

Co-authored-by: Mark Kittisopikul <[email protected]>

* clarify formatting

* formatting

* Generic args

Co-authored-by: Joao Aparicio <[email protected]>

* Generic args

Co-authored-by: Joao Aparicio <[email protected]>

* small fixes

* simplify

* fix buffer

* fix

* formatting

* fix buffer

* Address PR comment

#136 (comment)

* Address PR comment

#136 (comment)

* Address PR review

#136 (comment)
#136 (comment)

* dont use isnothing

* add doc-string

* replace keepbytes with marginpos

* fix typeo

---------

Co-authored-by: Mark Kittisopikul <[email protected]>
Co-authored-by: Joao Aparicio <[email protected]>
@mkitti
Copy link
Member

mkitti commented Apr 11, 2023

Closed by #132 and released as 0.9.12

@mkitti mkitti closed this as completed Apr 11, 2023
bauglir added a commit to bauglir/TranscodingStreams.jl that referenced this issue Apr 14, 2023
The examples in many of the codec packages rely on being able to call
`transcode` providing a `Codec` type and a string[^1][^2]. JuliaIO#132 removed
type annotations from the trailing arguments for `transcode(::Type{C},
...) where {C<:Codec}` causing a method ambiguity with `transcode(T,
src::String)` in Julia `Base`[^3]. This adds an additional method to
`Base.transcode` to resolve this ambiguity.

Fixes JuliaIO#139.

[^1]: https://github.com/JuliaIO/CodecZlib.jl/tree/f9fddaa28c093c590a7a93358709df2945306bc7#usage
[^2]: https://github.com/JuliaIO/CodecZstd.jl/tree/6327ffa9a3a12fc46d465dcfc8b30bed91cf284b#usage
[^3]: https://github.com/JuliaLang/julia/blob/ff7b8eb00bf887f20bf57fb7e53be0070a242c07/base/c.jl#L306
@nhz2
Copy link
Member

nhz2 commented Jul 5, 2024

I'm reopening this issue because I think there are some issues to work on with the current interface.

  1. The Buffer type used in #132 allow users to optionally provide an output buffer when calling transcode #136 is internal. See API for Buffer #202
  2. For use in Zarr.jl it would be helpful to be able to decompress directly into for example a Vector{Float64} to avoid an extra copy.
  3. The underlying codec should be informed somehow that it is doing a fully in-place operation, so it can internally avoid extra buffering and copies. Ref: Implement ZstdFrameCompressor via endOp CodecZstd.jl#52

@nhz2 nhz2 reopened this Jul 5, 2024
@Moelf
Copy link
Author

Moelf commented Jul 5, 2024

somehow I miseed @mkitti 's original comment since the "closed by" refer to this very issue, what was the PR that supposedly fixed this?

@Moelf
Copy link
Author

Moelf commented Jul 5, 2024

For use in Zarr.jl it would be helpful to be able to decompress directly into for example a Vector{Float64} to avoid an extra copy.

this can't work directly, the two possibilities are you have a buffer = reinterpret(UInt8, ...) and you give this buffer to TranscodingStreams.

Or, you have data = reinterpret(Float64, buffer) and give the buffer to TranscodingStreams

@nhz2
Copy link
Member

nhz2 commented Jul 5, 2024

Yes, I think this would require a new unsafe_transcode! function that works directly with pointers.

@nhz2
Copy link
Member

nhz2 commented Jul 5, 2024

Also, a general unsafe_transcode! interface could be useful for other packages that don't support or need a streaming API like Blosc.jl, LibDeflate.jl, JLD2.jl, HDF5.jl, Zarr.jl... so maybe it should go in a separate LosslessChunkCompressors.jl package, and be added as a dependency here.

@nhz2
Copy link
Member

nhz2 commented Dec 10, 2024

@Moelf @mkitti I have a draft interface for in-place encoding and decoding defined in https://github.com/nhz2/ChunkCodecs.jl/blob/main/ChunkCodecCore/src/interface.jl

The interface currently doesn't directly use pointers, which is nice for avoiding GC issues, but sometimes things don't work as expected, for example decoding a view of a PyArray: JuliaPy/PythonCall.jl#579

@mkitti
Copy link
Member

mkitti commented Dec 10, 2024

That's interesting. I will try to take a closer look next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants