Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO operations for ShortByteString #547

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

oberblastmeister
Copy link

Addresses #540. The functions mostly make use of unsafePlainToShortIO which performs a zero cost conversion between a PlainForeignPtr and ShortByteString, using some unsafe stuff that I hope is correct. For things like hPut, we make sure to pin the ByteArray before giving it to hPutBuf. Hopefully my keepAlive# usage is correct, which I copied from withForeignPtr. These operations should be useful for implementing utf8 IO in text, solving the readFile problem with good performance.

@oberblastmeister
Copy link
Author

One optimization is that we don't need to pin the ByteArray when it is small enough to fit in the buffer. But this would require rewriting hGetBuf.

Copy link
Member

@clyring clyring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's arguably allowed, ShortByteStrings backed by explicitly pinned ByteArray#s can create a nasty surprise for anyone who tries to put them into a compact region. So I lean against this implementation approach for the various read operations.

unsafeHPutOff handle sbs off len = withPtr sbs $ \p -> IO.hPutBuf handle (p `plusPtr` off) len

hPut :: IO.Handle -> ShortByteString -> IO ()
hPut h sbs = unsafeHPutOff h sbs 0 (length sbs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to re-invent the wheel here. fromShort already basically does what the pointer-related conversions above are meant to.

Suggested change
hPut h sbs = unsafeHPutOff h sbs 0 (length sbs)
hPut h sbs = BS.hPut h (fromShort sbs)

To keep from allocating the intermediate ByteString, we can just move the pinnedness check from fromShort into fromShortIO and then use the latter.

Suggested change
hPut h sbs = unsafeHPutOff h sbs 0 (length sbs)
hPut h sbs = fromShortIO sbs >>= BS.hPut h

@oberblastmeister
Copy link
Author

Although it's arguably allowed, ShortByteStrings backed by explicitly pinned ByteArray#s can create a nasty surprise for anyone who tries to put them into a compact region. So I lean against this implementation approach for the various read operations.

I don't see any other implementation approach that is efficient. If people want to unpin it, they can do it themselves. We can make sure to document that the functions will return pinned ShortByteStrings, and we can add versions that do not return pinned ShortByteStrings if necessary.

I don't think that the use case of compact regions warrants sacrificing the performance for the most common use case.

@clyring
Copy link
Member

clyring commented Oct 4, 2022

Well...

  • Explicit pinning is typically not required for buffers larger than around 3-4K, and an extra copy of a small buffer while performing an IO operation is probably not a big deal. (...though I wouldn't be surprised if even an extra copy of a large buffer while performing an IO operation is not a very big deal in most cases.)
  • For the case of BS.hGetContents we usually expect to concatenate several (pinned) buffers to produce the final result anyway; it seems likely that the right thing to do for SBS.hGetContents is to concatenate into an not-explicitly-pinned buffer instead of concatenating into a pinned buffer and then copying the result.
  • Usability with compact regions is itself a reason a user might switch from ByteString to ShortByteString or Text.
  • Although this failure mode is likely uncommon, *** Exception: compaction failed: cannot compact pinned objects provides little help with finding either the compaction that failed or the source of the pinned object causing the failure.

Come to think of it, the whole "was this ByteArray# explicitly pinned" test in compaction is really a proxy for "are there any non-GC references from the objects being compacted into this ByteArray#," and our current shortcutting of fromShort when the underlying buffer is pinned (but not explicitly pinned) can probably cause compacting a ByteString to lead to segfaults when it should "only" cause a CompactionFailed exception... I will probably try to concoct a reproducer for such mischief and ask about it on the GHC issue tracker soon. (EDIT: This is now GHC issue 22255.)

@oberblastmeister
Copy link
Author

oberblastmeister commented Oct 4, 2022

when the underlying buffer is pinned (but not explicitly pinned)

What do you mean by this?
(Edit: explained by the ghc issue)

@Bodigrim
Copy link
Contributor

Bodigrim commented Apr 3, 2023

@clyring @oberblastmeister is there any way forward with this PR? It would be great to get something like this for the sake of haskell/text#503.

@clyring
Copy link
Member

clyring commented Apr 4, 2023

I'm not sure. There is definitely some improvement possible without breaking anything, as mentioned in my last comment:

For the case of BS.hGetContents we usually expect to concatenate several (pinned) buffers to produce the final result anyway; it seems likely that the right thing to do for SBS.hGetContents is to concatenate into an not-explicitly-pinned buffer instead of concatenating into a pinned buffer and then copying the result.

But I think we need more progress on GHC's end if we want to avoid copying elsewhere without introducing an obscure footgun about compaction. (And unfortunately, getting GHC to that point is a lot of work, and probably needs multiple proposals. I don't think it's happening particularly soon.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants