Add peekCString, withCString for converting NUL terminated C strings #254

Draft
wants to merge 1 commit into
base: master
36 changes: 31 additions & 5 deletions src/Data/Text/Foreign.hs
@@ -20,7 +20,9 @@ module Data.Text.Foreign
     , useAsPtr
     , asForeignPtr
     -- ** Encoding as UTF-8
+    , peekCString
     , peekCStringLen
+    , withCString
     , withCStringLen
     -- * Unsafe conversion code
     , lengthWord16
@@ -39,12 +41,13 @@ import Control.Monad.ST.Unsafe (unsafeIOToST)
 #else
 import Control.Monad.ST (unsafeIOToST)
 #endif
-import Data.ByteString.Unsafe (unsafePackCStringLen, unsafeUseAsCStringLen)
+import Data.ByteString (useAsCString)
+import Data.ByteString.Unsafe (unsafePackCString, unsafePackCStringLen, unsafeUseAsCStringLen)
 import Data.Text.Encoding (decodeUtf8, encodeUtf8)
 import Data.Text.Internal (Text(..), empty)
 import Data.Text.Unsafe (lengthWord16)
 import Data.Word (Word16)
-import Foreign.C.String (CStringLen)
+import Foreign.C.String (CString, CStringLen)
 import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
 import Foreign.Marshal.Alloc (allocaBytes)
 import Foreign.Ptr (Ptr, castPtr, plusPtr)
@@ -153,6 +156,16 @@ asForeignPtr t@(Text _arr _off len) = do
   withForeignPtr fp $ unsafeCopyToPtr t
   return (fp, I16 len)

+-- | /O(n)/ Decode a NUL terminated C string, which is assumed to have
+-- been encoded as UTF-8. If decoding fails, a 'UnicodeException' is
+-- thrown.
+--
+-- @since 1.2.5.0
+peekCString :: CString -> IO Text
+peekCString cs = do
+  bs <- unsafePackCString cs
+  return $! decodeUtf8 bs
+
 -- | /O(n)/ Decode a C string with explicit length, which is assumed
 -- to have been encoded as UTF-8. If decoding fails, a
 -- 'UnicodeException' is thrown.
@@ -163,9 +176,22 @@ peekCStringLen cs = do
   bs <- unsafePackCStringLen cs
   return $! decodeUtf8 bs

--- | Marshal a 'Text' into a C string encoded as UTF-8 in temporary
--- storage, with explicit length information. The encoded string may
--- contain NUL bytes, and is not followed by a trailing NUL byte.
+-- | /O(n)/ Marshal a 'Text' into a NUL terminated C string encoded as
+-- UTF-8 in temporary storage. The 'Text' must not contain any NUL
+-- characters.
+--
+-- The temporary storage is freed when the subcomputation terminates
+-- (either normally or via an exception), so the pointer to the
+-- temporary storage must /not/ be used after this function returns.
+--
+-- @since 1.2.5.0
+withCString :: Text -> (CString -> IO a) -> IO a
+withCString t act = useAsCString (encodeUtf8 t) act
Contributor

Since text-2.0, this does redundant copying.

Contributor

The copy is redundant in both versions. encodeUtf8 creates a new ByteString that nothing else refers to, so it can be used unsafely (without copying).

In text-2.0, though, one could have unsafe variants of withCString (or just of withCStringLen, since there is no NUL at the end), which would simply pass a Ptr to the ByteArray contents.

Contributor

Scrap that. text's ByteArrays are not (always) pinned, so copying is sometimes unavoidable.

Contributor

At least one round of memcpy is unavoidable unless the byte array in Text already ends with that final NUL.
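
To make the memcpy point in this thread concrete, here is a minimal sketch of what a NUL terminated marshalling function has to do: allocate `len + 1` bytes, copy the UTF-8 bytes, and write the terminator. The names `withCStringExplicit`, `peekCStringUtf8`, and `roundTrip` are hypothetical and are not part of this PR. To stay portable across text versions the sketch still goes through `encodeUtf8`, so it does not remove the encode-step copy; it only spells out the final copy that the comments above call unavoidable. `peekCStringUtf8` mirrors the `peekCString` body from the diff.

```haskell
import Control.Monad (when)
import Data.ByteString.Unsafe (unsafePackCString, unsafeUseAsCStringLen)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Storable (pokeByteOff)

-- Hypothetical name, not from the PR. Marshals a Text into a NUL
-- terminated, UTF-8 encoded buffer that lives only for the duration
-- of the callback.
withCStringExplicit :: T.Text -> (CString -> IO a) -> IO a
withCStringExplicit t act =
  -- Borrow the freshly encoded bytes without copying them again here.
  unsafeUseAsCStringLen (encodeUtf8 t) $ \(src, len) ->
    -- One extra byte for the trailing NUL.
    allocaBytes (len + 1) $ \dst -> do
      -- The memcpy discussed in the thread; skipped for empty input.
      when (len > 0) $ copyBytes dst src len
      pokeByteOff dst len (0 :: Word8)
      act dst

-- Same body as the peekCString added in the diff: wrap the C string
-- without copying, then decode strictly into a Text (which copies).
peekCStringUtf8 :: CString -> IO T.Text
peekCStringUtf8 cs = do
  bs <- unsafePackCString cs
  return $! decodeUtf8 bs

-- Marshal out and read straight back, inside the buffer's lifetime.
roundTrip :: T.Text -> IO T.Text
roundTrip t = withCStringExplicit t peekCStringUtf8
```

`roundTrip` also shows why the Haddock comment requires that the 'Text' contain no NUL characters: for an input such as `T.pack "ab\0cd"`, the reader stops at the first NUL byte and only `"ab"` comes back.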

+
+-- | /O(n)/ Marshal a 'Text' into a C string encoded as UTF-8 in
+-- temporary storage, with explicit length information. The encoded
+-- string may contain NUL bytes, and is not followed by a trailing NUL
+-- byte.
 --
 -- The temporary storage is freed when the subcomputation terminates
 -- (either normally or via an exception), so the pointer to the