This project has retired. For details please refer to its Attic page.
Clownfish::String – C API Documentation

Clownfish::String

parcel Clownfish
class variable CFISH_STRING
struct symbol cfish_String
class nickname cfish_Str
header file Clownfish/String.h

Name

Clownfish::String – Immutable string holding Unicode characters.

Functions

utf8_valid
bool
cfish_Str_utf8_valid(
    char *ptr,
    size_t len
);

Return true if the string is valid UTF-8, false otherwise.
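To make "valid UTF-8" concrete, here is a minimal standalone sketch of such a check. It is not Clownfish's implementation; the name `demo_utf8_valid` is hypothetical, and the rules applied (no overlong forms, no surrogates, nothing above U+10FFFF) are the standard UTF-8 well-formedness rules rather than anything stated on this page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a UTF-8 validity check; not Clownfish code. */
static bool
demo_utf8_valid(const char *ptr, size_t len) {
    const unsigned char *s   = (const unsigned char*)ptr;
    const unsigned char *end = s + len;

    while (s < end) {
        unsigned char b = *s++;
        size_t        extra; /* continuation bytes expected */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point legal for this length */

        if (b < 0x80)                { continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else                         { return false; } /* bad lead byte */

        if ((size_t)(end - s) < extra) { return false; } /* truncated */
        while (extra--) {
            if ((*s & 0xC0) != 0x80) { return false; }
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        if (cp < min)                     { return false; } /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) { return false; } /* surrogate */
        if (cp > 0x10FFFF)                { return false; } /* out of range */
    }
    return true;
}
```

With this sketch, `demo_utf8_valid("\xC3\xA9", 2)` is true, while the overlong sequence `"\xC0\x80"` and the truncated sequence `"\xE2\x82"` are rejected.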

validate_utf8
void
cfish_Str_validate_utf8(
    char *text,
    size_t size,
    char *file,
    int line,
    char *func
);

Throws an error if the string isn’t valid UTF-8.

is_whitespace
bool
cfish_Str_is_whitespace(
    int32_t code_point
);

Returns true if the code point qualifies as Unicode whitespace.
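As an illustration of what a whitespace test over code points can look like, the sketch below hard-codes the code points that carry the White_Space property in recent Unicode versions. The function name `demo_is_whitespace` is hypothetical, and the exact set the library recognizes depends on the Unicode version it was built against, so treat the list as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; the set below follows recent Unicode versions. */
static bool
demo_is_whitespace(int32_t cp) {
    switch (cp) {
        case 0x0020: case 0x0085: case 0x00A0: case 0x1680:
        case 0x2028: case 0x2029: case 0x202F: case 0x205F:
        case 0x3000:
            return true;
        default:
            /* TAB..CR and EN QUAD..HAIR SPACE are contiguous ranges. */
            return (cp >= 0x0009 && cp <= 0x000D)
                   || (cp >= 0x2000 && cp <= 0x200A);
    }
}
```

Note that ZERO WIDTH SPACE (U+200B) is not in the set, so the sketch returns false for it.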

new_from_utf8
cfish_String* // incremented
cfish_Str_new_from_utf8(
    char *utf8,
    size_t size
);

Return a String which holds a copy of the supplied UTF-8 character data after checking for validity.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

new_from_trusted_utf8
cfish_String* // incremented
cfish_Str_new_from_trusted_utf8(
    char *utf8,
    size_t size
);

Return a String which holds a copy of the supplied UTF-8 character data, skipping validity checks.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

init_from_trusted_utf8
cfish_String*
cfish_Str_init_from_trusted_utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Initialize a String which holds a copy of the supplied UTF-8 character data, skipping validity checks.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

new_steal_utf8
cfish_String* // incremented
cfish_Str_new_steal_utf8(
    char *utf8,
    size_t size
);

Return a String which assumes ownership of the supplied buffer containing UTF-8 character data after checking for validity.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

new_steal_trusted_utf8
cfish_String* // incremented
cfish_Str_new_steal_trusted_utf8(
    char *utf8,
    size_t size
);

Return a String which assumes ownership of the supplied buffer containing UTF-8 character data, skipping validity checks.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

init_steal_trusted_utf8
cfish_String*
cfish_Str_init_steal_trusted_utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Initialize a String which assumes ownership of the supplied buffer containing UTF-8 character data, skipping validity checks.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

new_wrap_utf8
cfish_String* // incremented
cfish_Str_new_wrap_utf8(
    char *utf8,
    size_t size
);

Return a String which wraps an external buffer containing UTF-8 character data after checking for validity. The buffer must stay unchanged for the lifetime of the String.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.
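The "from", "steal", and "wrap" constructors differ only in who owns the buffer. The sketch below models the three modes with a hypothetical miniature type, `DemoStr`. It is not the `cfish_String` layout and uses plain `malloc`/`free` as an assumption about the allocator.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature string type; not the cfish_String layout. */
typedef struct {
    const char *ptr;
    size_t      size;
    bool        owns_buffer; /* true if demo_destroy must free ptr */
} DemoStr;

/* "from": copy the bytes; the caller keeps its own buffer. */
static DemoStr
demo_from(const char *utf8, size_t size) {
    char *copy = (char*)malloc(size ? size : 1);
    if (copy == NULL) { abort(); }
    memcpy(copy, utf8, size);
    DemoStr s = { copy, size, true };
    return s;
}

/* "steal": take over a heap buffer; the caller must not free it. */
static DemoStr
demo_steal(char *utf8, size_t size) {
    DemoStr s = { utf8, size, true };
    return s;
}

/* "wrap": borrow the buffer; it must outlive the string unchanged. */
static DemoStr
demo_wrap(const char *utf8, size_t size) {
    DemoStr s = { utf8, size, false };
    return s;
}

/* Release the buffer only when the string owns it. */
static void
demo_destroy(DemoStr *s) {
    if (s->owns_buffer) { free((void*)s->ptr); }
    s->ptr  = NULL;
    s->size = 0;
}
```

Copying is the safe default. Stealing avoids a copy when the caller is finished with a heap buffer, and wrapping avoids both the copy and the allocation at the cost of a lifetime obligation on the caller.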

new_wrap_trusted_utf8
cfish_String* // incremented
cfish_Str_new_wrap_trusted_utf8(
    char *utf8,
    size_t size
);

Return a String which wraps an external buffer containing UTF-8 character data, skipping validity checks. The buffer must stay unchanged for the lifetime of the String.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

init_wrap_trusted_utf8
cfish_String*
cfish_Str_init_wrap_trusted_utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Initialize a String which wraps an external buffer containing UTF-8 character data, skipping validity checks. The buffer must stay unchanged for the lifetime of the String.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

new_from_char
cfish_String* // incremented
cfish_Str_new_from_char(
    int32_t code_point
);

Return a String which holds a single character.

code_point

Unicode code point of the character.

newf
cfish_String* // incremented
cfish_Str_newf(
    char *pattern
);

Return a String with content expanded from a pattern and arguments conforming to the spec defined by VCatF().

Note: never pass a user-supplied string as the pattern. Doing so is a security hole.

pattern

A format string.

Methods

Cat
cfish_String* // incremented
cfish_Str_Cat(
    cfish_String *self,
    cfish_String *other
);

Return the concatenation of the String and other.

Cat_Utf8
cfish_String* // incremented
cfish_Str_Cat_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Return the concatenation of the String and the supplied UTF-8 character data after checking for validity.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

Cat_Trusted_Utf8
cfish_String* // incremented
cfish_Str_Cat_Trusted_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Return the concatenation of the String and the supplied UTF-8 character data, skipping validity checks.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

To_I64
int64_t
cfish_Str_To_I64(
    cfish_String *self
);

Extract a 64-bit integer from a decimal string. See BaseX_To_I64() for details.

BaseX_To_I64
int64_t
cfish_Str_BaseX_To_I64(
    cfish_String *self,
    uint32_t base
);

Extract a 64-bit integer from a variable-base stringified version. Expects an optional minus sign followed by base-x digits, stopping at any non-digit character. Returns zero if no digits are found. If the value exceeds the range of an int64_t, the result is undefined.

base

A base between 2 and 36.
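The parsing rule described above (optional minus sign, then digits, stop at the first non-digit, zero if there are no digits) can be sketched over a raw byte buffer as follows. The function `demo_basex_to_i64` is hypothetical. Accepting both upper- and lower-case letters as digits, and returning zero for a base outside 2..36, are assumptions not stated on this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the documented parsing rule; not Clownfish code. */
static int64_t
demo_basex_to_i64(const char *text, size_t size, uint32_t base) {
    size_t  i        = 0;
    int     negative = 0;
    int64_t value    = 0;

    if (base < 2 || base > 36) { return 0; } /* assumption */
    if (i < size && text[i] == '-') { negative = 1; i++; }

    for (; i < size; i++) {
        char     c = text[i];
        uint32_t digit;
        if (c >= '0' && c <= '9')      { digit = (uint32_t)(c - '0'); }
        else if (c >= 'a' && c <= 'z') { digit = (uint32_t)(c - 'a') + 10; }
        else if (c >= 'A' && c <= 'Z') { digit = (uint32_t)(c - 'A') + 10; }
        else                           { break; } /* not a digit at all */
        if (digit >= base) { break; }             /* not a digit in this base */
        /* No overflow check: out-of-range input is undefined, as documented. */
        value = value * (int64_t)base + (int64_t)digit;
    }
    return negative ? -value : value;
}
```

For example, `demo_basex_to_i64("-42abc", 6, 10)` yields -42 because parsing stops at `a`, and `demo_basex_to_i64("xyz", 3, 10)` yields 0 because no digits are found.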

To_F64
double
cfish_Str_To_F64(
    cfish_String *self
);

Convert a string to a floating-point number using the C library function strtod.

Starts_With
bool
cfish_Str_Starts_With(
    cfish_String *self,
    cfish_String *prefix
);

Test whether the String starts with prefix.

Starts_With_Utf8
bool
cfish_Str_Starts_With_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Test whether the String starts with a prefix supplied as raw UTF-8.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

Ends_With
bool
cfish_Str_Ends_With(
    cfish_String *self,
    cfish_String *suffix
);

Test whether the String ends with suffix.

Ends_With_Utf8
bool
cfish_Str_Ends_With_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Test whether the String ends with a suffix supplied as raw UTF-8.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.
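For valid UTF-8 on both sides, prefix and suffix tests reduce to byte comparisons, because a matching valid sequence always begins and ends on a character boundary. A minimal sketch over raw buffers, with hypothetical function names, might look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-level prefix test; both inputs assumed valid UTF-8. */
static bool
demo_starts_with(const char *s, size_t s_size,
                 const char *prefix, size_t p_size) {
    return p_size <= s_size && memcmp(s, prefix, p_size) == 0;
}

/* Hypothetical byte-level suffix test; both inputs assumed valid UTF-8. */
static bool
demo_ends_with(const char *s, size_t s_size,
               const char *suffix, size_t x_size) {
    return x_size <= s_size
           && memcmp(s + (s_size - x_size), suffix, x_size) == 0;
}
```

The size check comes first so that the pointer arithmetic in the suffix test never runs past the start of the buffer.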

Contains
bool
cfish_Str_Contains(
    cfish_String *self,
    cfish_String *substring
);

Test whether the String contains substring.

Contains_Utf8
bool
cfish_Str_Contains_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Test whether the String contains a substring supplied as raw UTF-8.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.

Find
cfish_StringIterator* // incremented
cfish_Str_Find(
    cfish_String *self,
    cfish_String *substring
);

Return a StringIterator pointing to the first occurrence of substring within the String, or NULL if the substring does not match.

Find_Utf8
cfish_StringIterator* // incremented
cfish_Str_Find_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Return a StringIterator pointing to the first occurrence of the substring within the String, or NULL if the substring does not match. The substring is supplied as raw UTF-8.

utf8

Pointer to UTF-8 character data.

size

Size of UTF-8 character data in bytes.
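The search itself can be sketched as a naive byte-level scan. Unlike Find and Find_Utf8, which return a StringIterator or NULL, this hypothetical `demo_find` returns a byte offset, with `DEMO_NOT_FOUND` as a made-up sentinel. Both inputs are assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NOT_FOUND ((size_t)-1) /* hypothetical sentinel */

/* Return the byte offset of the first match, or DEMO_NOT_FOUND. */
static size_t
demo_find(const char *hay, size_t hay_size,
          const char *needle, size_t needle_size) {
    if (needle_size > hay_size) { return DEMO_NOT_FOUND; }
    for (size_t i = 0; i <= hay_size - needle_size; i++) {
        if (memcmp(hay + i, needle, needle_size) == 0) { return i; }
    }
    return DEMO_NOT_FOUND;
}
```

Because UTF-8 is self-synchronizing, a byte-level match of a valid needle inside a valid haystack always starts on a character boundary.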

Equals
bool
cfish_Str_Equals(
    cfish_String *self,
    cfish_Obj *other
);

Equality test.

Returns: true if other is a String with the same character data as self.

Equals_Utf8
bool
cfish_Str_Equals_Utf8(
    cfish_String *self,
    char *utf8,
    size_t size
);

Test whether the String matches the supplied UTF-8 character data.

Length
size_t
cfish_Str_Length(
    cfish_String *self
);

Return the number of Unicode code points the String contains.

Get_Size
size_t
cfish_Str_Get_Size(
    cfish_String *self
);

Return the number of bytes occupied by the String’s internal content.
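Length and Get_Size differ whenever the String holds non-ASCII characters. The sketch below shows why, using a hypothetical helper that counts code points in a valid UTF-8 buffer by skipping continuation bytes (those of the form `10xxxxxx`).

```c
#include <stddef.h>

/* Hypothetical helper: count code points in valid UTF-8 data. */
static size_t
demo_count_code_points(const char *utf8, size_t size) {
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        /* Every code point has exactly one non-continuation byte. */
        if (((unsigned char)utf8[i] & 0xC0) != 0x80) { count++; }
    }
    return count;
}
```

For the 6-byte buffer `"na\xC3\xAFve"` the helper returns 5, since the two bytes `C3 AF` encode a single code point.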

Get_Ptr8
char*
cfish_Str_Get_Ptr8(
    cfish_String *self
);

Return the internal backing array for the String if its internal encoding is UTF-8. Throws an exception if the String is not encoded as UTF-8. The character data is not null-terminated.

To_Utf8
char*
cfish_Str_To_Utf8(
    cfish_String *self
);

Return a null-terminated copy of the string data in UTF-8 encoding. The buffer must be freed by the caller.
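Because the internal data carries no terminator, passing it to C string functions requires a terminated copy. A minimal sketch of that copy step, using a hypothetical helper and plain `malloc` (the allocator the library actually uses is not stated here), could be:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `size` bytes and append a terminator. */
static char*
demo_to_c_string(const char *data, size_t size) {
    char *copy = (char*)malloc(size + 1);
    if (copy == NULL) { return NULL; }
    memcpy(copy, data, size);
    copy[size] = '\0'; /* the source buffer has no terminator of its own */
    return copy;
}
```

The caller owns the returned buffer and releases it when done, mirroring the contract described above.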

To_ByteBuf
cfish_ByteBuf* // incremented
cfish_Str_To_ByteBuf(
    cfish_String *self
);

Return a ByteBuf which holds a copy of the String.

Clone
cfish_String* // incremented
cfish_Str_Clone(
    cfish_String *self
);

Return a clone of the object.

Compare_To
int32_t
cfish_Str_Compare_To(
    cfish_String *self,
    cfish_Obj *other
);

Indicate whether one String is less than, equal to, or greater than another. The Unicode code points of the Strings are compared lexicographically. Throws an exception if other is not a String.

Returns: 0 if the Strings are equal, a negative number if self is less than other, and a positive number if self is greater than other.
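A useful property of UTF-8 is that comparing valid sequences byte by byte gives the same order as comparing their code points. Under that assumption, a lexicographic comparison can be sketched over raw buffers as follows. `demo_compare` is hypothetical and normalizes its result to -1, 0, or 1, which the method described above does not promise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: code point order via byte order on valid UTF-8. */
static int32_t
demo_compare(const char *a, size_t a_size, const char *b, size_t b_size) {
    size_t min_size = a_size < b_size ? a_size : b_size;
    int    cmp      = memcmp(a, b, min_size);
    if (cmp != 0)         { return cmp < 0 ? -1 : 1; }
    if (a_size == b_size) { return 0; }
    /* One string is a prefix of the other; the shorter sorts first. */
    return a_size < b_size ? -1 : 1;
}
```

Callers should test only the sign of the result, as with `strcmp`.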

To_String
cfish_String* // incremented
cfish_Str_To_String(
    cfish_String *self
);

Return a copy of the String.

Trim
cfish_String* // incremented
cfish_Str_Trim(
    cfish_String *self
);

Return a copy of the String with Unicode whitespace characters removed from both top and tail. Whitespace is any character that has the Unicode property White_Space.

Trim_Top
cfish_String* // incremented
cfish_Str_Trim_Top(
    cfish_String *self
);

Return a copy of the String with leading Unicode whitespace removed. Whitespace is any character that has the Unicode property White_Space.

Trim_Tail
cfish_String* // incremented
cfish_Str_Trim_Tail(
    cfish_String *self
);

Return a copy of the String with trailing Unicode whitespace removed. Whitespace is any character that has the Unicode property White_Space.

Code_Point_At
int32_t
cfish_Str_Code_Point_At(
    cfish_String *self,
    size_t tick
);

Return the Unicode code point located tick code points in from the top. Return CFISH_STR_OOB if out of bounds.
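Indexing by code point means walking the UTF-8 data, since characters have variable width. The sketch below shows one way to do that over a valid UTF-8 buffer. The names are hypothetical, and `DEMO_OOB` is a made-up sentinel because the value of CFISH_STR_OOB is not given here.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OOB (-1) /* hypothetical sentinel, not CFISH_STR_OOB */

/* Decode the code point starting at s; input assumed valid UTF-8. */
static int32_t
demo_decode(const unsigned char *s) {
    if (s[0] < 0x80) { return s[0]; }
    if (s[0] < 0xE0) { return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F); }
    if (s[0] < 0xF0) {
        return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6)
               | (s[2] & 0x3F);
    }
    return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12)
           | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Return the code point `tick` code points from the top, or DEMO_OOB. */
static int32_t
demo_code_point_at(const char *utf8, size_t size, size_t tick) {
    const unsigned char *s = (const unsigned char*)utf8;
    for (size_t i = 0; i < size; i++) {
        if ((s[i] & 0xC0) == 0x80) { continue; } /* continuation byte */
        if (tick-- == 0) { return demo_decode(s + i); }
    }
    return DEMO_OOB;
}
```

The walk is linear in the byte length, so repeated indexed access over a long string is better served by an iterator.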

Code_Point_From
int32_t
cfish_Str_Code_Point_From(
    cfish_String *self,
    size_t tick
);

Return the Unicode code point located tick code points counting backwards from the end. Return CFISH_STR_OOB if out of bounds.

SubString
cfish_String* // incremented
cfish_Str_SubString(
    cfish_String *self,
    size_t offset,
    size_t length
);

Return a new substring containing a copy of the specified range.

offset

Offset from the top, in code points.

length

The desired length of the substring, in code points.
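Since offset and length are measured in code points, they have to be translated into a byte range before anything can be copied. The sketch below does that translation for a valid UTF-8 buffer. The function names are hypothetical, and clamping out-of-range values to the end of the data is an assumption, as the behavior for such input is not described here.

```c
#include <stddef.h>

/* Advance n code points from byte index `start`, clamped to `size`. */
static size_t
demo_byte_offset(const char *utf8, size_t size, size_t start, size_t n) {
    size_t i = start;
    while (i < size && n > 0) {
        i++; /* step over the lead byte */
        while (i < size && ((unsigned char)utf8[i] & 0xC0) == 0x80) {
            i++; /* step over continuation bytes */
        }
        n--;
    }
    return i;
}

/* Translate a code point range into the byte range [start, end). */
static void
demo_substring_range(const char *utf8, size_t size,
                     size_t offset, size_t length,
                     size_t *byte_start, size_t *byte_end) {
    *byte_start = demo_byte_offset(utf8, size, 0, offset);
    *byte_end   = demo_byte_offset(utf8, size, *byte_start, length);
}
```

For `"na\xC3\xAFve"` with offset 2 and length 2, the resulting byte range is [2, 5), which covers the two-byte character and the following `v`.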

Top
cfish_StringIterator* // incremented
cfish_Str_Top(
    cfish_String *self
);

Return an iterator initialized to the start of the string.

Tail
cfish_StringIterator* // incremented
cfish_Str_Tail(
    cfish_String *self
);

Return an iterator initialized to the end of the string.

Inheritance

Clownfish::String is a Clownfish::Obj.