GRAPHEME_NEXT_CHARACTER_BREAK_UTF8(3) - Library Functions Manual

NAME

grapheme_next_character_break_utf8 - determine byte-offset to next grapheme cluster break

SYNOPSIS

#include <grapheme.h>

size_t
grapheme_next_character_break_utf8(const char *str, size_t len);

DESCRIPTION

The grapheme_next_character_break_utf8() function computes the offset (in bytes) to the next grapheme cluster break (see libgrapheme(7)) in the UTF-8-encoded string str of length len. If a grapheme cluster begins at str, this offset is equal to the length of said grapheme cluster.

If len is set to SIZE_MAX (stdint.h is already included by grapheme.h), the string str is interpreted to be NUL-terminated and processing stops when a NUL-byte is encountered.

For non-UTF-8 input data, grapheme_is_character_break(3) and grapheme_next_character_break(3) can be used instead.

RETURN VALUES

The grapheme_next_character_break_utf8() function returns the offset (in bytes) to the next grapheme cluster break in str, or 0 if str is NULL.
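
Callers usually add the returned offset to a running position, so a return value of 0 would stall such a loop. The following is a minimal sketch, not part of libgrapheme, of a cluster counter that stops as soon as no progress is made. The names break_fn, byte_break and count_clusters are hypothetical. The break function is passed as a pointer only so that the sketch compiles without the library, and byte_break is merely a stand-in that treats every byte as its own cluster.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical type: same signature as grapheme_next_character_break_utf8() */
typedef size_t (*break_fn)(const char *, size_t);

/* hypothetical stand-in, NOT Unicode-aware: one byte per "cluster" */
static size_t
byte_break(const char *str, size_t len)
{
	return (str == NULL || len == 0) ? 0 : 1;
}

/*
 * count the clusters in the first len bytes of str using next_break;
 * len must be the real byte length here (e.g. from strlen()), not SIZE_MAX
 */
static size_t
count_clusters(const char *str, size_t len, break_fn next_break)
{
	size_t off, ret, n = 0;

	assert(next_break != NULL);
	if (str == NULL) {
		return 0;
	}

	for (off = 0; off < len; off += ret) {
		ret = next_break(str + off, len - off);
		if (ret == 0) {
			/* no progress possible; stop instead of looping forever */
			break;
		}
		n++;
	}

	return n;
}
```

With the library linked, a call such as count_clusters(s, strlen(s), grapheme_next_character_break_utf8) would be expected to count the grapheme clusters of s in place of the byte-wise stand-in.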

EXAMPLES

/* cc (-static) -o example example.c -lgrapheme */
#include <grapheme.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* UTF-8 encoded input */
	const char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
	                "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
	                "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
	                "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
	size_t ret, len, off;

	printf("Input: \"%s\"\n", s);

	/* print each grapheme cluster with byte-length */
	printf("grapheme clusters in NUL-delimited input:\n");
	for (off = 0; s[off] != '\0'; off += ret) {
		ret = grapheme_next_character_break_utf8(s + off, SIZE_MAX);
		printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
	}
	printf("\n");

	/* do the same, but this time the string is length-delimited */
	len = 17;
	printf("grapheme clusters in input delimited to %zu bytes:\n", len);
	for (off = 0; off < len; off += ret) {
		ret = grapheme_next_character_break_utf8(s + off, len - off);
		printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
	}

	return 0;
}

SEE ALSO

grapheme_is_character_break(3), grapheme_next_character_break(3), libgrapheme(7)

STANDARDS

grapheme_next_character_break_utf8() is compliant with the Unicode 15.0.0 specification.

AUTHORS

Laslo Hunhold (dev@frign.de)

suckless.org - 2022-10-06