GRAPHEME_DECODE_UTF8(3) - Library Functions Manual

NAME

grapheme_decode_utf8 - decode first codepoint in UTF-8-encoded string

SYNOPSIS

#include <grapheme.h>

size_t
grapheme_decode_utf8(const char *str, size_t len, uint_least32_t *cp);

DESCRIPTION

The grapheme_decode_utf8() function decodes the first codepoint in the UTF-8-encoded string str of length len. If the UTF-8-sequence is invalid (overlong encoding, unexpected byte, string ends unexpectedly, empty string, etc.) the decoding is stopped at the last processed byte and the decoded codepoint set to GRAPHEME_INVALID_CODEPOINT.

If cp is not NULL the decoded codepoint is stored in the memory pointed to by cp.

Given NUL has a unique 1 byte representation, it is safe to operate on NUL-terminated strings by setting len to SIZE_MAX (stdint.h is already included by grapheme.h) and terminating when cp is 0 (see EXAMPLES for an example).

RETURN VALUES

The grapheme_decode_utf8() function returns the number of processed bytes and 0 if str is NULL or len is 0. If the string ends unexpectedly in a multibyte sequence, the desired length (that is larger than len) is returned.

EXAMPLES

/* cc (-static) -o example example.c -lgrapheme */
#include <grapheme.h>
#include <inttypes.h>
#include <stdio.h>

void
print_cps(const char *str, size_t len)
{
	size_t ret, off;
	uint_least32_t cp;

	for (off = 0; off < len; off += ret) {
		if ((ret = grapheme_decode_utf8(str + off,
		                                len - off, &cp)) > (len - off)) {
			/*
			 * string ended unexpectedly in the middle of a
			 * multibyte sequence and we have the choice
			 * here to possibly expand str by ret - len + off
			 * bytes to get a full sequence, but we just
			 * bail out in this case.
			 */
			break;
		}
		printf("%"PRIxLEAST32"\n", cp);
	}
}

void
print_cps_nul_terminated(const char *str)
{
	size_t ret, off;
	uint_least32_t cp;

	for (off = 0; (ret = grapheme_decode_utf8(str + off,
	                                          SIZE_MAX, &cp)) > 0 &&
	     cp != 0; off += ret) {
		printf("%"PRIxLEAST32"\n", cp);
	}
}

SEE ALSO

grapheme_encode_utf8(3), libgrapheme(7)

AUTHORS

Laslo Hunhold (dev@frign.de)

suckless.org - 2022-10-06