Glib::ustring

Glib::ustring
	Chapter 3. Basics

You might be surprised to learn that gtkmm doesn't use std::string in its interfaces. Instead it uses Glib::ustring, which is so similar and unobtrusive that you could actually pretend that each Glib::ustring is a std::string and ignore the rest of this section. But read on if you want to use languages other than English in your application.

std::string uses 8 bits per character, but 8 bits aren't enough to encode languages such as Arabic, Chinese, and Japanese. Although the encodings for these languages have been specified by the Unicode Consortium, the C and C++ languages do not yet provide any standardized Unicode support for UTF-8 encoding. GTK and GNOME chose to implement Unicode using UTF-8, and that's what is wrapped by Glib::ustring. It provides almost exactly the same interface as std::string, along with automatic conversions to and from std::string.

One of the benefits of UTF-8 is that you don't need to use it unless you want to, so you don't need to retrofit all of your code at once. std::string will still work for 7-bit ASCII strings. But when you try to localize your application for languages like Chinese, for instance, you will start to see strange errors, and possible crashes. Then all you need to do is start using Glib::ustring instead.

Note that UTF-8 isn't compatible with 8-bit encodings like ISO-8859-1. For instance, German umlauts are not in the ASCII range and need more than 1 byte in the UTF-8 encoding. If your code contains 8-bit string literals, you have to convert them to UTF-8 (e.g. the Bavarian greeting "Grüß Gott" would be "Gr\xC3\xBC\xC3\x9F Gott").

You should avoid C-style pointer arithmetic, and functions such as strlen(). In UTF-8, each character might need anywhere from 1 to 6 bytes, so it's not possible to assume that the next byte is another character. Glib::ustring worries about the details of this for you so you can use methods such as Glib::ustring::substr() while still thinking in terms of characters instead of bytes.

Unlike the Windows UCS-2 Unicode solution, this does not require any special compiler options to process string literals, and it does not result in Unicode executables and libraries which are incompatible with ASCII ones.

Reference

See the Internationalization section for information about providing the UTF-8 string literals.


Signals		Mixing C and C++ APIs