A C++11 library for Unicode

About ogonek

Ogonek is a C++11 library that implements various Unicode algorithms.

Ogonek version numbers follow the rules of Semantic Versioning 2.0.0.

Current features (0.5.1)

Design goals

Ogonek’s design is driven by the following principles.

Validity and correctness first

Ogonek will value validity and correctness above speed and other concerns. Ideally, no operation provided by ogonek should ever produce invalid Unicode data.
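One common way to make invalid data impossible to obtain is to enforce the invariant in a type's constructor. The sketch below is illustrative only: `scalar_value` is a hypothetical name, not part of ogonek's API. It accepts only Unicode scalar values (U+0000 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF) and throws for anything else, so an invalid value can never exist as a `scalar_value`.

```cpp
#include <stdexcept>

// Hypothetical type (not ogonek's API): a Unicode scalar value that cannot
// be constructed from an invalid code point.
class scalar_value {
public:
    explicit scalar_value(char32_t u) : value_(u) {
        if (!is_valid(u)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
    }

    char32_t get() const { return value_; }

    // Scalar values are U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF.
    static bool is_valid(char32_t u) {
        return u <= 0x10FFFF && !(u >= 0xD800 && u <= 0xDFFF);
    }

private:
    char32_t value_;
};
```

Because the check lives in the only constructor, every function that takes a `scalar_value` can rely on its argument being valid without checking again.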

Modern C++

Ogonek will use modern C++ techniques as much as possible, and fully embrace C++11.

Explicit is better than implicit

Ogonek will do as little as possible implicitly. Users who don’t want to care about such details don’t care about correctness, and thus don’t need ogonek.

Ogonek will not silently perform encoding conversions, and won’t assume any encoding except for char16_t and char32_t (which are clearly intended by the standard as UTF-16 and UTF-32 code units); anything else needs to be made explicit.
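A minimal sketch of this policy, assuming a tag-dispatch design: `char32_t` data is accepted as UTF-32 as-is, while plain `char` data is only accepted when the caller names its encoding. The `latin1` tag and the `to_code_points` overloads are hypothetical and not ogonek's API; Latin-1 (ISO-8859-1) is used only because each of its bytes maps directly to the code point with the same value.

```cpp
#include <string>

// Hypothetical encoding tag (not ogonek's API). Plain char data carries no
// encoding information, so the caller must state one explicitly.
struct latin1 {};

// char32_t data is taken to be UTF-32 already: no tag required.
std::u32string to_code_points(const std::u32string& utf32) {
    return utf32;
}

// char data is only accepted together with an explicit encoding tag.
// ISO-8859-1 maps each byte 0x00..0xFF to the code point of the same value.
std::u32string to_code_points(latin1, const std::string& bytes) {
    std::u32string result;
    result.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        result.push_back(static_cast<char32_t>(static_cast<unsigned char>(c)));
    }
    return result;
}
```

With this design, a call such as `to_code_points(std::string("abc"))` does not compile, because no overload accepts untagged `char` data.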

Fail fast

Ogonek will not let errors pass silently. Where appropriate, the API will accept error-handling callbacks; in all other cases, it will throw exceptions.
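To illustrate how a callback and a throwing default can coexist, here is a sketch of a UTF-16 to UTF-32 decoder that consults an error handler whenever it meets an unpaired surrogate. The names `error_handler`, `throw_on_error` and `decode_utf16` are hypothetical, not ogonek's API. When no handler is given, decoding fails fast by throwing; a caller can opt in to a different policy, such as substituting U+FFFD REPLACEMENT CHARACTER.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical callback type (not ogonek's API): receives the offending code
// unit and returns the code point to substitute for it.
using error_handler = std::function<char32_t(char16_t)>;

// Default policy: fail fast by throwing.
char32_t throw_on_error(char16_t) {
    throw std::runtime_error("invalid UTF-16: unpaired surrogate");
}

// Decodes UTF-16 into UTF-32, consulting on_error for unpaired surrogates.
std::u32string decode_utf16(const std::u16string& in,
                            error_handler on_error = throw_on_error) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t u = in[i];
        bool high = u >= 0xD800 && u <= 0xDBFF;
        bool low = u >= 0xDC00 && u <= 0xDFFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one supplementary code point.
            out.push_back(0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10)
                          + (in[i + 1] - 0xDC00));
            ++i;
        } else if (high || low) {
            out.push_back(on_error(u));  // lone surrogate: let the policy decide
        } else {
            out.push_back(u);
        }
    }
    return out;
}
```

A lenient caller would write `decode_utf16(text, [](char16_t) { return U'\uFFFD'; })`, which makes the choice to replace rather than fail visible at the call site.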

Be a good citizen

Ogonek will work well with the standard library, by providing and using models of the existing standard concepts, like iterators and containers.
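As an example of what modelling a standard concept buys, the sketch below wraps a UTF-16 string iterator in an iterator that yields code points, so existing algorithms and constructors such as `std::distance` and `std::u32string(first, last)` work on it unchanged. `code_point_iterator` and `count_code_points` are hypothetical names, not ogonek's API, and the sketch assumes the input is already well-formed UTF-16. It declares itself an input iterator because `operator*` returns computed values rather than references, which the C++11 forward-iterator requirements do not allow.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator (not ogonek's API): walks well-formed UTF-16 and
// yields code points, so standard algorithms can consume it directly.
class code_point_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // values are computed, not stored

    code_point_iterator() : it_() {}
    explicit code_point_iterator(std::u16string::const_iterator it) : it_(it) {}

    char32_t operator*() const {
        char16_t u = *it_;
        if (is_high_surrogate(u)) {
            // Assumes a low surrogate follows (input is well-formed).
            char16_t low = *std::next(it_);
            return 0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10) + (low - 0xDC00);
        }
        return u;
    }

    code_point_iterator& operator++() {
        it_ += is_high_surrogate(*it_) ? 2 : 1;  // skip one or two code units
        return *this;
    }

    code_point_iterator operator++(int) {
        code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const code_point_iterator& a, const code_point_iterator& b) {
        return a.it_ == b.it_;
    }
    friend bool operator!=(const code_point_iterator& a, const code_point_iterator& b) {
        return !(a == b);
    }

private:
    static bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
    std::u16string::const_iterator it_;
};

// Counting code points is then a plain standard-library call.
std::ptrdiff_t count_code_points(const std::u16string& s) {
    return std::distance(code_point_iterator(s.cbegin()), code_point_iterator(s.cend()));
}
```

Here "a\U0001F600\u0328" occupies four UTF-16 code units but contains three code points, and `count_code_points` reports the latter.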

History and trivia

The name

Ogonek means “little tail” in Polish. It’s the name of a diacritic used in several European and Native American languages. It exists in the Unicode repertoire as a combining character (U+0328 ᴄᴏᴍʙɪɴɪɴɢ ᴏɢᴏɴᴇᴋ), as an isolated character (U+02DB ᴏɢᴏɴᴇᴋ), or precomposed with Latin alphabet letters (like U+01EA ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴏ ᴡɪᴛʜ ᴏɢᴏɴᴇᴋ).

In Russian (огонёк) it can mean “little flame”, which arguably makes it sound a lot cooler than “little tail”.

The name was picked randomly and has no special meaning for this project; it’s just a label the author appropriated for it. In English it is pronounced /ˈoʊɡənɛk/.