Ogonek is a C++11 library that implements various Unicode algorithms.
Ogonek version numbers follow the Semantic Versioning 2.0.0 rules.
Ogonek currently provides:
algorithms that support any range, with any encoding;
character property queries;
a string type with compile-time checked encoding conversions and built-in validation;
Unicode normalization forms;
canonical and compatibility equivalence comparisons;
boundary analysis algorithms (currently grapheme clusters and words).
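To see why equivalence comparisons are needed at all, here is a minimal sketch in plain standard C++11. It does not use ogonek's API; the names `precomposed`, `decomposed` and `same_code_points` are made up for this illustration. The same user-perceived character, "O with ogonek", can be written as one precomposed code point or as a base letter followed by a combining mark, and a naive comparison treats the two as different.

```cpp
#include <string>

// U+01EA LATIN CAPITAL LETTER O WITH OGONEK as a single precomposed code point.
std::u32string const precomposed = U"\u01EA";

// The canonically equivalent sequence: U+004F LATIN CAPITAL LETTER O
// followed by U+0328 COMBINING OGONEK.
std::u32string const decomposed = U"O\u0328";

// Hypothetical helper: a plain code point by code point comparison,
// which knows nothing about canonical equivalence.
bool same_code_points(std::u32string const& a, std::u32string const& b) {
    return a == b;
}
```

Here `same_code_points(precomposed, decomposed)` returns `false` even though both strings are canonically equivalent. Normalizing both sides to the same form before comparing, or using a comparison that accounts for equivalence, closes that gap.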
Ogonek’s design is driven by the following principles.
Ogonek will value validity and correctness above speed and other concerns. Ideally it shall be impossible to obtain invalid Unicode data after any operation provided by ogonek.
Ogonek will use modern C++ techniques as much as possible, and fully embrace C++11.
Ogonek will perform as little as possible implicitly. If users don’t want to care about the details, they don’t care about correctness, and thus they don’t need ogonek.
Ogonek will not silently perform encoding conversions, and won’t assume any encoding except for char16_t and char32_t (which are clearly intended by the standard as UTF-16 and UTF-32 code units); anything else needs to be made explicit.
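The link between these two types and their encodings can be checked with the standard language alone. The sketch below is not part of ogonek, and `code_unit_count` is a hypothetical helper. It counts the code units in `u"..."` and `U"..."` literals for a code point outside the Basic Multilingual Plane.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CodeUnit, std::size_t N>
constexpr std::size_t code_unit_count(CodeUnit const (&)[N]) {
    return N - 1;
}

// U+1F600 lies outside the Basic Multilingual Plane, so a char16_t literal
// stores it as a surrogate pair (two UTF-16 code units)...
static_assert(code_unit_count(u"\U0001F600") == 2,
              "expected a UTF-16 surrogate pair");

// ...while a char32_t literal stores it as a single UTF-32 code unit.
static_assert(code_unit_count(U"\U0001F600") == 1,
              "expected a single UTF-32 code unit");
```

No comparable guarantee exists for plain `char` or `wchar_t`, whose encodings depend on the implementation, which is why their encoding has to be stated explicitly.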
Ogonek will not let errors go away silently. When appropriate, the API will accept error handling callbacks; in all other scenarios exceptions will be thrown.
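As a rough illustration of this policy, here is a minimal sketch of a UTF-16 decoder that reports ill-formed input either through a caller-supplied callback or, when none is given, through an exception. All names here (`decoding_error`, `error_handler`, `decode_utf16`, `use_replacement_character`) are hypothetical and chosen for this example; they are not ogonek's actual interface.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type for ill-formed input.
struct decoding_error : std::runtime_error {
    decoding_error() : std::runtime_error("invalid UTF-16 sequence") {}
};

// Hypothetical callback: receives the offending code unit and returns
// the code point to emit in its place.
using error_handler = std::function<char32_t(char16_t)>;

// Decodes UTF-16 to UTF-32, delegating every unpaired surrogate to on_error.
std::u32string decode_utf16(std::u16string const& in, error_handler const& on_error) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t const unit = in[i];
        bool const is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool const is_low = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_high && !is_low) {
            out.push_back(unit);  // a code point in the Basic Multilingual Plane
        } else if (is_high && i + 1 < in.size()
                   && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // A well-formed surrogate pair: combine both halves.
            char32_t const high = unit - 0xD800;
            char32_t const low = in[i + 1] - 0xDC00;
            out.push_back(0x10000 + (high << 10) + low);
            ++i;  // skip the low surrogate that was just consumed
        } else {
            out.push_back(on_error(unit));  // unpaired surrogate
        }
    }
    return out;
}

// Without a callback, the error is never swallowed: an exception is thrown.
std::u32string decode_utf16(std::u16string const& in) {
    return decode_utf16(in, [](char16_t) -> char32_t { throw decoding_error(); });
}

// Example callback: substitute U+FFFD REPLACEMENT CHARACTER.
char32_t use_replacement_character(char16_t) {
    return 0xFFFD;
}
```

A caller who wants lenient decoding passes `use_replacement_character` explicitly; a caller who passes nothing gets a `decoding_error` on the first unpaired surrogate. In both cases the caller makes the choice, and the error cannot go unnoticed.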
Ogonek will work well with the standard library, by providing and using models of the existing standard concepts, like iterators and containers.
Ogonek means “little tail” in Polish. It’s the name of a diacritic used in several European and Native American languages. It exists in the Unicode repertoire as a combining character (U+0328 ᴄᴏᴍʙɪɴɪɴɢ ᴏɢᴏɴᴇᴋ), as an isolated character (U+02DB ᴏɢᴏɴᴇᴋ), or precomposed with Latin alphabet letters (like U+01EA ʟᴀᴛɪɴ ᴄᴀᴘɪᴛᴀʟ ʟᴇᴛᴛᴇʀ ᴏ ᴡɪᴛʜ ᴏɢᴏɴᴇᴋ).
In Russian (огонёк) it can mean “little flame”, which arguably makes it sound a lot cooler than “little tail”.
The name was picked randomly and it has absolutely no special meaning as the name of this project. It’s just a label the author appropriated for it. In English it is pronounced /ˈoʊɡənɛk/.