16 Commits

Author SHA1 Message Date
Lars Henriksen
7078556f9d Key bindings for UTF-8 encoded characters
Internally characters (keys) have two representations: integers and key
names. Key names are characters strings, usually the name of the
character; e.g., the character A has the representations 65 and "A", and
the tab character the representations 9 and "TAB".

The function keys_int2str() turns the integer representation of a
key/character into the key name.

For display purposes the key names are usually confined to have display
width at most three. Some curses pseudo-keys have longer key names;
e.g., the back-tab character is "KEY_BTAB". A long key name makes a
character difficult to recognize in the status bar menu.

The key name of a multibyte, UTF-8 encoded character is the conventional
Unicode name of the code point; e.g., the character ü has key name
"U+00FC" because ü is the code point 0xFC. Most of these look alike in
the status bar menu.

The patch makes the key name of a multibyte character look like that of
a singlebyte character: the character itself, i.e. the key name of the
character ü is "ü".

The main tool is implementation of a utf8_encode() routine.

Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2018-06-03 11:26:12 +02:00
Lars Henriksen
431e4a00e7 Rename utf8_ord() to utf8_decode()
Purely for readability and in preparation for the counterpart utf8_encode().

Signed-off-by: Lars Henriksen <LarsHenriksen@get2net.dk>
Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2018-06-03 11:24:37 +02:00
Lars Henriksen
95c5d576fa Update UTF-8 base code
UTF-8 encodes characters in one to four bytes (since 2003).

Because 0 is a valid code point, the decode function utf8_ord()
should return -1, not 0, on error. As a consequence utf8_width()
should return 0 for a continuation byte (as it did previously).

Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2017-12-07 09:02:58 +01:00
Lukas Fleischer
273e32d43d Factor out UTF-8 code point decoding
Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2017-08-30 16:17:28 +02:00
Lukas Fleischer
9f6678bc49 Update copyright ranges
Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2017-01-12 08:40:30 +01:00
Lukas Fleischer
c34f9aba29 Refactor UTF-8 chopping
Add a function that makes sure a string does not exceed a given display
size. If the string is too long, dots ("...") are appended.

Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2016-02-26 09:14:40 +01:00
Lukas Fleischer
978d24a9d2 Update copyright ranges
Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
2016-01-30 11:21:53 +01:00
Lukas Fleischer
9ef427693b Update copyright ranges
Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2015-02-07 11:42:20 +01:00
Lukas Fleischer
ce93fa8adb Use a macro to determine the size of arrays
Use following macro instead of "sizeof(x) / sizeof(x[0])" everywhere:

    #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2013-05-04 20:32:26 +02:00
Lukas Fleischer
694d28eb78 Use tabs instead of spaces for indentation
This completes our switch to the Linux kernel coding style. Note that we
still use deeply nested constructs at some places which need to be fixed
up later.

Converted using the `Lindent` script from the Linux kernel code base,
along with some manual fixes.

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2013-04-14 00:19:01 +02:00
Lukas Fleischer
a363cb9b91 Fix braces in if-else statements
From the Linux kernel coding guidelines:

    Do not unnecessarily use braces where a single statement will do.
    [...] This does not apply if one branch of a conditional statement
    is a single statement. Use braces in both branches.

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2013-02-17 09:19:04 +01:00
Lukas Fleischer
a7944d335e Update copyright ranges
Add 2013 to the copyright range for all source and documentation files.

Reported-by: Frederic Culot <frederic@culot.org>
Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2013-02-04 20:10:14 +01:00
Lukas Fleischer
cfd8ede2b3 Switch to Linux kernel coding style
Convert our code base to adhere to Linux kernel coding style using
Lindent, with the following exceptions:

* Use spaces, instead of tabs, for indentation.
* Use 2-character indentations (instead of 8 characters).

Rationale: We currently have too much levels of indentation. Using
8-character tabs would make huge code parts unreadable. These need to be
cleaned up before we can switch to 8 characters.

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2012-05-21 10:13:05 +02:00
Lukas Fleischer
c9aff6d213 Update copyright ranges
Add 2012 to the copyright range for all source and documentation files.

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2012-03-26 14:38:16 +02:00
Lukas Fleischer
496f0d98f8 utf8_width() performance improvements
* Sort character width lookup table by character ranges.

* Use binary search instead of linear search for UTF-8 character width
  lookups which will speed up utf8_width() (O(log n) instead of O(n)).

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2011-07-02 10:09:18 +02:00
Lukas Fleischer
271457b7a4 Add basic UTF-8 helper functions
Add utf8_width() and utf8_strwidth() which can be used to calculate the
display width of a single character or a string, respectively. A lookup
table is used to spot double width characters, as well as composing
characters. There currently isn't any code to deal with ambigious
characters.

Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
2011-06-29 15:43:44 +02:00