
Reference data for codepage tests
=================================

SPDX-FileType: DOCUMENTATION
SPDX-FileCopyrightText: NONE
SPDX-License-Identifier: CC0-1.0

This directory contains reference data for testing encoding conversions to
UTF-8. The files were created with GNU iconv(1) like this:

    $ iconv --byte-subst='�' -t UTF-8 -f ISO-8859-3 >ISO-8859-3.utf8 <codepage.bin

The input file "codepage.bin" contains 256 octets; the octet at offset N has
the value N (0x00 to 0xFF).
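
A file with this layout can be produced in a few lines. The following Python
sketch is only an illustration and not necessarily how the original file was
generated; the function name "make_codepage_input" is hypothetical.

```python
from pathlib import Path


def make_codepage_input() -> bytes:
    """Return 256 octets where the octet at offset n has the value n."""
    return bytes(range(256))


if __name__ == "__main__":
    # Write the octets 0x00..0xFF in ascending order.
    Path("codepage.bin").write_bytes(make_codepage_input())
```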

For unassigned codepoints in a codepage, the corresponding reference data file
contains '�' (Unicode codepoint U+FFFD).
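
The substitution can be illustrated with Python's built-in codecs, which
behave similarly when decoding with errors="replace". This is a sketch under
the assumption that Python's "iso8859_3" table is used; Python's mapping
tables are not guaranteed to match those of GNU iconv, so it is not a way to
regenerate the reference files. The helper name "decode_with_replacement" is
hypothetical.

```python
def decode_with_replacement(data: bytes, codec: str) -> str:
    """Decode data, substituting U+FFFD for octets the codec leaves undefined."""
    return data.decode(codec, errors="replace")


# Decode all 256 octets; for a single-byte codec the string index of each
# character equals the value of the octet it was decoded from.
text = decode_with_replacement(bytes(range(256)), "iso8859_3")

# List the octets that Python's table treats as unassigned.
unassigned = [hex(i) for i, ch in enumerate(text) if ch == "\ufffd"]
print(unassigned)
```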


Intentional deviations from GNU iconv (modified reference files)
----------------------------------------------------------------
Codepage "Macintosh":
- The mapping of codepoint 0xDB was changed from U+00A4 "CURRENCY SIGN" to
  U+20AC "EURO SIGN" (Apple changed this in 1998 according to Wikipedia).

Codepage "Windows-1255":
- GNU iconv maps codepoint 0xFC to U+05EA "HEBREW LETTER TAV" and leaves
  codepoint 0xFA unassigned. The mapping was moved to codepoint 0xFA, and 0xFC
  is now unassigned (looks like a bug in GNU iconv).

Codepage "Windows-1258":
- Codepoint 0xDE should be U+0303 "COMBINING TILDE", but GNU iconv composed it
  with the preceding codepoint 0xDD (U+01AF) into U+1EEE "LATIN CAPITAL LETTER
  U WITH HORN AND TILDE". The composed character was replaced with its
  canonical decomposition <U+01AF U+0303>.
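
The relationship between the composed character and the sequence used in the
modified Windows-1258 reference file can be checked against the Unicode
character database. The following sketch uses Python's standard "unicodedata"
module; the variable names are chosen for illustration only.

```python
import unicodedata

composed = "\u1EEE"          # LATIN CAPITAL LETTER U WITH HORN AND TILDE
decomposed = "\u01AF\u0303"  # U WITH HORN followed by COMBINING TILDE

# Canonical decomposition mapping of U+1EEE as listed in the Unicode
# character database (space-separated hexadecimal code points).
mapping = unicodedata.decomposition(composed)

# Both forms are canonically equivalent: NFC recombines the sequence.
recomposed = unicodedata.normalize("NFC", decomposed)

print(mapping)
print(recomposed == composed)
```

Note that a full NFD normalization would decompose U+01AF further, so the
reference file corresponds to the single-step mapping rather than to NFD.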
