
Reference data for conversion tests
===================================

SPDX-FileType: DOCUMENTATION
SPDX-FileCopyrightText: NONE
SPDX-License-Identifier: CC0-1.0

This directory contains reference data for testing encoding conversions to
UTF-8.


Input files ("*.raw")
---------------------
Input data that the test suite feeds into the library.

Each of the files "ISO-646-*.raw" contains an escape sequence that switches to
the corresponding character set, followed by 128 bytes with increasing values
0x00 to 0x7F (one for each assigned codepoint) and finally the unassigned
codepoint 0x80. The Japanese variant additionally ends with an escape sequence
that switches back to ISO-646-US.
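
The byte layout described above can be sketched in Python. This is a minimal
illustration, not the script that produced the checked-in files: the helper
``build_iso646_raw`` is hypothetical, and using ``ESC ( J`` (JIS X 0201-Roman)
as the designation for the Japanese variant is an assumption.

.. code-block:: python

    ESC_ISO646_US = b"\x1b(B"  # designate ISO-646-US (ASCII) to G0
    ESC_ISO646_JP = b"\x1b(J"  # assumption: JIS X 0201-Roman for the Japanese variant


    def build_iso646_raw(designation: bytes, switch_back: bytes = b"") -> bytes:
        """Hypothetical helper: assemble bytes with the ISO-646-*.raw layout."""
        assigned = bytes(range(0x80))  # 128 increasing byte values, 0x00..0x7F
        unassigned = b"\x80"           # first byte outside the 7-bit range
        return designation + assigned + unassigned + switch_back


    us = build_iso646_raw(ESC_ISO646_US)
    jp = build_iso646_raw(ESC_ISO646_JP, switch_back=ESC_ISO646_US)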

The file "ISO-2022-JP_fox.raw" contains an escape sequence that switches to
JIS X 0208-1983, followed by a Japanese translation of the English pangram
"The quick brown fox jumps over the lazy dog" ("素早い茶色の狐はのろまな犬を飛び越える。")
and an escape sequence that switches back to ISO-646-US.
The library must accept this file without any additional flags.
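
Python's standard ``iso2022_jp`` codec produces the same overall structure
(``ESC $ B``, two bytes per glyph, ``ESC ( B``), which makes it a convenient
cross-check. Whether its output is byte-identical to the checked-in file (for
example with respect to a trailing newline) is an assumption.

.. code-block:: python

    FOX = "素早い茶色の狐はのろまな犬を飛び越える。"

    encoded = FOX.encode("iso2022_jp")

    # Structure: designate JIS X 0208-1983, 2 bytes per glyph, switch back to ASCII.
    assert encoded[:3] == b"\x1b$B"
    assert encoded[-3:] == b"\x1b(B"
    assert len(encoded) == 3 + 2 * len(FOX) + 3
    assert encoded.decode("iso2022_jp") == FOX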

The file "ISO-2022-JP_fox_broken.raw" has an error in the penultimate kuten:
its ku byte is 0x01, which lies below the offset of 0x20.
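
A JIS X 0208 glyph is transmitted as two bytes, the ku (row) and ten (cell)
numbers each plus the offset 0x20, so both bytes must lie in 0x21..0x7E. The
hypothetical helper ``decode_kuten`` below shows why a ku byte of 0x01 cannot
be decoded. The byte pair 0x24 0x6B encodes "る" (ku 4, ten 75), the
penultimate glyph of the pangram; the second byte 0x6B in the failing call is
an arbitrary stand-in.

.. code-block:: python

    KUTEN_OFFSET = 0x20


    def decode_kuten(b1: int, b2: int) -> tuple[int, int]:
        """Hypothetical helper: map a JIS X 0208 byte pair to (ku, ten)."""
        ku, ten = b1 - KUTEN_OFFSET, b2 - KUTEN_OFFSET
        if not (1 <= ku <= 94 and 1 <= ten <= 94):
            raise ValueError(f"invalid kuten bytes 0x{b1:02X} 0x{b2:02X}")
        return ku, ten


    print(decode_kuten(0x24, 0x6B))  # (4, 75): "る"
    try:
        decode_kuten(0x01, 0x6B)  # ku byte below the offset
    except ValueError as err:
        print(err)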

The file "ISO-2022-JP_fox_crlf.raw" contains an invalid CRLF line break that is
not preceded by an escape sequence switching back to ISO-646-US.
This forces the decoder into resync mode (it searches for the next escape
sequence).
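
In two-byte mode a CR (0x0D) or LF (0x0A) byte cannot start a valid kuten, and
RFC 1468 requires a switch back to a one-byte character set before each line
end. A minimal sketch of the resync step, assuming the decoder simply skips to
the next ESC byte; the helper ``resync`` and the sample bytes are hypothetical.

.. code-block:: python

    ESC = b"\x1b"


    def resync(data: bytes, pos: int) -> int:
        """Hypothetical helper: index of the next ESC at or after pos, else len(data)."""
        nxt = data.find(ESC, pos)
        return len(data) if nxt < 0 else nxt


    # "る" in two-byte mode, then a bare CRLF, then the switch back to ASCII.
    sample = b"\x1b$B" + b"$k" + b"\r\n" + b"\x1b(B"
    bad = 5  # index of the CR, where the next kuten was expected
    print(resync(sample, bad))  # 7: decoding resumes at ESC ( B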

The file "ISO-2022-JP_fox_garbage.raw" contains a single byte of trailing
garbage before the escape sequence that switches back to ISO-646-US.
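
Because every JIS X 0208 glyph occupies exactly two bytes, one trailing garbage
byte leaves the two-byte run with an odd length. The hypothetical helper below
measures each run that follows ``ESC $ B``; the garbage byte ``!`` is an
arbitrary stand-in, not necessarily the byte used in the test file.

.. code-block:: python

    ESC_JIS0208 = b"\x1b$B"


    def jis0208_run_lengths(data: bytes) -> list[int]:
        """Hypothetical helper: byte count between each ESC $ B and the next ESC (or end)."""
        lengths = []
        start = data.find(ESC_JIS0208)
        while start >= 0:
            begin = start + len(ESC_JIS0208)
            end = data.find(b"\x1b", begin)
            if end < 0:
                end = len(data)
            lengths.append(end - begin)
            start = data.find(ESC_JIS0208, end)
        return lengths


    fox = "素早い茶色の狐はのろまな犬を飛び越える。".encode("iso2022_jp")
    garbage = fox[:-3] + b"!" + fox[-3:]  # one extra byte before ESC ( B
    print(jis0208_run_lengths(fox))      # [40]
    print(jis0208_run_lengths(garbage))  # [41]: odd, one dangling byte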

The file "ISO-2022-JP_fox_ns.raw" has no escape sequence at the end.
RFC 1468 does not allow this; the decoder should nevertheless produce correct
output data, but report the error.
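
RFC 1468 requires the text to end in ASCII. A minimal sketch of how a decoder
could detect the missing final escape sequence by tracking the four
designations listed in RFC 1468; the helper ``final_charset`` is hypothetical,
and treating the "ns" file as the base file minus its final ``ESC ( B`` is an
assumption.

.. code-block:: python

    DESIGNATIONS = {
        b"\x1b(B": "ASCII",
        b"\x1b(J": "JIS X 0201-Roman",
        b"\x1b$@": "JIS X 0208-1978",
        b"\x1b$B": "JIS X 0208-1983",
    }


    def final_charset(data: bytes) -> str:
        """Hypothetical helper: character set in effect after the last byte of data."""
        charset = "ASCII"  # RFC 1468 text starts in ASCII
        pos = data.find(b"\x1b")
        while pos >= 0:
            charset = DESIGNATIONS.get(data[pos:pos + 3], charset)
            pos = data.find(b"\x1b", pos + 1)
        return charset


    fox = "素早い茶色の狐はのろまな犬を飛び越える。".encode("iso2022_jp")
    no_switch_back = fox[:-3]  # final ESC ( B removed
    print(final_charset(fox))             # ASCII
    print(final_charset(no_switch_back))  # JIS X 0208-1983: report an error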

After the escape sequence that switches back to ISO-646-US, the file
"ISO-2022-JP_fox_nul.raw" contains a NUL control character followed by "123".
The decoder should accept such data when the JPIC0_ICONV_IGNORE_NULL flag is
set.

The file "ISO-2022-JP_fox_trunc.raw" is truncated in the middle of the last
kuten. The output data up to and including the penultimate glyph should be
correct.
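
Assuming the truncated file is the base file cut after the first byte of the
last kuten (so it also lacks the closing ``ESC ( B``), the correct part of the
output is the pangram without its final glyph "。". A sketch of that
relationship, using Python's ``iso2022_jp`` codec only as a reference decoder
for the complete kuten pairs:

.. code-block:: python

    FOX = "素早い茶色の狐はのろまな犬を飛び越える。"
    full = FOX.encode("iso2022_jp")  # ESC $ B, 40 bytes, ESC ( B

    # Assumed reconstruction: drop ESC ( B and the second byte of the last kuten.
    trunc = full[:-4]

    body = trunc[3:]                              # bytes after ESC $ B
    complete = body[: len(body) - len(body) % 2]  # keep whole kuten pairs only
    decoded = (b"\x1b$B" + complete + b"\x1b(B").decode("iso2022_jp")
    print(decoded == FOX[:-1])  # True: correct up to and including "る"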

The file "ISO-2022-JP_fox_ua.raw" has the penultimate kuten replaced with
an unassigned codepoint (0x7E/0x7E with offset, i.e. ku 94, ten 94).
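
Following the convention stated for the reference files, the expected UTF-8
output shows U+FFFD in place of "る". The sketch assumes that the "ua" file
differs from the base file only in those two bytes and that the decoder emits
exactly one replacement character per unassigned kuten.

.. code-block:: python

    FOX = "素早い茶色の狐はのろまな犬を飛び越える。"
    base = FOX.encode("iso2022_jp")

    # The last kuten sits just before ESC ( B; the penultimate one precedes it.
    pos = len(base) - 3 - 4
    assert base[pos:pos + 2] == b"$k"  # "る" = ku 4, ten 75

    ua = base[:pos] + b"\x7e\x7e" + base[pos + 2:]
    expected = (FOX[:-2] + "\ufffd" + FOX[-1]).encode("utf-8")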


Reference files ("*.utf8")
--------------------------
Unicode reference files, compared against the output of the library.
For unassigned codepoints, these files contain '�' (U+FFFD REPLACEMENT CHARACTER).
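
A test harness only needs a byte-wise comparison of the decoder output with
the matching "*.utf8" file. A minimal sketch; the helper names are
hypothetical, and how the reference path is located is an assumption.

.. code-block:: python

    from pathlib import Path
    from typing import Optional

    REPLACEMENT = "\ufffd".encode("utf-8")  # U+FFFD is EF BF BD in UTF-8


    def first_mismatch(output: bytes, reference: bytes) -> Optional[int]:
        """Return the first differing byte offset, or None if both are identical."""
        for i, (a, b) in enumerate(zip(output, reference)):
            if a != b:
                return i
        if len(output) != len(reference):
            return min(len(output), len(reference))  # one is a prefix of the other
        return None


    def check_against_reference(output: bytes, reference_path: Path) -> Optional[int]:
        """Hypothetical helper: compare library output with a *.utf8 reference file."""
        return first_mismatch(output, reference_path.read_bytes())

Reporting an offset rather than a plain boolean makes it easy to see whether a
mismatch starts at an expected U+FFFD.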
