Fixed format data files

We recommend using UTF-8 CSV data files.

This section describes the types of fixed format character data files.

IMPORTANT: UNI files are treated differently from earlier versions.

There are two basic types of fixed format character data files: ASC (file extension .asc) and UNI (file extension .uni). If the file has neither extension, ASC is assumed.
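The extension rule can be sketched in a few lines of Python. This is only an illustration, not the product's own code: the function name `fixed_format_type` is hypothetical, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path

def fixed_format_type(filename: str) -> str:
    """Return "UNI" for a .uni file and "ASC" for anything else."""
    # Assumption: the extension is compared without regard to case.
    if Path(filename).suffix.lower() == ".uni":
        return "UNI"
    # Any other extension, or none at all, falls back to ASC.
    return "ASC"

print(fixed_format_type("survey.uni"))
print(fixed_format_type("survey.dat"))
```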

In fixed format data files, each question is allocated a specific set of data locations. The two file types differ in how those data locations are interpreted.

ASC (file extension .asc)

Data locations refer to bytes within the data file record. For example, a character question in data location 12 with width 4 will use bytes 12, 13, 14 and 15 in the data record.
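As a rough sketch of byte-based locations, the Python below pulls a field out of a record held as raw bytes. The helper name `asc_field` and the sample record are made up for illustration; the only rule taken from the text is that data locations count bytes starting from 1.

```python
def asc_field(record: bytes, location: int, width: int) -> bytes:
    """Return the raw bytes of a field in an ASC-style record."""
    # Data locations are 1-based, Python slices are 0-based.
    start = location - 1
    return record[start:start + width]

# Hypothetical record: 11 bytes of other data, then the character question.
record = b"00017 2 1".ljust(11) + b"ABCD" + b" 3"

print(asc_field(record, 12, 4))  # b'ABCD'
```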

This character question can hold up to 4 English (ASCII) characters (including blanks), which can include digits and normal punctuation.

If this character question contains text that is not English, the number of characters it can hold varies with the language and the encoding. For example, it will hold only 2 Korean characters if the encoding uses 2 bytes for each character.
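The effect of the encoding on a byte-based field can be checked with a short Python sketch. The helper `chars_that_fit` is hypothetical, and Python's `cp949` codec is used here only as a stand-in for a Korean Locale (MBCS) encoding; that choice is an assumption, not something the manual specifies.

```python
def chars_that_fit(text: str, width: int, encoding: str) -> int:
    """Count how many leading characters of text fit in width bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode(encoding))  # bytes needed by this character
        if used + size > width:
            break
        used += size
        count += 1
    return count

korean = "한국어조사"

print(chars_that_fit("ABCDEF", 4, "cp949"))  # 4: one byte per ASCII character
print(chars_that_fit(korean, 4, "cp949"))    # 2: two bytes per Korean character
print(chars_that_fit(korean, 4, "utf-8"))    # 1: three bytes per Korean character
```

The last line shows why the width needed depends on the encoding: the same 4 byte field holds fewer Korean characters in UTF-8 than in the 2 byte locale encoding.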

If you are not using English, allow enough data locations for each character question to suit the encoding used.

If a data file contains only ASCII characters, it does not matter whether it is ASC or UNI, because every character uses only one byte.

UNI (file extension .uni)

Data locations refer to characters within the data file record. For example, a character question in data location 12 with width 4 will use characters 12, 13, 14 and 15 in the data record.

This character question can hold up to 4 characters (including blanks) in any language. The actual number of bytes used in the data record will depend on the encoding used.
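A character-based field can be sketched the same way by working on decoded text instead of raw bytes. The helper `uni_field` and the sample record are hypothetical, and the sketch assumes that one "character" corresponds to one Python string element (a Unicode code point).

```python
def uni_field(record: str, location: int, width: int) -> str:
    """Return the characters of a field in a UNI-style record."""
    # Data locations are 1-based and count characters, not bytes.
    start = location - 1
    return record[start:start + width]

# Hypothetical record: 11 characters of other data, then 4 Korean characters.
record = "00017 2 1".ljust(11) + "한국어조" + " 3"

field = uni_field(record, 12, 4)
print(field)                       # 한국어조
print(len(field))                  # 4 characters
print(len(field.encode("utf-8")))  # 12 bytes when stored as UTF-8
```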

Encoding in fixed format character data files

How the information is stored in each record depends on the encoding; see Encoding.

ASC and UNI files can use any of the following encoding types:

• Locale (MBCS)

• UTF-8 (with or without a BOM)

• UTF-16

UTF-8 encoding is recommended.

For some languages, such as Thai and the languages of India, avoid UTF-16 data files, because a single character may need two UTF-16 code units. UTF-8 CSV data files are strongly recommended for these languages.
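The following Python sketch counts UTF-16 code units for a few sample strings. It assumes that "character" here means a character as it appears on screen, which may be built from a base letter plus combining marks; the helper name `utf16_units` is hypothetical.

```python
def utf16_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    # Encode without a BOM; each code unit is 2 bytes.
    return len(text.encode("utf-16-le")) // 2

hindi = "\u0915\u093F"        # Devanagari letter KA + vowel sign I, shown as one character
thai = "\u0E01\u0E35\u0E48"   # Thai KO KAI + vowel SARA II + tone mark MAI EK

print(utf16_units("A"))    # 1
print(utf16_units(hindi))  # 2
print(utf16_units(thai))   # 3
```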

Converting types

Provided you have specified all the character questions in the project, you can convert from one type to the other and change the encoding; see Convert fixed data.