Data file type descriptions


.ASC

This is an ordinary ASCII character data file often known as "fixed format ASCII". This type of file may be produced by a line editor or word processor, or output from a variety of software packages, such as scanning (OMR and OCR) software, databases or statistics packages.

IMPORTANT: in an ASCII file multi-coded columns may be stored as a set of 0 and 1 codes spaced along the fixed format line. If codes V and X are used in ASCII data, they will be displayed as & and - (minus) respectively.

If you have entered data into a .CBA data file, see the section Exporting data from CL for details on how to output an ASCII file.

.UNI

Similar to .ASC but can contain Unicode characters. The file may be encoded as UTF8 or UTF16 LE, in both cases with a BOM (Byte Order Mark).

.CSV

You must set RCP CSVDATA to read CSV data.

A Comma Separated Value file. It must have a header row that identifies the questions in the columns. The header name is used when defining variables with the same name. If the file is UTF8 encoded it must have a BOM.

If a column is an mvar it must have the item numbers separated by spaces.

This file type is called a "portable CSV file" in CL.
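Outside CL, the layout rules above (header row, BOM on UTF8 files, space-separated item numbers in mvar columns) can be checked with a short script. The following is a minimal Python sketch, not part of CL; the function names and the sample column names are hypothetical. Note that the utf-8-sig codec strips a BOM when present but does not insist on one.

```python
import csv
import io


def read_portable_csv(raw: bytes):
    """Return the data rows as dictionaries keyed by header name."""
    # "utf-8-sig" removes the BOM if there is one; it does not require it.
    text = raw.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


def mvar_items(cell: str):
    """Split a multi-response cell such as '1 3 5' into item numbers."""
    return [int(item) for item in cell.split()]


# Hypothetical sample file: a serial column, an mvar column, a numeric
# column and a text column.
sample = (
    "\ufeffSERIAL,Q1,Value3,NAME\r\n"
    "1001,1 3 5,42,Ann\r\n"
    "1002,2,7,Bob\r\n"
).encode("utf-8")

rows = read_portable_csv(sample)
first_items = mvar_items(rows[0]["Q1"])
```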

To refer to the data in the column the syntax =$* is used.

For svar and mvar definitions the $* is followed by a string definition, for example:

dm $Q1=$*/1-5,r,

will pick up values 1-5 from the column headed Q1.

For ivar and wvar definitions just the $* is needed, for example:

di $Value3=$*,

will put the value from column headed Value3 into the variable $Value3.

For cvars you must specify the character width for the variable, for example:

dc $name(50),

dc $name=$*,

will put the text from the column headed NAME (or name) into the variable $name.

If a column headed SERIAL or IOBS contains the serial numbers, you should include the SERIAL command at the top of the data stage:

serial number,

.CBA

This is the CL default binary data file type.

Column binary data is stored at the rate of one column per two bytes. The columns are split into two masks: the first mask, VX0123, is stored in the first byte and the second mask, 456789, in the second byte. These masks are then placed in the least significant six bits of each byte, giving each character a value between 0 (no codes) and 63 (all six codes). A blank (32) is then added to each character to give a value between 32 (no codes) and 95 (all six codes). Each card produces a line of up to 160 valid printable ASCII characters, which means that .CBA files can be handled by all the standard file handling programs. The standard sort program can be used with a suitable front end to translate column numbers (1 - 80) to character positions (1 - 160).
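The encoding described above can be sketched in Python as follows. This is an illustration only, not CL code. The description does not say which code occupies which of the six bits, so the sketch ASSUMES that the first code listed in each mask (V and 4) is the most significant of the six bits; the function names are hypothetical.

```python
FIRST_MASK = "VX0123"
SECOND_MASK = "456789"


def mask_value(codes, mask):
    """Return the 0-63 value for the codes that fall in one mask."""
    # ASSUMPTION: the first code in the mask is the most significant bit.
    value = 0
    for position, code in enumerate(mask):
        if code in codes:
            value |= 1 << (5 - position)
    return value


def encode_cba_column(codes):
    """Encode one column (a string of codes) as two printable characters."""
    return (chr(mask_value(codes, FIRST_MASK) + 32)
            + chr(mask_value(codes, SECOND_MASK) + 32))


def decode_cba_column(pair):
    """Decode two .CBA characters back into a string of codes."""
    codes = ""
    for char, mask in zip(pair, (FIRST_MASK, SECOND_MASK)):
        value = ord(char) - 32
        if not 0 <= value <= 63:
            raise ValueError("character outside the .CBA range 32-95")
        for position, code in enumerate(mask):
            if value & (1 << (5 - position)):
                codes += code
    return codes


def char_positions(column):
    """Translate a column number (1-80) to its two character positions."""
    return 2 * column - 1, 2 * column
```

Whatever the real bit order is, a column with no codes encodes as two blanks and a column with all twelve codes as two characters of value 95, so those two cases do not depend on the assumption.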

.CBC

This data file type is similar to .ASC except that multi-coded columns are allowed and are expanded by listing the codes one by one and surrounding the multi-coded column with forward slashes (/). So a column coded 1 and 4 and 6 becomes the five character sequence /146/. This type of data can only have up to 80 columns per card, and if every column has all 12 codes then the line would be 1120 (80 times 14) characters long. In practice, some operating systems restrict lines to 512 characters, so this method cannot be used for heavily multi-coded data.
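As an illustration of the slash notation, the Python sketch below splits a .CBC line into columns and rebuilds it. It is not CL code and the function names are hypothetical. It ASSUMES that a forward slash never appears as ordinary single-column data and that a blank column is written as a space.

```python
def parse_cbc_line(line):
    """Split a .CBC line into one string of codes per column."""
    columns = []
    i = 0
    while i < len(line):
        if line[i] == "/":
            # A multi-coded column runs up to the next slash.
            end = line.find("/", i + 1)
            if end == -1:
                raise ValueError("unterminated multi-coded column")
            columns.append(line[i + 1:end])
            i = end + 1
        else:
            # Any other character is a single column, as in .ASC.
            columns.append(line[i])
            i += 1
    return columns


def format_cbc_line(columns):
    """Rebuild a .CBC line; columns with several codes get slashes."""
    return "".join(c if len(c) == 1 else "/" + c + "/" for c in columns)


columns = parse_cbc_line("12/146/ 3")
worst_case = format_cbc_line(["VX0123456789"] * 80)
```

The worst_case line has every one of the 80 columns carrying all 12 codes, which reproduces the 1120 character figure quoted above.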

.CBD

This data file type is similar to .ASC except that each line is exactly 80 characters and there are no line termination characters. Each card uses 80 bytes.
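Because there are no line terminators, a .CBD file has to be cut into cards by length. A minimal Python sketch (not CL code; the function name is hypothetical) could look like this:

```python
def split_cbd_cards(data: bytes, card_length: int = 80):
    """Cut the contents of a .CBD file into fixed-length cards."""
    if len(data) % card_length:
        raise ValueError("file length is not a multiple of the card length")
    return [
        data[start:start + card_length].decode("ascii")
        for start in range(0, len(data), card_length)
    ]


# Two hypothetical cards stored back to back with no separator.
cards = split_cbd_cards(b"A" * 80 + b"B" * 80)
```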

.CBE

This data file type is similar to .ASC except that multi-coded columns are allowed and are replaced by an asterisk. If there are any multi-coded columns, the last column on the card is followed by a DEL character (ASCII 127), which is followed by a pair of characters for each asterisk. This pair of characters is the same as that used for every column in a .CBT file. Columns which contain ASCII 32 through 126 (except 42) are not treated as multi-coded; the relevant character is placed in the column.
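The following Python sketch shows one way a .CBE card could be expanded back into columns. It is not CL code and the function names are hypothetical. Each character pair is decoded as described under .CBT (six bits plus 64), ASSUMING that the first code listed in each mask is the most significant of the six bits. Only the first DEL is treated as the separator, because a pair for a column with all six codes of a mask also has the value 127.

```python
DEL = "\x7f"
FIRST_MASK = "VX0123"
SECOND_MASK = "456789"


def decode_cbt_pair(pair):
    """Decode one two-character pair into a string of codes."""
    codes = ""
    for char, mask in zip(pair, (FIRST_MASK, SECOND_MASK)):
        value = ord(char) - 64
        if not 0 <= value <= 63:
            raise ValueError("character outside the range 64-127")
        # ASSUMPTION: the first code in the mask is the most significant bit.
        for position, code in enumerate(mask):
            if value & (1 << (5 - position)):
                codes += code
    return codes


def decode_cbe_card(line):
    """Return one string per column, expanding each asterisk."""
    columns, _, trailer = line.partition(DEL)
    if len(trailer) % 2:
        raise ValueError("trailer does not consist of whole pairs")
    pairs = [trailer[i:i + 2] for i in range(0, len(trailer), 2)]
    if columns.count("*") != len(pairs):
        raise ValueError("number of asterisks and pairs differ")
    next_pair = iter(pairs)
    return [decode_cbt_pair(next(next_pair)) if ch == "*" else ch
            for ch in columns]


# Under the stated assumption, "Dh" is the pair for codes 1, 4 and 6.
card = decode_cbe_card("12*3" + DEL + "Dh")
```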

This file type is often used by other market research programs.

.CBG

This data file type is similar to .CBT except the code positions are reversed, so that 3210XV and 987654 are used instead of VX0123 and 456789 respectively.
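Reversing the code positions amounts to reversing the six bits of each character value. The Python sketch below (not CL code; the function names are hypothetical) converts between the two layouts, ASSUMING that .CBG keeps the same offset of 64 as .CBT. Since reversing twice restores the original value, the same function converts in both directions.

```python
def reverse_six_bits(value):
    """Reverse the order of the six bits in a 0-63 value."""
    return int(format(value, "06b")[::-1], 2)


def cbt_to_cbg(line):
    """Convert a .CBT line to .CBG (or back) character by character."""
    converted = []
    for char in line:
        value = ord(char) - 64
        if not 0 <= value <= 63:
            raise ValueError("character outside the range 64-127")
        # ASSUMPTION: .CBG uses the same +64 offset as .CBT.
        converted.append(chr(reverse_six_bits(value) + 64))
    return "".join(converted)
```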

.CBI

In this data file type column binary data is stored at the rate of one column per two bytes. The columns are split into two masks - VX0123 in the first byte and 456789 in the second byte. These masks are then placed in the least significant six bits in each byte giving each character a value between 0 (no codes) and 63 (all six codes). A standard column binary file will be either a .CBI or a .CSI.

.CBJ

In this data file type column binary data is stored at the rate of one column per two bytes. The columns are split VX012345 and 6789. The first mask is then placed in the first byte and the second mask in the most significant four bits of the second byte.

.CBR

In this data file type column binary data is stored at the rate of four columns per six bytes; data is packed up so that no bits are left unused. Codes are stored in the order V-9. Each card image uses 120 bytes.
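Four columns of 12 codes each need 48 bits, which is exactly six bytes, so an 80 column card takes 120 bytes. The Python sketch below illustrates this packing. It is not CL code and the function names are hypothetical. It ASSUMES that code V of the first column in each group of four is the most significant bit of the first byte; the description only gives the code order V-9.

```python
CODES = "VX0123456789"


def column_bits(codes):
    """Return the 12-bit value for one column's codes."""
    # ASSUMPTION: V is the most significant bit and 9 the least.
    value = 0
    for position, code in enumerate(CODES):
        if code in codes:
            value |= 1 << (11 - position)
    return value


def pack_cbr_card(columns):
    """Pack a list of columns (strings of codes) four to every six bytes."""
    if len(columns) % 4:
        raise ValueError("column count must be a multiple of four")
    packed = bytearray()
    for start in range(0, len(columns), 4):
        group = 0
        for codes in columns[start:start + 4]:
            group = (group << 12) | column_bits(codes)
        packed += group.to_bytes(6, "big")
    return bytes(packed)


blank_card = pack_cbr_card([""] * 80)
```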

.CBT

This file type is similar to .CBA except that an "@" (64) is added to each character, giving a value between 64 (no codes) and 127 (all six codes). Unfortunately the value for all six codes (127) forms the DEL character, which may prevent the transfer of .CBT files over a computer link.
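Since .CBA adds 32 and .CBT adds 64 to the same six-bit value, a line can be converted between the two by shifting every character by 32. The Python sketch below (not CL code; the function names are hypothetical) does this and also reports whether a .CBT line contains the problematic DEL character.

```python
def cba_to_cbt(line):
    """Shift each .CBA character (32-95) up to its .CBT value (64-127)."""
    for char in line:
        if not 32 <= ord(char) <= 95:
            raise ValueError("character outside the .CBA range 32-95")
    return "".join(chr(ord(char) + 32) for char in line)


def cbt_to_cba(line):
    """Shift each .CBT character (64-127) down to its .CBA value (32-95)."""
    for char in line:
        if not 64 <= ord(char) <= 127:
            raise ValueError("character outside the .CBT range 64-127")
    return "".join(chr(ord(char) - 32) for char in line)


def contains_del(line):
    """Report whether a .CBT line holds a DEL (all six codes of a mask)."""
    return "\x7f" in line
```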

.CBU

In this data file type column binary data is stored at the rate of one column per two bytes. The columns are split VX01 and 23456789. The first mask is then placed in the least significant four bits of the first byte and the second mask in the second byte.