Data files for use with CL
Data files contain information; in CL this is usually the answers given by respondents to the entries in a questionnaire. Data file formats differ in the way the data is stored in the file. CL uses two data file types: binary, with the extension .CBA, and ASCII, with the extension .ASC. For questionnaires with many multi-response questions we recommend the binary data format (.CBA), because multi-coded responses are held in a more compressed form than in an ASCII file, which keeps the size of the data file to a minimum. The CL default data file format is .CBA.
Data can be generated by a data entry package or other software, or it can be entered using CL itself. Data files may consist of ordinary text (character) or they may be column binary (card image) in a variety of different forms.
Ordinary character (text) files can be produced and altered with a word processor, a line editor, or Windows Notepad.
Column binary files hold data in the form of (usually 80) column card images, where each column contains up to 12 codes (VX0123456789 or &-0123456789). If any column contains more than one code, this is called multi-coding or multi-punching. Column binary files can only be produced and altered using special software designed to handle these file types.
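The idea of a column holding up to 12 codes can be pictured as a 12-bit mask with one bit per code. The following is only a minimal sketch in Python: the bit order and the names CODES, column_mask, is_multi_coded and codes_in are assumptions made for illustration and do not describe how CL or any of the card image formats store a column on disk.

```python
# Hypothetical in-memory model of one card column (NOT CL's on-disk format).
CODES = "VX0123456789"  # the 12 codes a column can hold, as named in the text


def column_mask(codes: str) -> int:
    """Return a 12-bit mask with one bit set for each code in the column."""
    mask = 0
    for code in codes:
        position = CODES.index(code)  # raises ValueError for an unknown code
        mask |= 1 << position
    return mask


def is_multi_coded(mask: int) -> bool:
    """A column is multi-coded (multi-punched) if it holds more than one code."""
    return bin(mask).count("1") > 1


def codes_in(mask: int) -> str:
    """List the codes present in a mask, in VX0123456789 order."""
    return "".join(c for i, c in enumerate(CODES) if mask & (1 << i))
```

For example, a column holding only code 1 is single-coded, while a column holding V, 3 and 9 is multi-coded.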
SUFFIX  DATA FILE DESCRIPTION               CONTENTS                      STRUCTURE
.ASC    ASCII character                     MBCS
.UNI    Unicode character                   UTF8/UTF16 with BOM
.CSV    Comma separated value               MBCS or UTF8/UTF16 with BOM
.CBA    Blank-added                         Card image                    Line
.CBT    At-added                            Card image                    Line
.CBG    Gyrated                             Card image                    Line
.CBC    Column expanded                     Card image                    Line
.CBE    Column extended                     Card image                    Line
.CBI    IBM format                          Card image                    Raw
.CBJ    Unpacked (top 12)                   Card image                    Raw
.CBU    Unpacked (bottom 12)                Card image                    Raw
.CBR    Raw (120 Bytes)                     Card image                    Special
.CBD    Direct ASCII data                   Character                     Special
.CSA    Byte swapped Blank-added            Card image                    Line
.CST    Byte swapped At-added               Card image                    Line
.CSG    Byte swapped Gyrated                Card image                    Line
.CSI    Byte swapped IBM format             Card image                    Raw
.CSJ    Byte swapped unpacked (top 12)      Card image                    Raw
.CSU    Byte swapped unpacked (bottom 12)   Card image                    Raw
All file types listed in the table above (and described in the following sections) may be used directly with CL. Copy the data file into the current project directory, then use the selection [Data] [Convert raw data] from the "Main" window to convert the file to a .CBA data file.
For more information see section Data - convert raw data.
If you are not sure of the type of data file that has been supplied to you, it is most likely to be one of the following:
.ASC - Fixed format ASCII - this type of data is often exported from a database program.
.CSV - Comma separated value (also known as comma delimited) - this type of file is normally output from a spreadsheet, database or statistical package.
.CSI - A form of column binary (card image) file. If a card image file is supplied to you and the total byte size of the data file is exactly divisible by 160, it is likely to be a .CSI file.
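The size check described for card image files can be sketched in a few lines of Python. This is an illustration only: the function name could_be_raw_card_image is hypothetical, and passing the check merely suggests a raw structure file. It does not prove the file type, and it cannot tell byte swapped files from files that are not byte swapped.

```python
RAW_RECORD_BYTES = 160  # record size quoted in the text for raw card image files


def could_be_raw_card_image(size_in_bytes: int) -> bool:
    """Return True if a file of this size is a whole number of 160-byte records.

    This is only a heuristic: an empty file, or one whose size is not an
    exact multiple of 160, fails the check.
    """
    return size_in_bytes > 0 and size_in_bytes % RAW_RECORD_BYTES == 0
```

The size of a supplied file could be obtained with os.path.getsize and passed to this function before deciding which conversion to try.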
In line structure, each card is held as a line, and each line ends with two termination characters. For card image files, each column is manipulated to form one or more (usually 2) ASCII characters. Lines vary in length because blank columns at the end of a card are not written to the file.
In data files with line structure each line ends with CR (carriage return) and a LF (line feed). Line structure data files from UNIX or XENIX have a LF (line feed) only at the end of each line.
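To see which line ending a supplied line structure file uses, its bytes can be inspected directly. The sketch below is a minimal illustration in Python; the function name line_terminator and the labels it returns are assumptions made for this example, not part of CL.

```python
def line_terminator(data: bytes) -> str:
    """Classify the line endings found in data.

    Returns 'CRLF' (CR followed by LF), 'LF' (line feed only, as written
    on UNIX or XENIX), 'mixed' if both occur, or 'none' if there are no
    line feeds at all.
    """
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # line feeds not preceded by CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf_only:
        return "LF"
    return "none"
```

The file should be read in binary mode (for example with open(path, "rb")) so that the line endings are not translated before they are counted.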
In a raw data file, each record usually occupies 160 bytes (characters).
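Because raw structure files have no line terminators, records have to be found by position. The following Python sketch splits a raw file into fixed-length records, assuming the usual 160 bytes per record; the function name split_raw_records is hypothetical, and files with a different record length would need a different record_bytes value.

```python
def split_raw_records(data: bytes, record_bytes: int = 160) -> list[bytes]:
    """Split the contents of a raw data file into fixed-length records.

    Raises ValueError if the data is not a whole number of records,
    which suggests the file does not have the assumed raw structure.
    """
    if len(data) % record_bytes != 0:
        raise ValueError(
            f"{len(data)} bytes is not a multiple of {record_bytes}"
        )
    return [
        data[start:start + record_bytes]
        for start in range(0, len(data), record_bytes)
    ]
```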
Special structure data file types are described individually in section Data file descriptions.
In byte swapped data file structures, each pair of bytes (characters) has been swapped in the file, so instead of appearing as 12345678, the bytes appear as 21436587. (This can happen accidentally when moving files between different types of computer.) For .CBI files, if the bytes are swapped, codes VX0123 are swapped with codes 456789.
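The pairwise swap can be illustrated with a short Python sketch. The function name swap_byte_pairs is an assumption made for this example; it shows only the byte reordering described above and is not CL's own conversion routine.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap each adjacent pair of bytes, so b"12345678" becomes b"21436587".

    Swapping is its own inverse: applying the function twice returns the
    original bytes, so the same routine both creates and undoes a swap.
    """
    if len(data) % 2 != 0:
        raise ValueError("byte swapping needs an even number of bytes")
    swapped = bytearray(data)
    # Even positions take the bytes from odd positions and vice versa.
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)
```

Applied to the example in the text, swap_byte_pairs(b"12345678") gives b"21436587".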