Raw data structure

Companion

The raw data structure is set in [Project global settings].

Unless you have to lay out the data to match a predetermined data location layout (map), you can use the default data structure of "One line per questionnaire" and "Character" data.

For CATI surveys and those with a lot of data (including repeated or continuous surveys) you may wish to increase the serial number width to 7, 8 or 9 digits.

Storage method

There are various methods of raw data storage:

CSV

For details about CSV files, see the earlier section "Spreadsheet file introduction".

The Companion collects data in CSV format.

CSV data files should have a column labelled "Serial". They do not contain a job number (see "Job number" below).
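The "Serial" column requirement can be checked before a CSV file is used. The following is a minimal sketch in Python, not part of the product: the function name read_csv_records and the question columns Q1 and Q2 are hypothetical, and only the "Serial" column name comes from the text above.

```python
import csv
import io


def read_csv_records(text):
    """Return the CSV rows keyed by serial number.

    Raises ValueError if the "Serial" column is missing or if a
    serial number appears more than once.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "Serial" not in reader.fieldnames:
        raise ValueError('CSV data must have a column labelled "Serial"')
    records = {}
    for row in reader:
        serial = row["Serial"].strip()
        if serial in records:
            raise ValueError(f"duplicate serial number: {serial}")
        records[serial] = row
    return records


# Hypothetical data: Q1 and Q2 are made-up question columns.
sample = "Serial,Q1,Q2\n1,2,5\n2,1,3\n"
records = read_csv_records(sample)
print(sorted(records))  # ['1', '2']
```

To check a file on disk, pass its contents to the function, for example the result of reading the file as text.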

Character

Fixed format character data storage is commonly used. The data is stored as recognisable characters (numbers and letters) in a plain text file with the extension ASC or UNI.

IMPORTANT: When working with ASCII data you must ensure the order of codes set under [Project global settings] begins with a code 1.

Character data can be stored in a raw data file as:

ASC (character fixed format)

UNI - the same as ASC except that UTF-8 Unicode encoding is used.  At present this type of file will need to be converted to ASC before analysis.

Binary

Fixed format binary data storage was sometimes used for compatibility with older market research programs.

Each data location can contain letters and numbers, but can also contain up to 12 independent codes usually called VX0123456789.  There are other names for the X and V codes, for example YX0123456789 or BA0123456789.  Letters are stored as a particular set of codes.

The V and X codes on their own are actually the characters "&" and "-" respectively.

Binary data can be more efficient for multi-coded questions because up to 12 answers can be placed in each data location.
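The idea of 12 independent codes in one data location can be pictured as 12 on/off flags. The sketch below is an illustration in Python only: it uses the code names VX0123456789 from the text, but the bit order is an assumption and it does not reproduce the actual byte layout of any binary file type.

```python
CODES = "VX0123456789"  # the 12 code names used in the text


def pack(codes):
    """Combine a collection of code names into one 12-bit value."""
    mask = 0
    for code in codes:
        # CODES.index raises ValueError for a name that is not one of the 12.
        mask |= 1 << CODES.index(code)
    return mask


def unpack(mask):
    """List the code names that are set in a 12-bit value."""
    return [code for i, code in enumerate(CODES) if mask & (1 << i)]


# A multi-coded question with three answers held in one data location.
location = pack(["1", "3", "X"])
print(unpack(location))  # ['X', '1', '3']
```

With character data the same three answers would typically need more than one data location, which is the efficiency gain described above.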

Binary data can be stored in a raw data file as:

CBA (Classic binary)

CBE (Quantum binary)

CSI (360 column binary, fixed record length)

The method chosen does not alter the contents as far as raw data structure and the answers to questions are concerned.  It only alters the way this data is stored in the raw data file.

Job number

In fixed format data a few characters can be placed on every line of data to identify the survey.  The survey to which a data file belongs can then be identified by inspecting the job number.  A job number is not necessary if data files are named appropriately.

Serial number

Every survey should have serial numbers: each questionnaire should be allocated a unique number.

For paper surveys the serial number is used to match the data record with the paper document.

A serial number is placed on every line of data to identify the questionnaire to which the data belongs.

Other fixed format structure details

One line per questionnaire

The raw data for each record (questionnaire) is stored on one line.

Each line should have a unique serial number, usually in the first few data locations.

Sometimes there will be a job number that is repeated on every line.

The data locations for questions are in a "fixed format"; they are in the same place for every questionnaire.

This method can be used with Character data storage and Binary data storage.
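Reading "One line per questionnaire" character data amounts to slicing each line at the same column positions. The following Python sketch assumes a hypothetical layout (serial number in columns 1 to 5, Q1 in column 6, Q2 in columns 7 to 8); none of these positions or names come from the text, and a real project would use its own data map.

```python
# Hypothetical layout: 1-based, inclusive column ranges.
LAYOUT = {"serial": (1, 5), "Q1": (6, 6), "Q2": (7, 8)}


def parse_line(line, layout=LAYOUT):
    """Slice one fixed format line into named fields."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in layout.items()
    }


# Made-up data: one line per questionnaire.
lines = ["00001312", "00002205"]
records = [parse_line(line) for line in lines]

# Each line should carry a unique serial number.
serials = [record["serial"] for record in records]
assert len(serials) == len(set(serials)), "serial numbers must be unique"

print(records[0])  # {'serial': '00001', 'Q1': '3', 'Q2': '12'}
```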

Multiple lines per questionnaire

The data for each record (questionnaire) is split into "cards".  Each "card" has 80, 99 or 999 columns.

Data locations do not run in a continuous sequence.

The first few columns on each line (card) are usually reserved for identification purposes:

job number (optional) - is the same on every card in the data file

serial number - is the same for every line relating to a particular questionnaire

card number - identifies which card this is (between 1 and 999)

IMPORTANT: No question can be split over cards, except single-coded and multi-coded questions with "Individual locations" for responses.

This method can be used with Character data storage and Binary data storage.
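With multiple lines per questionnaire, the lines have to be grouped by serial number and then by card number before a questionnaire can be read. The Python sketch below assumes a hypothetical identification layout with no job number (serial number in columns 1 to 5, card number in columns 6 to 8); these positions are illustrative and are not taken from the text.

```python
from collections import defaultdict

SERIAL = slice(0, 5)  # hypothetical: columns 1-5
CARD = slice(5, 8)    # hypothetical: columns 6-8


def group_cards(lines):
    """Group card lines as {serial: {card number: line}}."""
    questionnaires = defaultdict(dict)
    for line in lines:
        serial = line[SERIAL]
        card = int(line[CARD])
        if card in questionnaires[serial]:
            raise ValueError(f"serial {serial} has card {card} twice")
        questionnaires[serial][card] = line
    return dict(questionnaires)


# Made-up data: questionnaire 00001 has two cards, 00002 has one.
lines = ["00001001AB12", "00001002  34", "00002001CD56"]
grouped = group_cards(lines)
print(sorted(grouped["00001"]))  # [1, 2]
```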

Card number

Each line carries a card number that identifies which card of the questionnaire it represents.

Card types

A project can use up to 999 cards (lines) per questionnaire.  These can be fixed or optional.  Optional cards can be missing if there is no data on them.
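The difference between fixed and optional cards can be expressed as a simple check on the card numbers found for one questionnaire. This Python sketch assumes that a fixed card must be present for every questionnaire, which the text implies but does not state outright. The card numbers and the function name check_cards are hypothetical.

```python
FIXED_CARDS = {1, 2}   # hypothetical: expected for every questionnaire
OPTIONAL_CARDS = {3}   # hypothetical: may be missing if there is no data


def check_cards(present):
    """Return (missing fixed cards, unexpected cards) for one questionnaire."""
    present = set(present)
    missing = FIXED_CARDS - present
    unexpected = present - FIXED_CARDS - OPTIONAL_CARDS
    return sorted(missing), sorted(unexpected)


print(check_cards([1, 2]))  # ([], [])
print(check_cards([1, 3]))  # ([2], [])
```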