The Mudcat Café™
Thread #115539   Message #2479568
Posted By: JohnInKansas
29-Oct-08 - 11:27 PM
Thread Name: Tech: converting binary data to ASCII
Subject: RE: Tech: converting binary data to ASCII
In common use, the term "binary file" just means that the file contains no - or few - formatting bits, tags, labels, and other "non-essential" information.

"All computer files are binary."

In the broadest sense, a hard drive contains ONE BINARY FILE. The file is broken up by placing "markers" at fixed intervals within the file, to identify "clusters" whose sole real purpose is to let the computer know "where it is" within the "file."

An "index" is used to identify a specific cluster (location on the disk) where an "individual file" begins, and usually each cluster contains additional "markers" that identify which file it contains, which cluster precedes it in that file, and which cluster follows it in that file.

Most "files" that are associated with a particular program, or that conform to some "standard format" also contain additional "format information" that gives the program used to read and display the file information on the specific way(s) the bits should be read. While these still are - for the technical purist - binary files, the more specific name that identifies a particular format is generally used, and the "file extensions" provide some bit of "shorthand" for the identification of a particular format.

The additional "format bits" allow a program that reads "binary files of format type xxxxx" to interpret the bits in appropriate ways, and to change the way the bits are interpreted within a particular "span of bits" if that is needed for proper display or other use.
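
A rough sketch of that idea in Python: the first few "format bits" of a file (often called a "magic number") tell a reading program what it's looking at. The three signatures below are real, well-known ones; the file name is hypothetical.

    SIGNATURES = {
        b"\x89PNG\r\n\x1a\n": "PNG image",
        b"%PDF": "PDF document",
        b"PK\x03\x04": "ZIP archive",
    }

    def identify(path):
        with open(path, "rb") as f:  # "rb" = read the raw binary bits
            head = f.read(8)
        for sig, kind in SIGNATURES.items():
            if head.startswith(sig):
                return kind
        return "unknown - no recognized format bits"

    print(identify("mystery.dat"))  # "mystery.dat" is a made-up name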

When a "transducer" or other kind of instrument is used to measure some physical parameter, it's usual for the measuring device to "send" only a "number" corresponding to the value of the parameter being measured. If only one, or a few, "numbers" are needed it should be fairly easy to convert the numbers - each sent as a binary number or as a "character encoded formatted" number like hexadecimal, BCD (binary coded decimal) or an ASCII/ANSI/Unicode "character."

If "lots of numbers" are needed, particularly in the case of a "continuous stream of readings," it's common to use a "logging program" to keep the numbers orderly. In one common "format" a specific number of "numbers" (values) is "written" in a single column or in a single row, and the computer receiving the information (i.e. the logging program) "attaches" some bit of identifying information so you'll be able later to know "which numbers came when."

In the case of "intermittent readings," each received number/value may have a "timestamp" attached when the reading is "logged" into the database file by the "logging program." In some cases just a "sequence number" might be sufficient.
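
The intermittent case might look like this (again Python, with the same invented read_value()): one line per reading, carrying a sequence number and a timestamp.

    import random, time

    def read_value():
        return random.randint(0, 1023)  # stand-in for a real instrument reading

    with open("log.csv", "w") as log:
        for seq in range(5):
            t = time.time()  # seconds since the epoch, as the "timestamp"
            log.write(f"{seq},{t:.3f},{read_value()}\n")
            time.sleep(random.random())  # readings arrive at irregular times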

If data is "streamed" at some fixed rate, the logging program may timestamp only each "row/column" of numbers, or, for high speed data, only each "nth row." For a data stream arriving at moderate to high frequencies, the "timestamp" may give GMT or some other "universal time" tick for every nth row, and only an "offset from the last universal time" for each row until the next "universal" stamp.

The periodic universal time may be called a "synch" marker, while the "offset times" are just "identifiers" of individual "data points" within the span between synchs.
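
Here's what that "full stamp every nth row, offsets in between" scheme might look like in Python. SYNCH_EVERY and the line layout are made-up choices for the sketch.

    import time

    SYNCH_EVERY = 10  # write a full "universal time" stamp every 10th row

    def log_rows(rows, log):
        last_synch = None
        for i, row in enumerate(rows):
            now = time.time()
            if i % SYNCH_EVERY == 0:
                last_synch = now
                stamp = f"SYNCH,{now:.6f}"  # the periodic "universal time"
            else:
                stamp = f"+{now - last_synch:.6f}"  # offset since last synch
            log.write(stamp + "," + ",".join(str(v) for v in row) + "\n")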

If the data values are not individually "marked," the computer (or the user manually) takes 4, 8 or 16 bits and assumes that's a "value" for a data point. If a single bit is "dropped," or one is added by noise in the experimental setup, every value that follows will be mis-read: the conversion to "useful form" will start and end at the wrong points in the "stream of bits," and each reported value will actually be a mixture of bits from what should have been two different original data values. The "synch marker" can be used to discard data that was misread, and to restart "where the next value starts," so that subsequent information is correctly read.
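
A Python sketch of that recovery step, using whole bytes instead of single bits to keep it short. The two-byte values and the 0xFF 0x00 marker are invented, and the marker is assumed never to occur inside a value.

    SYNCH = b"\xff\x00"  # made-up marker, assumed never to appear in the data

    def decode(stream):
        values = []
        i = stream.find(SYNCH)  # discard anything before the first marker
        while i != -1 and i + len(SYNCH) <= len(stream):
            i += len(SYNCH)
            nxt = stream.find(SYNCH, i)  # this span runs to the next marker
            end = len(stream) if nxt == -1 else nxt
            while i + 2 <= end:  # fixed-width two-byte values
                values.append(int.from_bytes(stream[i:i+2], "big"))
                i += 2
            # a leftover partial value before the marker is the data lost to a
            # dropped or added bit; the synch lets us restart at a known point
            i = nxt
        return values

    print(decode(b"\xff\x00\x00\x01\x00\x02\xff\x00\x00\x03"))  # [1, 2, 3]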

In a University or moderately sized "research facility," it's common for the organization having control of the computer where data will be logged and made accessible for "interpreted use" to provide the "logging program" that individual researchers are expected to use. Such programs commonly appear to allow the user to specify things like byte length, whether data will be in rows or in columns and how many of each, the frequency of "tags" and "synch tags," and perhaps other details appropriate for a specific data set.

I haven't found much in the way of logging interfaces that are accessible for individual use. Most isolated experimenters appear to write their own. Where only a single kind of data must be handled, this doesn't appear to be a particularly complex programming task, since the extra features needed to "generalize for many users with varied data types" can be omitted - but it is a programming task that probably should be done for anything but "trivially simple" data.

The benefits of "organizing" your data via an appropriate "logging interface" would justify trying to find someone who's actually "done it" and has the access that would permit "loaning" (or stealing[1] for you) a known good program.

[1] Within the bounds of ethical conduct, of course.

John