BinHex 4.0 Definition - Peter N Lewis, Aug 1991.

For a long time BinHex 4.0 has been the standard for ASCII encoding of
Macintosh files.  To my knowledge, there has never been a full
definition of this format.  Info-Mac had an informal definition of the
format, but this lacked a description of the CRC calculation, as well
as being vague in some areas.  Hopefully this document will fully
define the BinHex 4.0 standard, and allow more programmers to fully
implement it.  Note, however, that this definition is how I see the
BinHex standard, and since I had no part whatsoever in defining it
initially, this document can have no real claim to being the one true
definition.  If anyone feels that I have not got the facts straight, or
that it is ambiguous in any details, please contact me at the address
at the bottom of this document.

Format:

It is necessary to distinguish between the encoding format and decoding
format since we wish to allow all decoders to read all versions of the
BinHex format, while trying to reduce the variation in encoding.

All numbers are decimal unless they have the format 0xFF, in which case
they are hex.

<tab> is a tab, character value 9.
<lf> is a linefeed, character value 10.
<cr> is a carriage return, character value 13.
<spc> is a space, character value 32.

<return> means to the encoder:
  The sequence <cr> <lf>.  Either (but not both) may be omitted.  Use
  whatever is appropriate for your system (<cr> for Mac, <lf> for Unix,
  <cr> <lf> for MS-DOS).

<return> means to the decoder:
  Any sequence of <cr>, <lf>, <tab>, <spc>.  (The <tab> and <spc> are
  required because some old programs produced these characters).

A hqx file begins with a description which should be ignored by the
decoder (and generally left blank by encoding software).  The hqx file
proper consists of the sequence:

<return>(This file must be converted with BinHex 4.0)<return>:<hqx-file>:<return>

When encoding, DO NOT mess with the "(This file..." string.  There are
a large number of automated programs that use it, and they may stumble
over anything other than this exact string.
When decoding, you should only check the string up to "with BinHex",
then skip until either a <cr> or <lf>, then skip <return> characters,
and check for the colon.  Also, be careful with the leading <return>:
this can be either a <return> character, or the start of the file.

Some old programs produced an extra exclamation mark (!) immediately
before the final colon (:).  When decoding, after all data is read,
skip any <return> characters, and then allow a single optional
exclamation mark (and then skip <return> again) before checking for the
terminating colon.  Don't check for a <return> after the colon.

Note also that when decoding you may choose to ignore "(This file..."
and simply start when you get "<return>:" (with the line possibly
finishing in a colon).

<hqx-file> is a sequence of 6-bit encoded characters.  The top 6 bits
of the first byte (bits 7 to 2) are encoded in the first character, the
bottom 2 bits of the first byte and the top 4 bits of the second byte
are encoded into the second character, and so on.  Numbers longer than
one byte (e.g., the 4-byte fork lengths) are stored from most
significant to least significant.  For example, 513 (long-word,
decimal) would be stored in four bytes: 0,0,2,1 in that order.

When encoding, a <return> should be inserted after every 64 characters.
The first character should follow immediately after the first colon
(without a <return>), and the first line should be 64 characters long,
including the colon.  The final colon should go on the same line as the
last character and there should be a <return> after it.  Thus, the
final line must be between 2 and 65 (inclusive) characters long.

When decoding, lines of any length should be accepted, and <return>
characters should be ignored everywhere (before and after the first
colon, between any two hqx characters, and before the trailing colon).

This string defines the valid characters, in order:

!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr

(i.e., ! is 0, " is 1, ..., r is 63).

When decoding, any character other than those 64 and the <return>
characters indicates an error, and the user should be notified of it.
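The 6-bit decoding described above can be sketched in Python as follows.
This is only an illustration, not a reference implementation: the names
HQX_CHARS, HQX_VALUES and decode_hqx_chars are made up for this example,
the input is assumed to be the text between the two colons, and any
leftover bits at the end (fewer than 8) are assumed to be padding and are
dropped.

```python
# Valid hqx characters in order: '!' is 0, '"' is 1, ..., 'r' is 63.
HQX_CHARS = '!"#$%&\'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr'
HQX_VALUES = {ch: i for i, ch in enumerate(HQX_CHARS)}

# Characters a decoder skips everywhere: CR, LF, tab, space.
SKIPPED = "\r\n\t "


def decode_hqx_chars(text: str) -> bytes:
    """Turn hqx characters into bytes (still run-length encoded)."""
    out = bytearray()
    acc = 0    # bit accumulator, most significant bits arrive first
    nbits = 0  # number of not-yet-emitted bits held in acc
    for ch in text:
        if ch in SKIPPED:
            continue
        if ch not in HQX_VALUES:
            # Anything else is an error the user should be told about.
            raise ValueError(f"invalid hqx character {ch!r}")
        acc = (acc << 6) | HQX_VALUES[ch]
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the unused low bits
    # Assumption: trailing bits that do not fill a byte are discarded.
    return bytes(out)
```

The result is not yet the file contents: run-length expansion and the CRC
checks still have to be applied to the returned bytes.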
The format also supports run length encoding, where the character to be
repeated is followed by a 0x90 byte, then the repeat count.  For
example, FF9004 means repeat 0xFF 4 times.  The special case of a
repeat count of zero means it's not a run, but a literal 0x90:
2B9000 => 2B90.

*** Note: the 9000 can be followed by a run, which means to repeat the
0x90 (not the character previous to that).  That is, 2B90009005 means
2B9090909090.

Before RLE and 6-bit encoding (or after them, when decoding), the file
is formatted into a header, data fork, and resource fork.  All three
are terminated with a two byte CRC.

The header consists of a one byte name length, then the file name, then
a one byte 0x00.  The rest of the header is 20 bytes long, and contains
the file type, creator, file flags, data and resource lengths, and the
two byte CRC value for the header.

When encoding, the flags should be copied directly from the Finder
Info.  When decoding, the OnDesk, Invisible, and Initted flags should
be cleared.  Also, when decoding, the file name should be validated for
the given OS.  For example, a Mac program should replace a full stop
(.) at the front of a file name with a bullet (option-8), and should
replace colons (:) with dashes (-); if running under A/UX, slashes
should also be replaced by dashes (-).

The data fork and resource fork contents follow in that order.  If a
fork is empty, there will be no bytes of contents and the checksum will
be two bytes of zero.

So the decoded data between the first and last colon (:) looks like:

 1     n     1  4    4    2    4    4   2  (length)
+-+---------+-+----+----+----+----+----+--+
|n| name... |0|TYPE|AUTH|FLAG|DLEN|RLEN|HC|  (contents)
+-+---------+-+----+----+----+----+----+--+

                 DLEN                    2  (length)
+--------------------------------------+--+
|              DATA FORK               |DC|  (contents)
+--------------------------------------+--+

                 RLEN                    2  (length)
+--------------------------------------+--+
|            RESOURCE FORK             |RC|  (contents)
+--------------------------------------+--+

CRCs:

BinHex 4.0 uses a 16-bit CRC with a 0x1021 polynomial, and the initial
register value is 0x0000 (sixteen bits of zero).  The general bitwise
algorithm is to take data 1 bit at a time, most significant bit of each
byte first, and process it through the following:

1) Take the old CRC (use 0x0000 if there is no previous CRC) and shift
   it to the left by 1, keeping only the low 16 bits.
2) Put the new data bit in the least significant position (right bit).
3) If the bit shifted out in (1) was a 1 then xor the CRC with 0x1021.
4) Loop back to (1) until all the data has been processed.

Or in pseudo Pascal:

  var
    crc: integer;  { 16 bits; reset to zero at the start of the header }
                   { and at the start of each of the two forks }

  procedure CalcCRC (v: integer);  { 0 <= v <= 255 }
    var
      temp: boolean;
      i: integer;
  begin
    for i := 1 to 8 do
      begin
        temp := (crc AND 0x8000) <> 0;      { bit about to be shifted out }
        crc := ((crc << 1) AND 0xFFFF) OR (v >> 7);  { shift in top bit of v }
        if temp then
          crc := crc XOR 0x1021;
        v := (v << 1) AND 0xFF;             { move on to the next data bit }
      end;
  end;

When encoding, include two bytes of 0x00 where the CRC should be, and
use them in the calculation of the CRC before writing it to the file.
Similarly, when decoding, calculate the CRC over all the bytes in the
section (header, data fork, or resource fork) except the last two CRC
bytes, then continue the CRC calculation with two 0x00 bytes, then
compare it to the CRC stored in the file.

Parts:

If you wish to support segmented files in the same way that
comp.binaries.mac files are segmented, use the following line to note
the end of a hqx part:

--- end of part NN ---

Note: When decoding, it is only necessary to check for
"--- end of part".
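As a small illustration of the end-of-part check, here is a Python
sketch.  The names END_OF_PART_PREFIX, is_end_of_part and lines_in_part
are invented for this example, and stripping CR, LF, tab and space from
both ends of the line before comparing is an assumption of the sketch,
not something the format demands.

```python
END_OF_PART_PREFIX = "--- end of part"


def is_end_of_part(line: str) -> bool:
    """Return True if this line marks the end of a hqx part."""
    # Only the prefix is checked; the part number and trailing dashes
    # are deliberately not validated.
    return line.strip("\r\n\t ").startswith(END_OF_PART_PREFIX)


def lines_in_part(lines):
    """Yield lines up to, but not including, the end-of-part marker."""
    for line in lines:
        if is_end_of_part(line):
            return
        yield line
```

A decoder built this way would feed the yielded lines to its hqx
character decoder and stop consuming input once the marker is seen.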
Recommence decoding (either later in this file, or in the next
alphabetical (or numerical) file) when you encounter the line:

---

Note: In this case, when decoding, demand a <cr>, a <lf>, or both
immediately before, and immediately after the ---.  It is unfortunate
that this sequence is in fact valid hqx data, but there should not in
general be any problems with this.

Sources to refer to:

FTP from sumex-aim.stanford.edu
  info-mac/source/pascal/dehqx
  info-mac/unix/xbin
  info-mac/unix/mcvert
A file called hqx-format.txt which I can no longer find.

Author:
  Peter N Lewis
  10 Earlston way, Booragoon 6154 WA, AUSTRALIA

Contributors:
  Roland Mansson
  Steve Dorner
  Sak Wathanasin
  Dave Green
  Tom Coradeschi
  Howard Haruo Fukuda
  Michael Fort
  Dave Johnson

Note: I attempted to contact all these people, but only a few replied.
Some of them may not wish to be associated with this document, and
certainly they should not be seen as endorsing it in any way.
Obviously, any errors in this document are my own!