Uli's Web Site

Saving files correctly

Recently, someone asked whether it was OK to use the ANSI standard library's fread and fwrite functions to read and write data on disk. The question was in the context of a Cocoa Objective-C application, but I'll keep it a bit more general because the problem is not unique to Cocoa. The answer is:

It depends.

As long as you account for a few platform differences, fread(), fwrite() or any other read/write method that works on raw bytes is OK. However, most object-oriented frameworks and even many procedural libraries provide higher-level file access that serializes whole object graphs into streams of data.

If you're saving objects in an object-oriented language, they are effectively structs for the purposes of this discussion. In many languages, the root class's instance variables are private, so your own code can't save them, and after loading you also need to re-create an object of the right subclass. Framework classes like NSKeyedArchiver and Co. take care of that for you. So, unless you need to load a file partially or need random access to it, an archiver is usually less hassle.

Why? Well, the endian-ness (order in which the individual bytes of a multi-byte type like an int are stored) differs between some platforms, and if you save on one platform and read on the other (e.g. PowerPC Mac -> Intel Mac), your numbers get 'reversed' and you get nonsense numbers.
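To make the byte-order problem concrete, here is a minimal sketch in C. The function names (host_is_little_endian, swap_u32) are made up for illustration and are not part of any library; the sketch only shows how the same number can be laid out in two ways and what "reversing" it means.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first
   (little-endian, e.g. Intel), 0 if it stores the most significant
   byte first (big-endian, e.g. PowerPC). Hypothetical helper. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);  /* inspect the byte at the lowest address */
    return first_byte == 1;
}

/* Reverses the byte order of a 32-bit value, e.g. 0x11223344 becomes
   0x44332211. This is what a reader on the "other" kind of CPU has to
   do to a raw number before using it. Hypothetical helper. */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

If you fwrite() the value 1 as a 32-bit int on one kind of machine and fread() it on the other without swapping, you get 16777216 (0x01000000) instead of 1.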

The usual solution is to either save everything in one computer's endian-ness and have all others convert, or to store a byte-order marker at the start of the file (e.g. the number 1 as a short, which ends up as the bytes 01 00 on Intel and 00 01 on PPC), and then to byte-swap everything you read if the marker does not read back as 1.
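The byte-order-marker variant could look roughly like the following sketch. It assumes a toy file format of one 16-bit marker followed by one 16-bit value; the names swap16, write_marked_u16 and read_marked_u16 are hypothetical, and a real format would have more fields and more error handling.

```c
#include <stdint.h>
#include <stdio.h>

/* Swaps the two bytes of a 16-bit value. Hypothetical helper. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Writes the marker (the number 1) and then the value, both in the
   writing machine's native byte order. Returns 1 on success. */
int write_marked_u16(FILE *f, uint16_t value)
{
    uint16_t mark = 1;
    return fwrite(&mark, sizeof mark, 1, f) == 1 &&
           fwrite(&value, sizeof value, 1, f) == 1;
}

/* Reads the marker and the value. If the marker only reads as 1 after
   swapping, the file came from a machine with the other byte order,
   so the value is swapped too. Returns 1 on success, 0 on a short
   read or an unrecognized marker. */
int read_marked_u16(FILE *f, uint16_t *out)
{
    uint16_t mark, value;
    if (fread(&mark, sizeof mark, 1, f) != 1 ||
        fread(&value, sizeof value, 1, f) != 1)
        return 0;
    if (mark == 1)
        *out = value;            /* same byte order as this machine */
    else if (swap16(mark) == 1)
        *out = swap16(value);    /* opposite byte order: convert */
    else
        return 0;                /* not a file in this toy format */
    return 1;
}
```

The advantage of the marker is that the writer never has to swap; the cost is that every reader needs both code paths.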

Of course, if you save a whole struct, the padding bytes that get inserted into structs to improve performance by aligning fields on their native boundaries can screw you over, because those can differ depending on the CPU. I think on OS X most of the types are the same, but don't rely on that if you want your files to be readable everywhere.
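A small sketch of how padding shows up in practice. The struct Record and the three functions are invented for this example; the exact amount of padding depends on your compiler and CPU, which is precisely the problem.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: 1 + 4 + 1 = 6 bytes of actual data. */
struct Record {
    uint8_t  kind;
    uint32_t length;   /* compilers usually align this to a 4-byte boundary */
    uint8_t  flags;
};

/* Bytes of real data in the record. */
size_t record_payload_size(void)
{
    return sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint8_t);
}

/* Bytes that fwrite(&rec, sizeof rec, 1, f) would put on disk,
   padding included. */
size_t record_struct_size(void)
{
    return sizeof(struct Record);
}

/* Bytes in the struct that carry no data at all. */
size_t record_padding_bytes(void)
{
    return record_struct_size() - record_payload_size();
}
```

On many current compilers the struct takes 12 bytes rather than 6, but that is an observation, not a guarantee. offsetof(struct Record, length) tells you where a field really ended up.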

Also, some data types are not defined to have a certain, fixed size. E.g. in C and its descendants, 'int' is only guaranteed to be at least 2 bytes and 'long' at least 4 (a 2-byte int was common on 680x0 Macs, 4 bytes is what it is on 32-bit and 64-bit Macs, while 'long' and pointers grow from 4 to 8 bytes when compiling a 64-bit application). This means the same code always gets a native data type that suits the platform to do its work with, but it also means saving a struct to disk will not be cross-platform safe.
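One way around variable type sizes is to pick a fixed-width type from C99's <stdint.h> for the file and convert at the boundary. The sketch below assumes you decided to store every native int as 64 bits on disk; int_to_file and int_from_file are hypothetical names.

```c
#include <limits.h>
#include <stdint.h>

/* Widens a native int to the fixed 64-bit type used in the file.
   This always fits, whatever size int has on this platform. */
int64_t int_to_file(int v)
{
    return (int64_t)v;
}

/* Narrows a stored 64-bit value back to this platform's int.
   Returns 1 on success, or 0 if the file contains a number that is
   too big for int here (e.g. written on a platform with wider ints). */
int int_from_file(int64_t stored, int *out)
{
    if (stored < INT_MIN || stored > INT_MAX)
        return 0;
    *out = (int)stored;
    return 1;
}
```

The range check on loading matters: widening is always safe, narrowing is not.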

So, your saving code will have to take care of bridging those differences: I generally don't save whole structs, but rather save each field separately, as you'd do with NSArchiver. While doing that, you can also perform endian-swapping as needed and expand smaller ints to bigger ones, all transparently using preprocessor macros or other compile-time facilities if you design things right.
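Here is a minimal sketch of field-by-field saving in C. It uses the "everyone converts to one byte order" approach and always writes big-endian, building the bytes with shifts so the code behaves the same on any CPU. The struct Point and all function names are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

struct Point { int32_t x; int32_t y; };   /* hypothetical example type */

/* Writes a 32-bit value as 4 bytes, most significant byte first,
   regardless of the host's byte order. Returns 1 on success. */
int write_u32_be(FILE *f, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, 4, f) == 4;
}

/* Reads 4 big-endian bytes back into a 32-bit value. */
int read_u32_be(FILE *f, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return 0;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}

/* Saves each field separately, so struct padding never reaches the file. */
int save_point(FILE *f, const struct Point *p)
{
    return write_u32_be(f, (uint32_t)p->x) &&
           write_u32_be(f, (uint32_t)p->y);
}

int load_point(FILE *f, struct Point *p)
{
    uint32_t x, y;
    if (!read_u32_be(f, &x) || !read_u32_be(f, &y))
        return 0;
    /* Assumes two's complement, which is what Macs and PCs use. */
    p->x = (int32_t)x;
    p->y = (int32_t)y;
    return 1;
}
```

A Point saved this way is always exactly 8 bytes on disk, whatever the compiler does with the struct in memory.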

Reader Comments:
Ahruman writes:
The 64-bit Mac architectures do not use 64-bit ints, although 64-bit Windows does. While we're type-lawyering, "byte" doesn't necessarily mean 8 bits, either in a general context or in the special definition used for C. :-)
Jean-Daniel Dupas writes:
@Ahruman: Neither OS X, nor Windows use 64 bits 'int'. On OS X (LP64 model) 'long' and 'long long' are both 64 bits (in 64 bits code) and on Windows 64 (LLP64 model), only the 'long long' type is 64 bits.
Blake C. writes:
Regarding 32 to 64-bit data sizes on OS X only- the change is simply that longs and pointers got bumped from 4 to 8 bytes. Legacy code that uses ints to represent pointers will break in 64-bit mode, as will code that assumes void* == 4 bytes, among other obvious issues.