Uli's Web Site

Saving files correctly

Recently, someone asked whether it was OK to use the ANSI standard library's fread and fwrite functions to read and write stuff on disk. The question was in the context of a Cocoa Objective-C application, but I'll keep it a bit more general because the problem is not unique to Cocoa. The answer is:

It depends.

If you account for a few platform differences, fread() or any other read/write method that works on raw bytes is OK. However, most object-oriented frameworks and even procedural libraries provide higher-level file access that serializes whole object graphs into streams of data.

If you're saving objects in an object-oriented language, they are effectively structs for this discussion. In many languages, the root object's instance variables are private, so you can't save those yourself, and you also need to re-create the right subclass after loading an object. Framework classes like NSKeyedArchiver and Co. take care of that for you. So, unless you're doing something that needs to be loaded partially or needs random access, an archiver is usually less hassle.

Why? Well, the endian-ness (order in which the individual bytes of a multi-byte type like an int are stored) differs between some platforms, and if you save on one platform and read on the other (e.g. PowerPC Mac -> Intel Mac), your numbers get 'reversed' and you get nonsense numbers.
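
To make the 'reversed' numbers a bit more concrete, here is a minimal C sketch. The function names (swap32, read_native32, host_is_little_endian) are made up for this illustration and are not from any particular library:

```c
#include <stdint.h>
#include <string.h>

/* Reverse the order of the four bytes in a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Interpret four raw bytes the way the CPU we're running on does.
   This is what happens when you fread() straight into a uint32_t. */
uint32_t read_native32(const unsigned char bytes[4])
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Returns 1 if this CPU stores the least significant byte first
   (like Intel), 0 if it stores the most significant byte first
   (like PowerPC). */
int host_is_little_endian(void)
{
    const unsigned char bytes[4] = { 1, 0, 0, 0 };
    return read_native32(bytes) == 1;
}
```

If a PowerPC Mac writes the number 1 as a 32-bit int, the file contains the bytes 00 00 00 01. Reading those through read_native32 on an Intel Mac gives 16777216, and passing that through swap32 gives back the 1 you wanted.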

The usual solution is to either save everything in one computer's endian-ness and have all others convert, or to store a byte-order marker at the start of the file (e.g. the number 1 as a 16-bit short, which ends up as the bytes 01 00 on Intel, and 00 01 on PPC), and then to convert everything you read if the marker doesn't read back as 1 (on a machine with the other byte order it reads back as 256).
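
The byte-order marker approach could look roughly like this in C. This is only a sketch: MARKER_VALUE, swap16, write_marker, marker_needs_swap and read_marker are names I made up for the example, and a real file format would probably want a longer signature as well:

```c
#include <stdint.h>
#include <stdio.h>

#define MARKER_VALUE 1u  /* the number 1, as in the text above */

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Write the marker in whatever byte order this CPU uses.
   Returns 1 on success, 0 on failure. */
int write_marker(FILE *f)
{
    uint16_t marker = MARKER_VALUE;
    return fwrite(&marker, sizeof marker, 1, f) == 1;
}

/* Decide what a marker value read from a file means.
   Returns 0 if the file has our byte order, 1 if everything in it
   must be byte-swapped, -1 if this is not a valid marker. */
int marker_needs_swap(uint16_t marker_as_read)
{
    if (marker_as_read == MARKER_VALUE) return 0;
    if (swap16(marker_as_read) == MARKER_VALUE) return 1;
    return -1;
}

/* Read the marker from the current file position. Same results as
   marker_needs_swap(), with -1 also covering read errors. */
int read_marker(FILE *f)
{
    uint16_t marker;
    if (fread(&marker, sizeof marker, 1, f) != 1) return -1;
    return marker_needs_swap(marker);
}
```

The nice thing about this variant is that the machine that wrote the file never pays for any conversion when it reads the file back; only a machine with the other byte order has to swap.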

Of course, if you save a whole struct, padding bytes can screw you over: compilers insert them into structs to improve performance by aligning fields on their native boundaries, and where they end up can differ depending on CPU and compiler. I think on OS X most of the types are laid out the same, but you shouldn't rely on that if you want it to be possible to read your files on other platforms.
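
You can look at the padding yourself with sizeof and offsetof. The struct below is a hypothetical example I made up for this purpose, as are the helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record with fields of three different sizes. */
struct Record {
    uint8_t  kind;
    uint32_t count;
    uint16_t flags;
};

/* Bytes the fields themselves need: 1 + 4 + 2 = 7. */
enum {
    RECORD_FIELD_BYTES = sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t)
};

/* Bytes the compiler added on top of the fields on this platform. */
size_t record_padding_bytes(void)
{
    return sizeof(struct Record) - RECORD_FIELD_BYTES;
}

/* Print where each field actually lives in memory. */
void print_record_layout(void)
{
    printf("kind at %zu, count at %zu, flags at %zu, sizeof %zu\n",
           offsetof(struct Record, kind),
           offsetof(struct Record, count),
           offsetof(struct Record, flags),
           sizeof(struct Record));
}
```

On many current compilers you'd probably see count at offset 4 and a total size of 12 rather than 7, but that is exactly the part nobody guarantees. An fwrite() of the whole struct writes those padding bytes to disk along with your data.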

Also, some data types are not defined to be a certain, fixed size. E.g. in C and its descendants, 'int' is only guaranteed to be at least 16 bits (2 bytes was common on 680x0 Macs, 4 bytes is what it is on both 32-bit and 64-bit Macs), and 'long' is 4 bytes in a 32-bit Mac application but 8 bytes when compiling a 64-bit application. This means the same code always gets a native, 'fast' data type to do its work with, but it also means saving a struct to disk will not be cross-platform safe.
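
One way around the size problem is to use the fixed-size types from C99's <stdint.h> for anything that goes to disk, and to convert from the native types explicitly. A minimal sketch, with long_to_disk32 being a hypothetical helper name:

```c
#include <limits.h>
#include <stdint.h>

/* Compile-time sanity check: the array size becomes -1, and the
   build fails, if int32_t is not exactly 32 bits wide. */
typedef char int32_must_be_32_bits[(sizeof(int32_t) * CHAR_BIT == 32) ? 1 : -1];

/* Convert a native long (4 or 8 bytes, depending on platform) to the
   fixed 32-bit type we store on disk. Returns 1 on success, 0 if the
   value does not fit into 32 bits. */
int long_to_disk32(long value, int32_t *out)
{
    if (value < INT32_MIN || value > INT32_MAX)
        return 0;
    *out = (int32_t)value;
    return 1;
}
```

That way the file always contains 4 bytes for this field, no matter how wide 'long' happens to be in the application that wrote it, and values that don't fit get reported instead of being silently truncated.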

So, your saving code will have to take care of bridging those differences: I generally don't save whole structs, but rather each field separately, as you'd do with NSArchiver. While doing that, you can also perform endian-swapping as needed and expand smaller ints to bigger ones, all transparently using preprocessor macros or other compile-time facilities if you design things right.
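
Here's a rough sketch of what saving field by field can look like in C. Note that instead of swapping conditionally via macros, this variant always writes big-endian bytes using shifts, which works the same on any CPU. The struct and all function names are hypothetical and only serve as an example:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory record; its memory layout never hits the disk. */
struct Record {
    uint8_t  kind;
    uint32_t count;
    uint16_t flags;
};

#define RECORD_DISK_SIZE 7  /* 1 + 4 + 2 bytes, no padding */

/* Store values most significant byte first, on every platform. */
void put_u16_be(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFF);
}

void put_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

uint16_t get_u16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode each field separately into a fixed on-disk layout. */
void record_encode(const struct Record *r, unsigned char out[RECORD_DISK_SIZE])
{
    out[0] = r->kind;
    put_u32_be(out + 1, r->count);
    put_u16_be(out + 5, r->flags);
}

void record_decode(const unsigned char in[RECORD_DISK_SIZE], struct Record *r)
{
    r->kind  = in[0];
    r->count = get_u32_be(in + 1);
    r->flags = get_u16_be(in + 5);
}

/* Both return 1 on success, 0 on an I/O error or short read. */
int record_write(FILE *f, const struct Record *r)
{
    unsigned char buf[RECORD_DISK_SIZE];
    record_encode(r, buf);
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf;
}

int record_read(FILE *f, struct Record *r)
{
    unsigned char buf[RECORD_DISK_SIZE];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;
    record_decode(buf, r);
    return 1;
}
```

Since the file only ever contains the 7 bytes produced by record_encode, neither the padding nor the byte order of the machine doing the saving leaks into it.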

Reader Comments:
Ahruman writes:
The 64-bit Mac architectures do not use 64-bit ints, although 64-bit Windows does. While we’re type-lawyering, “byte” doesn’t necessarily mean 8 bits, either in a general context or in the special definition used for C. :-)
Jean-Daniel Dupas writes:
@Ahruman: Neither OS X, nor Windows use 64 bits 'int'. On OS X (LP64 model) 'long' and 'long long' are both 64 bits (in 64 bits code) and on Windows 64 (LLP64 model), only the 'long long' type is 64 bits.
Blake C. writes:
Regarding 32 to 64-bit data sizes on OS X only- the change is simply that longs and pointers got bumped from 4 to 8 bytes. Legacy code that uses ints to represent pointers will break in 64-bit mode, as will code that assumes void* == 4 bytes, among other obvious issues.

 
Created: 2007-12-29 @862 Last change: 2014-04-16 @708
© Copyright 2003-2014 by M. Uli Kusterer, all rights reserved.