The Mudcat Café TM
Thread #100469   Message #2016901
Posted By: JohnInKansas
04-Apr-07 - 11:13 PM
Thread Name: Tech: Saving a Website
Subject: RE: Tech: Saving a Website
1. Saving from hard drive to CD (or DVD) can result in some filename changes, because the rules are different for the two. A couple of characters that are legal in filenames on your hard drive aren't on a CD, and vice-versa. The big difference is that every "filename" on a CD has to be written as a complete "path+file," and the number of characters that can be included (the length of the name) is only about half as many as can be used on a hard drive.

Both of the CD programs I use, Nero and Roxio, will tell you if they have to change a name, but it's not always easy to tell whether the change is "acceptable." With Nero (Burning ROM) in particular, it can be extremely difficult to figure out exactly which file is being changed, especially if you've got a couple of thousand small files in the burn.

For a simple data backup, you can usually figure out which file you need when you read the stuff back from the CD; but if a file "calls" other files, which is common with HTML documents, it can't find a linked file except by that file's original name, so the links may be broken when you burn to a CD.
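If you'd rather find the likely troublemakers before the burn than hunt for them afterward, a short script can list the names that look too long. This is only a rough sketch: the function name find_risky_names is made up for illustration, and the 64-character default is a placeholder assumption, not a figure from Nero or Roxio. Check what your burning program and disc format actually allow, and pass those limits in.

```python
import os

def find_risky_names(root, max_name_len=64, max_path_len=None):
    """Return sorted (relative_path, reason) pairs for names over the limits.

    Both limits are assumptions supplied by the caller; the defaults are
    placeholders, not values taken from any particular burning program.
    """
    risky = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Path relative to the folder being burned, i.e. the "path+file" form.
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if len(name) > max_name_len:
                risky.append((rel, "name too long"))
            elif max_path_len is not None and len(rel) > max_path_len:
                risky.append((rel, "path too long"))
    return sorted(risky)
```

Anything it reports can be renamed (and the links to it fixed) on the hard drive first, so the burning program has nothing to change. It only checks lengths, not illegal characters.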

1.a. The suggestion to zip the folder is probably a good one. When file compression was more of a "new thing" there were several competing compression methods, and files compressed with one program might not uncompress with a different program. The methods most used are now sufficiently "standard" that it's seldom a problem finding a program to uncompress with. If you use one of the "complete" zippers, like WinZip, you can make the file "self-extracting" so that the unzip formula is built into the file. Just double-click it and it unzips itself.

The file you get if you make it self-extracting is an .exe rather than a .zip, which seems to confuse some people; but it shouldn't be a problem in your own archive. The self-extracting form is also slightly larger than the raw .zip, but the difference isn't usually a problem.
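If you don't have WinZip handy, the plain (not self-extracting) version of the zip idea can be sketched with Python's standard zipfile module. The function name zip_folder and the paths are just illustrations. The point is that the whole saved site becomes one file with one short name, and the original names travel inside it untouched.

```python
import os
import zipfile

def zip_folder(folder, zip_path):
    """Write every file under `folder` into one .zip, keeping relative paths.

    Returns the number of files stored. Put `zip_path` outside `folder`
    so the archive does not try to include itself. Empty subfolders are
    not stored by this simple version.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store the name relative to the folder so links between
                # files still resolve after unzipping somewhere else.
                zf.write(full, arcname=os.path.relpath(full, folder))
                count += 1
    return count
```

Something like zip_folder("saved_site", "saved_site.zip") would then give you a single file to burn.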

1.b. Another "safety" method one might use relies on the ability of most browsers to "save as" a web archive (.mht) file, in which all of the linked bits are embedded within the file, so the links are "protected" from any filename changes during the CD burn. Restoring the file to .htm format, if a full recovery is needed, would of course require some extra steps with this method, but it should make a "safe" - if slightly inconvenient - archive on a CD.

2. The old drive compression programs relied mostly on rewriting so that the entire drive/partition is a single file. Since every file on a hard drive has to start in a new cluster, an average of half a cluster is "wasted" for each file. When disks started getting lots of clusters, and the clusters got "really big," that amounted to lots of wasted space. The "compression" program simply made its own index of where each "file" started within the "compressed partition" so that the whole drive/partition could ignore the cluster breaks. Just as you don't have to have a separate file for each page of a document, with this method you didn't have to have a separate file for each file.

Unfortunately, if the index got lost or corrupted, nothing was readable, and this tended to happen fairly often(?).

A few of these "disk-compression" programs may have applied some additional actual "zip style" compression; but it was the elimination of the "cluster slop" that provided most of the space saving.
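To put rough numbers on the "cluster slop," here is a small arithmetic sketch. The function name cluster_slack, the file sizes, and the cluster sizes are all made up for illustration; it just rounds each file up to whole clusters and adds up the leftover.

```python
def cluster_slack(file_sizes, cluster_size):
    """Return total bytes allocated but unused when every file
    has to start in a new cluster."""
    slack = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)  # ceiling division
        slack += clusters * cluster_size - size
    return slack

# Three hypothetical files holding 39,000 bytes of real data in total.
sizes = [1000, 5000, 33000]
print(cluster_slack(sizes, 32768))  # 92072 bytes wasted with 32 KB clusters
print(cluster_slack(sizes, 4096))   # 10152 bytes wasted with 4 KB clusters
```

With the big clusters, these three small files waste more than twice as much space as they actually use, which is the kind of loss that packing everything into one file avoided.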

3. Not that all this matters much with our new TB sized drives, of course.

John