Removing trailing zeroes from ISOs

mrpijey
Administrator
Posts: 9193
Joined: Tue Feb 12, 2008 5:28 pm

Removing trailing zeroes from ISOs

Post by mrpijey »

This topic was split from another topic, as it merits its own discussion rather than being part of the errata topic.
DOS wrote: Something I've been thinking of for a while, now seems like an appropriate time to mention it: If at some point you create a list or database of all the files on the FTP site (not the compressed versions), I think it would be nice to not only provide the hash of the file, but also the hash of the file after removing all the trailing zero bytes!
mrpijey wrote: To do that database thing I would need to find a way to trim the trailing zeroes from the files, and I just wonder if it's worth it. After all, the contents of the disc will be indexed as well...
DOS wrote: I think it's useful because it's one hash to compare to verify that the discs are essentially identical, vs. a whole bunch of hashes to compare to verify that they have the same set of files which doesn't necessarily prove that it's the same disc, and comparing the single hash is a lot easier too.

I imagine it would be trivial to write a tool to do the trimming. I imagined writing something that I could use in a Linux pipeline:

trim_trailing_zeros < file.iso | sha1sum

so I'm not actually storing a second copy of the file on disk. I'd be happy to write something like this if it'd be useful.
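A minimal sketch of the kind of filter described above, written in Python rather than as a standalone `trim_trailing_zeros` binary. The function name `sha1_without_trailing_zeros` and the chunk size are assumptions for illustration, not an existing tool. It holds back runs of zero bytes and only feeds them to the hash once non-zero data follows, so no trimmed copy of the file is ever written to disk.

```python
import hashlib
import io

CHUNK = 1 << 20  # read 1 MiB at a time (arbitrary choice)


def sha1_without_trailing_zeros(stream, chunk_size=CHUNK):
    """Return the SHA-1 hex digest of a binary stream, ignoring trailing zero bytes."""
    digest = hashlib.sha1()
    pending_zeros = 0  # zero bytes seen but not yet known to be trailing
    zero_block = bytes(chunk_size)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        stripped = chunk.rstrip(b"\x00")
        if not stripped:
            # The whole chunk is zeros; hold it back until real data shows up.
            pending_zeros += len(chunk)
            continue
        # Non-zero data follows, so the held-back zeros were not trailing.
        while pending_zeros:
            n = min(pending_zeros, chunk_size)
            digest.update(zero_block[:n])
            pending_zeros -= n
        digest.update(stripped)
        pending_zeros = len(chunk) - len(stripped)
    return digest.hexdigest()
```

To use it in a pipeline like the one above, a small wrapper script could call it on `sys.stdin.buffer` and print the result.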
It's a good idea, but are the trailing zeroes original to the ISOs? I.e., were they added by Microsoft in some of their ISOs, or did some disc dumpers just add them? The reason I ask is that if they are original (from Microsoft), they should be kept as-is and hashed as-is. If they are not original, then all discovered ISOs with trailing zeroes should be trimmed and repacked.
Official guidelines: Contribution Guidelines
Channels: Discord :: Twitter :: YouTube
Misc: Archived UUP

claunia
Posts: 49
Joined: Tue Aug 07, 2012 3:08 pm

Re: Removing trailing zeroes from ISOs

Post by claunia »

Most tools pad with zeros to a multiple of 2048 bytes, some tools pad with zeros to a multiple of 2 seconds, and when you make an ISO from a recordable disc you usually get 2 seconds of zeros (i.e. 150 sectors).
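To put numbers on those padding schemes, here is a small arithmetic sketch. It assumes 2048 data bytes per sector and 75 sectors per second of disc time; the helper name `padded_size` is made up for this example.

```python
SECTOR_BYTES = 2048      # data bytes per sector in a plain .iso image
SECTORS_PER_SECOND = 75  # CD sectors per second of disc time


def padded_size(size, multiple):
    """Round size up to the next multiple of `multiple` bytes."""
    return -(-size // multiple) * multiple


# Two seconds of zeros is 150 sectors, which is 307200 bytes (300 KiB).
TWO_SECONDS_BYTES = 2 * SECTORS_PER_SECOND * SECTOR_BYTES
```

For instance, `padded_size(5000, SECTOR_BYTES)` gives 6144, while padding the same 5000 bytes to a 2-second multiple gives 307200.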

DOS
Posts: 206
Joined: Sun Mar 16, 2014 6:56 am

Re: Removing trailing zeroes from ISOs

Post by DOS »

mrpijey wrote: This topic was split from another topic, as it merits its own discussion rather than being part of the errata topic.
Thanks!
mrpijey wrote: It's a good idea, but are the trailing zeroes original to the ISOs? I.e., were they added by Microsoft in some of their ISOs, or did some disc dumpers just add them? The reason I ask is that if they are original (from Microsoft), they should be kept as-is and hashed as-is. If they are not original, then all discovered ISOs with trailing zeroes should be trimmed and repacked.
I had a look at .isos I've downloaded directly from Microsoft and they end in zeros. So I wouldn't say that the hash of the file with the zeros removed is the "correct hash" of the "actual image" or anything, it's just a tool for helping to detect images which aren't actually identical but are effectively identical.

Certainly if you gave me two .iso files, this technique would tell me that they're both effectively identical, but it wouldn't tell me which one was the original one from Microsoft. I don't know that there's any way to figure that out.
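When both files are at hand, the same "effectively identical" check can also be done without hashing. The sketch below is one possible way to do it; `equal_ignoring_trailing_zeros` is a hypothetical helper, and it assumes `read()` only returns a short chunk at end of file, which holds for regular files.

```python
import io


def equal_ignoring_trailing_zeros(a, b, chunk_size=1 << 20):
    """Return True if two binary streams differ only by trailing zero bytes."""
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a and not chunk_b:
            return True  # both streams are exhausted without a mismatch
        n = min(len(chunk_a), len(chunk_b))
        if chunk_a[:n] != chunk_b[:n]:
            return False  # the shared part differs
        # Whatever one stream has beyond the other must be all zeros.
        if chunk_a[n:].strip(b"\x00") or chunk_b[n:].strip(b"\x00"):
            return False
```

As with the hash, a True result says nothing about which of the two files is the original one.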

So I definitely wouldn't suggest that the "hash of the file with trailing zeros removed" be the only hash you record, just an extra one that is sometimes useful.

Thanks for the interesting information, claunia! I can certainly see that the size of the .iso image from MS I'm looking at is a multiple of 2KiB.

I wonder if various tools would complain if you were to trim all the zeros off the end of an .iso file? I certainly wouldn't try trimming the zeros off myself, it doesn't seem like a useful thing to do, particularly if it's only going to save me up to 2KB per file :)

mrpijey
Administrator
Posts: 9193
Joined: Tue Feb 12, 2008 5:28 pm

Re: Removing trailing zeroes from ISOs

Post by mrpijey »

Well, in this case I am not sure what would be best. But it sure adds to my opposition to ISO, as there doesn't seem to be any good way to make sure an ISO is complete.

If ISO managers in general work with trimmed zeroes, then everything should be trimmed. But MS originals come with the zeroes, which means they would become less original if we trimmed them, thus going against what we are trying to accomplish...

In the end it seems better to just hash the ISO as it is, and also keep track of the contents. What we could do, however, is hash the header instead: if the header is identical between a trimmed and a non-trimmed ISO, then we can safely assume the contents are the same as well...
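One possible reading of "hash the header" is to hash the ISO 9660 primary volume descriptor. The sketch below rests on assumptions: it takes "the header" to mean that single 2048-byte descriptor, it assumes the descriptor sits at sector 16 (the usual place), and `pvd_sha1` is a name made up here. A matching descriptor hash only shows that the volume metadata agrees; it is weaker evidence than a hash over the whole image.

```python
import hashlib
import io

SECTOR_BYTES = 2048
PVD_OFFSET = 16 * SECTOR_BYTES  # volume descriptors start at sector 16


def pvd_sha1(stream):
    """Return the SHA-1 of the primary volume descriptor sector, or None if absent."""
    stream.seek(PVD_OFFSET)
    sector = stream.read(SECTOR_BYTES)
    # A primary volume descriptor starts with type code 1 and the id "CD001".
    if len(sector) != SECTOR_BYTES or sector[:6] != b"\x01CD001":
        return None
    return hashlib.sha1(sector).hexdigest()
```

Because the descriptor lives near the start of the image, trimming or adding zeros at the end does not change this hash.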
