Post subject: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 1:32 am
1337 Beta Collector
Joined Thu Nov 29, 2007 11:33 pm
Posts 3238
Location Where do you want to go today?
Favourite OS All Microsoft operating systems!
...well, I am now trying to archive as many modern Windows builds here as possible, since I'm about to leave for our other home, where no Internet connection is set up, meaning that I'll have to visit a certain public access site several times a week.
Fortunately, their connections are extremely fast, as far as I know, and the systems themselves are very modern (from either 2010 or 2011, with Windows 7 Service Pack 1 installed on them). My problem, however, is that I'll eventually need to start downloading the more modern Windows builds (Longhorn/Windows Vista/Server 2008 and Windows 7/Server 2008 R2, as well as redownloading the Windows 8/Server 2012 builds), and I run the risk of being cut off partway through when I have to return home.
That is why I'm now trying to archive as many of the modern Windows builds as possible.
Remember when mrpijey discussed how, in an effort at saving space, we would start using delta compression to store our different versions of Windows XP, Windows Server 2003, Windows Vista/Server 2008, Windows 8/Server 2012, and so on, once they're added to the FTP server (upon being declared as "abandonware")? He said that it would be necessary to save space, not to mention that he had some amazing results trying it out himself.
Well, I've decided to try using it myself on the copies of Windows 7 that I'm downloading, just to see how well it works on my own files. I found out about some utility available for Windows (found out about it through Wikipedia, that is), known as xdelta, and written by someone named Joshua MacDonald. Once I'm finished with all but the one file that wants to wait eight hours to download for some reason, I'll work with this (apparently quite useful) tool, and see what I can do.
Basically, I'll take the latest build that I happen to have at the moment, and use that as a basis for the patch files required to restore the archives of the earlier builds.
I'll post the progress of this technique once I start trying it out just to see how well it works.
_________________ Main operating system: Windows 8 Enterprise (Evaluation) Windows 8 real life sightings (not counting Windows Phone 8): 2 (Client)
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 2:23 am
1337 Beta Collector
Joined Wed Sep 28, 2011 9:31 am
Posts 1199
Favourite OS Windows 8 Pro MCE
Is delta compression better than, say, 7z/LZMA? I mean in the size that's achieved; time isn't as big an issue. I suppose it depends on the file types involved, etc. When I did tests, some compression types worked better for text, others for pictures. I'll need to read about it, I guess, and maybe test it.
The more compressed the files on the FTP are, the better, I think (except for quality maybe), since it uses less space on the FTP, less quota when downloading, and we don't need to archive it ourselves on our side.
PS. Off-topic: someone on another thread mentioned compressing with recovery sections. Is there a way to add some sort of recovery sectors to an archive in case it gets corrupted?
Just read that xdelta is for binary files, while I suppose LZMA would not be made for binary files.
Last edited by john11 on Wed Jun 20, 2012 3:14 am, edited 1 time in total.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 3:01 am
1337 Beta Collector
Joined Thu Nov 29, 2007 11:33 pm
Posts 3238
Location Where do you want to go today?
Favourite OS All Microsoft operating systems!
Well, I'm now up to 106 MB on the patchfile for Windows 7 Build 6469. I'm going to wait and see how it goes. If it compresses well, then I'll keep it as it is, but if not, then I'll try decompressing the actual .ISO first.
I'll continue to keep everyone updated on my progress with the whole experiment.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 3:59 pm
1337 Beta Collector
Joined Thu Nov 29, 2007 11:33 pm
Posts 3238
Location Where do you want to go today?
Favourite OS All Microsoft operating systems!
Andy wrote:
I don't know why you're doing this yourself when mrpijey is already in the process of doing it. You would be better off waiting.
mrpijey, as far as I know, is only doing it for the RTM Windows 7 releases once they become abandonware, and not for any of the beta releases.
Also, I'm doing it myself now, because now is when I need it done (for a project of mine), whereas even if mrpijey was in the process of doing it himself, it most likely wouldn't be done for many more months, due to many other projects and obligations of his that he needs to attend to.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 4:19 pm
1337 Beta Collector
Joined Tue Dec 14, 2010 4:02 pm
Posts 5407
I think you're missing the point... He's doing it on the many editions that are available. You're doing it on entirely different builds... Of course the difference between files will be big. It's really more usable on different editions of one build than on different builds.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 4:33 pm
Site Moderator
Joined Sat Feb 24, 2007 4:14 pm
Posts 5933
Location United Kingdom
Favourite OS Server 2012
Precisely. If you have two completely different files, delta encoding will save you sod all. Delta encoding is essentially little more than a file which says "with these exceptions, same as this other file". So by doing it with two different files, all you get is essentially "with the exception of the entire file, it's the same as this other file", which is why mrpijey is doing it only between releases with a degree of commonality - such as releases that differ only in SKU or language.
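To make the "with these exceptions, same as this other file" idea concrete, here is a minimal sketch in Python. It is not how xdelta works internally; it uses the standard library's difflib to find matching runs, and the op format ("copy" from the source, "add" literal bytes) and all function names are made up for illustration. It is only practical on small inputs.

```python
from difflib import SequenceMatcher


def make_delta(source: bytes, target: bytes):
    """Describe target as 'copy this range of source' / 'add these bytes' ops."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same as the other file: store only an offset and a length.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # The exceptions: bytes that must be stored in the patch itself.
            ops.append(("add", target[j1:j2]))
        # "delete" needs no op: we simply never copy that source range.
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from the source file plus the patch ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += source[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops) -> int:
    """How many bytes the patch has to carry itself (a rough patch size)."""
    return sum(len(op[1]) for op in ops if op[0] == "add")


if __name__ == "__main__":
    base = bytes(range(200))
    similar = base[:100] + bytes([250, 251, 252]) + base[100:]
    unrelated = bytes([255]) * 50
    print("similar file, literal bytes:", literal_bytes(make_delta(base, similar)))
    print("unrelated file, literal bytes:", literal_bytes(make_delta(base, unrelated)))
```

When the two inputs share almost everything, the patch carries only the few differing bytes; when they share nothing, the patch ends up carrying the entire target, which is the "with the exception of the entire file" case.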
I do believe his intention is to roll it out across the entire FTP, rather than just for certain future releases as you claim.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 4:34 pm
Site Administrator
Joined Fri Aug 18, 2006 11:47 am
Posts 11467
Location Merseyside, United Kingdom
Favourite OS Microsoft Windows 7 Ultimate x64
Correct, we do want to do it over the entire FTP where possible. The problem arises when people try to download such archives. This is still being worked out.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 8:24 pm
1337 Beta Collector
Joined Thu Nov 29, 2007 11:33 pm
Posts 3238
Location Where do you want to go today?
Favourite OS All Microsoft operating systems!
Well, I was able to save about 300 MB on Windows 7 Build 6469. However, I've since decided that it's just not worth it. For one thing, it takes up time, which I have very little of at the moment: I'm leaving for our other home, where I'll have to use a public access site to access the Internet, which may be problematic for downloading large files such as the more modern Windows builds (which is why I'm archiving them in the first place). For another, I found out the real cause of the loss of space on one of my hard drives: various leftovers from my work with virtual machines, which I no longer needed and could safely have deleted.
Well, I'm now trying to archive several other things for this project of mine (where I basically test every build that I happen to have available), but after that, I'll also start trying to archive the Windows 8/Server 2012 Release Preview/Release Candidate.
Post subject: Re: Using delta encoding/compression on my files... Posted: Wed Jun 20, 2012 10:50 pm
Amateur Beta Collector
Joined Sat Feb 12, 2011 12:12 pm
Posts 206
Favourite OS Windows 8 RTM 9200
But what if a user wants to download a file? He'd need to download the base ISO plus the correct patch file. It does increase bandwidth slightly but does indeed save a lot of space.
Would it be possible to do on-the-fly patching? Something like this: the user downloads the file the usual way. The download system takes some data, packs it into a network packet and sends it to the user. BUT when the data that goes into the packet is listed as different in the delta file, it sends the data from the delta instead.
It's clear that this isn't possible with FTP, but is it even possible with HTTP? You'd have to replace the whole system (or part of it) on the server that handles HTTP traffic from server to user.
Another possibility is to build the ISO on the server and delete it again after the download is complete, but that's resource-expensive. Maybe a BA download manager could be a solution?
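A rough sketch of the on-the-fly idea, in Python, assuming a simplified patch made of ("copy", offset, length) and ("add", data) ops rather than the real xdelta patch format. The function name and op layout are hypothetical. The point is only that the patched file can be produced chunk by chunk from the base file and the patch, without ever building the whole ISO on the server first.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple, Union

# Hypothetical op layout: ("copy", offset_in_base, length) or ("add", literal_bytes)
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def stream_patched(base: BinaryIO, ops: Iterable[Op],
                   chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the patched file piece by piece, as a download service could send it."""
    for op in ops:
        if op[0] == "copy":
            # Unchanged region: read it straight from the base file.
            _, offset, length = op
            base.seek(offset)
            remaining = length
            while remaining > 0:
                chunk = base.read(min(chunk_size, remaining))
                if not chunk:
                    raise ValueError("base file is shorter than the patch expects")
                remaining -= len(chunk)
                yield chunk
        elif op[0] == "add":
            # Changed region: send the bytes stored in the patch instead.
            yield op[1]
        else:
            raise ValueError(f"unknown op: {op[0]!r}")


if __name__ == "__main__":
    base_file = io.BytesIO(b"0123456789")
    patch_ops = [("copy", 0, 4), ("add", b"XY"), ("copy", 4, 6)]
    for piece in stream_patched(base_file, patch_ops, chunk_size=3):
        print(piece)
```

Something like this would have to sit inside a custom download service, since a plain FTP server only knows how to send files that actually exist on disk.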
Post subject: Re: Using delta encoding/compression on my files... Posted: Thu Jun 21, 2012 1:57 am
Site Administrator
Joined Tue Feb 12, 2008 5:28 pm
Posts 3085
First of all, I am still experimenting with it, and my results can be seen both here and here. And you won't save much between various builds, since Microsoft tends to update the files between builds, which means they will be different enough to make delta compression useless. It only works well within the same build. I am going to test it on a few beta builds we have (builds which have many SKUs, that is) and see if we can make any savings, but this delta compression is mainly for the RTM products as they do not change anymore.
The first post was just an experiment using the English RTM builds; the second post is about ALL languages, as I understand that we want to preserve all the languages and thus have to find an optimized way of saving them and saving space at the same time. But understand, delta compression isn't some magic way to replace traditional compression. Exactly as Hounsell said, delta compression is "these two files are identical except for......", and then only the differences and their binary positions are stored in what's called a "patch". Basically you patch a file with a difference file to create a new file. So yes, if you download a delta patch you also need the "source" file so you can recreate the other file. This method rarely works on betas unless very few files have changed between builds.
Also, decompressing the ISO will make all of this pointless since if you lose the structure of the original file then there's no point keeping it all. Not to mention the "originality" of it vanishes with it.
Stannieman: No, that will be impossible with our current FTP system. Sure, it could be implemented at the file system level on the server, but then we would have to change the OS, the filesystem and basically everything else, which is something we will not do. Otherwise we would have to create some kind of custom download service that could do on-the-fly delta patching. And yes, it could be done on the server prior to download, but that would mean using up even more precious space, not to mention resources. It would also require a completely different file structure and a lot of custom scripts. It's just not worth it when the user just has to download the archive and do the patching himself with a batch file that I will include with every archive containing delta patches. I can even include the xdelta binary just to simplify things. But the user will always have to download the source file if he wants to recreate the remaining files. If his intention is to save the multiple SKUs, then he will still save a lot by downloading source+patches rather than all the full sources individually. That's the whole reason I looked into the delta compression system at all. What's better, downloading 200GB (source+patches) or 1200GB (individual ISO files) if you happen to want all the SKUs? If you only want one particular SKU and it's a patch, then I am sorry, you will have to live with the extra download to grab the source file first. It's unavoidable. It's either that or no ISOs at all, as keeping the individual ISOs is unrealistic due to the massive amount of space needed.
Using delta compression was just something I investigated to make considerable space savings, but I am not ready to redesign the entire forum and BA server around it. That's simply unnecessary work for something that the end user can do himself by double clicking a file. If we implement delta compression archives on the FTP it will be announced on the forum, and detailed instructions will be given on how to handle these on the three most common platforms: Windows, Linux and OS X. But I assure you, it will not be more than running an included script or batch file.
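For anyone wondering what such an included script might boil down to, here is a small cross-platform sketch in Python. It is not the actual batch file; it assumes the xdelta3 command-line tool is on the PATH and uses its decode form (`xdelta3 -d -s SOURCE PATCH OUTPUT`). The `.xdelta` extension, the output naming and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def decode_command(source, patch, output, xdelta="xdelta3"):
    """Build the xdelta3 call that recreates one file from source + patch."""
    # -d = decode (apply a patch), -s = the source file the patch was made against
    return [xdelta, "-d", "-s", str(source), str(patch), str(output)]


def restore_all(source: Path, patch_dir: Path, out_dir: Path, run=subprocess.run):
    """Apply every patch in patch_dir to the one shared source file."""
    restored = []
    for patch in sorted(patch_dir.glob("*.xdelta")):
        # Assumed naming: "pro.xdelta" restores to "pro.iso".
        output = out_dir / (patch.stem + ".iso")
        run(decode_command(source, patch, output), check=True)
        restored.append(output)
    return restored


if __name__ == "__main__":
    # Hypothetical layout: the source ISO sits next to a "patches" folder.
    print(decode_command("source.iso", "patches/pro.xdelta", "pro.iso"))
```

The `run` parameter only exists so the loop can be tried out without xdelta3 installed; a real script would just call the tool once per patch.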
Post subject: Re: Using delta encoding/compression on my files... Posted: Thu Jun 21, 2012 2:24 pm
Site Administrator
Joined Tue Feb 12, 2008 5:28 pm
Posts 3085
Yeah, but that requires that the contents are basically the same (as they are between VOL and RTL etc). Atm I am packing up SBS2003R2 and I've made 0 savings since there are no similar files anywhere, so it's 4GB packs per language.
Post subject: Re: Using delta encoding/compression on my files... Posted: Thu Jun 21, 2012 3:35 pm
Amateur Beta Collector
Joined Sat Feb 12, 2011 12:12 pm
Posts 206
Favourite OS Windows 8 RTM 9200
I wouldn't use it between languages (and surely not between architectures), but for example a Vista SP2 retail DVD to the Enterprise DVD of the same language and architecture can be done in ~120MB. Same edition but different languages is possible for WIM-based ISOs, but you'll end up with a patch of almost 2GB, which is not worth it.
But since (except for Enterprise) the difference between Windows 7 editions is only in ei.cfg, there will be a massive saving for that.
I wonder what Windows 7 -> Server 2008 R2 does...
So I'd say: for each build number/architecture/language combination, 1 base ISO and all other SKUs as patches made by xdelta.
EDIT: I have no idea if such a thing exists, but maybe there are filesystems that implement delta compression? Then the source file can not only be a file, but any data stored on the disk. Say if file A is actually file B concatenated with file C (with both B and C stored in the FS already) then there would be 0 extra storage. With xdelta (and only 1 source file) always either the B or C part has to be stored. Then if B or C is deleted then of course that part of the data that's also in A isn't really deleted, only the reference that file B or C exists as a file is removed.
You see the point?
That would of course mean that there should be some index table linking to certain frequently used data patterns, or otherwise the whole disk needs to be scanned every time new data is written.
But imagine how much space this would save, say, on MS servers, where ALL SKUs for ALL languages of WinXP, for example, are stored.
If you know that different SKUs only differ by max 12MB and different languages by about 300MB, that can save a lot of space without having to worry about using xdelta. The filesystem does everything for you. I've seen that creating the delta files requires some CPU power (because the files need to be binary compared), but restoring the files is blazing fast, so I could see this being used on servers such as MSDN download.
This is also interesting: http://cis.poly.edu/suel/papers/delta.pdf On page 4 there is something about an xdelta file system, and I think the document also talks about networking. In the networking case, only data that the receiver doesn't already have is sent over the network.
EDIT: Apparently this has already existed for a long time, and it's even introduced as a new feature in Server 2012. It's called data deduplication.
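To illustrate the file A = B + C example, here is a toy sketch of a deduplicating store in Python. It is not how Server 2012 or any real filesystem implements it; the class and method names are made up, it keeps everything in memory, and it uses fixed-size chunks, so it only detects shared data that happens to line up on chunk boundaries (real systems typically use smarter, variable-size chunking). The hash table plays the role of the index table mentioned above.

```python
import hashlib


class DedupStore:
    """Toy chunk store: identical chunks are kept once, files are lists of hashes."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes (the index table)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only store the chunk if we have never seen this content before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def delete(self, name):
        # Only the file entry goes away; chunks stay because other files may
        # still reference them (a real system would garbage-collect unused ones).
        del self.files[name]

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    b, c = b"AAAABBBB", b"CCCCDDDD"
    store.put("B", b)
    store.put("C", c)
    before = store.stored_bytes()
    store.put("A", b + c)  # A is B concatenated with C
    print("extra bytes stored for A:", store.stored_bytes() - before)
    store.delete("B")
    print("A still readable:", store.get("A") == b + c)
```

Unlike an xdelta patch, there is no single source file here: any chunk already in the store can be reused by any file.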