Post new topic Reply to topic  [ 23 posts ] 
Author Message
Post subject: What about download.microsoft.com, support.microsoft.com?        Posted: Wed May 24, 2017 11:15 pm
FTP Access
Offline

Joined
Wed Jan 11, 2017 12:37 pm

Posts
58

Favourite OS
Whistler or Longhorn?
I just discovered that the Wayback Machine coverage for these domains is not really amazing.

I tried to download many files linked to the Windows Live ID system (Windows Live ID Client 1.0 SDK Alpha Refresh, Windows Live ID Delegated Authentication SDK, Windows Live ID Web Authentication SDK,...) and none of them had been saved in the Wayback Archive. The download page can be found but the actual download files are not available...

It seems that, in general, Microsoft web content gets destroyed rather than preserved. The same goes for the MSDN documentation and the KB articles (whose URLs changed many times during the life of the support site, making them hard to find in the Wayback Machine). For now, it seems only the old Windows Update servers (hotfixv4.microsoft.com, download.windowsupdate.com) are still available, but for how long?

Do you know of any initiative to backup those sites?


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Wed May 24, 2017 11:54 pm
FTP Access
Offline

Joined
Sun Mar 05, 2017 10:13 pm

Posts
13
You can try https://web-beta.archive.org/web/*/(DOMAIN HERE)/* if you haven't already. It scans everything archive.org has related to it.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Thu May 25, 2017 12:15 am
FTP Access
Offline

Joined
Wed Jan 11, 2017 12:37 pm

Posts
58

Favourite OS
Whistler or Longhorn?
Thanks. Yes, I tried that and discovered that the capture of these domains is really limited... for hotfixv4.microsoft.com, there are only 589 files :(

For download.microsoft.com, I tried all these files and only 2 were saved (scary, to say the least!):

http://download.microsoft.com/download/ ... rtsSDK.msi
http://download.microsoft.com/download/ ... itySDK.zip
http://download.microsoft.com/download/ ... entSDK.msi
http://download.microsoft.com/download/ ... th-1.0.msi
http://download.microsoft.com/download/ ... va-1.0.zip
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... cs-1.2.msi
http://download.microsoft.com/download/ ... va-1.2.zip
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... vb-1.2.msi
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... ebauth.msi
http://download.microsoft.com/download/ ... va-1.0.zip
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... 1.0.tar.gz
http://download.microsoft.com/download/ ... th-1.1.msi
http://download.microsoft.com/download/ ... va-1.1.zip
http://download.microsoft.com/download/ ... 1.1.tar.gz
http://download.microsoft.com/download/ ... 1.1.tar.gz
http://download.microsoft.com/download/ ... 1.1.tar.gz
http://download.microsoft.com/download/ ... 1.1.tar.gz
http://download.microsoft.com/download/ ... cs-1.2.msi
http://download.microsoft.com/download/ ... va-1.2.zip
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... 1.2.tar.gz
http://download.microsoft.com/download/ ... vb-1.2.msi
http://download.microsoft.com/download/ ... 008CTP.zip
http://download.microsoft.com/download/ ... 008CTP.zip


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Wed Sep 06, 2017 5:27 pm
FTP Access
Offline

Joined
Wed Sep 06, 2017 4:29 pm

Posts
4
You can also use the Wayback CDX Server API to get a space-delimited list of captures and metadata. This has the advantage of allowing you to filter by MIME type, status code, uniqueness, and so on.

For example, to get a list of 1000 unique files from everything the IA has captured for download.microsoft.com:
Code:
http://web.archive.org/cdx/search/cdx
?url=download.microsoft.com/download/
&matchType=prefix
&collapse=digest
&filter=statuscode:200
&limit=1000
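
As a rough illustration, the query above could be assembled in Python like this. It is only a sketch: the function name `build_cdx_query` and its defaults are made up for this example, and only the parameters already shown in the post are used.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def build_cdx_query(url_prefix, limit=1000):
    """Return a CDX Server query URL using the parameters from the post above."""
    params = [
        ("url", url_prefix),
        ("matchType", "prefix"),       # everything below the given prefix
        ("collapse", "digest"),        # collapse adjacent captures with the same digest
        ("filter", "statuscode:200"),  # only captures that returned HTTP 200
        ("limit", str(limit)),
    ]
    # safe="/:" leaves '/' and ':' unescaped so the result matches the query above.
    return CDX_ENDPOINT + "?" + urlencode(params, safe="/:")

query_url = build_cdx_query("download.microsoft.com/download/")
print(query_url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.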

Once you've filtered that list down to what you need, you can use the metadata to build a list of URLs to pass to wget:
Code:
https://web.archive.org/web/<timestamp>/<url-minus-protocol>
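
A minimal sketch of that step, assuming the default CDX output of seven space-delimited fields per line (urlkey, timestamp, original, mimetype, statuscode, digest, length). The helper names and the sample line are hypothetical and only serve the illustration.

```python
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]

def parse_cdx_line(line):
    """Split one space-delimited CDX line into a dict keyed by field name."""
    return dict(zip(CDX_FIELDS, line.split()))

def replay_url(capture):
    """Build the Wayback Machine URL for one parsed capture."""
    # Drop the scheme so the result matches the <url-minus-protocol> form.
    url = capture["original"].split("://", 1)[-1]
    return "https://web.archive.org/web/{}/{}".format(capture["timestamp"], url)

# Made-up sample line for illustration, not real CDX output.
sample = ("com,example)/download/file.zip 20170101000000 "
          "http://example.com/download/file.zip application/zip 200 "
          "EXAMPLEDIGEST 12345")
print(replay_url(parse_cdx_line(sample)))
```

Writing one such URL per line to a text file gives a list that can be passed to `wget -i`.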

This isn't much help if the Wayback Machine never captured the files in the first place, unfortunately.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sat Sep 09, 2017 3:00 pm
FTP Access
User avatar
Offline

Joined
Thu May 25, 2017 2:20 pm

Posts
135

Location
Somewhere in the USA
You can also use the wayback_machine_downloader Ruby gem. Downloading an entire website will take time so you may want to let it run overnight (and/or download to a spare external hard drive, based on the size of the website).



_________________
     TuneableSumo876


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Mon Sep 11, 2017 10:19 am
FTP Access
Offline

Joined
Wed Jan 11, 2017 12:37 pm

Posts
58

Favourite OS
Whistler or Longhorn?
Yes, I tried all these methods. Unfortunately, if the file has not been saved, nothing much can be done.

It seems no one really cares about saving the files on those servers.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Mon Sep 11, 2017 6:59 pm
Offline

Joined
Tue May 26, 2015 7:28 am

Posts
25

Favourite OS
Win7
The Wayback Machine USED to have a complete list of download.microsoft.com, because I remember exploring the thousands upon thousands of files. But when they implemented robots.txt, it effectively lost the complete list of files located there. I wish they had just respected that from that time forward, but they didn't. I just wish I had had the foresight to capture the listing.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Tue Sep 12, 2017 11:29 am
FTP Access
User avatar
Offline

Joined
Thu May 25, 2017 2:20 pm

Posts
135

Location
Somewhere in the USA
If the Wayback Machine didn't respect robots.txt someone would file a lawsuit against them, for not respecting robots.txt.

Therefore we have this issue.



_________________
     TuneableSumo876


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Tue Sep 12, 2017 5:15 pm
FTP Access
Offline

Joined
Wed Sep 06, 2017 4:29 pm

Posts
4
You might try capturing that listing again. As far as I'm aware, the IA has never actually removed anything in response to changes in robots.txt. On the contrary, they have recently started to relax their observance. It's true that for a time many domains became unavailable because of robots.txt abuse, but I don't seem to get the dreaded "Page cannot be crawled or displayed due to robots.txt" anymore.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Sep 24, 2017 1:10 am
Donator
User avatar
Offline

Joined
Sun Aug 12, 2012 4:33 pm

Posts
1684

Location
Czechia

Favourite OS
MinWin
TuneableSumo876 wrote:
If the Wayback Machine didn't respect robots.txt someone would file a lawsuit against them, for not respecting robots.txt.

Therefore we have this issue.



Not respecting robots.txt was not a criminal offense last time I checked.

_________________
AlphaBeta, stop brainwashing me immediately!


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Oct 22, 2017 7:16 pm
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
I recently read a blog post about the old KB articles. Microsoft recently purged a lot of old KB articles going back to NT/2000/XP/95/98/ME and MS-DOS days. It's really a shame, because these KB articles contain information generally not found anywhere else. At least the Web Archive still works for those (but for how long?)

It was mentioned in that blog post that someone should form a collective to preserve old Knowledge Base articles so they don't get lost forever. It turns out Microsoft used to publish KB articles on CD for a very long time, as part of Visual C++ or in the MSDN Library set of CDs, or in the ancient days (1980s) as part of the Microsoft Programmer's Library. Some should also be found in the Microsoft FTP archive. The only work would be collecting them all, extracting them to a complete set and then archiving them.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Oct 29, 2017 1:54 am
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
tristanleboss wrote:
I tried to download many files linked to the Windows Live ID system (Windows Live ID Client 1.0 SDK Alpha Refresh, Windows Live ID Delegated Authentication SDK, Windows Live ID Web Authentication SDK,...) and none of them had been saved in the Wayback Archive. The download page can be found but the actual download files are not available...


I noticed Internet Archive has a 66GB dump of ftp.microsoft.com from 2015, I wonder if any of those things would be in there? I'm curious about what is in there but I don't know if I want to download 66GB!

3155ffGd wrote:
I recently read a blog post about the old KB articles. Microsoft recently purged a lot of old KB articles going back to NT/2000/XP/95/98/ME and MS-DOS days. It's really a shame, because these KB articles contain information generally not found anywhere else. At least the Web Archive still works for those (but for how long?)


I imagine you can't find them all on the web, as some of them are Microsoft promoting OS/2, and I think they decided to get rid of them *hehe*

There have been a few blog posts on this topic recently since three different bloggers were involved:

https://virtuallyfun.com/2017/10/17/mic ... es-online/
http://www.os2museum.com/wp/ms-kb-articles/
https://www.pcjs.org/blog/2017/10/13/ - has a link to where some recovered KB articles are being hosted

Quote:
The only work would be collecting them all, extracting them to a complete set and then archiving them.


It's not trivial, as MSDN CDs use various formats with proprietary extensions, and the Microsoft Programmer's Library's file format isn't documented, but some of this work is in progress.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Oct 29, 2017 9:22 am
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
DOS wrote:
and the Microsoft Programmer's Library's file format isn't documented

You can probably extract that with HELPMAKE.EXE included with Microsoft C 5.1/6.0 (haven't tried it myself). Surprised no one figured it out.

I'm also working on a Knowledge Base Archive. I already downloaded the FTP archive and much to my disappointment, it doesn't just stop in 1999, it's also missing several older articles, especially those related to MS-DOS and other DOS applications (Word for DOS, Visual Basic 16-bit, etc.), but also a few Windows 95/NT articles with no apparent pattern behind it. It will be a lot of work restoring everything. It also doesn't contain any really useful downloads, if you were wondering about that, just a lot of old junk.

Someone estimated at some point the Knowledge Base had 200,000 articles. Right now I have 62,000 articles. Just so you know the scope of this.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Oct 29, 2017 10:32 am
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
3155ffGd wrote:
DOS wrote:
and the Microsoft Programmer's Library's file format isn't documented

You can probably extract that with HELPMAKE.EXE included with Microsoft C 5.1/6.0 (haven't tried it myself).


No, that doesn't work. Also, if I recall correctly I had a look at the file and it doesn't look anything like the description of the Advisor help file format (or Windows 3.x .HLP file format).

Quote:
I'm also working on a Knowledge Base Archive.


Have you considered working with Jeff from pcjs.org since he's already made one public?


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sun Oct 29, 2017 5:07 pm
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
DOS wrote:
No, that doesn't work.

Hrm. That sucks.

DOS wrote:
Have you considered working with Jeff from pcjs.org since he's already made one public?

Is he actually planning to expand his library beyond what's already there? It was my impression that he was keeping just those few articles, but if you know more please tell me.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Mon Oct 30, 2017 8:53 am
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
There's some more at https://jeffpar.github.io/kbarchive/, and I've been in discussion with him about extracting information from old MSDN CDs.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Wed Nov 01, 2017 10:12 pm
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
That's interesting, didn't know about that.

I downloaded the MSDN January 2000 DVD and extracted the Knowledge Base articles. It was super easy since Microsoft used standard CHM files which you can easily extract with HTML Help Workshop, and the individual KB articles are even neatly sorted by KB number and topic. Certainly much more pleasant to deal with than the things that came before (.MXS/.HLP) and after (.HXS). The MSDN DVD contained on the order of 80,000 Knowledge Base articles. After copying everything over to my collection and using some script magic to identify duplicates, I now have exactly 107,055 Knowledge Base articles.
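
The script used for spotting duplicates is not shown in the post. One possible approach, sketched here purely as an assumption, is to group files by a hash of their contents; `find_duplicates` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under root by SHA-1 of their bytes; return groups with more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

This only catches byte-identical copies. The same article pulled from two different CDs may differ in markup, so comparing by KB number is likely needed as well.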

The hardest thing now is downloading every MSDN Library CD set and identifying Knowledge Base articles that are still missing, as apparently articles tended to disappear pretty randomly. I downloaded the first MSDN Library from 1992 and it contains a LAN Manager category that is completely missing in my collection so far. Even harder will be identifying and finding Knowledge Base articles Microsoft never put on MSDN, especially those involving their former "Microsoft Home" and "Microsoft Games" departments. I have no idea if those appeared anywhere. The MS FTP archive had some entries here and there, and there's also MNY.EXE on the FTP under the Softlib, but not much more.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Thu Jan 25, 2018 5:47 pm
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
So I just wanted to give a quick heads-up.

I've been working on this up to just before this Christmas, after that the project got a little stale unfortunately. Right now I have a total of 202735 unique knowledge base articles spanning until around the end of 2007. This is only a work in progress though, I just stopped in the middle of my work and if I finish I'll probably end up around ~205000 to 210000 knowledge base articles.

I just wanted to know, is there actually any interest in me publishing this little archive? The folder is a little big, right now it's 1.14 GB uncompressed and even when compressed in a solid RAR archive it only goes down to 150 MB, so it will be a bit difficult to distribute. I've also been thinking of maybe making an HTML Help file out of this, if it is technically possible and feasible; it would require a lot of work though because right now I have a wild mix of .htm and .txt files coming from different sources and thus having completely different structures.

I'll finish the project anyway at some point (once I get some motivation and less stress with other things) but what's slightly depressing is that even with this amount of KB articles there are still many glaring gaps in the collection (Windows NT/2000 KB articles post-2003 are missing completely as well as anything gaming-related, especially Xbox/Zune). I don't know what to do about this, of course I could hunt for knowledge base articles in the Internet Archive but that's gonna involve a lot of work.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Fri Jan 26, 2018 2:38 pm
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
3155ffGd wrote:
Right now I have a total of 202735 unique knowledge base articles spanning until around the end of 2007.


Nice! Is that from MSDN Library, TechNet, both, and/or other sources?

Quote:
I just wanted to know, is there actually any interest in me publishing this little archive? The folder is a little big, right now it's 1.14 GB uncompressed and even when compressed in a solid RAR archive it only goes down to 150 MB, so it will be a bit difficult to distribute.


I'm interested!

Quote:
a wild mix of .htm and .txt files coming from different sources and thus having completely different structures.


I'm surprised that you don't have RTF from decoding Multimedia Viewer files?

Quote:
I'll finish the project anyway at some point


You're doing better than me at least :)


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Fri Jan 26, 2018 5:44 pm
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
By now I have worked through all the MSDN CDs from July 2007 to October 1994, and I have started work on one of the TechNet CDs. TechNet takes a lot longer to process because it contains so many categories that are not covered by the MSDN CDs, especially things like games, Microsoft Bob, Word for DOS, Microsoft Works etc. A few KB articles also come from other sources like MSPL 1.3 or the Windows NT 3.1 KB archive that's on shareware CDs.

Basically what I did was:

* Extract all Help2 HTML files with hxcomp.exe
* Extract all Help HTML files with HTML Help Compiler
* Recompile ivtlist.exe because the source code has a bug causing the program to fail on the MSDN CDs, then use it to extract the IVT HTML files
* Recompile helpdeco.exe because that also has a bug in the source code causing it to fail on some CDs, then use it to extract the .MVB files

As for the RTF files - I learned the hard way that Microsoft Word 2007 has a hardcoded 512 MB limit for RTF files, and will refuse to open files larger than that. It also cannot support more than 32,767 pages without having to turn off page view mode. WordPad from Windows 7 x64 did work but was so horribly slow that it was unusable. So I had to resort to a different solution - open the RTF files in a text editor and use some clever search & replace to remove all RTF specific formatting to end up with a plain text file.
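
The search & replace patterns used are not given in the post. As a very rough sketch of the same idea, the regular expressions below strip the most common RTF constructs. This is an assumption-laden, lossy approximation (it does not handle escaped braces, tables, or header groups such as font tables), not a real RTF parser.

```python
import re

def rtf_to_text(rtf):
    """Crudely reduce RTF markup to plain text (lossy approximation)."""
    # Paragraph and line breaks become newlines.
    text = re.sub(r"\\(par|line)\b ?", "\n", rtf)
    # \'hh hex escapes become the matching Windows-1252 character.
    text = re.sub(r"\\'([0-9a-fA-F]{2})",
                  lambda m: bytes.fromhex(m.group(1)).decode("cp1252", "replace"),
                  text)
    # Remaining control words such as \b, \fs20 or \pard are dropped.
    text = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", text)
    # Group braces carry no text of their own.
    return text.replace("{", "").replace("}", "")

sample = r"{\rtf1\ansi {\b Article ID: Q12345}\par Caf\'e9 example\par}"
print(rtf_to_text(sample))
```

None of the patterns span a line break, so for very large files the function can be applied one line at a time instead of loading the whole file into memory.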

I used a few helpful scripts - one that removes all .txt files which already have a corresponding .htm file, and one that automatically searches through .RTF files and reports all KB articles by number which are not yet present in the database.
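
The scripts themselves are not posted. Two helpers along those lines might look like the sketch below. The function names are hypothetical, and the pattern for KB numbers ("Q" followed by four to six digits) is an assumption about how the numbers appear in the text.

```python
import re
from pathlib import Path

def redundant_txt_files(folder):
    """Return .txt files in folder that have a same-named .htm file next to them."""
    folder = Path(folder)
    return sorted(p for p in folder.glob("*.txt") if p.with_suffix(".htm").exists())

# Assumption: article numbers appear as "Q" followed by 4 to 6 digits.
KB_NUMBER = re.compile(r"\bQ(\d{4,6})\b")

def missing_kb_numbers(text, known_numbers):
    """Return KB numbers mentioned in text that are not in known_numbers."""
    found = {int(n) for n in KB_NUMBER.findall(text)}
    return sorted(found - set(known_numbers))
```

The first helper only lists candidates; calling `unlink()` on each returned path would delete them. The second also picks up cross-references to other articles, so its report is an upper bound on what is actually contained in the file.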

The hard work is mostly with manually extracting the individual KB articles from the RTF files to individual .txt files. It probably could be scripted but I don't have the necessary scripting knowledge to do that.
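
For what it's worth, the splitting step could be scripted roughly as follows. This sketch assumes the RTF has already been reduced to plain text and that every article starts with a line of the form "Article ID: Q<digits>". That layout is an assumption and will differ between sources, so the pattern would need adjusting.

```python
import re

# Assumption: each article begins with a header line like "Article ID: Q12345".
ARTICLE_HEADER = re.compile(r"^Article ID: (Q\d+)[ \t]*$", re.MULTILINE)

def split_articles(text):
    """Return {article_id: article_text}, cutting at each assumed header line."""
    matches = list(ARTICLE_HEADER.finditer(text))
    articles = {}
    for i, match in enumerate(matches):
        # An article runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles[match.group(1)] = text[match.start():end].strip()
    return articles

sample = "Article ID: Q11111\nFirst body.\n\nArticle ID: Q22222\nSecond body.\n"
for article_id, body in split_articles(sample).items():
    print(article_id, len(body))
```

Each entry could then be written to its own file named after the article ID.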


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sat May 12, 2018 5:24 am
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
3155ffGd wrote:
* Recompile helpdeco.exe because that also has a bug in the source code causing it to fail on some CDs, then use it to extract the .MVB files

Is this the "Allocation of 0 bytes failed. File too big." issue (covered by https://sourceforge.net/p/helpdeco/bugs/1/)? If so, is there any chance you could share the patch?


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sat May 12, 2018 9:32 am
FTP Access
User avatar
Offline

Joined
Wed May 02, 2012 12:57 am

Posts
314

Favourite OS
Windows NT 3.x
Could be that one, I don't remember, too long ago. Basically what I did is in this if:

Code:
if((groups||multi)&&(browsenums>1))


comment out the check for multi, like this:

Code:
if((groups/*||multi*/)&&(browsenums>1))


The symptom of this bug is that helpdeco will choke right after getting to the [GROUPS] section. This change has no side effects from what I noticed.


Post subject: Re: What about download.microsoft.com, support.microsoft.com        Posted: Sat May 12, 2018 11:23 am
FTP Access
User avatar
Offline

Joined
Sun Mar 16, 2014 6:56 am

Posts
132

Favourite OS
DOS
Thanks! Unfortunately that doesn't fix the issue I'm hitting :(

Edit: I remembered that I forgot to recompile *hehe* It did help, thanks!

