Poster: Etemen Date: Nov 4, 2013 5:31am
Forum: forums Subject: How to short "dir/*" search with Wayback machine



Respected Archive.org,

I cannot overstate how grateful I am for your existence. Wayback is (if you know how to use it) so much more than just a window to the past, and for that I thank you almost daily.

But something troubles me, and I would be grateful if you could explain.

My question concerns searching with Wayback for a specific extension or, to be precise, using a dir/* query.

As you know, depending on the size of the targeted domain, the result list can exceed the available resources, and the sheer volume can easily crash your browsing session.

Is there a way, like the DOS "dir /p" switch, to reduce Wayback’s dir/* result list to a more manageable size and then continue on to the next batch of results?

Thank you very much.


Poster: Nemo_bis Date: Nov 6, 2013 1:34am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine

Can't you reach the field that lets you filter results by URL with wildcards? Admittedly, it's JavaScript filtering, so it's not really less resource-intensive.


Poster: Etemen Date: Nov 6, 2013 8:28am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine


Hello Nemo_bis,

First of all, thanks for your reply.

As you indicate in your answer, because it is JS, it simply will not do in some cases. You have probably never queried a domain so large that WM can no longer handle the amount of data received. In such a case, it makes no difference whether you are in FF, Opera, or IE. There is no prospect of actually reaching the filter field, especially when even WM tells you that your results have exceeded the limit. This is why I wanted to refine my search methodology by understanding the better switches from the very start. Surely there must be a list of more advanced search switches that would let you initially limit the number of results and then page through the rest at will. That is the only workable approach for domains with huge amounts of data.


Poster: aibek Date: Nov 6, 2013 11:02pm
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine

To solve the browser-crash problem, you may use a download manager. Another benefit is that you can pause the download (IA servers support the HTTP ‘Range’ header needed for that†), check whether you have found what you are searching for, and, if not, resume the download. You can open the partly downloaded file in a web browser too. If the browser crashes, open the HTML file in a simple text editor such as Notepad (on Windows). (On Linux, one would obviously use grep to search.)

† IA servers support the ‘bytes-’ format in the HTTP ‘Range’ header (needed to resume a download), but I have not been successful with the ‘bytes1-bytes2’ format (for when you want only some part in the middle of a web page).
This post was modified by aibek on 2013-11-07 07:02:14
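
For anyone who would rather script the pause-and-resume approach than use a download manager, here is a minimal sketch. It assumes the open-ended form of the standard HTTP header, `Range: bytes=<offset>-`, which is the form reported above as working. The function names, the chunk size, and the file name in the usage note are made up for illustration; nothing here is an IA-specific API.

```python
import os
import urllib.request


def resume_headers(path):
    """Build a Range header asking for everything after the bytes already on disk."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    # No partial file yet: send no Range header and fetch from the start.
    return {"Range": f"bytes={offset}-"} if offset else {}


def resume_download(url, path, chunk_size=64 * 1024):
    """Fetch `url` into `path`, continuing from a partial file if one exists.

    Interrupt it at any time (Ctrl+C) and call it again to continue.
    If the file is already complete, the server may answer 416 and
    urllib will raise HTTPError; that case is not handled here.
    """
    request = urllib.request.Request(url, headers=resume_headers(path))
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured, so append to the partial file.
        # Any other success status means the full body is being sent again.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download(listing_url, "listing.html")` repeatedly, where `listing_url` is whatever dir/* listing address you would otherwise open in the browser, should pick up where the previous call stopped, and the partial file can be searched with grep or a text editor in between.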


Poster: Etemen Date: Nov 7, 2013 12:08am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine


Hello aibek,

Left to my own devices, a download manager is what I already partly rely on to keep the results manageable. As you can imagine, it is not the most elegant solution.

In case someone finds this useful at a later date: I’ve tried every browser (Mac/PC), and SeaMonkey, although Mozilla-related, remains the most stable, enduring even the heaviest overload. The worst you will face with SeaMonkey is an "Unresponsive script" warning; just let it continue and you will get your results.

I hope someone from the WM staff can shed some light firsthand on my original question about search switches.

regards


Poster: aibek Date: Nov 7, 2013 12:28am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine

Try browsing the pages after disabling JavaScript. That should take care of the ‘unresponsive script’ problem. (Because there would be no scripts anymore.)


Poster: Etemen Date: Nov 7, 2013 7:22am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine

Indeed I do; "NoScript" is an essential add-on in every browser. I mentioned it for those who are not in the habit of blocking JS or do not understand it.


Poster: aibek Date: Nov 7, 2013 12:57am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine

IA provides many APIs for accessing webcapture data. Three are listed on [1], one more on [2].
[1] http://archive.org/help/wayback_api.php
[2] http://wwwoh-access.archive.org/wwwoh/waybackapi.htm

The most advanced is the Wayback CDX Server API listed in [1]. It allows for all sorts of things, including regex search.
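
As a rough illustration of how the CDX Server API could stand in for a "dir /p" style listing, the sketch below builds a prefix query that is capped with `limit` and optionally narrowed with a regex `filter`. The endpoint and the parameter names (`url`, `matchType`, `output`, `limit`, `filter`, `showResumeKey`, `resumeKey`) are taken from the CDX server documentation as I understand it; the function names and the example domain are made up, so please check the details against [1] before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the CDX server documentation (verify against [1]).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(prefix, limit=100, regex=None, resume_key=None):
    """Build a CDX query for captures under `prefix`, at most `limit` rows."""
    params = [
        ("url", prefix),
        ("matchType", "prefix"),    # everything below this path, like dir/*
        ("output", "json"),
        ("limit", str(limit)),      # cap the size of one "page" of results
        ("showResumeKey", "true"),  # ask the server for a continuation key
    ]
    if regex:
        # Filter syntax is field:regex; "original" is the captured URL.
        params.append(("filter", f"original:{regex}"))
    if resume_key:
        # Continue a previous query from where it stopped.
        params.append(("resumeKey", resume_key))
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_rows(prefix, **kwargs):
    """Run the query and return the parsed JSON response (a list of rows)."""
    with urllib.request.urlopen(build_cdx_url(prefix, **kwargs)) as response:
        return json.load(response)
```

For example, `fetch_rows("example.com/dir/", limit=50, regex=r".*\.pdf$")` would request at most 50 captures of PDF URLs under that directory. With `showResumeKey` set, the server is documented to append a key to the response that can be passed back as `resume_key` to get the next batch; the exact layout of that key in the output is not parsed here.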


Poster: Etemen Date: Nov 23, 2013 5:24am
Forum: forums Subject: Re: How to short 'dir/*' search with Wayback machine


Aibek,

Please forgive my late reply.

I missed the notification of your last reply until today.

Thanks for the links. Well, this is getting us somewhere; something useful for sure.