Poster: aibek Date: Nov 8, 2013 6:45am
Forum: forums Subject: Re: MP3.com archive

I looked around a bit and found that the Wayback Machine’s mp3.com archives contain no mp3 files (or any other audio files).

For every request made by a web browser, the web server sends, along with the requested content, an HTTP ‘Content-Type’ header that gives the MIME type of the content. The browser uses this header to determine what type of data the server is sending. If the header says ‘audio/mpeg’, the browser knows that the content is an mp3 file, and so it launches a media player and passes the file to it. [1] [2]

The Wayback Machine saves the Content-Type header for every request it makes. This means, essentially, that if the server is to be trusted, you know the Content-Type (aka mimetype, aka file type) of every single file the Wayback Machine has saved. The Content-Type is stored in the ‘mimetype’ field of the CDX record. Searching the whole mp3.com domain for any ‘audio/’ mimetype (which would include ‘audio/mpeg’), across all the records the Wayback Machine has, finds nothing [3]. For comparison, see [4]. For how to search, see [6].

The above analysis is complete in itself, but I also checked whether the Wayback Machine recorded any clean, direct download links for mp3 files. The ‘Download’ and ‘Play’ links on the pages all point to CGI scripts (note the /cgi-bin/ in such URLs), so I wanted to see whether, after jumping through all the hoops, mp3.com ever presented a direct download link. The result is again negative [5].
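
For anyone who wants to repeat the mimetype searches in [3] and [4] from a script, here is a minimal Python sketch. The endpoint and query parameters are copied from those links. The function names (build_cdx_url, rows_to_records, search) are my own, and I am assuming that output=json returns a list of rows whose first row holds the field names, as described in the CDX server README [6].

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the query links [3] and [4].
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(host, filters, limit=30):
    """Build a CDX query URL for every capture under a host.

    `filters` is a list of strings such as "mimetype:audio/.*".
    The helper name is hypothetical; the parameters mirror [3].
    """
    params = [
        ("url", host),
        ("matchType", "host"),
        ("output", "json"),
        ("limit", str(limit)),
    ]
    params += [("filter", f) for f in filters]
    # Leave ':', '/' and '*' unescaped so simple filters stay readable.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":/*")


def rows_to_records(rows):
    """Turn the JSON rows into dicts, assuming row 0 is the header."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def search(host, filters, limit=30):
    """Run the query over the network and return a list of records."""
    with urlopen(build_cdx_url(host, filters, limit)) as resp:
        body = resp.read().decode("utf-8").strip()
    # Assumption: an empty body or an empty list both mean "no captures".
    return rows_to_records(json.loads(body)) if body else []
```

Calling search("mp3.com", ["mimetype:audio/.*"]) sends the same request as [3], and swapping the filter for "mimetype:image/.*" gives the comparison in [4]. An empty list back from the first call corresponds to the "nothing is found" result above.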

[1] “If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource.” (RFC 2616)
[2] A mimetype list: https://en.wikipedia.org/wiki/Mime_type#Type_audio
[3] http://web.archive.org/cdx/search/cdx?url=mp3.com&matchType=host&output=json&limit=30&filter=mimetype:audio/.*
[4] http://web.archive.org/cdx/search/cdx?url=mp3.com&matchType=host&output=json&limit=30&filter=mimetype:image/.*
[5] http://web.archive.org/cdx/search/cdx?url=mp3.com&matchType=host&output=json&limit=50&filter=original:.*\.mp3[^.]*&filter=!statuscode:404
[6] https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server

This post was modified by aibek on 2013-11-08 14:45:02