Accelovation Crawl
Accelovation Crawl
collection
1,324
ITEMS
91.8M
VIEWS
collection

eye 91.8M

Web crawl snapshots generously donated by Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
Ferguson Tweets
Ferguson Tweets
collection
212
ITEMS
2.2M
VIEWS
collection

eye 2.2M

IDs of tweets that mention Ferguson, Missouri between August 10th and August 27th, 2014, subsequent to the death of Michael Brown. Tweets collected by Ed Summers. He subsequently extracted the URLs from these tweets, and they were crawled by the Internet Archive. Please read Summers's article at inkdroid.org, with an update here, for more information. Photo: "Memorial to Michael Brown" by Jamelle Bouie
Mercator Crawl
Mercator Crawl
collection
1
ITEMS
89
VIEWS
collection

eye 89

Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
Rescue Crawls
Rescue Crawls
collection
2
ITEMS
682
VIEWS
collection

eye 682

Rescue crawls conducted by the public for sites that have announced that they are closing.
web_dar
web_dar
collection
112
ITEMS
9.1M
VIEWS
collection

eye 9.1M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_con
web_con
collection
1,507
ITEMS
74.5M
VIEWS
collection

eye 74.5M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa 2000 Election Crawl
Alexa 2000 Election Crawl
collection
4
ITEMS
351,241
VIEWS
collection

eye 351,241

2000 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl F2
Alexa Crawl F2
collection
1
ITEMS
285
VIEWS
collection

eye 285

Crawl F2 from Alexa Internet. This data is currently not publicly accessible.
Alexa Traffic
Alexa Traffic
collection
89
ITEMS
1,044
VIEWS
collection

eye 1,044

Sanitized traffic files from Alexa Internet: just base URLs (no parameters) and time/date. This data is currently not publicly accessible. Covers the period from December 2001 to February 2009.
NDIIPP Youtube Crawl
NDIIPP Youtube Crawl
collection
90
ITEMS
3.2M
VIEWS
collection

eye 3.2M

YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program (NDIIPP). This data is currently not publicly accessible.
Product DB
Product DB
collection
1
ITEMS
9,104
VIEWS
collection

eye 9,104

Product DB data collected by Alexa Internet. This data is currently not publicly accessible.
Open Sky
Open Sky
collection
1
ITEMS
3,129
VIEWS
collection

eye 3,129

Demo crawl of scientific data. This data is currently not publicly accessible.
Swiss National Library
Swiss National Library
collection
12
ITEMS
433,059
VIEWS
collection

eye 433,059

Data collected by Internet Archive on behalf of the Swiss National Library. This data is currently not publicly accessible.
Standards
Standards
collection
1
ITEMS
1,058
VIEWS
collection

eye 1,058

Standards crawl data collected by Internet Archive. This data is currently not publicly accessible.
2004 Election
2004 Election
collection
178
ITEMS
14.5M
VIEWS
collection

eye 14.5M

2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
National Library of Ireland Crawls
National Library of Ireland Crawls
collection
2,623
ITEMS
35.3M
VIEWS
collection

eye 35.3M

Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
Alexa Crawl DX
Alexa Crawl DX
collection
1,442
ITEMS
180.7M
VIEWS
collection

eye 180.7M

Crawl DX from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl AUG
Alexa Crawl AUG
collection
80
ITEMS
50.9M
VIEWS
collection

eye 50.9M

Crawl AUG from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Test
Alexa Crawl Test
collection
6
ITEMS
15.1M
VIEWS
collection

eye 15.1M

Crawl Test from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Short
Alexa Crawl Short
collection
5
ITEMS
8M
VIEWS
collection

eye 8M

Crawl Short from Alexa Internet. This data is currently not publicly accessible.
web_ind
web_ind
collection
91
ITEMS
8.8M
VIEWS
collection

eye 8.8M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
To Crawl
To Crawl
collection
1
ITEMS
143,711
VIEWS
collection

eye 143,711

Data collected by Internet Archive. This data is currently not publicly accessible.
web_eg
web_eg
collection
32
ITEMS
4.3M
VIEWS
collection

eye 4.3M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_ma
web_ma
collection
1,085
ITEMS
77.3M
VIEWS
collection

eye 77.3M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Target Product Crawl
Target Product Crawl
collection
4
ITEMS
465
VIEWS
collection

eye 465

Target product crawl data collected by Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DZ
Alexa Crawl DZ
collection
1,207
ITEMS
153.7M
VIEWS
collection

eye 153.7M

Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EH
Alexa Crawl EH
collection
1,218
ITEMS
183.1M
VIEWS
collection

eye 183.1M

Crawl EH from Alexa Internet. This data is currently not publicly accessible.
Edu & Gov Crawl, June 2010
Edu & Gov Crawl, June 2010
collection
704
ITEMS
22.5M
VIEWS
collection

eye 22.5M

TEST COLLECTION: Crawl of .edu and .gov sites started in June 2010.
Topic: crawldata
Bibliotheque Nationale de France Domain Crawls
Bibliotheque Nationale de France Domain Crawls
collection
1,653
ITEMS
193.2M
VIEWS
collection

eye 193.2M

Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
web_wk
web_wk
collection
9,973
ITEMS
323.7M
VIEWS
collection

eye 323.7M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_leg
web_leg
collection
58
ITEMS
9.3M
VIEWS
collection

eye 9.3M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_el
web_el
collection
925
ITEMS
67.9M
VIEWS
collection

eye 67.9M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
UK Government Site Crawl
UK Government Site Crawl
collection
107
ITEMS
6.3M
VIEWS
collection

eye 6.3M

Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
2004 Indian Ocean earthquake and tsunami
2004 Indian Ocean earthquake and tsunami
collection
42
ITEMS
7M
VIEWS
collection

eye 7M

Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
VOX.com Crawl September 2010
VOX.com Crawl September 2010
collection
28
ITEMS
1.3M
VIEWS
collection

eye 1.3M

Crawl of vox.com, September 2010. This was an attempt to preserve vox.com content as much as possible in the wake of service closure, September 30, 2010.
Topic: webwidecrawl
web_tran
web_tran
collection
4,192
ITEMS
137.5M
VIEWS
collection

eye 137.5M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Nigerian Election
Nigerian Election
collection
1
ITEMS
40,107
VIEWS
collection

eye 40,107

Data related to the Nigerian elections of 2001, collected by Internet Archive. This data is currently not publicly accessible.
Brookings Institute Crawl
Brookings Institute Crawl
collection
1
ITEMS
167,672
VIEWS
collection

eye 167,672

Crawl data gathered by Internet Archive on behalf of the Brookings Institute. This data is currently not publicly accessible.
Alexa Crawl DH
Alexa Crawl DH
collection
141
ITEMS
44.6M
VIEWS
collection

eye 44.6M

Crawl DH from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl GR
Alexa Crawl GR
collection
74
ITEMS
16.8M
VIEWS
collection

eye 16.8M

Crawl GR from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TS
Alexa Crawl TS
collection
1
ITEMS
10,768
VIEWS
collection

eye 10,768

Crawl TS from Alexa Internet. This data is currently not publicly accessible.
NDIIPP Reality
NDIIPP Reality
collection
1
ITEMS
6,029
VIEWS
collection

eye 6,029

Immersive gaming environments R&D project for the National Digital Information Infrastructure and Preservation Program. This data is currently not publicly accessible. From Wikipedia: The National Digital Information Infrastructure and Preservation Program (NDIIPP) is an archival program led by the Library of Congress to archive and provide access to digital resources. The U.S. Congress established the program in 2000. The Library was chosen because of its role as one of the leading providers of...
University of Michigan
University of Michigan
collection
5
ITEMS
1.8M
VIEWS
collection

eye 1.8M

Data collected by Internet Archive on behalf of the University of Michigan. This data is currently not publicly accessible. From Wikipedia: The University of Michigan, frequently referred to as simply Michigan, is a public research university located in Ann Arbor, Michigan, United States. It is the state's oldest university and the flagship campus of the University of Michigan.
web_oso
web_oso
collection
150
ITEMS
13.2M
VIEWS
collection

eye 13.2M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Hurricane Katrina
Hurricane Katrina
collection
112
ITEMS
11.1M
VIEWS
collection

eye 11.1M

Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
FS Fed US
FS Fed US
collection
3
ITEMS
19,000
VIEWS
collection

eye 19,000

Data collected in 2005 by Internet Archive. This data is currently not publicly accessible.
Alexa 2002 Election Crawl
Alexa 2002 Election Crawl
collection
24
ITEMS
20.3M
VIEWS
collection

eye 20.3M

2002 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl TO
Alexa Crawl TO
collection
1
ITEMS
2.2M
VIEWS
collection

eye 2.2M

Crawl TO from Alexa Internet. This data is currently not publicly accessible.
Mayoral Crawls
Mayoral Crawls
collection
1
ITEMS
285,656
VIEWS
collection

eye 285,656

Mayoral crawls performed by Internet Archive. This data is currently not publicly accessible.
Alexa 1996 Election Crawl
Alexa 1996 Election Crawl
collection
1
ITEMS
46,360
VIEWS
collection

eye 46,360

1996 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DL
Alexa Crawl DL
collection
413
ITEMS
102M
VIEWS
collection

eye 102M

Crawl DL from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl ARC
Alexa Crawl ARC
collection
79
ITEMS
25.6M
VIEWS
collection

eye 25.6M

Crawl ARC from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl ST
Alexa Crawl ST
collection
1
ITEMS
931,446
VIEWS
collection

eye 931,446

Crawl ST from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl Robot
Alexa Crawl Robot
collection
1
ITEMS
104,445
VIEWS
collection

eye 104,445

Crawl Robot from Alexa Internet. This data is currently not publicly accessible.
September 11th
September 11th
collection
1
ITEMS
895,482
VIEWS
collection

eye 895,482

Data related to September 11th, 2001 collected by Internet Archive. This data is currently not publicly accessible. From Wikipedia: The September 11 attacks (also referred to as September 11, September 11th, or 9/11) were a series of four coordinated terrorist attacks launched by the Islamic terrorist group al-Qaeda upon the United States in New York City and the Washington, D.C. areas on September 11, 2001.
Yahoo! Video Crawl
Yahoo! Video Crawl
collection
4,484
ITEMS
54,557
VIEWS
collection

eye 54,557

Pages captured from Yahoo! Video prior to removal of user uploads. Crawl started February 2011. This data is currently not publicly accessible. From Wikipedia: Yahoo! Video is a video sharing website on which users could upload and share videos. The service is owned and created by Yahoo! Yahoo! Video began as an internet-wide video search engine and added the ability to upload and share video clips in June 2006. A re-designed site was launched in February 2008 that changed the focus to...
National Science Digital Library
National Science Digital Library
collection
3
ITEMS
56,190
VIEWS
collection

eye 56,190

Demo crawl for the National Science Digital Library. This data is currently not publicly accessible. From Wikipedia: The United States' National Science Digital Library (NSDL) is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and...
web_osi
web_osi
collection
677
ITEMS
32.4M
VIEWS
collection

eye 32.4M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl BK
Alexa Crawl BK
collection
1
ITEMS
104,115
VIEWS
collection

eye 104,115

Crawl BK from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DJ
Alexa Crawl DJ
collection
341
ITEMS
86.5M
VIEWS
collection

eye 86.5M

Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl CRC
Alexa Crawl CRC
collection
32
ITEMS
28M
VIEWS
collection

eye 28M

Crawl CRC from Alexa Internet. This data is currently not publicly accessible.
Alexa MP3.com Crawl
Alexa MP3.com Crawl
collection
43
ITEMS
142,907
VIEWS
collection

eye 142,907

MP3.com Crawl from Alexa Internet. This data is currently not publicly accessible.
collection

eye 88,945

Demo crawl for the National Oceanic and Atmospheric Administration (NOAA). This data is currently not publicly accessible. From Wikipedia: The National Oceanic and Atmospheric Administration (NOAA) is a scientific agency within the United States Department of Commerce focused on the conditions of the oceans and the atmosphere. NOAA warns of dangerous weather, charts seas and skies, guides the use and protection of ocean and coastal resources, and conducts research to improve understanding and...
web_is_m
web_is_m
collection
1
ITEMS
13,811
VIEWS
collection

eye 13,811

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_mon
web_mon
collection
3,809
ITEMS
151.7M
VIEWS
collection

eye 151.7M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_pop
web_pop
collection
13
ITEMS
3.7M
VIEWS
collection

eye 3.7M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_or
web_sm_or
collection
16
ITEMS
3.6M
VIEWS
collection

eye 3.6M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_prin
web_sm_prin
collection
1
ITEMS
145,425
VIEWS
collection

eye 145,425

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Wikipedia Dumps
by Wikipedia
web

eye 78

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 80

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
The Aaron Swartz Collection
movies

eye 158

favorite 0

comment 0

February 29, 2008 Topics: Open Library display/UI, Scan on Demand, Digital ILL, Print on Demand
crawl_DZ
crawl_DZ
collection
0
ITEMS
41
VIEWS
collection

eye 41

Crawl data. This data is currently not publicly accessible.
Wikipedia Dumps
by Wikipedia
web

eye 98

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 107

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 32

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 116

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 105

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 102

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wide Crawl started September 2010
Wide Crawl started September 2010
collection
332
ITEMS
16.6M
VIEWS
collection

eye 16.6M

Web wide crawl with initial seedlist and crawler configuration from September 2010
Alexa Crawl Slash
Alexa Crawl Slash
collection
1
ITEMS
5.6M
VIEWS
collection

eye 5.6M

Crawl Slash from Alexa Internet. This data is currently not publicly accessible.
Wikipedia Dumps
by Wikipedia
web

eye 106

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 101

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 101

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 103

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 93

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 106

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 123

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 92

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 130

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 104

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 122

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
Wikipedia Dumps
by Wikipedia
web

eye 100

favorite 0

comment 0

Retrieved from wikipedia.org on April 8, 2010
web_is
web_is
collection
5
ITEMS
2.4M
VIEWS
collection

eye 2.4M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sup
web_sup
collection
88
ITEMS
9.3M
VIEWS
collection

eye 9.3M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_sm_sing
web_sm_sing
collection
3
ITEMS
1.5M
VIEWS
collection

eye 1.5M

Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_iq
web_iq
collection
2,637
ITEMS
269.7M
VIEWS
collection

eye 269.7M

Crawl performed by Internet Archive. This data is currently not publicly accessible.