List of Corpora


Text Archive - Text Center
  • Alex: A Catalogue of Electronic Texts on the Internet: Alex allows users to find and retrieve the full text of documents on the Internet. It currently indexes over 700 books and shorter texts by author and title, incorporating texts from Project Gutenberg, Wiretap, the On-line Book Initiative, the Eris system at Virginia Tech, the English Server at Carnegie Mellon University, and the online portion of the Oxford Text Archive. It does not yet include serials. Alex also includes an entry for itself.
  • American Memory: the online resource compiled by the Library of Congress National Digital Library Program. With the participation of other libraries and archives, the program provides a gateway to rich primary source materials on the history and cultural development of the United States. Over one million items from the Library's historical collections are currently available online.
  • Berkeley Digital Library: The Berkeley Digital Library SunSITE builds digital collections and services while providing information and support to others doing the same. It is sponsored by the UC Berkeley Library and Sun Microsystems, Inc.
  • The Brown University Women Writers Project is creating a full-text database of women's writing in English from the period 1330-1830. Texts are encoded in TEI SGML.
  • CCAT (Center for Computer Analysis of Texts) at the University of Pennsylvania has one of the biggest archives of ready-to-use e-texts. [downloadable]
  • Center for Electronic Texts serves all U.S. scholars, researchers and teachers involved with the creation and use of electronic text applications in the humanities.
  • Center for Electronic Text in the Law: CETL currently produces two text databases that can be accessed from the Internet. The first is the University of Cincinnati's portion of DIANA, a unique database of human rights materials. The second, the Securities Lawyer's Deskbook, provides electronic access from the Internet to the text of the Securities Act of 1933 and the Securities Exchange Act of 1934, together with the rules and forms necessary for compliance with these statutes.
  • Christian Classics Ethereal Library: Classic Christian books in electronic format, selected for your edification. There is enough good reading material here to last you a lifetime, if you give each work the time it deserves! All of the books on this server are believed to be in the public domain in the United States unless otherwise specified.
  • The Complete Moby(tm) Shakespeare: The complete unabridged works of Shakespeare [2.3MB]
  • The Complete Works of William Shakespeare (
  • Electronic Text Center at the University of Virginia: hosts an online archive of thousands of SGML-encoded electronic texts. [downloadable]
  • The English Server: a cooperative that has been publishing humanities texts online since 1990. Today it offers over eighteen thousand works covering a wide range of interests. [downloadable]
  • Eris Project at Virginia Tech
  • The History of Mathematics archive
  • Humanities Text Initiative (HTI), University of Michigan: an umbrella organization for the acquisition, creation, and maintenance of electronic texts. [downloadable]
  • The Internet Classics Archive: an award-winning, searchable collection of over 400 classical Greek and Latin texts in English translation.
  • The Labyrinth: The Labyrinth is a global information network providing free, organized access to electronic resources in medieval studies through a World Wide Web server at Georgetown University.
  • Online Book Initiative: The OBI is a project to make a large collection of freely redistributable text available in a common format for others to do with as they like. [downloadable]
  • The Online Books Page: a directory of books that can be freely read on the Internet. [downloadable]
  • Oxford Text Archive: The Archive has been collecting electronic texts for some twenty years from a wide variety of sources, and its holdings reflect the diversity of this medium. [downloadable]
  • Project Gutenberg: The Project Gutenberg philosophy is to make information, books, and other materials available to the general public in forms that the vast majority of computers, programs, and people can easily read, use, quote, and search. (FTP site)
  • Universal Library Books
