Wikipedia:Google Books and Wikipedia

This document explains why we prefer not to use Google Books (GB) where better options exist.

Why Google Books isn't good

  1. GB is a commercial book seller. Its parent company is in the business of making money. The following problems stem from this essential fact.
  2. GB links are unstable and prone to disappear. An estimated 15% of GB links on Wikipedia are dead (404). For an even larger percentage, the page preview no longer works and the link redirects to the "About this book" page. Google is neither a library nor an archive for long-term preservation; books can and do disappear at any time.
  3. Book previews are not equally accessible to all internet users. People living in other countries may not be able to view a page in "preview" that you were able to when you used it as a citation.
  4. Due to digital restrictions management, geoblocking and other barriers,[1] archiving Google Books is hard and sometimes impossible: Wayback Machine and other web archives often fail to archive even the Google page previews specifically linked from articles.
    • This is also a problem for accessibility.[2] Libraries like the Internet Archive have specific services for the visually impaired.
  5. In general, Google Books is not free software and requires the user to run proprietary JavaScript. All its users are monitored for various purposes and privacy concerns regarding Google apply.
    • The ostensible reason for user monitoring is to allow Google to respect the contracts it has with publishers, which require Google to make life miserable for readers; however, some such requirements are Google's own creation (see the next point).
  6. Google Books makes governments and public entities sign contracts which go against the public domain by stating that Google has an exclusive right on the scans for a number of years.[3]
  7. Google Books tries to make users register a Google account and access books only while logged in, both to make user monitoring easier and to direct users to its paid Google Play offering. It is impossible to know whether a URL which currently works for everyone will in the future become subject to registration or payment: for instance, certain buttons to download a (public domain) book in PDF or EPUB format have changed their positions and requirements multiple times.
  8. GB shifts content while you are not looking. For example, a link to a 1985 edition might in the future become the 2019 edition because the publisher released a new edition. This is good for book sellers and publishers, because the newest edition of the book is always at the same Google Book ID. For Wikipedia, this causes havoc with page references and citations.
  9. GB offers free previews for some books, but as a commercial book seller it has no interest in freely lending books through Controlled Digital Lending (CDL) as other providers do.
  10. In 2020, GB started offering a "new" Google Books interface, which ignores significant portions of existing URLs, so the same link can lead to a different result.
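
To illustrate the identifiers discussed in points 8 and 10, the following Python sketch extracts the book ID and page parameter from a Google Books link. The function name is hypothetical, and the two URL shapes it recognizes (the classic books.google.* form with "id" and "pg" query parameters, and a "new" google.com/books/edition/<title>/<ID> path form) are assumptions about commonly seen links, not a specification published by Google.

```python
from urllib.parse import urlparse, parse_qs


def parse_google_books_url(url):
    """Return {"id": ..., "page": ...} for a Google Books URL, else None.

    Hypothetical helper: the recognized URL shapes are assumptions.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    query = parse_qs(parts.query)
    book_id = None

    if host.startswith("books.google."):
        # Assumed classic form: https://books.google.com/books?id=<ID>&pg=<PAGE>
        book_id = query.get("id", [None])[0]
    elif host in ("www.google.com", "google.com") and parts.path.startswith("/books/edition/"):
        # Assumed "new" form: https://www.google.com/books/edition/<title>/<ID>
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) >= 4:
            book_id = segments[3]

    if not book_id:
        return None
    # The page parameter (for example "PA45") is optional in both forms.
    return {"id": book_id, "page": query.get("pg", [None])[0]}
```

A maintenance tool could use the extracted ID to flag a citation for review. As point 8 notes, the ID alone does not guarantee which edition the link will show in the future.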

Why we use it anyway

A 48-hour EventStream poll showed about 400 new Google Books links being added per day to the English Wikipedia (Feb 2020).

  1. The core strength of Google is search and this is true with Books. It is easy to find a citation for a given search term.
  2. Market power: especially when searching for rare keywords, Google Search links to Google Books very prominently. More than 90% of users use Google Search.
  3. Force of habit and the network effect—the more Google Books links we have, the more we will have.
  4. Problems are hidden. Most users are unaware of these issues.
  5. In-line search-term highlighting is very nice.
  6. For public domain books, Google Books sometimes provides cleaner and smaller PDFs than other providers.

Why we should stop when possible

  • WP:AFFILIATE. Commercial book seller vs. non-profit archival library.
  • WP:Verifiability. Links that break create problems with verification.
  • WP:Link rot. Links should be reliable and stable.
  • Lacks Controlled Digital Lending. Free previews are great, but viewing the complete book for free is better.

Alternatives

  • Internet Archive is creating one to three thousand new book scans a day (as of 2019). Its stated goal is to scan every book cited on Wikipedia. Most of the new books being scanned are modern, but it already has more public domain books than Google. More info at Wired Magazine and other sources. Internet Archive is a non-profit library and archive; its URLs and links never change, and books are freely available in full with registration.
    • Internet Archive also offers a full text search which is often superior to that of Google Books, because it indexes content which is restricted by Google and because the context of the matches is easier to understand.
  • Hathi Trust is a non-profit archive with stable links. Most of the PD books at Google are also available there, and at Internet Archive.

How to help

  • Search the Internet Archive for books and periodicals. It has over 20 million, which is comparable in size to Google Books, and it is larger in some collections.
    • A simple way to search the Internet Archive via Google: <search term> site:archive.org (example)
    • More in-depth searching: on the https://archive.org homepage, enter a search term into the search box (the general search box, not the Wayback Machine search box). Choose the radio button "search inside text".
  • It needs to be easy for the user to link the better alternatives, or to get their links converted and corrected. This is largely a matter of developing the user interface and tools, but the existing wikitext can be improved as well: adding unique identifiers to citations always helps.
  • Expand the libraries!
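
The two search methods listed above can be sketched as URL builders in Python. This is a minimal illustration, not an official interface: the function names are hypothetical, and the "sin=TXT" parameter used to select "search inside text" on archive.org is an assumption based on the URLs the site's search page appears to produce.

```python
from urllib.parse import urlencode


def google_site_search_url(term):
    """Build a Google query of the form '<search term> site:archive.org'."""
    return "https://www.google.com/search?" + urlencode(
        {"q": f"{term} site:archive.org"}
    )


def archive_fulltext_search_url(term):
    """Build an archive.org search URL for searching inside book text.

    Assumption: 'sin=TXT' corresponds to the "search inside text" option.
    """
    return "https://archive.org/search?" + urlencode(
        {"query": term, "sin": "TXT"}
    )


if __name__ == "__main__":
    print(google_site_search_url("moby dick"))
    print(archive_fulltext_search_url("moby dick"))
```

Only the URLs are built here; nothing is fetched. Opening the resulting links in a browser is the intended use.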

Notes

  1. ^ Especially what copyright laws euphemistically call technical measures.
  2. ^ Giannoumi, G. Anthony; Land, Molly; Beyene, Wondwossen Mulualem; Blanck, Peter (May 31, 2017). "Web accessibility and technology protection measures: Harmonizing the rights of persons with cognitive disabilities and copyright protections on the web". Cyberpsychology: Journal of Psychosocial Research on Cyberspace. 11 (1). doi:10.5817/CP2017-1-5.
  3. ^ For example the 2010 contract with the Italian ministry.