Books | UK Encyclopedia of Law

Word Data

Word Data by the Lawi Project

The Lawi Project's Word Data is calculated from a detailed analytical study of a large corpus of English-language non-fiction books published between 1500 and 1945. The analysis is performed entirely by our own software and algorithms.

Each word is converted to lowercase, stripped of punctuation, and compared against an English dictionary of 657,748 words. The dictionary is compiled from several other dictionaries and includes medical, legal and scientific terms, as well as systematic and binomial names in Latin, Greek and other languages. For the purposes of this study these are treated as English terms, because they are used in English-language books.
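The normalization and dictionary check described above can be sketched in Python as follows. This is a minimal illustration, not the Lawi Project's own software: `DICTIONARY` is a tiny hypothetical stand-in for the 657,748-word dictionary, and treating only the ASCII characters in `string.punctuation` as punctuation is an assumption.

```python
import string
from typing import List, Set

# Hypothetical stand-in for the study's 657,748-word compound dictionary.
DICTIONARY = {"great", "britain", "blue", "sky", "the", "of"}

# Translation table that deletes the ASCII punctuation in string.punctuation
# (an assumption about what "punctuation" covers).
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(raw_word: str) -> str:
    """Lowercase a raw token and strip ASCII punctuation."""
    return raw_word.lower().translate(_STRIP_PUNCT)


def filter_tokens(raw_words: List[str], dictionary: Set[str]) -> List[str]:
    """Normalize each token and keep only those found in the dictionary.

    Tokens that are empty after stripping, or absent from the dictionary,
    are dropped as likely OCR errors.
    """
    kept = []
    for raw in raw_words:
        word = normalize(raw)
        if word and word in dictionary:
            kept.append(word)
    return kept


if __name__ == "__main__":
    text = "Great Britain, of the blue sky; Grcat Br1tain!"
    print(filter_tokens(text.split(), DICTIONARY))
```

Here the OCR-damaged tokens "Grcat" and "Br1tain" are discarded because they are not in the dictionary. Note that this approach also strips apostrophes, so "don't" becomes "dont"; a real pipeline would need its own rule for such cases.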

Words not in the dictionary are excluded from the study entirely, as they are assumed to be optical character recognition (OCR) errors. Stopwords, of which there are 311, are recorded for consistency in the frequency calculations but are not analyzed further. Both unigrams (single words, e.g. “blue”) and bigrams (pairs of consecutive words, e.g. “blue sky”) are recorded. Ngrams (any word or phrase, e.g. both previous examples and “fair blue sky”) longer than two words are not calculated because of processing constraints.
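A minimal sketch of the unigram and bigram counting, under stated assumptions: `STOPWORDS` is a small hypothetical sample rather than the study's 311-word list, bigrams are assumed to be formed from adjacent tokens after out-of-dictionary words have been dropped, and a bigram is assumed to be left out of the rankings if either of its words is a stopword. Stopwords still count toward the totals, so frequencies stay consistent.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical sample; the study itself uses a list of 311 stopwords.
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to"}


def count_ngrams(tokens: List[str]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams (pairs of adjacent tokens), stopwords included."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def top_unigrams(unigrams: Counter, n: int) -> List[Tuple[str, int]]:
    """The n most frequent unigrams, skipping stopwords."""
    ranked = [(w, c) for w, c in unigrams.most_common() if w not in STOPWORDS]
    return ranked[:n]


def top_bigrams(bigrams: Counter, n: int) -> List[Tuple[str, int]]:
    """The n most frequent bigrams, skipping any pair that contains a stopword."""
    ranked = [
        (f"{a} {b}", c)
        for (a, b), c in bigrams.most_common()
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return ranked[:n]


if __name__ == "__main__":
    tokens = "the united states and great britain and the united states".split()
    unigrams, bigrams = count_ngrams(tokens)
    print(sum(unigrams.values()))  # total token count, stopwords included
    print(top_unigrams(unigrams, 2))
    print(top_bigrams(bigrams, 5))
```

Keeping stopwords in the counters and filtering them only at ranking time is what lets them contribute to frequency calculations without appearing in lists such as the ones below.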

Each ngram is indexed together with the following information: the book it came from, the page, its position on the page counted from the first word, its position on the page by coordinates, the category of the book, and the year of publication. This information can then be used to generate graphs and reports on every aspect of a word and its relationship with every other word, page, page position, book, picture, category and year of publication. This website features many such charts and graphs, particularly word frequency by category and year, the bigrams formed from each unigram, and the unigrams most commonly associated with each ngram. Our data has the potential for much further use and could answer many questions that have so far gone unanswered.
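One way to represent an index entry and produce frequency-by-category-and-year figures is sketched below. The `Occurrence` record and its field names are hypothetical; they simply mirror the list of indexed attributes above, and the sample entries are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Occurrence:
    """One indexed occurrence of an ngram (hypothetical field names)."""
    ngram: str
    book_id: str
    page: int
    word_position: int  # position on the page, counted from the first word
    x: float            # position on the page by coordinates
    y: float
    category: str
    year: int


def frequency_by_category_and_year(
    occurrences: Iterable[Occurrence], ngram: str
) -> Counter:
    """Count occurrences of one ngram per (category, year) pair."""
    return Counter(
        (occ.category, occ.year) for occ in occurrences if occ.ngram == ngram
    )


if __name__ == "__main__":
    index = [
        Occurrence("great britain", "book-1", 12, 40, 0.31, 0.52, "politics", 1850),
        Occurrence("great britain", "book-1", 13, 7, 0.10, 0.08, "politics", 1850),
        Occurrence("great britain", "book-2", 3, 101, 0.62, 0.77, "law", 1901),
        Occurrence("united states", "book-2", 4, 15, 0.20, 0.15, "law", 1901),
    ]
    print(frequency_by_category_and_year(index, "great britain"))
```

Because every occurrence keeps all of its attributes, other breakdowns (by book, page, or page position) are the same one-line `Counter` with a different key.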

The books with the highest number of unique words are dictionaries.
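Counting distinct words per book, which is how a finding like this could be checked, takes only a few lines. The sketch assumes each book is already a list of normalized, in-dictionary words; the book identifiers and word lists are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_unique_words(books: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank books by their number of distinct words, largest first."""
    counts = [(book_id, len(set(words))) for book_id, words in books.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    books = {
        "travel-journal": "the sea the sea the ship".split(),
        "dictionary": "aardvark abacus abbey abbot the".split(),
    }
    print(rank_by_unique_words(books))
```

A dictionary rarely repeats a headword, so nearly every word it contains adds to its distinct-word total.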

Top 100 bigrams (excluding stopwords)

  • 1. united states
  • 2. new york
  • 3. other hand
  • 4. years ago
  • 5. our own
  • 6. much more
  • 7. one hundred
  • 8. some time
  • 9. great britain
  • 10. only one
  • 11. very much
  • 12. many years
  • 13. most important
  • 14. how many
  • 15. how much
  • 16. some other
  • 17. jesus christ
  • 18. one another
  • 19. took place
  • 20. new england
  • 21. our lord
  • 22. one side
  • 23. very little
  • 24. supreme court
  • 25. you know
  • 26. many other
  • 27. one day
  • 28. pointed out
  • 29. ten years
  • 30. other words
  • 31. govern ment
  • 32. difference between
  • 33. great deal
  • 34. high school
  • 35. sir john
  • 36. once more
  • 37. short time
  • 38. carried out
  • 39. large number
  • 40. years old
  • 41. new jersey
  • 42. next day
  • 43. some cases
  • 44. twenty years
  • 45. very large
  • 46. far more
  • 47. last year
  • 48. york city
  • 49. long time
  • 50. one year
  • 51. present time
  • 52. new testament
  • 53. young man
  • 54. very small
  • 55. very great
  • 56. most part
  • 57. one time
  • 58. young men
  • 59. cut off
  • 60. greater part
  • 61. laid down
  • 62. many cases
  • 63. some years
  • 64. real estate
  • 65. years after
  • 66. north carolina
  • 67. other side
  • 68. takes place
  • 69. much less
  • 70. years before
  • 71. little more
  • 72. some one
  • 73. very good
  • 74. hundred years
  • 75. other things
  • 76. civil war
  • 77. one hand
  • 78. present day
  • 79. set out
  • 80. days after
  • 81. several years
  • 82. after having
  • 83. good deal
  • 84. how far
  • 85. human nature
  • 86. passed through
  • 87. you see
  • 88. one thing
  • 89. years later
  • 90. sir william
  • 91. reason why
  • 92. thirty years
  • 93. very different
  • 94. feet high
  • 95. north america
  • 96. right hand
  • 97. old testament
  • 98. nothing more
  • 99. other parts
  • 100. holy ghost

Top 100 unigrams (excluding stopwords)

  • 1. one
  • 2. more
  • 3. other
  • 4. our
  • 5. you
  • 6. some
  • 7. time
  • 8. only
  • 9. very
  • 10. made
  • 11. great
  • 12. out
  • 13. new
  • 14. after
  • 15. most
  • 16. under
  • 17. many
  • 18. man
  • 19. before
  • 20. much
  • 21. see
  • 22. men
  • 23. years
  • 24. state
  • 25. part
  • 26. work
  • 27. life
  • 28. over
  • 29. good
  • 30. general
  • 31. without
  • 32. through
  • 33. found
  • 34. day
  • 35. god
  • 36. between
  • 37. own
  • 38. place
  • 39. case
  • 40. make
  • 41. long
  • 42. people
  • 43. little
  • 44. john
  • 45. year
  • 46. church
  • 47. how
  • 48. against
  • 49. like
  • 50. water
  • 51. given
  • 52. present
  • 53. law
  • 54. old
  • 55. power
  • 56. large
  • 57. might
  • 58. having
  • 59. number
  • 60. another
  • 61. order
  • 62. states
  • 63. country
  • 64. right
  • 65. last
  • 66. whole
  • 67. however
  • 68. form
  • 69. use
  • 70. house
  • 71. called
  • 72. used
  • 73. small
  • 74. lord
  • 75. never
  • 76. less
  • 77. himself
  • 78. city
  • 79. give
  • 80. far
  • 81. public
  • 82. act
  • 83. among
  • 84. land
  • 85. down
  • 86. world
  • 87. above
  • 88. name
  • 89. because
  • 90. therefore
  • 91. history
  • 92. point
  • 93. court
  • 94. government
  • 95. cases
  • 96. others
  • 97. war
  • 98. side
  • 99. school
  • 100. following

Example

Word Data for Great Britain


This page contains word data for the bigram Great Britain, calculated from a statistical analysis of all words in all of the books in the Lawi Project.

Great Britain is the 9th most common bigram in the politics category, the 452nd most common bigram in the law category, and the 13th most common bigram across all books. It is used more commonly in politics books.
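Rankings like these can be derived from per-category counts. In the sketch below, ranks use competition ranking (tied ngrams share a rank), and "more commonly used" is assumed to mean the category in which the ngram makes up the largest share of all ngrams; the Lawi Project may compute either differently. The counts are invented for illustration and are not Lawi data.

```python
from collections import Counter
from typing import Dict, Optional


def rank_in(counts: Counter, ngram: str) -> Optional[int]:
    """Competition rank of `ngram` (1 = most common); None if it never occurs."""
    target = counts[ngram]  # a Counter returns 0 for missing keys
    if target == 0:
        return None
    # One more than the number of ngrams with a strictly higher count.
    return 1 + sum(1 for c in counts.values() if c > target)


def preferred_category(by_category: Dict[str, Counter], ngram: str) -> str:
    """Category where `ngram` has the highest share of all ngrams counted there."""
    def share(category: str) -> float:
        counts = by_category[category]
        total = sum(counts.values())
        return counts[ngram] / total if total else 0.0

    return max(by_category, key=share)


if __name__ == "__main__":
    # Invented counts for illustration only.
    by_category = {
        "politics": Counter({"house commons": 80, "great britain": 50, "foreign affairs": 20}),
        "law": Counter({"supreme court": 300, "real estate": 120, "great britain": 10}),
    }
    all_books = sum(by_category.values(), Counter())
    for category, counts in by_category.items():
        print(category, rank_in(counts, "great britain"))
    print("all books", rank_in(all_books, "great britain"))
    print("more common in", preferred_category(by_category, "great britain"))
```

Using a share rather than a raw count keeps a large category (here, law) from looking like the ngram's natural home simply because it contains more text.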

Also see the root unigrams: great and britain.