Monday, January 14, 2013

Eight- and nine-letter words in base-26 pi: II

The 32 eight- and nine-letter words from Peter Norvig's Google word-count file that were found in half a billion letters of the base-26 π-code representation of the real digits of π have now been supplemented by finds from larger data sets (not restricted by Norvig's at-least-100000-mentions cutoff criterion).

Specifically, my search resulted in an additional 55 eight-letter and 9 nine-letter hits. Of the eight-letter finds, I decided to reject goyetian, rosaruby, avanious, fleyland, commoney, scambler, and tortuose. I immediately recognized the nine-letter beakerman as a word I had come across in December 2003 (at the time, I had saved a picture of the Muppet Beaker and had given it that name), but I have no corresponding Mathematica notebook to document the find and only a vague recollection of extending my year-2000 calculation. At any rate, I had struggled back then with recognizing beakerman as a legitimate word, and I did so again now. (I have kept it.) So, 32+55-7+9 = 89 words:

  3095146  Armagnac
  5204508  reformist
  5446573  fabledom
 12767754  pediatry
 23893131  keratoma
 26460749  plastics
 30620629  Batavian
 34355657  sailorly
 38729316  hatbrush
 46803099  Gemmingia
 49292523  raisonné
 52221111  beakerman
 52374041  infandous
 62288036  Altamont
 68386037  handsome
 77174448  piquance
 80344659  spraints
 85983887  ticktock
 95489940  freewill
104799581  glassful
119398927  obligate
122636295  derriere
144023162  tarragon
145410250  Pannonic
148864411  aphicide
160285943  conveyer
168667826  hockshin
179537813  caraboid
186970055  lineages
194941942  symbolic
203750087  drawling
204682494  subreguli
213927339  aquiform
220130527  pajamaed
223387624  blurbist
227698058  Gederite
232625291  moromancy
233706360  Brockway
238312955  homicide
241593178  aularian
244832756  coenzyme
245790734  clinamen
248977229  offenses
253217633  somewise
258077020  masslike
265316858  draftily
270498733  puncheon
290930240  friction
291953969  Judentum
296560665  torpidity
298503676  eddyroot
308820127  engaging
309864510  octapody
310692296  Alabaman
317941229  outgrown
324802306  dartlike
326873656  hayfield
327954809  jamboree
330311394  grubbily
331195875  monodont
334661344  venially
339119974  panderly
341079873  magneton
358147952  benzamide
362326813  autopsic
378333440  bookings
379470966  assenter
400726498  cardanic
414326761  immotive
426642188  slubbery
428186515  noblesse
433412589  inertial
440674037  ephebeum
442091394  unkilled
443277601  bioplasm
444201817  Crataeva
452027527  driftlet
454659011  pineland
460082749  loathness
467631243  prickish
468685858  pyroboric
475910828  Mersenne
476984745  stigmatic
479595795  Vallarta
480168788  sunblink
483460192  atmiatry
487934346  copyists
488079020  Assyrian
499784890  southron

No ten-letter word appears in this range, unless we are willing to allow backwords:


At index 115577805 is the string remonobons, which is snobonomer in reverse. William Makepeace Thackeray used this word in his satirical writing: "Some telescopic philosopher will arise one day, some great Snobonomer, to find the laws of the great science which we are now merely playing with, and to define, and settle, and classify that which is at present but vague theory, and loose, though elegant assertion."
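A backword search amounts to looking for the reversed spelling in the letter string. Here is a minimal sketch; the function name `find_backword` and the padded toy string are assumptions for illustration, not the code used for the actual search.

```python
def find_backword(letters, word):
    """Return the 1-based index where `word` appears spelled backwards, or 0 if absent."""
    return letters.find(word[::-1]) + 1

# Toy stand-in for the real letter string; the padding letters are made up.
snippet = "qz" + "remonobons" + "xk"

print("snobonomer"[::-1])                      # the reversed spelling to look for
print(find_backword(snippet, "snobonomer"))    # position within the toy string only
```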

Saturday, January 12, 2013

Eight- and nine-letter words in base-26 pi

In my previous post, I provided some English number words that appear in Mike Keith's base-26 π-code representation of the real digits of π. This entry is about other words.

In 2000, I found the eight-letter armagnac at index 3095146. Now, with half a billion strung-together letters (one hundred times the "real estate") at my disposal, I expected to find many more eight-letter words and, hopefully, some larger ones as well. I used eight- and nine-letter words culled from Peter Norvig's Google word-count file made available in his recent English Letter Frequency Counts essay.
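One way such a search could be organized is to put the candidate words in a set and slide an eight- and a nine-letter window along the letter string. This is only a sketch under assumptions: `first_hits` is a hypothetical name, the sample string is a toy, and the real half-billion-letter run would need something more memory-conscious.

```python
def first_hits(letters, words, lengths=(8, 9)):
    """Map each word found in `letters` to the 1-based index of its first appearance."""
    wanted = {w.lower() for w in words if len(w) in lengths}
    hits = {}
    for size in sorted(set(lengths)):
        for start in range(len(letters) - size + 1):
            chunk = letters[start:start + size]
            # Record only the first appearance of each wanted word.
            if chunk in wanted and chunk not in hits:
                hits[chunk] = start + 1
    return hits

# Toy stand-in for the letter string (not actual base-26 digits of pi).
sample = "xxarmagnacxxreformistxx"
print(first_hits(sample, ["armagnac", "reformist", "handsome"]))
```

Set lookups keep the cost per window position roughly constant, so the running time depends on the length of the letter string rather than on the size of the word list.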

A search resulted in 35 eight-letter and 2 nine-letter hits. I dismissed gruening, schreber, brentano, and hillquit for being surnames only. (I kept mersenne because of its adjectival usefulness in mathematics.) I also excluded thoufand, an incorrect rendering of thousand that results from the difficulty of distinguishing a long s from an f in old English typography. That thoufand had 158819 mentions in Norvig's data set amply demonstrates his list's limitations (and calls his conclusions into question).

In the following, I have capitalized the words (including a German one) that I felt needed capitalization and added an accent on one of the three French words.

  3095146  Armagnac
  5204508  reformist
 26460749  plastics
 30620629  Batavian
 49292523  raisonné
 62288036  Altamont
 68386037  handsome
 95489940  freewill
119398927  obligate
122636295  derriere
144023162  tarragon
160285943  conveyer
186970055  lineages
194941942  symbolic
203750087  drawling
233706360  Brockway
238312955  homicide
244832756  coenzyme
248977229  offenses
290930240  friction
291953969  Judentum
308820127  engaging
317941229  outgrown
327954809  jamboree
378333440  bookings
428186515  noblesse
433412589  inertial
475910828  Mersenne
476984745  stigmatic
479595795  Vallarta
487934346  copyists
488079020  Assyrian

32 words: It's a start.

Wednesday, January 09, 2013

A googol in pi

In 1999, Mike Keith worked on a base-26 representation of the number π, wherein the digit 0 is replaced with the letter a, the digit 1 with the letter b, .., the digit 25 with the letter z:

π = d.drsqlolyrtrodnlhnqtgkudqgtuirxneqbckbszivqqvgdmelmuexroiqiyalvuz..

First, I will set up indexing terminology so that there is no misunderstanding about exactly where things are. I am partial (dictatorial, because of my work with continued fractions) to dropping, here, the whole-number part d and indexing the subsequent fractional letters drs.. as {1,2,3,..}. Thus the expression lol is at (begins at) index 5.
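For readers who want to reproduce the first letters and the indexing convention, here is a small sketch in pure integer arithmetic. It uses Machin's formula, which is an assumption of this example and not necessarily how the original calculation was done; the names `arctan_inv` and `pi_base26` are likewise made up here. It is far too slow for millions of terms.

```python
import string

def arctan_inv(x, scale):
    """Approximate arctan(1/x) * scale with the alternating Taylor series, integers only."""
    power = scale // x          # scale / x^(2k+1), starting at k = 0
    total = power
    x2 = x * x
    k = 1
    while power:
        power //= x2
        term = power // (2 * k + 1)
        total += -term if k % 2 else term
        k += 1
    return total

def pi_base26(n, guard=10):
    """First n base-26 letters of the fractional part of pi (a = 0, ..., z = 25)."""
    scale = 26 ** (n + guard)   # guard digits absorb truncation error
    # Machin: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    frac = (pi_scaled - 3 * scale) // 26 ** guard   # drop the whole number and the guard digits
    letters = []
    for _ in range(n):
        frac, digit = divmod(frac, 26)               # least significant digit first
        letters.append(string.ascii_lowercase[digit])
    return "".join(reversed(letters))

letters = pi_base26(64)
print("d." + letters)
print(letters.find("lol") + 1)   # 1-based index of the first lol
```

Because the whole-number d is dropped before indexing, a 0-based string position plus one gives the index used in these posts.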

Mike and I looked for large words in the base-26 π expansion and number words of all sizes. In 2000, it took me 127 hours to calculate 5 million terms. My present setup accomplishes it in 13 seconds! So it didn't take me long to up the ante and I can now report that I have found a googol at index 454315613. By then, the English number words for 1, 4, 2, 6, 10, 5, 9, 0, 7, 3, 50, 40, 8, 60, 11, 90, 80, 12, 30 have already appeared (first appearances, respectively, at indices 10087, 11324, 13463, 14295, 15276, 64838, 175372, 389247, 786244, 1556763, 2300987, 8879098, 9202330, 9946442, 33027856, 126003234, 126348794, 238426469, 389952198). Here are all twenty first appearances, as one would see them in situ:


The word google does not appear in the first half billion terms.

Tuesday, January 08, 2013

Letter frequencies


Monday, January 07, 2013

The principle of Laplace

Marcello Truzzi's "an extraordinary claim requires extraordinary proof" is generally credited to Pierre-Simon Laplace (via Carl Sagan) in the form of "the weight of evidence for an extraordinary claim must be proportioned to its strangeness". I did a little Google searching in an attempt to source this and (finally) came up with William McDougall's Outline of Abnormal Psychology (1926), which has (on page 508):

"We are so far from knowledge of all natural agencies and of their diverse modes of action, that it would not be philosophic to deny any phenomena simply because they are inexplicable in the present state of our knowledge. But we ought to examine them with an attention the more painstaking, the more difficult it may seem to accept them as real."

McDougall says the quote was made "by the great exponent of the mechanical view of the universe, Laplace" and that a Théodore Flournoy would call it "the principle of Laplace" and state it briefly as "the weight of the evidence should be proportional to the strangeness of the alleged facts". Ironically then, it would appear that the quote generally attributed to Laplace is actually a quote of Flournoy, a one-time believer in telekinesis, telepathy, and clairvoyance!

Friday, January 04, 2013

Mate the royal couple

This was the title of Valentin Albillo's 1997 unsolved position #91. In 2003, Vincent Lejeune had ChessMaster 9000 declare e4 as a mate-in-11. I wanted to see what a modern, off-the-shelf chess engine would do with this position.

3qk3/8/8/8/8/8/PPPPPPPP/RNBQKBNR w
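The line above is the piece-placement and side-to-move part of a FEN string. As a reading aid, the following sketch expands it into an eight-by-eight diagram; `expand_fen` is a hypothetical helper written for this illustration and is unrelated to HIARCS.

```python
def expand_fen(fen):
    """Expand the placement field of a FEN string into eight rows, top rank first."""
    placement, side = fen.split()[:2]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit stands for that many empty squares; a letter is a piece.
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return rows, side

rows, side = expand_fen("3qk3/8/8/8/8/8/PPPPPPPP/RNBQKBNR w")
for row in rows:
    print(row)
print("to move:", side)
```

Uppercase letters are White's pieces and lowercase letters are Black's, so Black has only the queen and king while White has the full starting army.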

The program I used was HIARCS Chess Explorer. The analysis engine is Deep HIARCS 14 WCSC. Because I am running several number-crunching applications concurrently, I did not expect to see full performance on my late-2012 iMac. Nevertheless, in its default hash-table-size setting, it managed a mate-in-11 for e4 in a few hours.

I restarted the 32-bit chess engine with an increased hash-table size of 2 GB (the maximum possible). Unexpectedly, it did not discover the e4 mate as quickly, but (as compensation, perhaps) noted that Nc3 was mate-in-11 as well. I have kept the evaluation running to see what the other eighteen lines might produce. Eventually, they came out as mate-in-12 for twelve of them (d4, e3, h4, a4, b3, c3, Nh3, Nf3, h3, d3, b4, c4), mate-in-13 for five (a3, Na3, g4, g3, f4), and mate-in-14 for f3. HIARCS is not a mate solver, so these numbers should be read as mate-in-at-most, with a slight improvement possible, though not entirely expected.

Tuesday, January 01, 2013


Never a fan of social anything, I have nevertheless decided to join Google+.