Saturday, September 22, 2007

Word numbers programming puzzle

Reading Reddit last week I came across an interesting programming puzzle.

http://www.itasoftware.com/careers/puzzles07.html

I've written a straightforward Common Lisp solution which is pasted below. What is interesting about this problem, and what would make it a good interview question, is that coding up the basic solution as I have here only poses more problems.

None of the Lisp environments I've tried have enough heap space to complete this problem, even though in terms of time complexity it is O(n). Judging by how long it takes to run on a few hundred thousand numbers, and then several million, it would take about 20 hours on my 2Ghz PC to solve for 1 billion numbers.

Even if I used the file system to hold the numbers, assuming each number fits in 50 chars of text, 4 bytes for the value and a further 4 to hold the length of the string, that is still in the order of 50Gb. You would then need to sort that file a bit at a time in memory, using a merge sort, and finally do a linear run through the file until you get the 51,000,000,000nth number.

Some clever folk have presented a more intelligent solution here...

http://conway.rutgers.edu/~ccshan/wiki/blog/posts/WordNumbers1/

Here is my solution so far:


#|
Word Numbers

This is a partial solution to the problem found at http://www.itasoftware.com/careers/puzzles07.html
by Justin Heyes-Jones

I say partial because it does actually work, and if you had a lisp environment with enough memory, it would finish
in about a day. But the real solution seems to be either to spot patterns so you don't have to generate and sort
all 1 billion numbers, or to use the file system to cope with what your computer memory cannot.

"If the integers from 1 to 999,999,999 are written as words, sorted alphabetically, and concatenated, what is the 51 billionth letter?"

To be precise: if the integers from 1 to 999,999,999 are expressed in words
(omitting spaces, 'and', and punctuation[1]), and sorted alphabetically so that the first six integers are

* eight
* eighteen
* eighteenmillion
* eighteenmillioneight
* eighteenmillioneighteen
* eighteenmillioneighteenthousand

and the last is

* twothousandtwohundredtwo

then reading top to bottom, left to right, the 28th letter completes the spelling of the integer "eighteenmillion".

The 51 billionth letter also completes the spelling of an integer. Which one, and what is the sum of all the integers to that point?

[1] For example, 911,610,034 is written "ninehundredelevenmillionsixhundredtenthousandthirtyfour"; 500,000,000 is written "fivehundredmillion"; 1,709 is written "onethousandsevenhundrednine".

|#

; (load (compile-file "wordnumbers.lisp"))
; (solve 999999999 51000000000) ; unlikely to finish unless you have a massive memory heap
; (solve 10 26) ; will work, but may not get you the job ;-)

;;;; Utilities

(defmacro with-string-words((str word) &body body)
"Utility macro to iterate over a string and return each word (anything between spaces)"
`(do* ((start 0 (if end (1+ end) nil))
(end
(position #\Space ,str :start 0)
(if end (position #\Space ,str :start (1+ end)) nil))
(,word (subseq ,str start end) (if start (subseq ,str start end) nil)))
((null start))
,@body))

;;;; Numbers are stores as the number in words, the length of this string and finally the numeric value

(defun get-words(lst)
(first lst))

(defun get-length(lst)
(second lst))

(defun get-value(lst)
(third lst))

(defun remove-and(str)
"remove occurences of 'and' from a string"
(let ((new-str (make-array 0 :element-type 'character :fill-pointer 0 :adjustable t)))
(with-string-words (str word)
(if (string/= "and" word)
(and
(setf new-str (concatenate 'string new-str word))
(setf new-str (concatenate 'string new-str " ")))))
new-str))

(defun char-space-or-hyphen-p(c)
(if (or (char= #\Space c) (char= #\- c))
t
nil))

(defun remove-spaces-and-hyphens(str)
"remove spaces and hyphens from a string"
(remove-if #'char-space-or-hyphen-p str))

(defun get-number-as-words(n)
"Use common lisps built in English text number output"
(format nil "~r" n))

(defun get-numbers-as-word-list(n)
"get the numbers from 1 to n and return as a list of strings and the lengths"
"of each string as a list of three items, words, length of word string and"
"actual numeric value"
(loop for n from 1 to n collect
(let* ((str (get-number-as-words n)) (len (length str)))
(list (convert-text str) len n))))

(defun compare-word-and-len(a b)
"given a string, length pair compare on alphabetical order"
(string< (get-words a) (get-words b)))

(defun sort-number-word-list-alphabetically(lst)
(sort lst #'compare-word-and-len))

(defun convert-text(str)
(remove-spaces-and-hyphens (remove-and str)))

(defun get-number-from-letter-index(lst target-index)
(do
((number 0 (1+ number))
(index 0 (+ index (get-length (nth number lst)))))
((> number (1- (length lst))))
(if (<= target-index (+ index (get-length (nth number lst))))
(return-from get-number-from-letter-index number)))
nil)

(defun sum-to-n(lst n)
(if (>= n 0)
(+ (get-value (car lst))
(sum-to-n (cdr lst) (1- n)))
0))

(defun solve(num n)
"Solve the problem for 'num' numbers, finding character position n"
(let ((lst
(sort-number-word-list-alphabetically
(get-numbers-as-word-list num))))
(format t "Made list~%")
(let ((number
(get-number-from-letter-index lst n)))
(format t "Found number~%")
(let ((value (get-value (nth number lst))))
(format t "Done.~%Number at character pos ~a is ~a. Sum to that number is ~a~%" n value (sum-to-n lst number))))))