Tuesday, June 14, 2011

Emacs progress indication

When programming in Emacs Lisp, there is an easy way to show progress feedback to the user while a long-running task is under way. Here's a block of code from the Elisp manual showing how to do it.

(let ((progress-reporter
       (make-progress-reporter "Collecting mana for Emacs..."
                               0 500)))
  (dotimes (k 500)
    (sit-for 0.01)
    (progress-reporter-update progress-reporter k))
  (progress-reporter-done progress-reporter))

I've incorporated this into my duplicate files code, linked below...

http://code.google.com/p/justinhj-emacs-utils/source/browse/trunk/duplicates.el
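As a rough illustration of how the same pattern might wrap a loop over a list (of files, say) rather than a fixed count, here is a minimal sketch. The name `my-map-with-progress' is hypothetical and this is not the code from duplicates.el, just one way it could look.

```elisp
;; Hypothetical helper for illustration; not taken from duplicates.el.
(defun my-map-with-progress (fn items message)
  "Call FN on each element of ITEMS, reporting progress with MESSAGE.
Return the list of results, in the same order as ITEMS."
  (let* ((total (length items))
         ;; Use at least 1 as the maximum so an empty list still
         ;; gives the reporter a non-empty range.
         (reporter (make-progress-reporter message 0 (max 1 total)))
         (count 0)
         (results '()))
    (dolist (item items)
      (push (funcall fn item) results)
      (setq count (1+ count))
      (progress-reporter-update reporter count))
    (progress-reporter-done reporter)
    (nreverse results)))
```

For example, (my-map-with-progress #'md5 '("one" "two") "Hashing...") hashes two strings while showing the message in the echo area.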


Sunday, June 5, 2011

More on duplicate file handling in emacs dired

I had some good feedback on my last post: it would be useful to leave just the superfluous duplicate files marked in the dired buffer. After doing that you can then move them to another folder, or delete them.

I've added a function to do this, `dired-mark-duplicate-files', and updated the Google Code site with the changes.

To move them to another folder, press R and select the destination (C copies instead of moving). To delete them, press D.
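The idea of leaving only the superfluous copies marked could be sketched roughly as follows. The names `my-superfluous-duplicates' and `my-dired-mark-files' are hypothetical, and this is not necessarily how `dired-mark-duplicate-files' does it; the sketch assumes a hash table that maps each md5 to the list of absolute filenames with that md5.

```elisp
(require 'dired)

;; Hypothetical helpers for illustration only.
(defun my-superfluous-duplicates (md5-map)
  "Return the files in MD5-MAP that duplicate another file.
MD5-MAP maps an md5 string to a list of filenames.  The first
file in each list is kept and the rest are returned."
  (let ((extras '()))
    (maphash (lambda (_md5 files)
               (setq extras (append (cdr files) extras)))
             md5-map)
    extras))

(defun my-dired-mark-files (files)
  "Clear all marks in the current dired buffer, then mark FILES.
FILES should be absolute filenames listed in this buffer."
  (dired-unmark-all-marks)
  (dolist (file files)
    ;; `dired-goto-file' returns nil if FILE is not in the listing.
    (when (dired-goto-file file)
      (dired-mark 1))))
```

Which copy counts as the "original" is arbitrary here: it is simply whichever file comes first in each list.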



Wednesday, June 1, 2011

Finding duplicate files in a dired buffer



picture by Donald MacLeod

This is an example of programming Emacs in Emacs Lisp, just to give an idea of what you can put together in an hour or two. I was looking at a dired buffer with a bunch of photos in it, and some were the same photo that I'd downloaded twice. So I started thinking about writing a utility in Emacs to automatically find and remove the duplicate files. In this post I'll just show the code for finding the files and displaying their filenames in a buffer.

I've put the source on google code.

After downloading you can load the source into emacs and call `eval-buffer', then open up a dired buffer to try it out. For this to be useful you need some duplicated files, so make some if you need to.

Mark the files you want to check for duplicates. For example, to mark all jpg files, type % m (mark files matching a regexp) and enter .*\.jpg

Now execute the command `dired-show-marked-duplicate-files' and after a short delay (in my test 80 jpg photos took about 5 seconds) you'll see a buffer called 'Duplicated files' which contains a list of the files which have the same contents.

Next steps for this little project will be to give you an interactive way to delete the duplicated files. I haven't decided quite how I'd like that to work, so drop me an email if you have an idea. I've been thinking about resetting the marks so that only the duplicates are marked. At that point you can hit R to move them to another spot, or delete them with D (x only removes files that have been flagged for deletion with d).

Now some comments about the code involved...

Most of the work is done in the function `dired-show-marked-duplicate-files'. The first line, "(interactive)", makes it an interactive function (a command), meaning the user can invoke it with M-x or bind it to a key.
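To show the general shape of such a command, here is a small sketch. The command `my-dired-show-directory' is hypothetical and does something much simpler than the real function: it declares itself interactive, does its work only in a dired buffer, and signals an error anywhere else.

```elisp
;; Hypothetical command for illustration only.
(defun my-dired-show-directory ()
  "Show the directory of the current dired buffer in the echo area."
  ;; Without this form the function could not be run with M-x.
  (interactive)
  (if (eq major-mode 'dired-mode)
      (message "Dired directory: %s" default-directory)
    (error "This command only works in a dired buffer")))
```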

"(if (eq major-mode 'dired-mode)" will check that we're in the right kind of buffer, because it makes no sense to run this in another mode.

In order to find the duplicate files I just need to walk the list of marked files, generate the md5 value of the contents of each one and add it to a hash table. The keys in the hash table will be the md5 values, and each value will be the list of files with that md5. Once we've done that, finding duplicates is a simple matter of walking the hash table and displaying any entry whose value has multiple files.
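The grouping step described above can be sketched on its own, independent of dired. The names `my-group-by-key' and `my-duplicate-groups' are hypothetical, and this is a simplified illustration rather than the code on the project site.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-group-by-key (items key-fn)
  "Group ITEMS in a hash table keyed by the result of KEY-FN.
Each value is the list of items that produced that key."
  ;; The keys are strings, so they must be compared with `equal'.
  (let ((table (make-hash-table :test 'equal)))
    (dolist (item items)
      (let ((key (funcall key-fn item)))
        (puthash key (cons item (gethash key table)) table)))
    table))

(defun my-duplicate-groups (table)
  "Return the values of TABLE that contain more than one item."
  (let ((groups '()))
    (maphash (lambda (_key items)
               (when (cdr items)
                 (push items groups)))
             table)
    groups))
```

With strings standing in for file contents, (my-duplicate-groups (my-group-by-key '("a" "b" "a") #'md5)) returns (("a" "a")). For real files the key function would be one that hashes a file's contents.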

"(let ((md5-map (make-hash-table :test 'equal :size 40)))" creates the hash table. The 'equal test makes sure the md5 strings used as keys are compared by their contents.

"(let ((filenames (dired-get-marked-files)))" gets the marked files as a list of filenames.

The next little bit of code just stores the item in the hash table after getting the md5. There's no function in Emacs to get the md5 of a file directly, but `md5' accepts a string or a buffer, so I wrote a helper function that loads the contents of a file into a temporary buffer first.

(defun md5-file (filename)
  "Open FILENAME, load it into a buffer and generate the md5 of its contents."
  (interactive "f")
  (with-temp-buffer
    (insert-file-contents filename)
    (md5 (current-buffer))))
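One possible variation, since photos are binary files: read the file without any decoding so that the hash is computed over the raw bytes. This is only a sketch, and the name `my-md5-file-literally' is hypothetical. For spotting duplicates the helper above should be enough; the variation is mainly useful when you want the same result that external md5 tools give.

```elisp
;; Hypothetical variant for illustration only.
(defun my-md5-file-literally (filename)
  "Return the md5 of the raw bytes in FILENAME."
  (with-temp-buffer
    ;; A unibyte buffer holds the bytes exactly as they are on disk.
    (set-buffer-multibyte nil)
    ;; Read the file without decoding or format conversion.
    (insert-file-contents-literally filename)
    (md5 (current-buffer))))
```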

Finally I want to display the results, so I create a buffer and then use maphash (walks the keys of a hash table executing a function on each) with a helper function `show-duplicate' which simply writes the values of the hash table entry into that buffer.
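A display step along these lines might look like the sketch below. The names `my-show-duplicate' and `my-show-duplicates' are hypothetical stand-ins, not the actual `show-duplicate' helper, and the hash table is assumed to map each md5 to a list of filenames.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-show-duplicate (_md5 files)
  "Insert FILES into the current buffer if there is more than one.
Each filename goes on its own line, followed by a blank line."
  (when (cdr files)
    (dolist (file files)
      (insert file "\n"))
    (insert "\n")))

(defun my-show-duplicates (md5-map)
  "Display the duplicate groups in MD5-MAP in a buffer."
  (with-current-buffer (get-buffer-create "Duplicated files")
    (erase-buffer)
    ;; `maphash' calls the function with each key and value in turn.
    (maphash #'my-show-duplicate md5-map)
    (display-buffer (current-buffer))))
```

Entries with a single file are skipped, so only groups of identical files end up in the buffer.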