Slurping a file in Common Lisp

It should be easy to slurp a file. I could think of plenty of ways to do it, but it took a while to come up with something comparable with perl slurping.

Here's a rough evolution of my slurp-stream function. You can do anything you like with the code. It's there so you don't have to go through the discovery process I went through.

[Source and copyright www.emmett.ca/~sabetts/slurp.html]

As a test file I used my 5M kernel file.

Character by character

My gut told me this was just plain stupid.

(defun slurp-stream1 (stream)
  (with-output-to-string (out)
    (do ((x (read-char stream nil stream) (read-char stream nil stream)))
        ((eq x stream))
      (write-char x out))))

Results

;   63.24 seconds of real time
;   446,879,200 bytes consed.

Notes

This isn't worth the time I spent writing it.

Line By Line

Clearly reading the file in by blocks is the answer. That's how hardrives works. Line by line seemed like a good place to start.

(defun slurp-stream2 (stream)
  (with-output-to-string (out)
    (loop
     (multiple-value-bind (line nl) (read-line stream nil stream)
       (when (eq line stream)
         (return))
       (write-string line out)
       (unless nl
         (write-char #Newline out))))))

results

;   1.03 seconds of real time
;   37,492,232 bytes consed.

It turned out to be surprisingly fast. 60 times faster than char by char!

Notes

This is problematic in several ways. read-line strips the newline so I have to manually put one back. Plus binary files don't have newlines. So lines are a lousy way to break up the input.

In 1024 byte chunks

Now we're getting closer. Read in 1024 characters at a time, accumulating them. I remember slurping files in C like this.

(defun slurp-stream3 (stream)
  (with-output-to-string (out)
    (let ((seq (make-array 1024 :element-type 'character
                          :adjustable t
                          :fill-pointer 1024)))
      (loop
       (setf (fill-pointer seq) (read-sequence seq stream))
       (when (zerop (fill-pointer seq))
        (return))
       (write-sequence seq out)))))


Results

;   1.54 seconds of real time
;   23,085,744 bytes consed.

Notes

This solves the problems of the line by line function. Surprisingly its slower. Presumably it'd be faster reading text files since the chunks would be larger. On the upside it conses a lot less.

Is this actually very good?

I was pretty pleased with myself. 1s to load 5M. That's pretty good, right? Apparently not. I wrote a perl program to slurp a file and was shocked.

#!/usr/bin/perl

use strict;
local $/;
my $s = <>;
print length($s);
print "n";

Results

real   0m0.049s

Notes

Ouch. Back to the drawing board.

All at once

After trying 4k and 16k blocks and not getting any better results, I thought: why not read it in with one call to read-sequence? Then the magic of read-sequence can take care of reading the file in blocks.

(defun slurp-stream4 (stream)
  (let ((seq (make-string (file-length stream))))
    (read-sequence seq stream)
    seq))

Results

;   0.03 seconds of real time
;   5,849,496 bytes consed.

Notes

Smaller, faster, simpler. Why didn't I think of this in the first place? And it's satisfyingly faster than the perl program, although that's probably because perl has to parse the program as well.

To determine if my CL function was faster than my Perl function, I decided to try a 189670832 byte file.

Perl: | real 0m11.533s

CL: | ; 10.41 seconds of real time | ; 189,672,328 bytes consed.

I ran the test several times. The results fluxuated probably because the kernel cached the file. I recorded the shortest times I got.

All at once for multibyte character files

slurp-stream4 doesn't work if the file contains multibyte characters and your lisp interpretes them into a single character. A minor change fixes that problem, but wastes some space. The price for speed, I suppose. It works on the assumption that the number of characters read will never be greater than the number of bytes.

(defun slurp-stream5 (stream)
  (let ((seq (make-array (file-length stream) :element-type 'character :fill-pointer t)))
    (setf (fill-pointer seq) (read-sequence seq stream))
    seq))

Results

Conclusion

Common Lisp is still the rad.