Slurping a file in Common Lisp
It should be easy to slurp a file. I could think of plenty of ways to do it, but it took a while to come up with something comparable to Perl's slurping.
Here's a rough evolution of my slurp-stream function. You can do anything you like with the code. It's there so you don't have to go through the discovery process I went through.
[Source and copyright www.emmett.ca/~sabetts/slurp.html]
As a test file I used my 5M kernel file.
Character by character
My gut told me this was just plain stupid.
(defun slurp-stream1 (stream)
  (with-output-to-string (out)
    (do ((x (read-char stream nil stream) (read-char stream nil stream)))
        ((eq x stream))
      (write-char x out))))

Results
; 63.24 seconds of real time
; 446,879,200 bytes consed.
This isn't worth the time I spent writing it.
Line By Line
Clearly reading the file in by blocks is the answer. That's how hard drives work. Line by line seemed like a good place to start.
(defun slurp-stream2 (stream)
  (with-output-to-string (out)
    (loop
      ;; NL is read-line's second value: true when the line ended at
      ;; end of file without a trailing newline.
      (multiple-value-bind (line nl) (read-line stream nil stream)
        (when (eq line stream)
          (return))
        (write-string line out)
        (unless nl
          (write-char #\Newline out))))))

Results
; 1.03 seconds of real time
; 37,492,232 bytes consed.
It turned out to be surprisingly fast: about 60 times faster than character by character!
This is problematic in several ways. read-line strips the newline so I have to manually put one back. Plus binary files don't have newlines. So lines are a lousy way to break up the input.
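To make the newline bookkeeping concrete, here is a small sketch of what read-line hands back. The function name lines-with-newline-flags is made up for this illustration; it only relies on the standard behaviour that read-line returns the line without its newline, plus a second value that is true when the line ended at end of file instead of at a newline.

```lisp
;; Illustration only: LINES-WITH-NEWLINE-FLAGS is a hypothetical helper,
;; not part of the slurp functions in this article.
(defun lines-with-newline-flags (string)
  "Return a list of (LINE MISSING-NEWLINE-P) pairs, one per line of STRING."
  (with-input-from-string (in string)
    ;; The stream itself is used as the end-of-file marker, as in slurp-stream2.
    (loop for (line missing-newline-p)
            = (multiple-value-list (read-line in nil in))
          until (eq line in)
          collect (list line missing-newline-p))))

;; (lines-with-newline-flags (format nil "first~%second"))
;; => (("first" NIL) ("second" T))
```

The newline is gone from both lines, and only the flag says whether one was there, which is why slurp-stream2 has to write it back by hand.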
In 1024 byte chunks
Now we're getting closer. Read in 1024 characters at a time, accumulating them. I remember slurping files in C like this.
(defun slurp-stream3 (stream)
  (with-output-to-string (out)
    (let ((seq (make-array 1024 :element-type 'character
                                :adjustable t
                                :fill-pointer 1024)))
      (loop
        (setf (fill-pointer seq) (read-sequence seq stream))
        (when (zerop (fill-pointer seq))
          (return))
        (write-sequence seq out)))))

Results
; 1.54 seconds of real time
; 23,085,744 bytes consed.
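This solves the problems of the line by line function. Surprisingly, it's slower. Presumably it would do better against the line by line version on text files, where lines are usually much shorter than 1024 characters, so the chunks would be larger than the lines. On the upside, it conses a lot less.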
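A variation on the chunked approach, as a rough sketch only: keep a plain string as the buffer and pass the count returned by read-sequence to write-sequence as :end, so no fill pointer is needed. The name slurp-stream-chunked and the chunk-size parameter are made up here, and this version has not been timed against the others.

```lisp
;; Sketch only: SLURP-STREAM-CHUNKED and CHUNK-SIZE are hypothetical names,
;; not one of the timed functions in this article.
(defun slurp-stream-chunked (stream &optional (chunk-size 1024))
  "Read STREAM to the end in blocks of CHUNK-SIZE characters."
  (let ((buffer (make-string chunk-size)))
    (with-output-to-string (out)
      ;; READ-SEQUENCE returns the index of the first element it did not
      ;; fill, which is the number of characters read into BUFFER.
      (loop for n = (read-sequence buffer stream)
            while (plusp n)
            do (write-sequence buffer out :end n)))))
```

Making the chunk size a parameter makes it easy to try other block sizes, for example (slurp-stream-chunked stream 4096).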
Is this actually very good?
I was pretty pleased with myself. 1s to load 5M. That's pretty good, right? Apparently not. I wrote a perl program to slurp a file and was shocked.
#!/usr/bin/perl
use strict;
local $/;
my $s = <>;
print length($s);
print "\n";

Results
real 0m0.049s
Ouch. Back to the drawing board.
All at once
After trying 4k and 16k blocks and not getting any better results, I thought: why not read it in with one call to read-sequence? Then the magic of read-sequence can take care of reading the file in blocks.
(defun slurp-stream4 (stream)
  (let ((seq (make-string (file-length stream))))
    (read-sequence seq stream)
    seq))

Results
; 0.03 seconds of real time
; 5,849,496 bytes consed.
Smaller, faster, simpler. Why didn't I think of this in the first place? And it's satisfyingly faster than the perl program, although that's probably because perl has to parse the program as well.
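For completeness, a minimal sketch of calling it on a file. The wrapper name slurp-file is made up for this example; slurp-stream4 is repeated from above only so the block stands on its own.

```lisp
;; SLURP-STREAM4 is repeated from the article so this block is self-contained.
(defun slurp-stream4 (stream)
  (let ((seq (make-string (file-length stream))))
    (read-sequence seq stream)
    seq))

;; SLURP-FILE is a hypothetical convenience wrapper, not from the article.
(defun slurp-file (path)
  "Open PATH as a character stream and return its contents as a string."
  (with-open-file (stream path)
    (slurp-stream4 stream)))
```

Timings in the style shown above can be collected with the standard time macro, for example (time (length (slurp-file "/path/to/file"))), where the path is a placeholder. Wrapping the call in length keeps the REPL from printing the whole file.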
To determine if my CL function was faster than my Perl function, I decided to try a 189670832 byte file.
Perl: real 0m11.533s
CL:   ; 10.41 seconds of real time
      ; 189,672,328 bytes consed.
I ran the test several times. The results fluctuated, probably because the kernel cached the file. I recorded the shortest times I got.
All at once for multibyte character files
slurp-stream4 doesn't work if the file contains multibyte characters and your Lisp decodes each of them into a single character: file-length then reports more bytes than there are characters to read, so the tail of the string is never filled. A minor change fixes that problem, but wastes some space. The price for speed, I suppose. It works on the assumption that the number of characters read will never be greater than the number of bytes.
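The gap between bytes and characters can be seen with a small check. This is a sketch under assumptions: the function name byte-and-char-counts is made up, the implementation must accept :utf-8 as an external format (SBCL does; others may spell it differently), and file-length on a character stream is assumed to report bytes, as the functions in this article already do.

```lisp
;; Sketch only: BYTE-AND-CHAR-COUNTS is a hypothetical helper.
;; Assumes :UTF-8 is a valid external format name (true for SBCL) and that
;; FILE-LENGTH on a character stream reports the size in bytes.
(defun byte-and-char-counts (path text)
  "Write TEXT to PATH as UTF-8, then return two values:
the FILE-LENGTH of the file and the number of characters read back."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format :utf-8)
    (write-string text out))
  (with-open-file (in path :external-format :utf-8)
    (let* ((bytes (file-length in))
           (buffer (make-string bytes))
           ;; READ-SEQUENCE stops at end of file, so this is the number
           ;; of characters actually decoded from the file.
           (chars (read-sequence buffer in)))
      (values bytes chars))))

;; Example call: (code-char 239) is i with diaeresis, two bytes in UTF-8.
;; (byte-and-char-counts "slurp-multibyte-demo.txt"
;;                       (format nil "na~Cve" (code-char 239)))
```

Under those assumptions the first value should come out larger than the second whenever the text contains a multibyte character, which is exactly the slack the fill pointer in the next function absorbs.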
(defun slurp-stream5 (stream)
  (let ((seq (make-array (file-length stream) :element-type 'character
                                              :fill-pointer t)))
    (setf (fill-pointer seq) (read-sequence seq stream))
    seq))

Results
Common Lisp is still the rad.