pregexp over streams, and other random hackery

By: on October 11, 2005

Over at eighty-twenty I recount a couple of recent random excursions into various bits and pieces of code.

Besides those developments, I also spent some time on Sunday morning modifying Dorai Sitaram’s pregexp version 20050502 to operate
over streams as well as strings, so that I could use it for lexing
arbitrary character sources (for instance, with the packrat parser library I’ve been developing).

The basic interface, after the patch, is now either the standard

(pregexp-match <pattern> <string>)

or the new

(pregexp-match <pattern> <stream>)

where streams are created with

(pregexp-make-stream )

I’ve also added a procedure pregexp-match-head, which is like
pregexp-match-positions except that it only matches at the very beginning of
the input string or stream: pregexp-match-head behaves like Python’s
re.match, whereas pregexp-match-positions behaves like Python’s re.search.
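
To make the Python analogy concrete, here is a minimal sketch in Python rather than Scheme. It is not part of the patch, and the helper names match_head, match_anywhere and tokenize are invented for illustration. It shows the difference between matching only at the head of the input and matching anywhere, and why head-only matching is the convenient primitive for a lexer: each token must start exactly where the previous one ended.

```python
import re

def match_head(pattern, text):
    """Return the (start, end) span of a match at the very beginning of text, or None."""
    m = re.match(pattern, text)      # anchored at position 0
    return m.span() if m else None

def match_anywhere(pattern, text):
    """Return the (start, end) span of the first match anywhere in text, or None."""
    m = re.search(pattern, text)     # scans forward for the first match
    return m.span() if m else None

# Hypothetical token grammar for the demo: integers and a few operators.
WHITESPACE = re.compile(r"\s*")
TOKEN = re.compile(r"(?P<num>\d+)|(?P<op>[-+*/()])")

def tokenize(text):
    """Lex text by repeatedly matching a token at the current position only."""
    tokens = []
    pos = WHITESPACE.match(text, 0).end()         # skip leading whitespace
    while pos < len(text):
        m = TOKEN.match(text, pos)                # must match exactly at pos
        if m is None:
            raise ValueError(f"no token at offset {pos}")
        tokens.append((m.lastgroup, m.group()))   # (token kind, token text)
        pos = WHITESPACE.match(text, m.end()).end()
    return tokens

if __name__ == "__main__":
    print(match_head(r"\d+", "abc123"))
    print(match_anywhere(r"\d+", "abc123"))
    print(tokenize("12 + (3*4)"))
```

If the lexer used the search-anywhere variant instead, it would silently skip over characters that do not belong to any token; the head-only match turns those into an error at a known offset.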

I haven’t modified pregexp-split, pregexp-replace, or pregexp-replace*,
partly because I have no need for them for my application and partly
because I’m not sure what their behaviour should be: should
pregexp-replace, when given a stream, answer a stream, or a string? In
the case of pregexp-split, since it has to examine the entire input in
any case, supporting streams seems unnecessary. (Perhaps I should have
included a pregexp-stream->string utility, though.)

The patch against version 20050502 is downloadable here.

Thanks to Dorai for a great library!

Update: I noticed a bug in the first revision of the patch. I’ve updated the links in the article above to point to the new patch.

