PERL CVS PARROT 14 CVS COMMIT PARROT LANGUAGES REGEX LIB REGEX DRIVER PM
Date: 21 Feb 2005 23:12:03 -0000

Subject: cvs commit: parrot/languages/regex/lib/Regex Driver.pm
From: sfink@no-spam (Steve Fink)

cvsuser 05/02/21 15:12:03

Modified: languages/regex README regex.pl test.pl languages/regex/lib/Regex Driver.pm Log:
Make test.pl be for testing only; move main routine generation into Regex::Driver, and access it from the regex.pl command line. Also tone down the README.
Revision Changes Path 1.17 +45 -45 parrot/languages/regex/README Index: README ===================================================================
RCS file: /cvs/public/parrot/languages/regex/README,v retrieving revision 1.16
retrieving revision 1.17
diff -u -r1.16 -r1.17
--- README 25 Nov 2004 08:37:30 -0000 1.16
+++ README 21 Feb 2005 23:12:02 -0000 1.17
@@no-spam -14,6 +14,9 @@no-spam perl regex.pl '(a*|b)a'
+(Note that on some platforms, including Windows, you may need to use +double quotes in place of the single quotes.)
+
To turn off optimization so you tell what's going on a little better:
perl regex.pl --no-optimize '(a*|b)a'
@@no-spam -49,16 +52,25 @@no-spam to generate them.
The above commands only generate subroutines for matching regular -expressions. They don't provide fun test programs to try out. For -that, use the test.pl script with the -c flag to generate a test.pbc -file that will accept an input string on the command line:
+expressions. You will need to provide your own code to call the +matching routine with some input. Or, just to try it out, you can use +the --main argument to regex.pl to generate a sample driver program +that accepts an input string on the command line:
- perl test.pl -c --expr='a*x'
- ../../parrot test.pbc baax + perl regex.pl --main -o mytest.imc 'a*x'
+ ../../parrot mytest.imc baax (output)
Match found 0: 1..3
+Or for the Perl5 backend:
+
+ perl regex.pl --language=perl5 --main -o mytest.pl 'a*x'
+ perl mytest.pl baax +
+Note that the output in this case is a bunch of nasty Perl data +structures encoding the matching information.
+
New stuff: now you can use the compiler as an embedded Parrot compiler. Run 'make regex-compiler.pbc' to generate regex-compiler.pbc. Then run ../../parrot 01_basic.imc, ../../parrot @@no-spam -87,7 +99,7 @@no-spam and run (on Unix):
- perl regex.pl --language=perl5 --file=expr.rx > expr.pl + perl regex.pl --main --language=perl5 --file=expr.rx > expr.pl perl expr.pl '3+(45*2)'
All of the <?foo> things are there so that the rules get nicely @@no-spam -98,20 +110,18 @@no-spam of setting the value of a rule, so they are of limited usefulness. An example:
- perl regex.pl --language=perl5 'a* { print "Matched $MATCH{0}\n" }'
+ perl regex.pl --language=perl5 --main 'a* { print "Matched some as\n" } b'
+
+In some cases, you can also access the current state of the match:
+
+ perl regex.pl --language=perl5 --main 'a* { print "Matched $MATCH{0}\n" } b'

+
+The first example should now also work with --language=pir (the +default) as well. The second example wouldn't, because it isn't valid +PIR.
-The above should now also work with --language=pir as well (actually,
-that's the default). Except that I have everything tangled up. If you -want a Perl5 program that takes input on stdin and reports the results -of matching against it, you can use regex.pl. If you want the same for -PIR code, you'll need to use test.pl.
-
-regex.pl, when using PIR, only generates the core matching -subroutine(s). The idea is that you would use this script for -embedding this compiler into your own compiler. Examples of doing this -on the fly are in 01_basic.imc and 02_date.imc. test.pl, on the other -hand, is really intended for writing PIR tests of a particular format,
-and doesn't support the Perl5 backend yet.
+Examples of using regex.pl to generate code that is compiled on the +fly are in 01_basic.imc and 02_date.imc.
If you use the -d flag, you'll get a very verbose description of the matching progress as you run.
@@no-spam -274,40 +284,31 @@no-spam compilers/pge. This engine (in languages/regex, referred to in the following text as l/rx) predates pge by a year or two, but never managed to generate sufficient interest to get anyone else involved.
-Patrick must have been aware of this engine, since I told him about it -both personally and in a message to perl6-internals, but he has never -acknowledged its existence nor explained why he felt the need to start -from scratch. I have to conclude that he either looked at it and -didn't like the design or the implementation; or he just wanted to -start from scratch so that he could fully understand the system he was -working on. All of which are perfectly good reasons, so I bear no ill -will towards the official effort.
-I am assuming that pge is going to get the momentum of the community +I am assuming that PGE is going to get the momentum of the community behind it, so I would advise anyone interested in working on a rule engine to look there first. (Look for discussion on the perl6-internals and perl6-compiler mailing lists.) However, I still intend to work on this engine for a while longer, and welcome any interested participants. (Send any requests/comments/suggestions either to perl6-internals or directly to me at steve@no-spam So -far, I have only briefly looked at pge, but I think this +far, I have only briefly looked at PGE, but I think this languages/regex engine has enough of a different approach that it is still valuable for gathering lessons -- and may still make the most sense in the long run.
-That last statement demands a bit of explanation, so here's an excerpt -of a mail I sent out after my first look at pge (remember, pge has -probably advanced past this point by now):
+My take on the comparison between l/rx and PGE as of the first public +release of PGE:
It sounds like l/rx handles pretty much exactly the same things as -pge, probably a few more and a few less. I haven't actually looked at +PGE, probably a few more and a few less. I haven't actually looked at the code, but from the description I'd guess that the main differences are:
- - pge is implemented in C; l/rx in Perl5. Both are reaching towards + - PGE is implemented in C; l/rx in Perl5. Both are reaching towards the "bootstrap point", when they'll be implemented in PIR.
- - pge generates PIR; l/rx has both PIR and Perl5 backends - - pge uses coroutines and continuations; I have always been too + - PGE generates PIR; l/rx has both PIR and Perl5 backends + - PGE uses coroutines and continuations; I have always been too wary of their stability, so I use plain subs (with a 'mode'
parameter to tell it whether to try to match or backtrack)
- Both allow you to "continue" a match to find all other possible @@no-spam -315,19 +316,18 @@no-spam implementation (you have to keep all that state around somehow anyway)
- l/rx uses match objects (dynclasses/match.pmc) and automatically - generates a parse tree out of them - - pge has built-in "dump out the matching info" routines; I make my + generates a parse tree out of them. PGE has a dynamically created + Parrot class "PGE::Match" that I assume does something similar.
+ - PGE has built-in "dump out the matching info" routines; I make my test harnesses generate their own. I'm jealous.
- The feature sets are nearly identical. Makes sense, I suppose --
low-hanging fruit and all that.
- It sounds like the internal design is rather different. I try hard - to compile down to very minimalistic PIR ops. It sounds like pge + to compile down to very minimalistic PIR ops. It sounds like PGE uses lots of higher-level operations, to do things like processing - a whole chunk at a time. (Although on the other hand, pge uses - more native Parrot flow control mechanisms than I do.)
+ a whole chunk of input at a time. (Although on the other hand, PGE + uses more native Parrot flow control mechanisms than I do.) (And + I really haven't looked closely enough to substantiate this.)
- Closely related to the above, I have a number of optimizations - already implemented, but I suspect pge will end up with a very + already implemented, but I suspect PGE will end up with a very different set of optimizations.
- - I have on average about 5 hours a week to work on l/rx; Patrick - has quite a bit more :-) (Which does NOT mean that I work faster;
- my engine is at least a year older than pge.)
1.16 +4 -0 parrot/languages/regex/regex.pl Index: regex.pl ===================================================================
RCS file: /cvs/public/parrot/languages/regex/regex.pl,v retrieving revision 1.15
retrieving revision 1.16
diff -u -r1.15 -r1.16
--- regex.pl 25 Nov 2004 08:37:30 -0000 1.15
+++ regex.pl 21 Feb 2005 23:12:02 -0000 1.16
@@no-spam -14,6 +14,7 @@no-spam my $output;
my $subname;
my $language = "pir";
+my $main;
while (@no-spam {
$_ = shift;
@@no-spam -24,6 +25,8 @@no-spam my $opts = $1;
$tree_opt = ($opts =~ /t/i);
$list_opt = ($opts =~ /l/i);
+ } elsif (/--main/ || $_ eq '-m') {
+ $main = 1;
} elsif (/--debug/ || $_ eq '-d') {
$debug = 1;
} elsif (/--output-file=(.*)/) {
@@no-spam -66,6 +69,7 @@no-spam $options{'no-list-optimize'} = 1 if ! $list_opt;
$options{'DEBUG'} = 1 if $debug;
$options{subname} = $subname if $subname;
+$options{emit_main} = 1 if $main;
if ($language eq 'pir') {
$options{module} = "Regex::CodeGen::IMCC";
1.21 +2 -57 parrot/languages/regex/test.pl Index: test.pl ===================================================================
RCS file: /cvs/public/parrot/languages/regex/test.pl,v retrieving revision 1.20
retrieving revision 1.21
diff -u -r1.20 -r1.21
--- test.pl 25 Nov 2004 03:51:25 -0000 1.20
+++ test.pl 21 Feb 2005 23:12:02 -0000 1.21
@@no-spam -140,8 +140,7 @@no-spam my $ctx = { };
my $trees = Regex::expr_to_tree($pattern, $ctx, DEBUG => $DEBUG);
- my $driver = Regex::Driver->new('pir');
- $driver->output_header(*PIR);
+ my $driver = Regex::Driver->new('pir', emit_main => 1);
print PIR <<"END";
# Regular expression test @@no-spam -150,61 +149,7 @@no-spam END - print PIR <<'END';
-.sub _main @no-spam - .param pmc args - .local string input_string - input_string = args[1]
-
- $P0 = loadlib "match_group"
-
- .local pmc regex_sub - .local pmc result - .local int matched - .local pmc stack - stack = new PerlArray - regex_sub = newsub _default - result = regex_sub(1, input_string, 0, stack)
- matched = result["!RESULT"]
- if matched goto printResults -
-printMatchFailed:
- print "Match failed\n"
- goto done -printResults:
- print "Match found\n"
- .local int num_groups - .local int match_num - .local int ii - .local int valid_flag - set num_groups, result["!GROUPS"]
- set match_num, 0
-printLoop:
- ge match_num, num_groups, done - bsr printGroup - inc match_num - goto printLoop -done:
- .return ()
-
-printGroup:
- .local int match_start - .local int match_end - set match_start, result[match_num;0]
- set match_end, result[match_num;1]
- eq match_start, -2, skipPrint - eq match_end, -2, skipPrint - print match_num - print ": "
- print match_start - print ".."
- print match_end - print "\n"
-skipPrint:
- set valid_flag, 1
- ret -.end -END + $driver->output_header(*PIR);
for my $tree (@no-spam {
$driver->output_rule(*PIR, '_regex', $tree, $ctx, DEBUG => $DEBUG);
1.4 +65 -17 parrot/languages/regex/lib/Regex/Driver.pm Index: Driver.pm ===================================================================
RCS file: /cvs/public/parrot/languages/regex/lib/Regex/Driver.pm,v retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- Driver.pm 25 Nov 2004 04:37:57 -0000 1.3
+++ Driver.pm 21 Feb 2005 23:12:03 -0000 1.4
@@no-spam -56,13 +56,10 @@no-spam print $fh "$_\n" foreach (@no-spam }
-# FIXME! This is only valid for a manual test program!
sub output_footer {
my ($self, $fh) = @no-spam + return 1 unless $self->{emit_main};
- # print OUTPUT "\nmy \$m = _rule_default(1, \$ARGV[0], 0);\n";
- # print OUTPUT "use Data::Dumper;\n";
- # print OUTPUT "print Dumper(\$m);\n";
print $fh <<'END';
sub reduce {
my ($m, $input) = @no-spam @@no-spam -84,7 +81,7 @@no-spam my ($input) = @no-spam my $m = _rule_default(1, $input, 0, []);
use Data::Dumper;
-print Dumper($m);
+print Data::Dumper->Dump([$m],["*DEFAULT_RULE_MATCH"]);
return reduce($m, $input);
}
sub minimatch {
@@no-spam -99,25 +96,76 @@no-spam my $m = match($ARGV[0]);
$Data::Dumper::Sortkeys = 1;
use Data::Dumper;
-print Dumper($m);
+print Data::Dumper->Dump([$m],["*MATCH_OBJECT"]);
my $mini = minimatch($m);
-use Data::Dumper;
-print Dumper($mini);
+print Data::Dumper->Dump([$mini],["*CAPTURES"]);
END }
package Regex::Driver::PIR;
our @no-spam = qw(Regex::Driver);
-# sub output_header {
-# my ($self, $fh) = @no-spam -# print $fh <<'END';
-# .sub __init_regex @no-spam -# loadlib $P0, "match_group"
-# .return ()
-# .end -# END -# }
+sub output_header {
+ my ($self, $fh) = @no-spam + $self->SUPER::output_header($fh);
+ return 1 unless $self->{emit_main};
+
+ print $fh <<'END';
+.sub _main @no-spam + .param pmc args + .local string input_string + input_string = args[1]
+
+ $P0 = loadlib "match_group"
+
+ .local pmc regex_sub + .local pmc result + .local int matched + .local pmc stack + stack = new PerlArray + regex_sub = newsub _default + result = regex_sub(1, input_string, 0, stack)
+ matched = result["!RESULT"]
+ if matched goto printResults +
+printMatchFailed:
+ print "Match failed\n"
+ goto done +printResults:
+ print "Match found\n"
+ .local int num_groups + .local int match_num + .local int ii + .local int valid_flag + set num_groups, result["!GROUPS"]
+ set match_num, 0
+printLoop:
+ ge match_num, num_groups, done + bsr printGroup + inc match_num + goto printLoop +done:
+ .return ()
+
+printGroup:
+ .local int match_start + .local int match_end + set match_start, result[match_num;0]
+ set match_end, result[match_num;1]
+ eq match_start, -2, skipPrint + eq match_end, -2, skipPrint + print match_num + print ": "
+ print match_start + print ".."
+ print match_end + print "\n"
+skipPrint:
+ set valid_flag, 1
+ ret +.end +END +}
sub output_rule_body {
my ($self, $fh, $subname, $rule, $ctx, $instructions) = @no-spam