
Reporting error lines in Flex
Flex is a great tokenizer generator, but one of the things it sucks at when used together with Yacc is error reporting. Here's one technique I use to at least tell the user what line an error occurred on: I simply declare a global that holds the current line number and update it from the action code of every token that contains a line break. You can declare the global in the definitions block at the top of the Flex file, where you already have your #include "y.tab.h":
%{
#include "y.tab.h"
int gLineNumber = 1; /* current line; the first line of the input is line 1 */
%}
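And then you can declare the newline token as follows. Matching \r\n as a unit makes a Windows line ending count as one line break rather than two:
\r\n|\n|\r { gLineNumber++; return NEWLINE; }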
If your language is like C and treats a line break as plain whitespace to be skipped, just leave out the return statement. Now you have everything you need to define a yyerror function that provides more useful output:
#include <stdio.h>

extern int gLineNumber; /* defined in the Flex file */

int yyerror( const char* str )
{
    fprintf( stderr, "ERROR: %s (line %d)\n", str, gLineNumber );
    return 0;
}
Now, that's a nice and handy solution, but there's one problem: What if your language has a token (like a multi-line comment or string) that may also contain line breaks? If you define your comment token as:
\/\*(.|\n)*\*\/ ;
Those line breaks will not be counted, throwing off your line number. The best solution I found was using Flex's states (the Flex manual calls them start conditions). A state is simply a specially-labeled group of rules that can only match while you're in that state. You define a state in the definitions section at the top of the file, in our case with %x multilinecomment. The %x makes the state exclusive, so only the rules labeled with it can match while it is active. With %start (or %s) the state would be inclusive, and all your unlabeled rules would stay active inside the comment as well. To "turn on" a state, you write BEGIN statename. Here's how our comment-parsing code looks:
\/\* { BEGIN multilinecomment; }
<multilinecomment>(\r\n|\n|\r) { ++gLineNumber; }
<multilinecomment>\*\/ { BEGIN INITIAL; }
<multilinecomment>. ;
As you see, you mark a rule as belonging to a state by writing the state name in angle brackets before the regular expression. Rules that aren't labeled with a state belong to the state INITIAL, and that's the state we return to when the comment ends. Note that the last rule skips one character at a time: a greedy .* would run to the end of the line and swallow any */ on the way, so the comment would not end there.
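The same bookkeeping is needed for multi-line strings, where you usually want to keep the matched text rather than skip it. One possible alternative to a state for that case (a sketch, not the technique described above) is to match the whole token with one rule and count the line breaks in the matched text afterwards. The helper name CountLineBreaks is made up for this example and is not part of Flex:

```c
#include <stddef.h>

/* Hypothetical helper, not part of Flex: returns the number of line breaks
   in a NUL-terminated string. "\r\n", "\n" and "\r" each count as one. */
int CountLineBreaks( const char* text )
{
    int count = 0;
    size_t i;

    for( i = 0; text[i] != '\0'; ++i )
    {
        if( text[i] == '\r' )
        {
            /* Reading text[i + 1] is safe: at worst it is the terminating NUL. */
            if( text[i + 1] == '\n' )
                ++i;    /* treat "\r\n" as a single line break */
            ++count;
        }
        else if( text[i] == '\n' )
            ++count;
    }

    return count;
}
```

In a rule's action you would then write something like gLineNumber += CountLineBreaks( yytext ); before returning the token, for example in a string rule such as \"[^"]*\" { gLineNumber += CountLineBreaks( yytext ); return STRING; } (STRING being an assumed token name from your grammar). The trade-off is that the whole token is scanned a second time, which rarely matters for strings.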
Handy, isn't it?
Uli Kusterer replies: I'm not sure right now, but for some reason I couldn't get that to work. Might be that I just didn't know about %option yylineno. I also had to jump through some hoops because I was compiling the parser into a system with lots of C++ code, so maybe there was some undesired interaction there, I don't remember offhand. But you're right, if you can get yylineno working, that's definitely the recommended approach.