Uli's Web Site
[ Zathras.de - Uli's Web Site ]
Reporting error lines in FLex

FLex is a great tokenizer generator, but one thing it sucks at when used together with YACC is error reporting. Here's one technique I use to at least tell the user which line an error occurred on: I simply declare a global line counter and increment it from the action code of every rule that matches a line break. You can declare the global in the block at the top of the file, where you already have your #include "y.tab.h":

%{
    #include "y.tab.h"
    int gLineNumber = 1;    /* start at 1 so an error on the first line isn't reported as line 0 */
%}

And then you can declare the newline token as:
\r\n|[\n\r]     { gLineNumber++; return NEWLINE; }

If your language is like C and treats a line break as plain whitespace to be skipped, just leave out the return statement. Now you have everything you need to define a yyerror function that provides more useful output:
int yyerror( const char* str )
{
        fprintf( stderr, "ERROR: %s (line %d)\n", str, gLineNumber );
        
        return 0;
}

Now, that's a nice and handy solution, but there's one problem: What if your language has a token (like a multi-line comment or string) that may also contain line breaks? If you define your comment token as:

\/\*(.|\n)*\*\/     ;

Those line breaks will not be counted, throwing off your line number. The best solution I found was using FLex's states. A state is simply a specially-labeled group of rules that can only match while the scanner is in that state. You declare a state at the top of the file, before the first %%. Use %x statename, in our case %x multilinecomment, which declares an exclusive state. %start (or %s) declares an inclusive one, in which all unlabeled rules stay active as well, so the NEWLINE rule above would still fire inside a comment. To "turn on" a state, you write BEGIN statename. Here's how our comment-parsing code looks:
\/\*                                        { BEGIN multilinecomment; }
<multilinecomment>\r\n|[\n\r]               { ++gLineNumber; }
<multilinecomment>\*\/                      { BEGIN INITIAL; }
<multilinecomment>.                         ;

As you see, you mark a rule as belonging to a state by writing the state name in angle brackets before the regular expression. Rules that aren't labeled with a state belong to the default state INITIAL, and that's the state we return to when the comment ends. The last rule deliberately matches only one character at a time: FLex always prefers the longest match, so .* would swallow a */ sitting further along the same line and the comment would never end there.
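
If you want to see what the two states are doing without running FLex, here is a rough hand-written C sketch of the same idea. It is not what FLex generates; the names ScanState and LineOfFirstError are made up for this illustration, and a single marker character stands in for "a token the parser chokes on". Line breaks are counted in both states, but the marker is only noticed outside of a comment:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scanner's two states. */
enum ScanState { STATE_INITIAL, STATE_MULTILINECOMMENT };

/* Returns the 1-based line on which errorChar first appears outside a
   multi-line comment in text, or 0 if it never does. "\r\n" counts as
   one line break. */
int LineOfFirstError( const char* text, char errorChar )
{
    int             lineNumber = 1;
    enum ScanState  state = STATE_INITIAL;
    
    assert( text != NULL );
    
    for( size_t i = 0; text[i] != '\0'; i++ )
    {
        char    c = text[i];
        char    next = text[i + 1];    /* at worst the terminating '\0' */
        
        if( c == '\r' && next == '\n' )
            continue;                           /* counted when we reach the '\n' */
        if( c == '\n' || c == '\r' )
            lineNumber++;                       /* both states count line breaks */
        else if( state == STATE_INITIAL && c == '/' && next == '*' )
        {
            state = STATE_MULTILINECOMMENT;     /* like BEGIN multilinecomment */
            i++;                                /* skip the '*' */
        }
        else if( state == STATE_MULTILINECOMMENT && c == '*' && next == '/' )
        {
            state = STATE_INITIAL;              /* like BEGIN INITIAL */
            i++;                                /* skip the '/' */
        }
        else if( state == STATE_INITIAL && c == errorChar )
            return lineNumber;                  /* the line yyerror would report */
        /* anything else is ordinary text or comment body and is skipped */
    }
    
    return 0;
}
```

Calling LineOfFirstError( "a = 1\n/* ? in a comment\nstill comment */\nb = ?;\n", '?' ) skips the question mark inside the comment, still counts both line breaks inside it, and reports line 4.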

Handy, isn't it?

Reader Comments: (RSS Feed)
mulmail writes:
What's wrong with the yylineno option?
Uli Kusterer replies:
I'm not sure right now, but for some reason I couldn't get that to work. Might be that I just didn't know about %option yylineno. I also had to jump through some hoops because I was compiling the parser into a system with lots of C++ code, so maybe there was some undesired interaction there, I don't remember offhand. But you're right, if you can get yylineno working, that's definitely the recommended approach.
Created: 2005-09-30 @045 Last change: 2014-09-24 @974
© Copyright 2003-2014 by M. Uli Kusterer, all rights reserved.