0README

./uncomment
|-- 0Postings                        directory - statement of problem; responses ....
|   |-- postings.txt                 text file summary of postings
|   `-- uncomment-[0a]-[1-4].html    5 html files of postings
|-- 0README                          this file
|-- Makefile                         to test solutions, run: make test
|-- Solutions                        directory - ca. 10 solutions that were posted
|-- Tests                            directory - results of tests (see Makefile)
|-- Trials-lex                       directory - different lex solutions
|-- eatC.c                           Chris Torek's C solution
|-- eatLex.l                         John Rupley's lex solution
|-- eatSed                           Maarten Litmaath's sed solution
`-- uncomment.tst2                   test file with pathological C comments


PROBLEM: remove comments from C code.

SOLUTIONS: three selected; one written in C (by Chris Torek), another
a sed script (by Maarten Litmaath), the third a lex source (by myself).

It is instructive to compare the examples.

The C code is fast and straightforward but difficult to get right
(Chris Torek wrote it in 10 minutes, correctly, but several other
posters did not do so well).  Length: 58 lines.

Maarten Litmaath's sed script is a tour de force, and worth study for
its techniques.  The posting (in file: 0Postings/postings.txt) comments
on the method.  But it is neither the simplest nor the fastest way to
uncomment.  Length: 78 lines.

Lex is preferred, IMHO - simple, easy to write and get right, and in
time of execution close to the C-code solution.  And it's the shortest
by a factor of 3 or more.  Length: 15 lines.

Lex is character-stream oriented, as distinguished from the line
orientation of sed and awk, so it serves best for matching patterns
that cross line boundaries.  This is perhaps the take-home message of
the exercise.


C-code solution (by Chris Torek):

/*
In article <16539@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:

In article <9864@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>Score, anyone? (recent postings tested on K&R-I-syntax code)
>
>    sed    1/1 correct
>    Lex    2/2 correct
>    C      2/2 wrong

This sounds like a CHALLENGE!
:-)

I wrote the following working against the ten-minute spaghetti clock.
It is slightly tested, and probably works, with the exception of
#include (and unclosed comments, etc., in included files).  It
is more permissive than real C (allowing newlines in string and
character constants, and allowing infinitely long character constants)
but should not get anything wrong that cpp gets right.

Of course, there are no comments in it. :-)
*/

#include <stdio.h>

enum states { none, slash, quote, qquote, comment, cstar };

main()
{
    register int c, q = 0;
    register enum states state = none;

    while ((c = getchar()) != EOF) {
        switch (state) {
        case none:
            if (c == '"' || c == '\'') {
                state = quote;
                q = c;
            } else if (c == '/') {
                state = slash;
                continue;
            }
            break;
        case slash:
            if (c == '*') {
                state = comment;
                continue;
            }
            state = none;
            (void) putchar('/');
            break;
        case quote:
            if (c == '\\')
                state = qquote;
            else if (c == q)
                state = none;
            break;
        case qquote:
            state = quote;
            break;
        case comment:
            if (c == '*')
                state = cstar;
            continue;
        case cstar:
            if (c != '*')
                state = c == '/' ? none : comment;
            continue;
        default:
            fprintf(stderr, "impossible state %d\n", state);
            exit(1);
        }
        (void) putchar(c);
    }
    if (state != none)
        fprintf(stderr, "warning: file ended with unterminated %s\n",
            state == quote || state == qquote ?
            (q=='"' ? "string" : "character constant") :
            "comment");
    exit(0);
}

/*
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
*/


Sed script solution (by Maarten Litmaath):

: loop
/^$/{
    x
    p
    n
    b loop
}
/^"/{
    : double
    /^$/{
        x
        p
        n
        b double
    }
    H
    x
    s/\n\(.\).*/\1/
    x
    s/.//
    /^"/b break
    /^\\/{
        H
        x
        s/\n\(.\).*/\1/
        x
        s/.//
    }
    b double
}
/^'/{
    : single
    /^$/{
        x
        p
        n
        b single
    }
    H
    x
    s/\n\(.\).*/\1/
    x
    s/.//
    /^'/b break
    /^\\/{
        H
        x
        s/\n\(.\).*/\1/
        x
        s/.//
    }
    b single
}
/^\\/{
    H
    x
    s/\n\(.\).*/\1/
    x
    b break
}
/^\/\*/{
    s/.//
    : comment
    s/.//
    /^$/n
    /^*\//{
        s/..//
        b loop
    }
    b comment
}
: break
H
x
s/\n\(.\).*/\1/
x
s/.//
b loop


Lex solution (by John Rupley):

%{
/*
 * long strings (with escaped newlines) blow yytext[YYLMAX];
 * this is a feature, not a bug.
 */
%}
STRING          \"(\\\n|\\\"|[^"\n])*\"
COMMENTBODY     ([^*\n]|"*"+[^*/\n])*
COMMENTEND      ([^*\n]|"*"+[^*/\n])*"*"*"*/"
QUOTECHAR       \'[^\\]\'|\'\\.\'|\'\\[x0-9][0-9]*\'
ESCAPEDCHAR     \\.
%START COMMENT
%%
<COMMENT>{COMMENTBODY}  ;
<COMMENT>{COMMENTEND}   BEGIN 0;
<COMMENT>.|\n           ;
"/*"                    BEGIN COMMENT;
{STRING}                ECHO;
{QUOTECHAR}             ECHO;
{ESCAPEDCHAR}           ECHO;
.|\n                    ECHO;


For a test of all three:

    make spotless test

For other solutions offered, see the directory Solutions, and for
yet more, the postings.

For discussion, see the comp.lang.c postings:
    0Postings/postings.txt
or, more readable,
    0Postings/uncomment-[0a]-[1-4].html


John Rupley
rupley@u.arizona.edu  -or-  jar@rupley.com
30 Calle Belleza, Tucson AZ 85716 - (520) 325-4533; fax - (520) 325-4991
Dept. Biochemistry & Molecular Biophysics, Univ. Arizona, Tucson AZ 85721