--- /dev/null
+YACC precedence and shift/reduce errors
+=======================================
+
+While I was working on a BibTeX parser, I ran into some trouble with
+shift/reduce errors in `yacc`. The problem essentially boiled down
+to: How do you define implicit multiplication in `yacc`?
+
+Keep reading for my intro to `yacc`, or, if you just want the simple
+solution, [skip to the end](#implicit_multiplication).
+
+Related work
+------------
+
+After much googling about, I would like to recommend the following sources:
+
+* David Beazley's [PLY documentation][DB], which has a good
+ introduction to shift/reduce errors and a nice walkthrough of a
+ shift/reduce example;
+* Stephen Johnson's [yacc introduction][SJ] (see Section 4: How the
+ Parser Works), which helped me understand the state-stack concept
+ which hadn't sunk in until then;
+* Leon Kaplan's [yaccviso documentation][LK], which reminded me that
+ visualizations are good, and inspired my resurrection of the
+ `yacc2dot` tool in Python; and
+* Xavier Leroy et al.'s [ocamlyacc documentation][XL],
+ where rule-vs-operator precedence finally clicked.
+
+For further details and entertaining browsing, you can take a slow
+meander through the [Bison manual][bison].
+
+Tools
+-----
+
+The following graphics were generated from the `y.output`
+files output by `yacc` using [my yacc2dot script](#yacc2dot) and
+[Graphviz dot][graphviz].
+
+Example 1: Single, explicit addition
+------------------------------------
+
+As a brief introduction to how parsers work, consider the *really*
+simple grammar
+
+[[!inline pagenames="p.y" template="raw" feeds="no"]]
+
+which generates the following `yacc` output and `dot` image:
+
+[[!inline pagenames="p.output" template="raw" feeds="no"]]
+[[!img p.png alt="Visualization of plus-only grammar"
+ title="Visualization of plus-only grammar" ]]
+
+Parsing input like `X PLUS X` would look like
+
+<table>
+ <tr><th>Step</th><th>State stack</th><th>Symbol stack</th><th>Symbol queue</th><th>Action</th></tr>
+ <tr><td>0</td><td>0</td><td>$accept</td><td>X PLUS X $end</td><td>Shift X</td></tr>
+ <tr><td>1</td><td>01</td><td>$accept X</td><td>PLUS X $end</td><td>Shift PLUS</td></tr>
+ <tr><td>2</td><td>013</td><td>$accept X PLUS</td><td>X $end</td><td>Shift X</td></tr>
+ <tr><td>3</td><td>0135</td><td>$accept X PLUS X</td><td>$end</td><td>Reduce using rule 1, pop back to state 0, and goto state 2</td></tr>
+ <tr><td>4</td><td>02</td><td>$accept expr</td><td>$end</td><td>Shift $end</td></tr>
+ <tr><td>5</td><td>024</td><td>$accept expr $end</td><td></td><td>Accept using rule 0</td></tr>
+</table>
+
+The key step here is 3-to-4. We reduce using Rule 1
+
+    (1) expr: X PLUS X
+
+Since there are three symbols on the rule's right-hand side, we pop
+three symbols from the stack. The symbols we pop match the right-hand
+side, which is why we can apply the rule. We also pop three *states*
+(1, 3, and 5), which returns us to state 0. We then look up state 0's
+goto entry for `expr`, the symbol we just reduced to, and see that
+we're supposed to go to state 2, so we do. Whew!
+
+That's a lot of words, and it wasn't very intuitive for me the first
+time around, but the underlying idea is pretty simple.
+
+The symbols `$accept` and `$end` are inserted by `yacc` itself: `$end`
+marks the end of the input and `$accept` is the implicit start rule,
+so the parser knows whether the symbols left after parsing are
+acceptable or not.
+
+The input token string `X PLUS X` is the only valid token string for
+this grammar. For extra credit, you can figure out where other input
+strings go wrong. For example, consider `X` and `X PLUS X PLUS X`.
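+
+For the curious, the walkthrough above can be replayed with a few
+lines of Python. This is only a sketch: the tables are hand-copied
+from the state numbers in the walkthrough (not read from `p.output`),
+and the names `SHIFT`, `REDUCE`, `GOTO`, and `replay` are made up for
+this example.
+
+```python
+# Tables hand-copied from the walkthrough; names are illustrative only.
+SHIFT = {                  # (state, lookahead) -> next state
+    (0, "X"): 1,
+    (1, "PLUS"): 3,
+    (3, "X"): 5,
+    (2, "$end"): 4,
+}
+REDUCE = {5: ("expr", 3)}  # state -> (left-hand side, symbols to pop), rule 1
+GOTO = {(0, "expr"): 2}    # (exposed state, nonterminal) -> next state
+ACCEPT = {4}
+
+
+def replay(tokens):
+    """Return every state stack visited, or raise SyntaxError."""
+    queue = list(tokens) + ["$end"]
+    states = [0]
+    trace = [list(states)]
+    while True:
+        top = states[-1]
+        if top in ACCEPT:
+            return trace
+        if top in REDUCE:
+            lhs, count = REDUCE[top]
+            del states[-count:]                     # pop one state per symbol
+            states.append(GOTO[(states[-1], lhs)])  # goto from the exposed state
+        elif queue and (top, queue[0]) in SHIFT:
+            states.append(SHIFT[(top, queue.pop(0))])
+        else:
+            raise SyntaxError("unexpected %r in state %d" % (queue[:1], top))
+        trace.append(list(states))
+```
+
+`replay(["X", "PLUS", "X"])` should return the six state stacks from
+the table, while `["X"]` and `["X", "PLUS", "X", "PLUS", "X"]` raise
+`SyntaxError`, which is one way to check your extra-credit answers.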
+
+More examples
+-------------
+
+In my gradual work up to implicit multiplication, I went through the
+following progressively more complicated examples.
+
+<table>
+ <tr><th>grammar</th><th>yacc output</th><th>yacc2dot output</th><th>notes</th></tr>
+ <tr><td>[[pr.y]]</td>
+ <td>[[pr.output]]</td>
+ <td>[[!ltio pr.png]]</td>
+ <td>Recursive addition (allow X PLUS X PLUS X PLUS ...). Already
+ a shift/reduce error!</td></tr>
+ <tr><td>[[pr_prec.y]]</td>
+ <td>[[pr_prec.output]]</td>
+ <td>[[!ltio pr_prec.png]]</td>
+ <td>Add precedence (PLUS > X) to solve in favor of reduction.</td></tr>
+ <tr><td>[[pt.y]]</td>
+ <td>[[pt.output]]</td>
+ <td>[[!ltio pt.png]]</td>
+ <td>Explicit addition and multiplication with precedence.</td></tr>
+ <tr><td>[[t_implicit.y]]</td>
+ <td>[[t_implicit.output]]</td>
+ <td>[[!ltio t_implicit.png]]</td>
+ <td>Implicit multiplication. Shift/reduce error, but no operator handle
+ for assigning higher precedence to reduction...</td></tr>
+</table>
+
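+To see why recursive addition is ambiguous at all, here is a toy
+shift/reduce loop for `expr: X | expr PLUS expr`. It is a simplified
+sketch, not generated by `yacc`: the function `parse_sum` and its
+`prefer` argument are invented for illustration, each `X` is treated
+as an already-reduced `expr`, and the input is assumed to be
+well-formed.
+
+```python
+def parse_sum(tokens, prefer):
+    """Group X PLUS X PLUS ... into nested tuples.
+
+    `prefer` is "shift" or "reduce" and settles every conflict the
+    same way, the job that precedence declarations do in yacc.
+    """
+    stack, queue = [], list(tokens)
+    while queue or len(stack) > 1:
+        # 'expr PLUS expr' on top of the stack means we *could* reduce.
+        can_reduce = len(stack) >= 3 and stack[-2] == "PLUS"
+        if can_reduce and (not queue or prefer == "reduce"):
+            right = stack.pop()
+            stack.pop()                      # discard PLUS
+            left = stack.pop()
+            stack.append((left, "PLUS", right))
+        else:
+            stack.append(queue.pop(0))       # shift the next token
+    return stack[0]
+
+
+tokens = ["X", "PLUS", "X", "PLUS", "X"]
+print(parse_sum(tokens, "reduce"))  # (('X', 'PLUS', 'X'), 'PLUS', 'X')
+print(parse_sum(tokens, "shift"))   # ('X', 'PLUS', ('X', 'PLUS', 'X'))
+```
+
+Both groupings are valid parses of the same input, which is exactly
+what the shift/reduce report is complaining about.
+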
+<a name="implicit_multiplication" />
+Implicit multiplication
+-----------------------
+
+The lack of an operator handle to assign precedence to an
+operator-less rule had me stumped for a few days, until I found the
+[OCaml site][XL] with a nice, explicit listing of the shift/reduce
+decision process.
+
+The shift/reduce decision is not a question of which operator has
+higher precedence, but of whether a given *token* has higher
+precedence than a given *rule*. All we have to do is assign
+precedence to *both* the tokens *and* the rules.
+
+[[!inline pagenames="t_implicit_prec.y" template="raw" feeds="no"]]
+
+This generates the following `yacc` output and `dot` image:
+
+[[!inline pagenames="t_implicit_prec.output" template="raw" feeds="no"]]
+
+[[!img t_implicit_prec.png
+ alt="Visualization of implicit multiplication grammar"
+ title="Visualization of implicit multiplication grammar" ]]
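+
+The token-versus-rule comparison described above can be sketched in
+a few lines of Python. This paraphrases the decision process from the
+manuals linked earlier and is not code from any `yacc` implementation;
+the function name `resolve` and the numeric precedence levels are
+assumptions made for the example.
+
+```python
+def resolve(rule_prec, token_prec, token_assoc):
+    """Choose between reducing by a rule and shifting the lookahead token.
+
+    Precedence levels are integers (larger binds tighter), or None when
+    no precedence was declared for the rule or the token.
+    """
+    if rule_prec is None or token_prec is None:
+        return "conflict"   # nothing to compare; yacc reports it and shifts
+    if rule_prec > token_prec:
+        return "reduce"
+    if rule_prec < token_prec:
+        return "shift"
+    # Equal precedence: fall back on the token's associativity.
+    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[token_assoc]
+
+
+# Hypothetical levels: PLUS = 1, while X shares level 2 with the
+# operator-less multiplication rule (everything left-associative).
+print(resolve(None, 2, "left"))  # rule without precedence: conflict
+print(resolve(2, 2, "left"))     # multiplication rule vs. X: reduce
+print(resolve(1, 2, "left"))     # addition rule vs. X: shift
+print(resolve(2, 1, "left"))     # multiplication rule vs. PLUS: reduce
+```
+
+The first call is the situation that had me stumped: with no
+precedence on the operator-less rule there is nothing to compare
+against the lookahead token.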
+
+<a name="yacc2dot" />
+yacc2dot
+--------
+
+I'll post my `yacc2dot` code once I polish it up a little bit.
+Take a look at [yaccviso][] if you're impatient.
+
+
+[DB]: http://www.dabeaz.com/ply/ply.html#ply_nn23
+[SJ]: http://dinosaur.compilertools.net/yacc/index.html
+[LK]: http://www.lo-res.org/~aaron/yaccviso/docu.pdf
+[XL]: http://caml.inria.fr/pub/docs/manual-ocaml/manual026.html#htoc140
+[bison]: http://www.gnu.org/software/bison/manual/
+[graphviz]: http://www.graphviz.org/
+[yaccviso]: http://directory.fsf.org/project/yaccviso/
+
+[[!tag tags/programming]]
+[[!tag tags/python]]
+[[!tag tags/tools]]
--- /dev/null
+#!/usr/bin/python
+#
+# convert yacc's verbose output to a dot-language diagram
+
+from sys import stdin, stdout, stderr
+
+DOT_HEAD = """
+digraph yacc {
+
+ epsilon = 0.01;
+ nodesep = 0.2; /* inches */
+
+ node [shape = record];
+
+"""
+
+DOT_GOTO_EDGE = """
+ edge [color = blue];
+"""
+
+DOT_REDUCE_EDGE = """
+ edge [color = red];
+"""
+
+DOT_TAIL = "}\n"
+
+
+
+
+class action (object) :
+ """
+ Parse the text defining a given action:
+ >>> a = action(' TEXT shift, and go to state 1')
+ >>> print a.key
+ TEXT
+ >>> print a.action
+ shift
+ >>> a = action(' $default accept')
+ >>> print a.action
+ accept
+ >>> a = action(' $default reduce using rule 1 (sum)')
+ >>> print a.action
+ reduce
+ >>> print a.reduce_rule
+ 1
+ >>> a = action(' PLUS shift, and go to state 3')
+ >>> print a.goto
+ 3
+ >>> a = action(' sum go to state 2')
+ >>> print a.action
+ go
+ >>> print a.goto
+ 2
+ """
+ def __init__(self, string) :
+ self.rule = string
+ split = string.split()
+ self.key = split[0]
+ if split[1][-1] == ',' :
+ self.action = split[1][0:-1]
+ elif split[1][0] == '[':
+ self.action = 'IGNORED'
+ else :
+ self.action = split[1]
+
+ if self.action == 'reduce' :
+ self.reduce_rule = int(split[-2])
+ elif self.action == 'shift' :
+ self.goto = int(split[-1])
+ elif self.action == 'go' :
+ self.goto = int(split[-1])
+ def __str__(self) :
+ if self.action == 'go' or self.action == 'shift':
+ details = ', goto %d' % self.goto
+ elif self.action == 'reduce' :
+ details = ', rule %d' % self.reduce_rule
+ elif self.action == 'accept' :
+ details = ''
+ elif self.action == 'IGNORED' :
+ details = ''
+ else :
+ raise Exception('weird action %s' % self.action)
+
+ return '<action: %s, %s%s>' % (self.key, self.action, details)
+ def __repr__(self) :
+ return self.__str__()
+
+class state (object) :
+ """
+ Parse the text defining a given state:
+ >>> state_text = '''state 0
+ ...
+ ... 0 $accept: . authors $end
+ ...
+ ... TEXT shift, and go to state 1
+ ... LBRACE shift, and go to state 2
+ ...
+ ... authors go to state 3
+ ... author go to state 4
+ ... text go to state 5
+ ...
+ ... '''
+ >>> s = state(state_text)
+ >>> print s.id
+ 0
+ >>> print s.rules
+ [(0, '$accept: . authors $end')]
+ >>> print s.tok_act
+ [<action: TEXT, shift, goto 1>, <action: LBRACE, shift, goto 2>]
+ >>> print s.rule_act
+ [<action: authors, go, goto 3>, <action: author, go, goto 4>, <action: text, go, goto 5>]
+ >>> print s.goes_to()
+ [1, 2, 3, 4, 5]
+ """
+ def __init__(self, state_text, debug=False) :
+ # because the output of `yacc -v grammar.y' is so regular,
+ # we'll sneak by with a HACK parsing job ;).
+ self.id = None
+ self.rules = [] # list of (<rule-id>, 'rule string') tuples
+ self.tok_act = []
+ self.rule_act = []
+ lookfor = 'state'
+ hadtext = False # so consecutive blank lines don't keep switching lookfor state
+ for line in state_text.splitlines() :
+ if debug : print >> stderr, '%s : %s' % (lookfor, line)
+ split = line.split()
+ if len(split) < 1 : # blank line
+ if not hadtext :
+ continue # ignore
+ elif lookfor == 'rules' :
+ lookfor = 'tok-act'
+ continue
+ elif lookfor == 'tok-act' :
+ lookfor = 'rule-act'
+ continue
+ elif split[0] == 'state' : # 'state <state-id>'
+ assert lookfor == 'state', "Multiple states in '%s'" % state_text
+ assert len(split) == 2, "%d items in state line '%s'" % (len(split), line)
+ self.id = int(split[1]) # store <state-id> as self.id
+ lookfor = 'rules'
+ hadtext = False
+ continue
+ elif lookfor == 'rules' : # e.g ' 2 expr: expr . PLUS expr'
+ rule_id = int(split[0])
+ rule_text = split[1:]
+ if rule_text[0] == '|' : # e.g. second line of
+ # expr: X
+ # | expr PLUS expr
+ rule_text.pop(0)
+ prev_rule = self.rules[-1]
+ p_text_split = prev_rule[1].split()
+ rule_text.insert(0, p_text_split[0])
+ self.rules.append((rule_id, " ".join(rule_text)))
+ elif lookfor == 'tok-act' :
+ self.tok_act.append(action(line))
+ elif lookfor == 'rule-act' :
+ self.rule_act.append(action(line))
+ else :
+ raise Exception("Unrecognized lookfor '%s' for line '%s'" % (lookfor, line))
+ hadtext = True
+ def __cmp__(self, other) :
+ return cmp(self.id, other.id)
+ def __str__(self) :
+ return '<state %d>' % self.id
+ def __repr__(self) :
+ return self.__str__()
+ def goes_to(self) :
+ "Return a list of integer states that this node goes to (not reduces to)."
+ ret = []
+ for a in self.tok_act :
+ if hasattr(a, 'goto') :
+ ret.append(a.goto)
+ for a in self.rule_act :
+ if hasattr(a, 'goto') :
+ ret.append(a.goto)
+ return ret
+
+class tnode (object) :
+ """
+ Tree structure, with backlinks.
+ >>> root = tnode("root")
+ >>> root.add_child("child0")
+ >>> root.add_child("child1")
+ >>> root.child[0].add_child("child00")
+ >>> root.child[0].add_child("child01")
+ >>> print root.child[0].child
+ [<tnode child00>, <tnode child01>]
+ >>> print root.child[0].parent
+ <tnode root>
+ >>> print root.find("child01")
+ <tnode child01>
+ """
+ def __init__(self, value=None, parent=None) :
+ self.value = value
+ self.parent = parent
+ self.child = []
+ def add_child(self, child_value) :
+ self.child.append(tnode(child_value, self))
+ def build(self, fn_get_children) :
+ "use fn_get_children(value) = [value's children] to recursively build the tree"
+ children = fn_get_children(self.value)
+ for child in children :
+ self.add_child(child)
+ for cnode in self.child :
+ cnode.build(fn_get_children)
+ def find(self, value) :
+ "depth first search"
+ if self.value == value :
+ return self
+ else :
+ for cnode in self.child :
+ match = cnode.find(value)
+ if match is not None :
+ return match
+ return None
+ def __str__(self) :
+ string = '<tnode %s>' % str(self.value)
+ return string
+ def __repr__(self) :
+ return self.__str__()
+
+class yaccout (object) :
+ """
+ Parse the output of `yacc -v file.y'
+ For example, for sum.y =
+ %token X
+ %token PLUS
+ %%
+ sum : X PLUS X
+ We get:
+ >>> yacc_text = '''
+ ... Grammar
+ ...
+ ... 0 $accept: sum $end
+ ...
+ ... 1 sum: X PLUS X
+ ...
+ ...
+ ... Terminals, with rules where they appear
+ ...
+ ... $end (0) 0
+ ... error (256)
+ ... X (258) 1
+ ... PLUS (259) 1
+ ...
+ ...
+ ... Nonterminals, with rules where they appear
+ ...
+ ... $accept (5)
+ ... on left: 0
+ ... sum (6)
+ ... on left: 1, on right: 0
+ ...
+ ...
+ ... state 0
+ ...
+ ... 0 $accept: . sum $end
+ ...
+ ... X shift, and go to state 1
+ ...
+ ... sum go to state 2
+ ...
+ ...
+ ... state 1
+ ...
+ ... 1 sum: X . PLUS X
+ ...
+ ... PLUS shift, and go to state 3
+ ...
+ ...
+ ... state 2
+ ...
+ ... 0 $accept: sum . $end
+ ...
+ ... $end shift, and go to state 4
+ ...
+ ...
+ ... state 3
+ ...
+ ... 1 sum: X PLUS . X
+ ...
+ ... X shift, and go to state 5
+ ...
+ ...
+ ... state 4
+ ...
+ ... 0 $accept: sum $end .
+ ...
+ ... $default accept
+ ...
+ ...
+ ... state 5
+ ...
+ ... 1 sum: X PLUS X .
+ ...
+ ... $default reduce using rule 1 (sum)
+ ... '''
+ >>> y = yaccout(yacc_text)
+ >>> print y.states
+ [<state 0>, <state 1>, <state 2>, <state 3>, <state 4>, <state 5>]
+ >>> print y.states[0].tok_act
+ [<action: X, shift, goto 1>]
+ >>> y.gen_tree()
+ >>> print y.root.child[0].child
+ [<tnode <state 3>>]
+ >>> print y.root.child[0].parent
+ <tnode <state 0>>
+ """
+ def __init__(self, yacc_text, debug=False):
+ # because the output of `yacc -v grammar.y' is so regular,
+ # we'll sneak by with a HACK parsing job ;).
+ self.read_states(yacc_text, debug=debug)
+ #self.gen_tree()
+ #self.reduce_link(debug=debug)
+ def read_states(self, yacc_text, debug=False):
+ "read in all the states"
+ self.states = []
+ lookfor = 'first state'
+ for line in yacc_text.splitlines() :
+ split = line.split(' ')
+ if lookfor == 'first state' :
+ if split[0] != 'state' :
+ continue
+ else :
+ lookfor = 'states'
+ state_text = [line]
+ continue
+ else :
+ if split[0] == 'state' : # we've hit the next state
+ self.states.append(state('\n'.join(state_text), debug=debug))
+ state_text = [line]
+ continue
+ else :
+ state_text.append(line)
+ # end of file, so process the last state
+ self.states.append(state('\n'.join(state_text), debug=debug))
+ def state_goes_to(self, state) :
+ ret = []
+ for id in state.goes_to() :
+ for s in self.states :
+ if s.id == id :
+ ret.append(s)
+ return ret
+ def gen_tree(self):
+ "generate the state dependency tree"
+ self.root = tnode(self.states[0])
+ self.root.build(self.state_goes_to)
+ def reduce_action(self, state, action, debug):
+ if action.action == 'reduce' :
+ if debug : print >> stderr, 'reduce from %s with rule %d::' % (state, action.reduce_rule),
+ # find the rule text
+ rule = None
+ for r in state.rules :
+ if r[0] == action.reduce_rule :
+ rule = r[1] # e.g. 'sum: X PLUS X .'
+ if debug : print >> stderr, rule
+ # find the number of args in the rule
+ split = rule.split()
+ args = len(split) - 2 # 2 for 'sum:' and '.'
+ # find the state in the tnode tree
+ tnode = self.root.find(state)
+ for i in range(args) : # take arg steps up the tree
+ tnode = tnode.parent
+ tstate = tnode.value # reduction target state
+ action.reduce_to = tstate.id # state we reduce to
+ action.reduce_targ = split[0][0:-1] # rule we reduce to 'sum'
+ i = 0
+ for a in tstate.tok_act :
+ if a.key == action.reduce_targ :
+ action.reduce_targ_i = i
+ i += 1
+ for a in tstate.rule_act :
+ if a.key == action.reduce_targ :
+ action.reduce_targ_i = i
+ i += 1
+ if debug : print >> stderr, 'to state %d' % action.reduce_to
+ def reduce_link(self, debug=False):
+ "generate the reduce_to backlinks"
+ for state in self.states :
+ for a in state.tok_act :
+ self.reduce_action(state, a, debug)
+ for a in state.rule_act :
+ self.reduce_action(state, a, debug)
+
+
+def dot_state (state) :
+ """
+ Print a dot node for a given state.
+ Node type must be 'record'.
+ layout:
+ state %d
+ rules
+ ...
+ a0 | a1 | ...
+ >>> state_text = '''state 0
+ ...
+ ... 0 $accept: . authors $end
+ ...
+ ... TEXT shift, and go to state 1
+ ... LBRACE shift, and go to state 2
+ ...
+ ... authors go to state 3
+ ... author go to state 4
+ ... text go to state 5
+ ...
+ ... '''
+ >>> s = state(state_text)
+ >>> print dot_state(s),
+ state0 [label = "{ <s> state 0 | (0) $accept: . authors $end\\n | { <a0> TEXT | <a1> LBRACE | <a2> authors | <a3> author | <a4> text } }"];
+ """
+ label = '{ <s> state %d | ' % state.id
+ for rule in state.rules :
+ label += '(%d) %s\\n' % (rule[0], rule[1])
+ label += ' | {'
+ a = 0
+ for action in state.tok_act[:-1] :
+ label += ' <a%d> %s |' % (a, action.key,)
+ a += 1
+ if len(state.tok_act) > 0 :
+ label += ' <a%d> %s' % (a, state.tok_act[-1].key)
+ a += 1
+ for action in state.rule_act :
+ label += ' | <a%d> %s' % (a, action.key)
+ a += 1
+ label += ' } }'
+
+ string = ' state%d [label = "%s"];\n' % (state.id, label)
+ return string
+
+def dot_goto(id, a, action) :
+ """
+ Print dot links for a given action (action a for state id).
+ >>> a = action(' TEXT shift, and go to state 1')
+ >>> print dot_goto(5,8,a),
+ state5:a8 -> state1:s;
+ >>> a = action(' $default reduce using rule 1 (sum)')
+ >>> print dot_goto(0,1,a),
+
+ >>> a = action(' sum go to state 2')
+ >>> print dot_goto(0,1,a),
+ state0:a1 -> state2:s;
+ """
+ if hasattr(action, 'goto') :
+ string = ' state%d:a%d -> state%d:s;\n' % (id, a, action.goto)
+ else :
+ string = ''
+ return string
+
+
+def dot_gotos(state) :
+ """
+ Print dot links for a given state.
+ >>> state_text = '''state 0
+ ...
+ ... 0 $accept: . authors $end
+ ...
+ ... TEXT shift, and go to state 1
+ ... LBRACE reduce using rule 1 (braces)
+ ...
+ ... authors go to state 3
+ ... author go to state 4
+ ... text go to state 5
+ ...
+ ... '''
+ >>> s = state(state_text)
+ >>> print dot_gotos(s),
+ state0:a0 -> state1:s;
+ state0:a2 -> state3:s;
+ state0:a3 -> state4:s;
+ state0:a4 -> state5:s;
+ """
+ string = ""
+ a = 0
+ for action in state.tok_act :
+ string += dot_goto(state.id, a, action)
+ a += 1
+ for action in state.rule_act :
+ string += dot_goto(state.id, a, action)
+ a += 1
+ return string
+
+def dot_reduce(id, a, action) :
+ """
+ Print dot reduce links for a reduction action (action a for state id).
+ """
+ if action.action == 'reduce' :
+ string = ' state%d:a%d -> state%d:a%d;\n' % (id, a, action.reduce_to, action.reduce_targ_i)
+ else :
+ string = ''
+ return string
+
+
+def dot_reduces(state) :
+ """
+ Print dot reduce links for a given state.
+ """
+ string = ""
+ a = 0
+ for action in state.tok_act :
+ string += dot_reduce(state.id, a, action)
+ a += 1
+ for action in state.rule_act :
+ string += dot_reduce(state.id, a, action)
+ a += 1
+ return string
+
+def yacc2dot(yaccout) :
+ string = DOT_HEAD
+ string += "\n"
+ for state in yaccout.states :
+ string += dot_state(state)
+ string += "\n"
+ string += DOT_GOTO_EDGE
+ for state in yaccout.states :
+ string += dot_gotos(state)
+ string += "\n"
+ string += DOT_REDUCE_EDGE
+ #for state in yaccout.states :
+ # string += dot_reduces(state)
+ string += "\n"
+ string += DOT_TAIL
+ return string
+
+def open_IOfiles(ifilename=None, ofilename=None, debug=False):
+ if ifilename :
+ if debug : print >> stderr, "open input file '%s'" % ifilename
+ ifile = open(ifilename, 'r')
+ else :
+ ifile = stdin
+ if ofilename :
+ if debug : print >> stderr, "open output file '%s'" % ofilename
+ ofile = open(ofilename, 'w')
+ else :
+ ofile = stdout
+ return (ifile, ofile)
+
+def close_IOfiles(ifilename=None, ifile=stdin,
+ ofilename=None, ofile=stdout,
+ debug=False):
+ if ifilename :
+ if debug : print >> stderr, "close input file '%s'" % ifilename
+ ifile.close()
+ if ofilename :
+ if debug : print >> stderr, "close output file '%s'" % ofilename
+ ofile.close()
+
+def _test():
+ import doctest
+ doctest.testmod()
+
+if __name__ == "__main__":
+ from optparse import OptionParser
+
+ parser = OptionParser(usage="usage: %prog [options]", version="%prog 0.1")
+
+ parser.add_option('-f', '--input-file', dest="ifilename",
+ help="Read input from FILE (default stdin)",
+ type='string', metavar="FILE")
+ parser.add_option('-o', '--output-file', dest="ofilename",
+ help="Write output to FILE (default stdout)",
+ type='string', metavar="FILE")
+ parser.add_option('-t', '--test', dest="test",
+ help="Run the yacc2dot test suite",
+ action="store_true", default=False)
+ parser.add_option('-v', '--verbose', dest="verbose",
+ help="Print lots of debugging information",
+ action="store_true", default=False)
+
+ (options, args) = parser.parse_args()
+ parser.destroy()
+
+ ifile,ofile = open_IOfiles(options.ifilename, options.ofilename,
+ options.verbose)
+
+ if options.test :
+ _test()
+ else :
+ text = ifile.read()
+ y = yaccout(text, options.verbose)
+ dot = yacc2dot(y)
+ print >> ofile, dot
+
+ close_IOfiles(options.ifilename, ifile,
+ options.ofilename, ofile, options.verbose)
+