Why Native Grammars

Integrated tools. Zero dependencies.

See a full worked example on the Comparison page. We compare against Python because it is widely used for parsing tasks, but similar limitations apply to any language without native grammars, such as Rust, Go, and TypeScript.

Built In — Not Bolted On

Python needs an external library and a grammar string stored separately from the code. Raku grammars are a first-class language feature — the same syntax you use everywhere.


# Python: external library + grammar-as-string
from lark import Lark

GRAMMAR = r"""
    start: word+
    word:  LETTER+
    LETTER: /[a-z]/i
    %import common.WS
    %ignore WS   // skip the space between words
"""

parser = Lark(GRAMMAR)
tree = parser.parse("hello world")

# Raku: grammar is part of the language
grammar WordParser {
    token TOP    { <word>+ % \s+ }
    token word   { <letter>+ }
    token letter { <[a..zA..Z]> }
}

say WordParser.parse("hello world");

Named Captures — An Instant Parse Tree

Lark builds a tree, but you still navigate it by position — swap two rules and your indices silently break. Raku grammar tokens give every matched part a name, so the parse tree is self-documenting.


from lark import Lark

GRAMMAR = r"""
    start: year "-" month "-" day
    year:  /\d{4}/
    month: /\d{2}/
    day:   /\d{2}/
"""

parser = Lark(GRAMMAR)
tree = parser.parse("2026-05-12")

# navigate the tree by child position
year  = tree.children[0].children[0]
month = tree.children[1].children[0]
day   = tree.children[2].children[0]

grammar DateParser {
    token TOP   { <year> '-' <month> '-' <day> }
    token year  { \d ** 4 }
    token month { \d ** 2 }
    token day   { \d ** 2 }
}

my $m = DateParser.parse("2026-05-12");
say $m<year>;   # 「2026」 named, not positional
say $m<month>;  # 「05」
say $m<day>;    # 「12」
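
The fragility of positional access is easy to reproduce without a parser library. The sketch below uses named groups from Python's standard `re` module as a flat, one-level stand-in for a parse tree. This is an assumption: Lark trees nest, but the indexing problem is the same. The helper `date_pattern` is hypothetical. Reordering the fields changes what index 0 means, while lookup by name keeps returning the year.

```python
import re


def date_pattern(order):
    """Hypothetical helper: join named fields with '-' in the given order."""
    parts = {"year": r"\d{4}", "month": r"\d{2}", "day": r"\d{2}"}
    return re.compile("-".join(f"(?P<{name}>{parts[name]})" for name in order))


iso = date_pattern(["year", "month", "day"]).fullmatch("2026-05-12")
eu = date_pattern(["day", "month", "year"]).fullmatch("12-05-2026")

# Positional access silently changes meaning when the rule order changes...
print(iso.groups()[0], eu.groups()[0])  # 2026 12
# ...while named access still finds the year in both layouts.
print(iso["year"], eu["year"])          # 2026 2026
```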

Actions Classes — Parsing Separate from Semantics

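With Lark, the grammar lives in a string, and the Transformer is tied to it only by method names that happen to match rule names. In Raku, the grammar (structure) and the actions class (meaning) are both ordinary language constructs, kept cleanly apart so each can evolve independently.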


from lark import Lark, Transformer

GRAMMAR = r"""
    start: left "+" right
    left:  /\d+/
    right: /\d+/
"""

class CalcActions(Transformer):
    def left(self, t):  return int(t[0])
    def right(self, t): return int(t[0])
    def start(self, t): return t[0] + t[1]

parser = Lark(GRAMMAR)
print(CalcActions().transform(parser.parse("3+4")))
# 7

grammar Calc {                        # structure only
    token TOP    { <left> '+' <right> }
    token left   { \d+ }
    token right  { \d+ }
}

class CalcActions {                   # meaning only
    method TOP($/)   { make +$<left> + +$<right> }
}

say Calc.parse("3+4", actions => CalcActions.new).made;
# OUTPUT: 7
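
The payoff of the split is that one grammar can serve several meanings. The sketch below mimics that pattern in plain Python. The class names `Calc`, `Evaluate`, and `ToRPN` are hypothetical, and a single regex stands in for the grammar. The structure is written once, and swapping the actions object changes what a parse produces.

```python
import re


class Calc:
    """Structure only: matches  left '+' right  and hands the named pieces to an actions object."""
    PATTERN = re.compile(r"(?P<left>\d+)\+(?P<right>\d+)")

    @classmethod
    def parse(cls, text, actions):
        m = cls.PATTERN.fullmatch(text)
        if m is None:
            return None  # no match: nothing for the actions to build
        return actions.TOP(m["left"], m["right"])


class Evaluate:
    """Meaning #1: compute the sum."""
    def TOP(self, left, right):
        return int(left) + int(right)


class ToRPN:
    """Meaning #2: rewrite the expression in reverse Polish notation."""
    def TOP(self, left, right):
        return f"{left} {right} +"


print(Calc.parse("3+4", Evaluate()))  # 7
print(Calc.parse("3+4", ToRPN()))     # 3 4 +
```

In Raku, the same swap is just a different class passed as `actions => ...`, with the grammar left untouched.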

Grammar Inheritance — Composable & Extensible

Raku grammars are classes. You can inherit from them and override individual tokens or rules — extend a grammar without touching the original.


from lark import Lark

# no class-style grammar inheritance: reuse means string
# concatenation plus Lark's %override directive
BASE_GRAMMAR = r"""
    start: word+
    word:  LETTER+
    LETTER: /[a-z]/
    %import common.WS
    %ignore WS
"""

EXTENDED = BASE_GRAMMAR + r"""
    %override word: LETTER+ | DIGIT+
    DIGIT: /[0-9]/
"""

parser = Lark(EXTENDED)
print(parser.parse("hello 42 world"))

grammar Base {
    token TOP    { <word>+ % \s+ }
    token word   { <[a..z]>+ }
}

grammar Extended is Base {
    token word   { <[a..z]>+ | <[0..9]>+ }  # override one token
}

say Extended.parse("hello 42 world");
# 「hello 42 world」
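
Inheritance works because each token is a method on the grammar class, and the inherited `TOP` looks up `word` through ordinary method dispatch. The following is a rough Python analogue of that mechanism. It is a hand-written recursive-descent sketch, the classes and rule signatures are hypothetical, and no Lark is involved.

```python
import re

WS = re.compile(r"\s+")


class Base:
    """Hypothetical hand-written grammar: each rule is a method that takes
    (text, pos) and returns the end position of its match, or None."""

    WORD = re.compile(r"[a-z]+")

    def TOP(self, text):
        # Mirrors <word>+ % \s+ and must consume the whole input.
        pos = self.word(text, 0)
        while pos is not None and pos < len(text):
            ws = WS.match(text, pos)
            if ws is None:
                return False
            pos = self.word(text, ws.end())
        return pos == len(text)

    def word(self, text, pos):
        m = self.WORD.match(text, pos)
        return m.end() if m else None


class Extended(Base):
    """Overrides one rule; the inherited TOP finds it through normal method dispatch."""

    DIGITS = re.compile(r"[0-9]+")

    def word(self, text, pos):
        end = super().word(text, pos)  # try the parent's rule first
        if end is not None:
            return end
        m = self.DIGITS.match(text, pos)
        return m.end() if m else None


print(Base().TOP("hello 42 world"))      # False
print(Extended().TOP("hello 42 world"))  # True
```

Raku gives you this for free: tokens are compiled into methods, so `is Base` is all it takes to override one of them.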

Unicode Properties — Match Any Language Natively

Lark terminals use Python's re module by default, which cannot express Unicode property classes such as \p{L}. Matching letters by category, so that accented or non-Latin text parses, needs the regex=True option plus a third-party install (pip install regex). Raku grammars understand Unicode categories natively, and all Raku strings are NFG (Normal Form Grapheme): every Str counts user-perceived characters, so even a decomposed "é" (e followed by a combining acute accent) has .chars of 1, not 2. The same grammar parses English, Arabic, or Japanese words, and NFG keeps multi-code-point graphemes such as emoji intact, with no extra dependencies or encoding surprises.


from lark import Lark

# Lark terminals use Python's re by default: no \p{...} property classes
GRAMMAR = r"""
    start: word+
    word:  LETTER+
    LETTER: /[a-zA-Z]+/   // fails on accented chars
    %import common.WS
    %ignore WS
"""
parser = Lark(GRAMMAR)
parser.parse("café résumé")  # raises UnexpectedCharacters

# Unicode categories: extra flag + pip install regex
GRAMMAR2 = r"""
    start: word+
    word:  LETTER+
    LETTER: /\p{L}+/
    %import common.WS
    %ignore WS
"""
parser2 = Lark(GRAMMAR2, regex=True)
print(parser2.parse("café résumé"))

grammar NaturalText {
    token TOP  { <word>+ % \s+ }
    token word { <:Letter>+ }  # any Unicode letter, NFG-aware
}

# all Raku Str are NFG — "é".chars == 1, not 2
say NaturalText.parse("café résumé");
# 「café résumé」

say NaturalText.parse("日本語 한국어");
# 「日本語 한국어」
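
For contrast, Python's str counts code points, so the length of "é" depends on how the text was encoded. Normalization only helps when a precomposed form exists. The following short sketch uses only the standard library.

```python
import unicodedata

composed = "\u00e9"       # "é" as a single code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT
print(len(composed), len(decomposed))  # 1 2

# NFC normalization recombines this particular pair into one code point...
print(len(unicodedata.normalize("NFC", decomposed)))  # 1

# ...but some graphemes have no precomposed form: a flag is two
# regional-indicator code points under any normalization form.
flag = "\U0001F1EF\U0001F1F5"  # regional indicators J and P
print(len(flag), len(unicodedata.normalize("NFC", flag)))  # 2 2
```

Raku's NFG strings sidestep the first case entirely: "e\x[301]".chars is 1 without any normalization step.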