Written on: Sunday, May 5, 2024
This is an announcement for Weaver, my arg parsing library for the Roc language. I did a talk on Weaver that you can watch here, the transcript for which is available as a gist here.
Programming languages, much like life, are mired in decisions concerning scarcity. “There is no silver bullet.” If you want the convenience of using Python for prototyping your project, you’ll either need to rewrite it into a compiled language at some point or face scaling issues relating to both performance and complexity management. That’s what I love so much about the budding Roc language: it’s very ergonomic to write, but it’s designed from the ground up to be well-typed, fast to compile, and fast to run. Roc is all about letting me not feel like I’m making a big sacrifice in using it.
There are many efforts by the Roc language to achieve this uncompromising simplicity; most recently, the await bang syntax (e.g. data = Http.get! url) has made writing effectful code much more ergonomic while still being safe. Though that’s a great feature, I personally was most enamored with the record builder syntax that was added last year by @agu-z.
The builder pattern syntax (: <-) is syntax sugar that turns this:
init {
    a: <- foo,
    b: <- bar,
}
|> finish
into this:
init (\a1 -> \b1 -> { a: a1, b: b1 })
|> foo
|> bar
|> finish
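If Roc syntax is new to you, here is a rough Rust sketch of the same shape. The names (Config, curried_config, apply, build) are made up for illustration, and I'm using Option as the applicative purely as a stand-in; this is not how Roc implements the sugar.

```rust
// Hypothetical record for illustration.
#[derive(Debug, PartialEq)]
struct Config {
    a: u32,
    b: String,
}

// The curried constructor: take `a`, get back a function that takes `b`.
fn curried_config(a: u32) -> impl FnOnce(String) -> Config {
    move |b| Config { a, b }
}

// One applicative step: feed a wrapped value into a wrapped function,
// short-circuiting to `None` if either side is missing.
fn apply<A, R, F>(wrapped_fn: Option<F>, value: Option<A>) -> Option<R>
where
    F: FnOnce(A) -> R,
{
    match (wrapped_fn, value) {
        (Some(f), Some(v)) => Some(f(v)),
        _ => None,
    }
}

fn build(a: Option<u32>, b: Option<String>) -> Option<Config> {
    let init = Some(curried_config); // init (\a1 -> \b1 -> { a: a1, b: b1 })
    let after_a = apply(init, a); // |> foo
    apply(after_a, b) // |> bar
}
```

Each apply call peels one argument off the curried constructor, which is the same job each field does in the desugared Roc pipeline.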
It’s a syntax in Roc that lets you call multiple functions on an applicative functor and store the results in a record. If you don’t know what an applicative functor is, you can read about it here, but it’s basically a convenient type that allows for effectful-like programming. There are lots of tasks that are made much simpler, or even possible at all, with this syntax:
That last one is the first thing I thought of when I saw this syntax: a way to parse, not validate, my command line arguments without metaprogramming/macros! Normally, when you get a list of command line arguments and want to parse them into specific values, you’ll define your flags and parameters, and whatever library you use will give you back a dictionary with string values for everything you asked for. You can see what I mean with this simplified Python example:
config = Cli(
    name="my-app",
    flags=["-f/--force"],
    params={
        "file": "string",
        "amount": "number"
    }
)
args = ["my-app", "--force", "abc.txt", "5"]
data = config.parse(args)
data == { "force": "true", "file": "abc.txt", "amount": "5" }
You then have to go in and validate that data after it’s already parsed:
force = data["force"] == "true"
file = data["file"] or ""
amount = int(data["amount"]) # might throw if invalid
Now, if the above Python was written using its standard library, then it would return an object with the actual data types in it (e.g. data.amount = 5), but that is only possible because Python is dynamically typed, and I don’t want to live in that kind of jungle. So as long as we live in Type Land, then we need to do the above.
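To make "the above" concrete in Type Land, here is a minimal Rust sketch of doing it by hand, with no macros and no library. The Args struct and parse_args function are hypothetical names for this example only; they mirror the flags and params from the Python snippet.

```rust
// The typed result we actually want, instead of a map of strings.
#[derive(Debug, PartialEq)]
struct Args {
    force: bool,
    file: String,
    amount: u32,
}

// Parse raw arguments straight into `Args`, reporting a message on failure.
fn parse_args(args: &[&str]) -> Result<Args, String> {
    let mut force = false;
    let mut positional = Vec::new();
    // Skip the program name in args[0].
    for arg in args.iter().skip(1) {
        match *arg {
            "-f" | "--force" => force = true,
            other => positional.push(other),
        }
    }
    match positional.as_slice() {
        [file, amount] => {
            // The string-to-number conversion happens here, once, at the edge.
            let amount = amount
                .parse::<u32>()
                .map_err(|_| format!("amount must be a number, got {amount:?}"))?;
            Ok(Args { force, file: file.to_string(), amount })
        }
        _ => Err(format!("expected 2 parameters, got {}", positional.len())),
    }
}
```

It works, but every new flag or parameter means touching the struct, the loop, and the match, which is exactly the chore a library should take off your hands.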
However, if you have macros like in Rust, then you can get the above for free** with clap, a magical and type-safe library for argument parsing! All you do is define a struct with your options and then put some annotations:
use clap::Parser;

/// A tool to color some data.
#[derive(Debug, Parser)]
pub struct ArgConfig {
    /// The color of your output.
    #[clap(short, long)]
    pub color: String,
    /// How much logging to show (counts repeated -v flags).
    #[clap(short, long, action = clap::ArgAction::Count)]
    pub verbosity: u8,
}
And clap will generate code during compilation to do what we had to manually do in Python, and we can just call a single function to handle everything:
let config = ArgConfig::parse();
config == ArgConfig { color: "abc".to_owned(), verbosity: 3 }
You may have noticed a conspicuous asterisk or two on the word “free” when I said clap generates this for free. The hidden cost here is a compile time cost: macros are a box of legos bundled on top of any language they’re in, and make compilation a good deal slower. Any Rust engineer (myself included) will tell you how much serde or sqlx contributes to their development slowdown. Not to mention that Rust’s procedural macros, like the ones that clap uses here, are extremely powerful, which lets you write very convenient-to-use code that unfortunately makes it much harder for readers to know what’s actually happening in the code they’re reading.
Knowing about these difficulties in conveniently parsing data is what makes me so excited about using Roc’s record builder syntax to solve this problem! One Hackers-inspired montage of cracking out code later, I’ve now released version 0.2.0 of Weaver, my library for parsing CLI args using this great : <- ... doohickey.
Weaver CLIs look like this:
cli =
    Cli.weave {
        alpha: <- Opt.maybeU64 { short: "a", long: "alpha" },
        verbosity: <- Opt.count { short: "v", long: "verbose" },
        file: <- Param.str { name: "file" },
        files: <- Param.strList { name: "files" },
    }
    |> Cli.finish {
        name: "transmogrify",
        version: "1.0.0",
        description: "Transform some files into a new state."
    }
    |> Cli.assertValid
main =
    when Cli.parseOrDisplayMessage cli (Arg.list!) is
        Ok data ->
            data == {
                alpha: 5,
                verbosity: 1,
                file: "1.txt",
                files: ["2.txt", "3.txt"]
            }
        Err message ->
            Stderr.line message
There are a few cool things I want to point out:
The cli we defined didn’t require us to pass the arguments ahead of time, because Weaver is able to be very modular thanks to how the builder syntax works.
The main takeaway is that we get a simple API to parse CLI args in a type-safe way without needing to do metaprogramming at compile time. To be fair, it took a lot more work to develop this library than it takes to use it (keep reading), but that parallels how the Roc language works: the language team has done a lot of great design work and optimization so that you don’t have to worry about writing performant or safe code; it happens by virtue of writing anything that compiles in Roc.
Weaver has everything you need already to write a CLI parser to get your app rolling, but there are a few other features I’m planning on adding when I get to it:
If there’s a feature that’s missing that you would find crucial to writing a CLI using Weaver, please make an issue and I can jump on doing what the community finds important. I want to make Weaver a one-stop-shop for CLI parsing in Roc!
I love writing code in Roc. It’s awesome. There are tons of great features that I take for granted like the type system, the super helpful compiler error messages, and the clean syntax, but the biggest two are definitely the expressiveness and the good dev experience.
Roc has something I’ve not seen in any other programming language: anonymous tag unions. You can create sum types like in Rust, but you don’t need to write a type that captures all variants of your sum type, you just write code and Roc will figure out which variants you used, and make sure you handle all variants where necessary.
So not only does Roc know that color’s type in
score = when color is
    Red -> 5
    Green -> 10
    Blue -> 15
is [Red, Green, Blue], but you can use the same tag in different contexts and Roc will group the unions differently:
myScore = when myColor is
    Red -> 5
    Green -> 10
yourScore = when yourColor is
    Green -> 10
    Custom score -> score
allColors = [myColor, yourColor]
myColor is typed [Red, Green] and yourColor is typed [Green, Custom U64], but allColors is typed List [Red, Green, Custom U64].
Just one of many magic features in Roc.
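For contrast, here is roughly what the same example costs in Rust, where every union has to be declared and converted by hand. The enum and function names are my own for this sketch; nothing here comes from a library.

```rust
// Each distinct set of variants needs its own named type.
#[derive(Debug, PartialEq)]
enum MyColor {
    Red,
    Green,
}

#[derive(Debug, PartialEq)]
enum YourColor {
    Green,
    Custom(u64),
}

// The merged union also has to be written out explicitly.
#[derive(Debug, PartialEq)]
enum AnyColor {
    Red,
    Green,
    Custom(u64),
}

fn my_score(color: &MyColor) -> u64 {
    match color {
        MyColor::Red => 5,
        MyColor::Green => 10,
    }
}

fn your_score(color: &YourColor) -> u64 {
    match color {
        YourColor::Green => 10,
        YourColor::Custom(score) => *score,
    }
}

// Conversions that Roc infers for free.
impl From<MyColor> for AnyColor {
    fn from(color: MyColor) -> Self {
        match color {
            MyColor::Red => AnyColor::Red,
            MyColor::Green => AnyColor::Green,
        }
    }
}

impl From<YourColor> for AnyColor {
    fn from(color: YourColor) -> Self {
        match color {
            YourColor::Green => AnyColor::Green,
            YourColor::Custom(score) => AnyColor::Custom(score),
        }
    }
}

fn all_colors(mine: MyColor, yours: YourColor) -> Vec<AnyColor> {
    vec![mine.into(), yours.into()]
}
```

Three type declarations and two conversions for what Roc expresses in seven lines with no declarations at all.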
Roc only accepts features that can be parsed and type-checked very quickly, meaning that you can expect your code to always type-check in milliseconds. That paired with a language server and a tree-sitter implementation makes for a great experience writing code. It will take some time for Roc to mature enough to test this, but I’m pretty confident that Roc will be a great language for monorepos, meaning I won’t have to eat my golang Brussels sprouts when working on large, collaborative codebases.
Since the compiler errors are so helpful, you can just start writing code and the type-checker will tell you what part of your code is missing. That was a godsend during my development of Weaver with how complicated some of the types that I was dealing with were.
When I had a working prototype and needed to get the library ready for release, refactoring was a breeze, since everything is just functions, as Roc is a pure functional language, and also doesn’t support name shadowing. I just copied around text and renamed functions, and then added doc comments and made a GitHub release. Nothing to it!
If you’re wondering whether or not you should write code in Roc, the tl;dr is go for it now if you don’t mind using an alpha-language, but probably you should wait a couple years until they do versioned releases so you can get stability guarantees. There are definitely a lot of compiler bugs at the moment, but they are getting fixed as the months go by.
The thing that gives me the most confidence that I’ll be using Roc at my job someday is the team of developers currently working on Roc. The GitHub repo is always active with new issues and pull requests, and I don’t have to worry about a low bus factor. Above all, the BDFN (Benevolent Dictator For Now) is Richard Feldman, who is a big player from the Elm ecosystem and the reason Roc exists at all. He is so knowledgeable about how to make the right programming language that I have no doubt that Roc will be a great language in the next few years!
If you’re curious about how Weaver works, keep reading, but if not, you can use it right now by copying the example in the README into a file and running it with roc example.roc -- ARGS.
…okay, you didn’t run away. Good! I learned a good bit while writing Weaver, and anyone writing Roc code with or without the record builder syntax could probably glean something from my experience. I’ll start by explaining the gist of how Weaver works, and then we’ll get into some nitty gritty bits. Fair warning, I’m assuming you’re familiar with how Roc works, which you can learn from the tutorial if you haven’t already.
Cli.weave is the entry point for any Weaver CLI, and it creates a CLI builder that receives the curried builder and initializes an empty metadata config.
Each new field (e.g. Opt.maybeStr or Param.u64) does the following:
What we want at the end is a parser of type List Str -> Result data CliErr, and all metadata defined. To get that parser, we need some way of getting from the curried builder:
A -> B -> C -> { a: A, b: B, c: C }
to an uncurried argument parser:
List Str -> Result { a: A, b: B, c: C } CliErr
We do this by having each field parser (e.g. Opt.i16List) take an applicative functor that currently has a data parser of type A -> rest and return an applicative functor that has a data parser of type rest, so we end up working through the curried builder until we have a simple function at the end.
For example, the type of Opt.i16List is:
OptionConfigBaseParams -> (CliBuilder (List I16 -> state) GetOptionsAction -> CliBuilder state GetOptionsAction)
where CliBuilder state action is the builder we pass around. The state represents what remains of the original A -> B -> C -> { ... } in our builder. You’ll notice that the second parameter has a state of List I16 -> state, which sets the corresponding field to List I16 and returns the remaining state for deconstruction by the rest of the field extractor functions.
You should look at the source of Weaver to properly understand how this works if you want to build your own builder pattern library for Roc.
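If you'd like a feel for the idea before diving into the source, here is a heavily simplified Rust sketch of "each field peels one argument off the curried builder". Everything in it is an assumption made for illustration: the Builder struct, weave, param, and the parse helpers are hypothetical, errors are plain strings, and fields are purely positional. It is not Weaver's implementation.

```rust
// A builder whose parser yields "what is left" of the curried constructor.
struct Builder<State> {
    parser: Box<dyn FnOnce(&[String]) -> Result<State, String>>,
    next_index: usize,
}

// Start with the full curried constructor as the state.
fn weave<State: 'static>(curried: State) -> Builder<State> {
    Builder {
        parser: Box::new(move |_args: &[String]| -> Result<State, String> { Ok(curried) }),
        next_index: 0,
    }
}

// One field: turns a builder holding `A -> Rest` into a builder holding `Rest`.
fn param<A, F, Rest>(builder: Builder<F>, parse: fn(&str) -> Result<A, String>) -> Builder<Rest>
where
    A: 'static,
    F: FnOnce(A) -> Rest + 'static,
    Rest: 'static,
{
    let index = builder.next_index;
    let previous = builder.parser;
    Builder {
        parser: Box::new(move |args: &[String]| -> Result<Rest, String> {
            let remaining = previous(args)?; // run the earlier fields first
            let raw = args
                .get(index)
                .ok_or_else(|| format!("missing argument {index}"))?;
            let value = parse(raw)?;
            Ok(remaining(value)) // apply one argument of the curried builder
        }),
        next_index: index + 1,
    }
}

impl<Data> Builder<Data> {
    // Once every field is applied, the state is the finished record.
    fn parse(self, args: &[String]) -> Result<Data, String> {
        (self.parser)(args)
    }
}

#[derive(Debug, PartialEq)]
struct Data {
    count: u64,
    file: String,
}

// The curried builder: u64 -> String -> Data.
fn curried_data(count: u64) -> impl FnOnce(String) -> Data {
    move |file| Data { count, file }
}

fn parse_u64(raw: &str) -> Result<u64, String> {
    raw.parse().map_err(|_| format!("{raw:?} is not a u64"))
}

fn parse_str(raw: &str) -> Result<String, String> {
    Ok(raw.to_string())
}

fn build_cli() -> Builder<Data> {
    let start = weave(curried_data); // state: u64 -> String -> Data
    let after_count = param(start, parse_u64); // state: String -> Data
    param(after_count, parse_str) // state: Data
}
```

The comments in build_cli track the shrinking state type, which is the same bookkeeping that CliBuilder's state variable does in Roc.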
Based on looking at how other CLIs work, including the official roc CLI, there’s a precedence on the order in which arguments should be parsed:
1. Options (e.g. -a or --alpha)
2. Subcommands
3. Parameters (e.g. file.txt)
This lets us handle these cases correctly:
- telling a parameter apart from a subcommand, as in roc example.roc and roc test example.roc
- parameters that start with - (e.g. a file named “-file.txt”)
We want to allow starting the builder with Opts, Subcommands, or Params. We should be able to move to any following field from a prior type, but not move back, e.g. Opt to Subcommand or Opt to Param, but not Param to Opt.
To achieve the above, the CliBuilder has a type variable called action that can be one of:
- { getOptions: {} } (alias GetOptionsAction)
- { getParams: {} } (alias GetParamsAction)
- [] (alias StopCollectingAction)
This is an approach called typestate that I learned from the Rust ecosystem. Typestate encodes the state of the program into its types, like a finite state machine. We follow this recipe when traversing the “finite state machine”:
- On builder creation, the action is { getOptions: {} }. We can start taking Opt, Subcommand, or Param.
- Opt.* takes { getOptions: {} } and returns { getOptions: {} }. We can only take Opts at the beginning of building as any other action will not type-check.
- Subcommand.* takes { getOptions: {} } and returns { getParams: {} }. Subcommands must come after Opts (though none are required) and before Params.
- Param.* takes {}action and returns { getParams: {} }. Once we start taking Params, we can’t take anything else but Params. All other actions need { getOptions: {} } as the input action. The {} at the beginning of the {}action parameter says that whatever action parameter is passed must be a record of some type, disallowing the [] empty union type that is returned after a Param.*List call.
- Param.*List takes action and returns []. Once we take a list of params, we can’t take anything else as it will consume all remaining parameters. This type-checks because the other actions need their action to be some kind of record, not a tag union.
This specific recipe for transitions between field actions means that the compiler keeps the user from compiling code that doesn’t have their fields (and therefore the underlying parser) in the right order! It’s tricky to wrap your head around, but it’s definitely the coolest thing I applied in building Weaver!
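Since typestate comes from the Rust world, here is a minimal Rust sketch of the same state machine. The marker types, the AcceptsParams trait (my stand-in for the {}action constraint), and the method names are all hypothetical, and the builder only records field names rather than parsing anything. One simplification to flag: in this sketch param_list is only available in states that still accept params.

```rust
use std::marker::PhantomData;

// Marker types standing in for Weaver's three actions.
struct GetOptions;
struct GetParams;
struct StopCollecting;

// States that may still take a Param (the role `{}action` plays in Roc).
trait AcceptsParams {}
impl AcceptsParams for GetOptions {}
impl AcceptsParams for GetParams {}

struct CliBuilder<Action> {
    fields: Vec<String>,
    _action: PhantomData<Action>,
}

impl<Action> CliBuilder<Action> {
    // Record a field and move to whichever state the caller's return type names.
    fn advance<Next>(mut self, field: String) -> CliBuilder<Next> {
        self.fields.push(field);
        CliBuilder { fields: self.fields, _action: PhantomData }
    }

    fn finish(self) -> Vec<String> {
        self.fields
    }
}

fn new_builder() -> CliBuilder<GetOptions> {
    CliBuilder { fields: Vec::new(), _action: PhantomData }
}

// Opts and subcommands are only reachable from the initial state.
impl CliBuilder<GetOptions> {
    fn opt(self, name: &str) -> CliBuilder<GetOptions> {
        self.advance(format!("opt:{name}"))
    }

    fn subcommand(self, name: &str) -> CliBuilder<GetParams> {
        self.advance(format!("subcommand:{name}"))
    }
}

// Params are reachable from any state that has not stopped collecting.
impl<Action: AcceptsParams> CliBuilder<Action> {
    fn param(self, name: &str) -> CliBuilder<GetParams> {
        self.advance(format!("param:{name}"))
    }

    fn param_list(self, name: &str) -> CliBuilder<StopCollecting> {
        self.advance(format!("params:{name}"))
    }
}

fn example() -> Vec<String> {
    new_builder()
        .opt("alpha")
        .opt("verbose")
        .param("file")
        .param_list("files")
        .finish()
}
```

Writing new_builder().param("file").opt("alpha") fails to compile in this sketch, because opt only exists on CliBuilder<GetOptions>; that is the same guarantee the action variable gives Weaver users.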
P.S. Please check out Roc if you haven’t already.
Bye!