programming language; part 1
in which i propose a language concept
over the years i have attempted to make many programming languages. often it’s something very silly and esoteric like “lazers” (weird 2d language), that one regex replacement language (bad /// clone), or No Semicolon C (which isn’t really a language, but it’s language adjacent so i’ll count it here). however, over the past 6 or so years i’ve wanted to make an actually useful programming language, and have thus rewritten it about 22 times (rough estimate). hopefully this time i can actually make a language!
this part 1 will pretty much just be me listing the silly ideas that led up to this language & a status on my current progress (pretty much just lexing & parsing)
general ideas / goals
the language itself is called asyl, i might write about the origin story of the name in a later part of this series¹.
pretty much i want to make a “small” language (as many people probably do), and i really like the idea of not giving primitive types too much priority over user-defined things, that includes things like
- very few shorthands for types (currently only strings and functions)
- lots of fun syntax things to make operations doable on multiple types
- hopefully easily extending existing things / types / functions, &c
- but also very minimal type system? i don’t really wanna program a full type system but i like types
this leads to some sillier ideas such as
- embed some binary (like an image) directly in your source code, and use a fancy enough editor to view it
- far too much metaprogramming
- also what if every op was actually just a function call, and there’s just way too much syntax for calling functions
- that’s a good reason to have it typed, so we could have compile-time function overloading
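to make the overloading idea a bit more concrete, here’s a tiny python sketch of picking a function by its name plus the types of its arguments. everything in it (`OVERLOADS`, `overload`, `call`) is made up for illustration, and it resolves at run time rather than compile time, so treat it as a stand-in for the idea and not as how asyl does (or will do) it.

```python
from typing import Callable

# Hypothetical registry: (function name, parameter types) -> implementation.
OVERLOADS: dict[tuple[str, tuple[type, ...]], Callable] = {}


def overload(name: str, *param_types: type):
    """Register the decorated function as one overload of `name`."""
    def register(fn: Callable) -> Callable:
        OVERLOADS[(name, param_types)] = fn
        return fn
    return register


def call(name: str, *args):
    """Pick the overload whose parameter types match the argument types exactly."""
    arg_types = tuple(type(a) for a in args)
    try:
        fn = OVERLOADS[(name, arg_types)]
    except KeyError:
        raise TypeError(f"no overload of {name} for {arg_types}") from None
    return fn(*args)


@overload("factorial", int, int)
def _factorial_acc(n, t):
    # Accumulator version: t carries the running product.
    return t if n <= 0 else call("factorial", n - 1, t * n)


@overload("factorial", int)
def _factorial(n):
    # One-argument version just seeds the accumulator.
    return call("factorial", n, 1)
```

with this, `call("factorial", 5)` goes through the one-argument overload and then the two-argument one. a typed language could make the same choice while compiling, since the argument types are already known there.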
for actually writing the language, i’m using racket this time around because it seems to already implement most of the ideas of compile-time vs. run-time that i want with its idea of phase levels, as well as generally having a lot of language-building conveniences.
lexing
the file itself is parsed with bytes rather than unicode characters, and my tokens are as follows:
- whitespace: 00→32, and the utf-8 encodings³ for 85, A0, 1680, 2000→200B, 2028, 2029, 202F, 205F, 3000, and FEFF
- newline characters: 0A→0D, and the utf-8 encodings for 85, 2028, and 2029
- some basic op characters, ()[]{}:;.,@
- the most terrifying string syntax: 'marker'text'marker', the marker can be any number of bytes that aren’t ', including no bytes for a quick ''string'', and matches any number of bytes until the marker is reached, and no escape sequence parsing of any sort
- line comments: # line comment... terminated by a newline
- block comments: #''block comment'', just comments until the end of the string, the # and ' need to be next to each other, so # ''something like this is still a line comment''
- the one keyword so far (this number will likely change in the future): fn
- vague reader extensions with #@ although currently that just crashes the lexer
- any other text is treated as an identifier⁴, which allow escapes in the forms: \hh, \{u…}, and \c (any literal character)
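the string and comment rules above are easier to see in code, so here’s a rough python sketch of them. the names `read_string`, `skip_comment` and `NEWLINE_BYTES` are invented for this example, it only treats the single bytes 0A→0D as newlines (the utf-8 newlines are left out), and it assumes the closing delimiter is simply the first later occurrence of 'marker'. the real lexer is handwritten racket and may well differ.

```python
# Simplification: only the single-byte newlines, not the utf-8 encoded ones.
NEWLINE_BYTES = (0x0A, 0x0B, 0x0C, 0x0D)


def read_string(src: bytes, pos: int) -> tuple[bytes, int]:
    """Read 'marker'text'marker' starting at src[pos]; return (text, end position)."""
    if src[pos:pos + 1] != b"'":
        raise ValueError("a string must start with '")
    marker_end = src.find(b"'", pos + 1)
    if marker_end == -1:
        raise ValueError("unterminated marker")
    marker = src[pos + 1:marker_end]         # may be empty, as in ''string''
    closer = b"'" + marker + b"'"
    text_start = marker_end + 1
    text_end = src.find(closer, text_start)  # raw bytes, no escape handling
    if text_end == -1:
        raise ValueError("unterminated string")
    return src[text_start:text_end], text_end + len(closer)


def skip_comment(src: bytes, pos: int) -> int:
    """Return the position just past the comment that starts at src[pos] == '#'."""
    if src[pos:pos + 1] != b"#":
        raise ValueError("a comment must start with #")
    following = src[pos + 1:pos + 2]
    if following == b"@":
        raise NotImplementedError("reader extensions are not sketched here")
    if following == b"'":
        # Block comment: a string directly after '#', read and thrown away.
        _text, end = read_string(src, pos + 1)
        return end
    # Line comment: runs up to (not including) the next newline byte.
    end = pos
    while end < len(src) and src[end] not in NEWLINE_BYTES:
        end += 1
    return end
```

for example `read_string(b"'x'it's'x'", 0)` returns `(b"it's", 10)`: the marker is `x`, so the lone `'` inside the text doesn’t end the string.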
the actual lexer itself is just some handwritten input-port-reading nonsense but works rather well.
parsing
for parsing i wanted to try out brag, which lets me write a grammar like
#lang brag
block : stmt ';' block
      | stmt
      | ∅
stmt : "fn" [s-ident] '(' table ')' stmt
     | '@' expr stmt
     | expr stmt-tail
stmt-tail : expr stmt-tail
          | '.' expr-dot ['(' table ')'] stmt-tail
          | '.' '(' table ')' stmt-tail
          | ∅
expr-dot : '@' expr expr-dot
         | expr-head
expr : "fn" '(' table ')' expr
     | '@' expr expr
     | expr-head expr-tail*
expr-head : s-ident
          | s-string
          | ':' s-ident
          | '{' block '}'
expr-tail : '(' table ')'
table : table-key ',' table
      | table-key
      | ∅
table-key : expr ':' block
          | block
          | '.' [ block ]
; to generate specific ast nodes
s-ident : IDENT
s-string : STRING
and it sets up all the parsing for me automatically, so i just give it a list of brag tokens and it returns some vague ast.
also yes, almost all of these combinations of .s, ()s, and @s are function calls, i’ll probably go into more detail once functions are actually being called.
unparsing?
problem is the ast it returns is way too verbose and also just a direct translation of the syntax tree, so i have an “unparsing” step that syntax-parse’s⁵ the strings, for example my current testing file contains⁶
#lang asyl
@public
fn factorial(let Number n, let Number t, ->: Number)
	if {n .<= 0}
		t
		factorial(n .- 1, t .*n);
@public
fn factorial(let Number n, ->: Number)
	factorial(n, 1);
which expands to these tokens
(list
(token-struct '@ "@" 11 2 1 1 #f)
(token-struct 'IDENT #"public" 12 2 2 6 #f)
(token-struct 'fn "fn" 19 3 1 2 #f)
(token-struct 'IDENT #"factorial" 22 3 4 9 #f)
(token-struct '|(| "(" 31 3 13 1 #f)
(token-struct 'IDENT #"let" 32 3 14 3 #f)
(token-struct 'IDENT #"Number" 36 3 18 6 #f)
(token-struct 'IDENT #"n" 43 3 25 1 #f)
(token-struct '|,| "," 44 3 26 1 #f)
(token-struct 'IDENT #"let" 46 3 28 3 #f)
(token-struct 'IDENT #"Number" 50 3 32 6 #f)
(token-struct 'IDENT #"t" 57 3 39 1 #f)
(token-struct '|,| "," 58 3 40 1 #f)
(token-struct 'IDENT #"->" 60 3 42 2 #f)
(token-struct ': ":" 62 3 44 1 #f)
(token-struct 'IDENT #"Number" 64 3 46 6 #f)
(token-struct '|)| ")" 70 3 52 1 #f)
(token-struct 'IDENT #"if" 73 4 2 2 #f)
(token-struct '|{| "{" 76 4 5 1 #f)
(token-struct 'IDENT #"n" 77 4 6 1 #f)
(token-struct '|.| "." 79 4 8 1 #f)
(token-struct 'IDENT #"<=" 80 4 9 2 #f)
(token-struct 'IDENT #"0" 83 4 12 1 #f)
(token-struct '|}| "}" 84 4 13 1 #f)
(token-struct 'IDENT #"t" 88 5 3 1 #f)
(token-struct 'IDENT #"factorial" 92 6 3 9 #f)
(token-struct '|(| "(" 101 6 12 1 #f)
(token-struct 'IDENT #"n" 102 6 13 1 #f)
(token-struct '|.| "." 104 6 15 1 #f)
(token-struct 'IDENT #"-" 105 6 16 1 #f)
(token-struct 'IDENT #"1" 107 6 18 1 #f)
(token-struct '|,| "," 108 6 19 1 #f)
(token-struct 'IDENT #"t" 110 6 21 1 #f)
(token-struct '|.| "." 112 6 23 1 #f)
(token-struct 'IDENT #"*n" 113 6 24 2 #f)
(token-struct '|)| ")" 115 6 26 1 #f)
(token-struct '|;| ";" 116 6 27 1 #f)
(token-struct '@ "@" 118 7 1 1 #f)
(token-struct 'IDENT #"public" 119 7 2 6 #f)
(token-struct 'fn "fn" 126 8 1 2 #f)
(token-struct 'IDENT #"factorial" 129 8 4 9 #f)
(token-struct '|(| "(" 138 8 13 1 #f)
(token-struct 'IDENT #"let" 139 8 14 3 #f)
(token-struct 'IDENT #"Number" 143 8 18 6 #f)
(token-struct 'IDENT #"n" 150 8 25 1 #f)
(token-struct '|,| "," 151 8 26 1 #f)
(token-struct 'IDENT #"->" 153 8 28 2 #f)
(token-struct ': ":" 155 8 30 1 #f)
(token-struct 'IDENT #"Number" 157 8 32 6 #f)
(token-struct '|)| ")" 163 8 38 1 #f)
(token-struct 'IDENT #"factorial" 166 9 2 9 #f)
(token-struct '|(| "(" 175 9 11 1 #f)
(token-struct 'IDENT #"n" 176 9 12 1 #f)
(token-struct '|,| "," 177 9 13 1 #f)
(token-struct 'IDENT #"1" 179 9 15 1 #f)
(token-struct '|)| ")" 180 9 16 1 #f)
(token-struct '|;| ";" 181 9 17 1 #f))
and this ast
'(block
  (stmt
   "@"
   (expr (expr-head (s-ident #"public")))
   (stmt
    "fn"
    (s-ident #"factorial")
    "("
    (table
     (table-key
      (block
       (stmt
        (expr (expr-head (s-ident #"let")))
        (stmt-tail
         (expr (expr-head (s-ident #"Number")))
         (stmt-tail (expr (expr-head (s-ident #"n"))) (stmt-tail))))))
     ","
     (table
      (table-key
       (block
        (stmt
         (expr (expr-head (s-ident #"let")))
         (stmt-tail
          (expr (expr-head (s-ident #"Number")))
          (stmt-tail (expr (expr-head (s-ident #"t"))) (stmt-tail))))))
      ","
      (table
       (table-key
        (expr (expr-head (s-ident #"->")))
        ":"
        (block (stmt (expr (expr-head (s-ident #"Number"))) (stmt-tail)))))))
    ")"
    (stmt
     (expr (expr-head (s-ident #"if")))
     (stmt-tail
      (expr
       (expr-head
        "{"
        (block
         (stmt
          (expr (expr-head (s-ident #"n")))
          (stmt-tail
           "."
           (expr-dot (expr-head (s-ident #"<=")))
           (stmt-tail (expr (expr-head (s-ident #"0"))) (stmt-tail)))))
        "}"))
      (stmt-tail
       (expr (expr-head (s-ident #"t")))
       (stmt-tail
        (expr
         (expr-head (s-ident #"factorial"))
         (expr-tail
          "("
          (table
           (table-key
            (block
             (stmt
              (expr (expr-head (s-ident #"n")))
              (stmt-tail
               "."
               (expr-dot (expr-head (s-ident #"-")))
               (stmt-tail (expr (expr-head (s-ident #"1"))) (stmt-tail))))))
           ","
           (table
            (table-key
             (block
              (stmt
               (expr (expr-head (s-ident #"t")))
               (stmt-tail
                "."
                (expr-dot (expr-head (s-ident #"*n")))
                (stmt-tail)))))))
          ")"))
        (stmt-tail)))))))
  ";"
  (block
   (stmt
    "@"
    (expr (expr-head (s-ident #"public")))
    (stmt
     "fn"
     (s-ident #"factorial")
     "("
     (table
      (table-key
       (block
        (stmt
         (expr (expr-head (s-ident #"let")))
         (stmt-tail
          (expr (expr-head (s-ident #"Number")))
          (stmt-tail (expr (expr-head (s-ident #"n"))) (stmt-tail))))))
      ","
      (table
       (table-key
        (expr (expr-head (s-ident #"->")))
        ":"
        (block (stmt (expr (expr-head (s-ident #"Number"))) (stmt-tail))))))
     ")"
     (stmt
      (expr
       (expr-head (s-ident #"factorial"))
       (expr-tail
        "("
        (table
         (table-key
          (block (stmt (expr (expr-head (s-ident #"n"))) (stmt-tail))))
         ","
         (table
          (table-key
           (block (stmt (expr (expr-head (s-ident #"1"))) (stmt-tail))))))
        ")"))
      (stmt-tail))))
   ";"
   (block)))
and finally gets “unparsed” into
'(#%kw-block
  (public (#%kw-fn
    (factorial (let Number n) (let Number t) (#%kw-dot -> Number))
    (if (<= n |0|) t (factorial (- n |1|) (*n t)))))
  (public (#%kw-fn
    (factorial (let Number n) (#%kw-dot -> Number))
    (factorial n |1|)))
  (#%kw-block))
to be expanded later on. “unparsing” is definitely the wrong name for this, but i don’t care, it sounds funny.
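as a rough picture of what that step does to a single statement, here’s a python sketch that flattens a stmt / stmt-tail chain into a call form. the rules are only my reading of the example output above: juxtaposed expressions become arguments of the first one, and a `.` makes everything so far the first argument of the name after the dot. the names `leaf`, `ident` and `unparse_stmt` are invented here, only plain identifiers are handled (no tables, blocks, `fn` or `@`), and the real step uses syntax-parse in racket.

```python
def leaf(tag: str, name: str) -> tuple:
    """Build (tag (expr-head (s-ident name))), the ast shape of a bare identifier."""
    return (tag, ("expr-head", ("s-ident", name)))


def ident(node: tuple) -> str:
    """Pull the name back out of an identifier-only expr or expr-dot node."""
    _tag, (_head_tag, (_kind, name)) = node
    return name


def unparse_stmt(stmt: tuple):
    """Flatten ("stmt", expr, stmt-tail) into a name or a [head, arg, ...] list."""
    _tag, first, tail = stmt
    head, args = ident(first), []
    while len(tail) > 1:                  # a bare ("stmt-tail",) ends the chain
        if tail[1] == ".":
            # '.' expr-dot stmt-tail: what we have so far becomes the first argument.
            _, _, op, tail = tail
            receiver = [head, *args] if args else head
            head, args = ident(op), [receiver]
        else:
            # expr stmt-tail: one more argument for the current head.
            _, arg, tail = tail
            args.append(ident(arg))
    return [head, *args] if args else head
```

so the ast for `n .<= 0`, written as `("stmt", leaf("expr", "n"), ("stmt-tail", ".", leaf("expr-dot", "<="), ("stmt-tail", leaf("expr", "0"), ("stmt-tail",))))`, comes out as `["<=", "n", "0"]`, which matches the `(<= n |0|)` in the output above.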
for the future
the next step to implement is some macro expansion system. originally i wanted to just use racket’s expander, but it differs from what i want in enough ways that i now think i need to make my own. so far there’s pretty much nothing implemented there except for a giant comment with my ideas in it and a function that crashes.
with this already being my 22nd attempt⁷, there’s a high chance it’s not my last, but so far it’s going along pretty well and seeming fairly doable as a programming language, although i don’t have as much time or motivation as i’d like to work on it. currently the implementation isn’t published online anywhere, but i might open-source it once i can be more confident this language won’t explode anytime soon.
see you next month if i can get an idea for a post by then.
-michael