How It Works
The formatter takes a .jl file as input and produces an idealized, formatted .jl file as output. Some formatters mutate the state of the current file. JuliaFormatter takes a different approach: it first generates a canonical output and then mutates that canonical output so that it adheres to the indent and margin constraints.
Generating an FST
The source code is parsed with CSTParser.jl, which returns a CST (Concrete Syntax Tree). A CST is a one-to-one mapping of the language to a tree form. In most cases a more compact AST (Abstract Syntax Tree) representation is desired. However, since formatting manipulates the source text itself, the richer representation of a CST is incredibly useful.
Once the CST is created, it is used to generate an FST (Formatted Syntax Tree).
Note: this is not an actual term, just something I made up. Essentially it's a CST with additional formatting-specific metadata.
The important property of an FST is that any two .jl files that are syntactically the same (whitespace is irrelevant) produce an identical FST.
For example:
# p1.jl
a =
foo(a, b,
c,d)
and
# p2.jl
a = foo(a,
b,
c,d)
will produce the same FST, which printed would look like:
# fst output
a = foo(a, b, c, d)
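The whitespace-insensitivity shown above can be illustrated with Base Julia alone. This is only an analogy: Meta.parse returns Julia's AST (an Expr), not the CST or FST that JuliaFormatter works with, and the variable names are made up for the sketch. It does show that the two layouts describe the same structure.

```julia
# Analogy only: Meta.parse yields an AST (Expr), not JuliaFormatter's FST.
# The two strings mirror p1.jl and p2.jl from the example above.
p1 = "a =\nfoo(a, b,\nc,d)"
p2 = "a = foo(a,\nb,\nc,d)"

ex1 = Meta.parse(p1)
ex2 = Meta.parse(p2)

# Expr values compare structurally, so the differing line breaks and
# spacing in the source text play no role here.
println(ex1 == ex2)

# Printing the tree gives a single-line rendering of the expression.
println(ex1)
```

The FST plays a similar role for formatting, with the difference that it keeps the extra detail (such as comments) that an AST discards.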
So what does a typical FST look like?
Code and comments are indented to match surrounding code blocks. Unnecessary whitespace is removed. Newlines in between code blocks are untouched.
If the expression can be put on a single line, it will be. It doesn't matter if it's a function call with 120 arguments, making it 1000 characters long. During this initial stage it will be put on a single line.
If the expression has a structure to it, such as a try, if, or struct definition, it will be spread across multiple lines appropriately:
# original source
try a1;a2 catch e b1;b2 finally c1;c2 end
->
# printed FST
try
    a1
    a2
catch e
    b1
    b2
finally
    c1
    c2
end
With this FST representation it's much easier to determine when and how lines should be broken.
Nesting - breaking lines
During the nesting stage, the original FST is mutated to adhere to the margin specification.
Throughout the previous stage, while the FST was being generated, PLACEHOLDER nodes were inserted at various points. These can be converted to NEWLINE nodes during nesting, which is how lines are broken.
Assume we have a function call that goes over the margin:
begin
    foo = funccall(argument1, argument2, ..., argument120) # way over margin limit !!!
end
It would be nested to:
begin
    foo = funccall(
        argument1,
        argument2,
        ...,
        argument120
    ) # way over margin limit !!!
end
You can read how code is nested in the style section.
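To make the PLACEHOLDER-to-NEWLINE idea concrete, here is a minimal toy sketch. It is not JuliaFormatter's actual data structure or algorithm: the Node type, the call_nodes, render, and nest! functions, and the fixed 4-space indent are all hypothetical names and choices made for illustration. The sketch models a function call as a flat list of nodes and breaks every placeholder once the single-line form exceeds the margin.

```julia
# Toy model only; every name here is hypothetical, not JuliaFormatter's API.
@enum Kind TEXT PLACEHOLDER NEWLINE

mutable struct Node
    kind::Kind
    text::String  # what the node prints (for NEWLINE: the indent that follows it)
end

text(s) = Node(TEXT, s)
placeholder(s) = Node(PLACEHOLDER, s)  # `s` is printed if the line is not broken

# Build the canonical node list for `name(arg1, arg2, ...)`.
function call_nodes(name, args)
    nodes = [text(name * "("), placeholder("")]
    for (i, a) in enumerate(args)
        islast = i == length(args)
        push!(nodes, text(islast ? a : a * ","))
        push!(nodes, placeholder(islast ? "" : " "))
    end
    push!(nodes, text(")"))
    return nodes
end

# Print the nodes; a NEWLINE node emits a line break followed by its indent.
function render(nodes)
    io = IOBuffer()
    for n in nodes
        n.kind == NEWLINE && print(io, '\n')
        print(io, n.text)
    end
    return String(take!(io))
end

# If the one-line form exceeds `margin`, turn every PLACEHOLDER into a NEWLINE.
# Arguments are indented one level deeper; the closing paren returns to `indent`.
function nest!(nodes, indent, margin)
    width = indent + sum(length(n.text) for n in nodes)
    width <= margin && return nodes
    last_ph = findlast(n -> n.kind == PLACEHOLDER, nodes)
    for (i, n) in enumerate(nodes)
        n.kind == PLACEHOLDER || continue
        n.kind = NEWLINE
        n.text = " "^(i == last_ph ? indent : indent + 4)
    end
    return nodes
end

demo = call_nodes("foo = funccall", ["argument1", "argument2", "argument3"])
println(render(demo))   # canonical single-line form
nest!(demo, 0, 30)      # a margin of 30 is too small, so placeholders break
println(render(demo))
```

The real formatter decides per node and per style which placeholders to break; this sketch breaks all of them at once purely to keep the example short.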
Once the FST has been nested, it's then printed out to a file and voila! You have a formatted version of your code!