How It Works

The formatter takes a .jl file as input and produces an idealized, formatted .jl file as output. Some formatters mutate the state of the current file; JuliaFormatter takes a different approach, first generating a canonical output and then mutating that output to adhere to the indent and margin constraints.

Generating an FST

The source code is parsed with CSTParser.jl, which returns a CST (Concrete Syntax Tree). A CST is a one-to-one mapping of the source text to a tree form. In most cases a more compact AST (Abstract Syntax Tree) representation is desired. However, since formatting manipulates the source text itself, the richer representation of a CST is incredibly useful.

Once the CST is created, it is used to generate an FST (Formatted Syntax Tree).

Note: this is not an actual term, just something I made up. Essentially it's a CST with additional formatting-specific metadata.

The important property of an FST is that any two .jl files that are syntactically the same (whitespace is irrelevant) produce an identical FST.

For example:

# p1.jl
a = 
       foo(a,                     b,           
       c,d)

and

# p2.jl
a =                      foo(a,
b,
c,d)

will produce the same FST, which, when printed, looks like:

# fst output
a = foo(a, b, c, d)
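To make the whitespace-insensitivity concrete, here is a toy canonicalizer in Julia. It is not how JuliaFormatter builds an FST, because the real formatter works from the CSTParser tree rather than from a regex. canonical_call is a hypothetical helper that only understands the shape lhs = fname(arg, ...) with simple arguments. It shows the key property: both p1.jl and p2.jl collapse to the same single line.

```julia
# Toy canonicalizer (NOT JuliaFormatter's implementation): it only handles
# `lhs = fname(arg, arg, ...)` with simple, comma-free arguments.
function canonical_call(src::AbstractString)
    # `\s` matches newlines too, so any whitespace layout is accepted.
    m = match(r"^\s*(\w+)\s*=\s*(\w+)\s*\((.*)\)\s*$"s, src)
    m === nothing && error("unsupported input: $src")
    lhs, fname, argstr = m.captures
    # Drop the whitespace around every argument, then re-join canonically.
    args = [strip(a) for a in split(argstr, ',')]
    return string(lhs, " = ", fname, "(", join(args, ", "), ")")
end

p1 = "a = \n       foo(a,                     b,           \n       c,d)"
p2 = "a =                      foo(a,\nb,\nc,d)"

println(canonical_call(p1))  # a = foo(a, b, c, d)
println(canonical_call(p2))  # a = foo(a, b, c, d)
```

Feeding it a 120-argument call spread over many lines also yields one very long line. Like the FST stage, this sketch never looks at the margin.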

So what does a typical FST look like?

Code and comments are indented to match surrounding code blocks. Unnecessary whitespace is removed. Newlines in between code blocks are untouched.

If the expression can be put on a single line, it will be. It doesn't matter if it's a function call with 120 arguments, making it 1000 characters long; during this initial stage it will be put on a single line.

If the expression has structure to it, such as a try, if, or struct definition, it will be spread across multiple lines appropriately:


# original source
try a1;a2 catch e b1;b2 finally c1;c2 end

-> 

# printed FST
try
    a1
    a2
catch e
    b1
    b2
finally
    c1
    c2
end
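The layout rule for structured expressions can be sketched as follows. Each clause header starts a line and each statement in the clause is indented one level deeper. This toy assumes the source has already been split into clauses, and the Clause type and print_block function are hypothetical. The indent of 4 spaces matches JuliaFormatter's default indent setting.

```julia
# Hypothetical, minimal tree for block constructs: a header line such as
# "try" or "catch e", followed by the statements in that clause.
struct Clause
    header::String
    body::Vector{String}
end

# Print each clause header at the current indent and its body one level
# deeper, then close the construct with `end`.
function print_block(clauses::Vector{Clause}; indent::Int = 4)
    io = IOBuffer()
    for c in clauses
        println(io, c.header)
        for stmt in c.body
            println(io, " "^indent, stmt)
        end
    end
    print(io, "end")
    return String(take!(io))
end

# The clauses of `try a1;a2 catch e b1;b2 finally c1;c2 end`.
trycatch = [
    Clause("try", ["a1", "a2"]),
    Clause("catch e", ["b1", "b2"]),
    Clause("finally", ["c1", "c2"]),
]
println(print_block(trycatch))
```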

With this FST representation it's much easier to determine when and how lines should be broken.

Nesting - breaking lines

During the nesting stage, the original FST is mutated to adhere to the margin specification.

During the previous stage, while the FST was being generated, PLACEHOLDER nodes were inserted at the points where a line may later be broken. These can be converted to NEWLINE nodes during nesting, which is how lines are broken.
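Below is a toy model of this mechanism. The node kinds TEXT, PLACEHOLDER, and NEWLINE and the helpers render and break_at_placeholders are simplifications made up for this sketch; JuliaFormatter's real FST nodes carry far more metadata. Here each placeholder stores what it prints as when the line is not broken: a space after a comma, or nothing after an opening parenthesis. Converting it to a NEWLINE is the whole act of breaking the line.

```julia
# Hypothetical node kinds for this sketch only.
@enum NodeKind TEXT PLACEHOLDER NEWLINE

struct Node
    kind::NodeKind
    val::String  # text to print; for PLACEHOLDER, what it prints as when not broken
end

# NEWLINE nodes print as a line break; everything else prints its text.
render(nodes::Vector{Node}) = join((n.kind == NEWLINE ? "\n" : n.val) for n in nodes)

# Breaking lines = turning placeholders into newlines.
break_at_placeholders(nodes::Vector{Node}) =
    [(n.kind == PLACEHOLDER ? Node(NEWLINE, "") : n) for n in nodes]

# f(x, y) with a potential break after "(", after "x,", and before ")".
fcall = [
    Node(TEXT, "f("), Node(PLACEHOLDER, ""),
    Node(TEXT, "x,"), Node(PLACEHOLDER, " "),
    Node(TEXT, "y"), Node(PLACEHOLDER, ""),
    Node(TEXT, ")"),
]

println(render(fcall))                         # f(x, y)
println(render(break_at_placeholders(fcall)))  # f( / x, / y / ) on separate lines
```

Indentation of the new lines is left out here. A real nesting stage also has to compute the indentation of each line it creates.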

Suppose we have a function call that goes over the margin:

begin
    foo = funccall(argument1, argument2, ..., argument120) # way over margin limit !!!
end

It would be nested as:

begin
    foo = funccall(
        argument1,
        argument2,
        ...,
        argument120
    ) # way over margin limit !!!
end
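The decision behind this example can be sketched as a toy nester. If the single-line form fits within the margin, it is kept. Otherwise each argument goes on its own line, one indent level deeper, and the closing parenthesis goes back at the call's own indent. The nest_call function and its parameters are hypothetical and not JuliaFormatter's algorithm. The defaults of 4 for indent and 92 for margin are assumed to match JuliaFormatter's defaults. The trailing comment is omitted.

```julia
# Toy nester (not JuliaFormatter's algorithm): keep the call on one line if it
# fits within `margin`, otherwise put each argument on its own line.
function nest_call(prefix::String, fname::String, args::Vector{String};
                   indent_level::Int = 0, indent::Int = 4, margin::Int = 92)
    pad = " "^(indent * indent_level)
    flat = string(pad, prefix, fname, "(", join(args, ", "), ")")
    length(flat) <= margin && return flat          # fits: nothing to nest
    inner = " "^(indent * (indent_level + 1))      # one level deeper than the call
    lines = [string(pad, prefix, fname, "(")]
    for (i, a) in enumerate(args)
        # Every argument except the last keeps its trailing comma.
        push!(lines, string(inner, a, i < length(args) ? "," : ""))
    end
    push!(lines, string(pad, ")"))
    return join(lines, "\n")
end

# The call sits inside `begin ... end`, hence indent_level = 1.
args = ["argument$i" for i in 1:120]
println(nest_call("foo = ", "funccall", args; indent_level = 1))
```

A call that fits, such as nest_call("foo = ", "funccall", ["a", "b"]; indent_level = 1), comes back unchanged on one line. The real formatter knows many more nesting shapes than this one.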

You can read how code is nested in the style section.

Once the FST has been nested, it is printed out to a file and voilà! You have a formatted version of your code!