Gavin Grover's GROOVY Wikiblog

17 February 2017

Go Symbology

I listed all the symbols used by Java and Apache Groovy 9 years ago when I was using them for programming. Now that I'm using Go, I'll do the same for Go here.

Go, like many programming languages and toolchains, is made up of many smaller languages mashed together. The Go language spec describes the containments that can occur between the toplevel structure (i.e. the sequence of package, import, type, var, const, and func keywords) and the lexical syntax (i.e. strings, numerics, identifiers, symbols, and comments). It's almost cleanly divided into the 3 distinct syntactic sub-languages of statements, expressions, and types, with clearly defined boundaries between them. Statements can contain both expressions and types, as well as recursively contain other statements. Expressions, too, can contain statements (inside function literals), other expressions, and types. And types can contain expressions and other types. The doc syntax within comments has its own syntax, another little sub-language. Commonly used packages in the standard library also have their own distinct syntaxes, such as regexps, f-formats, and templates. And the commands in the Go toolchain have yet another.

Let's look at all the symbols in each sub-language to see the real complexity of the Go language. Like last time, I've ignored alphanumerics used as symbols, such as in \t, 0xFF, or \P{Greek}, or there'd be too many to list.

Lexical items

/* */ comment
// comment until end of line
` ` quoted string
" " quoted string with escapes
' ' character
\ escape in string and character
_ in identifier names
. in floats and complex numbers
+ in float exponents
- in float exponents
; optional statement separator
spaces, tabs, and newlines for whitespace

Perhaps ``` ``` will be added in a future release of Go.
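To make the lexical items above concrete, here is a small sketch that uses most of them in one place. The function name lexicalSamples and the variable names are made up for illustration.

```go
package main

import "fmt"

// lexicalSamples is a hypothetical helper that returns values
// built from the literal forms in the list above.
func lexicalSamples() (raw, cooked string, r rune, f float64, c complex128) {
	/* a general comment
	   may span several lines */
	raw = `a\tb`    // raw string: the backslash stays, so 4 bytes
	cooked = "a\tb" // interpreted string: \t is a single tab, so 3 bytes
	r = '\n'        // character (rune) literal using an escape
	big_part := 1.5e+2; small_part := 25e-1 // _ in names, + and - in exponents, ; between statements
	f = big_part + small_part
	c = 2.5i // . in an imaginary literal
	return
}

func main() {
	fmt.Println(lexicalSamples())
}
```

Note that gofmt would split the line containing the semicolon into two lines, which is why explicit semicolons are rarely seen in formatted Go code.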

Top-levels

( ) in import, type, const, and var specs
. in imports
_ in imports
= in const and var declarations
, in const, var, and type declarations
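A rough sketch of these top-level symbols in use follows. The type, constant, and variable names are invented for the example, and the blank import of image/png is just one convenient package to import for its side effects.

```go
package main

import (
	"fmt"
	. "math"      // dot import: Sqrt can be used without the math. qualifier
	_ "image/png" // blank import: linked in only for its init side effects
)

type ( // ( ) groups several type specs
	Celsius float64
	Point   struct{ X, Y int } // , between field names
)

const (
	width, height = 3, 4 // , separates names and values; = binds them
)

var (
	diagonal         = Sqrt(width*width + height*height)
	origin, unit     = Point{0, 0}, Point{1, 1}
	boiling  Celsius = 100
)

func main() {
	fmt.Println(diagonal, origin, unit, boiling)
}
```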

Expressions

( ) in parenthesized expressions, type assertions, and calls
. to qualify identifiers
[ ] for indexing
... in variadic calls and array literal lengths
{ } in literals
, in literals
: in keyed struct, map, and array literals, and slice expressions
|| for short-circuit or
&& for short-circuit and
== for equals
!= for not equals
< for less than
<= for less or equal
> for greater than
>= for greater or equal
+ for add or unary positive
- for minus or unary negative
& for addressing and bitwise and
* for pointer contents and multiplication
| for bitwise or
^ for bitwise xor and bitwise complement
/ for division
% for mod
<< for left shift
>> for right shift
&^ for bitwise clear
! for boolean not
<- for channel receives
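Some of the less familiar expression symbols are easier to follow in running code. The sketch below is only an illustration; point, sum, and expressionSamples are hypothetical names.

```go
package main

import "fmt"

type point struct{ x, y int }

// sum is a hypothetical variadic helper.
func sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func expressionSamples() (int, int, int, int) {
	primes := [...]int{2, 3, 5, 7, 11} // ... makes the compiler count the elements
	middle := primes[1:4]              // : in a slice expression selects 3, 5, 7
	p := &point{x: 1, y: 2}            // & takes an address; : pairs keys with values
	ch := make(chan int, 1)
	ch <- 12                           // a send statement, so the receive below has a value
	cleared := <-ch &^ 4               // <- receives 12, then &^ clears bit 2
	var boxed interface{} = (*p).x + p.y // * follows the pointer; . selects a field
	n := boxed.(int)                   // ( ) in a type assertion
	return sum(middle...), cleared, n, ^0 // ... spreads a slice; unary ^ is complement
}

func main() {
	fmt.Println(expressionSamples())
}
```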

Statements

{ } in blocks
:= in short variable declarations
, in short var decls and case lists
: in labels and switch/select statement clauses
<- for channel sends
++ and -- for increments and decrements
= for assignments
+= for add assignments
-= for minus assignments
&= for bitwise and assignments
*= for multiplication assignments
|= for bitwise or assignments
^= for bitwise xor assignments
/= for division assignments
%= for mod assignments
<<= for left shift assignments
>>= for right shift assignments
&^= for bitwise clear assignments
; in for and if statement headers
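The statement symbols can be sketched in one short function. This is an illustrative example only, and statementSamples is a made-up name.

```go
package main

import "fmt"

func statementSamples() (int, string) {
	total := 0 // := in a short variable declaration
	ch := make(chan string, 1)
	ch <- "ready" // <- as a send statement

outer: // : after a label
	for i := 0; i < 5; i++ { // ; separates the three parts of the for header
		switch {
		case i == 1, i == 2: // , in a case list; : ends the clause
			continue outer
		case i == 4:
			break outer
		}
		total += i // reached only for i = 0 and i = 3
	}
	total <<= 2 // 3 becomes 12
	total &^= 4 // bit 2 cleared: 12 becomes 8
	total++     // 9

	msg := "empty"
	select {
	case m := <-ch: // : in a select clause
		msg = m
	default:
	}
	if n := len(msg); n > 0 { // ; separates the init statement from the condition
		msg += "!"
	}
	return total, msg
}

func main() {
	fmt.Println(statementSamples())
}
```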

Types

[ ] in array, slice, and map types
* in pointers
{ } in structs
( ) in interfaces and functions
. for referencing other types
, in structs and function parameters
... in function parameters
<- in channels

Note that * has opposite meanings in the two sub-languages: in a type it builds a pointer type, whereas in an expression it follows a pointer to its contents.
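The two roles of * sit side by side in the sketch below, along with the other type symbols. All of the type names and the increment function are invented for illustration.

```go
package main

import "fmt"

type (
	grid     [2][3]int      // [ ] in an array type
	names    []string       // [ ] in a slice type
	ages     map[string]int // [ ] around a map's key type
	pair     struct{ x, y int } // { } and , in a struct
	producer <-chan int     // <- marks a receive-only channel
	consumer chan<- int     // <- marks a send-only channel
	printer  interface {
		Print(items ...fmt.Stringer) // ( ), ..., and . for a type from another package
	}
)

// In the parameter's type, *int means "pointer to int".
func increment(p *int) {
	*p++ // in the expression, *p means "the int that p points to"
}

func main() {
	n := 41
	increment(&n)
	fmt.Println(n)
}
```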

regexp package

. match any character
[ ] character class
^ character class not; match beginning of line/input
- character class range; unset flags
\ various escapes; quote following character
[: :] ASCII character class
{ } Unicode character class, match exact number of times
| alternation
* greedy repetition (zero or more)
+ greedy repetition (one or more)
? greedy option, also reluctant adverb
{ , } match in range of times
( ) capturing group
(?: ) non-capturing group, and setting flags
(?P<name> ) named capturing group
(? ) setting flags
$ match end of line/input
\x{ } hex character code
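Here is a brief sketch of several of these regexp symbols at work, using only the standard regexp package. The patterns and the name regexpSamples are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

func regexpSamples() (string, string, string, bool, bool) {
	// (?P<year> ) is a named capturing group; {4} and {2} are exact counts.
	date := regexp.MustCompile(`^(?P<year>\d{4})-(?P<month>\d{2})$`)
	m := date.FindStringSubmatch("2017-02")
	groupName := date.SubexpNames()[1]

	// +? is the reluctant form of +, so the match stops at the first >.
	lazy := regexp.MustCompile(`<.+?>`).FindString("<a><b>")

	// (?i) sets the case-insensitive flag; \p{Greek} is a Unicode class.
	greek := regexp.MustCompile(`(?i)^go \p{Greek}+$`).MatchString("GO \u03b1\u03b2\u03b3")

	// [[:alpha:]] is an ASCII class; \x{263A} is a hex character code.
	smile := regexp.MustCompile(`^[[:alpha:]]+\x{263A}$`).MatchString("hi\u263a")

	return m[1], groupName, lazy, greek, smile
}

func main() {
	fmt.Println(regexpSamples())
}
```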

fmt package

% to indicate insertion points
# alternative format
. for formatting floats
+ sign for numerics
- pad with spaces
[ ] for parameter repetitions
* for width or precision insertion
space for elided sign in numeric
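Each fmt symbol above can be seen in a one-line format string. The sketch below collects them in a hypothetical fmtSamples function, with the resulting text noted beside each call.

```go
package main

import "fmt"

func fmtSamples() []string {
	return []string{
		fmt.Sprintf("%+d", 7),          // "+7": always show the sign
		fmt.Sprintf("%#x", 255),        // "0xff": alternative format
		fmt.Sprintf("%.2f", 3.14159),   // "3.14": precision after the .
		fmt.Sprintf("%-4d|", 7),        // "7   |": pad with spaces on the right
		fmt.Sprintf("%[1]d %[1]x", 10), // "10 a": reuse the first argument
		fmt.Sprintf("%*d", 4, 7),       // "   7": width taken from an argument
		fmt.Sprintf("% d", 7),          // " 7": space where the sign would go
	}
}

func main() {
	for _, s := range fmtSamples() {
		fmt.Printf("%q\n", s)
	}
}
```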

template package

{{ }} code escape
. field reference
{{- trim preceding whitespace
- }} trim trailing whitespace
{{/* */ }} comment
" " named template
$ variable name
( ) parenthesized actions
:= variable capture
| piped actions
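As a small sketch of the template symbols, the program below uses text/template (html/template shares the same syntax). The template text, the Names field, and the render function are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// src is split over several raw strings only for readability.
const src = `{{/* greet each name */}}` + // comment
	`{{define "hi"}}Hello, {{.}}!{{end}}` + // " " names a template; . is the current value
	`{{$n := len .Names -}}   ` + // $ variable, := capture, -}} trims the spaces after it
	`{{range .Names}}   {{- template "hi" .}} {{end}}` + // {{- trims the spaces before it
	`{{printf "%d of %d" $n (len .Names) | printf "%q"}}` // ( ) and | chain actions

func render(names []string) (string, error) {
	t, err := template.New("demo").Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, struct{ Names []string }{names})
	return buf.String(), err
}

func main() {
	out, err := render([]string{"Go", "Groovy"})
	fmt.Println(out, err)
}
```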

Doc comments

+ in build directives
- in build directives
: in go generate
= in go generate
indentation and blank lines in doc comments

commands

- in flags
-- in various flags
= in various flags
/ in directory paths
, to separate args
. in directory paths and go doc arguments
.. in directory paths
... in directory paths

artifact names

_ in filenames


The quantity of these punctuation marks and other symbols gives an idea of how complicated the grammar of Go really is, before we even consider the role of alphanumerics. But even with all this syntactic complexity, it's still only about half of Java or Groovy's syntactic complexity.


Last edited Feb 16 at 11:54 PM by gavingrover, version 2
