March 30, 2020 Regular Expressions Sources
Why regular expressions with dot (".") work differently in Go compared to PHP and JavaScript.
To enable code syntax highlighting on this website, I use regular expressions. The logic is simple — I put source code into special HTML tags. When a post loads, I process these tags — search for them using regular expressions, and replace the source code with highlighted versions.
I spent a lot of time trying to understand why some code examples were not matched by regular expressions. I used the dot “.” special character to match any symbol inside my tag. Look at the following regexp and text example and guess if it matches or not:
Regexp:
<tag>(.*?)</tag>
Text:
<tag>1
2
3</tag>
If you have PHP experience, your answer will probably be “yes”.
However, a simple example from the Go Playground makes it clear that the answer is actually “no”:
match, _ := regexp.MatchString("<tag>(.*)</tag>", "<tag>1\n2\n3</tag>")
fmt.Println(match)
// false
Then I searched Go sources trying to understand how Go deals with character classes, and I found the following list of flags:
const (
	FoldCase      Flags = 1 << iota // case-insensitive match
	Literal                         // treat pattern as literal string
	ClassNL                         // allow character classes like [^a-z] and [[:space:]] to match newline
	DotNL                           // allow . to match newline
	OneLine                         // treat ^ and $ as only matching at beginning and end of text
	NonGreedy                       // make repetition operators default to non-greedy
	PerlX                           // allow Perl extensions
	UnicodeGroups                   // allow \p{Han}, \P{Han} for Unicode group and negation
	WasDollar                       // regexp OpEndText was $, not \z
	Simple                          // regexp contains no counted repetition

	MatchNL = ClassNL | DotNL

	Perl        = ClassNL | OneLine | PerlX | UnicodeGroups // as close to Perl as possible
	POSIX Flags = 0                                         // POSIX syntax
)
According to the sources, Go works in the following way:
1. syntax.Parse uses Flags to “plan” regular expression execution (to match regexp symbols to operations).
2. regexp.Regexp (a public struct) is created using the results of syntax.Parse.
So we need to compile the regexp with the DotNL flag.
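When I looked at how the regexp package calls syntax.Parse, I found that only two flag sets are ever passed: Perl (from regexp.Compile) and POSIX (from regexp.CompilePOSIX). Neither includes DotNL, so there is no function argument in the regexp package that makes dot match newlines. The flag can only be switched on from inside the pattern itself, with the (?s) group described in the regexp/syntax documentation.
The regexp I ended up using avoids the dot altogether: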
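<tag>([[:graph:]\s]*?)</tag>
(Inside an interpreted Go string literal the backslash has to be doubled: \\s.)
There are many predefined character classes, listed in the regexp/syntax package documentation. I combined two of them inside the [] brackets: [[:graph:]] covers visible ASCII characters and \s covers whitespace, including newlines. Non-ASCII characters are not covered by this pair.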
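As a quick check, the sketch below runs three patterns against the multi-line text from the beginning of the post: the original dot pattern, the character-class pattern, and a variant that uses the inline s flag listed in the regexp/syntax documentation. The matches helper is a hypothetical name introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it compiles pattern (panicking if the
// pattern is invalid) and reports whether it matches text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	text := "<tag>1\n2\n3</tag>"

	// Plain dot stops at the first newline.
	fmt.Println(matches(`<tag>(.*?)</tag>`, text)) // false

	// Visible ASCII characters plus whitespace, so newlines are allowed.
	fmt.Println(matches(`<tag>([[:graph:]\s]*?)</tag>`, text)) // true

	// The inline s flag lets dot match newlines as well.
	fmt.Println(matches(`(?s)<tag>(.*?)</tag>`, text)) // true
}
```

Raw string literals (backticks) are used so that \s can be written with a single backslash. The (?s) variant also accepts non-ASCII content between the tags, which the character-class version does not.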