dish Monitoring Service

krusty + tack

A small & fast monitoring service written in Go: introducing dish by vxn.

Introduction

dish is a tiny, one-shot executable written in Go. It is meant to help monitor remote endpoints, websites and services with ease when you do not wish to resort to heavier monitoring solutions.

Fig.: The official logo of dish

Being a one-shot executable means that to monitor the specified services you start dish, it performs the checks, optionally submits alerts, and then exits. There is no long-running agent or server; this, combined with it being a single, small binary, makes dish an easily maintainable and portable solution.

See below for a quick example of running dish:

Fig.: A basic dish run example

Installation

Following the brief introduction, let's fetch and install dish!

Four methods are currently available:

  • Local Go runtime
  • Docker
  • Homebrew
  • Manual download of the built binary

Local Go runtime (Linux)

The Go toolchain is required to build a Go project from source. You can find and install it via this link.

The installation (which includes building from source) is very easy to start; just type:

shell
# Fetch and install the specific version (after the @ sign)
go install go.vxn.dev/dish/cmd/dish@latest

export PATH=$PATH:~/go/bin

The export command ensures that the shell can find the dish executable when you invoke it by name.

Docker

Another way of running dish is to fetch an image containing the Go toolchain and build dish inside a container. This process uses a multi-stage build, where the build and release stages are isolated and the final executable is copied from the build stage into a clean release stage.

To use this approach, the Docker engine has to be installed on the system.

The build process is wrapped in a GNU Make target, so in the repository's root just type:

shell
make build

This will build a lightweight image containing just the necessary OS runtime (the size varies depending on the base image of choice, e.g. alpine vs. debian) and the dish executable itself.

The image can be used in many ways; some of the options are:

  • Using the provided compose stack example from the repo (hardcoded into the compose configuration file)
  • Using native Docker run

Examples are shown below.

shell
# Run using docker compose stack
make run

# Run using native docker run
docker run --rm \
    dish:1.10.4-go1.23 \
    -timeout 15 \
    ${SOURCE_URL}

Homebrew

Simply run the following:

shell
brew install dish

Manual Download

Download the built binary for your OS and architecture from our GitHub repository.

Configuration

dish provides multiple configurable parameters, such as the source of the endpoints to be monitored (also referred to as sockets), which notification channels to use, or whether successful checks should also be reported through these channels.

Socket List

The sockets can be one of the following:

  • A generic server exposing a TCP port
  • An HTTP/S server or proxy

A simple configuration of a socket to be checked can be seen below:

sockets.json
{
  "sockets": [
    {
      "id": "vxn_dev_https",
      "socket_name": "vxn-dev HTTPS",
      "host_name": "https://vxn.dev",
      "port_tcp": 443,
      "path_http": "/",
      "expected_http_code_array": [200]
    }
  ]
}

One way we can tell dish which sockets should be checked is by using a JSON file as the source argument (in the format shown above):

shell
dish /opt/dish/sockets.json

The other way is to provide the same configuration format as a response from a remote JSON API:

shell
dish myremoteapi.example.xyz/dish/sockets

Flags

There is a plethora of supported flags that alter the behavior of dish. They can, for example, specify which channels to use for notifications or whether the socket list fetched from a remote API should be cached.

This command will show the list of available flags:

shell
dish -h

An example of the output would look like this:

shell
dish -h
Usage of dish:
  -cache
        a bool, specifies whether to cache the socket list fetched from the remote API source
  -cacheDir string
        a string, specifies the directory used to cache the socket list fetched from the remote API source (default ".cache")
  -cacheTTL uint
        an int, time duration (in minutes) for which the cached list of sockets is valid (default 10)
  -hname string
        a string, name of a custom additional header to be used when fetching and pushing results to the remote API (used mainly for auth purposes)
  -hvalue string
        a string, value of the custom additional header to be used when fetching and pushing results to the remote API (used mainly for auth purposes)
  -machineNotifySuccess
        a bool, specifies whether successful checks with no failures should be reported to machine channels
  -name string
        a string, dish instance name (default "generic-dish")
  -target string
        a string, result update path/URL to pushgateway, plaintext/byte output
  -telegramBotToken string
        a string, Telegram bot private token
  -telegramChatID string
        a string, Telegram chat/channel ID
  -textNotifySuccess
        a bool, specifies whether successful checks with no failures should be reported to text channels
  -timeout uint
        an int, timeout in seconds for http and tcp calls (default 10)
  -updateURL string
        a string, API endpoint URL for pushing results
  -verbose
        a bool, console stdout logging toggle
  -webhookURL string
        a string, URL of webhook endpoint
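
For instance, a verbose run with a shorter timeout that also reports successful checks to text channels could be composed as follows (a sketch only; the flag values and the socket list path are illustrative):

shell
# Verbose run with a 5-second timeout, also reporting successful checks
dish -verbose \
     -timeout 5 \
     -textNotifySuccess \
     -name "my-dish" \
     /opt/dish/sockets.json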

Usage and Integrations

In this section, some more detailed examples of usage are presented.

Telegram

For alert notifications when one or more checks fail during a run, an integration with the Telegram IM provider is available. The integration presumes a Telegram group exists with a registered Telegram bot added as a member. To enable the integration, two flags have to be appended to the CLI command:

  • -telegramBotToken to specify the secret identifying the bot (the message is sent on its behalf)
  • -telegramChatID to specify the chat/group where the composed report message is to be sent

An extended example is shown below:

shell
# Load sockets from a sockets.json file and use the Telegram provider for alerting
dish -telegramBotToken "123:AAAbcD_ef" \
     -telegramChatID "-123456789" \
     /opt/dish/sockets.json

The resulting Telegram notification then looks like this:

Fig.: Example Telegram alert message

Prometheus Pushgateway

Another way to ensure quick notifications is an integration with Prometheus via Pushgateway. A potential notification will be slightly delayed, because Prometheus scrapes its targets periodically, usually at intervals of tens of seconds.

shell
# Use a remote JSON API endpoint as the socket source and push the results to Pushgateway
dish -target https://pushgw.example.com/ \
     https://api.example.com/dish/sockets

Fig.: dish Pushgateway integration

Remote API

Not only can you use your own API endpoint to provide dish with sockets to be checked, you can also tell it to push the check results back to your endpoint! This way, you can extend the monitoring functionality in any way you wish.

Remote API integration also supports using a custom header (used mostly for authorization) via the -hname and -hvalue flags.

shell
# Use a remote JSON API endpoint as the socket source and push the results to a result endpoint
dish -updateURL https://api.example.com/dish/results \
     -hname X-Auth-Key \
     -hvalue yourkey \
     https://api.example.com/dish/sockets

dish pushes the results to the target API in the following JSON format:

JSON request body
{
  "dish_results": {
    "openttd_TCP": false,
    "text_n0p_cz_https": true,
    "vxn_dev_https": true
  }
}

Webhooks

You can also use the -webhookURL flag for a typical webhooks integration:

shell
# Use a remote JSON API endpoint as the socket source and push the results to a webhook URL
dish -webhookURL https://mywebhookurl.xyz \
     https://api.example.com/dish/sockets

dish pushes the results to the webhook in the same JSON format as when pushing to a remote API:

JSON request body
{
  "dish_results": {
    "openttd_TCP": false,
    "text_n0p_cz_https": true,
    "vxn_dev_https": true
  }
}

Custom Integrations

What you do with the results is then entirely up to you. We have built multiple internal integrations which rely on dish pushing these results to our own API endpoint. A few of these will be showcased below.

Status Page

Our very own status page loads our public services, their states and times of their last check from our internal API. dish pushes the results of the checks of these services to the same API. This way, if any check fails, it is automatically and immediately reflected on the status page.

Fig.: vxn.dev status page

dish GUI

dish GUI is our graphical interface built for managing dish sockets stored in our API. It supports the usual CRUD functionality and helps us quickly make any socket visible on or hidden from our status page, set its maintenance status, or mute it. Muted sockets are ignored by dish when loading the socket list from our API.

dish GUI also connects to our API's real-time events endpoint using SSE. This way, if any of the sockets is reported to the API by dish as being down, we can get notified in real time right in the dashboard:

Fig.: Socket down event in dish GUI triggered by dish

The same applies in the opposite situation, where a previously failed socket comes back up:

Fig.: Socket up event in dish GUI triggered by dish

Under the hood

In this part, some technical aspects of the project will be explained in more detail.

Socket fetch

To do its job, dish needs a list of sockets to check. As mentioned in the introduction, either a local JSON file or a remote API endpoint can be used as the source.

If the source argument is an HTTP/S URL pointing to a remote source, dish performs an API call to the destination address to fetch the socket list (see the diagram below).

Fig.: Socket fetching decision diagram
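
A minimal sketch of this decision could look like the following (an illustration only; the function name and internals are not dish's actual code):

go
package sketch

import (
    "io"
    "net/http"
    "os"
    "strings"
)

// loadSockets decides where to load the raw socket list from:
// a remote HTTP/S API endpoint or a local JSON file.
func loadSockets(source string) ([]byte, error) {
    // A source starting with http:// or https:// is treated as a remote API.
    if strings.HasPrefix(source, "http://") || strings.HasPrefix(source, "https://") {
        resp, err := http.Get(source)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }

    // Otherwise, the source is treated as a path to a local file.
    return os.ReadFile(source)
}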

Socket List Caching

dish also supports caching of the configuration pulled from a remote API. This is useful when running frequent periodic checks (e.g. every minute) to ensure your important services and websites are up and running. In these cases, caching prevents hitting your API endpoint frequently and (most often) unnecessarily. If your remote API endpoint goes down, the cached configuration (if present) will be used to keep the checks running until the endpoint is back up, even when the cache is considered expired (an old/expired list of sockets to check is better than none!).
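
Since dish itself is one-shot, such frequent periodic checks are typically driven by an external scheduler. A minimal sketch using cron together with the caching flags could look like this (the binary path and cache directory are illustrative):

crontab
# Run dish every minute; the socket list fetched from the remote API is
# cached for 10 minutes, so the API is not hit on every single run
* * * * * /usr/local/bin/dish -cache -cacheDir /var/cache/dish -cacheTTL 10 https://api.example.com/dish/sockets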

Concurrent run

Once the socket list is loaded, dish spins off a goroutine per socket, so each socket is checked in its own goroutine:

  • TCP sockets are checked directly by dialing the remote host and port combination.
  • HTTP/S endpoints are checked via a GET request using dish's HTTP client.

These checks run concurrently (though not necessarily in parallel, due to how concurrency works in Go). This approach shrinks the average execution time from several seconds to around one second (sometimes even less).
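
A simplified sketch of this idea is shown below (not dish's actual implementation; the Socket and Result types and their fields are made up for illustration, loosely mirroring the sockets.json keys):

go
package sketch

import (
    "fmt"
    "net"
    "net/http"
    "time"
)

// Socket is a simplified stand-in for dish's socket definition.
type Socket struct {
    ID   string
    Host string // e.g. "https://vxn.dev" or "game.example.com"
    Port int
    HTTP bool
}

// Result carries the outcome of a single check.
type Result struct {
    SocketID string
    Passed   bool
}

// checkSocket performs a single check: an HTTP GET for HTTP/S endpoints,
// a plain TCP dial for everything else, both bounded by the timeout.
func checkSocket(s Socket, timeout time.Duration) bool {
    if s.HTTP {
        client := http.Client{Timeout: timeout}
        resp, err := client.Get(s.Host)
        if err != nil {
            return false
        }
        defer resp.Body.Close()
        return resp.StatusCode == http.StatusOK
    }

    conn, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", s.Host, s.Port), timeout)
    if err != nil {
        return false
    }
    conn.Close()
    return true
}

// runChecks spins off one goroutine per socket, each reporting into its
// own dedicated channel (merged later via fan-in, see below).
func runChecks(sockets []Socket, timeout time.Duration) []chan Result {
    channels := make([]chan Result, 0, len(sockets))
    for _, s := range sockets {
        ch := make(chan Result, 1)
        channels = append(channels, ch)
        go func(s Socket, ch chan Result) {
            ch <- Result{SocketID: s.ID, Passed: checkSocket(s, timeout)}
            close(ch)
        }(s, ch)
    }
    return channels
}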

Comparison Against a Serial Run

If the serial approach were used, where sockets are checked one by one, a problem would arise: when a socket times out, the rest of the socket list has to wait until the current check times out (10 seconds by default). This could cause an unnecessarily long execution time, depending on the number of sockets timing out. For example, with the default timeout and 6 sockets that all time out, the run would take a whole minute.

Check reports

Each goroutine performing a check is assigned a dedicated channel. These channels are then combined into one common channel using the fan-in technique right after the last goroutine is spawned:

runner.go
func fanInChannels(channels ...chan socket.Result) <-chan socket.Result {
    var wg sync.WaitGroup
    out := make(chan socket.Result)

    // Start a goroutine for each channel
    for _, channel := range channels {
        wg.Add(1)
        go func(ch <-chan socket.Result) {
            defer wg.Done()
            for result := range ch {
                // Forward the result to the output channel
                out <- result
            }
        }(channel)
    }

    // Close the output channel once all workers are done
    go func() {
        wg.Wait()
        close(out)
    }()

    return out
}

This gives the consumer a single source to read the socket check reports from.

After all checks are performed (whether they succeed, fail or time out), the common channel is ready to be read from, and a message reporting the results of the checks is prepared. There are two types of report messages: text (for text channels such as Telegram) and machine (for machine integrations such as webhooks or Prometheus Pushgateway).
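
To make the consuming side a bit more concrete, a minimal sketch of assembling the machine-readable report could look like this (reusing the simplified Result type from the concurrency sketch above; dish's actual socket.Result fields may differ):

go
// buildReport drains the single fan-in channel and folds every result
// into the map that machine channels (webhooks, the remote API) receive
// as the "dish_results" payload shown earlier.
func buildReport(results <-chan Result) map[string]bool {
    dishResults := make(map[string]bool)
    for res := range results {
        dishResults[res.SocketID] = res.Passed
    }
    return dishResults
}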

Conclusion

dish started out as a small learning project. Over time it grew on us thanks to its simplicity, ease of use and maintainability. We have been using it for over three years to monitor our services and could not imagine managing without it. We hope you find it as useful as we do.

You can find the source code on our public GitHub profile or just visit the repository directly via this link.