Part IV: Telegram notifications

Now that all the monitoring components are in place, we might wonder how to receive a notification when something goes wrong. Good question.

Alertmanager does the trick, and it natively supports the major channels: Slack, PagerDuty, OpsGenie and more. At the time of writing, however, it does not support Telegram, which is quite popular even for business communication in smaller companies.

Let's fix that with a few AWS resources and a simple Go Lambda function.

SNS

SNS is one of the channels Alertmanager supports. It's actually very convenient in this case: since we have an IAM role attached to the instance, we don't need to bother with credentials.

Let's dive into Terraform again.

resource "aws_sns_topic" "prometheus_alerts" {
  name = "prometheus_alerts"
}

SQS

The next part is the queue. We could connect a Lambda function directly to SNS, but SQS gives us more reliability. SQS will effectively become the only SNS subscriber, and messages will be queued for further processing.

resource "aws_sqs_queue" "prometheus_alerts" {
  name = "prometheus_alerts"
}

And since the SNS topic will be pushing messages to this queue, we also need to allow this:

resource "aws_sqs_queue_policy" "prometheus_alerts" {
  queue_url = aws_sqs_queue.prometheus_alerts.id

  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Id": "sqspolicy",
  "Statement": [
    {
      "Sid": "First",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": "${aws_sqs_queue.prometheus_alerts.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${aws_sns_topic.prometheus_alerts.arn}"
        }
      }
    }
  ]
}
POLICY
}

Also, create the subscription:

resource "aws_sns_topic_subscription" "prometheus_alerts" {
  topic_arn = aws_sns_topic.prometheus_alerts.arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.prometheus_alerts.arn
}
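
One thing worth keeping in mind with this subscription: unless raw message delivery is enabled (the `raw_message_delivery` argument of `aws_sns_topic_subscription`), SNS wraps each message in a JSON envelope, so the SQS record body is that envelope rather than the alert text itself. Below is a minimal sketch of how the Lambda could unwrap it. The `snsEnvelope` type and the `extractMessage` helper are illustrative names, not part of the original code, and the fallback to the raw body is an assumption about how you want to handle non-envelope input.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snsEnvelope mirrors only the fields of the SNS notification JSON that this
// sketch needs. It is a hypothetical helper type; aws-lambda-go ships a
// fuller events.SNSEntity type that could be used instead.
type snsEnvelope struct {
	Type    string `json:"Type"`
	Message string `json:"Message"`
}

// extractMessage returns the original SNS message from an SQS record body.
// If the body is not an SNS notification envelope (for example because raw
// message delivery is enabled), the body is returned unchanged.
func extractMessage(body string) string {
	var env snsEnvelope
	if err := json.Unmarshal([]byte(body), &env); err != nil {
		return body
	}
	if env.Type != "Notification" {
		return body
	}
	return env.Message
}

func main() {
	wrapped := `{"Type":"Notification","MessageId":"abc","Message":"<b>Alert:</b> disk full"}`
	fmt.Println(extractMessage(wrapped))
	fmt.Println(extractMessage("<b>Alert:</b> disk full"))
}
```

In the handler, this would mean passing `extractMessage(record.Body)` to `sendMessage` instead of `record.Body`.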

Lambda function

Here we go again: Lambda needs a role and permissions, so let's create these resources first.

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

resource "aws_iam_role" "alertmanager_notify" {
  name = "alertmanager_notify"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_policy" "alertmanager_notify" {
  name = "alertmanager_notify"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
        ]
        Resource = [
          "arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:log-group:/aws/lambda/alertmanager_notify",
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogStream",
          "logs:PutLogEvents",
        ]
        Resource = [
          "arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:log-group:/aws/lambda/alertmanager_notify:*",
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "sqs:DeleteMessage",
          "sqs:GetQueueAttributes",
          "sqs:ReceiveMessage",
        ]
        Resource = [
          aws_sqs_queue.prometheus_alerts.arn,
        ]
      },
    ]
  })
}

resource "aws_iam_role_policy_attachment" "alertmanager_notify" {
  role       = aws_iam_role.alertmanager_notify.name
  policy_arn = aws_iam_policy.alertmanager_notify.arn
}

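With these permissions, the Lambda function can read messages from the SQS queue and delete them once processed. The CloudWatch part is self-explanatory: we want to see some logs. Note that the function is only invoked for queue messages once an SQS event source mapping (aws_lambda_event_source_mapping) connects the two; it is not shown in these snippets. The Lambda function definition itself is straightforward: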

resource "aws_lambda_function" "alertmanager_notify" {
  filename         = "${path.module}/assets/function.zip"
  source_code_hash = filebase64sha256("${path.module}/assets/function.zip")
  function_name    = "alertmanager_notify"
  role             = aws_iam_role.alertmanager_notify.arn
  handler          = "main"
  runtime          = "go1.x"

  lifecycle {
    ignore_changes = [
      filename,
      last_modified,
      source_code_hash,
    ]
  }

  environment {
    variables = {
      TELEGRAM_TOKEN   = "<telegram token from BotFather>"
      TELEGRAM_CHANNEL = "<telegram group id>"
    }
  }
}

Note the ${path.module}/assets/function.zip: this is a zip file with some dummy code. We only need this archive for the initial creation, and the lifecycle block tells Terraform to ignore later changes to it. The final code will be pushed to AWS separately.

Lambda code

For the integration with Telegram we're going to use the github.com/go-telegram-bot-api/telegram-bot-api library. It's just a thin wrapper over the Telegram API, which is sufficient for this use case.

Please note that the signatures of handle and sendMessage should ideally accept an interface, some sort of TelegramSender, rather than a concrete type. But for the sake of simplicity we use BotAPI directly 😊

package main

import (
    "context"
    "fmt"
    "os"
    "strconv"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    tgbotapi "github.com/go-telegram-bot-api/telegram-bot-api"
    log "github.com/sirupsen/logrus"
)

var (
    telegramToken   = os.Getenv("TELEGRAM_TOKEN")
    telegramChannel = os.Getenv("TELEGRAM_CHANNEL")
    parseMode       = "HTML"
)

func init() {
    log.SetFormatter(&log.JSONFormatter{})
}

func sendMessage(bot *tgbotapi.BotAPI, channel int64, message string, mode string) error {
    log.Info("sending notification to Telegram")
    msg := tgbotapi.NewMessage(channel, message)
    msg.ParseMode = mode

    _, err := bot.Send(msg)
    if err != nil {
        return fmt.Errorf("could not send message: %w", err)
    }

    return nil
}

func handle(bot *tgbotapi.BotAPI, channel int64) func(context.Context, events.SQSEvent) error {
    return func(ctx context.Context, event events.SQSEvent) error {
        for _, record := range event.Records {
            log.
                WithField("message_id", record.MessageId).
                Info("processing SQS record")
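            err := sendMessage(bot, channel, record.Body, parseMode)
            if err != nil {
                // return the error instead of exiting the process, so Lambda
                // reports a failed invocation and the batch can be retried
                return fmt.Errorf("could not process message %s: %w", record.MessageId, err)
            }
        }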
        return nil
    }
}

func main() {
    // create the Telegram client
    bot, err := tgbotapi.NewBotAPI(telegramToken)
    if err != nil {
        log.Fatalf("could not create telegram client: %s", err)
    }

    // parse the channel id into the int64 expected by the Telegram library
    telegramChannelInt, err := strconv.ParseInt(telegramChannel, 10, 64)
    if err != nil {
        log.Fatalf("could not parse channel id: %s", err)
    }

    lambda.Start(handle(bot, telegramChannelInt))
}

Let's deploy the function!

GOARCH=amd64 GOOS=linux go build main.go
zip function.zip main
aws lambda update-function-code --function-name alertmanager_notify --zip-file fileb://./function.zip

Alertmanager configuration

First of all, we need to allow the instance where Alertmanager is running to publish to the SNS topic. Add this statement to the IAM policy from the first part of this series:

      {
        Effect = "Allow"
        Action = [
          "sns:Publish",
        ]
        Resource = [
          aws_sns_topic.prometheus_alerts.arn,
        ]
      },

The last bit is the YAML for Alertmanager. This is perhaps the simplest configuration:

global:
  resolve_timeout: 1m
receivers:
- name: sns
  sns_configs:
  - message: |
      {{ if eq .Status "firing" }}🔥 {{ end }}{{ if eq .Status "resolved" }}✅ {{ end }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}
      {{ range .Alerts }}
          <b>Alert:</b> {{ .Annotations.title }}{{ if .Labels.severity }} - <code>{{ .Labels.severity }}</code>{{ end }}
          <b>Description:</b> {{ .Annotations.description }}
          <b>Details:</b>
          {{ range .Labels.SortedPairs }}- {{ .Name }}: <i>{{ .Value }}</i>
          {{ end }}
      {{ end }}
    sigv4:
      region: eu-west-1
    topic_arn: <SNS topic ARN>
route:
  group_by:
  - "..."
  group_interval: 30s
  group_wait: 5s
  receiver: sns
  repeat_interval: 3h
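
Alertmanager templates are Go templates, so the first line of the message can be tried out locally. The sketch below renders that headline with Go's `text/template`. The `alertData` struct is a hypothetical, cut-down stand-in for the data Alertmanager passes to templates, and `toUpper` is registered by hand here on the assumption that it behaves like `strings.ToUpper`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// alertData is a hypothetical, cut-down stand-in for the data Alertmanager
// passes to notification templates.
type alertData struct {
	Status       string
	CommonLabels map[string]string
}

// headline is the first line of the message template from the config above.
const headline = `{{ if eq .Status "firing" }}🔥 {{ end }}{{ if eq .Status "resolved" }}✅ {{ end }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}`

// renderHeadline executes the template. toUpper is registered manually
// because it comes from Alertmanager, not from text/template itself.
func renderHeadline(d alertData) (string, error) {
	tmpl, err := template.New("headline").
		Funcs(template.FuncMap{"toUpper": strings.ToUpper}).
		Parse(headline)
	if err != nil {
		return "", fmt.Errorf("could not parse template: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", fmt.Errorf("could not render template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	out, err := renderHeadline(alertData{
		Status:       "firing",
		CommonLabels: map[string]string{"alertname": "HighCPU"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // 🔥 [FIRING] HighCPU
}
```

The alert name `HighCPU` is made up for the example; real values come from the labels of your alerting rules.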

Result

Now we're able to receive alerts in the Telegram group. The following example comes from a similar configuration. Pretty neat, huh?

[Screenshot: example alert delivered to a Telegram group]

Wrap

And this is the end of this series. I've been working on this setup for the past two months and I must say I'm pretty happy with the overall result.

Here's one final disclaimer: do not forget to set up external monitoring for Prometheus itself. Expose the server's readiness probe to the internet and use a tool such as UptimeRobot, because you really want to know when Prometheus goes down.

Do you have any questions? Ping me on Twitter or ask directly here. I'll do my best to help you.