I have been gathering metrics from my DrayTek Vigor 165 modem for a while now, and finally got around to documenting the setup, so now you get to read about it.
I’m using the Vigor 165 to connect to the Internet via a Deutsche Telekom 250 Mbit/s VDSL connection. That modem supports SNMP and can provide metrics like the line speed or quality. A couple of years back, I wanted to get that data into my Grafana dashboards. After some searching, I came across the SNMP Exporter.
The way the exporter works is by regularly making SNMP requests to the targets and providing the data in the standard Prometheus format. And because it involves SNMP, the setup is a bit more involved than your average exporter.
SNMP
SNMP is the Simple Network Management Protocol. As the name implies, the protocol is intended for managing networking devices, like modems or routers. I’ve never worked professionally in the networking area, so I don’t have any experience with actively managing network devices like switches via SNMP. And as far as I’m aware, my modem is the only device that supports SNMP in my Homelab. So I won’t be discussing the configuration part here, only the read-only part my modem uses for providing metrics.
Information in SNMP is organized according to management information base (MIB) files, which specify what variables are available, and their hierarchy. Here is one such file, defining the variables for VDSL information: VDSL2-LINE-MIB. While these files can be read by humans, I always kept to websites like Observium for browsing them.
I think for the purposes of understanding the format, an example result from querying my modem is going to be more illuminating than me trying to describe what SNMP queries look like:
snmpbulkwalk -v 2c -c example 203.0.113.1
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.4 = INTEGER: vdsl2(251)
IF-MIB::ifType.5 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.6 = INTEGER: propVirtual(53)
IF-MIB::ifType.7 = INTEGER: propVirtual(53)
IF-MIB::ifType.8 = INTEGER: propVirtual(53)
IF-MIB::ifType.9 = INTEGER: propVirtual(53)
IF-MIB::ifType.10 = INTEGER: propVirtual(53)
IF-MIB::ifType.11 = INTEGER: propVirtual(53)
IF-MIB::ifType.12 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.13 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.4 = INTEGER: 1500
IF-MIB::ifMtu.5 = INTEGER: 1500
IF-MIB::ifMtu.6 = INTEGER: 1500
IF-MIB::ifMtu.7 = INTEGER: 1500
IF-MIB::ifMtu.8 = INTEGER: 1500
IF-MIB::ifMtu.9 = INTEGER: 1500
IF-MIB::ifMtu.10 = INTEGER: 1500
IF-MIB::ifMtu.11 = INTEGER: 1500
IF-MIB::ifMtu.12 = INTEGER: 1500
IF-MIB::ifMtu.13 = INTEGER: 1500
IF-MIB::ifSpeed.1 = Gauge32: 1000000000
IF-MIB::ifSpeed.4 = Gauge32: 292016000
IF-MIB::ifSpeed.5 = Gauge32: 1000000000
IF-MIB::ifSpeed.6 = Gauge32: 0
IF-MIB::ifSpeed.7 = Gauge32: 0
IF-MIB::ifSpeed.8 = Gauge32: 0
IF-MIB::ifSpeed.9 = Gauge32: 1000000000
IF-MIB::ifSpeed.10 = Gauge32: 1000000000
IF-MIB::ifSpeed.11 = Gauge32: 1000000000
IF-MIB::ifSpeed.12 = Gauge32: 1000000000
IF-MIB::ifSpeed.13 = Gauge32: 1000000000
The snmpbulkwalk command has the advantage over other SNMP commands that it just walks everything the target has to offer, so you don't have to explicitly provide the OIDs (Object Identifiers) to be queried.
The above output shows a couple of values from my modem, regarding the setup of its network interfaces. You can for example see that interface .4 is the VDSL interface, while .1 and .5 are Ethernet interfaces. The ifSpeed object then shows the speed. The -v 2c parameter in the command provides the SNMP version to be used, while -c example defines the community. The community is sort of an identifier.
Before version 3, SNMP does not support any real authentication: the community string is sent in plain text and acts more like a shared identifier than a password. So no credentials need to be provided. As my modem only supports querying but not configuration, that’s okay.
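The output lines above follow a regular shape: a MIB name, an object name, an index, a type and a value. As a minimal sketch, assuming net-snmp's default output format as shown in the listing, they can be split into their parts with a regular expression. The function name and the pattern are only illustrative and do not cover every output variant.

```python
import re

# Matches net-snmp style lines such as
#   IF-MIB::ifSpeed.4 = Gauge32: 292016000
# (illustrative pattern, not a complete grammar of snmpbulkwalk output)
LINE_RE = re.compile(
    r"^(?P<mib>[\w-]+)::(?P<object>\w+)\.(?P<index>[\d.]+)"
    r" = (?P<type>\w+): (?P<value>.*)$"
)


def parse_walk_line(line):
    """Split one output line into mib/object/index/type/value, or return None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


sample = [
    "IF-MIB::ifType.4 = INTEGER: vdsl2(251)",
    "IF-MIB::ifSpeed.4 = Gauge32: 292016000",
    "SNMPv2-SMI::transmission.251.1.2.2.1.2.4.1 = Gauge32: 292016000",
]
parsed = [parse_walk_line(line) for line in sample]
for entry in parsed:
    print(entry)
```

For the first two lines the index is simply the interface number (4 is the VDSL interface). In the third line the whole undecoded remainder of the OID ends up in the index field.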
The interesting information for me comes a bit later in the output of the same command:
[...]
SNMPv2-SMI::transmission.94.1.1.3.1.7.4 = INTEGER: -3
SNMPv2-SMI::transmission.94.1.1.3.1.8.4 = Gauge32: 51374000
SNMPv2-SMI::transmission.94.1.1.4.1.1.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.2.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.3.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.4.1.4.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.1.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.2.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.3.4 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.5.1.4.4 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.1.4.2 = INTEGER: 2
SNMPv2-SMI::transmission.251.1.2.2.1.2.4.1 = Gauge32: 292016000
SNMPv2-SMI::transmission.251.1.2.2.1.2.4.2 = Gauge32: 46718000
SNMPv2-SMI::transmission.251.1.2.2.1.3.4.1 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.3.4.2 = Gauge32: 0
SNMPv2-SMI::transmission.251.1.2.2.1.4.4.1 = INTEGER: 9
SNMPv2-SMI::transmission.251.1.2.2.1.4.4.2 = INTEGER: 0
SNMPv2-SMI::transmission.251.1.2.2.1.5.4.1 = INTEGER: 305
SNMPv2-SMI::transmission.251.1.2.2.1.5.4.2 = INTEGER: 150
SNMPv2-SMI::transmission.251.1.2.2.1.6.4.1 = INTEGER: 0
SNMPv2-SMI::transmission.251.1.2.2.1.6.4.2 = INTEGER: 0
[...]
These outputs, on their own, are not really that useful. Note especially the 251.1.2.2.1 and 94.1.1.3.1 as part of the OIDs. That indicates that snmpbulkwalk did not have the necessary MIBs to properly decode the information received from my modem. This can be fixed by making those MIBs available.
Helpfully, DrayTek provides the supported MIBs for the devices on their website.
Namely, the following MIBs are supported:
All of the links go to mibbrowser, which I’ve found to be a useful site for downloading MIBs. To make snmpbulkwalk use those additional MIBs, download them into a local directory and start snmpbulkwalk like this:
snmpbulkwalk -M /path/to/downloaded/mibs/:/usr/share/snmp/mibs -m ALL -v 2c -c example 203.0.113.1
The /usr/share/snmp/mibs directory is the path to some general MIBs on my Gentoo system, so you don’t have to download the more common MIBs.
With that invocation, the above example becomes a lot clearer:
ADSL-LINE-MIB::adslAturCurrOutputPwr.4 = INTEGER: -3 tenth dBm
ADSL-LINE-MIB::adslAturCurrAttainableRate.4 = Gauge32: 51192000 bps
ADSL-LINE-MIB::adslAtucChanInterleaveDelay.4 = Gauge32: 0 milli-seconds
ADSL-LINE-MIB::adslAtucChanCurrTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAtucChanPrevTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAtucChanCrcBlockLength.4 = Gauge32: 0 byte
ADSL-LINE-MIB::adslAturChanInterleaveDelay.4 = Gauge32: 0 milli-seconds
ADSL-LINE-MIB::adslAturChanCurrTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAturChanPrevTxRate.4 = Gauge32: 0 bps
ADSL-LINE-MIB::adslAturChanCrcBlockLength.4 = Gauge32: 0
VDSL2-LINE-MIB::xdsl2ChStatusUnit.4.xtur = INTEGER: xtur(2)
VDSL2-LINE-MIB::xdsl2ChStatusActDataRate.4.xtuc = Gauge32: 292016000 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusActDataRate.4.xtur = Gauge32: 46718000 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusPrevDataRate.4.xtuc = Gauge32: 0 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusPrevDataRate.4.xtur = Gauge32: 0 bits/second
VDSL2-LINE-MIB::xdsl2ChStatusActDelay.4.xtuc = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 9
VDSL2-LINE-MIB::xdsl2ChStatusActDelay.4.xtur = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 0
VDSL2-LINE-MIB::xdsl2ChStatusActInp.4.xtuc = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 305
VDSL2-LINE-MIB::xdsl2ChStatusActInp.4.xtur = Wrong Type (should be Gauge32 or Unsigned32): INTEGER: 150
VDSL2-LINE-MIB::xdsl2ChStatusInpReport.4.xtuc = INTEGER: 0
VDSL2-LINE-MIB::xdsl2ChStatusInpReport.4.xtur = INTEGER: 0
The addition of the correct MIBs allows snmpbulkwalk to correctly interpret the values coming from the modem. I’m not 100% sure what the Wrong Type errors are about, but I assume that the modem just doesn’t implement the MIB quite correctly.
Configuring the SNMP exporter
As I’ve noted above, the configuration of the SNMP exporter is a bit more involved.
First, a generator config needs to be created. In my case, it looked like this:
auths:
  draytek:
    community: example
    version: 2
modules:
  draytek:
    walk:
      - 1.3.6.1.2.1.10.94.1.1.3.1.6.4
      - 1.3.6.1.2.1.10.94.1.1.3.1.4.4
      - 1.3.6.1.2.1.10.94.1.1.3.1.5.4
      - 1.3.6.1.2.1.1.5
      - 1.3.6.1.2.1.10.251.1.2.2.1.2.4.1
      - 1.3.6.1.2.1.10.251.1.2.2.1.2.4.2
As you can see, I’m only declaring a handful of values I actually want to gather. I’m skipping most of the per-interface data, because I can already get that data from the PPPoE interface on my OPNsense router. I’ve restricted my gathering to the modem specific data:
1.3.6.1.2.1.10.94.1.1.3.1.6: Current status of the ADSL line
1.3.6.1.2.1.10.94.1.1.3.1.4: Current noise margin on the line
1.3.6.1.2.1.10.94.1.1.3.1.5: Current attenuation on the line
1.3.6.1.2.1.1.5: System name
1.3.6.1.2.1.10.251.1.2.2.1.2: Current actual data rate on the line, Up and Down
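The OIDs in the generator config are longer than the ones in this list because they also carry the table index. As a rough sketch of how such an instance OID breaks down into an object and its index, assuming the object names from the MIBs and a hypothetical helper called split_oid:

```python
# Object OIDs from the list above, with their names from the MIBs.
# split_oid() is an illustrative helper, not part of any SNMP tooling.
OBJECTS = {
    "1.3.6.1.2.1.10.94.1.1.3.1.6": "adslAturCurrStatus",
    "1.3.6.1.2.1.10.94.1.1.3.1.4": "adslAturCurrSnrMgn",
    "1.3.6.1.2.1.10.94.1.1.3.1.5": "adslAturCurrAtn",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.2.1.10.251.1.2.2.1.2": "xdsl2ChStatusActDataRate",
}


def split_oid(oid):
    """Return (object name, index parts) using the longest matching prefix."""
    for prefix in sorted(OBJECTS, key=len, reverse=True):
        if oid == prefix or oid.startswith(prefix + "."):
            rest = oid[len(prefix):].lstrip(".")
            index = [int(part) for part in rest.split(".")] if rest else []
            return OBJECTS[prefix], index
    raise KeyError(f"no known object for {oid}")


print(split_oid("1.3.6.1.2.1.10.251.1.2.2.1.2.4.1"))  # ('xdsl2ChStatusActDataRate', [4, 1])
print(split_oid("1.3.6.1.2.1.10.94.1.1.3.1.5.4"))     # ('adslAturCurrAtn', [4])
print(split_oid("1.3.6.1.2.1.1.5.0"))                 # ('sysName', [0])
```

The trailing 4 is the interface index of the VDSL interface. For the data rate, the extra 1 or 2 selects the unit (xtuc or xtur).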
With that defined, the SNMP exporter config file can be generated. But first, I needed to build the generator. For this, I cloned the SNMP exporter repo and switched into the generator/ directory. There, I ran this command:
make generator
And then finally, I was able to run the command to generate the SNMP exporter config file:
./generator generate -m /PATH/TO/MIB/DIR -g /PATH/TO/generator.yml -o snmp.yaml
With the configuration above, the result looks like this:
# WARNING: This file was auto-generated using snmp_exporter generator, manual changes will be lost.
auths:
  draytek:
    community: example
    security_level: noAuthNoPriv
    auth_protocol: MD5
    priv_protocol: DES
    version: 2
modules:
  draytek:
    get:
    - 1.3.6.1.2.1.1.5.0
    - 1.3.6.1.2.1.10.251.1.2.2.1.2.4.1
    - 1.3.6.1.2.1.10.251.1.2.2.1.2.4.2
    - 1.3.6.1.2.1.10.94.1.1.3.1.4.4
    - 1.3.6.1.2.1.10.94.1.1.3.1.5.4
    - 1.3.6.1.2.1.10.94.1.1.3.1.6.4
    metrics:
    - name: sysName
      oid: 1.3.6.1.2.1.1.5
      type: DisplayString
      help: An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5
    - name: xdsl2ChStatusActDataRate
      oid: 1.3.6.1.2.1.10.251.1.2.2.1.2
      type: gauge
      help: The actual net data rate at which the bearer channel is operating, if
        in L0 power management state - 1.3.6.1.2.1.10.251.1.2.2.1.2
      indexes:
      - labelname: ifIndex
        type: gauge
      - labelname: xdsl2ChStatusUnit
        type: gauge
        enum_values:
          1: xtuc
          2: xtur
    - name: adslAturCurrSnrMgn
      oid: 1.3.6.1.2.1.10.94.1.1.3.1.4
      type: gauge
      help: Noise Margin as seen by this ATU with respect to its received signal in
        tenth dB. - 1.3.6.1.2.1.10.94.1.1.3.1.4
      indexes:
      - labelname: ifIndex
        type: gauge
    - name: adslAturCurrAtn
      oid: 1.3.6.1.2.1.10.94.1.1.3.1.5
      type: gauge
      help: Measured difference in the total power transmitted by the peer ATU and
        the total power received by this ATU. - 1.3.6.1.2.1.10.94.1.1.3.1.5
      indexes:
      - labelname: ifIndex
        type: gauge
    - name: adslAturCurrStatus
      oid: 1.3.6.1.2.1.10.94.1.1.3.1.6
      type: Bits
      help: Indicates current state of the ATUR line - 1.3.6.1.2.1.10.94.1.1.3.1.6
      indexes:
      - labelname: ifIndex
        type: gauge
      enum_values:
        0: noDefect
        1: lossOfFraming
        2: lossOfSignal
        3: lossOfPower
        4: lossOfSignalQuality
This file defines the translation of the OID values to Prometheus metrics. The output in Prometheus format ultimately looks like this:
# HELP adslAturCurrAtn Measured difference in the total power transmitted by the peer ATU and the total power received by this ATU. - 1.3.6.1.2.1.10.94.1.1.3.1.5
# TYPE adslAturCurrAtn gauge
adslAturCurrAtn{ifIndex="4"} 5
# HELP adslAturCurrSnrMgn Noise Margin as seen by this ATU with respect to its received signal in tenth dB. - 1.3.6.1.2.1.10.94.1.1.3.1.4
# TYPE adslAturCurrSnrMgn gauge
adslAturCurrSnrMgn{ifIndex="4"} 8
# HELP adslAturCurrStatus Indicates current state of the ATUR line - 1.3.6.1.2.1.10.94.1.1.3.1.6 (Bits)
# TYPE adslAturCurrStatus gauge
adslAturCurrStatus{adslAturCurrStatus="lossOfFraming",ifIndex="4"} 0
adslAturCurrStatus{adslAturCurrStatus="lossOfPower",ifIndex="4"} 0
adslAturCurrStatus{adslAturCurrStatus="lossOfSignal",ifIndex="4"} 0
adslAturCurrStatus{adslAturCurrStatus="lossOfSignalQuality",ifIndex="4"} 0
adslAturCurrStatus{adslAturCurrStatus="noDefect",ifIndex="4"} 0
# HELP sysName An administratively-assigned name for this managed node - 1.3.6.1.2.1.1.5
# TYPE sysName gauge
sysName{sysName="foobar"} 1
# HELP xdsl2ChStatusActDataRate The actual net data rate at which the bearer channel is operating, if in L0 power management state - 1.3.6.1.2.1.10.251.1.2.2.1.2
# TYPE xdsl2ChStatusActDataRate gauge
xdsl2ChStatusActDataRate{ifIndex="4",xdsl2ChStatusUnit="1"} 2.92016e+08
xdsl2ChStatusActDataRate{ifIndex="4",xdsl2ChStatusUnit="2"} 4.6718e+07
The one thing that I could never really get to work is the adslAturCurrStatus value. It should be showing the current state of the line, indicating whether the DSL line itself is up or not. But I never got it to show anything. All values are always zero, even though I would have expected the noDefect value to be 1 when everything is alright. But it only ever gave me zeroes.
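For context, a Bits value arrives as an octet string in which bit 0 is the most significant bit of the first octet, and each named bit becomes its own series. The following is a minimal sketch of that decoding, using the bit names from the generated config. The input bytes are made-up examples, not values captured from the modem, and decode_bits is a hypothetical helper rather than the exporter's actual code.

```python
# Bit positions and names as listed under enum_values for adslAturCurrStatus.
STATUS_BITS = {
    0: "noDefect",
    1: "lossOfFraming",
    2: "lossOfSignal",
    3: "lossOfPower",
    4: "lossOfSignalQuality",
}


def decode_bits(raw, names):
    """Map an SNMP BITS octet string to {name: 0 or 1}.

    Bit 0 is the most significant bit of the first octet; bits beyond the
    end of the received octets count as unset.
    """
    result = {}
    for bit, name in names.items():
        octet, offset = divmod(bit, 8)
        is_set = octet < len(raw) and bool(raw[octet] & (0x80 >> offset))
        result[name] = int(is_set)
    return result


print(decode_bits(bytes([0x80]), STATUS_BITS))  # hypothetical: only noDefect set
print(decode_bits(bytes([0x00]), STATUS_BITS))  # hypothetical: no bit set at all
```

With this encoding, a device that sends an all-zero (or empty) octet string would produce exactly the all-zero series shown above. Whether that is what happens here is an assumption that would need checking against the raw SNMP response.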
One nice thing to see: I’m actually getting slightly faster service than what I’m paying for. I’ve got a 250/40 contract, but I’m getting 292 Mbit/s down and 46 Mbit/s up. I think that might be because there aren’t many other users on this connection overall, as most people in the neighborhood have cable Internet instead.
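To put a number on that, here is a small calculation using the xdsl2ChStatusActDataRate values from the exporter output and the 250/40 contract. The helper name is just for illustration.

```python
# Synced rates in bits per second, taken from the exporter output above,
# and the contracted rates in Mbit/s.
actual_bps = {"down": 292_016_000, "up": 46_718_000}
contract_mbit = {"down": 250, "up": 40}


def headroom_percent(actual, contract_mbit_s):
    """Percentage by which the synced rate exceeds the contracted rate."""
    return (actual / 1_000_000 / contract_mbit_s - 1) * 100


for direction in ("down", "up"):
    mbit = actual_bps[direction] / 1_000_000
    extra = headroom_percent(actual_bps[direction], contract_mbit[direction])
    print(f"{direction}: {mbit:.1f} Mbit/s, {extra:+.1f}% vs contract")
```

Both directions come out at roughly 17% above the contracted rate.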
With the configuration file in hand, here is the Deployment configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snmp-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      homelab/app: snmp-exporter
  strategy:
    type: "Recreate"
  template:
    metadata:
      labels:
        homelab/app: snmp-exporter
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/exporter-config.yaml") . | sha256sum }}
    spec:
      automountServiceAccountToken: false
      securityContext:
        fsGroup: 1000
      containers:
        - name: snmp-exporter
          image: prom/snmp-exporter:{{ .Values.appVersion }}
          args:
            - "--log.format=json"
            - "--config.file=/etc/snmp_exporter/snmp.yml"
          volumeMounts:
            - name: config
              mountPath: /etc/snmp_exporter
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          livenessProbe:
            httpGet:
              port: {{ .Values.port }}
              path: "/-/healthy"
            initialDelaySeconds: 15
            periodSeconds: 30
          ports:
            - name: snmp-scrape
              containerPort: {{ .Values.port }}
              protocol: TCP
      volumes:
        - name: config
          configMap:
            name: exporter-config
The more interesting configuration is the ScrapeConfig for the Prometheus operator. Because if you look back at the generated config file, you will find something missing: A declaration of a target. This is instead done as part of the scrape config, which in my case looks like this:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: scraping-modem
  labels:
    prometheus: scrape-modems
spec:
  staticConfigs:
    - labels:
        job: modemmetrics
      targets:
        - 203.0.113.1
  metricsPath: /snmp
  scrapeInterval: 1m
  params:
    module:
      - draytek
    auth:
      - draytek
  relabelings:
    - sourceLabels:
        - "__address__"
      targetLabel: __param_target
    - targetLabel: instance
      replacement: modemnamehere
    - targetLabel: __address__
      replacement: "snmp-exporter.snmp-exporter.svc.cluster.local:9116"
  metricRelabelings:
    - sourceLabels:
        - "__name__"
      action: drop
      regex: snmp_scrape_.*
    - sourceLabels:
        - "__name__"
      action: drop
      regex: sysName
    - sourceLabels:
        - "__name__"
      action: drop
      regex: scrape_.*
It’s important to note the IP under targets is the IP of the modem, not the IP of the SNMP exporter. This is different to how e.g. the node exporter works. There, the targets are the machines which run the exporter. Instead, the __address__ label needs to be replaced in a relabeling so that Prometheus contacts the exporter.
The params.module and params.auth parameters define the sections from the SNMP exporter’s config file to be used for this scrape job. This way, you can have multiple sections for different types of devices in one exporter’s config and control the target+module/auth combinations in the scrape config.
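To make the interplay of targets, relabelings and params a bit more concrete, here is a small sketch that mimics the three relabeling steps for the single static target and assembles the request Prometheus would end up sending to the exporter. The two function names are made up for illustration, and the http scheme is an assumption.

```python
from urllib.parse import urlencode


def apply_relabeling(target):
    """Mimic the three relabeling steps from the ScrapeConfig for one target."""
    labels = {"__address__": target}
    # 1. Copy the modem IP into the "target" URL parameter.
    labels["__param_target"] = labels["__address__"]
    # 2. Give the resulting series a readable instance label.
    labels["instance"] = "modemnamehere"
    # 3. Point the actual scrape at the exporter instead of the modem.
    labels["__address__"] = "snmp-exporter.snmp-exporter.svc.cluster.local:9116"
    return labels


def scrape_url(labels, params, metrics_path="/snmp"):
    """Build the scrape URL from the labels and the static params."""
    query = dict(params)
    for key, value in labels.items():
        if key.startswith("__param_"):
            query[key[len("__param_"):]] = value
    # Assumption: plain http, as no scheme is set in the ScrapeConfig.
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(query)}"


labels = apply_relabeling("203.0.113.1")
url = scrape_url(labels, {"module": "draytek", "auth": "draytek"})
print(url)
```

The printed URL ends in /snmp?module=draytek&auth=draytek&target=203.0.113.1, which shows that the modem only ever appears as a query parameter while the request itself goes to the exporter.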
To be honest, this way of configuring is a bit weird to me. I would have rather expected the different targets to be defined in the exporter’s config and then be supplied with an identifying label in the metrics.
Results
Sadly, I don’t really have any interesting plots to show here, save perhaps for this one, which shows that my line got a lot better towards the end of 2023:
[Figure: Attenuation and Signal-to-Noise ratio of my VDSL line. Attenuation in green starting at 10 dB, SNR in yellow starting at 6.]
The quality of my line markedly improved around October 12th. I don’t know what might have changed here, but attenuation went down by 4 dB and Signal-to-Noise ratio improved by 2 dB. That didn’t come with any improvements for me, though.
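Since decibels are logarithmic, those changes are larger than they might look. A quick conversion, assuming the plotted values are plain dB power ratios:

```python
def db_to_power_ratio(db):
    """Convert a difference in dB into a linear power ratio."""
    return 10 ** (db / 10)


print(round(db_to_power_ratio(4), 2))  # 2.51
print(round(db_to_power_ratio(2), 2))  # 1.58
```

So 4 dB less attenuation means roughly 2.5 times as much signal power arriving, and a 2 dB better SNR corresponds to a signal-to-noise power ratio about 1.6 times as high.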
And that’s it for today. Another small step in my quest to monitor absolutely everything. 😁