Alert Configuration

Retroview provides powerful tools for server monitoring and input stream diagnostics. However, monitoring becomes truly effective when you receive notifications about issues before they impact your users.

This dashboard allows you to configure alerts for all critical metrics and events described in the server monitoring and input capture documentation.

Server Issues and Corresponding Alerts

The Server Stats section describes the main issues that can occur with servers. For each of them, there is a corresponding alert.

High CPU Load

Issue: As described in the CPU Load section, high CPU load can lead to degradation in stream processing quality.

Alert: cpu_load

Triggers when the average CPU load exceeds 85% over the past hour.

Requires selecting a specific server.

Setup: Configure this alert for each server so you can detect overload promptly and take steps to optimize or scale the infrastructure.

What to do when triggered:

  1. Open the Server Stats dashboard and check the CPU graph for the specific server
  2. Check if there's a "plateau" on the graph — if CPU hits 100% for an extended period, it's critical
  3. Always check the scheduler load graph — it's a more reliable indicator of the problem
  4. If load has been steadily growing over the last 30 days — start planning infrastructure expansion
  5. If it's a one-time spike — check if new streams or clients were added
  6. Consider optimization: move some streams to other servers or scale the cluster
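
To make the trigger condition concrete, below is a minimal sketch of the cpu_load rule in Python. The sampling scheme and function name are illustrative, not part of Retroview; the real rule is evaluated inside Grafana.

    # cpu_load trigger sketch: average CPU load over the past hour
    # exceeds 85%. samples holds one CPU-load percentage per minute.
    CPU_THRESHOLD = 85.0

    def cpu_alert_should_fire(samples: list[float]) -> bool:
        """Return True when the hourly average crosses the threshold."""
        if not samples:
            return False
        return sum(samples) / len(samples) > CPU_THRESHOLD

    # A server idling at 40% with one brief spike to 100% does not
    # fire, because the hourly average stays well below 85%.
    print(cpu_alert_should_fire([40.0] * 59 + [100.0]))  # False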

High Scheduler Load

Issue: As explained in CPU monitoring, scheduler load is a more reliable metric for assessing streaming server health than just CPU load.

Alert: scheduler_load

Triggers when the average system scheduler load exceeds 85% over the past hour.

Requires selecting a specific server.

Setup: Make sure to configure this alert for all servers. High scheduler load is a more critical indicator than CPU load.

What to do when triggered:

  1. Open the scheduler graph for the problematic server
  2. If you see a "plateau" (scheduler hitting the limit) — this is critical, the server can't handle the load
  3. Check the number of streams and clients — there may have been a sudden load increase
  4. Urgently move some load to other servers or add a new server to the cluster
  5. If scheduler load is high while CPU is relatively low, this is normal on virtual machines, but it means the server has no load capacity left (see the sketch below)
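
As a rough illustration of step 5, the combination of CPU and scheduler readings can be triaged like this. This is a sketch with illustrative thresholds, not Retroview's own logic.

    # Rough triage of a server from its average CPU and scheduler
    # load percentages; the 85% thresholds mirror the alert defaults.
    def triage(cpu: float, scheduler: float) -> str:
        if scheduler >= 85.0 and cpu >= 85.0:
            return "critical: offload streams or expand the cluster"
        if scheduler >= 85.0:
            # Typical on virtual machines: CPU looks moderate, yet the
            # scheduler shows no spare capacity is left.
            return "critical: no capacity left despite moderate CPU"
        if cpu >= 85.0:
            return "warning: confirm on the scheduler graph before acting"
        return "ok"

    print(triage(cpu=60.0, scheduler=92.0))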

Low Memory

Issue: The RAM Usage section describes how unstable or excessive RAM usage can lead to performance issues.

Alert: memory_usage

Triggers when average memory usage exceeds 85% over the past hour.

Requires selecting a specific server.

Setup: Configure for all servers, especially those processing a large number of streams or performing transcoding.

What to do when triggered:

  1. Open the memory graph for the problematic server
  2. Check if memory usage is stable or growing
  3. Growing memory usage may indicate a leak; contact support and attach the graphs (see the trend sketch after this list)
  4. Check the number of active streams — there may be more than before
  5. Make sure swap is disabled (swap is unnecessary and dangerous on streaming servers)
  6. If necessary, add RAM or move some load to another server
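
Step 2 asks whether usage is stable or growing. One simple way to check for growth is to fit a linear trend to recent samples, as in this sketch; per-minute sampling and the slope threshold are assumptions.

    # Detect steadily growing memory usage, a possible leak, by
    # fitting a least-squares line to per-minute usage samples.
    def memory_is_growing(samples: list[float],
                          min_slope: float = 0.05) -> bool:
        """True if usage grows faster than min_slope percent per minute."""
        n = len(samples)
        if n < 2:
            return False
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(samples) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
        var = sum((x - mean_x) ** 2 for x in xs)
        return cov / var > min_slope

    # Usage climbing 0.1% per minute for an hour is a clear trend.
    print(memory_is_growing([50.0 + 0.1 * i for i in range(60)]))  # True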

Disk Issues

Issue: The Disk Utilization section shows how disk overload can lead to DVR write errors.

Alert: disk_io

Triggers when disk utilization exceeds 85% over the past hour.

Requires selecting a specific server.

Setup: Critical for servers with DVR. If you see collapsed writes on the write error graphs, this alert will help identify the issue before failed writes appear.

What to do when triggered:

  1. Open the Disk Utilization section
  2. Check the DVR write errors graph for collapsed writes and failed writes (see the triage sketch after this list)
  3. If there are collapsed writes — storage is starting to fall behind, this is a warning situation
  4. If there are failed writes — this is a serious failure, video is lost irretrievably
  5. Check which disks are overloaded (disk utilization percentage graph)
  6. For network storage — check write speed stability, the problem may be in the network
  7. Consider switching to faster disks, Flussonic RAID for load distribution, or reducing the amount of DVR on this server
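
The triage in steps 2 through 4 boils down to two counters. Here is a sketch; the counter names are illustrative, and the real values come from the DVR write errors graph.

    # Triage DVR write health from the two counters shown on the
    # write-error graphs; names here are illustrative.
    def dvr_write_status(collapsed_writes: int, failed_writes: int) -> str:
        if failed_writes > 0:
            return "failure: video is being lost irretrievably"
        if collapsed_writes > 0:
            return "warning: storage is falling behind, check the disks"
        return "ok"

    print(dvr_write_status(collapsed_writes=12, failed_writes=0))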

Input Stream Issues and Corresponding Alerts

The Input Monitoring section describes typical input stream issues in detail. Several alert types are available to track them.

Mass Stream Failure

Issue: As shown in the provider failure examples, simultaneous failure of many streams can occur.

Alert: streams_drop

Triggers for Watcher when more than 10% of streams are dropped during a given time period.

Does not require server selection – applies to all Watcher servers.

Setup: Make sure to configure this alert if you have Watcher. It will help quickly detect systemic problems with the content provider or in your network.

What to do when triggered:

  1. Open the Input Monitoring dashboard
  2. Look at the problematic streams graph — if many streams failed simultaneously, it's a systemic issue
  3. Compare with the provider failure example, where everything was working and then suddenly broke
  4. Call the content provider — possible equipment failure, descrambling issues, or other failure on their side
  5. Check your network — are there routing or bandwidth problems
  6. If subscribers aren't complaining during mass channel failure — perhaps you don't need these channels
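
The 10% condition in the trigger is simple arithmetic; this sketch only restates it (Watcher does the actual counting):

    # streams_drop condition: more than 10% of streams dropped
    # within the evaluation period.
    def streams_drop_should_fire(total_streams: int,
                                 dropped_streams: int,
                                 threshold_pct: float = 10.0) -> bool:
        if total_streams == 0:
            return False
        return dropped_streams / total_streams * 100.0 > threshold_pct

    # 25 of 200 streams dropped is 12.5%, so the alert fires.
    print(streams_drop_should_fire(total_streams=200, dropped_streams=25))  # True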

Critical Stream Failure

Issue: When a previously working stream suddenly stops, it is a critical situation requiring immediate intervention.

Alert: stream_dead

Triggers when a stream that previously had input suddenly stops.

You can select a specific server or use All to apply globally.

Setup: Use for monitoring all streams. If you have particularly critical streams, set up a separate selected_stream_dead alert for them with stricter notification parameters.

What to do when triggered:

  1. Open the Input Monitoring dashboard and select the failed stream
  2. Look at the stream errors graph — what happened before it stopped
  3. Check source availability — the camera may be offline, provider server unavailable, or network issues
  4. If it's an RTSP camera — try reconnecting to it manually
  5. If the source is IPTV — check if authorization expired (403 error)
  6. If many streams failed simultaneously — see recommendations for the streams_drop alert
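
Unlike the threshold alerts above, stream_dead fires on a state transition: the stream had input before and has none now. A sketch of that distinction (the sample list is an assumption):

    # stream_dead condition: a stream that previously had input
    # suddenly has none. had_input is ordered oldest-first.
    def stream_dead_should_fire(had_input: list[bool]) -> bool:
        if len(had_input) < 2:
            return False
        return had_input[-2] and not had_input[-1]

    print(stream_dead_should_fire([True, True, True, False]))  # True
    print(stream_dead_should_fire([False, False]))             # False: never had input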

Monitoring Selected Streams

Alert: selected_stream_dead

Similar to stream_dead, but only monitors selected streams.

You can select a specific server or All, and must select which streams to monitor.

Setup: Use for VIP channels or particularly important streams, configuring a separate contact point with notifications to content managers.

What to do when triggered:

Actions are the same as for stream_dead, but with increased urgency as this is a critical stream. Start diagnostics and recovery immediately.

Unstable Streams (Flapping)

Issue: As described in the sections about evening peak and episodic network spikes, streams can periodically lose and restore connection.

Alert: flapping_streams

Triggers when a stream temporarily loses input and recovers more than 3 times within 3 hours.

This may indicate network issues, an unstable input source, or server-side instability.

You can select a specific server or use All to monitor all streams for flapping behavior.

Setup: Configure this alert to identify systemic network or provider issues. Flapping is a precursor to complete failure and should be addressed proactively.

What to do when triggered:

  1. Open the Input Monitoring dashboard with a wide time range (12-24 hours)
  2. Check if there's daily periodicity in the problems — as in the evening peak example
  3. If problems occur every evening from 7 PM to 1 AM — this is network overload from user traffic
  4. Solution for evening peak: separate the network physically or via VLAN, configure QoS
  5. If flapping is episodic without pattern — as in the network spikes example — check switch loads
  6. Expand network bandwidth or optimize routing
  7. Check source stability — the problem may be on the camera or content provider side
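
The flapping condition counts recoveries rather than plain outages: the stream must lose input and come back more than 3 times within 3 hours. A sketch over timestamped recovery events; how the timestamps are collected is left out.

    # flapping_streams condition: more than 3 loss-and-recovery
    # cycles within a sliding 3-hour window. recovery_times holds
    # timestamps (in seconds) at which the stream came back.
    WINDOW_SECONDS = 3 * 3600
    MAX_FLAPS = 3

    def is_flapping(recovery_times: list[float], now: float) -> bool:
        recent = [t for t in recovery_times if now - t <= WINDOW_SECONDS]
        return len(recent) > MAX_FLAPS

    # Four recoveries within the window: the stream is flapping.
    print(is_flapping([100, 2000, 4200, 6800], now=7200))  # True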

Selected Stream Flapping

Alert: selected_stream_flapping

Similar to flapping_streams, but only monitors selected streams.

You must choose which streams to monitor, and can also select a specific server or use All.

Flapping may be caused by network problems, source-side interruptions, or delivery infrastructure issues.

Setup: Use for critical streams that must operate with maximum stability.

What to do when triggered:

Actions are the same as for flapping_streams, but focused on the specific selected stream. Check the path from source to server for this particular stream.

Increase in Offline Streams

Issue: A gradual increase in the number of offline streams may indicate a growing problem in the network or with the provider, as shown in the episodic outages example.

Alert: input_availability_raise_offline

Triggers when the number of offline streams increases by the specified percentage, as shown on the input_availability graph.

Can be created for all servers or for a specific one.

After selecting this alert type, an additional field will appear where you must specify the percentage increase in offline streams that should trigger the alert.

Setup: Set the threshold to 20-30% to get early warning of issues. This will allow you to respond before the problem affects the majority of streams.

What to do when triggered:

  1. Open the Input Monitoring dashboard
  2. Look at how the number of offline streams is growing: a sudden spike or a gradual increase
  3. If growth is gradual — the problem is escalating, possibly network or source degradation
  4. Check if the problem is localized to one server or affects all
  5. If the problem is on all servers simultaneously — most likely an issue with the content provider
  6. If only on one server — check that server's network connection
  7. Contact the content provider or check your network status before the problem affects a critical mass of streams
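
The trigger compares the current offline count against a baseline. The percentage arithmetic looks like this; the baseline choice here is an assumption, as Retroview takes it from the input_availability graph.

    # Growth of the offline-stream count as a percentage over a baseline.
    def offline_raise_should_fire(baseline_offline: int,
                                  current_offline: int,
                                  threshold_pct: float = 25.0) -> bool:
        if baseline_offline == 0:
            return current_offline > 0
        growth = (current_offline - baseline_offline) / baseline_offline * 100.0
        return growth >= threshold_pct

    # Going from 8 to 11 offline streams is a 37.5% increase,
    # so a 25% threshold fires.
    print(offline_raise_should_fire(baseline_offline=8, current_offline=11))  # True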

Increase in Bad Streams

Issue: An increase in the number of streams with input errors (as described in the Stream Errors section) may indicate degradation in network or source quality.

Alert: input_availability_raise_bad

Triggers when the number of streams with input errors increases by the specified percentage, as shown on the input_availability graph.

Can be created for all servers or for a specific one.

After selecting this alert type, an additional field will appear where you must specify the percentage increase in bad streams that should trigger the alert.

Setup: Set the threshold to 15-25% for early detection of stream quality degradation. This is especially important for IPTV services with a large number of channels.

What to do when triggered:

  1. Open the Input Monitoring dashboard
  2. Select the most problematic streams and examine error details
  3. Check the known errors list to diagnose specific problems (a lookup sketch follows this list)
  4. lost_packets — network problems; improve the channel between the source and the server
  5. ts_cc (Continuity Counter) — packet loss in MPEG-TS, fix the network
  6. ts_scrambled — stream is encrypted, urgently resolve CAM module issues
  7. src_403/404/500 — source-side problems, change authorization or fix the source
  8. If errors are widespread and uniform, this is a systemic problem that requires intervention in the network or with the provider
  9. Quality degradation leads to video artifacts for users — act quickly
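
The error codes in steps 4 through 7 map naturally onto a lookup table. A small sketch whose messages paraphrase the list above:

    # Quick diagnosis lookup for common input-error codes from the
    # known errors list; messages paraphrase the steps above.
    DIAGNOSIS = {
        "lost_packets": "network problems: improve the source-to-server channel",
        "ts_cc": "continuity counter errors: fix packet loss in the network",
        "ts_scrambled": "stream is encrypted: resolve CAM module issues urgently",
        "src_403": "source-side problem: renew or fix authorization",
        "src_404": "source-side problem: the requested stream is missing",
        "src_500": "source-side problem: the source server is failing",
    }

    def diagnose(error_code: str) -> str:
        return DIAGNOSIS.get(error_code, "unknown error: check the known errors list")

    print(diagnose("ts_cc"))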

How to Create an Alert

Step-by-Step Instructions

1. Select a server – At the top of the dashboard, choose the server where the alert should apply. Some alerts support the All option to apply across all servers.

2. Select streams – Required only for alerts related to specific streams (selected_stream_dead, selected_stream_flapping).

3. Choose an alert – In the Available alerts to create section, select the type of alert you want to create, based on the recommendations above.

4. Configure a contact point

  • For email, you can create a contact directly in the Create email contact point section
  • For others (Telegram, Slack, etc.), use Grafana's Alerting → Contact points page

5. Enter an alert name – Provide a meaningful name reflecting the nature of the issue, for example: "Production Server CPU Critical" or "VIP Channels Down Alert".

6. Choose an evaluation interval (pending period)

This is the time Grafana waits before triggering the alert once the condition is met.

Specify it in formats like:

  • 10s (10 seconds) – for critical alerts on VIP streams
  • 30s (30 seconds) – for important alerts
  • 1m (1 minute) – recommended default
  • 5m (5 minutes) – for non-critical warnings
  • 1h (1 hour) – for trend monitoring

Recommendation: For server alerts (CPU, memory, disk), use 1m or 5m to avoid false positives on short-term spikes. For critical streams, you can use 30s or even 10s.
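
In other words, the condition must hold for the whole pending period before the alert fires; a breach shorter than the period never notifies. A simplified sketch of that semantics (Grafana's actual evaluation scheduler is more involved):

    # Pending-period semantics: the alert fires only if every
    # evaluation within the pending period met the condition.
    def alert_fires(evaluations: list[bool], pending_evals: int) -> bool:
        """evaluations is ordered oldest-first, one entry per run."""
        if len(evaluations) < pending_evals:
            return False
        return all(evaluations[-pending_evals:])

    # With a 5m pending period and 1m evaluations, a 2-minute CPU
    # spike does not fire, but 5 consecutive breaches do.
    print(alert_fires([False, True, True, False, False], pending_evals=5))  # False
    print(alert_fires([True] * 5, pending_evals=5))                         # True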

7. Confirm – If all parameters are filled in correctly, the alert will be created and added to the active alerts list.

Create Email Contact Point

Allows you to create a contact point for sending alert notifications to email addresses, directly from the dashboard and without opening Grafana settings.

How to Create an Email Contact Point

1. Contact name – Enter a unique name for the contact point. This name will appear in the recipient selection dropdown.

2. Email addresses – Enter one or more email addresses to receive alert notifications.

To enter multiple addresses, separate them with commas.

Example:

example@mymail.com,example2@mymail.com

After saving, the contact point will appear in the list and be available for selection when creating alerts.

Alerts List

Displays all created alerts. Allows you to view key information at a glance and manage existing alerts.

Displayed Information

Alert name – Unique name assigned during creation

Folder – The folder or group to which the alert belongs

Status – If the alert is paused, a paused label will appear

Managing Alerts

Each alert has a delete button. Clicking it deletes the alert.

More Details

For full details and alert logic, go to:

Grafana → Alerting → Alert rules

Contact Points List

Displays a list of all created contact points used for sending alert notifications. Allows you to review and remove any contacts.

Displayed Information

Type – The type of contact point (e.g., Email, Webhook, etc.)

Name – The unique name given during contact creation

Managing Contact Points

Each contact point has a delete button. Clicking it deletes the contact point.

More Details

To view or configure contact points in Grafana, go to:

Grafana → Alerting → Contact points