Alert Configuration¶
Retroview provides powerful tools for server monitoring and input stream diagnostics. However, monitoring becomes truly effective when you receive notifications about issues before they impact your users.
This dashboard allows you to configure alerts for all critical metrics and events described in the server monitoring and input capture documentation.
Server Issues and Corresponding Alerts¶
The Server Stats section describes the main issues that can occur with servers. For each of them, there is a corresponding alert.
High CPU Load¶
Issue: As described in the CPU Load section, high CPU load can lead to degradation in stream processing quality.
Alert: cpu_load
Triggers when the average CPU load exceeds 85% over the past hour.
Requires selecting a specific server.
Setup: Set up this alert for each server so you can detect overload in time and plan optimization or infrastructure scaling.
What to do when triggered:
- Open the Server Stats dashboard and check the CPU graph for the specific server
- Check if there's a "plateau" on the graph — if CPU hits 100% for an extended period, it's critical
- Always check the scheduler load graph — it's a more reliable indicator of the problem
- If load has been steadily growing over the last 30 days — start planning infrastructure expansion
- If it's a one-time spike — check if new streams or clients were added
- Consider optimization: move some streams to other servers or scale the cluster
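The trigger condition above (and the similar scheduler_load, memory_usage, and disk_io conditions below) boils down to comparing an hourly average against a threshold. The following is only a minimal Python sketch of that comparison with hypothetical sample data, not Retroview's actual evaluation, which runs inside Grafana:

```python
from statistics import mean

# Hypothetical per-minute CPU load samples (percent) for the past hour.
cpu_samples = [80, 84, 88, 91, 86, 89] * 10  # 60 samples

THRESHOLD = 85  # percent, matching the cpu_load trigger condition

def should_fire(samples, threshold=THRESHOLD):
    """Mirror the 'average load over the past hour exceeds 85%' condition."""
    return mean(samples) > threshold

if should_fire(cpu_samples):
    print(f"cpu_load would fire: hourly average {mean(cpu_samples):.1f}%")
```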
High Scheduler Load¶
Issue: As explained in CPU monitoring, scheduler load is a more reliable metric for assessing streaming server health than just CPU load.
Alert: scheduler_load
Triggers when the average system scheduler load exceeds 85% over the past hour.
Requires selecting a specific server.
Setup: Make sure to configure this alert for all servers. High scheduler load is a more critical indicator than CPU load.
What to do when triggered:
- Open the scheduler graph for the problematic server
- If you see a "plateau" (scheduler hitting the limit) — this is critical, the server can't handle the load
- Check the number of streams and clients — there may have been a sudden load increase
- Urgently move some load to other servers or add a new server to the cluster
- If scheduler load is high but CPU is relatively low — this is typical for virtual machines, but it still means the server has no spare capacity left
Low Memory¶
Issue: The RAM Usage section describes how unstable or excessive RAM usage can lead to performance issues.
Alert: memory_usage
Triggers when average memory usage exceeds 85% over the past hour.
Requires selecting a specific server.
Setup: Configure for all servers, especially those processing a large number of streams or performing transcoding.
What to do when triggered:
- Open the memory graph for the problematic server
- Check if memory usage is stable or growing
- Growing memory usage may indicate a leak — contact support with graphs
- Check the number of active streams — there may be more than before
- Make sure swap is disabled (swap is unnecessary and dangerous on streaming servers)
- If necessary, add RAM or move some load to another server
Disk Issues¶
Issue: The Disk Utilization section shows how disk overload can lead to DVR write errors.
Alert: disk_io
Triggers when disk utilization exceeds 85% over the past hour.
Requires selecting a specific server.
Setup: Critical for servers with DVR. If you see collapsed writes on the write error graphs, this alert will help identify the issue before failed writes appear.
What to do when triggered:
- Open the Disk Utilization section
- Check the DVR write errors graph — are there collapsed writes or failed writes
- If there are collapsed writes — storage is starting to fall behind, this is a warning situation
- If there are failed writes — this is a serious failure, video is lost irretrievably
- Check which disks are overloaded (disk utilization percentage graph)
- For network storage — check write speed stability, the problem may be in the network
- Consider switching to faster disks, Flussonic RAID for load distribution, or reducing the amount of DVR on this server
Input Stream Issues and Corresponding Alerts¶
The Input Monitoring section describes typical input stream issues in detail. Several alert types are available to track them.
Mass Stream Failure¶
Issue: As shown in the provider failure examples, simultaneous failure of many streams can occur.
Alert: streams_drop
Triggers for Watcher when more than 10% of streams are dropped during a given time period.
Does not require server selection – applies to all Watcher servers.
Setup: Make sure to configure this alert if you have Watcher. It will help quickly detect systemic problems with the content provider or in your network.
What to do when triggered:
- Open the Input Monitoring dashboard
- Look at the problematic streams graph — if many streams failed simultaneously, it's a systemic issue
- As in the provider failure example — everything was fine and suddenly broke
- Call the content provider — possible equipment failure, descrambling issues, or other failure on their side
- Check your network — are there routing or bandwidth problems
- If subscribers aren't complaining during mass channel failure — perhaps you don't need these channels
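The 10% condition above is a simple share-of-streams check. A minimal sketch of that math with a hypothetical snapshot of stream states (the real evaluation happens inside Grafana):

```python
# Hypothetical snapshot of stream states reported by Watcher.
streams = {
    "sport_hd": "running",
    "news_1": "dropped",
    "movies_2": "dropped",
    "kids_3": "running",
    "music_4": "running",
}

def drop_ratio(streams):
    """Share of streams currently in the dropped state."""
    dropped = sum(1 for state in streams.values() if state == "dropped")
    return dropped / len(streams) if streams else 0.0

# streams_drop fires when more than 10% of streams are dropped in the period.
ratio = drop_ratio(streams)
if ratio > 0.10:
    print(f"streams_drop would fire: {ratio:.0%} of streams are down")
```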
Critical Stream Failure¶
Issue: When a stream that was previously working suddenly stops — this is a critical situation requiring immediate intervention.
Alert: stream_dead
Triggers when a stream that previously had input suddenly stops.
You can select a specific server or use All to apply globally.
Setup: Use for monitoring all streams. If you have particularly critical streams, set up a separate selected_stream_dead alert for them with stricter notification parameters.
What to do when triggered:
- Open the Input Monitoring dashboard and select the failed stream
- Look at the stream errors graph — what happened before it stopped
- Check source availability — the camera may be offline, provider server unavailable, or network issues
- If it's an RTSP camera — try reconnecting to it manually
- If the source is IPTV — check if authorization expired (403 error)
- If many streams failed simultaneously — see the recommendations for the streams_drop alert
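The key nuance in the trigger above is "previously had input": a stream that never started does not count. A minimal sketch of that transition check, using hypothetical per-stream state history rather than real monitoring data:

```python
# Hypothetical input history per stream: True = had input, False = no input.
history = {
    "cam_entrance": [True, True, True, False],   # was working, then stopped
    "cam_parking": [False, False, False, False], # never had input, not counted
}

def is_dead(samples):
    """stream_dead semantics: the stream had input before and has none now."""
    return any(samples[:-1]) and not samples[-1]

for name, samples in history.items():
    if is_dead(samples):
        print(f"stream_dead would fire for {name}")
```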
Monitoring Selected Streams¶
Alert: selected_stream_dead
Similar to stream_dead, but only monitors selected streams.
You can select a specific server or All, and must select which streams to monitor.
Setup: Use for VIP channels or particularly important streams, configuring a separate contact point with notifications to content managers.
What to do when triggered:
Actions are the same as for stream_dead, but with increased urgency as this is a critical stream. Start diagnostics and recovery immediately.
Unstable Streams (Flapping)¶
Issue: As described in the sections about evening peak and episodic network spikes, streams can periodically lose and restore connection.
Alert: flapping_streams
Triggers when a stream temporarily loses input and recovers more than 3 times within 3 hours.
This may indicate network issues, an unstable input source, or server-side instability.
You can select a specific server or use All to monitor all streams for flapping behavior.
Setup: Configure this alert to identify systemic network or provider issues. Flapping is a precursor to complete failure and should be addressed proactively.
What to do when triggered:
- Open the Input Monitoring dashboard with a wide time range (12-24 hours)
- Check if there's daily periodicity in the problems — as in the evening peak example
- If problems occur every evening from 7 PM to 1 AM — this is network overload from user traffic
- Solution for evening peak: separate the network physically or via VLAN, configure QoS
- If flapping is episodic without pattern — as in the network spikes example — check switch loads
- Expand network bandwidth or optimize routing
- Check source stability — the problem may be on the camera or content provider side
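A minimal sketch of the flapping condition (more than 3 loss-and-recovery cycles within 3 hours), using hypothetical timestamps of recovery events purely for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps at which a stream recovered after losing input.
recoveries = [
    datetime(2024, 5, 1, 19, 5),
    datetime(2024, 5, 1, 19, 40),
    datetime(2024, 5, 1, 20, 15),
    datetime(2024, 5, 1, 21, 30),
]

WINDOW = timedelta(hours=3)

def is_flapping(recoveries, now):
    """flapping_streams semantics: more than 3 recoveries within the window."""
    recent = [t for t in recoveries if now - t <= WINDOW]
    return len(recent) > 3

print(is_flapping(recoveries, now=datetime(2024, 5, 1, 21, 35)))  # True
```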
Selected Stream Flapping¶
Alert: selected_stream_flapping
Similar to flapping_streams, but only monitors selected streams.
You must choose which streams to monitor, and can also select a specific server or use All.
Flapping may be caused by network problems, source-side interruptions, or delivery infrastructure issues.
Setup: Use for critical streams that must operate with maximum stability.
What to do when triggered:
Actions are the same as for flapping_streams, but focused on the specific selected stream. Check the path from source to server for this particular stream.
Increase in Offline Streams¶
Issue: A gradual increase in the number of offline streams may indicate a growing problem in the network or with the provider, as shown in the episodic outages example.
Alert: input_availability_raise_offline
Triggers when the number of offline streams increases by the specified percentage, as shown on the input_availability graph.
Can be created for all servers or for a specific one.
After selecting this alert type, an additional field will appear where you must specify the percentage increase in offline streams that should trigger the alert.
Setup: Set the threshold to 20-30% to get early warning of issues. This will allow you to respond before the problem affects the majority of streams.
What to do when triggered:
- Open the Input Monitoring dashboard
- Look at the dynamics of offline stream growth — sudden spike or gradual increase
- If growth is gradual — the problem is escalating, possibly network or source degradation
- Check if the problem is localized to one server or affects all
- If the problem is on all servers simultaneously — most likely an issue with the content provider
- If only on one server — check that server's network connection
- Contact the content provider or check your network status before the problem affects a critical mass of streams
Increase in Bad Streams¶
Issue: An increase in the number of streams with input errors (as described in the Stream Errors section) may indicate degradation in network or source quality.
Alert: input_availability_raise_bad
Triggers when the number of streams with input errors increases by the specified percentage, as shown on the input_availability graph.
Can be created for all servers or for a specific one.
After selecting this alert type, an additional field will appear where you must specify the percentage increase in bad streams that should trigger the alert.
Setup: Set the threshold to 15-25% for early detection of stream quality degradation. This is especially important for IPTV services with a large number of channels.
What to do when triggered:
- Open the Input Monitoring dashboard
- Select the most problematic streams and examine error details
- Check the known errors list to diagnose specific problems
- lost_packets — network problems, need to improve the channel between source and server
- ts_cc (Continuity Counter) — packet loss in MPEG-TS, fix the network
- ts_scrambled — stream is encrypted, urgently resolve CAM module issues
- src_403/404/500 — source-side problems, change authorization or fix the source
- If errors are massive and uniform — this is a systemic problem that requires intervention in your network or on the provider's side
- Quality degradation leads to video artifacts for users — act quickly
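Both input_availability_raise_offline and input_availability_raise_bad compare how much a counter from the input_availability graph has grown against the percentage you set when creating the alert. A minimal sketch of that comparison with hypothetical counts, not the dashboard's actual query:

```python
def raise_percent(previous, current):
    """Percentage growth of the offline (or bad) stream count between checks."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous * 100

# Hypothetical counts of offline streams at two consecutive evaluations.
previous_offline, current_offline = 40, 52
THRESHOLD_PERCENT = 25  # the value entered in the alert's extra field

growth = raise_percent(previous_offline, current_offline)
if growth >= THRESHOLD_PERCENT:
    print(f"input_availability_raise_offline would fire: +{growth:.0f}% offline streams")
```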
How to Create an Alert¶
Step-by-Step Instructions¶
1. Select a server – At the top of the dashboard, choose the server where the alert should apply. Some alerts support the All option to apply across all servers.
2. Select streams – Required only for alerts related to specific streams (selected_stream_dead, selected_stream_flapping).
3. Choose an alert – In the Available alerts to create section, select the type of alert you want to create, based on the recommendations above.
4. Configure a contact point –
- For email, you can create a contact directly in the Create email contact point section
- For others (Telegram, Slack, etc.), use Grafana's Alerting → Contact points page
5. Enter an alert name – Provide a meaningful name reflecting the nature of the issue, for example: "Production Server CPU Critical" or "VIP Channels Down Alert".
6. Choose an evaluation interval (pending period) –
This is the time Grafana waits before triggering the alert once the condition is met.
Specify it in formats like:
- 10s (10 seconds) – for critical alerts on VIP streams
- 30s (30 seconds) – for important alerts
- 1m (1 minute) – recommended default
- 5m (5 minutes) – for non-critical warnings
- 1h (1 hour) – for trend monitoring
Recommendation: For server alerts (CPU, memory, disk), use 1m or 5m to avoid false positives on short-term spikes. For critical streams, you can use 30s or even 10s. (A sketch of how the pending period filters out short spikes follows these steps.)
7. Confirm – If all parameters are correctly filled in, the alert will be created and added to the active alerts list.
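To see why the pending period from step 6 matters, here is a minimal sketch (not Grafana's actual evaluation code) showing how a short CPU spike is ignored with a 1m pending period while a sustained breach fires; the 10-second evaluation interval is an assumption for the example:

```python
PENDING_PERIOD = 60   # seconds, i.e. "1m"
EVAL_INTERVAL = 10    # seconds between rule evaluations (assumed)

def fires(samples, threshold=85):
    """Return True only if the condition stays true for the whole pending period."""
    breach_seconds = 0
    for value in samples:                 # one sample per evaluation
        if value > threshold:
            breach_seconds += EVAL_INTERVAL
            if breach_seconds >= PENDING_PERIOD:
                return True
        else:
            breach_seconds = 0            # condition cleared, timer resets
    return False

short_spike = [70, 92, 95, 70, 72, 71, 70, 73]   # only 20 s above threshold
sustained   = [90, 91, 93, 95, 92, 96, 94, 90]   # above threshold for over 60 s

print(fires(short_spike))  # False: spike is shorter than the pending period
print(fires(sustained))    # True: breach lasted the whole pending period
```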
Create Email Contact Point¶
Allows you to create a contact point for sending alert notifications to email addresses. Convenient to use directly from the dashboard without going into Grafana settings.
How to Create an Email Contact Point¶
1. Contact name – Enter a unique name for the contact point. This name will appear in the recipient selection dropdown.
2. Email addresses – Enter one or more email addresses to receive alert notifications.
To enter multiple addresses, separate them with commas.
Example:
example@mymail.com,example2@mymail.com
After saving, the contact point will appear in the list and be available for selection when creating alerts.
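If you prefer to script contact point creation instead of using the dashboard form, Grafana also exposes an Alerting provisioning HTTP API. The sketch below uses Python and requests; the server URL, token, and exact settings fields are assumptions and may differ between Grafana versions, so verify them against your Grafana's provisioning documentation before use:

```python
import requests

GRAFANA_URL = "https://grafana.example.com"   # assumption: your Grafana address
API_TOKEN = "YOUR_SERVICE_ACCOUNT_TOKEN"      # assumption: token with alerting permissions

payload = {
    "name": "noc-email",                      # name shown in the recipient dropdown
    "type": "email",
    "settings": {
        # Assumed field name; in the dashboard form you separate addresses with commas.
        "addresses": "example@mymail.com;example2@mymail.com",
    },
}

resp = requests.post(
    f"{GRAFANA_URL}/api/v1/provisioning/contact-points",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print("Contact point created:", resp.json())
```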
Alerts List¶
Displays all created alerts. Allows you to view key information at a glance and manage existing alerts.
Displayed Information¶
Alert name – Unique name assigned during creation
Folder – The folder or group to which the alert belongs
Status – If the alert is paused, a paused label will appear
Managing Alerts¶
Each alert has a ✕ button. Clicking it will delete the alert.
More Details¶
For full details and alert logic, go to:
Grafana → Alerting → Alert rules
Contact Points List¶
Displays a list of all created contact points used for sending alert notifications. Allows you to review and remove any contacts.
Displayed Information¶
Type – The type of contact point (e.g., Email, Webhook, etc.)
Name – The unique name given during contact creation
Managing Contact Points¶
Each contact point has a ✕ button. Clicking it will delete the contact.
More Details¶
To view or configure contact points in Grafana, go to:
Grafana → Alerting → Contact points