How to Edit Public Incidents on KloudFox

Why Do We Need Incident Management, and How Do You Edit Public Incidents on KloudFox?

Step One: Detecting the Incident

The foundation of effective incident management lies in a centralized source of truth that integrates various monitoring and reporting tools into a single, easily navigable platform. Tools like kloudfox.com enable support and other teams to collaborate seamlessly in detecting, communicating, and resolving incidents.

Initiating an Incident

Incidents can be initiated in two ways within incident management solutions:

  1. Automatic Monitoring Integration: For example, an uptime monitor might create an incident when it detects that the homepage is unavailable.
  2. Manual Reporting: For instance, a customer might submit a support ticket reporting that their profile isn't loading correctly.
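
The two entry points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Incident` record, `check_homepage`, and `report_manually` are hypothetical names and not part of any KloudFox API, and a real monitor would run on a schedule and retry before opening an incident.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional
import urllib.error
import urllib.request


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    title: str
    source: str  # "monitor" for automatic detection, "manual" for reports
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_homepage(
    url: str, fetch: Callable[[str], Optional[int]] = http_status
) -> Optional[Incident]:
    """Automatic path: open an incident when the page is unavailable."""
    status = fetch(url)
    if status is not None and 200 <= status < 400:
        return None  # page is reachable, nothing to report
    reason = "no response" if status is None else f"HTTP {status}"
    return Incident(title=f"{url} unavailable ({reason})", source="monitor")


def report_manually(title: str) -> Incident:
    """Manual path: a team member logs an incident from a support ticket."""
    return Incident(title=title, source="manual")
```

Passing `fetch` as a parameter keeps the detection rule separate from the network call, so the rule can be exercised with a stub instead of a live request.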

Automatic Monitoring for Incidents

When incidents are reported automatically, the incident management solution logs an incident as soon as a monitor detects an error. The current on-call team member is then alerted, usually through automated email notifications for most incidents, while Slack, Microsoft Teams, and email alerts are used for less critical issues. Incidents are often assigned a severity level, which makes them easier to communicate about.
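
Severity-based alerting of this kind is often expressed as a small routing table. The sketch below is a hypothetical illustration: the severity names, the `ROUTES` table, and `build_alerts` are assumptions, and the channel assignments simply mirror the paragraph above as placeholders rather than describing how any particular product is configured.

```python
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    """Hypothetical severity levels; real tools often define more."""
    CRITICAL = "critical"
    MINOR = "minor"


# Placeholder routing table (an assumption): which channels to use
# for each severity level.
ROUTES: Dict[Severity, List[str]] = {
    Severity.CRITICAL: ["email"],
    Severity.MINOR: ["slack", "teams", "email"],
}


def build_alerts(title: str, severity: Severity, on_call: str) -> List[str]:
    """Return one alert message per channel configured for the severity."""
    return [
        f"[{severity.value}] {channel} -> {on_call}: {title}"
        for channel in ROUTES[severity]
    ]
```

Keeping the table as data rather than branching logic means the routing can be changed without touching the alerting code.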

Manually Reported Incidents

For manually reported incidents, the on-call person is alerted by other team members, often from support or customer success teams. Before escalating a manually reported incident, it's essential to verify whether the issue is caused by a system failure or by a client-side misconfiguration. This prevents unnecessary alerts and alert fatigue.

If an incident turns out to be a false positive or needs a public update, you can correct the public incident status on kloudfox.com by following these steps:

  1. Navigate to incidents.
  2. Select the incident to update or delete.
  3. Modify the title, description, and timing of the incident.
  4. Save the changes to make the updated incident available on the public status page.
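
For readers who think in terms of data, the edit amounts to replacing a few fields on an incident record and checking that the result still makes sense. The sketch below is a minimal in-memory model, not the KloudFox interface: `PublicIncident`, its fields, and `edit_incident` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PublicIncident:
    """Hypothetical record for an incident shown on a public status page."""
    title: str
    description: str
    started_at: datetime
    resolved_at: Optional[datetime] = None


def edit_incident(incident: PublicIncident, **changes) -> PublicIncident:
    """Return a copy with the given fields changed, after basic validation."""
    # dataclasses.replace raises TypeError for unknown field names.
    updated = replace(incident, **changes)
    if not updated.title.strip():
        raise ValueError("title must not be empty")
    if (
        updated.resolved_at is not None
        and updated.resolved_at < updated.started_at
    ):
        raise ValueError("resolved_at must not precede started_at")
    return updated


# Example: correct the title and record when the incident ended.
original = PublicIncident(
    title="Homepage down",
    description="Investigating reports of errors.",
    started_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
corrected = edit_incident(
    original,
    title="Homepage briefly unavailable",
    resolved_at=datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
)
```

Returning a new record instead of mutating the old one leaves the original available for comparison, which can be handy when reviewing what changed on the public page.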

Communicating with Stakeholders

After detecting and logging an incident, it’s crucial to communicate it both internally and externally. Effective incident communication involves not just acknowledging the incident but also providing updates during the investigation and resolution process.

A best practice is to use a status page for centralized communication, allowing both internal (password-protected pages with email subscriptions) and external (public status page) updates.

Internal Communication

Internal communication includes informing any teams within the company affected by the incident, such as sales teams giving demos of non-functioning products or marketing teams directing traffic to a downed landing page. The goal is to align company operations to minimize resource loss.

External Communication

External communication saves customer support resources and helps maintain customer trust. When the status page is established as the go-to source for incident information and carries clear updates, customers are less likely to bombard support with queries and may appreciate the transparency.

Essential Tools for Incident Management

  1. Monitoring: Tools like Prometheus (open-source) or kloudfox uptime (commercial) are essential to detect system issues.
  2. Incident Tracking: A centralized incident management tool to track incidents across services.
  3. On-Call Scheduling and Alerting: Reliable on-call alerting with scheduling capabilities to ensure the right person is always alerted.
  4. Chat Room: Platforms like Slack or Microsoft Teams for timestamped, real-time communication during incidents.
  5. Video Call: Tools like Zoom or Around for rapid response calls with team members.
  6. Status Page: For communicating incident updates both externally and internally.
  7. Documentation Tool: To centralize postmortems and use them as learning resources for future incidents.
Thekloudfox

10 months ago
