graphite - Tool for monitoring QoS


In my project:

  1. We crawl x servers.
  2. The number of users for each server varies from 1 to
  3. We crawl from 1 to z items for each user.

Currently we monitor QoS using graphite. We record the time it takes to crawl each item. The problem with this approach is that if only one user is affected, we still get false alarms about QoS.

What would be the right tools or techniques to monitor the following points:

  1. Alert only when at least K users are affected [not a number of events].
  2. List the affected users.

I think graphite and statistics are not the right tools for this. What would be a better tool to answer those two questions?

What you are asking for is often called service monitoring. For many good reasons, you want to know the effect on your users rather than just the fact that an event happened.

The benefits of this approach are exactly the ones you list in your requirements: you can focus on events that affect a large part of your user base, and you immediately have a list of the affected users.

The main drawback, IMHO, is that service monitoring is usually more complicated than simple performance or event monitoring and alerting. It often depends on a service model, which in my experience is hard to build and even harder to keep up to date. For example, if a server in your system is slow or failing, then depending on your architecture it can affect all users who use services that depend on that server, or, depending on your load balancing or redundancy mechanisms, it can affect only a very small subset of users or even none at all.

You will need to reflect this architecture in your service monitoring model, and you will have to change the model every time you update your system architecture or deployment.
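To make the idea of a service model concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `USER_SERVERS` mapping, the user and server names, and the rule that a user counts as affected only when every server they depend on is down. A real model would be derived from your actual architecture.

```python
# Hypothetical service model: the set of servers each user can be served from.
# The names and the mapping are invented for illustration only.
USER_SERVERS = {
    "alice": {"srv1"},
    "bob": {"srv1", "srv2"},  # bob has redundancy across two servers
    "carol": {"srv2"},
}


def affected_users(failed_servers, user_servers=USER_SERVERS):
    """Return the sorted users who have no healthy server left.

    A user is treated as affected only if all of their servers have failed,
    which is one simple way to model redundancy.
    """
    failed = set(failed_servers)
    return sorted(
        user for user, servers in user_servers.items() if servers <= failed
    )
```

With this mapping, a failure of `srv1` alone affects only `alice`, because `bob` can still be served from `srv2`. The mapping is also the part that has to be edited whenever the deployment changes, which is the maintenance cost described above.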

If your system is sufficiently stable or important enough, it may be worth your time to make that investment. If not, a simple compromise is to update your graphing and alerting so that an alert fires when a certain number of users are affected, or when the average response time across all users on a server increases by a significant amount.
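The simpler compromise can be sketched as follows, again in Python. This is only an illustration under stated assumptions: the sample format `(server, user, crawl_seconds)`, the function names `slow_users` and `should_alert`, and the choice of the mean crawl time as the per-user measure are all hypothetical and not part of graphite or any other tool.

```python
from collections import defaultdict
from statistics import mean


def slow_users(samples, max_seconds):
    """Return sorted (server, user) pairs whose mean crawl time is too high.

    samples: iterable of (server, user, crawl_seconds) tuples (assumed format).
    """
    per_user = defaultdict(list)
    for server, user, seconds in samples:
        per_user[(server, user)].append(seconds)
    return sorted(
        key for key, times in per_user.items() if mean(times) > max_seconds
    )


def should_alert(samples, max_seconds, min_users):
    """Alert only when at least min_users distinct users are affected.

    Returns (alert, affected) so the caller also gets the affected users.
    """
    affected = slow_users(samples, max_seconds)
    return len(affected) >= min_users, affected
```

Counting distinct users rather than slow events is what keeps a single slow user from raising an alert, and the returned list covers the second requirement in the question.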

This can give you most of the benefit without investing in the extra complexity of a service monitoring solution.

If you definitely want to expand your monitoring approach and you prefer open source tools, then I would start by looking at Nagios if your focus is on the infrastructure, or at something like the free tiers of web service monitoring solutions such as Pingdom.

