"rrdtool create" builds a database. It does not read your mind.
Before you can tell RRDtool what to create, you must first determine what you need and when you need it.
This example is not so much about cut and paste; it is meant to teach you how to translate your idea into a working setup.
Have a close look at the numbers you are going to give to RRDtool later on. What do those numbers represent? How do they change?
A common example is a counter kept by a network device, showing the number of octets (bytes) into or out of the device. This counter is started at some point and continues to increase. In this case, you are interested in the difference, the delta, between two moments in time. That difference is the number of octets transported in that interval of time.
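As a quick sketch (the counter values here are made up for illustration): if the octet counter read 1200000 at one poll and 1500000 at the next poll 300 seconds later, the rate is the delta divided by the elapsed time:

```shell
# Two hypothetical octet counter readings, taken 300 seconds apart
old=1200000
new=1500000
interval=300

# The delta divided by the elapsed time gives the rate
rate=$(( (new - old) / interval ))
echo "$rate bytes per second"    # prints: 1000 bytes per second
```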
A similar but less often seen example would be the counter on an electricity meter. This too is an ever increasing counter. It results in the amount of kWh used in an interval of time. However, in this case you should recognize that kWh is just another way of counting Joules, something you really want to know later on in the process.
A completely different kind of input would be a speedometer, the device in a car which shows how fast you are moving. The number you get is already a rate. Whether you get it in mph or in km/h, it is still just a number of meters per second given in a different unit, similar to kWh vs. Joules.
Eventually, RRDtool will transform the input it gets into a rate, and will normalize these rates so that they fit in well-defined time intervals. There's nothing you can do about this; it is how RRDtool works. See Rates, normalizing and consolidating for a more elaborate explanation of this.
There's no problem if you "abuse" RRDtool to work with data which is not a rate, such as temperature. Just remember that RRDtool will treat it as a rate, so hand it over as if it were already a rate. That means data source type GAUGE.
There are many different examples to think of. If you have an example which will be of general interest, or if you are willing to pay for my time, do forward it to me and I will work it out here.
DST | when to use |
---|---|
GAUGE | The input is a rate, e.g. m/s, or should be treated as a rate, e.g. temperature |
COUNTER | The input is an ever increasing number, e.g. an octet counter in a router. RRDtool should compute the difference between the last update and the current one, and divide it by the amount of time lapsed. |
DERIVE | This is similar to COUNTER, except that the input can decrease. This is useful for instance in a kWh meter when you produce more solar power than you use. In that case you actually do get a negative rate. |
ABSOLUTE | This is to be used when the counter is reset every time it is read, when people start counting from zero, and so on. The main difference between ABSOLUTE and GAUGE is that the input is not yet a rate, it should first be divided by time to get a rate. The main difference between ABSOLUTE and COUNTER is that RRDtool should not use the previous input value to compute its delta. |
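To make the table concrete, here is a sketch of what a data source definition for each DST could look like, using rrdtool's DS:ds-name:DST:heartbeat:min:max syntax. The names, heartbeats and limits below are made-up examples, not values from this tutorial ("U" means no limit):

```
DS:temperature:GAUGE:600:-40:50       # already rate-like, store as-is
DS:octets_in:COUNTER:600:0:12500000   # ever increasing octet counter
DS:meter:DERIVE:600:U:U               # counter which may also decrease
DS:hits:ABSOLUTE:600:0:U              # counter reset at every read
```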
Once you've figured out how to process the input, you need to figure out how you want to store the computed rates. There are several things to look at. First of all you need to know how much time you want to be able to look at. Also very important is how you want to be able to look at this.
Say you want to be able to look back a year. You still need to know whether you want to be able to "zoom in" or whether you only want to look at the big picture. In other words: if now is March 1st, 2009, do you want to look at 2007-03-01 until 2009-03-01 as a whole, or do you want to be able to look at just 2007-03-01 midnight to the next midnight?
What you need to understand here is consolidation. Say that you will be looking at two years' worth of information, and that the available data has a resolution of 300 seconds per bucket. This means you have more than 200,000 buckets. If you are going to display this in an image 400 pixels wide, over 500 of those buckets need to fit in one column of pixels. Keeping those 200,000+ buckets is not just a waste of space: if RRDtool needs to make all those buckets fit on the graph, it needs to do work. Depending on the processing capabilities of your HTTP server, this may mean a delay in viewing your graphs.
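A back-of-the-envelope check of those numbers (assuming 365-day years):

```shell
# Two years of 300-second buckets, displayed in a 400-pixel-wide image
buckets=$(( 2 * 365 * 86400 / 300 ))   # seconds in two years / bucket size
per_column=$(( buckets / 400 ))        # buckets per pixel column
echo "$buckets buckets in total"                  # prints: 210240 buckets in total
echo "about $per_column buckets per pixel column" # prints: about 525 buckets per pixel column
```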
You can instruct RRDtool to keep historic data in a consolidated form, suitable for displaying without the delay just mentioned. This requires some planning, on which I will elaborate further on. But what if you want to be able to zoom in? No problem: you can tell RRDtool to also keep the data in its original bucket size (e.g. those 300 seconds). Or, if you so choose, you can tell RRDtool to keep the data only in that original bucket size. It's up to you. Just know that it is difficult to change your mind once you've built your database. In many cases you will have to start all over if you do.
RRDtool uses RRAs (Round Robin Archives) to store data. Each of these RRAs is independent of the others: you can have one which stores data at a 300-second resolution and another which stores data at an 86400-second resolution. If you want, each of these RRAs can cover the same amount of time, no problem. It is just a matter of how many rows you generate in each RRA.
Say you will be generating images where the inner graph area is 360 pixels wide. It is generally best if you set up your RRD so that one of its RRAs matches the resolution on screen. You can do this (in the designing stage!) by adjusting the size of each bucket, or you can carefully plan start and end times. If you have an RRA which stores data at an 86400-second resolution and you display 360 days of information, this is a nice fit. Display 180 days and each day will be 2 columns wide; not much of a problem. Display 720 days and RRDtool still needs to make 2 buckets fit in one pixel column.
Consider looking at network statistics, so that you know how much data is transported; this helps you determine whether it is time to expand your network capacity.
Network counters are probably going to be ever increasing numbers (until a counter wrap occurs), so that part is easy: use COUNTER. Also, quite often they come in pairs: inbound and outbound (as seen from the device). You are going to query the device roughly every 5 minutes, but you give yourself (and your scheduler) some slack. However, there's a limit to how much slack you are prepared to give. If updates are further apart than 10 minutes, something went wrong and you can't rely on the accuracy of the network counters. An outage may have occurred, resulting in a reboot of the device. You know the device won't ever transport more than 100 Mbps, so any rate higher than that is the result of some unknown fault somewhere in an unknown place. You don't know why it would happen (if at all); you just never want it to show up. It's just like a safety net.
ds-name | ds0 | one of the counters |
---|---|---|
DST | COUNTER | Data Source Type |
heartbeat | 600 | Those 10 minutes |
min | 0 | Rate no less than 0 |
max | 12500000 | Rate no more than 12,500,000 bytes per second, which is 100,000,000 bits per second |
The other counter is just the same, except for its name. Give the following to rrdtool create:
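In rrdtool's DS syntax (DS:ds-name:DST:heartbeat:min:max), the table translates into these two data source definitions (ds0 and ds1 are the names used throughout this example):

```
DS:ds0:COUNTER:600:0:12500000
DS:ds1:COUNTER:600:0:12500000
```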
Say you want to be able to display the last 2 years, the last 2 months, the last 2 weeks and the last 2 days. The database uses the default step size of 300 seconds per interval.
First thing I notice: each time "last" is mentioned. This means no extra rows need to be present to allow zooming in on the past. For example, there's no need to keep data at a 300-second resolution for the entire two years.
I also immediately notice the use of year, month, week and day. These are not fixed intervals of time. A year can be 365 or 366 days. A month can be 28, 29, 30 or 31 days, and it can even be an hour more or less, or 30 minutes more or less, depending on how daylight saving works out for you. Similarly, a week is not always 7 days and a day is not always 24 hours. This is unworkable, and for the purpose of this example it is also unnecessary. This means I can, should and will modify the request to show the last 720 days, the last 60 days, the last 14 days and the last 2 days, all based on UTC time (with no daylight saving to consider).
Now it is time to determine the width of each graph. First thing to do is look at the amount of time. With a step size of 300 seconds, one day equals 288 steps.
amount of time | number of steps |
---|---|
last 720 days | 720 * 288 = 207360 steps |
last 60 days | 60 * 288 = 17280 steps |
last 14 days | 14 * 288 = 4032 steps |
last 2 days | 2 * 288 = 576 steps |
I could decide that each graph should be 576 pixels wide, resulting in the following number of steps per pixel column:
amount of time | number of steps | per pixel column |
---|---|---|
720 days | 720 * 288 = 360 * 576 | 360 steps |
60 days | 60 * 288 = 30 * 576 | 30 steps |
14 days | 14 * 288 = 7 * 576 | 7 steps |
2 days | 2 * 288 = 1 * 576 | 1 step |
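These factorizations are easy to verify with shell arithmetic; each period divides evenly into 576 pixel columns:

```shell
# 288 steps per day (86400 seconds / 300 seconds per step),
# shown in a graph with an inner width of 576 pixels
for days in 720 60 14 2; do
  steps=$(( days * 288 ))
  echo "$days days: $steps steps, $(( steps / 576 )) step(s) per pixel column"
done
```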
It does not always turn out to be such a nice fit. Decide for yourself what you prefer: looking at more (or less) time than initially planned, or having different image widths. Whatever you do, make sure the numbers are whole numbers. Not because RRDtool needs it (it doesn't), but because it makes life easier.
The numbers 360, 30, 7 and 1 are the number of steps to fill in when creating each RRA. The number 576 is the number of rows to fill in. This leaves two other parameters, CF and xff, which I will explain shortly. Give this to RRDtool:
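Filled in with those numbers, using the RRA syntax RRA:CF:xff:steps:rows (the CF and xff columns are discussed in the next two paragraphs; I use an xff of 0 here), the ten archive definitions are:

```
RRA:MIN:0:360:576
RRA:MIN:0:30:576
RRA:MIN:0:7:576
RRA:AVERAGE:0:360:576
RRA:AVERAGE:0:30:576
RRA:AVERAGE:0:7:576
RRA:AVERAGE:0:1:576
RRA:MAX:0:360:576
RRA:MAX:0:30:576
RRA:MAX:0:7:576
```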
MIN, AVERAGE and MAX determine how RRDtool should consolidate multiple rates into one. More on this in Rates, normalizing and consolidating if you're interested. Do you notice I have 3 MIN RRAs, 3 MAX RRAs, but 4 AVERAGE RRAs? This is because the minimum, average and maximum of just one rate will always be the same: for the 1-step archive I only need one of them, not all three.
XFF, the X-Files Factor, got its name because it is unscientific to set it to any number other than zero. It has to do with unknown data and how this is processed. What is the average of {1,1,unknown,1}? The one and only true answer is: "unknown". Still, many people want the answer "1". XFF determines how much of the original data may be unknown while still producing 1 (or any other rate). A common value is 0.5, which means {1,unknown,unknown,1} results in 1 but {1,unknown,unknown,unknown} does not. For this example, 0.5 would be suitable, but if you are using the data for billing purposes it would not be. Consider the average of {100,100,100,100,U,U,U}. With XFF set to 0.5, this would be 100 on average. Chances are the unknowns are the result of a connection problem, in which case it would be unfair to bill your customer for a rate of 100 during those unknown intervals. For capacity planning, on the other hand, you would most likely have seen a rate of 100 had there been no outage. Then again, you may want to know there was an outage, even when looking at the graph showing 2 years.
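The XFF rule can be sketched as a small shell function. This is only an illustration of the idea, not RRDtool's actual code: the consolidated value becomes unknown when the fraction of unknown input exceeds xff, otherwise it is the average of the known values.

```shell
# consolidate XFF value value ...  ("U" marks an unknown value)
consolidate() {
  xff=$1; shift
  known=0; sum=0; total=$#
  for v in "$@"; do
    if [ "$v" != "U" ]; then
      known=$(( known + 1 ))
      sum=$(( sum + v ))
    fi
  done
  unknown=$(( total - known ))
  # If the unknown fraction exceeds xff, the consolidated value is unknown
  if awk -v u="$unknown" -v t="$total" -v x="$xff" 'BEGIN { exit !(u/t > x) }'; then
    echo "U"
  else
    echo $(( sum / known ))
  fi
}

consolidate 0.5 100 100 100 100 U U U   # prints 100 (3/7 unknown, within 0.5)
consolidate 0.5 1 U U U                 # prints U   (3/4 unknown, beyond 0.5)
```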
"rrdtool create" will also accept a start time. This is only important if you have historic data which you are going to import. If you do not do this, just skip the parameter and RRDtool will do the right thing. If you are going to import historic data, set this to slightly before the oldest data you're going to import.
Other parameters are step size (300 seconds by default) and the name of the file to create.
The entire example is now finished. This is the command to give to RRDtool, for this particular example case:
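Assembled from the data sources and RRAs worked out above, and using the example filename database.rrd, the command is (all on one line):

```
rrdtool create database.rrd DS:ds0:COUNTER:600:0:12500000 DS:ds1:COUNTER:600:0:12500000 RRA:MIN:0:360:576 RRA:MIN:0:30:576 RRA:MIN:0:7:576 RRA:AVERAGE:0:360:576 RRA:AVERAGE:0:30:576 RRA:AVERAGE:0:7:576 RRA:AVERAGE:0:1:576 RRA:MAX:0:360:576 RRA:MAX:0:30:576 RRA:MAX:0:7:576
```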
I like to write my scripts slightly differently. I use a unix shell, and write my script like so:
```
rrdtool create database.rrd \
    DS:ds0:COUNTER:600:0:12500000 \
    DS:ds1:COUNTER:600:0:12500000 \
    RRA:MIN:0:360:576 \
    RRA:MIN:0:30:576 \
    RRA:MIN:0:7:576 \
    RRA:AVERAGE:0:360:576 \
    RRA:AVERAGE:0:30:576 \
    RRA:AVERAGE:0:7:576 \
    RRA:AVERAGE:0:1:576 \
    RRA:MAX:0:360:576 \
    RRA:MAX:0:30:576 \
    RRA:MAX:0:7:576
```
This makes no difference: when the shell fires up rrdtool, it gets to see the same command (maybe with some extra whitespace, I don't even know).
Do you like this information? Tell others! Don't you? Tell me!
This page was created by Alex van den Bogaerdt, an independent IT consultant. If you want to provide feedback or if you want to hire me please see