RRDtool stores rates during time intervals. Sometimes you do not want to know the rate (how many bytes per second); you just want to know the total number of bytes. This can be done, no problem.
Please make sure you understand how RRDtool normalizes and consolidates its data. This is explained on my RRDtool - Rates, normalizing and consolidating page. That page already covers most of what you need to know about this subject. You are expected to know its contents in order to understand this page.
I'm going to use bytes per second in my explanation. This is just for explaining; you could just as easily do the same examples with meters per second, or messages per second. As a matter of fact, RRDtool doesn't even know what its input represents.
Each row in an RRA represents an amount of time and a rate. The rate is measured in bytes per second, the time in seconds. Some easy math shows that unless the amount of time is zero, you can get rid of it: (byte/second)*second = (byte*second)/second = byte*(second/second) = byte*1=byte.
(If "second" is zero, then "second/second" is undefined. This is why it doesn't work when second equals zero.)
When dealing with other rates, it works similarly. If you have multiplied the input by 3600 (for instance to get messages per hour) then the result of the computation in the paragraph above is also 3600 times what it should be. Remember that when you continue...
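To make the scaling concrete, here is a small sketch in Python. The numbers (2 messages per second, a 300 second interval) are made up for illustration only.

```python
# Made-up example: 2 messages per second, observed for 300 seconds.
rate_per_second = 2.0
seconds = 300

# Rate times time: the "second" cancels and messages remain.
total_messages = rate_per_second * seconds

# If the stored value was scaled to messages per hour, the same
# multiplication comes out 3600 times too large.
rate_per_hour = rate_per_second * 3600
scaled_total = rate_per_hour * seconds

# Undo the scaling to get the real amount back.
corrected_total = scaled_total / 3600

print(total_messages, scaled_total, corrected_total)
```

Whatever factor went into the input has to come out again before the result can be read as a plain amount.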
You now know how you can compute the amount of data using one single row from the RRA. Each row in an RRA used in fetch or graph represents the same amount of time. You won't receive rows from multiple RRAs, which could have a different amount of time per row. This makes the computation easy. In the next formulas, "r" represents a rate and "t" represents an amount of time: total_amount_of_bytes = r*t. Now do the same computation for multiple rows: total_amount_of_bytes = r1*t1 + r2*t2 + r3*t3 + ... + rn*tn. Because each amount of time is the same, we can replace each of them by one single time: total_amount_of_bytes = r1*t + r2*t + r3*t + ... + rn*t. This can be rewritten again: total_amount_of_bytes = t*(r1+r2+r3+ ... +rn).
This total amount of bytes was sent during the total amount of time. This total amount of time is (t1+t2+t3+ ... +tn), which equals n*t because every row covers the same amount of time. The total amount of bytes, divided by the total amount of time, gives us a rate again: the average rate.
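The formulas above can be checked with a few lines of Python. This is only a sketch: the four rates and the 300 second step are made-up values, not data from a real RRA.

```python
step = 300                             # seconds per row (t), assumed
rates = [100.0, 250.0, 175.0, 75.0]    # bytes per second (r1..rn), made up

# Row by row: r1*t + r2*t + ... + rn*t
total_bytes = sum(r * step for r in rates)

# Factored form: t * (r1 + r2 + ... + rn)
total_bytes_factored = step * sum(rates)

# Total time is n*t; dividing gives the average rate again.
total_time = step * len(rates)
average_rate = total_bytes / total_time

print(total_bytes, total_bytes_factored, total_time, average_rate)
```

Both ways of adding up give the same amount, and dividing by the total time returns the plain average of the four rates.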
Now comes the nice part: RRDtool can compute the average for you. You can know the average, you can know the amount of time, therefore you can know the amount of bytes!
This is how it works: RRDtool has a function to print the average. This alone is of no use, because what you are interested in is the average multiplied by the time. You cannot multiply inside the print statement. However, you can influence the data from which the average is calculated!
The average is computed similarly to what is described in the previous chapter. Average = total_bytes/total_time. Now, if you multiply the amount of bytes by the amount of time, you get: Average = (total_bytes*total_time)/total_time = total_bytes * (total_time/total_time) = total_bytes. This is exactly what we wanted. You cannot multiply the total amount of bytes directly, but you can multiply the data from which it is built.
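Here is the same trick as a Python sketch, again with made-up rates and an assumed step of 300 seconds: every row is multiplied by the total time before the average is taken.

```python
step = 300                             # seconds per row, assumed
rates = [100.0, 250.0, 175.0, 75.0]    # bytes per second, made up
total_time = step * len(rates)         # 1200 seconds

# Multiply each row by the total time, then take the ordinary average.
scaled = [r * total_time for r in rates]
average_of_scaled = sum(scaled) / len(scaled)

# The straightforward total, for comparison.
total_bytes = sum(r * step for r in rates)

print(average_of_scaled, total_bytes)
```

The average of the scaled rows and the straightforward total come out the same, as long as every row is known.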
Multiplying the data by the amount of time is easy, provided that you know the amount of time. This isn't as hard as it seems but you will need to be careful. When you create a graph, you are specifying start and/or end and/or duration (two out of three). Maybe you are relying on defaults but that is still specifying times. When you choose these times carefully, RRDtool will not have to adjust them. Remember, RRDtool will always work with fixed amounts of time. If each interval is 300 seconds, you cannot have start and/or end times that are not a multiple of 300 seconds. If necessary, RRDtool will shift them, thereby enlarging the total amount of time. The other way around: if you do specify good values for start and end, RRDtool does not have to move them around. Now you do know the amount of time, which is what you need to know to compute the total amount of data using the average.
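One way to pick "good" values is to compute them yourself. The sketch below is an assumption about how you might do that in Python, it is not something RRDtool provides: `aligned_window` is a hypothetical helper and the timestamp is made up.

```python
def aligned_window(now, step, rows):
    """Return (start, end) in seconds, both multiples of step, spanning rows*step."""
    end = now - now % step        # round the end down to an interval boundary
    start = end - rows * step     # go back a whole number of intervals
    return start, end

# Made-up "current" time, 300 second intervals, 480 rows.
start, end = aligned_window(1_700_000_123, 300, 480)

print(start, end, end - start)
```

With both times on an interval boundary there is nothing left to shift, and the amount of time is exactly rows times step.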
Suppose the amount of time is 144000 seconds (480 times 300). Suppose your data source is named "ds0". You can display the total amount in your graph using the following statements:
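The statements could look like the sketch below, written here as a Python list of rrdtool graph arguments so they are easy to hand to a script. The file name router.rrd and the text " bytes" in the format string are assumptions; ds0 and 144000 come from the text. Nothing is executed, the list is only printed.

```python
# rrdtool graph arguments held as plain strings (nothing is executed here).
# "router.rrd" is a hypothetical file name.
graph_args = [
    "DEF:ds0=router.rrd:ds0:AVERAGE",        # read the stored rates of ds0
    "CDEF:ds0total=ds0,144000,*",            # multiply each row by the total time
    "PRINT:ds0total:AVERAGE:%.0lf bytes",    # the average of ds0total is the total
]

print("\n".join(graph_args))
```

PRINT writes the number to the output of rrdtool; GPRINT takes the same arguments if you want the text inside the graph itself.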
This will multiply each row in your RRA by 144000, then compute the average from it as described above and print it. The average is therefore 144000 times too high and this "happens" to be the amount of time.
Life so far has been easy. We wanted the total amount of bytes transferred and got it. Or did we?
The problem is that not every row in an RRA is filled with a known rate. For several reasons, data can become unknown. These unknown rates are not taken into consideration by the averaging computation. They are simply discarded and therefore the amount of time used in average calculation is not necessarily the same as what we entered in our CDEF.
The net result is that whenever a rate is unknown, our computation considers it to be the same as the average of the known rates. This is especially noticeable when large amounts of data are unknown, such as when you first set up your database. In this case the workaround is to not specify a start time that falls before the real start of your monitoring. But when such unknown intervals fall in between normal data, you can't do that.
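A short Python sketch shows the effect. It is a simplified model with made-up numbers, where None stands for an unknown row; it is not how RRDtool stores its data.

```python
step = 300                              # seconds per row, assumed
rates = [100.0, None, 100.0, None]      # None stands for an unknown row
total_time = step * len(rates)          # 1200 seconds, as entered in the CDEF

# The average only looks at known rows, but each was multiplied by 1200.
known_scaled = [r * total_time for r in rates if r is not None]
reported_total = sum(known_scaled) / len(known_scaled)

# What was really measured during the known rows.
measured_total = sum(r * step for r in rates if r is not None)

print(reported_total, measured_total)
```

Half of the rows are unknown here, and the reported amount is twice what was actually measured: the unknown rows are counted as if they ran at the average rate.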
Discarding unknown data is what bothers us. So, make sure there is no unknown data. Unless of course you actually like the behaviour; after all, it isn't that bad an estimate. If you do want to get rid of unknown data, you will have to make a choice. You could alter unknown data into zero: the same amount of data is then divided by a larger amount of time, resulting in a lower average and thus a lower total. Or you could alter unknown data into a certain maximum, thereby increasing the average and thus the total amount. Which one you choose is up to you.
Altering unknowns isn't hard. Just modify the CDEF we already have:
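A modified CDEF could look like the one in the sketch below. To show what it does, the Python code also evaluates the expression for a few rows with a tiny evaluator. That evaluator is an illustration only: it handles just the handful of RPN tokens used here, and NaN stands in for an unknown value.

```python
import math

# The CDEF as it could be passed to rrdtool graph.
cdef = "CDEF:ds0total=ds0,UN,0,ds0,144000,*,IF"

def eval_rpn(expression, value):
    """Evaluate a tiny RPN subset (numbers, ds0, *, UN, IF) for one row."""
    stack = []
    for token in expression.split(","):
        if token == "ds0":
            stack.append(value)
        elif token == "UN":
            # Replace the top of the stack by 1 if it is unknown, else 0.
            stack.append(1.0 if math.isnan(stack.pop()) else 0.0)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "IF":
            # condition,then,otherwise,IF
            otherwise, then, condition = stack.pop(), stack.pop(), stack.pop()
            stack.append(then if condition != 0 else otherwise)
        else:
            stack.append(float(token))
    return stack.pop()

rpn = cdef.split("=", 1)[1]
rows = [100.0, float("nan"), 50.0]      # made-up rates, one unknown
ds0total = [eval_rpn(rpn, row) for row in rows]

print(ds0total)
```

The unknown row becomes zero and the known rows are multiplied by 144000, so the average is now taken over every row.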
This will alter any unknown into zero, then multiply as we did before. You can of course alter that zero into any number you need. This RPN expression means: For each of the values in array ds0: if the value of ds0 is unknown, use zero, else use the value of ds0 multiplied by 144000. Store the resulting value in array ds0total.
In RRDtool 1.2 I made a start with VDEF processing. VDEFs aren't arrays like CDEFs are; they are single values. The work is far from finished; however, certain functions are already usable.
The function you really want is called TOTAL. For every row in the RRA that doesn't have an unknown rate, it takes that rate, multiplies it by time and increases its total counter. In other words, it performs the calculation that was described in the first chapter. It discards unknowns as well but, contrary to the "old" method, it also discards the time that belongs to them.
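As a rough model of that description (not RRDtool's own code), TOTAL can be sketched in Python like this. The function name `total`, the use of None for unknown rows and the numbers are assumptions for illustration.

```python
def total(rates, step):
    """Add rate*step for every known row; unknown rows add no bytes and no time."""
    return sum(r * step for r in rates if r is not None)

step = 300                              # seconds per row, assumed
rates = [100.0, None, 100.0, None]      # None stands for an unknown row

# New method: only the time of known rows takes part.
new_total = total(rates, step)

# "Old" method for comparison: average of known rows times the full time span.
known = [r for r in rates if r is not None]
old_total = (sum(known) / len(known)) * step * len(rates)

print(new_total, old_total)
```

The new method reports only what was measured, while the old method also fills in the two unknown rows at the average rate.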
There's no need to know about start and end times. The TOTAL function looks at every row used and computes the amount of time itself. This makes life really easy.
Also new in RRDtool 1.2 is the ability to print VDEFs. This means it is quite easy to print our amount:
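Such statements could look like the sketch below, again written as a Python list of rrdtool graph arguments. The file name router.rrd and the text " bytes" in the format string are assumptions; nothing is executed.

```python
# rrdtool graph arguments held as plain strings (nothing is executed here).
# "router.rrd" is a hypothetical file name.
graph_args = [
    "DEF:ds0=router.rrd:ds0:AVERAGE",    # read the stored rates of ds0
    "VDEF:ds0total=ds0,TOTAL",           # one single value: the total amount
    "PRINT:ds0total:%.0lf bytes",        # VDEF result, so no AVERAGE/MAX/... here
]

print("\n".join(graph_args))
```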
Notice how the print command doesn't have a consolidation function anymore. You need to specify one if you print DEF or CDEF results and you must leave it out if you print VDEF results.
If you do like the behaviour described in the previous chapter, where unknown intervals are computed as if they had the average rate, you need to alter unknowns into average rates. Here's how to do that:
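The statements could look like the list in the sketch below. The names ds0avg and ds0filled and the file name router.rrd are assumptions. The second half of the sketch redoes the same steps on made-up numbers in plain Python, with None standing for an unknown row, to show what comes out.

```python
# rrdtool graph arguments held as plain strings (nothing is executed here).
graph_args = [
    "DEF:ds0=router.rrd:ds0:AVERAGE",          # hypothetical file name
    "VDEF:ds0avg=ds0,AVERAGE",                 # average rate of the known rows
    "CDEF:ds0filled=ds0,UN,ds0avg,ds0,IF",     # unknown -> average, else keep
    "VDEF:ds0total=ds0filled,TOTAL",           # total over the filled-in array
    "PRINT:ds0total:%.0lf bytes",
]

# The same steps on made-up data.
step = 300
rates = [100.0, None, 200.0, None]

known = [r for r in rates if r is not None]
average = sum(known) / len(known)
filled = [average if r is None else r for r in rates]
total_bytes = sum(r * step for r in filled)

print(average, filled, total_bytes)
```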
This computes the average rate, then uses this to cover up unknowns. The modified array is then used to compute the total amount of data.
This completes the tutorial on total amount of data. I hope you find it useful.
This page was created by Alex van den Bogaerdt, an independent IT consultant.