I had to implement a job that involved saving the Oracle performance stats to CSV files on a disk. It was an interesting problem to say the least because speed was a consideration and obviously any technology could be choosen. But I choose python, mainly because python (along with most common libraries) comes automatically shipped on most commercial linux servers.
The full souce code is available [at this link] (https://bitbucket.org/varmarakesh/fullstack/src/0c08ed6ea9e8a7c02ba702775725f60e37dddde4/python/awr/?at=master)
AWR (automatic workload repository) collects and stores all runtime stats generated in Oracle database. It is an effective tool used by database community to troubleshoot and address performance problems. In short all the data related to application sql’s, connections, resource usages, etc is stored in Oracle and is accessible thru a number of views (DBA_HIST_). Since this data tends to be historical, it is grouped by a certain interval and it is known as snapshot.
This gives the list of snapshot Id’s that would group all the stats generated today. Usually the interval time would be every 30 min or 1 hour.
Once there is the list of snapshot Id’s, we can get the runtime stats for all the sql’s in the specific snapshot using this query like this.
#Save Results to a CSV File
So, basically need to get the details snapshot details iteratively and save the results to a local CSV File. I used the csv module in python for it.
So, the need is issue Oracle queries multiple times (once every snapshot) and save the results in multiple CSV files (once file per snapshot). This is a classic IO bound problem, because, the program issues a database call, waits for the response, save the data to disk, then makes another call and so on. So, I implemented threading and you can in the end that the performance gain using threading is pretty amazing.
So, get a list of snapshots, then spawn a thread for each snapshot. Each thread spawn invokes a method called save_awr to get the snapshot details and saves the results to a file.
This program saves the results to file system.
If multi-threading could not be implemented, this is how a single threaded version would like.
Elapsed time differences between multi-threading and single threaded versions are very stark because this is a classic example of an IO bound problem. Oracle connections and saving the AWR data to disk are two IO bound tasks. Saving the AWR data is very IO intensive given the volume of data. These are the results when I ran this script against my database.
@timefn:generate_awr_multithreads took 58.2649428844 seconds
@timefn:generate_awr_singlethread took 361.62610507 seconds
Please check the source at https://bitbucket.org/varmarakesh/fullstack/src and let me know the feedback. Anybody that has an Oracle client installed, the setup is quite minimal and it will work without issues.