Why You Need Histograms ??

It is always recommended we have a good table statistics to get a good explain plan which in turn provides us better execution

But do you think almost perfect (+ or – 5% change) stats always gives good plan ?

You will be surprised to see the difference it makes

I have a below table with more or less good stats

Lets check the count on the table :

SQL> select count(*) from NMANOJ.TEST_TABLE;
COUNT(*)
————
178183

SQL> select num_rows,last_analyzed from dba_tables where table_name=’TEST_TABLE’;

NUM_ROWS LAST_ANALAYZED
—————— ——————-
173052 05-NOV-19

We see a difference of 5000 rows difference between the actual rows and the statistical count

We use autotrace traceonly explain , which doesn’t actually runs the command instead it predicts the plan only with the help of existing stats


Here you can see the oracle is predicting the number of rows to be returned would be 2834

How do we land up to this number without executing the query just depending on the statistics

Oracle has a concept called NDV (Number of Distinct Values) , In this scenario we are executing the query with condition CALC_RESULT=1

The Number of rows on the table is 173052 and oracle thinks the number of distinct values on column CLAC_RESULT to be 61

The expected number of rows to be returned by the given condition =NUmberof Rows/NDV

No of rows=173052/61 which is close to 2836

Lets check the actual value for the given condition

Now we have almost perfect Stats but still we still we see a huge difference in rows predicted and rows retrieved , The reason being is we don’t have any idea how the data is distributed among the distinct values , Which value is repeated more and which value is not repeated much

Lets gather stats again

We will see the below Magic by collecting Stats

Lets try explain plan again

This is almost close to the actual rows much better compared to what we had earlier 2834

Note that there is no change in plan ,except you get accurate rows fetched by the statement

What changed before and after Stats 🙂

We can see the Histogram is created on the Column CALC_RESULT automatically by Oracle while stats was gathered because the NDV is less compared to Number of rows

Histogram helps to understand how the data is distributed on a column where predicate(where condition) is used very frequently

Practical Example :

Running a sales report on a table with million rows with where condition on columns like CITY or Pincode

Lets see more details on upcoming blogs how we get almost exact value 🙂

3 thoughts on “Why You Need Histograms ??

Leave a comment