Hosmer-Lemeshow
The Hosmer-Lemeshow goodness-of-fit test can be requested with the lackfit option on the model statement. The test sorts subjects by predicted probability, divides them into about ten groups (deciles), and computes a chi-square statistic from the observed and expected frequencies in each group.
It tests the null hypothesis that there is no difference between the observed and predicted values of the response variable. Therefore, when the test is not significant, as in this example, we cannot reject the null hypothesis, and we conclude that there is no evidence of lack of fit. We can also request the generalized R-square measure for the model with the rsquare option on the model statement. SAS gives the likelihood-based pseudo R-square measure and its max-rescaled version.
Categorical Data Analysis Using the SAS System, by M. Stokes, C. Davis and G. Koch, offers more details on how these generalized R-square measures are constructed and how to interpret them.
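As a rough illustration of the two R-square measures, here is a minimal Python sketch. It assumes the usual likelihood-based definition, R2 = 1 - (L0/L1)^(2/n), and the rescaled version R2 / (1 - L0^(2/n)), where L0 and L1 are the likelihoods of the intercept-only and fitted models. The function name and the log-likelihood values are made up for the example and are not SAS output.

```python
import math

def generalized_r2(loglik_null, loglik_model, n):
    """Return (R-square, max-rescaled R-square) from two log-likelihoods.

    loglik_null  : log-likelihood of the intercept-only model
    loglik_model : log-likelihood of the fitted model
    n            : number of observations
    """
    # 1 - (L0 / L1)^(2/n), written with log-likelihoods for numerical stability
    r2 = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Largest value R-square can reach for a discrete response: 1 - L0^(2/n)
    r2_max = 1.0 - math.exp(2.0 * loglik_null / n)
    return r2, r2 / r2_max

# Hypothetical log-likelihoods, for illustration only
r2, r2_rescaled = generalized_r2(loglik_null=-100.0, loglik_model=-80.0, n=200)
print(round(r2, 4), round(r2_rescaled, 4))
```

The rescaled value is always at least as large as the unscaled one, because the unscaled measure cannot reach 1 for a binary response.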
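proc logistic data = hsb2;
  /* prog is categorical; level 1 is the reference category */
  class prog(ref='1') / param = ref;
  /* rsq requests the generalized R-square, lackfit the Hosmer-Lemeshow test */
  model hiwrite(event='1') = female prog read math / rsq lackfit;
run;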
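To make the grouping step concrete, here is a minimal Python sketch of the Hosmer-Lemeshow statistic. It assumes equal-sized groups formed by sorting on predicted probability, and it does not reproduce the exact grouping rules SAS uses for ties. The function names and the toy data are assumptions for the example.

```python
import math

def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square statistic, degrees of freedom).

    y : observed 0/1 outcomes
    p : predicted probabilities of the event (strictly between 0 and 1)
    """
    pairs = sorted(zip(p, y))          # order subjects by predicted probability
    n = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue                   # skip empty groups when n < groups
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
        used += 1
    return stat, used - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Toy data: four probability levels, five subjects each, perfectly calibrated
p = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
stat, df = hosmer_lemeshow(y, p, groups=4)
print(stat, df, chi2_sf_even_df(stat, df))
```

With the default ten groups the statistic has 8 degrees of freedom, so the even-df shortcut applies. A large p-value, as with this toy data, means the test finds no evidence of lack of fit.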
Wednesday, March 28, 2007
Thursday, March 22, 2007
Cumulation of the data
/* Month labels and matching date windows, plus-delimited so that %scan can pick the i-th item. */
/* The dates use the Teradata integer form (year-1900)*10000 + month*100 + day: 1031101 is 2003-11-01. */
%let name=200311+200312+200401+200402+200403+200404;
%let first_d=1031101+1031201+1040101+1040201+1040301+1040401;
%let last_d=1031201+1040101+1040201+1040301+1040401+1040501;
options mprint mlogic;
%macro peulot;
  /* Start from an empty accumulator data set. */
  data nohesh.netunim_200311_200404; delete; run;
  %do i=1 %to 6;
    /* Count events per customer for the i-th month on the Teradata side. */
    proc sql;
      connect to teradata
        (user=xxx password=123 tdpid=DWPROD);
      create table nohesh.peulot_%scan(&name,&i,+) as
      select * from connection to teradata
        (select branch_cust_ip,
                count(*) as peulot
         from bo_vall.V0500_1_FINANCIAL_EVENT as a,
              bo_vall.VBM845_FINANCIAL_EVENT_CUST as b
         where event_start_date ge %scan(&first_d,&i,+)
           and event_start_date lt %scan(&last_d,&i,+)
           and a.event_id=b.event_id
         group by 1
        );
      disconnect from teradata;
    quit;
    /* Append the monthly counts to the accumulator. */
    data nohesh.netunim_200311_200404;
      set nohesh.netunim_200311_200404
          nohesh.peulot_%scan(&name,&i,+);
    run;
  %end;
%mend;
%peulot;
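The macro relies on two conventions that are easy to misread: %scan picks the i-th item of a plus-delimited list, and the date literals appear to use Teradata's integer date encoding, (year - 1900) * 10000 + month * 100 + day. The Python sketch below decodes the three lists into the month windows the loop walks through. The helper names are hypothetical, and the encoding is an assumption about how the warehouse stores event_start_date.

```python
from datetime import date

NAME = "200311+200312+200401+200402+200403+200404"
FIRST_D = "1031101+1031201+1040101+1040201+1040301+1040401"
LAST_D = "1031201+1040101+1040201+1040301+1040401+1040501"

def scan(text, i, delim="+"):
    """Mimic %scan: return the i-th (1-based) non-empty delimited word."""
    words = [w for w in text.split(delim) if w]
    return words[i - 1]

def teradata_int_to_date(value):
    """Decode (year - 1900) * 10000 + month * 100 + day into a date."""
    return date(value // 10000 + 1900, value // 100 % 100, value % 100)

def month_windows(name, first_d, last_d, count=6):
    """List (label, start, end) per month; the end date is exclusive."""
    windows = []
    for i in range(1, count + 1):
        start = teradata_int_to_date(int(scan(first_d, i)))
        end = teradata_int_to_date(int(scan(last_d, i)))
        windows.append((scan(name, i), start, end))
    return windows

for label, start, end in month_windows(NAME, FIRST_D, LAST_D):
    print(label, start, end)
```

Each window is half-open (ge the start, lt the end), so consecutive months do not overlap and no event is counted twice.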
Jarque-Bera hypothesis test of normality
Function JBTest(ReturnVector, SignificanceLevel)
    ' Jarque-Bera hypothesis test of normality.
    ' Returns True when normality is not rejected at SignificanceLevel.
    '
    ' Andreas Steiner, March 2006
    ' http://www.andreassteiner.net/performanceanalysis
    n = WorksheetFunction.Max(ReturnVector.Columns.Count, ReturnVector.Rows.Count)
    ReturnVectorMean = WorksheetFunction.Average(ReturnVector)
    ReturnVectorStDev = WorksheetFunction.StDev(ReturnVector)
    ' Normalize returns
    ReDim NormalizedReturns(1 To n)
    For i = 1 To n
        NormalizedReturns(i) = (ReturnVector(i) - ReturnVectorMean) / ReturnVectorStDev
    Next i
    ' Calculate 3rd and 4th moments (skewness and excess kurtosis)
    S = 0
    K = 0
    For i = 1 To n
        S = S + NormalizedReturns(i) ^ 3
        K = K + NormalizedReturns(i) ^ 4
    Next i
    S = S / n
    K = K / n - 3
    JB = n * ((S ^ 2) / 6 + (K ^ 2) / 24)
    ' Under normality JB is asymptotically chi-square with 2 degrees of freedom
    pValue = WorksheetFunction.ChiDist(JB, 2)
    JBTest = (SignificanceLevel < pValue)
End Function
Function JBCriticalValue(ReturnVector, SignificanceLevel)
    ' Jarque-Bera hypothesis test of normality.
    '
    ' Andreas Steiner, March 2006
    ' http://www.andreassteiner.net/performanceanalysis
    JBCriticalValue = WorksheetFunction.ChiInv(SignificanceLevel, 2)
End Function
Function JBpValue(ReturnVector, SignificanceLevel)
    ' Jarque-Bera hypothesis test of normality.
    '
    ' Andreas Steiner, March 2006
    ' http://www.andreassteiner.net/performanceanalysis
    n = WorksheetFunction.Max(ReturnVector.Columns.Count, ReturnVector.Rows.Count)
    ReturnVectorMean = WorksheetFunction.Average(ReturnVector)
    ReturnVectorStDev = WorksheetFunction.StDev(ReturnVector)
    ' Normalize returns
    ReDim NormalizedReturns(1 To n)
    For i = 1 To n
        NormalizedReturns(i) = (ReturnVector(i) - ReturnVectorMean) / ReturnVectorStDev
    Next i
    ' Calculate 3rd and 4th moments (skewness and kurtosis)
    S = 0
    K = 0
    For i = 1 To n
        S = S + NormalizedReturns(i) ^ 3
        K = K + NormalizedReturns(i) ^ 4
    Next i
    S = S / n
    K = K / n - 3
    JB = n * ((S ^ 2) / 6 + (K ^ 2) / 24)
    JBpValue = WorksheetFunction.ChiDist(JB, 2)
End Function
Function JBStat(ReturnVector, SignificanceLevel)
    ' Jarque-Bera hypothesis test of normality.
    '
    ' Andreas Steiner, March 2006
    ' http://www.andreassteiner.net/performanceanalysis
    n = WorksheetFunction.Max(ReturnVector.Columns.Count, ReturnVector.Rows.Count)
    ReturnVectorMean = WorksheetFunction.Average(ReturnVector)
    ReturnVectorStDev = WorksheetFunction.StDev(ReturnVector)
    ' Normalize returns
    ReDim NormalizedReturns(1 To n)
    For i = 1 To n
        NormalizedReturns(i) = (ReturnVector(i) - ReturnVectorMean) / ReturnVectorStDev
    Next i
    ' Calculate 3rd and 4th moments (skewness and kurtosis)
    S = 0
    K = 0
    For i = 1 To n
        S = S + NormalizedReturns(i) ^ 3
        K = K + NormalizedReturns(i) ^ 4
    Next i
    S = S / n
    K = K / n - 3
    JBStat = n * ((S ^ 2) / 6 + (K ^ 2) / 24)
End Function
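For readers without Excel, the same calculation can be sketched in Python. This is an illustrative translation, not the author's code. The ddof parameter is an addition: ddof=1 mirrors WorksheetFunction.StDev (sample standard deviation), while ddof=0 gives the textbook statistic based on population moments. The chi-square tail with 2 degrees of freedom has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def jarque_bera(returns, ddof=1):
    """Return (JB statistic, p-value) for a sequence of numbers."""
    n = len(returns)
    mean = sum(returns) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in returns) / (n - ddof))
    z = [(x - mean) / sd for x in returns]      # normalized returns
    s = sum(v ** 3 for v in z) / n              # skewness
    k = sum(v ** 4 for v in z) / n - 3.0        # excess kurtosis
    jb = n * (s ** 2 / 6.0 + k ** 2 / 24.0)
    p_value = math.exp(-jb / 2.0)               # chi-square(2) upper tail
    return jb, p_value

def jb_critical_value(significance_level):
    """Chi-square(2) critical value, the counterpart of ChiInv(alpha, 2)."""
    return -2.0 * math.log(significance_level)

def jb_test(returns, significance_level=0.05, ddof=1):
    """True when normality is not rejected, as in JBTest above."""
    _, p_value = jarque_bera(returns, ddof)
    return significance_level < p_value

# Made-up sample for illustration
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(jarque_bera(sample, ddof=0), jb_critical_value(0.05), jb_test(sample))
```

The chi-square approximation is asymptotic, so results for a sample as small as this one should be read with caution.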
EXCEL FUNCTION
Descriptive statistics
Measures of Skewness and Kurtosis
Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the
same to the left and right of the center point.
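A small Python sketch can make the definition concrete. It assumes the common moment-based skewness coefficient, the third central moment divided by the 1.5 power of the second central moment. The function name and the data are made up for the example.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using divisors of n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # looks the same on both sides of 3
right_tailed = [1, 1, 1, 2, 10]    # one large value stretches the right tail

print(skewness(symmetric))
print(skewness(right_tailed))
```

The symmetric data set gives a skewness of zero, while the long right tail gives a positive value. Mirroring the data around its mean flips the sign.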