Thursday, February 08, 2007

Tuesday, February 06, 2007

SAS/Excel Tricks

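/* Write each report as an HTML file with an .xls extension; Excel opens it as a workbook */
ods noresults;        /* do not track the output in the Results window */
ods listing close;    /* no output to the Output window */
ods html body="c:\temp\classods.xls";
proc print data=sashelp.class(obs=10);
run;
ods html close;
ods html body="c:\temp\shoesods.xls";
proc print data=sashelp.shoes(obs=10);
run;
ods html close;
ods html body="c:\temp\zipcodeods.xls";
proc print data=sashelp.zipcode(obs=10);
run;
ods html close;
/* restore the default destinations */
ods listing;
ods results;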



Macro to Combine Worksheets:



%macro many2one(in=,out=);
%local n from fromwb fromws;
/* Delete any old copy of the output workbook */
options noxwait;
x erase "&out";
options xwait;

/* Write a VBScript that drives Excel, then run it */
data _null_;
file "c:\temp\class.vbs";
put 'Set XL = CreateObject("Excel.Application")' /
'XL.Visible=True';
%let n=1;
%let from=%scan(&in,&n," ");
%do %while("&from" ne "");
/* Each item of IN= has the form workbook-path!sheet-name */
%let fromwb=%scan(&from,1,"!");
%let fromws=%scan(&from,2,"!");
put "XL.Workbooks.Open ""&fromwb""";
/* The first workbook is saved as OUT= (-4143 = xlWorkbookNormal); */
/* each later sheet is copied after sheet n-1 of OUT= and its workbook is closed */
%if &n=1 %then
put "XL.ActiveWorkbook.SaveAs ""&out"", -4143"%str(;);
%else %do;
put "XL.Workbooks(""%scan(&fromwb,-1,'\')"").Sheets(""&fromws"").Copy ,XL.Workbooks(""%scan(&out,-1,'\')"").Sheets(%eval(&n-1))";
put "XL.Workbooks(""%scan(&fromwb,-1,'\')"").Close";
%end;
%let n=%eval(&n+1);
%let from=%scan(&in,&n," ");
%end;
/* Activate the first sheet, save OUT= and quit Excel */
put "XL.Workbooks(""%scan(&out,-1,'\')"").sheets(1).activate";
put "XL.Workbooks(""%scan(&out,-1,'\')"").Save";
put "XL.Quit";
run;

x 'c:\temp\class.vbs';
%mend;

Example:

%many2one(in=c:\temp\classods.xls!classods
c:\temp\shoesods.xls!shoesods c:\temp\zipcodeods.xls!zipcodeods,
out=c:\temp\combined.xls);


Stratified Random Sampling

Stratified Random Sampling, also sometimes called proportional or quota random sampling, involves dividing your population into homogeneous subgroups and then taking a simple random sample in each subgroup. In more formal terms:



Objective: Divide the population into non-overlapping groups (i.e., strata)
N1, N2, N3, ... Ni,
such that N1 + N2 + N3 + ... + Ni = N.
Then draw a simple random sample with sampling fraction
f = n/N in each stratum.
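With the same fraction f in every stratum, each stratum contributes in proportion to its size, and the stratum samples add up to n. A short worked example, where the index j and the numbers (N = 1000 split into strata of 600 and 400, n = 100) are illustrative and not from the text:

```latex
\begin{aligned}
n_j &= f\,N_j, \qquad j = 1, \dots, i \\
\sum_{j=1}^{i} n_j &= f \sum_{j=1}^{i} N_j = f\,N = \frac{n}{N}\,N = n \\
f &= \frac{100}{1000} = 0.1, \qquad n_1 = 0.1 \times 600 = 60, \qquad n_2 = 0.1 \times 400 = 40
\end{aligned}
```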




There are several major reasons why you might prefer stratified sampling over simple random sampling. First, it assures that you will be able to represent not only the overall population, but also key subgroups of the population, especially small minority groups. If a subgroup is extremely small, you can use different sampling fractions (f) within the different strata to randomly over-sample the small group (although you'll then have to weight the within-group estimates using the sampling fraction whenever you want overall population estimates). When we use the same sampling fraction within strata, we are conducting proportionate stratified random sampling. When we use different sampling fractions in the strata, we call this disproportionate stratified random sampling.

Second, stratified random sampling will generally have more statistical precision than simple random sampling. This is only true if the strata are homogeneous. If they are, we expect the variability within groups to be lower than the variability for the population as a whole, and stratified sampling capitalizes on that fact.
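In SAS, PROC SURVEYSELECT can draw the simple random sample inside each stratum. A minimal sketch, assuming sashelp.class stratified by sex as a stand-in population; the sampling rates and the seed are illustrative values, not from the text:

```sas
/* STRATA requires the input to be sorted by the stratum variable */
proc sort data=sashelp.class out=class;
  by sex;
run;

/* Proportionate: the same sampling fraction f in every stratum */
proc surveyselect data=class out=strat_prop
                  method=srs          /* simple random sample within each stratum */
                  samprate=0.5        /* illustrative f */
                  seed=20070206;      /* illustrative seed */
  strata sex;
run;

/* Disproportionate: one rate per stratum, in sorted order (F, then M) */
proc surveyselect data=class out=strat_disp
                  method=srs
                  samprate=(0.4 0.8)  /* illustrative rates */
                  seed=20070206;
  strata sex;
run;
```

With a STRATA statement the OUT= data set should also carry SelectionProb and SamplingWeight (the inverse of the stratum's sampling fraction); that weight is what you apply when combining disproportionate strata into overall population estimates.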

Probability Sampling



A probability sampling method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have known, nonzero probabilities of being chosen (equal probabilities, in the case of simple random sampling).



Some definitions:


N = the number of cases in the sampling frame
n = the number of cases in the sample
NCn = the number of combinations (subsets) of n from N
f = n/N = the sampling fraction
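Under simple random sampling, every one of the NCn possible samples is equally likely, and counting the samples that contain a given case k shows that each case is selected with probability f:

```latex
\begin{aligned}
{}_N C_n &= \binom{N}{n} = \frac{N!}{n!\,(N-n)!} \\
P(\text{sample } s) &= \frac{1}{\binom{N}{n}} \\
P(\text{case } k \text{ selected}) &= \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N} = f
\end{aligned}
```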

Many computer programs can generate a series of random numbers. Assign one random number to each case in the sampling frame, then sort the list from the lowest to the highest random number; this rearranges the list in random order. Then all you have to do is take the first n cases in this sorted list (for example, the first hundred names if n = 100).

Simple random sampling is not the most statistically efficient method of sampling, and, just because of the luck of the draw, you may not get good representation of subgroups in a population.
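The sort-on-a-random-number procedure can be sketched in a few SAS steps. sashelp.class (N = 19), n = 5 and the seed are illustrative assumptions:

```sas
/* Assign one random number to each case in the frame */
data frame;
  set sashelp.class;
  if _n_ = 1 then call streaminit(20070206);  /* illustrative seed */
  u = rand('uniform');
run;

/* Rearrange the list from the lowest to the highest random number */
proc sort data=frame;
  by u;
run;

/* Take the first n cases of the sorted list */
data srs;
  set frame(obs=5);
  drop u;
run;
```

PROC SURVEYSELECT with METHOD=SRS and SAMPSIZE=5 draws the same kind of sample in one step; the manual version just makes each step of the procedure visible.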

Sunday, February 04, 2007

A way to create the same distribution

Sometimes we want a variable to follow the same distribution as another variable.
We can do it this way: the thresholds on ttt follow the cumulative distribution of the other variable, which we can get from PROC FREQ.

DATA resh2;
SET halvaot.resh2;
IF TARGET =0 THEN DO;
ttt=ranuni(31311115)*100;
if ttt<=5 then kod=200502;
else if ttt<=28 then kod=200503;
else if ttt<=40 then kod=200504;
else if ttt<=52 then kod=200505;
else if ttt<=62 then kod=200506;
else if ttt<=71 then kod=200507;
else if ttt<=80 then kod=200508;
else if ttt<=88 then kod=200509;
else if ttt<=93 then kod=200510;
else if ttt<=100 then kod=200511;
end;
if target=1 then kod=100* year( Loan_Value_Date)+month(Loan_Value_Date);
run;
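The same idea can also be written with RAND('TABLE'), which takes the individual probabilities (the differences between successive cut-offs above) instead of cumulative thresholds. A minimal sketch; the simulated data set demo is hypothetical, standing in for halvaot.resh2 so that the code runs on its own:

```sas
/* Probabilities: 5, 23, 12, 12, 10, 9, 9, 8, 5, 7 percent (sum = 100) */
data demo;
  call streaminit(31311115);
  do id = 1 to 10000;
    idx = rand('table', .05, .23, .12, .12, .10, .09, .09, .08, .05, .07);
    kod = 200501 + idx;      /* idx = 1 -> 200502, ..., idx = 10 -> 200511 */
    output;
  end;
  drop idx;
run;

/* The percentages should come out close to the probabilities above */
proc freq data=demo;
  tables kod;
run;
```

RAND uses a different random stream than RANUNI, so individual rows get different codes, but the resulting distribution is the same.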


A smarter way to do the same, without hard-coding the percentages:



/* Cumulative distribution of the reference variable */
proc freq DATA=halvaot.new_halv;
tables Loan_Value_Date /out=outkod outcum noprint;
run;

/* Put the values and their cumulative percents into comma-separated macro variables */
/* (enlarge the $5000 length if the variable has many distinct values) */
data _null_ ; length kod_str pct_str $5000;
set outkod end=eof;
retain kod_str pct_str;
kod_str=catx(',',kod_str,Loan_Value_Date);
pct_str=catx(',',pct_str,cum_pct);
if eof then do;
call symputx('a1',pct_str);
call symputx('a2',kod_str);
call symputx('nn',_n_);
end;
run;

DATA resh222;
SET halvaot.resh2;
array a1 {&nn} _temporary_ (&a1); /* cumulative percents */
array a2 {&nn} _temporary_ (&a2); /* matching Loan_Value_Date values */
IF TARGET =0 THEN DO;
ttt=ranuni(31311115)*100;
/* take the first value whose cumulative percent reaches ttt */
do i=1 to dim(a1);
if ttt<=a1[i] then do;
Loan_Value_Date=a2[i];
leave;
end;
end;
end;
drop i;
kod=100* year( Loan_Value_Date)+month(Loan_Value_Date);
run;

Saturday, February 03, 2007

Factor Analysis with SAS

First, we transpose the data so that each category gets its own column.
/* PROC TRANSPOSE with a BY statement needs the input sorted by the BY variable */
proc sort data=jjj1;
by branch_cust_ip;
run;
proc transpose data =jjj1 out=toz prefix=peul;
by branch_cust_ip;
id Event_Costing_Activity_Type_Co;
var count;
run;

Then we fill the missing values with 0:


data toz;
set toz;
array toz{*} _NUMERIC_ ;
do i = 1 to dim(toz);
if toz{i} = . then toz{i} = 0;
end;
drop i;
run;

Defining the factors (the SCORE option stores the scoring coefficients in the OUTSTAT= data set):
proc factor score data=rehishot.ishit method=p rotate=orthomax nfactors=10 outstat=fact_ish;
var peul: ;
run;
Scoring the data with those coefficients:
proc score data=rehishot.ishit score=fact_ish out=scores_ishit;
var peul: ;
run;

Finally, for each observation we find the most influential factor (largest score) and the least influential one (smallest score):
data scores_ishit;
set scores_ishit;
array factor Factor1-Factor10;
max=max(of Factor1-Factor10);
min=min(of Factor1-Factor10);
/* index of the factor holding the max and the min (on ties the last match wins) */
do i=1 to dim(factor);
if max=factor[i] then factor_max=i;
if min=factor[i] then factor_min=i;
end;
drop i;
run;
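If your SAS release has the WHICHN function (SAS 9.2 or later, an assumption worth checking on older installations), the index search needs no loop. A minimal sketch; demo_scores is a hypothetical one-row stand-in for scores_ishit:

```sas
/* Hypothetical stand-in for scores_ishit: one row of ten factor scores */
data demo_scores;
  input Factor1-Factor10;
  datalines;
0.3 -1.2 2.4 0.0 -0.5 1.1 -2.0 0.7 0.9 -0.1
;
run;

data demo_scores;
  set demo_scores;
  max = max(of Factor1-Factor10);
  min = min(of Factor1-Factor10);
  factor_max = whichn(max, of Factor1-Factor10);  /* index of the first match */
  factor_min = whichn(min, of Factor1-Factor10);
run;
```

Unlike the loop, WHICHN returns the first matching index when two factors tie.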

Friday, February 02, 2007

proc logistic

ods trace on;

It writes the name of every output object to the log, so we can see which tables are available to ODS OUTPUT, for example Type3.
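A minimal sketch of the trace-then-capture workflow, using sashelp.class as an illustrative stand-in for the data set k: run the procedure once with the trace on, read the object names in the log, then request the table you need with ODS OUTPUT.

```sas
/* 1. List the output objects: each one's Name appears in the log */
ods trace on;
proc logistic data=sashelp.class;
  model sex(event='F') = height weight;
run;
ods trace off;

/* 2. Capture a table by the name found in the log */
ods output ParameterEstimates=pe;
proc logistic data=sashelp.class;
  model sex(event='F') = height weight;
run;
```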

proc logistic data = k outest=jj;
class ses race schtyp/param =glm;
model female = age ses race schtyp science write/outroc =roc lackfit;
units age=35 45 55;
ods output ParameterEstimates = model_female
Type3=chisq;
run;

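I want to keep only the significant variables (p <= 0.05):

data chisq1;
set chisq;
if ProbChiSq>0.05 then delete;
run;

/* all significant effects, and the subset of them that are CLASS variables */
%let cls=;
PROC SQL noprint;
SELECT effect INTO :mm separated BY " "
from chisq1;
SELECT effect INTO :cls separated BY " "
from chisq1
where upcase(effect) in ('SES','RACE','SCHTYP');
quit;

%put &mm;
%put &cls;
/* only categorical variables go in CLASS; this assumes at least one of them */
/* is still significant, and the UNITS statement assumes age is still in the model */
proc logistic data = k outest=jj;
class &cls/param =glm;
model female = &mm/outroc =roc lackfit;
units age=35 45 55;
ods output ParameterEstimates = model_female
Type3=chisq;
run;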

Export using %SCAN

%let name=1+3+6+12;
%macro stam;
%local i val;
/* one data set and one CSV file per value listed in NAME (COUNTW needs SAS 9.2 or later) */
%do i=1 %to %sysfunc(countw(&name,+));
%let val=%scan(&name,&i,+);
data kim_&val;
set kim;
if vetek=&val;
run;

PROC EXPORT DATA= kim_&val
OUTFILE= "\\c\kim_&val..csv"
DBMS=CSV REPLACE;
RUN;
%end;
%mend;
%stam;