The Magic Of Information Systems

Thursday, 28 March 2013

Plotting in R

Session 10, March 26th, 2013

Assignment 1:

Create 3 vectors x,y,z and choose any random values for them.

T<-cbind(x,y,z)

Create a 3-dimensional plot of the same

plot3d(T)

plot3d(T,col=rainbow(1000))

plot3d(T,col=rainbow(1000),type='s')
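The commands above need the three vectors to exist and the rgl package (which provides plot3d) to be loaded. A minimal sketch of the full sequence, assuming 1000 random values per vector and using the hypothetical name `pts` instead of `T` (in R, `T` is also shorthand for TRUE), might look like this:

```r
# Hypothetical example data: 1000 random values per axis.
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)

# Bind the vectors column-wise into a 1000 x 3 matrix.
pts <- cbind(x, y, z)

# plot3d() lives in the rgl package, which is installed separately.
# The plot is only drawn in an interactive session where rgl is available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts, col = rainbow(1000), type = "s")  # type = "s" draws spheres
}
```

The guard around the plotting call is only there so the script does not stop on a machine without rgl; in the lab session one would simply call library(rgl) first.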

Assignment 2 :

Create 2 random variables x,y & create a few plots showing

X-Y
X-Y|Z (Introduce a variable z with 5 different categories and cbind it to x and y ... Hint : ?factor)
Color code and draw the graph
Smooth and fit line for the curve

qplot(x,y)

qplot(x,z)

qplot(x,z,alpha=I(1/10))

qplot(x,z,alpha = I(1/30))

qplot(x,y,geom=c("boxplot", "jitter"))

qplot(x,y,geom=c("point", "smooth"))

qplot(x,y,colour=z)

qplot(log(x),log(y), colour=z)
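The qplot calls above assume x, y and z already exist and that ggplot2 is loaded. The following is a rough sketch of the whole assignment using ggplot() directly; the data, the category labels "a" to "e" and the data frame name `df` are invented for illustration and are not the values used in the session.

```r
# Hypothetical data: x and y are numeric, z is a factor with 5 categories.
set.seed(2)
n <- 500
x <- runif(n, 1, 10)
z <- factor(sample(letters[1:5], n, replace = TRUE), levels = letters[1:5])
y <- 2 * x + as.integer(z) + rnorm(n)
df <- data.frame(x, y, z)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # X-Y scatter plot.
  p_xy <- ggplot(df, aes(x, y)) + geom_point()

  # X-Y | Z: colour-coded by category, one panel per level of z.
  p_by_z <- ggplot(df, aes(x, y, colour = z)) +
    geom_point(alpha = 1/3) +
    facet_wrap(~ z)

  # Scatter plot with a fitted straight line and its confidence band.
  p_smooth <- ggplot(df, aes(x, y)) +
    geom_point(alpha = 1/10) +
    geom_smooth(method = "lm")

  # Box plot with jittered points; the category goes on the x axis.
  p_box <- ggplot(df, aes(z, y)) +
    geom_boxplot() +
    geom_jitter(width = 0.2, alpha = 0.3)

  print(p_smooth)
}
```

Using the factor z on the x axis for the box plot is a design choice in this sketch: a box plot summarizes y per category, so it reads best when x is categorical.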

Saturday, 23 March 2013

ITBAL : 9th Session

Want to Know your FB Page better : Check out Wolfram Alpha

Wolfram Alpha is a computational knowledge engine that performs dynamic computations over a vast collection of built-in data, algorithms, and methods. It generates a detailed Facebook report, complete with statistical insights such as how many links, photos and updates you posted on your page over the last year.

The Wolfram Alpha Facebook report does some very basic things, like adding to the information it learns from Facebook, such as noting the population of the city you live in or calculating the number of months and days to your next birthday. Then it offers calculations on your Facebook usage. What words do you use most frequently? How often do you upload photos or post links–and how has that changed over time? How many characters is your average post? It tells you your most liked and most commented-on posts, as well as those who most frequently share and comment on your posts.

1. Connect with Facebook, sign in for free, and get unique personalized information and analysis on your social data.

2. What do you frequently talk about on Facebook, and which are your most liked posts and photos?

3. When do you use Facebook? When are you most active?

4. Where are your friends?

7. Clustering of your friends

8. Who plays the special role in your network?

This software requires you to log in to Facebook and only then gathers information, thereby also respecting your privacy settings.

Friday, 15 March 2013

Panel Data Analysis

QUESTION 1

Do Panel Data Analysis of "Produc" data analyzing on three types of model :
a. Pooled effect model
b. Fixed effect model
c. Random effect model

Determine which model is the best by using functions:
pFtest : Fixed vs Pooled
plmtest : Pooled vs Random
phtest: Random vs Fixed

Pooled Model

Command:

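library(plm) # plm() and the Produc data set come from the plm package
data("Produc", package = "plm")
pool<-plm( log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
, data= Produc, model = "pooling", index = c("state","year"))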

Fixed Model

Command:

fixed<-plm( log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
, data= Produc, model = "within", index = c("state","year"))

Random Model

Command:

random<-plm( log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
, data= Produc, model = "random", index = c("state","year"))

Pooled vs Fixed

Null Hypothesis: Pooled Model

Alternate Hypothesis : Fixed Model

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Model is better than the Pooled Model.

Pooled vs Random

Null Hypothesis: Pooled Model

Alternate Hypothesis: Random Model

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Random Model is better than the Pooled Model.

Random vs Fixed

Null Hypothesis: Random Model (no correlation between the individual effects and the regressors)

Alternate Hypothesis: Fixed Model

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis: the Fixed Model is preferred over the Random Model.
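The three comparisons above can be run in one go. This is a sketch that refits the three models with the same formula and then calls pFtest, plmtest and phtest from the plm package; the object names `f`, `f_test`, `lm_test` and `h_test` are only illustrative, and the Breusch-Pagan variant (type = "bp") of plmtest is an assumption about which variant was used in class.

```r
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")

  # Same formula as in the commands above.
  f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)

  pool   <- plm(f, data = Produc, model = "pooling", index = c("state", "year"))
  fixed  <- plm(f, data = Produc, model = "within",  index = c("state", "year"))
  random <- plm(f, data = Produc, model = "random",  index = c("state", "year"))

  f_test  <- pFtest(fixed, pool)          # Fixed vs Pooled (F test)
  lm_test <- plmtest(pool, type = "bp")   # Pooled vs Random (Lagrange multiplier test)
  h_test  <- phtest(fixed, random)        # Random vs Fixed (Hausman test)

  print(f_test)
  print(lm_test)
  print(h_test)
}
```

Each call returns a standard test object, so the p-value used in the decisions above can be read from `f_test$p.value`, `lm_test$p.value` and `h_test$p.value`.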

Conclusion:

So after making all the comparisons we come to the conclusion that Fixed Model is best suited to do the panel data analysis for "Produc" data set.

Hence, we conclude that each "state" has its own time-invariant effect, and the Fixed Model estimates the coefficients from the variation over time within each state.

Wednesday, 13 February 2013

Day 6 : IT Lab

Date : 12-Feb-2013

Assignment 1 - Download data for the NIFTY index from 1st Jan 2012 to 31st Jan 2013. Calculate the log returns and find the historical volatility.

Soln -:

Commands used - :

readData<-read.csv(file.choose() , header=T)
closePrice<-readData[,5] # Reading Closing Price Column
closePrice.ts<-ts(closePrice , frequency=252) # making a time series
varLag<- lag(closePrice.ts , k=-1) # stock price for time (t-1)
logReturns<- log(closePrice.ts) - log(varLag) # log return: log(P_t) - log(P_(t-1))

# To calculate Historical volatility (annualized with 252 trading days)
histVolatility<-sd(logReturns)*sqrt(252)
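To check the calculation without the downloaded file, the same steps can be tried on simulated prices. The sketch below assumes a hypothetical price series (a random walk starting at 5000 with a 1% daily standard deviation); only the formulas, the log return log(P_t) - log(P_(t-1)) and the annualization by sqrt(252), correspond to the assignment.

```r
# Hypothetical closing prices: 253 trading days of a geometric random walk.
set.seed(3)
n <- 253
daily_sd <- 0.01
close_price <- 5000 * exp(cumsum(rnorm(n, mean = 0, sd = daily_sd)))

# Log return for day t is log(P_t) - log(P_(t-1)); diff() does the subtraction.
log_returns <- diff(log(close_price))

# Historical volatility: daily standard deviation scaled to a 252-day year.
hist_vol <- sd(log_returns) * sqrt(252)

cat("Number of returns:", length(log_returns), "\n")
cat("Annualized historical volatility:", round(hist_vol, 4), "\n")
```

With n prices there are n - 1 returns, which is why 253 prices give a full year of 252 daily returns.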

Assignment 2 :

Create an acf plot for the log returns data calculated previously. Also run an ADF test and interpret the findings.

Soln -:

# to create acf plot

acf(logReturns)

Graphical Interpretation
- the blue dotted lines represent the confidence bounds around zero autocorrelation (95% in the default case)
- As all the autocorrelation bars (vertical lines) beyond lag 0 lie inside those two blue dotted lines, no lag shows significant autocorrelation, which is consistent with the returns data being "Stationary" in nature. This is a visual inspection method for assessing stationarity.

Using ADF test
Command used
adf.test(logReturns)

Interpretation from ADF test
Null Hypothesis -: The returns data is not Stationary
Alternative Hypothesis -: Returns Data is stationary

The test gives p-value = 0.01, which is less than the 0.05 threshold of a 5% significance level (95% confidence).
Hence the Null Hypothesis is rejected.

Results -: the given data is stationary in nature
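Both checks can be reproduced on simulated data. In this sketch the returns are hypothetical white noise, the dotted bounds of the acf plot are recomputed by hand as qnorm(0.975)/sqrt(n), and adf.test is taken from the tseries package (assumed here to be the package used in the session).

```r
# Hypothetical log returns: 250 days of white noise.
set.seed(4)
log_returns <- rnorm(250, sd = 0.01)
n <- length(log_returns)

# Autocorrelations without drawing the plot; the first value is lag 0 (always 1).
acf_vals <- acf(log_returns, plot = FALSE)$acf[-1]

# The dotted lines in the acf plot sit at +/- qnorm(0.975) / sqrt(n).
bound <- qnorm(0.975) / sqrt(n)
outside <- sum(abs(acf_vals) > bound)
cat("Lags outside the 95% bounds:", outside, "of", length(acf_vals), "\n")

# Augmented Dickey-Fuller test; the null hypothesis is non-stationarity.
if (requireNamespace("tseries", quietly = TRUE)) {
  adf <- suppressWarnings(tseries::adf.test(log_returns))
  cat("ADF p-value:", adf$p.value, "\n")
}
```

Note that adf.test reports p-values only within a tabulated range, so a printed 0.01 means "0.01 or smaller"; the function issues a warning in that case, which is silenced here.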

Thursday, 7 February 2013

Day 5 : 5th Feb 2013

We converted data into Time series and calculated returns

Assignment 1:
Find the returns of NSE data for > 6 months, taking the 10th data point as the start and the 95th data point as the end.
Also plot the returns.
Data set : S&P CNX Nifty data from 1st July 2012 - 31st December 2012 (6 months)
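A possible way to do this, sketched on a hypothetical price series because the downloaded file is not part of this post: convert the prices to a time series, cut out points 10 to 95 with window(), and divide the price change by the previous price.

```r
# Hypothetical daily closing prices for roughly six months (125 trading days).
set.seed(5)
close_price <- ts(5000 + cumsum(rnorm(125, sd = 40)))

# Keep the 10th to the 95th data point.
sub <- window(close_price, start = 10, end = 95)

# Simple return: (P_t - P_(t-1)) / P_(t-1).
# stats::lag() with k = -1 shifts the series so that time t holds P_(t-1).
returns <- diff(sub) / stats::lag(sub, k = -1)

plot(returns, main = "Returns, data points 10 to 95", ylab = "Return")
```

Because arithmetic on time series aligns the operands by time, the division only uses the dates present in both series, so the 86 selected prices give 85 returns.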

Assignment 2 :

Data is available for observations 1-700. Predict observations 701-850 using GLM estimation with a LOGIT model.
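The assignment data is not reproduced here, so the sketch below invents a data set with two predictors (`x1`, `x2`) and a binary outcome `y`. It only illustrates the mechanics: fit a logit model with glm on rows 1-700 and predict rows 701-850.

```r
# Hypothetical data: 850 rows, two predictors and a 0/1 outcome.
set.seed(6)
n <- 850
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

train <- dat[1:700, ]     # rows used for estimation
test  <- dat[701:850, ]   # rows to be predicted

# Logistic regression: binomial family with the logit link.
fit <- glm(y ~ x1 + x2, data = train, family = binomial(link = "logit"))

# Predicted probabilities, then a 0/1 class using a 0.5 cut-off.
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Share of correct predictions on the held-out rows.
accuracy <- mean(pred == test$y)
cat("Hold-out accuracy:", round(accuracy, 3), "\n")
```

The 0.5 cut-off is a common default, not a requirement; a different threshold may suit the real data better.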

Wednesday, 23 January 2013

LECTURE 3 - 22nd Jan 2013

ASSIGNMENT 1a) - GROOVE MILEAGE DATA

Fit 'lm' and comment on the applicability of lm.
Plot : Residual vs Independent curve
Plot: Standardized residual vs Independent curve

As the residual plot is random (shows no pattern), a linear model is applicable.
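The two diagnostic plots can be produced as follows. The data in this sketch is simulated, and treating groove as the independent variable and mileage as the response is an assumption made for illustration.

```r
# Hypothetical data: mileage depends linearly on groove depth plus noise.
set.seed(7)
groove  <- seq(100, 400, length.out = 30)
mileage <- 5 + 0.1 * groove + rnorm(30, sd = 2)

fit <- lm(mileage ~ groove)

res     <- resid(fit)       # raw residuals
std_res <- rstandard(fit)   # standardized residuals

# Residuals vs independent variable: look for curvature or a funnel shape.
plot(groove, res, xlab = "groove", ylab = "Residual")
abline(h = 0, lty = 2)

# Standardized residuals vs independent variable.
plot(groove, std_res, xlab = "groove", ylab = "Standardized residual")
abline(h = 0, lty = 2)
```

A random scatter around the dashed zero line supports the linear model; a visible curve or a widening spread would argue against it.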

ASSIGNMENT 2) : Using the given data of alpha - pluto, Fit 'lm' and comment on its applicability

Plot is Random.

Hence Linearity is applicable.

ASSIGNMENT 3 : Justify Null Hypothesis using ANOVA

Analysis of the result :
P value is 0.687
High P value. Thus there is not sufficient evidence to reject the null hypothesis.
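The post does not show the data or the command, so here is only a generic sketch of a one-way ANOVA in R on made-up data with three groups. The null hypothesis is that all group means are equal, and the p-value is read from the summary table.

```r
# Hypothetical data: three groups drawn from the same distribution.
set.seed(8)
group <- factor(rep(c("A", "B", "C"), each = 20))
value <- rnorm(60, mean = 10, sd = 2)

# One-way ANOVA: does the mean of value differ between groups?
fit <- aov(value ~ group)
print(summary(fit))

# Extract the p-value of the F test for the group factor.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

if (p_value > 0.05) {
  cat("Not enough evidence to reject the null hypothesis\n")
} else {
  cat("Null hypothesis rejected at the 5% level\n")
}
```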

Wednesday, 16 January 2013

LECTURE 2 - 16TH JAN 2013

QUESTION 1 :

CREATE 2 MATRICES OF DIMENSION [3X3]. SELECT THE SECOND COLUMN IN THE FIRST MATRIX & THE THIRD COLUMN IN THE SECOND MATRIX. USE CBIND TO COMBINE THE SELECTED COLUMNS
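One way to do this, with two example matrices whose values are chosen arbitrarily for illustration:

```r
# Two 3x3 matrices, filled column by column.
A <- matrix(1:9,   nrow = 3)
B <- matrix(11:19, nrow = 3)

# A[, 2] is the second column of A, B[, 3] is the third column of B.
combined <- cbind(A[, 2], B[, 3])
print(combined)
```

The result is a 3x2 matrix whose first column is 4, 5, 6 and whose second column is 17, 18, 19.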

QUESTION 2 :

MULTIPLY 2 MATRICES
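In R the matrix product is written with %*%, while * multiplies element by element. A short sketch with two arbitrary 2x2 matrices shows the difference:

```r
A <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

mat_prod  <- A %*% B   # matrix product (rows of A times columns of B)
elem_prod <- A * B     # element-wise product

print(mat_prod)
print(elem_prod)
```

For the matrix product, the number of columns of the first matrix must equal the number of rows of the second.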

QUESTION 3 :

DOWNLOAD NSE DATA FOR A PERIOD OF 31 DAYS, SAY FROM 1ST DECEMBER 2012 TO 31ST DECEMBER 2012. FIND REGRESSION.

QUESTION 4 :

GENERATE & PLOT NORMAL DISTRIBUTION GRAPH

x<-seq(0,300)
y<-dnorm(x,mean=100,sd=20)
plot(x,y,type="l")