Tuesday, 11 September 2018

Basic calculations: Missing data and logical operators in R Language

Missing data

R represents missing observations with the value NA.
Missing values can be detected with is.na.

> x  <-  NA             # assign NA to variable x
> is.na (x)               # is it missing ?
   [1]    TRUE

Now try a vector to see which of its values are missing:

> x <-  c(11, NA, 13)
> is.na (x)
  [1] FALSE TRUE FALSE
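
To answer the question "is any value missing?" directly, the logical vector returned by is.na can be summarized. The following is a small sketch using base R functions only; the vector is the one from the example above.

```r
x <- c(11, NA, 13)

anyNA(x)          # TRUE: at least one value is missing
sum(is.na(x))     # 1: number of missing values
which(is.na(x))   # 2: position of the missing value
```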














Example : How to work with missing data

> x  <-  c(11, NA, 13)  # vector
> mean (x)                 # (11 + NA + 13)/3 is NA
  [1]   NA
> mean (x, na.rm = TRUE )  # NAs are removed first: (11 + 13)/2 = 12
 [1]  12

The null object, called NULL, is returned by some functions and expressions.

Note that NA and NULL are not the same.

NA is a placeholder for something that exists but is missing.

NULL stands for something that never existed at all.
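
The difference between NA and NULL can be seen in a short sketch like the following. The vector v is only an illustration and is not part of the original example.

```r
length(NA)       # 1: NA occupies a position
length(NULL)     # 0: NULL has no elements at all

is.na(NA)        # TRUE
is.null(NULL)    # TRUE

# Inside c(), NULL disappears while NA is kept as a placeholder
v <- c(11, NA, NULL, 13)
length(v)        # 3
```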





Logical Operators and Comparisons

R provides the following operators and functions for logical comparisons (TRUE or FALSE): <, <=, >, >=, ==, !=, ! (NOT), & and && (AND), | and || (OR), and xor ( ).

TRUE and FALSE are reserved words denoting logical constants.





  • The shorter forms (& and |) perform element-wise comparisons in much the same way as arithmetic operators.
  • The longer forms (&& and ||) evaluate left to right, and evaluation proceeds only until the result is determined. In older versions of R they examined only the first element of each vector; since R 4.3.0 an operand of length greater than one is an error.
  • The longer forms are appropriate for programming control flow and are typically preferred in if clauses (conditions).
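
The difference between the two forms can be sketched as follows. The vectors a and b and the variable in_range are made up for illustration.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b    # element-wise AND: TRUE FALSE FALSE
a | b    # element-wise OR:  TRUE TRUE TRUE

x <- 5

# Short-circuit evaluation: the left side is FALSE, so the right side
# is never evaluated and stop() does not raise an error.
res <- (x > 10) && stop("not reached")
res      # FALSE

# Typical use of the longer form in an if clause
in_range <- FALSE
if (x > 0 && x < 10) {
  in_range <- TRUE
}
in_range # TRUE
```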


Example

 > x  <- 5
Is x less than 10 or greater than 5?
 > (x < 10) || (x > 5)   # || means OR
 [1]  TRUE

Is x greater than 10 or greater than 5?
> (x > 10) || (x > 5)
[1] FALSE


Monday, 10 September 2018

Statistical Functions - Correlation and Example in R Language

Descriptive Statistics :

First-hand tools which give first-hand information.
  • Central tendency of data
  • Variation in data
  • Structure and shape of data tendency
  • Relationship study (correlation coefficient, rank correlation, correlation ratio, regression etc.)
Bivariate Data

Quantitative measures summarize the relationship between two variables in a single number.

Graphical plots provide first-hand visual information about the nature and degree of relationship between two variables.

Relationship can be linear or nonlinear.



x, y : Two data vectors

Data    x = (x1,x2,....,xn)                       y = (y1,y2,...,yn)

cov (x,y) :    covariance between x and y
var (x)   :    variance of x


Correlation coefficient

Measures the degree of linear relationship between the two variables.
cor (x,y) : correlation between x and y




Example :-

Covariance:

Example :-

Correlation coefficient:
Exact positive linear dependence

> cor ( c(1,2,3,4) , c(1,2,3,4)  )
 [1]  1
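
The functions cov, var and cor can be tried together on a small example. This is only a sketch: the vectors x and y are made-up values, and the last lines use the standard relation that the correlation equals the covariance divided by the product of the standard deviations.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 5, 9)             # illustrative values

cov(x, y)                      # covariance between x and y
var(x)                         # variance of x
cor(x, y)                      # correlation between x and y

# The correlation is the covariance scaled by both standard deviations
cov(x, y) / (sd(x) * sd(y))

# Exact negative linear dependence
cor(x, -x)                     # -1
```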



Data on Daily water Demand




Statistical Function bivariate three dimensional plot in R Language

Bivariate Plot :

Provides first-hand visual information about the nature and degree of relationship between two variables.

Relationship can be linear or nonlinear.

We discuss several types of plots through examples.


Scatter Plot :

plot command:
x, y : Two data vectors
plot (x,y)
plot (x, y, type)



Get more details on the type argument from help: help ("plot")
Other options:

main             an overall title for the plot.
sub              a sub title for the plot.
xlab             a title for the x axis.
ylab             a title for the y axis.
asp              the y/x aspect ratio.
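
The options listed above can be combined in a single call to plot. The following is a minimal sketch: the vectors temp_c and demand are a handful of illustrative values, and the titles are placeholders chosen for this example.

```r
# Illustrative data (hypothetical names and values)
temp_c <- c(23, 25, 25, 26, 27)
demand <- c(33710, 31666, 33495, 32758, 34067)

plot(temp_c, demand,
     type = "b",                           # "b" draws both points and lines
     main = "Daily water demand",          # overall title
     sub  = "Illustrative data",           # sub title
     xlab = "Temperature (centigrade)",    # x axis title
     ylab = "Demand (million liters)")     # y axis title
```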

Example :

Daily water demand in a city depends upon weather temperature.

We know from experience that water consumption increases as weather temperature increases.

Data on 27 days is collected as follows:
Daily water demand (in million liters)
water <- c (33710, 31666, 33495, 32758, 34067, 36069, 37497, 33044, 35216, 35383, 37066, 38037, 38495, 39895, 41311, 42849, 43038, 43873, 43923, 45078, 46935, 47951, 46085, 48003, 45050, 42924, 46061)

Temperature (in centigrade)
temp <- c (23,25,25,26,27,28,30,26,29,32,33,34,35,38,39,42,43,44,45,45.5,
45, 46,44,44,41,37,40)
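
Before plotting, it is worth checking that both vectors have 27 values and looking at the correlation between them. The sketch below assumes that the twentieth temperature reads 45.5, which gives exactly 27 observations.

```r
water <- c(33710, 31666, 33495, 32758, 34067, 36069, 37497, 33044, 35216,
           35383, 37066, 38037, 38495, 39895, 41311, 42849, 43038, 43873,
           43923, 45078, 46935, 47951, 46085, 48003, 45050, 42924, 46061)

# Assumption: the twentieth value is 45.5
temp <- c(23, 25, 25, 26, 27, 28, 30, 26, 29, 32, 33, 34, 35, 38, 39,
          42, 43, 44, 45, 45.5, 45, 46, 44, 44, 41, 37, 40)

length(water)       # number of water demand observations
length(temp)        # number of temperature observations

cor(water, temp)    # degree of linear relationship between the two
```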


Plot command:
 
x, y :  Two data vectors
Various types of plots can be drawn.

plot (x, y)

plot (water, temp)

 

plot (water, temp, "l")

"l" for lines,






plot (water, temp, "o")

"o" for both 'overplotted'

 


plot (water, temp, "h")

"h" for 'histogram' like (or 'high-density') vertical lines


 


plot (water, temp, "s")

"s" for stair steps.





Smooth Scatter plot

scatter.smooth (x, y) provides scatter plot with smooth curve 
Example: scatter.smooth (water, temp)


Matrix Scatter plot

The command pairs ( ) allows the simple creation of a matrix of scatter plots.
> pairs ( cbind (water, temp) )


3 Dimensional Scatter Plot:

scatterplot3d ( ) plots a three dimensional (3D) point cloud
> install.packages ("scatterplot3d")
> library (scatterplot3d)
> setwd ("C:/RCourse/")
> data3d <- read.csv ("data-age-height-weight.csv")
> data3d
> scatterplot3d (data3d [, 1:3])


More functions
  • contour ( )        for contour lines
  • dotchart ( )       for dot charts (replacement for bar charts)
  • image ( )           pictures with colors as third dimension
  • mosaicplot ( )   mosaic plot for (multidimensional) diagrams of categorical variables (contingency tables)
  • persp ( )           perspective surfaces over the x-y plane


Sunday, 9 September 2018

Association Rule Mining in R Language

Association Rule Mining
  • In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
  • It is intended to identify strong rules discovered in databases using different measures of interestingness.
  • For example, a rule found in the sales data of a supermarket could indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat.
  • Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements.

Constraints on the measures below are used to select the most useful rules from all the rules generated by R. After analyzing these values for all the rules, the best rules for WB have been obtained.


E.g. :- Consider rule: {Jack the Ripper (1988)} => {Strawberry Blonde}
Let Jack the Ripper =X and Strawberry Blonde =Y, Then

Support (X U Y) = No. of transactions involving both Jack the Ripper and Strawberry Blonde / Total no. of transactions

Confidence = No. of transactions involving both Jack the Ripper and Strawberry Blonde / No. of transactions involving Jack the Ripper

Lift = Ratio of the observed support to the support expected if X and Y were independent, i.e. Support (X U Y) / (Support (X) × Support (Y))
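
The three measures can be computed by hand in base R. This is a minimal sketch on five made-up transactions; has_x and has_y are hypothetical indicator vectors saying whether each transaction contains X and Y, and no association rule package is used.

```r
# Five illustrative transactions: does each one contain X? Y?
has_x <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
has_y <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Support: share of all transactions containing both X and Y
support_xy <- mean(has_x & has_y)             # 2/5 = 0.4

# Confidence: share of the transactions with X that also contain Y
confidence <- support_xy / mean(has_x)        # 0.4 / 0.6

# Lift: observed support relative to the support expected
# if X and Y were independent
lift <- support_xy / (mean(has_x) * mean(has_y))

support_xy
confidence
lift
```

A lift above 1 suggests that X and Y occur together more often than independence would predict.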


Friday, 7 September 2018

Statistical Function-Boxplots, Skewness and Kurtosis in R Language

Summary of observations

In R, quartiles, minimum and maximum values can be easily obtained by the summary command

summary (x)    x: data vector
It gives information on
  • minimum
  • maximum
  • first quartile
  • second quartile (median) and
  • third quartile.
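
A short sketch of summary on a made-up data vector is given below. Note that for a numeric vector the output also contains the mean, in addition to the values listed above.

```r
x <- c(2, 4, 6, 8, 10)      # illustrative data vector

s <- summary(x)
s                           # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

s[["Median"]]               # 6: second quartile
s[["1st Qu."]]              # 4: first quartile
s[["3rd Qu."]]              # 8: third quartile
```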

Boxplot

Boxplot is a graph which summarizes the distribution of a variable by using its median, quartiles, minimum and maximum values.



boxplot ( ) draws a box plot
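
The following sketch draws a box plot for a made-up data vector and inspects the values behind it. With the default settings the whiskers reach only to the most extreme points within 1.5 times the box length, so a value far from the rest is drawn as a separate point.

```r
x <- c(2, 4, 6, 8, 10, 30)   # illustrative data; 30 lies far from the rest

b <- boxplot(x, main = "Illustrative data", ylab = "x")

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points drawn individually beyond the whiskers
```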





Descriptive Statistics:

First-hand tools which give first-hand information.
  • Structure and shape of data tendency (symmetricity, skewness, kurtosis etc.)
  • Relationship study (correlation coefficient, rank correlation, correlation ratio, regression etc.)

Skewness

Measures the shift of the hump of the frequency curve.
Coefficient of skewness based on values x1,x2,...,xn.






Kurtosis

Measures the peakedness of the frequency curve.
Coefficient of kurtosis based on values x1,x2,...,xn.





Skewness and Kurtosis

First we need to install the package 'moments':
> install.packages ("moments")
> library (moments)
skewness  ( )  :  computes coefficient of skewness
kurtosis    ( )   :  computes coefficient of kurtosis
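
The two coefficients can also be computed by hand in base R. The sketch below assumes the moment-based definitions (third and fourth central moments divided by powers of the second central moment), which are, as far as we know, what skewness ( ) and kurtosis ( ) in 'moments' compute. The data vector and the names m2, m3, m4 are made up for illustration.

```r
x <- c(1, 2, 3, 4, 10)        # illustrative data with a long right tail

d  <- x - mean(x)             # deviations from the mean
m2 <- mean(d^2)               # second central moment
m3 <- mean(d^3)               # third central moment
m4 <- mean(d^4)               # fourth central moment

skew <- m3 / m2^(3/2)         # positive here: the tail is on the right
kurt <- m4 / m2^2             # not reduced by 3 (no "excess" correction)

skew
kurt
```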



Wednesday, 5 September 2018

Basic Calculations: Matrix Operations in R Language

In R, a 4 × 2 matrix x can be created with the following command:

> x <-  matrix (nrow=4,   ncol=2,  data=c(1,2,3,4,5,6,7,8)  )

> x
                [,1]       [,2]
[1,]             1          5
[2,]             2          6
[3,]             3          7
[4,]             4          8

Properties of a Matrix

We can get specific properties of a matrix:


> dim (x)         # tells the dimension of the matrix
[1]   4   2

> nrow (x)       # tells the number of rows
[1]  4

> ncol (x)        # tells the number of columns
[1]  2

> mode (x)      # informs the type or storage mode of an object, e.g., numerical, logical etc.
[1]   "numeric"

attributes provides all the attributes of an object

> attributes (x)    # informs the dimension of the matrix
$dim
[1]    4   2

Help on the Object "Matrix"

To know more about these important objects, we use R-help on "matrix".
> help ("matrix")
matrix     package:base            R Documentation
Matrices
Description :
'matrix'  creates a matrix from the given set of values.
'as.matrix' attempts to turn its argument into a matrix.
'is.matrix'  tests if its argument is a (strict) matrix. It is generic: you can write methods to handle specific classes of objects, see Internal Methods.

Then we get an overview on how a matrix can be created and what parameters are available:

Usage :
   matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
  as.matrix (x)
  is.matrix (x)

Arguments :
  data: an optional data vector.
  nrow: the desired number of rows
  ncol: the desired number of columns
  byrow: logical. If 'FALSE' (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.

dimnames:  A 'dimnames'  attribute for the matrix: a 'list' of length 2.
        x: an R object.

Finally, references and cross-references are displayed...
References :
  Becker, R. A.,  Chambers, J. M. and Wilks, A.
  R. (1988)  _The New S Language_. Wadsworth & Brooks/Cole.

See Also:
  'data.matrix' , which attempts to convert to a numeric matrix.
.... as well as an example:

Examples :
  is.matrix (as.matrix (1 : 10) )
  data (warpbreaks)
  ! is.matrix(warpbreaks) #  data.frame, NOT matrix!
  warpbreaks [1 : 10,]
  as.matrix(warpbreaks[1 : 10,])  # using as.matrix.data.frame(.) method


Matrix Operations 

Assigning a specified number to all matrix elements:

> x  <-  matrix (nrow=4, ncol=2, data=2 )
> x 
             [,1]    [,2]
[1,]         2        2
[2,]         2        2
[3,]         2        2
[4,]         2        2

Construction of a diagonal matrix, here the identity matrix of dimension 2:

> d  <-  diag (1,  nrow=2,  ncol=2)
> d
        [,1]   [,2]
[1,]    1       0
[2,]    0       1




Transpose of a matrix x:  x'

>  x  <- matrix (nrow=4, ncol=2, data=1:8,  byrow=T )
>  x
                [,1]      [,2]
[1,]             1          2
[2,]             3          4
[3,]             5          6
[4,]             7          8
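
The transpose itself is obtained with the base R function t ( ). A short sketch using the matrix defined above:

```r
x <- matrix(nrow = 4, ncol = 2, data = 1:8, byrow = TRUE)

xt <- t(x)    # rows of x become columns of xt
xt
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    5    7
# [2,]    2    4    6    8

dim(xt)       # 2 4
```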

Multiplication of a matrix with a constant
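
Multiplying a matrix by a constant uses the ordinary * operator, which multiplies every element. A minimal sketch, with the constant 4 chosen only for illustration:

```r
x <- matrix(nrow = 4, ncol = 2, data = 1:8, byrow = TRUE)

y <- 4 * x    # every element of x is multiplied by 4
y
#      [,1] [,2]
# [1,]    4    8
# [2,]   12   16
# [3,]   20   24
# [4,]   28   32
```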


