Help for package SenSrivastava

Version:

2015.6.25.1

Date:

2015-06-25

Title:

Datasets from Sen & Srivastava

Author:

Kjetil B Halvorsen <kjetil1001@gmail.com>

Maintainer:

Kjetil B Halvorsen <kjetil1001@gmail.com>

Depends:

R (≥ 2.10), stats

Suggests:

leaps, car

LazyData:

TRUE

Description:

Collection of datasets from Sen & Srivastava: "Regression Analysis, Theory, Methods and Applications", Springer. Sources for individual data files are more fully documented in the book.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

Repository:

CRAN

NeedsCompilation:

Packaged:

2023-12-11 07:59:24 UTC; hornik

Date/Publication:

2023-12-11 08:22:35 UTC

Data on density of vehicles and average speed

Description

The E1.1 data frame has 24 rows and 2 columns.

Usage

data(E1.1)

Format

This data frame contains the following columns:

DENSITY: a numeric vector, vehicles per mile.
SPEED: a numeric vector, miles per hour.

Details

Example 1.1 page 2 in Sen and Srivastava.

Source

Huber, M.J. (1957) Effect of temporary bridge on parkway performance. Highway Research Board Bulletin 167, 63–74.

Examples

data(E1.1)
attach(E1.1)
plot(DENSITY, sqrt(SPEED))
E1.1.m1 <- lm(sqrt(SPEED) ~ DENSITY + I(DENSITY^2), data=E1.1)
summary(E1.1.m1)
detach()

Data on violent and property crimes in 22 metropolitan areas of the U.S.

Description

The E1.11 data frame has 23 rows and 4 columns.

Usage

data(E1.11)

Format

This data frame contains the following columns:

Metro.Area: a character vector, names and state of each metropolitan area.
Violent.Crimes: a numeric vector, units of measurement not given.
Property.Crimes: a numeric vector, units of measurement not given.
Population: a numeric vector, in thousands.

Source

Dacey, M.F. (1983) Social Science Theories and Methods I: Models of Data. Evanston: Northwestern University.

Examples

data(E1.11)
attach(E1.11)
plot(Population, Violent.Crimes)
detach()

Stevens Experiment to compare notes against a standard (80 dB)

Description

The E1.15 data frame has 10 rows and 3 columns. Stevens (1956) asked a number of persons to compare notes of various decibel levels against a standard (80 decibels) and to assign them a loudness rating with the standard note being a 10. logy is the response variable and x the stimulus.

Usage

data(E1.15)

Format

This data frame contains the following columns:

x: a numeric vector, the stimulus.
y: a numeric vector, the median response at x.
logy: a numeric vector, the log of y.

Source

Dacey, M.F. (1983) Social Science Theories and Methods I: Models of Data. Evanston: Northwestern University, from Stevens (1956).

Examples

data(E1.15)
attach(E1.15)
plot(x, logy)
abline(lm( logy ~ x, data=E1.15))
detach()

Earnings and Prices of Selected Paper Company Stocks

Description

The E1.16 data frame has 10 rows and 3 columns.

Usage

data(E1.16)

Format

This data frame contains the following columns:

Company: a character vector, name of the company.
Earn.Share: a numeric vector, 1972 earnings per share, in dollars.
Price.Share: a numeric vector, price per share, in dollars, in May 1973.

Source

Dacey (1983, ch. 1) from Moody's Stock Survey, June 4, 1973, p. 610.

Examples

with(E1.16, plot(Price.Share, Earn.Share))

Data on Population Density and Vehicle Thefts

Description

The E1.17 data frame has 18 rows and 3 columns.

Usage

data(E1.17)

Format

This data frame contains the following columns:

D: a numeric vector, district of Chicago. 1 is downtown Chicago.
pd: a numeric vector, population density of each district.
vtt: a numeric vector, vehicle thefts per thousand residents.

Source

Mark Buslik, Chicago Police Department.

Examples

data(E1.17)
attach(E1.17)
plot(pd, vtt)
cat("Use the mouse to identify the outlier in the plot (click on the outlier)\n")
## Not run: identify(pd, vtt)
detach()

Data on Simsbury Marriages

Description

The E1.18 data frame has 8 rows and 3 columns with data on the number of marriages (ma) that occurred between residents of each of 8 annular zones and residents of Simsbury, Connecticut, for the period 1930–39. The number of residents of each zone is given as pop and the midpoint of distance between Simsbury and the band is given as d.

Usage

data(E1.18)

Format

This data frame contains the following columns:

d: a numeric vector, distance between Simsbury and midpoint of annular zone.
pop: a numeric vector, population of annular zone.
ma: a numeric vector, number of marriages.

Source

Dacey (1983, ch 4) from Ellsworth (1948).

Examples

data(E1.18)
summary(E1.18)

Data on Book Prices, Pages and Type of Binding

Description

The E1.19 data frame has 20 rows and 3 columns. Compiled from the catalog of one publisher of American Government books.

Usage

data(E1.19)

Format

This data frame contains the following columns:

Price: a numeric vector, price of book.
P: a numeric vector, number of pages of book.
B: a factor with levels c and p; c is cloth and p is paperback.

Source

Compiled by one of the authors.

Examples

data(E1.19)
summary(E1.19)

Data on Physical Quality of Life Index (PQLI) Scores and Infant Mortality Rates (IMR) for Selected Indian States

Description

The E1.20 data frame has 13 rows and 7 columns.

Usage

data(E1.20)

Format

This data frame contains the following columns:

State: a character vector, name of state.
PQLI: a numeric vector, Physical Quality of Life Index, a measure of average wealth.
Comb.IMR: a numeric vector, combined infant mortality rate.
Rur.M.IMR: a numeric vector, rural male infant mortality rate.
Rur.F.IMR: a numeric vector, rural female infant mortality rate.
Urb.M.IMR: a numeric vector, urban male infant mortality rate.
Urb.F.IMR: a numeric vector, urban female infant mortality rate.

Source

Dr. T.N.K. Raju, Department of Neonatology, University of Illinois at Chicago.

Examples

data(E1.20)
## Some data reorganization before analysis:
## Maybe reshape could have been used here?
e1.20 <- data.frame(rbind(as.matrix(E1.20[,c(2,4)]),
                          as.matrix(E1.20[,c(2,5)]),
                          as.matrix(E1.20[,c(2,6)]),
                          as.matrix(E1.20[,c(2,7)])), row.names=1:52)
attr(e1.20,"names")[[2]] <- "IMR"
e1.20$Female <- c(rep(0,13), rep(1,13), rep(0,13), rep(1,13))
e1.20$Urban  <- c(rep(0,26), rep(1,26))
## Now the analysis can start.
summary(e1.20)
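
The comment in the example asks whether reshape could do this reorganization. A minimal sketch of that alternative follows, assuming the column names listed under Format. The names imr.cols, e1.20.long and group are introduced here for illustration only and are not part of the package.

```r
data("E1.20", package = "SenSrivastava")

## The four IMR columns to be stacked, in the same order as the manual rbind
imr.cols <- c("Rur.M.IMR", "Rur.F.IMR", "Urb.M.IMR", "Urb.F.IMR")

## Wide to long: one row per state and rural/urban-by-sex group
e1.20.long <- reshape(E1.20[, c("State", "PQLI", imr.cols)],
                      direction = "long",
                      varying   = imr.cols,
                      v.names   = "IMR",
                      timevar   = "group",   # hypothetical name for the stacked label
                      times     = imr.cols,
                      idvar     = "State")

## Indicator variables derived from the group label
e1.20.long$Female <- as.integer(grepl("\\.F\\.", e1.20.long$group))
e1.20.long$Urban  <- as.integer(grepl("^Urb", e1.20.long$group))
summary(e1.20.long)
```

Unlike the manual version, this keeps the State column, which may be convenient for later plotting or grouping.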

Data on Loads and Deformation of a Bar

Description

The E1.21 data frame has 24 rows and 2 columns. Data are on loads, in pounds, and corresponding deformation, in inches, of a mild steel bar, of length 8 inches and average diameter 0.564 inches.

Usage

data(E1.21)

Format

This data frame contains the following columns:

L: a numeric vector, load, in pounds.
D: a numeric vector, corresponding deformation, in inches.

Source

M.R. Khavanin, Department of Mechanical Engineering, University of Illinois at Chicago.

Examples

data(E1.21)
attach(E1.21)
plot(L, D)
detach()

Data on Population and Number of Telephones

Description

The E1.7 data frame has 6 rows and 2 columns. The relation between population and number of telephones has been used to estimate the population in non-census years.

Usage

data(E1.7)

Format

This data frame contains the following columns:

RES: a numeric vector, number of residents.
MAINS: a numeric vector, number of telephones.

Source

Prof. Edwin Thomas, Department of Geography, University of Illinois at Chicago.

Examples

data(E1.7)
attach(E1.7)
plot(RES, MAINS)
plot(sqrt(RES), sqrt(MAINS))
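
The description mentions estimating population from telephone counts. A minimal sketch of that idea with a simple linear fit is given here. The untransformed straight-line model and the value placed in new.mains are assumptions made for illustration, not taken from the book.

```r
data("E1.7", package = "SenSrivastava")

## Regress residents on telephones (model choice is an assumption)
E1.7.m1 <- lm(RES ~ MAINS, data = E1.7)

## Hypothetical telephone count for a non-census year: here simply the sample mean
new.mains <- data.frame(MAINS = mean(E1.7$MAINS))

## Point prediction with a 95% prediction interval
E1.7.pred <- predict(E1.7.m1, newdata = new.mains, interval = "prediction")
E1.7.pred
```

With only 6 observations the interval is likely to be wide, so the output is best read as a rough guide.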

Multicollinear Data

Description

The E10.1 data frame has 10 rows and 5 columns. The responses were obtained by adding a N(0, 0.01) pseudorandom variate to x.1 + 0.5*x.2. The data were made up by the authors.

Usage

data(E10.1)

Format

This data frame contains the following columns:

x.1: a numeric vector, predictor 1.
x.2: a numeric vector, predictor 2.
y.1: a numeric vector, response 1.
y.2: a numeric vector, response 2.
y.3: a numeric vector, response 3.

Source

The data were made up by the authors.

Examples

data(E10.1)
attach(E10.1)
plot(x.1, x.2)
names(E10.1)
hascar <- require(car)
if (hascar) {
   mod <- lm(y.1 ~ x.1+x.2, data=E10.1)
   vif(mod)
}
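
The description says each response is built from the two predictors plus normal noise. A rough sketch of that construction follows. It assumes that 0.01 in N(0, 0.01) is the variance (so the standard deviation is 0.1). The seed and the name y.sim are hypothetical, so the result will not reproduce y.1, y.2 or y.3.

```r
data("E10.1", package = "SenSrivastava")

set.seed(1)  # arbitrary seed, for repeatability of this sketch only
noise <- rnorm(nrow(E10.1), mean = 0, sd = sqrt(0.01))  # assumes 0.01 is a variance

## Simulated response following the recipe in the description
y.sim <- E10.1$x.1 + 0.5 * E10.1$x.2 + noise

## With nearly collinear predictors the estimates can stray far from (0, 1, 0.5)
coef(lm(y.sim ~ x.1 + x.2, data = E10.1))
```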

Longley's Data

Description

The E10.11 data frame has 16 rows and 7 columns. This is a selection of Longley's multicollinear data (1967).

Usage

data(E10.11)

Format

This data frame contains the following columns:

Def: a numeric vector, a price index.
GNP: a numeric vector, gross national product.
Unemp: a numeric vector, unemployment rate.
AF: a numeric vector, employment in the armed forces.
Pop.: a numeric vector, noninstitutional population.
Year: a numeric vector, the year.
Total: a numeric vector, the response, total employment.

Source

Reproduced from the Journal of the American Statistical Association, 62.

Examples

data(E10.11)
summary(E10.11)
plot(E10.11)

Supervisor Rating Data

Description

The E10.3 data frame has 30 rows and 6 columns. This is part of a larger data set gathered for other purposes. The six variables are each composites obtained from responses to a questionnaire. The dependent variable y is a composite of responses towards the respondent's supervisor and on job satisfaction. The highest possible score is 20. The predictor variables are defined below.

Usage

data(E10.3)

Format

This data frame contains the following columns:

x.1: a numeric vector, measures the level of social contact each respondent felt he or she had with the supervisor. Based on questions like "Do you see your supervisor outside of your work place?"
x.2: a numeric vector, measures the perceived level of interest from the supervisor in the employee's personal life. Based on questions like "Would you discuss a personal problem with your supervisor?"
x.3: a numeric vector, measures the level of support the employee feels from the supervisor. Based on questions like "Is your supervisor supportive of your work?"
x.4: a numeric vector, together with x.5, measures the drive of the supervisor. Based on the employee's perception of this drive.
x.5: a numeric vector, based on questions like "Does your supervisor encourage you to learn new skills?"
y: a numeric vector, the response.

Source

Sen and Srivastava (1990) Regression Analysis, Theory, Methods and Applications. Springer-Verlag.

Examples

data(E10.3)
summary(E10.3)
plot(E10.3)

Artificially Created Data for an Example on Variable Search

Description

The E11.1 data frame has 20 rows and 5 columns.

Usage

data(E11.1)

Format

This data frame contains the following columns:

x.1: a numeric vector, predictor 1.
x.2: a numeric vector, predictor 2.
x.3: a numeric vector, predictor 3.
x.4: a numeric vector, predictor 4.
y: a numeric vector, response.

Source

Data made up by the authors.

Examples

data(E11.1)
exleaps <- require("leaps", quietly=TRUE)
if (exleaps) {
   E11.1.m1 <- regsubsets(y ~x.1+x.2+x.3+x.4, data=E11.1)
   summary(E11.1.m1)
   plot(E11.1.m1)
}

Data on Grade Point Average and SAT Scores

Description

The E2.1 data frame has 9 rows and 3 columns.

Usage

data(E2.1)

Format

This data frame contains the following columns:

GPA: a numeric vector, grade point average (maximum=4)
SATV: a numeric vector, SAT verbal score.
SATM: a numeric vector, SAT mathematical score.

Source

Dacey (1983).

Examples

data(E2.1)
summary(E2.1)

Demographic Data for the 50 States of the U.S.

Description

The E2.11 data frame has 50 rows and 27 columns. It combines exhibits E2.10 and E2.11 in the book. The data are for 1980 except as noted.

Usage

data(E2.11)

Format

This data frame contains the following columns:

State: a character vector, two-letter state code.
POP: a numeric vector, total population (1000's).
UR: a numeric vector, per mil of population living in urban areas.
MV: a numeric vector, per mil who moved between 1965 and 1970.
BL: a numeric vector, number of blacks (1000's).
SP: a numeric vector, number of Spanish speaking (1000's).
AI: a numeric vector, number of Native Americans (100's).
IN: a numeric vector, number of inmates of all institutions (correctional, mental, TB, etc.) in 1970 (1000's).
PR: a numeric vector, number of inmates of correctional institutions in 1970 (100's).
MH: a numeric vector, homes and schools for the mentally handicapped (100's).
B: a numeric vector, births per thousand.
HT: a numeric vector, death rate from heart disease per 100000 residents.
S: a numeric vector, suicide rate, 1978, per 100000.
DI: a numeric vector, death rate from diabetes, 1978, per 100000.
MA: a numeric vector, marriage rate, per 10000.
D: a numeric vector, divorce rate, per 10000.
DR: a numeric vector, physicians per 100000.
DN: a numeric vector, dentists per 100000.
HS: a numeric vector, per mil high school grads.
CR: a numeric vector, crime rate per 100000 population.
M: a numeric vector, murder rate per 100000 population.
PI: a numeric vector, prison rate (federal and state) per 100000 residents.
RP: a numeric vector,
VT: a numeric vector,
PH: a numeric vector, telephones per 100 (1979).
INC: a numeric vector, per capita income in 1972 dollars.
PL: a numeric vector, per mil of population below poverty level.

Source

Compiled by Prof. Siim Soot, Department of Geography, University of Illinois at Chicago, from Statistical Abstract of the United States, 1981, U.S. Bureau of the Census, Washington, D.C.

Examples

data(E2.11)
summary(E2.11)

Data on House Prices

Description

The E2.2 data frame has 26 rows and 14 columns. The data are on house prices in different zones of Chicago.

Usage

data(E2.2)

Format

This data frame contains the following columns:

Price: a numeric vector, selling price of house in thousands of dollars.
BDR: a numeric vector, number of bedrooms.
FLR: a numeric vector, floor space in sq. feet.
FP: a numeric vector, number of fireplaces.
RMS: a numeric vector, number of rooms.
ST: a numeric vector, storm windows (1 present, 0 absent).
LOT: a numeric vector, front footage of lot in feet.
TAX: a numeric vector, annual taxes.
BTH: a numeric vector, number of bathrooms.
CON: a numeric vector, construction (0 if frame, 1 if brick).
GAR: a numeric vector, garage size (0 = no garage, 1 = one-car garage, etc.).
CDN: a numeric vector, condition (1=needs work, 0 otherwise).
L1: a numeric vector, indicator for zone A.
L2: a numeric vector, indicator for zone B.

Source

Ms. Terry Tasch of Long-Kogan Realty, Chicago.

Examples

data(E2.2)
summary(E2.2)

International Car Ownership Data

Description

The E2.4 data frame has 24 rows and 8 columns. All data are for 1978.

Usage

data(E2.4)

Format

This data frame contains the following columns:

Country: a character vector, name of each country.
AO: a numeric vector, cars per person.
POP: a numeric vector, population of country in millions.
DEN: a numeric vector, population density.
GDP: a numeric vector, per capita income in U.S. dollars.
PR: a numeric vector, gasoline price in U.S. cents per liter.
CON: a numeric vector, tonnes of gasoline consumed per car per year.
TR: a numeric vector, thousands of passenger-kilometers per person of bus and rail use.

Details

Develop a model with AO as the response variable.

Source

OECD (1982)

Examples

data(E2.4)
summary(E2.4)

Voltage Data

Description

The E2.6 data frame has 10 rows and 2 columns.

Usage

data(E2.6)

Format

This data frame contains the following columns:

V.a: a numeric vector, actual voltage.
V.c: a numeric vector, voltage computed from the measured power output (using light output from electronic flash).

Details

One definition of efficiency is the ratio V.c/V.a. Obtain a model for efficiency E as a regression on V.a. Use a quadratic polynomial. Examine the fit.

Source

Armin Lehning, Speedotron Corporation.

Examples

data(E2.6)
E2.6.m1 <- lm(V.c/V.a ~ V.a + I(V.a^2), data=E2.6)
plot(E2.6.m1)

Korean Auto Ownership Data

Description

The E2.7 data frame has 10 rows and 5 columns.

Usage

data(E2.7)

Format

This data frame contains the following columns:

Year: a numeric vector, year.
AO: a numeric vector, number of cars per person.
GNP: a numeric vector, per capita GNP in 1000 Korean won.
CP: a numeric vector, average car price in 1000 Korean won.
OP: a numeric vector, gasoline price after taxes, in won per liter.

Source

KRIHS (1985) Study of Road User Charges. Seoul: Korea Research Institute for Human Settlements.

Examples

data(E2.7)
summary(E2.7)

Data on per Capita Output of Workers in Shanghai

Description

The E2.8 data frame has 17 rows and 4 columns. The data are for 17 factories in Shanghai.

Usage

data(E2.8)

Format

This data frame contains the following columns:

Output: a numeric vector, per capita output in Chinese Yuan.
SI: a numeric vector, number of workers in the factory.
SP: a numeric vector, land area of the factory in sq. meters per worker.
I: a numeric vector, investments in Yuan per worker.

Source

Prof. Zhang Tingwei of Tongji University, Shanghai.

Examples

data(E2.8)
summary(E2.8)

Data on Capital, Labour and Value Added for Three Sectors

Description

The E2.9 data frame has 15 rows and 10 columns. The three sectors are "20": Food and kindred products, "36": Equipment and supplies and "37": Transportation equipment.

Usage

data(E2.9)

Format

This data frame contains the following columns:

YEAR: a numeric vector, year without first two digits "19".
Cap.20: a numeric vector, capital of sector 20.
Cap.36: a numeric vector, capital of sector 36.
Cap.37: a numeric vector, capital of sector 37.
Lab.20: a numeric vector, labour of sector 20.
Lab.36: a numeric vector, labour of sector 36.
Lab.37: a numeric vector, labour of sector 37.
Val.20: a numeric vector, real value added of sector 20.
Val.36: a numeric vector, real value added of sector 36.
Val.37: a numeric vector, real value added of sector 37.

Source

Dr. Phillip Israelovich of the Federal Reserve Bank.

Examples

data(E2.9)
summary(E2.9)

Men's World Record Times for Running and Corresponding Distances

Description

The E3.4 data frame has 13 rows and 2 columns. World record times as of 1974.

Usage

data(E3.4)

Format

This data frame contains the following columns:

Dist.: a numeric vector, distance in meters.
Time: a numeric vector, time in seconds.

Source

Encyclopædia Britannica, 15th Edition, 1974, Micropædia, IX, page 485.

Examples

data(E3.4)
summary(E3.4)

Women's World Record Times for Running and Corresponding Distances

Description

The E3.5 data frame has 6 rows and 2 columns. Records are for 1974.

Usage

data(E3.5)

Format

This data frame contains the following columns:

Dist.: a numeric vector, distance run, in meters.
Time: a numeric vector, time used, in seconds.

Source

Encyclopædia Britannica, 15th Edition, 1974, Micropædia, IX, page 487.

Examples

data(E3.5)
data(E3.4)
summary(E3.5)
summary(E3.4)
records <- rbind(E3.5,E3.4)
sex <- factor(c(rep("F", 6), rep("M", 13)))
records$sex <- sex
summary(records)

Data on Corporations and Corporation Chairmen

Description

The E3.6 data frame has 50 rows and 6 columns.

Usage

data(E3.6)

Format

This data frame contains the following columns:

Y84: a numeric vector, salary 1984, in dollars.
Y83: a numeric vector, salary 1983, in dollars.
SHARES: a numeric vector, number of shares the chairman holds.
REV: a numeric vector, total revenue of the company.
INC: a numeric vector, total income of the company.
AGE: a numeric vector, age of chairman, in years.

Source

Examples

data(E3.6)
summary(E3.6)

Data on Oxygen Demand in Dairy Wastes

Description

The E3.7 data frame has 20 rows and 7 columns.

Usage

data(E3.7)

Format

This data frame contains the following columns:

Day: a numeric vector, day of measurement, all measurements are on the same sample.
x.1: a numeric vector, biological oxygen demand, mg/liter.
x.2: a numeric vector, total Kjeldahl nitrogen, mg/liter.
x.3: a numeric vector, total solids, mg/liter.
x.4: a numeric vector, total volatile solids, a component of x.3, in mg/liter.
x.5: a numeric vector, chemical oxygen demand, mg/liter.
y: a numeric vector, the response, log of oxygen demand, mg oxygen per minute.

Details

These data are from an experiment to construct a model for total oxygen demand in dairy wastes as a function of five laboratory measurements. Data were collected on samples kept in suspension in water in a laboratory for 220 days. All observations given here were taken on the same sample over time, so they are probably dependent.

Source

Moore (1975) Total Biochemical Oxygen Demand of Animal Manures. Ph.D. thesis, University of Minnesota, Dept. of Agricultural Engineering.

Examples

data(E3.7)
summary(E3.7)

Map Reading Test Scores and Route Finding Scores

Description

The E3.8 data frame has 20 rows and 3 columns. 20 student volunteers were given a map reading test and a test of route finding on transit maps.

Usage

data(E3.8)

Format

This data frame contains the following columns:

y: a numeric vector, score for the ability to find routes to a given destination on a transit route map.
sc: a numeric vector, scores on a map reading ability test.
Use: a factor with levels Non.users and Users, non-users and users of transit.

Source

Prof. Siim Soot, Department of Geography, University of Illinois at Chicago.

Examples

data(E3.8)
summary(E3.8)

Blood Velocity Data

Description

The E3.9 data frame has 18 rows and 4 columns. All the observations are for the same person.

Usage

data(E3.9)

Format

This data frame contains the following columns:

x.1: a numeric vector, cardiac output.
x.2: a numeric vector, carbon dioxide level in the blood.
y: a numeric vector, blood flow velocity in the brain.
Aminophylline: a factor with levels no and with, indicating whether aminophylline was used. The hypothesis is that aminophylline retards blood flow.

Source

Tonse Raju, M.D., Department of Neonatology, University of Illinois at Chicago.

Examples

data(E3.9)
summary(E3.9)

Traffic Fatality Data for Illinois

Description

The E4.1 data frame has 10 rows and 3 columns. Deaths are in deaths per 100 million vehicle miles.

Usage

data(E4.1)

Format

This data frame contains the following columns:

Year: a numeric vector, the year.
Deaths: a numeric vector, number of deaths.
DFR: a numeric vector, change in deaths from the previous year, deaths.t - deaths.(t-1).

Details

The interest is in possible changes once new safety regulations were in effect after 1966.

Source

Illinois Department of Transportation (1972).

Examples

data(E4.1)
summary(E4.1)
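
The DFR column is described as the year-to-year difference in Deaths. A small sketch of recomputing it with diff() follows, assuming the rows are in chronological order. Padding the first year with NA and the name dfr.check are choices made here. The stored DFR may treat the first year differently, so the two columns are only shown side by side.

```r
data("E4.1", package = "SenSrivastava")

## First differences of Deaths; the first year has no predecessor in the table
dfr.check <- c(NA, diff(E4.1$Deaths))

## Compare the recomputed differences with the stored DFR column
cbind(E4.1, dfr.check)
```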

Votes from Chicago's Twenty-second Ward by Precinct

Description

The E4.10 data frame has 27 rows and 7 columns.

Usage

data(E4.10)

Format

This data frame contains the following columns:

Pr.: a numeric vector, precinct number.
LATV: a numeric vector, number of latin voters.
NONLV: a numeric vector, number of non-latin voters.
TURNOUT: a numeric vector, total number of votes cast.
GARCIA: a numeric vector, number of votes for Garcia.
MARTINEZ: a numeric vector, number of votes for Martinez.
YANEZ: a numeric vector, number of votes for Yanez.

Details

Note that the votes for the three candidates may not add to the total turnout because of write-in votes, spoilt ballots, etc.

Source

Ray Flores, The Latino Institute, Chicago.

Examples

data(E4.10)
summary(E4.10)
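
The Details note that the candidate votes need not add up to the turnout. A short sketch of how one might inspect that gap per precinct follows. The name other.votes is introduced here for illustration.

```r
data("E4.10", package = "SenSrivastava")

## Votes not accounted for by the three candidates (write-ins, spoilt ballots, etc.)
other.votes <- with(E4.10, TURNOUT - (GARCIA + MARTINEZ + YANEZ))
summary(other.votes)
```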

Data on Cost of Repairing Starters, Ring Gears or Both in Diesel Engines

Description

The E4.11 data frame has 133 rows and 2 columns.

Usage

data(E4.11)

Format

This data frame contains the following columns:

Cost: a numeric vector, the repair cost in dollars.
Part: a factor with levels Both, Ring gear and Starter, the type of part being repaired.

Source

M.R.Khavanin, Department of Mechanical Engineering, University of Illinois at Chicago.

Examples

data(E4.11)
E4.11.m1 <- lm(Cost ~ Part - 1, data=E4.11)
summary(E4.11.m1)

Time taken by Professional Dieticians and Interns for Four Patient Contact Activities

Description

The E4.12 data frame has 24 rows and 6 columns. Each row gives the activities and time taken by one dietician.

Usage

data(E4.12)

Format

This data frame contains the following columns:

Time: a numeric vector, sum of time taken for all activities.
SC: a numeric vector, number of patient contacts for screening.
DC: a numeric vector, number of patient contacts for diet class.
MR: a numeric vector, number of patient contacts for meal rounds.
TR: a numeric vector, number of patient contacts for team rounds.
Dietician: a factor with levels Intern Prof, dietician is professional or intern.

Source

The data were made available to one of the authors by a student.

Examples

m1 <- lm(Time ~ SC+DC+MR+TR-1, data=E4.12, subset=Dietician=="Prof")
summary(m1)

Data on Hospital Charges

Description

The E4.13 data frame has 49 rows and 5 columns. Data on hospital charges for patients with an identical diagnosis.

Usage

data(E4.13)

Format

This data frame contains the following columns:

Sex: a factor with levels F and M, female and male.
MD: a factor with levels 499 730 1021, three different medical doctors.
Svty: a factor with levels 1 2 3 4, severity of illness.
Chrg: a numeric vector, total hospital charge in dollars.
Age: a numeric vector, age of patient in years.

Source

Dr. Joseph Feinglass, Northwestern Memorial Hospital, Chicago.

Examples

data(E4.13)
summary(E4.13)

Measures of Quality for Agencies Delivering Transportation for the Elderly and the Handicapped

Description

The E4.4 data frame has 40 rows and 3 columns.

Usage

data(E4.4)

Format

This data frame contains the following columns:

QUAL: a numeric vector, a quality measure made using psychometric methods from results of questionnaires.
X.1: a numeric vector, an indicator variable for private ownership.
X.2: a numeric vector, an indicator variable for private for profit ownership.

Details

The quality measure, QUAL, is constructed from questionnaires given to users of such services in the state of Illinois. Multiple services in the state were scored using this method. The indicator variables were constructed to give first (X.1) a comparison between private and public services, then (X.2) a comparison between private not-for-profit and private for-profit services.

Source

Slightly modified version of data supplied by Ms. Claire McKnight of the Department of Civil Engineering, City University of New York.

Examples

data(E4.4)
summary(E4.4)
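
The two indicators described in the Details can be used directly as regressors. A minimal sketch follows, with the model name E4.4.m1 chosen here. The reading of the coefficients assumes X.2 is 0 for public services, which the page does not state explicitly.

```r
data("E4.4", package = "SenSrivastava")

## Under the assumed coding:
##   intercept : mean QUAL for public services
##   X.1       : private not-for-profit minus public
##   X.2       : private for-profit minus private not-for-profit
E4.4.m1 <- lm(QUAL ~ X.1 + X.2, data = E4.4)
summary(E4.4.m1)
```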

Data on Per-Capita Income and Life Expectancy

Description

The E4.7 data frame has 101 rows and 3 columns.

Usage

data(E4.7)

Format

This data frame contains the following columns:

Country: a character vector, containing names of the countries.
LIFE: a numeric vector, life expectancy, years. Early 1970's.
INC: a numeric vector, per capita income in 1974 dollars. Early 1970's.

Source

From the New York Times (September 28, 1975, p. E-3).

Examples

data(E4.7)
attach(E4.7)
plot(INC, LIFE)
plot(log(INC), LIFE)
detach()

Data on Automobile Speed and Distance Covered to Come to a Standstill after Braking

Description

The E6.1 data frame has 62 rows and 2 columns.

Usage

data(E6.1)

Format

This data frame contains the following columns:

d.: a numeric vector, distance covered to come to a standstill after braking.
sp.: a numeric vector, speed before braking.

Source

Examples

data(E6.1)
attach(E6.1)
plot(sp., d.)
detach()

Data on Perceived and Computed Travel Times by Bus

Description

The E6.10 data frame has 32 rows and 3 columns.

Usage

data(E6.10)

Format

This data frame contains the following columns:

n: a numeric vector, number of respondents, weights for the linear regression.
x: a numeric vector, computed travel times between a pair of zones in Chicago.
y: a numeric vector, perceived travel times, as reported to the U.S. Census Bureau.

Details

x was computed from bus timetables, adding an average waiting time at the stop and an average walking time from zone center to bus stop. y is the average reported by n travelers to the U.S. Census Bureau. The variable t introduced in the example below is the one for multiple bus transfers, used in Example 8.1, page 161.

Source

The data were selected by one of the authors from a larger data set compiled by Cæsar Singh from census tapes, timetables and maps.

Examples

data(E6.10)
## Manipulations of the data for example 8.1, page 161:
t <- c(0,1,rep(0,20),1,rep(0,5),1,rep(0,3))
e6.10 <- data.frame(E6.10, t=t)
rm(t)
summary(e6.10)
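
Since y is an average over n respondents, its variance plausibly shrinks in proportion to 1/n, which is why n is described as the regression weight. A minimal sketch of such a weighted fit follows. The model name E6.10.w and the simple straight-line form are assumptions made here, not the book's final model.

```r
data("E6.10", package = "SenSrivastava")

## Weighted least squares: rows based on more respondents count for more
E6.10.w <- lm(y ~ x, data = E6.10, weights = n)
summary(E6.10.w)
```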

Heights of Fathers and Sons

Description

The E6.11 data frame has 12 rows and 3 columns.

Usage

data(E6.11)

Format

This data frame contains the following columns:

Height.Father: a numeric vector, height of father to the nearest inch.
Aver.Height.Son: a numeric vector, average heights of sons.
No.Fathers: a numeric vector, number of fathers in each group.

Source

Dacey (1983, Ch. 1) from McNemar (1969, p. 130), Psychological Statistics.

Examples

data(E6.11)
summary(E6.11)
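
Each row here is a group average, so one natural analysis weights each group by its size. That choice is an assumption made for this sketch; the page itself does not prescribe it. The model name E6.11.w is likewise introduced here.

```r
data("E6.11", package = "SenSrivastava")

## Weighted regression of sons' average height on fathers' height,
## with the number of fathers in each group as weights
E6.11.w <- lm(Aver.Height.Son ~ Height.Father, data = E6.11,
              weights = No.Fathers)
summary(E6.11.w)
```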

Dial-a-ride Data

Description

The E6.8 data frame has 54 rows and 7 columns. It has 7 variables describing 54 dial-a-ride services in the U.S. and Canada. It needs weighted regression.

Usage

data(E6.8)

Format

This data frame contains the following columns:

POP: a numeric vector, population of the area where the service was operating.
AR: a numeric vector, area of the place where the service was provided.
RDR: a numeric vector, number of riders using the system.
HR: a numeric vector, hours of operation.
VH: a numeric vector, number of vehicles in operation.
F: a numeric vector, the fare used.
IND: a numeric vector, a composite index, 1 when several ridership-enhancing features were present, and 0 otherwise.

Source

Collected by Louise Stanton-Maston, from 54 services in U.S. and Canada.

Examples

data(E6.8)
summary(E6.8)

Data on Dental Measurements

Description

The E7.1 data frame has 4 rows and 12 columns. Dental measurements for girls from 8 to 14 years old. Each measurement is the distance, in mm, from the center of the pituitary to the pterygomaxillary fissure.

Usage

data(E7.1)

Format

This data frame contains the following columns:

Age: a numeric vector, age of girl when measurement was taken.
S.1: a numeric vector, measurements for girl 1.
S.2: a numeric vector, measurements for girl 2.
S.3: a numeric vector, measurements for girl 3.
S.4: a numeric vector, measurements for girl 4.
S.5: a numeric vector, measurements for girl 5.
S.6: a numeric vector, measurements for girl 6.
S.7: a numeric vector, measurements for girl 7.
S.8: a numeric vector, measurements for girl 8.
S.9: a numeric vector, measurements for girl 9.
S.10: a numeric vector, measurements for girl 10.
S.11: a numeric vector, measurements for girl 11.

Source

Potthoff and Roy (1964).

Examples

data(E7.1)
summary(E7.1)
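
The data frame is stored wide, with one column per girl. Many modelling functions expect one row per measurement instead, so a sketch of a wide-to-long conversion follows. The names girl.cols, E7.1.long, girl and dist are hypothetical and introduced here for illustration.

```r
data("E7.1", package = "SenSrivastava")

girl.cols <- paste0("S.", 1:11)   # measurement columns as listed under Format

## unlist() stacks the columns one after another, so Age repeats once per girl
E7.1.long <- data.frame(
  Age  = rep(E7.1$Age, times = length(girl.cols)),
  girl = rep(seq_along(girl.cols), each = nrow(E7.1)),
  dist = unlist(E7.1[, girl.cols], use.names = FALSE)
)
summary(E7.1.long)
```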

Prices of Crude Oil, Natural Gas, Bituminous Coal and Lignite, and Anthracite by Year

Description

The E7.2 data frame has 32 rows and 5 columns. Prices are in 1972 U.S. cents per 1000 BTU.

Usage

data(E7.2)

Format

This data frame contains the following columns:

year: a numeric vector, year of observation.
Oil: a numeric vector, price of oil.
Gas: a numeric vector, price of gas.
Bit.: a numeric vector, price of Bituminous Coal and Lignite.
Anth.: a numeric vector, price of Anthracite.

Source

Darrel Sala, Institute of Gas Technology, Chicago.

Examples

data(E7.2)
summary(E7.2)

Data on Intake/Output Ratio

Description

The E7.3 data frame has 19 rows and 6 columns. It gives the ratios u of fluid intake to urine output over five consecutive 8-hour periods for 19 babies divided into a control and a treatment group.

Usage

data(E7.3)

Format

This data frame contains the following columns:

G: a factor with levels surfactant and placebo, the group.
u.1: a numeric vector, u for time period 1.
u.2: a numeric vector, u for time period 2.
u.3: a numeric vector, u for time period 3.
u.4: a numeric vector, u for time period 4.
u.5: a numeric vector, u for time period 5.

Source

Rama Bhat, M.D., Department of Pediatrics, University of Illinois at Chicago. These data are part of a larger data set.

Examples

data(E7.3)
summary(E7.3)

Data on PCO2 and Cerebral Blood Flow for Five Regions of the Brain of Five Chimpanzees

Description

The E7.4 data frame has 5 rows and 11 columns. Five baby chimpanzees were injected with a heavy dose of HIV infection. After six months, the radioactive microsphere technique was used to measure brain blood flow, in ml per 100 grams of brain tissue, in five regions of the brain. The partial pressure of carbon dioxide, in millimeters of mercury, was also obtained.

Usage

data(E7.4)

Format

This data frame contains the following columns:

Ch.: a numeric vector, id number of chimpanzee.
Fron.x: a numeric vector, Frontal partial pressure of carbon dioxide.
Fron.y: a numeric vector, Frontal blood flow.
Pari.x: a numeric vector, Parietal partial pressure of carbon dioxide.
Pari.y: a numeric vector, Parietal blood flow.
Occi.x: a numeric vector, Occipital partial pressure of carbon dioxide.
Occi.y: a numeric vector, Occipital blood flow.
Temp.x: a numeric vector, Temporal partial pressure of carbon dioxide.
Temp.y: a numeric vector, Temporal blood flow.
Cere.x: a numeric vector, Cerebellum partial pressure of carbon dioxide.
Cere.y: a numeric vector, Cerebellum blood flow.

Source

Tonse Raju, M.D., Department of Pediatrics, University of Illinois at Chicago.

Examples

data(E7.4)
summary(E7.4)

Data on Static Weights and Weight in Motion of Trucks

Description

The E7.5 data frame has 26 rows and 6 columns.

Usage

data(E7.5)

Format

This data frame contains the following columns:

sw.1: a numeric vector, static weight of axle 1.
wim.1: a numeric vector, weight in motion of axle 1.
sw.23: a numeric vector, static weight of axles 2–3.
wim.23: a numeric vector, weight in motion of axles 2–3.
sw.45: a numeric vector, static weight of axles 4–5.
wim.45: a numeric vector, weight in motion of axles 4–5.

Details

Trucks can be weighed by two methods. In one, a truck goes into a weighing station and each axle is weighed by conventional means. The other is a newer and somewhat experimental method in which a thin pad is placed on the highway and axles are weighed as trucks pass over it. The former weights are called static weights (sw), while the latter are called weights in motion (wim).
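
The comparison of the two weighing methods can be sketched as follows. This is only an illustration and not part of the package: the helper name wim_error and the three-row data frame trucks are hypothetical, and the numbers are made up (they are not taken from E7.5). Only the column names follow the Format section above.

```r
# Hypothetical helper: relative difference (in percent) between the
# weight in motion and the static weight, for each axle group.
wim_error <- function(d) {
  groups <- c("1", "23", "45")
  out <- sapply(groups, function(g) {
    sw  <- d[[paste0("sw.", g)]]
    wim <- d[[paste0("wim.", g)]]
    100 * (wim - sw) / sw
  })
  colnames(out) <- paste0("err.", groups)
  out
}

# Made-up values, only to show the call; NOT taken from E7.5.
trucks <- data.frame(
  sw.1  = c(10, 12, 11), wim.1  = c(11, 12, 10),
  sw.23 = c(30, 28, 32), wim.23 = c(33, 28, 30),
  sw.45 = c(25, 27, 26), wim.45 = c(25, 30, 26)
)
err <- wim_error(trucks)
print(round(err, 1))
```

With the package loaded, the same helper could presumably be applied as wim_error(E7.5) after data(E7.5), assuming the columns are named as documented.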

Source

Saleh Mumayiz, Urban Transportation Center, University of Illinois at Chicago, who compiled the data from a data set provided by the Illinois Department of Transportation.

Examples

data(E7.5)
summary(E7.5)
plot(E7.5)

Community Area Data for the North Part of the City of Chicago

Description

The E7.6 data frame has 34 rows and 5 columns.

Usage

data(E7.6)

Format

This data frame contains the following columns:

Area.Name: a character vector, name of area.
PB: a numeric vector, percentage of the population that is Black.
PS: a numeric vector, percentage of the population that is Spanish-speaking.
PA: a numeric vector, percentage of population over 65.
Income: a numeric vector, median family income for each area.

Source

The data set was constructed by Prof. Siim Soot, Dept. of Geography, University of Illinois at Chicago.

Examples

data(E7.6)
summary(E7.6)

The Contiguity Matrix for the 34 Areas in Northern Chicago

Description

This is the contiguity matrix for the 34 areas in northern Chicago given in E7.6. It contains only 0s and 1s: a 1 indicates that the two areas are contiguous, a 0 that they are not.

Usage

data(E7.7)
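
A few sanity checks on a contiguity matrix can be sketched as below. This is an illustration under stated assumptions: the helper name check_contiguity and the small 4-area matrix W are hypothetical, and the exact layout of E7.7 is not documented here, so it is worth inspecting str(E7.7) before passing it to anything like this.

```r
# Hypothetical helper: basic sanity checks for a contiguity matrix.
check_contiguity <- function(W) {
  W <- as.matrix(W)
  is_square <- nrow(W) == ncol(W)
  list(
    square     = is_square,
    binary     = all(W %in% c(0, 1)),
    symmetric  = is_square && all(W == t(W)),
    neighbours = rowSums(W)  # number of contiguous areas for each area
  )
}

# Made-up example: four areas arranged in a line, 1-2-3-4.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1   # mark (1,2), (2,3), (3,4) as contiguous
W <- W + t(W)             # contiguity is mutual, so symmetrise
res <- check_contiguity(W)
str(res)
```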

Data on Lung Cancer Deaths and Cigarette Smoking

Description

The E8.12 data frame has 11 rows and 3 columns.

Usage

data(E8.12)

Format

This data frame contains the following columns:

Country: a character vector, the country.
y: a numeric vector, male deaths from lung cancer in 1950, per million.
x: a numeric vector, per capita cigarette consumption in 1930.

Source

Tufte, E.R. (1974) Data Analysis for Politics and Policy. Englewood Cliffs, N.J.: Prentice-Hall. Data are adapted.

Examples

data(E8.12)
summary(E8.12)

Florida Cumulus Experiment Data

Description

The E8.13 data frame has 20 rows and 7 columns, giving data on the effects of cloud seeding by silver iodide crystals on precipitation. Each data point is one day.

Usage

data(E8.13)

Format

This data frame contains the following columns:

A: a factor with levels NoSeed Seed
T: a numeric vector, number of days after the first day of the experiment.
S: a numeric vector, relates to heights of clouds.
C: a numeric vector, percentage of cloud cover in the experimental area.
P: a numeric vector, total rainfall in the study area one hour before seeding (in $10^7$ cubic meters).
E: a factor with levels Moving Stationary, indicating whether the radar echo was moving or not.
y: a numeric vector, the response, natural logarithm of precipitation in the target area in a 6-hour period (in $10^7$ cubic meters).

Source

Examples

data(E8.13)
summary(E8.13)
plot(E8.13)

Data on Transit Privatization

Description

The E9.11 data frame has 17 rows and 10 columns.

Usage

data(E9.11)

Format

This data frame contains the following columns:

V1: a numeric vector, average capacity of buses in service.
V2: a numeric vector, ratio of buses in use during non-peak periods to those in use in peak periods.
V3: a numeric vector, average speed.
V4: a numeric vector, vehicle-miles contracted.
V5: a numeric vector, distance of center from metropolitan area.
V6: a numeric vector, population of metropolitan area.
V7: a numeric vector, percentage of work trips in the metropolitan area that are made by transit.
V8: a numeric vector, ratio of buses owned by sponsor to buses owned by contractor.
V9: a numeric vector, per capita income for metropolitan area.
PCS: a numeric vector, percent savings that occurred when some transit lines were given to private companies.

Source

Prof. E.K. Morlok, Dept. of Systems Engineering, University of Pennsylvania.

Examples

data(E9.11)
summary(E9.11)
plot(E9.11)

Data on Travel Times and Usage for Automobiles and Public Transportation

Description

The E9.18 data frame has 51 rows and 4 columns.

Usage

data(E9.18)

Format

This data frame contains the following columns:

t.a: a numeric vector, travel time by car, in tenths of minutes.
t.r: a numeric vector, travel time by public transportation, in tenths of minutes.
m.a: a numeric vector, number of those who used a car or van either as driver or passenger.
m.r: a numeric vector, number of people using any kind of public transportation.

Details

Travel times were modified by one of the authors to reflect the cost of parking. For downtown zones (Chicago) this amounted to about 60 minutes.
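
Because the travel times are recorded in tenths of minutes, a unit conversion is usually the first step. The sketch below shows one way to do it and to compute the share of travellers using public transportation. The helper name transit_summary and the two-row data frame zones are hypothetical, with made-up values that are not taken from E9.18; only the column names follow the Format section above.

```r
# Hypothetical helper: convert travel times to minutes and compute
# the proportion of travellers using public transportation.
transit_summary <- function(d) {
  data.frame(
    car.min       = d$t.a / 10,              # tenths of minutes -> minutes
    transit.min   = d$t.r / 10,
    diff.min      = (d$t.r - d$t.a) / 10,    # transit minus car, in minutes
    share.transit = d$m.r / (d$m.a + d$m.r)  # proportion using transit
  )
}

# Made-up values, only to show the call; NOT taken from E9.18.
zones <- data.frame(t.a = c(250, 900), t.r = c(400, 600),
                    m.a = c(300, 50),  m.r = c(100, 150))
s <- transit_summary(zones)
print(s)
```

With the package loaded, transit_summary(E9.18) could be used in the same way after data(E9.18), assuming the columns are named as documented.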

Source

Selected by Robert Drozd from Census (US) Urban Transportation Planning Package, for the Chicago area.

Examples

data(E9.18)
summary(E9.18)
plot(E9.18)

Acceleration data

Description

The E9.19 data frame has 50 rows and 4 columns.

Usage

data(E9.19)

Format

This data frame contains the following columns:

ACC: a numeric vector, acceleration of different vehicles.
WHP: a numeric vector, weight-to-horsepower ratio.
SP: a numeric vector, speed at which they were travelling.
G: a numeric vector, grade of road; G=0 means the road was horizontal.

Source

Raj Tejwaney, Department of Civil Engineering, University of Illinois at Chicago.

Examples

data(E9.19)
summary(E9.19)
plot(E9.19)

Stadium Cleanup Data

Description

The E9.20 data frame has 16 rows and 3 columns.

Usage

data(E9.20)

Format

This data frame contains the following columns:

C: a numeric vector, cost of cleanup. Units forgotten.
R.HD: a numeric vector, sales at hot-dog stands. Units forgotten.
R.B: a numeric vector, sales at beer stands. Units forgotten.

Source

The authors of the book.

Examples

data(E9.20)
summary(E9.20)
plot(E9.20)

Depreciation in Market Value of Large Factories

Description

The E9.21 data frame has 11 rows and 2 columns.

Usage

data(E9.21)

Format

This data frame contains the following columns:

Age: a numeric vector, units not given, probably years.
Depr: a numeric vector, units not given.

Source

Diamond-Star Motors, Normal, IL. Gary Shultz, General Counsel, made these data available.

Examples

data(E9.21)
summary(E9.21)
plot(E9.21)

"Areas", lengths and widths of rectangles

Description

The E9.3 data frame has 50 rows and 3 columns. The data were made using random sampling numbers.

Usage

data(E9.3)

Format

This data frame contains the following columns:

y: a numeric vector, area of the rectangle.
x1: a numeric vector, length of the rectangle.
x2: a numeric vector, width of the rectangle.

Examples

data(E9.3)
# Linear model of area on length and width
E9.3.m1 <- lm(y ~ x1 + x2, data=E9.3)
# Residuals against each predictor
plot(E9.3$x1, resid(E9.3.m1))
plot(E9.3$x2, resid(E9.3.m1))

Data on monthly rent, annual income and household size

Description

The E9.8 data frame has 27 rows and 3 columns.

Usage

data(E9.8)

Format

This data frame contains the following columns:

R: a numeric vector, Monthly rent in dollars.
I: a numeric vector, annual income in thousands of dollars.
S: a numeric vector, household size.

Details

Example 9.8 in Sen and Srivastava, page 201.

Source

Selected by one of the authors from a much larger data set, collected from several sources about 20 years ago.

Examples

data(E9.8)
E9.8.m1 <- lm(R ~ I + S, data=E9.8)
summary(E9.8.m1)
# Partial residuals, one column per predictor
E9.8.pres <- resid(E9.8.m1, type="partial")
plot(E9.8$I, E9.8.pres[,"I"])
plot(E9.8$S, E9.8.pres[,"S"])

Data on asylum requests to the U.S. by country of origin of applicant

Description

The Ec.8 data frame has 112 rows and 5 columns.

Usage

data(Ec.8)

Format

This data frame contains the following columns:

Country: a character vector, containing country of origin of applicant.
APR: a numeric vector, number of successful applications.
DEN: a numeric vector, number of denied applications.
H: a numeric vector, 1 if country is considered hostile to the U.S., 0 otherwise.
E: a numeric vector, 1 if country is European or mainly inhabited by people of European descent, 0 otherwise.

Source

Prof. Barbara Yarnold, Dept. of Political Science, Saginaw Valley State University, Saginaw, Michigan.

Examples

data(Ec.8)
summary(Ec.8)
# Binomial model for approved vs. denied applications
Ec.8.m1 <- glm(cbind(APR, DEN) ~ E + H, data=Ec.8, family=binomial)
summary(Ec.8.m1)

U.S. Population in thousands, for exercise 7.7

Description

The Ex.7.7 data frame has 19 rows and 2 columns.

Usage

data(Ex.7.7)

Format

This data frame contains the following columns:

y: a numeric vector, U.S. population in thousands.
t: a numeric vector, year.

Source

Sen and Srivastava.

Examples

data(Ex.7.7)
with(Ex.7.7, plot(y ~ t))
summary(Ex.7.7)

Data on Effects of Air Pollution on Interpersonal Attraction

Description

The Ex4.4 data frame has 24 rows and 3 columns. An experiment was conducted to examine the effects of air pollution on interpersonal attraction. Twenty-four subjects were each placed with a stranger for a 15-minute period in a room which was either odor-free or contaminated with ammonium sulfide. The stranger came from a culture which was similar or dissimilar to that of the subject. At the end of the encounter, each subject was asked to assess his degree of attraction towards the stranger on a Likert scale of 1–10, with 10 indicating strong attraction.

Usage

data(Ex4.4)

Format

This data frame contains the following columns:

Likert: a numeric vector, attraction on a Likert scale.
Odor: a factor with levels Free Odor, indicating whether the room was odor-free or contaminated.
Culture: a factor with levels Dissimilar Similar, indicating whether the stranger's culture was dissimilar or similar to that of the subject.

Source

The full data set is given in Srivastava and Carter (1983).

Examples

data(Ex4.4)
summary(Ex4.4)
plot(Ex4.4)