A neural network prediction of the time it takes to sell an item after listing
A sample of 10,000 listed items was taken from the German database and exported from Hive to CSV.
We will use the nnet and caret packages to create and train a neural network.
In [1]:
# library() stops with an error if a package is missing; require() only warns
library(nnet)
library(caret)
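The items are read in and each numeric metric is min-max normalised to the [0, 1] range, which keeps the inputs on a comparable scale for the network. The target, trukme (Lithuanian for "duration"), is the number of days between listing and sale.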
In [3]:
items <- read.csv("listing.csv")
# Min-max scaling to the [0, 1] range
normalise <- function(x) (x - min(x)) / (max(x) - min(x))
# trukme: days between listing and sale
items$trukme <- as.numeric(as.Date(items$sale_date) - as.Date(items$local_date))
items$when <- as.numeric(items$days_since_registration - items$trukme)
items$when_n <- normalise(items$when)
items$trukme_n <- normalise(items$trukme)
items$days_since_registration_n <- normalise(items$days_since_registration)
items$listing_price_eur_fixed_n <- normalise(items$listing_price_eur_fixed)
# Collect the modelling columns under shorter names
foo <- data.frame(status = items$status, category = items$category,
                  trukme = items$trukme_n, days_since_reg = items$days_since_registration_n,
                  price = items$listing_price_eur_fixed_n, when = items$when_n)
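To make the scaling concrete, here is a minimal sketch on a made-up vector of prices. The helper name `rescale01` and the guard for constant columns are illustrative assumptions and not part of the original notebook; the formula itself is the same (x - min) / (max - min) used above.

```r
# Hypothetical helper: min-max scaling with a guard for constant columns
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column would give 0/0 = NaN, so return zeros instead
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

prices <- c(5, 10, 25)           # made-up values for illustration
scaled <- rescale01(prices)      # smallest value maps to 0, largest to 1
constant <- rescale01(c(3, 3, 3))
print(scaled)
```

The guard matters only if a column has a single distinct value, in which case the unguarded formula would fill it with NaN.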
We split the data at random into a 70% training set and a 30% test set.
In [4]:
index <- sample(seq_len(nrow(foo)), round(0.7 * nrow(foo)))
train <- foo[index,]
test <- foo[-index,]
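Because `sample()` draws at random, the split changes on every run. A small sketch of the same 70/30 split made reproducible with `set.seed()` follows; the toy data frame and the seed value 42 are arbitrary assumptions used only for illustration.

```r
set.seed(42)  # arbitrary seed, only so that the split can be repeated

# Made-up data standing in for the real items
toy <- data.frame(id = 1:100, y = rnorm(100))

idx <- sample(seq_len(nrow(toy)), round(0.7 * nrow(toy)))
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

# The two sets should cover all rows without overlap
c(train = nrow(toy_train), test = nrow(toy_test))
```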
Here is the neural network model itself. caret tunes the hidden layer size and the weight decay over the grid given in tuneGrid.
In [5]:
# linout=TRUE gives a linear output unit, as needed for regression
model <- train(trukme ~ status + price + when,
               data=train, method='nnet', linout=TRUE, trace=FALSE, maxit=10000, MaxNWts=30000,
               tuneGrid=expand.grid(size=c(2, 3, 5), decay=c(0.001, 0.01, 0.1)))
model
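The tuning grid can be inspected on its own to see how many candidate models are fitted. This is only a sketch of the grid construction in base R; as far as I know, caret fits one model per row for every resample and, for regression, keeps the combination with the lowest resampled RMSE by default.

```r
# Every combination of hidden layer size and weight decay
grid <- expand.grid(size = c(2, 3, 5), decay = c(0.001, 0.01, 0.1))

n_candidates <- nrow(grid)  # 3 sizes x 3 decay values
print(grid)
```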
Generate predictions for the test set and plot them against the observed values.
In [7]:
ps <- predict(model, test)
plot(ps, test$trukme, xlab="NN predicted", ylab="Real data")
Compare this with a simple linear regression on the same predictors.
In [8]:
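# glm() with its default gaussian family fits an ordinary linear regression
lm.fit <- glm(trukme ~ status + price + when, data=train)
summary(lm.fit)
pr.lm <- predict(lm.fit, test)
plot(pr.lm, test$trukme, xlab="LM predicted", ylab="Real data")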
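The baseline is fitted with `glm()`, which with its default gaussian family and identity link is an ordinary least squares regression. A quick sketch on R's built-in `cars` dataset (used here only as a stand-in, not the listing data) shows that `lm()` and `glm()` give the same coefficients.

```r
# Same model fitted two ways on the built-in cars data
fit_lm  <- lm(dist ~ speed, data = cars)
fit_glm <- glm(dist ~ speed, data = cars)  # family = gaussian by default

# Largest absolute difference between the two coefficient vectors
coef_diff <- max(abs(coef(fit_lm) - coef(fit_glm)))
coef_diff
```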
Finally, compare the test-set mean squared error (MSE) of the linear regression and the neural network. Both values are on the normalised scale of trukme, not in days.
In [9]:
MSE.lm <- sum((pr.lm-test$trukme)^2)/nrow(test)
MSE.nn <- sum((ps-test$trukme)^2)/nrow(test)
c(MSE.lm, MSE.nn)
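An MSE on the normalised scale is hard to interpret directly. The sketch below, using made-up durations rather than the real data, shows one way to turn it back into an error in days: take the square root and multiply by the range (max minus min) used for the normalisation. The helper `mse` and all numbers are assumptions for illustration.

```r
# Hypothetical helper: mean squared error
mse <- function(pred, obs) mean((pred - obs)^2)

# Made-up observed and predicted durations in days
obs_days  <- c(2, 10, 30, 5)
pred_days <- c(4, 8, 25, 9)

# Normalise both with the min and range of the observed values
rng    <- max(obs_days) - min(obs_days)
obs_n  <- (obs_days - min(obs_days)) / rng
pred_n <- (pred_days - min(obs_days)) / rng

mse_n     <- mse(pred_n, obs_n)   # error on the [0, 1] scale
rmse_days <- sqrt(mse_n) * rng    # back on the scale of days
rmse_days
```

Reporting the root mean squared error in days makes it easier to judge whether the difference between the two models is practically relevant.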
Conclusion: on this sample, the neural network predicts the listing-to-sale time better than a simple linear model, as measured by test-set MSE.