Deep Fashion
Imitating the shopper's mind
This is work I did more than a year ago. In this post I briefly document how I pursued a machine learning idea. I started out with DL4J, a Java-based deep learning library. I soon realized that the community support (Stack Overflow, Medium), tooling (scikit-learn, matplotlib, NumPy) and infrastructure (AWS, FloydHub) for Java are quite limited. It made more sense to do this in Python.
I follow the examples from my Java exploration with my Jupyter notebooks, downloaded from FloydHub where they ran. This is just a sneak preview of the larger codebase; you can follow the GitHub links to see the full code.
Can a machine imitate the mind of a fashion shopper?
When a retail shopper views an image of a fashion apparel item, her/his/their mind (yes, mind, not brain) goes through a complex decision-making process that leads to a decision: buy, or pass on the item. Can a neural network imitate this decision-making process?
Can you predict the sales of an apparel based on its catalog photograph?
This is a machine learning project built in Java with DL4J neural networks. It includes the code to fetch, clean, and create the dataset from a relational database. The product is intended as business intelligence software: it uses convolutional neural networks to "look" at images of products alongside the corresponding historical sales data.
What if we reformulate this question into a multilabel classification problem?
Can a simple feed forward MLP do this?
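One way to make the reformulation concrete: instead of regressing a continuous sales figure, bin it into discrete sales bands and classify. A minimal sketch in Python, with hypothetical thresholds (the real bands would come from the sales distribution in the database):

```python
def sales_to_band(units_sold, thresholds=(10, 50, 200)):
    """Map a units-sold figure to a band index: 0 = slow mover, up to
    len(thresholds) = hit. Thresholds here are made up for illustration."""
    for band, limit in enumerate(thresholds):
        if units_sold < limit:
            return band
    return len(thresholds)

# Continuous sales figures become class labels the network can predict.
labels = [sales_to_band(s) for s in (3, 25, 120, 900)]  # [0, 1, 2, 3]
```

With labels like these, the output layer becomes a softmax over bands rather than a single regression unit.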
I started out with DL4J
public MultiLayerNetwork getModel() {
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().seed(6)
            .activation(Activation.SIGMOID).weightInit(WeightInit.XAVIER).updater(new Sgd(0.1))
            .l2(1e-4).list()
            .layer(0, new DenseLayer.Builder().nIn(494).nOut(300).build())
            .layer(1, new DenseLayer.Builder().nIn(300).nOut(300).build())
            .layer(2, new DenseLayer.Builder().nIn(300).nOut(300).build())
            .layer(3, new DenseLayer.Builder().nIn(300).nOut(300).build())
            .layer(4, new DenseLayer.Builder().nIn(300).nOut(300).build())
            .layer(5, new DenseLayer.Builder().nIn(300).nOut(10).build())
            .layer(6, new DenseLayer.Builder().nIn(10).nOut(3).build())
            .layer(7, new OutputLayer.Builder(
                    LossFunctions.LossFunction.MEAN_SQUARED_LOGARITHMIC_ERROR)
                    .activation(Activation.RELU).nIn(3).nOut(1).build())
            // .backprop(true)
            // .pretrain(false)
            .build();
    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    // load("datafashion\\model-amount.txt");
    return model;
}
Can VGG16 imitate the mind of a shopper?
public void run(String[] args) throws Exception {
    enableCuda();
    log.info("Load data....");
    /*
     * Data setup -- organize and limit data file paths:
     *  - mainPath   = path to the image files
     *  - fileSplit  = define the basic dataset split with limits on format
     *  - pathFilter = define an additional file-load filter to limit size
     *                 and balance batch content
     */
    ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
    File mainPath = new File("datafashion\\images-resize-240-320\\Number\\resize");
    FileSplit fileSplit = new FileSplit(mainPath, NativeImageLoader.ALLOWED_FORMATS, rng);
    int numExamples = toIntExact(fileSplit.length());
    // This only works if the root directory is clean: only label subdirectories.
    int numLabels = fileSplit.getRootDir().listFiles(File::isDirectory).length;
    System.out.println(numLabels);
    System.out.println(numExamples);
    BalancedPathFilter pathFilter = new BalancedPathFilter(rng, labelMaker, numExamples, numLabels,
            maxPathsPerLabel);
    InputSplit[] inputSplit = fileSplit.sample(pathFilter, splitTrainTest, 1 - splitTrainTest);
    InputSplit trainData = inputSplit[0];
    InputSplit testData = inputSplit[1];

    log.info("Build model....");
    ZooModel zooModel = VGG16.builder().build();
    ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
    log.info(vgg16.summary());

    // Decide on a fine-tune configuration to use.
    // Where a setting already exists, the fine-tune setting will override it
    // for all layers that are not "frozen".
    FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
            .updater(new Nesterovs(5e-5)).seed(seed).build();

    // Construct a new model with the intended architecture and print a summary.
    ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
            .fineTuneConfiguration(fineTuneConf)
            .setFeatureExtractor("fc2") // the specified layer and below are "frozen"
            .removeVertexKeepConnections("predictions") // replace the final vertex
            .addLayer("predictions",
                    new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                            .nIn(4096).nOut(numLabels)
                            .weightInit(new NormalDistribution(0, 0.2 * (2.0 / (4096 + numLabels))))
                            .activation(Activation.SOFTMAX).build(),
                    "fc2")
            .build();
    log.info(vgg16Transfer.summary());

    // vgg16Transfer.setListeners(new ScoreIterationListener(100));
    // UIServer uiServer = UIServer.getInstance();
    // StatsStorage statsStorage = new InMemoryStatsStorage();
    // uiServer.attach(statsStorage);
    // vgg16Transfer.setListeners(new StatsListener(statsStorage), new ScoreIterationListener(1));

    ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, labelMaker);
    recordReader.initialize(trainData, null);
    DataSetIterator trainIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, numLabels);

    recordReader = new ImageRecordReader(height, width, channels, labelMaker);
    recordReader.initialize(testData, null);
    DataSetIterator testIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, numLabels);

    Evaluation eval = vgg16Transfer.evaluate(testIter);
    log.info("Eval stats BEFORE fit.....");
    log.info(eval.stats() + "\n");
    testIter.reset();

    int iter = 0;
    while (trainIter.hasNext()) {
        vgg16Transfer.fit(trainIter.next());
        if (iter % 10 == 0) {
            log.info("Evaluate model at iter " + iter + " ....");
            eval = vgg16Transfer.evaluate(testIter);
            log.info(eval.stats());
            testIter.reset();
        }
        iter++;
    }
    log.info("Model build complete");
}
Python & Keras
Hang on, Java is just too cumbersome for this kind of experimentation. Let's try this in Python; we can deploy an enterprise-level system in Java later.
Inception
It turns out that Inception gives a better result than VGG16. It appears the network can tell if a product WILL NOT do well, but it cannot tell if a product will do well. Notice that the precision and recall for the "False" (0) class are relatively high.
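To make the "good negative guesser" reading concrete, here is how per-class precision and recall are computed, with made-up counts (not the actual evaluation numbers) illustrating a model that is strong on the "will not sell" class and weak on the "will sell" class:

```python
def precision_recall(tp, fp, fn):
    """Per-class precision and recall from true/false positive and
    false negative counts for that class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts. Treating "will not sell" (class 0) as the class of interest:
p0, r0 = precision_recall(tp=90, fp=10, fn=5)   # strong on negatives
p1, r1 = precision_recall(tp=20, fp=30, fn=40)  # weak on positives
```

High precision and recall on class 0 with low values on class 1 is exactly the pattern of a model that reliably flags losers but cannot pick winners.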
Inception plus Other Data points
When you add other data points such as price, there is some improvement. Still, the model remains a good negative guesser: it can predict what the shopper does not like.
Only Data Points?
Oddly, even the tabular database data alone, with no images, yields acceptable results. Is this just a simple logistic regression problem? Is the shopper that predictable?
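For intuition, here is a minimal logistic-regression sketch in pure Python on a synthetic tabular feature. The price values, labels, and the "cheap items sell" rule are entirely made up; the point is only to show how little machinery the tabular-only baseline needs:

```python
import math

def train_logreg(xs, ys, lr=0.1, epochs=500):
    """Plain SGD on a one-feature logistic regression."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            w -= lr * (p - y) * x                     # gradient of log loss
            b -= lr * (p - y)
    return w, b

# Synthetic rule: cheap items (price < 50) sold, expensive ones did not.
prices = [10, 20, 30, 40, 60, 70, 80, 90]
sold   = [1, 1, 1, 1, 0, 0, 0, 0]
xs = [p / 100.0 for p in prices]  # crude feature scaling
w, b = train_logreg(xs, sold)

def predict(price):
    return 1.0 / (1.0 + math.exp(-(w * price / 100.0 + b)))
```

If a single feature separates the classes this cleanly, a logistic model recovers the boundary immediately, which is one way the tabular-only result could be "acceptable" without any image signal.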
Please read the code for the detailed exploration. I have not added my other visualizations here: I do not want to reveal the retail chain whose images and data I am using, nor do I want to make the dataset public.