On Loss Functions for Deep Neural Networks in Classification
Publication date: 24.03.2017
Schedae Informaticae, 2016, Volume 25, pp. 49-59
https://doi.org/10.4467/20838476SI.16.004.6185
Deep neural networks are currently among the most commonly used classifiers. Besides easily achieving very good performance, one of the best selling points of these models is their modular design: one can conveniently adapt their architecture to specific needs, change connectivity patterns, attach specialised layers, and experiment with a large number of activation functions, normalisation schemes and many other components. While one can find an impressively wide range of configurations for almost every aspect of deep nets, one element is, in the authors' opinion, underrepresented: when solving classification problems, the vast majority of papers and applications simply use log loss. In this paper we investigate how particular choices of loss function affect deep models and their learning dynamics, as well as the resulting classifiers' robustness to various effects. We perform experiments on classical datasets and provide some additional theoretical insights into the problem. In particular, we show that L1 and L2 losses are, quite surprisingly, justified classification objectives for deep nets, by providing a probabilistic interpretation in terms of expected misclassification. We also introduce two losses which are not typically used as deep net objectives and show that they are viable alternatives to the existing ones.
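As a rough illustration of the losses the abstract compares, the sketch below computes log loss, L1 loss and L2 loss for a single example. It assumes a softmax output layer and a one-hot target; the function names and the example logits are hypothetical and are not taken from the paper. It also checks one simple identity that fits the expected-misclassification reading: under these assumptions, the L1 loss equals twice the probability that a label sampled from the predicted distribution is wrong. This is an illustrative identity, not a reproduction of the paper's derivation.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    """One-hot encoding of the true class index."""
    return [1.0 if j == label else 0.0 for j in range(num_classes)]


def log_loss(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def l1_loss(probs, label):
    """Sum of absolute differences between prediction and one-hot target."""
    target = one_hot(label, len(probs))
    return sum(abs(p - t) for p, t in zip(probs, target))


def l2_loss(probs, label):
    """Sum of squared differences between prediction and one-hot target."""
    target = one_hot(label, len(probs))
    return sum((p - t) ** 2 for p, t in zip(probs, target))


# Hypothetical example: three classes, true class is index 0.
logits = [2.0, 0.5, -1.0]
label = 0
probs = softmax(logits)

# Probability that a label sampled from `probs` differs from the true one.
p_miss = 1.0 - probs[label]

print("log loss:", log_loss(probs, label))
print("L1 loss: ", l1_loss(probs, label))
print("L2 loss: ", l2_loss(probs, label))
print("2 * P(misclassification):", 2.0 * p_miss)
```

Because the predicted probabilities sum to one, the L1 term for the true class is `1 - probs[label]` and the remaining terms add up to the same amount, which is why the last two printed L1-related values agree.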
Article type: Original scientific article
Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland
Department of Mathematics Faculty of Mathematics and Computer Science Jagiellonian University, ul. Łojasiewicza 6, 30-348 Kraków, Poland
Article status: Open
Licence: None
Publication languages: English