On Loss Functions for Deep Neural Networks in Classification

Katarzyna Janocha; Wojciech Marian Czarnecki

Abstract

Deep neural networks are currently among the most commonly used classifiers. Besides easily achieving very good performance, one of the best selling points of these models is their modular design: one can conveniently adapt their architecture to specific needs, change connectivity patterns, attach specialised layers, and experiment with a large number of activation functions, normalisation schemes and many other components. While one can find an impressively wide spread of configurations for almost every aspect of deep nets, one element is, in the authors’ opinion, underrepresented: when solving classification problems, the vast majority of papers and applications simply use log loss. In this paper we investigate how particular choices of loss function affect deep models and their learning dynamics, as well as the resulting classifiers’ robustness to various effects. We perform experiments on classical datasets and provide some additional theoretical insights into the problem. In particular, we show that L1 and L2 losses are, quite surprisingly, justified classification objectives for deep nets, by providing a probabilistic interpretation in terms of expected misclassification. We also introduce two losses which are not typically used as deep net objectives and show that they are viable alternatives to the existing ones.

Keywords

loss function, deep learning, classification theory.

Information

Information: Schedae Informaticae, 2016, Volume 25, pp. 49–59

DOI: https://doi.org/10.4467/20838476SI.16.004.6185

Article type: Original article

Authors

Katarzyna Janocha

Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland

Wojciech Marian Czarnecki

Department of Mathematics, Faculty of Mathematics and Computer Science, Jagiellonian University, ul. Łojasiewicza 6, 30-348 Kraków, Poland

Published at: 24.03.2017

Article status: Open

Licence: None

Percentage share of authors:

Katarzyna Janocha (Author) - 50%

Wojciech Marian Czarnecki (Author) - 50%

Publication languages:

English
