Abstract
Networked data are ubiquitous in this era of social, economic and technological revolution resting on the backbone of the internet. With the spread of mobile phones, sensors, embedded devices and industrial robots, the ability to collect and generate interdependent data is at an all-time high. This volume of data is supplemented by social networks such as Facebook, Tumblr, LinkedIn, Twitter and many others, which connect people around the globe as a network and allow them to share the data they collect, generate or distribute in real time. Further, with the 'Internet of Things', appliances, vehicles and wearable devices can communicate with each other. The resulting trend is towards bigger and better ways of "collecting, creating, managing and storing of data", also known as Big Data (White House, 2014). Most often, such big data capture a rich structure of inter-relationships, resulting either in an extrinsic graphical pattern, as in social and sensor networks, or in dependencies that can be modelled as a graph, such as neighbouring pixels sharing similar intensities in real-time 3D scene images captured by autonomous vehicle cameras. Often, such data are streamed in real time from multiple sources, as in a network of sensors. The data can be sequential, as in product recommendation based on user click-through rates. The data can also be dynamic, as in evolving blog communities, or shift in pattern, as with trending tweets. In reality, even with this abundance of data, only a small percentage carries labels or annotations that can be used to model and categorize the vast quantities of unlabelled data. Naturally, the questions that arise are: how can computers be programmed to automatically learn the underlying model and predict on the fly from streaming data? How can an algorithm capture the sequential nature of events?
Further, more complex questions follow. Can the algorithms guarantee that they will be efficient regardless of the sequence of data they see, in any order? Are these methods adaptive enough to predict under dynamic real-time changes? And how can the algorithms learn the unknown labels from the few available labels for very large and sparse networked data?
- Ghosh, S., Lovell, C. J., and Gunn, S. R. (2013). Towards Pareto descent directions in sampling experts for multiple tasks in an on-line learning paradigm. In AAAI Spring Symposium, volume 13 of SS-13-05. AAAI Press.
- Ghosh, S. and Prügel-Bennett, A. (2015a). Ising bandits with side information. In ECML PKDD, volume 9284 of LNCS, pages 448--463. Springer.
- Ghosh, S. and Prügel-Bennett, A. (2015b). Online mean field approximation for automated experimentation. In ICML MLIS, volume 43, pages 31--35. JMLR.
- Herbster, M., Pasteris, S., and Ghosh, S. (2015). Online prediction at the limit of zero temperature. In NIPS, pages 2917--2925.
- Picard, J.-C. and Queyranne, M. (1980). On the structure of all minimum cuts in a network and applications. Springer.
- White House. (2014). Big data: Seizing opportunities, preserving values. Washington, DC: Executive Office of the President.
Index Terms
- Online machine learning for networked data