skip to main content
10.1145/1389095.1389100acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
research-article

Scaling ant colony optimization with hierarchical reinforcement learning partitioning

Published: 12 July 2008 Publication History

Abstract

This paper merges hierarchical reinforcement learning (HRL) with ant colony optimization (ACO) to produce a HRL ACO algorithm capable of generating solutions for large domains. This paper describes two specific implementations of the new algorithm: the first a modification to Dietterich's MAXQ-Q HRL algorithm, the second a hierarchical ant colony system algorithm. These implementations generate faster results, with little to no significant change in the quality of solutions for the tested problem domains. The application of ACO to the MAXQ-Q algorithm replaces the reinforcement learning, Q-learning, with the modified ant colony optimization method, Ant-Q. This algorithm, MAXQ-AntQ, converges to solutions not significantly different from MAXQ-Q in 88% of the time. This paper then transfers HRL techniques to the ACO domain and traveling salesman problem (TSP). To apply HRL to ACO, a hierarchy must be created for the TSP. A data clustering algorithm creates these subtasks, with an ACO algorithm to solve the individual and complete problems. This paper tests two clustering algorithms, k-means and G-means. The results demonstrate the algorithm with data clustering produces solutions 20 times faster with 5-10% decrease in solution quality due to the effects of clustering.

References

[1]
A. G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(1-2):41--77, 2003.
[2]
E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016, 1999.
[3]
T. G. Dietterich. Hierarchical reinforcement learning with the maxq value function decomposition. Journal of Artificial Intelligence, (13):227--303, Nov. 2000.
[4]
M. Dorigo, M. Birattari, and T. Stutzle. Ant colony optimization. Computational Intellligence Magazine, IEEE, 1(4):28--39, 2006.
[5]
M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53--66, April 1997.
[6]
L. M. Gambardella and M. Dorigo. Ant-q: A reinforcement learning approach to the traveling salesman problem. In International Conference on Machine Learning, pages 252--260, 1995.
[7]
G. Hamerly and C. Elkan. Learning the k in k-means. In Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), pages 281--288, 2003.
[8]
T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Technical report, Cambridge, MA, USA, 1993.
[9]
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Symposium on Mathematical Statistics and Probabilities, pages 281--297, 1967.
[10]
R. Parr and S. Russell. Reinforcement learning with hierarchies of machines, Jan. 17 1997.
[11]
D. Pelleg and A. Moore. X-means: Extending k-means with e±cient estimation of the number of clusters. ICML 2000, 2000.
[12]
G. Reinelt. Tsplib 95, 2007.
[13]
W. D. Smart. Explicit manifold representations for value-functions in reinforcement learning. In Proceedings of the Eighth International Symposium on Artificial Intelligence and Mathematics, January 2004. Paper number AI&M 25-2004.
[14]
T. Stutzle and H. Hoos. The max-min ant system and local search for combinatorial optimization problems, 1999.
[15]
R. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, 1998.
[16]
R. S. Sutton, D. Precup, and S. Singh. Between mdps and semi-mdps: a framework for temporal abstraction in reinforcement learning. Artif. Intell., 112(1-2):181--211, 1999.
[17]
C. Walshaw. A Multilevel Approach to the Travelling Salesman Problem. Tech. Rep. 00/IM/63, Comp. Math. Sci., Univ. Greenwich, London SE10 9LS, UK, August 2000.
[18]
C. Watkins. Learning From Delayed Rewards. PhD thesis, King's College, Oxford, May 1989.

Cited By

View all
  • (2015)Swarm Intelligence in Multiple and Many Objectives Optimization: A Survey and Topical Study on EEG Signal AnalysisMulti-objective Swarm Intelligence10.1007/978-3-662-46309-3_2(27-73)Online publication date: 11-Mar-2015
  • (2011)WoLF Ant2011 IEEE Congress of Evolutionary Computation (CEC)10.1109/CEC.2011.5949726(995-1002)Online publication date: Jun-2011
  • (2010)Adaptive decision making in ant colony system by reinforcement learningProceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I10.5555/1939659.1939738(609-617)Online publication date: 22-Nov-2010
  • Show More Cited By

Index Terms

  1. Scaling ant colony optimization with hierarchical reinforcement learning partitioning

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation
      July 2008
      1814 pages
      ISBN:9781605581309
      DOI:10.1145/1389095
      • Conference Chair:
      • Conor Ryan,
      • Editor:
      • Maarten Keijzer
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 July 2008

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. ant colony optimization
      2. hierarchical reinforcement learning
      3. swarm intelligence

      Qualifiers

      • Research-article

      Conference

      GECCO08
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)5
      • Downloads (Last 6 weeks)1
      Reflects downloads up to 15 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2015)Swarm Intelligence in Multiple and Many Objectives Optimization: A Survey and Topical Study on EEG Signal AnalysisMulti-objective Swarm Intelligence10.1007/978-3-662-46309-3_2(27-73)Online publication date: 11-Mar-2015
      • (2011)WoLF Ant2011 IEEE Congress of Evolutionary Computation (CEC)10.1109/CEC.2011.5949726(995-1002)Online publication date: Jun-2011
      • (2010)Adaptive decision making in ant colony system by reinforcement learningProceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I10.5555/1939659.1939738(609-617)Online publication date: 22-Nov-2010
      • (2010)Adaptive Decision Making in Ant Colony System by Reinforcement LearningNeural Information Processing. Theory and Algorithms10.1007/978-3-642-17537-4_74(609-617)Online publication date: 2010

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media