ABSTRACT
The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms.
The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from NP-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in [21] and relies on cryptographic assumptions.
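For orientation, here is a standard textbook formulation of PAC learning and of the proper/improper distinction; this is a paraphrase added for context, not the exact definitions used in the paper. A class $\mathcal{C}$ of functions $X \to \{0,1\}$ is PAC learnable if there is an algorithm that, for every target $c \in \mathcal{C}$ and every distribution $D$ over $X$, when given $\mathrm{poly}(1/\epsilon, 1/\delta)$ i.i.d. examples $(x, c(x))$ with $x \sim D$, outputs with probability at least $1-\delta$ a hypothesis $h$ satisfying
\[ \Pr_{x \sim D}\bigl[\, h(x) \ne c(x) \,\bigr] \le \epsilon. \]
Learning is proper if $h$ must itself belong to $\mathcal{C}$, and improper (representation independent) if $h$ may be any efficiently evaluable function. Hardness of improper learning is therefore the stronger claim: it rules out all efficient learners, regardless of how they represent their output.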
We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption [13] about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular,
• Learning DNFs is hard.
• Agnostically learning halfspaces with a constant approximation ratio is hard.
• Learning an intersection of ω(1) halfspaces is hard.
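To give a sense of the underlying average-case assumption, the following is a rough paraphrase of Feige's R3SAT hypothesis [13]; the paper relies on a (fairly strong) generalization of it, and the exact statement there differs. For every fixed $\epsilon > 0$, and even when $\Delta$ is an arbitrarily large constant, no polynomial-time algorithm can, given a 3CNF formula with $n$ variables and $m = \Delta n$ clauses, output ``typical'' on most random formulas (clauses drawn uniformly and independently) while never outputting ``typical'' on a formula in which some assignment satisfies at least $(1-\epsilon)m$ clauses. Informally: a random 3CNF formula at large constant density is almost surely far from satisfiable, yet no efficient algorithm is conjectured to be able to certify this.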
REFERENCES
[1] M. Alekhnovich. More on average case vs approximation complexity. In FOCS, pages 298--307, 2003.
[2] B. Applebaum, B. Barak, and D. Xiao. On basing lower-bounds for learning on worst-case assumptions. In FOCS, pages 211--220, 2008.
[3] S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. In FOCS, pages 724--733, 1993.
[4] B. Barak, G. Kindler, and D. Steurer. On the optimality of semidefinite relaxations for average-case and generalized constraint satisfaction. In ITCS, pages 197--214, 2013.
[5] P. Beame, R. Karp, T. Pitassi, and M. Saks. On the complexity of unsatisfiability proofs for random k-CNF formulas. In STOC, pages 561--571, 1998.
[6] P. Beame and T. Pitassi. Simplified and improved resolution lower bounds. In FOCS, pages 274--282, 1996.
[7] E. Ben-Sasson and A. Wigderson. Short proofs are narrow -- resolution made simple. In STOC, pages 517--526, 1999.
[8] Q. Berthet and P. Rigollet. Computational lower bounds for sparse PCA. In COLT, 2013.
[9] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[10] A. Blum, A. Kalai, and H. Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50(4):506--519, 2003.
[11] A. Coja-Oghlan, C. Cooper, and A. Frieze. An efficient sparse regularity concept. SIAM Journal on Discrete Mathematics, 23(4):2000--2034, 2010.
[12] A. Daniely, N. Linial, and S. Shalev-Shwartz. More data speeds up training time in learning halfspaces over sparse vectors. In NIPS, 2013.
[13] U. Feige. Relations between average case complexity and approximation complexity. In STOC, pages 534--543, 2002.
[14] U. Feige and E. Ofek. Easily refutable subformulas of large random 3CNF formulas. In ICALP, pages 519--530, 2004.
[15] V. Feldman, P. Gopalan, S. Khot, and A. Ponnuswami. New results for learning noisy parities and halfspaces. In FOCS, 2006.
[16] A. Haken. The intractability of resolution. Theoretical Computer Science, 39:297--308, 1985.
[17] J. Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798--859, 2001.
[18] S. Huang. Approximation resistance on satisfiable instances for predicates strictly dominating parity, 2012.
[19] S. Huang. Approximation resistance on satisfiable instances for predicates with few accepting inputs. In STOC, 2013.
[20] A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777--1805, 2008.
[21] M. Kearns and L. G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. In STOC, pages 433--444, 1989.
[22] M. Kharitonov. Cryptographic hardness of distribution-specific learning. In STOC, pages 372--381, 1993.
[23] S. Khot. On the power of unique 2-prover 1-round games. In STOC, pages 767--775, 2002.
[24] S. Khot and R. Saket. On the hardness of learning intersections of two halfspaces. Journal of Computer and System Sciences, 77(1):129--141, 2011.
[25] A. R. Klivans and R. O'Donnell. Learning intersections and thresholds of halfspaces. In FOCS, pages 177--186, 2002.
[26] A. R. Klivans and R. Servedio. Learning DNF in time 2^{Õ(n^{1/3})}. In STOC, pages 258--265, 2001.
[27] A. R. Klivans and A. A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. In FOCS, 2006.
[28] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. In FOCS, pages 574--579, 1989.
[29] Y. Mansour. An O(n^{log log n}) learning algorithm for DNF under the uniform distribution. Journal of Computer and System Sciences, 50(3):543--550, 1995.
[30] L. Pitt and L. Valiant. Computational limitations on learning from examples. Journal of the ACM, 35(4):965--984, 1988.
[31] P. Raghavendra. Optimal algorithms and inapproximability results for every CSP? In STOC, pages 245--254, 2008.
[32] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386--407, 1958. Reprinted in Neurocomputing (MIT Press, 1988).
[33] R. Schapire. The strength of weak learnability. In FOCS, pages 28--33, 1989.
[34] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134--1142, 1984.
[35] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.