Cloud computing is gaining acceptance among mainstream technology users. Storage cloud providers often employ Storage Area Networks (SANs) to provide elasticity, rapid adaptability to changing demands, and policy-based automation. As storage capacity grows, the storage environment becomes more heterogeneous, more complex, harder to manage, and more expensive to operate.
This paper presents PGML (Policy Generation for large-scale storage infrastructure configuration using Machine Learning), an automated, supervised machine learning framework that generates best practices for SAN configuration and can potentially reduce configuration errors in a data center by up to 70%. A best practice, or policy, is a technique, guideline, or methodology that, through experience and research, has proven to lead reliably to a better storage configuration. Given a standards-based representation of SAN management information, PGML builds on the machine learning constructs of inductive logic programming (ILP) to create a transparent mapping of hierarchical, object-oriented management information into multi-dimensional predicate descriptions. Our initial evaluation shows that, given SAN problem reports as input, PGML can generate best practices by analyzing them. Simulation results based on extrapolated real-world problem scenarios demonstrate that ILP is an appropriate machine learning technique for this problem.
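To illustrate the kind of mapping described above, the following minimal Python sketch flattens a small hierarchical description of SAN objects into Prolog-style ground facts of the sort an ILP learner could consume. It is not PGML's implementation: the function name `to_predicates`, the dictionary layout, the `contains` relation, and the example attributes (`vendor`, `speed_gbps`) are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterator, Optional

# Keys that describe structure rather than attributes (hypothetical layout).
STRUCTURAL_KEYS = ("class", "id", "children")


def to_predicates(obj: Dict[str, Any],
                  parent_id: Optional[str] = None) -> Iterator[str]:
    """Yield Prolog-style facts for one object and all of its descendants."""
    obj_id = obj["id"]
    # Class membership becomes a unary predicate, e.g. switch(sw1).
    yield f"{obj['class']}({obj_id})."
    # The object hierarchy becomes a binary containment predicate.
    if parent_id is not None:
        yield f"contains({parent_id}, {obj_id})."
    # Every remaining attribute becomes a binary predicate on the object.
    for key, value in obj.items():
        if key not in STRUCTURAL_KEYS:
            yield f"{key}({obj_id}, {value})."
    # Recurse into child objects, passing this object as their parent.
    for child in obj.get("children", []):
        yield from to_predicates(child, obj_id)


# Hypothetical example: one fabric containing one switch with one port.
fabric = {
    "class": "fabric",
    "id": "fabric1",
    "children": [
        {
            "class": "switch",
            "id": "sw1",
            "vendor": "vendor_a",
            "children": [
                {"class": "port", "id": "p1", "speed_gbps": 8},
            ],
        },
    ],
}

facts = list(to_predicates(fabric))
for fact in facts:
    print(fact)
```

In this sketch each object contributes one class fact, one containment fact linking it to its parent, and one fact per attribute, so the hierarchy remains recoverable from the flat set of predicates.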