Autonomic Systems and e-Business Platforms

Overview: 

Cloud Computing will undoubtedly be the platform that dominates business applications during the next decade. A Cloud can be composed of different types of platforms (heterogeneous clusters of hybrid hardware) and organizations (private, public, hybrid). These will interact with millions of smart electronic devices and sensors that will generate a huge amount of data. Furthermore, the Cloud needs to evolve very quickly in order to meet the requirements of new heterogeneous business workloads. These include different types of jobs, such as CPU-intensive, streaming, transactional, and especially data-intensive jobs.
In this scenario new requirements arise: Performance/Latency/Scalability, Availability/Reliability/Consistency, Energy/Power, and Privacy/Security, amongst others. These requirements pose huge challenges for IT Resource Management.
For BSC, self-management is considered to be the solution to this complexity and a way to increase the adaptability of execution environments to the dynamic behaviour dictated by this new paradigm.

Objectives: 

The aim of this department is to research autonomic and intelligent resource management for today's business applications. The objective is to create new components at the middleware level that provide holistic solutions for some of the new IT challenges in the industry: Cloud Computing, Big Data, Business Analytics, High Performance Computing and Sustainable Computing.

 

Projects/Areas: 

Below is a list of some of the main interrelated areas that are currently being looked at within our department:
 

  • Workload Management: focuses on the management of heterogeneous workloads for next generation data centers. Special attention is paid to Web workloads and extended Data Analytics programming models, such as MapReduce and Pregel, allowing for the introduction of user-defined high-level performance goals associated with the analytics processes.
  • Massively Distributed Data Stores: focuses on resource management strategies for BigData applications in the Cloud and faces three different challenges: a) defining policies that enable Distributed Data Stores (e.g. NoSQL databases) to meet the high-level performance goals of BigData applications, which are a combination of consistency levels, energy consumption and response times; b) designing a non-centralized, highly scalable resource management architecture; and c) defining interfaces with in-memory databases.
  • Embedded Domain Specific Languages for High Performance Computing (eDSL for HPC):
    The hybrid and heterogeneous nature of Cloud Computing and HPC infrastructures challenges the programming models traditionally used to exploit these environments. New paradigms are required to fill the growing gap between the vast amount of computational resources available and the large number of end users with disparate requirements. Separation of concerns is the key concept behind eDSL: it enables end users to focus on what to do and expert programmers on how to do it efficiently. The adoption of this paradigm shift has the potential to greatly improve the productivity of end users and Cloud environments in an unprecedented way.
  • Exploiting Heterogeneous Hardware: Power and complexity constraints are driving the development of new hardware architectures that clearly depart from traditional ones.
    To make the most of these new hardware designs, a vertical research approach that encompasses runtime systems, OS and hardware is required. This holistic approach, which takes into account both software and hardware components, is needed to keep pace with the high scalability and efficiency demands of Cloud and HPC environments.
  • BLO-driven Management: Manage a Cloud provider in a federated scenario, considering several factors in a synergistic way depending on the Cloud provider’s business-level objectives (BLO), including the use of TREC factors (Trust, Risk, Energy, Cost). This type of management eases Cloud usage for non-IT experts and allows providers to better exploit their resources.
  • High performance architectures for Big Data: focuses on exploring systems and software strategies for leveraging in-memory key/value databases to accelerate data intensive tasks, with particular attention to the IBM BlueGene Active Storage Fabrics (ASF) architecture.
  • Online predictors: Today's business systems require strategies and algorithms that let systems adapt to new situations and scenarios. In our work, this adaptation is based on learning patterns from those systems, users and processes, by obtaining information and discovering knowledge directly from the system itself (e.g. using Machine Learning). Thanks to this learning, autonomic systems are able to improve their self-adaptation in complex environments where finding experts is difficult or impossible.
  • Service-aware VM Management: Manage services efficiently in a federated Cloud scenario, aiming for service elasticity, SLA enforcement, and service high availability by exploiting virtualization capabilities. This research is the basis of some of the previous areas, allowing resources to be managed in the most cost-effective way in our scenarios.
  • Energy-aware Management: Manage a Cloud provider aiming to maximize energy and ecological efficiency when dealing with heterogeneous workloads and heterogeneous hardware. We also consider using green energy to power datacenters in order to save money and reduce their CO2 footprint. Nevertheless, operating a datacenter with green energy is not straightforward because of the variability of green sources. To overcome this problem, we propose different policies to manage this variability (e.g. scheduling policies that run tasks when green energy is available, minimizing the usage of brown energy while still meeting task deadlines).
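
As an illustration of the green-energy scheduling policies mentioned under Energy-aware Management, the following minimal Python sketch places task work into time slots with forecast green energy before falling back to brown energy. It is not the group's actual policy: the Task fields, the per-slot forecast list named green, and the earliest-deadline-first greedy rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: int      # number of time slots of work the task needs (hypothetical unit)
    deadline: int  # all work must be placed in slots with index < deadline


def schedule(tasks, green):
    """Greedy sketch: green[t] is the number of task-slots that can run on
    green energy during slot t (an assumed forecast, not real data)."""
    remaining = list(green)          # green capacity still unused per slot
    plan = {}
    brown_used = 0
    # Tightest deadlines first, so urgent tasks get first pick of green slots.
    for task in sorted(tasks, key=lambda t: t.deadline):
        window = range(min(task.deadline, len(remaining)))
        if task.work > len(window):
            raise ValueError(f"task {task.name} cannot meet its deadline")
        # Prefer slots inside the deadline window that still have green capacity.
        green_slots = [t for t in window if remaining[t] > 0][:task.work]
        for t in green_slots:
            remaining[t] -= 1
        # Whatever work is left must run on brown energy to meet the deadline.
        missing = task.work - len(green_slots)
        brown_slots = [t for t in window if t not in green_slots][:missing]
        brown_used += len(brown_slots)
        plan[task.name] = {"green": green_slots, "brown": brown_slots}
    return plan, brown_used


if __name__ == "__main__":
    forecast = [0, 2, 1, 0]  # hypothetical green capacity per slot
    jobs = [Task("A", work=2, deadline=3), Task("B", work=2, deadline=4)]
    plan, brown = schedule(jobs, forecast)
    print(plan, "brown slots used:", brown)
```

In this sketch a task only falls back to brown energy when the green slots inside its deadline window are exhausted, which mirrors the stated goal of minimizing brown energy while still meeting deadlines.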

The group's research software distributions are:

  • EMOTIVE Cloud: Resource Management and Virtualization Framework. BSC is contributing to the research community with the EMOTIVE (Elastic Management Of Tasks In Virtualized Environments) framework, which simplifies the development of new middleware services for the Cloud. The EMOTIVE framework is an open-source software infrastructure for implementing Cloud computing solutions that provides elastic and fully customized virtual environments in which to execute Cloud services. EMOTIVE abstracts a Cloud architecture using different layers and provides users with basic primitives for supporting the execution of services (resource allocation and monitoring, data management, live migration, checkpointing, etc.). The core layer wraps each virtualized node and monitors its state, granting the application full control of its execution environment without any risk to the underlying system or to other applications. In addition, it allows both local virtual machines (i.e. running in the provider's nodes) and remote virtual machines running in third-party providers such as Amazon EC2 to be managed in a federated environment. One of the main distinguishing features of the EMOTIVE framework is its functionality that eases the development of new resource management proposals, thus contributing to innovation in this research area. At the moment there are different scheduler implementations with different capabilities using the EMOTIVE framework, such as SERA, which supports semantic descriptions, EERM, which takes into account economic parameters, or proposals that take into account power-aware parameters. All the information related to installation and deployment can be found at www.emotivecloud.net. EMOTIVE Cloud is available at https://emotivecloud.svn.sourceforge.net/svnroot/emotivecloud/. The EMOTIVE framework is developed in collaboration with the Grid Computing and Clusters research group.
     
  • CellMT: A multi-threaded library for the CellBE. CellMT is a cooperative multithreading library that naturally overlaps the computation time of one thread with the transfer time of other threads inside the same SPU. Because the model is cooperative, it relies on the threads themselves to relinquish control once they reach a context switch point. This cooperative multi-threading model is a perfect fit for any processor with a managed local store, such as the Cell processor, because the context switch points are easily identified. In fact, all applications written for the Cell have these points explicitly identified by the memory flow control (MFC) operations used to wait for DMA requests or Mailbox messages. The CellMT library provides a familiar and well understood programming model that is similar to the model used to split work across SPUs, so it does not increase the complexity of the application. Moreover, this technique is more widely applicable than double buffering techniques because it does not need to know the next DMA request to be performed in advance; hence it is especially well suited for applications with unpredictable memory accesses. This core library is described in detail in the paper "Vicenc Beltran, David Carrera, Jordi Torres and Eduard Ayguade. CellMT: A Cooperative Multithreading Library for the Cell/B.E. To appear in the proceedings of the 16th Annual IEEE International Conference on High Performance Computing (HiPC09). Cochin, India. December 2009". The source code can be downloaded from http://sourceforge.net/projects/cellmt/, and http://gsi.ac.upc.edu/reports/2009/27/tr_cellmt.pdf contains a detailed description of the library APIs as well as some additional code examples.

  • Adaptive MapReduce Scheduler: The Adaptive MapReduce Scheduler is an application-centric multi-job task scheduler for MapReduce workloads developed at BSC. It is a pluggable Hadoop scheduler that automatically adjusts the amount of resources used depending on the performance of jobs and on user-defined high-level business goals. The proposed scheduler relies on estimates of individual job completion times given a particular resource allocation, and uses these estimates to maximize each job's chances of meeting its performance goal. The main objective of the task scheduling mechanism is to enable a MapReduce runtime to dynamically allocate resources in a cluster of machines based on the observed progress rate achieved by the jobs and the completion time goal associated with each job. The scheduling technique targets a highly dynamic environment in which new jobs can be submitted at any time, and in which MapReduce workloads share physical resources with other workloads, either MapReduce or not. Thus, the actual amount of resources available for MapReduce applications can vary over time. The dynamic scheduler adjusts the resource allocation of all jobs according to estimates of the completion time given a particular resource allocation. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. The current scheduler prototype is described in the paper (J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguade, M. Steinder, I. Whalley, Performance-driven task co-scheduling for mapreduce environments, in: NOMS 10: Proceedings of the 12th IEEE/IFIP Network Operations and Management Symposium, IEEE, Osaka, Japan, 2010) and is downloadable at https://issues.apache.org/jira/browse/MAPREDUCE-1380.
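
The idea behind the Adaptive MapReduce Scheduler, estimating each job's completion time for a given allocation and sizing the allocation to the job's goal, can be sketched as follows. This is a simplified illustration and not the scheduler's real code or estimator: the Job fields, the wave-based estimate, and the tightest-goal-first allocation order are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending_tasks: int       # tasks that have not run yet
    avg_task_seconds: float  # observed average task duration (progress rate)
    seconds_to_goal: float   # time left until the job's completion goal


def estimated_completion(job, slots):
    """Assumed estimator: tasks run in waves of `slots` parallel tasks."""
    if slots <= 0:
        return math.inf
    waves = math.ceil(job.pending_tasks / slots)
    return waves * job.avg_task_seconds


def slots_needed(job, max_slots):
    """Smallest allocation whose estimate meets the goal, or None if even
    max_slots is not enough."""
    for slots in range(1, max_slots + 1):
        if estimated_completion(job, slots) <= job.seconds_to_goal:
            return slots
    return None


def allocate(jobs, total_slots):
    """Serve jobs with the tightest goals first; a job that cannot meet its
    goal receives whatever capacity is still free."""
    allocation = {}
    free = total_slots
    for job in sorted(jobs, key=lambda j: j.seconds_to_goal):
        need = slots_needed(job, total_slots)
        grant = min(free, need if need is not None else total_slots)
        allocation[job.name] = grant
        free -= grant
    return allocation


if __name__ == "__main__":
    jobs = [
        Job("A", pending_tasks=10, avg_task_seconds=30.0, seconds_to_goal=100.0),
        Job("B", pending_tasks=6, avg_task_seconds=10.0, seconds_to_goal=60.0),
    ]
    print(allocate(jobs, total_slots=8))
```

Re-running allocate whenever jobs arrive or observed task times change would approximate the dynamic adjustment described above. A job for which slots_needed returns None is the case where, in a cloud setting, more resources would have to be requested.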

 

Current projects:

  • NUBA project (Normalized Usage of Business-oriented Architectures) (2009-2012). NUBA is a strategic research program (MITyC TSI-020301-2009-30) funded by the Avanza2 R&D Plan of the Spanish Ministry of Industry, Tourism and Trade and coordinated by Telefonica I+D with 8 partners. The aim of NUBA is to advance the state of the art in business models and technology for the real-time deployment of federated Cloud platforms, integrating infrastructure from different providers, to execute elastic business services with the required QoS while minimizing energy consumption. http://nuba.morfeo-project.org.
  • EU COST IC804 action "Energy efficiency in large scale distributed systems" (2009-2012). Member of the Management Committee and WG3 of this action, which aims to foster original research initiatives addressing energy awareness and savings in distributed systems. More information about this action can be found at http://www.cost804.org.

  • IBM SOW-Active Storage Fabrics (ASF) is a collection of components that surround a parallel in-memory database (PIMD). PIMD is a parallel client, parallel server, key/value object store. This research is part of the MareIncognito research framework between IBM and BSC.

  • Barrelfish is a new research operating system being built from scratch to explore how to structure an OS for future multi- and many-core systems. The design principles of Barrelfish are motivated by two closely related trends in hardware design: first, the rapidly growing number of cores, which leads to a scalability challenge, and second, the increasing diversity in computer hardware, which requires the OS to manage and exploit heterogeneous hardware resources.

  • OPTIMIS aims at optimizing cloud services using an architectural framework and a development toolkit that take trust, risk, eco-efficiency, cost and legal issues into account. Our group contributes to the self-management of Cloud infrastructures using business information.

  • VENUS-C is focused on developing and deploying a Cloud Computing service for research and industry communities in Europe by offering an industrial-quality service-oriented platform based on virtualization technologies. Our group contributes tools that allow user scenarios to exploit the facilities of Cloud infrastructures.

Previous projects:

PEOPLE

PUBLICATIONS AND COMMUNICATIONS

2012

Goiri, Í., et al. Energy-efficient and Multifaceted Resource Management for Profit-driven Virtualized Data Centers. Future Generation Computer Systems Vol. 28 (5), 718-731 (2012).
Goiri, Í., et al. GreenHadoop: leveraging green energy in data-processing frameworks. 7th ACM European Conference on Computer Systems (EuroSys'12) 57–70 (2012). doi:http://dx.doi.org/10.1145/2168836.2168843
Vaqué, A., Goiri, Í., Guitart, J. & Torres, J. EMOTIVE Cloud: The BSC's IaaS Open Source Solution for Cloud Computing. Open Source Cloud Computing Systems: Practices and Paradigms 44-60 (2012). doi:http://dx.doi.org/10.4018/978-1-4666-0098-0.ch003
Ferrer, A.J., et al. OPTIMIS: A holistic approach to cloud service provisioning. Future Generation Computer Systems Vol. 28 (1), 66 - 77 (2012).
Carrera, D., Steinder, M., Whalley, I., Torres, J. & Ayguadé, E. Autonomic Placement of Mixed Batch and Transactional Workloads. IEEE Transactions on Parallel and Distributed Systems 23, 219-231 (2012).
Goiri, Í., Julià, F., Fitó, O., Macías, M. & Guitart, J. Supporting CPU-based Guarantees in Cloud SLAs via Resource-level QoS Metrics. Future Generation Computer Systems (2012). doi:http://dx.doi.org/10.1016/j.future.2011.11.004
Goiri, Í., Guitart, J. & Torres, J. Economic Model of a Cloud Provider Operating in a Federated Cloud. Information Systems Frontiers (2012). doi:http://dx.doi.org/10.1007/s10796-011-9325-x

2011

Polo, J., et al. Resource-aware Adaptive Scheduling for MapReduce Clusters. ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011) (2011).
Macías, M. & Guitart, J. Client Classification Policies for SLA Negotiation and Allocation in Shared Cloud Datacenters. Lecture Notes on Computer Science (LNCS), 8th International Workshop on Economics of Grids, Clouds, Systems, and Services (GECON'11) Vol. 7150, 90-104 (2011).
Alonso, J., Goiri, Í., Guitart, J., Gavalda, R. & Torres, J. Optimal Resource Allocation in a Virtualized Software Aging Platform with Software Rejuvenation. 22nd IEEE International Symposium on Software Reliability Engineering (ISSRE'11) 250-259 (2011). doi:http://dx.doi.org/10.1109/ISSRE.2011.30
