Data

From CEDPS

Goals of the CEDPS Data Services Project

  • Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
    • Data placement and distribution services that implement different data distribution and placement behaviors
    • Managed Object Placement Service—enhancement to today’s GridFTP—that allows for management of
      • Space
      • Bandwidth
      • Connections
      • Other resources needed at endpoints of data transfers
    • Services to move computation to data

Data Placement

Documents and Presentations

Higher Level Data Placement Services

  • Decide where to place objects and replicas in the distributed Grid environment
  • Policy-driven, based on needs of application and the Virtual Organization
  • Effectively creates a placement workflow that is then passed to the Reliable Distribution Service Layer for execution
  • Currently designing the first-generation data placement service
  • Seeking application input on the type of placement services they need
  • Options for Higher-Level Data Placement Services
    • Simplest: push- or pull-based service that places an explicit list of data items
      • Similar to the existing Globus Data Replication Service (DRS)
    • Metadata- or subscription-based placement
      • Decide where data objects are placed based on results of metadata queries for data with certain attributes or subscriptions
      • Goal is to place data where it is likely to be accessed by scientists and/or used in performing computations
      • For example, to reduce data movement by a workflow engine
      • Examples: LIGO replication via LDR, PhEDEx
    • N-Copies: maintain N copies of data items
      • Placement service checks existing replicas and creates or deletes replicas to maintain N copies of each
      • Keeps track of the lifetime of allocated storage space and migrates data as necessary
      • Goal is to maintain a required level of availability and durability for data
      • Example: UK QCDGrid
    • Some combination of these policies and others
  • Seeking applications that want help with data placement and management
    • Provide requirements to drive placement service design
    • Testing of placement services in application environments
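
The N-Copies option above can be illustrated with a small reconciliation sketch. This is only a sketch under stated assumptions, not the CEDPS design: the catalog layout (item name mapped to the set of sites holding a replica), the ordered site list, and the names `ReplicaPlan` and `plan_n_copies` are all hypothetical. It shows how a placement service might compare existing replicas against a target count and emit create or delete actions for a lower layer to execute.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaPlan:
    """Actions needed to bring one data item to the target replica count."""
    item: str
    create_at: list = field(default_factory=list)
    delete_from: list = field(default_factory=list)


def plan_n_copies(catalog, sites, n):
    """Return one ReplicaPlan per item whose replica count differs from n.

    catalog: dict mapping item name -> set of sites holding a replica
             (assumed to contain only sites listed in `sites`)
    sites:   candidate sites in preference order (earlier = preferred)
    """
    plans = []
    for item, holders in sorted(catalog.items()):
        # Current replicas, ordered by site preference.
        current = [s for s in sites if s in holders]
        if len(current) < n:
            # Too few copies: create replicas at the most-preferred free sites.
            candidates = [s for s in sites if s not in holders]
            missing = n - len(current)
            plans.append(ReplicaPlan(item, create_at=candidates[:missing]))
        elif len(current) > n:
            # Too many copies: drop replicas at the least-preferred sites.
            plans.append(ReplicaPlan(item, delete_from=current[n:]))
    return plans


if __name__ == "__main__":
    catalog = {"run1.dat": {"siteA"}, "run2.dat": {"siteA", "siteB", "siteC"}}
    for plan in plan_n_copies(catalog, ["siteA", "siteB", "siteC"], n=2):
        print(plan)
```

A real service would also need to track storage lifetimes and migrate data before allocations expire, which this sketch leaves out.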

Notes from July 20th F2F Meeting at Fermi

Rob presents the service design document.

Question: should this be a stage-in or stage-out service, or just a staging service?

Reliable Distribution Layer

• Responsible for carrying out the distribution or placement “plan” generated by higher-level services

• Examples:
  • Globus Reliable File Transfer Service
  • U Wisconsin Stork
  • LBNL Data Mover Light
  • EGEE FTS

• Provide feedback to higher level placement services on the outcome of the placement workflow

• Call on lower-level services to coordinate
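
As a rough illustration of the responsibilities listed above, the following sketch runs a placement plan step by step, retries failed transfers, and returns an outcome report that a higher-level placement service could consume. It is an assumption-laden sketch: `execute_plan`, the `(source, destination)` plan format, and the `transfer` callable are hypothetical and do not correspond to the API of RFT, Stork, or any other service named here.

```python
def execute_plan(plan, transfer, max_attempts=3):
    """Run each (source, destination) step of a plan, retrying on failure.

    plan:     list of (source, destination) pairs
    transfer: callable(source, destination) that raises an exception on failure
              (a stand-in for a lower-level transfer service)
    Returns a list of per-step outcome dicts as feedback for the caller.
    """
    report = []
    for source, destination in plan:
        attempts = 0
        error = None
        while attempts < max_attempts:
            attempts += 1
            try:
                transfer(source, destination)
                error = None
                break
            except Exception as exc:
                # Remember the latest failure and try again if attempts remain.
                error = str(exc)
        report.append({
            "source": source,
            "destination": destination,
            "attempts": attempts,
            "ok": error is None,
            "error": error,
        })
    return report


if __name__ == "__main__":
    def always_ok(source, destination):
        pass  # a real implementation would move the data here

    for outcome in execute_plan([("siteA/f1", "siteB/f1")], always_ok):
        print(outcome)
```

Returning a structured report rather than raising on the first failure lets the higher-level service decide whether to replan, for example by choosing a different destination site.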

The Managed Object Placement Service

Managed Object Placement Service (MOPS):


Basic MOPS Architecture Diagram


Building blocks:

• GridFTP server (Globus): add resource management
  • Dan Fraser, John Bresnahan, Mike Link (ANL)

• NeST storage appliance (U Wisconsin): provides storage and connection management
  • Nick LeRoy, Miron Livny (U Wisc)

• dCache storage management (Fermi): improve scalability and fault tolerance; jointly develop interfaces and interaction with GridFTP
  • Gene Oleynik, Don Petravick, Ruth Pordes (Fermi)


MOPS Features:

• Better internal resource management
  • To overcome issues with GridFTP servers overwhelming resources

• Management interface
  • System admins will prescribe resource limits for the GridFTP service (maximum CPU, memory usage, connections, bandwidth)
  • MOPS will report the current state of its resources to administrative services and troubleshooting

• MOPS 0.9 release:
  • Provide automatic storage and network management
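
The management interface described above (administrator-prescribed limits plus status reporting) might look roughly like the following admission-control sketch. Everything here is an assumption made for illustration: the `ResourceLimiter` class, its method names, and the choice of connections and bandwidth as the two limited resources are hypothetical and are not part of GridFTP or MOPS.

```python
class ResourceLimiter:
    """Admit transfers only while admin-prescribed limits are respected."""

    def __init__(self, max_connections, max_bandwidth_mbps):
        self.max_connections = max_connections
        self.max_bandwidth_mbps = max_bandwidth_mbps
        self.active = {}  # transfer id -> reserved bandwidth in Mbit/s

    def used_bandwidth(self):
        return sum(self.active.values())

    def admit(self, transfer_id, bandwidth_mbps):
        """Reserve resources for a transfer; return False if a limit would be exceeded."""
        if len(self.active) >= self.max_connections:
            return False
        if self.used_bandwidth() + bandwidth_mbps > self.max_bandwidth_mbps:
            return False
        self.active[transfer_id] = bandwidth_mbps
        return True

    def release(self, transfer_id):
        """Free the resources held by a finished or aborted transfer."""
        self.active.pop(transfer_id, None)

    def status(self):
        """Current resource state, as might be reported to administrative services."""
        return {
            "connections": len(self.active),
            "max_connections": self.max_connections,
            "bandwidth_mbps": self.used_bandwidth(),
            "max_bandwidth_mbps": self.max_bandwidth_mbps,
        }


if __name__ == "__main__":
    limiter = ResourceLimiter(max_connections=2, max_bandwidth_mbps=100)
    print(limiter.admit("t1", 60), limiter.admit("t2", 60))
    print(limiter.status())
```

Rejecting a request outright is the simplest policy; a server could instead queue the request until resources are released.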

Data Technology Discussions

iRODS Data Discussion
