Data
Goals of the CEDPS Data Services Project
- Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
- Data placement and distribution services that implement different data distribution and placement behaviors
- Managed Object Placement Service (MOPS), an enhancement to today’s GridFTP that allows management of:
- Space
- Bandwidth
- Connections
- Other resources needed at endpoints of data transfers
- Services to move computation to data
Data Placement
Documents and Presentations
- Data Placement Service Design draft (PDF) by Robert Schuler, ISI
- Data Placement for Scientific Applications in Distributed Environments, Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi, to appear in Proceedings of 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), Austin, TX, 2007.
Higher Level Data Placement Services
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on needs of application and the Virtual Organization
- Effectively creates a placement workflow that is then passed to the Reliable Distribution Layer for execution
- Currently designing the first-generation data placement service
- Seeking application input on the type of placement services they need
- Options for Higher-Level Data Placement Services
- Simplest: a push- or pull-based service that places an explicit list of data items
- Similar to existing Globus Data Replication Service (DRS)
- Metadata- or subscription-based placement
- Decide where data objects are placed based on the results of metadata queries for data with certain attributes, or on subscriptions
- Goal is to place data where it is likely to be accessed by scientists and/or used in performing computations
- For example, to reduce data movement by a workflow engine
- Examples: LIGO replication via LDR, PhEDEx
- N-Copies: maintain N copies of data items
- Placement service checks existing replicas and creates or deletes replicas to maintain N copies of each item
- Tracks the lifetime of allocated storage space and migrates data as necessary
- Goal is to maintain a required level of availability and durability for data
- Example: UK QCDGrid
- Some combination of these policies and others
- Seeking applications that want help with data placement and management
- Provide requirements to drive placement service design
- Testing of placement services in application environments
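The N-Copies policy above amounts to a reconciliation loop: compare what exists against the target and emit corrective actions. The following is a minimal Python sketch of that idea, not part of any CEDPS design. It assumes a hypothetical replica catalog given as a dict from item name to the set of sites holding a copy, plus a hypothetical site-preference list; `plan_n_copies` and `PlacementPlan` are illustrative names, and the file and site names are made up. Storage-lifetime tracking and migration are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlacementPlan:
    """Actions needed to bring every data item back to exactly N replicas."""
    create: list[tuple[str, str]] = field(default_factory=list)  # (item, target site)
    delete: list[tuple[str, str]] = field(default_factory=list)  # (item, site holding it)


def plan_n_copies(replicas: dict[str, set[str]], sites: list[str], n: int) -> PlacementPlan:
    """Compare existing replicas with the target count n and emit create/delete actions.

    replicas: hypothetical catalog view, item name -> set of sites holding a copy.
    sites: candidate sites in order of preference (assumed; a real policy might
           rank by free space or remaining storage lifetime instead).
    """
    def rank(site: str) -> tuple[int, str]:
        # Unknown sites rank last; ties are broken by name for a stable order.
        return (sites.index(site) if site in sites else len(sites), site)

    plan = PlacementPlan()
    for item in sorted(replicas):
        held = replicas[item]
        if len(held) < n:
            # Too few copies: use the most-preferred sites that lack the item.
            # If there are not enough such sites, create as many as possible.
            missing = n - len(held)
            for site in [s for s in sites if s not in held][:missing]:
                plan.create.append((item, site))
        elif len(held) > n:
            # Too many copies: keep the n most-preferred, delete the rest.
            for site in sorted(held, key=rank)[n:]:
                plan.delete.append((item, site))
    return plan


if __name__ == "__main__":
    catalog = {
        "run42.dat": {"siteA"},                    # under-replicated
        "run43.dat": {"siteA", "siteB", "siteC"},  # over-replicated
    }
    plan = plan_n_copies(catalog, ["siteA", "siteB", "siteC"], n=2)
    print(plan.create)  # [('run42.dat', 'siteB')]
    print(plan.delete)  # [('run43.dat', 'siteC')]
```

The function only produces a plan; it does not move any data. That matches the layering described above, where the higher-level service decides and a separate layer executes.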
Notes from July 20th F2F Meeting at Fermi
Rob presented the service design document. Question: should this be a stage-in service, a stage-out service, or simply a general staging service?
Reliable Distribution Layer
• Responsible for carrying out the distribution or placement “plan” generated by higher-level services
• Examples:
• Globus Reliable File Transfer Service
• U Wisconsin Stork
• LBNL DataMover-Lite
• EGEE FTS
• Provide feedback to higher level placement services on the outcome of the placement workflow
• Call on lower-level services to coordinate the resources needed at transfer endpoints
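As a rough illustration of this layer's two duties (carry out the placement plan, then report the outcome back to higher-level services), here is a minimal Python sketch. It is not modeled on the actual interfaces of any of the services listed above: `Transfer`, `Outcome`, `execute_plan`, and the pluggable `do_transfer` hook are hypothetical names, the plan is assumed to be a flat list of independent transfers rather than a general workflow with dependencies, and the gsiftp:// URLs are placeholders.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transfer:
    source: str  # e.g. a gsiftp:// URL; the format is not interpreted here
    dest: str


@dataclass
class Outcome:
    transfer: Transfer
    succeeded: bool
    attempts: int
    error: str | None = None


def execute_plan(
    plan: list[Transfer],
    do_transfer: Callable[[Transfer], None],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> list[Outcome]:
    """Run each transfer, retrying failures, and return per-transfer outcomes.

    do_transfer is a hypothetical hook that performs one copy and raises on
    failure; in practice it would delegate to a lower-level transfer service.
    """
    report: list[Outcome] = []
    for t in plan:
        last_error: str | None = None
        for attempt in range(1, max_attempts + 1):
            try:
                do_transfer(t)
                report.append(Outcome(t, True, attempt))
                break
            except Exception as exc:  # record the failure, back off, retry
                last_error = str(exc)
                if attempt < max_attempts:
                    time.sleep(backoff_s * attempt)
        else:  # loop finished without a successful attempt
            report.append(Outcome(t, False, max_attempts, last_error))
    return report


if __name__ == "__main__":
    calls: dict[str, int] = {}

    def flaky(t: Transfer) -> None:
        # Simulated transport: fails on the first try for each destination.
        calls[t.dest] = calls.get(t.dest, 0) + 1
        if calls[t.dest] == 1:
            raise IOError("connection reset")

    plan = [Transfer("gsiftp://siteA.example.org/f1", "gsiftp://siteB.example.org/f1")]
    for o in execute_plan(plan, flaky):
        print(o.transfer.dest, o.succeeded, o.attempts)  # gsiftp://siteB.example.org/f1 True 2
```

The returned list of `Outcome` records is the feedback a higher-level placement service would consume, for example to re-plan items whose transfers failed after all retries.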
The Managed Object Placement Service
Managed Object Placement Service (MOPS):
[Figure: Basic MOPS architecture diagram]
Building blocks:
• GridFTP server (Globus): add resource management
• Dan Fraser, John Bresnahan, Mike Link (ANL)
• NeST storage appliance (U Wisconsin): provides storage and connection management
• Nick LeRoy, Miron Livny (U Wisc)
• dCache storage management (Fermi): improve scalability and fault tolerance; jointly develop interfaces and interaction with GridFTP
• Gene Oleynik, Don Petravick, Ruth Pordes (Fermi)
MOPS Features:
• Better internal resource management
• To keep GridFTP servers from overwhelming the resources of the hosts they run on
• Management interface
• System admins will prescribe resource limits for GridFTP service (maximum CPU, memory usage, connections, bandwidth)
• MOPS will report the current state of its resources to administrative services, to support troubleshooting
• MOPS 0.9 release:
• Provide automatic storage & network management
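To make the admin-prescribed limits concrete, the sketch below shows one way a server could enforce a connection cap and an aggregate bandwidth cap and expose its current state for reporting. It is a hypothetical Python illustration, not MOPS or GridFTP code: `ResourceLimits`, `ConnectionManager`, and their fields are invented names, bandwidth is handled as an up-front reservation rather than measured throughput, and CPU and memory limits are omitted.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLimits:
    """Admin-prescribed limits (hypothetical fields, not actual GridFTP options)."""
    max_connections: int
    max_bandwidth_mbps: float  # aggregate cap shared by all active transfers


class ConnectionManager:
    """Admits transfers only while they fit within the configured limits."""

    def __init__(self, limits: ResourceLimits) -> None:
        self.limits = limits
        self._lock = threading.Lock()
        self._active: dict[str, float] = {}  # connection id -> reserved Mb/s

    def try_admit(self, conn_id: str, requested_mbps: float) -> bool:
        with self._lock:
            if conn_id in self._active:
                return False  # already admitted; refuse a duplicate reservation
            if len(self._active) >= self.limits.max_connections:
                return False  # connection limit reached: refuse rather than overload
            if sum(self._active.values()) + requested_mbps > self.limits.max_bandwidth_mbps:
                return False  # would exceed the aggregate bandwidth cap
            self._active[conn_id] = requested_mbps
            return True

    def release(self, conn_id: str) -> None:
        with self._lock:
            self._active.pop(conn_id, None)

    def status(self) -> dict[str, float]:
        """Current resource state, e.g. for reporting to an administrative service."""
        with self._lock:
            used = sum(self._active.values())
            return {
                "connections": len(self._active),
                "max_connections": self.limits.max_connections,
                "bandwidth_reserved_mbps": used,
                "bandwidth_free_mbps": self.limits.max_bandwidth_mbps - used,
            }


if __name__ == "__main__":
    mgr = ConnectionManager(ResourceLimits(max_connections=2, max_bandwidth_mbps=1000))
    print(mgr.try_admit("c1", 600))  # True
    print(mgr.try_admit("c2", 600))  # False: 1200 Mb/s would exceed the cap
    print(mgr.try_admit("c2", 300))  # True
    print(mgr.try_admit("c3", 50))   # False: connection limit reached
    mgr.release("c1")
    print(mgr.status())
```

Refusing new work at admission time, rather than throttling after the fact, is one simple way to keep a server from overwhelming its host; a real implementation might instead queue requests until resources free up.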
