0 Replies - 2929 Views - Last Post: 03 August 2012 - 08:29 AM

#1 modi123_1
CERN's data center needs and grid computing..

Posted 03 August 2012 - 08:29 AM

An interesting set of articles on CERN's data storage needs. I mean, that's a ton of data.. crazy big.

Does anyone have any experience with this 'grid computing'? I've heard it's a crazy, pain-in-the-ass batch system.. I could be wrong, though.

Their usage scenarios:

Quote

If you were to digitize all the information from a collision in a detector, it’s about a petabyte a second or a million gigabytes per second.


There is a lot of filtering of the data that occurs within the 25 nanoseconds between each bunch crossing (of protons). Each experiment operates their own trigger farm – each consisting of several thousand machines – that conduct real-time electronics within the LHC. These trigger farms decide, for example, was this set of collisions interesting? Do I keep this data or not?

The non-interesting event data is discarded, the interesting events go through a second filter or trigger farm of a few thousand more computers, also on-site at the experiment. [These computers] have a bit more time to do some initial reconstruction – looking at the data to decide if it’s interesting.
cite 1
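To make the two-stage trigger idea concrete, here's a toy sketch of that keep-or-discard pipeline. To be clear, this is nothing like CERN's actual trigger code — the event fields, thresholds, and function names are all invented for illustration:

```python
import random

random.seed(42)

def level1_trigger(event):
    """Fast first-stage cut: keep only high-energy events.
    The 50.0 threshold is an invented illustration value."""
    return event["energy"] > 50.0

def level2_trigger(event):
    """Slower second-stage filter with 'a bit more time to do some
    initial reconstruction': here, an invented track-count cut."""
    return event["n_tracks"] >= 3

def filter_events(events):
    """Discard non-interesting events in two stages, like the
    trigger farms the quote describes."""
    kept = []
    for event in events:
        if not level1_trigger(event):
            continue  # non-interesting: discarded immediately
        if level2_trigger(event):
            kept.append(event)  # survives both trigger stages
    return kept

# Simulated bunch crossings with random energy and track counts.
events = [{"energy": random.uniform(0, 100),
           "n_tracks": random.randint(0, 10)}
          for _ in range(1000)]

interesting = filter_events(events)
print(f"kept {len(interesting)} of {len(events)} events")
```

The point being: the overwhelming majority of events never survive the first cut, which is how you get from a petabyte a second down to something storable.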


center setup:

Quote

The filtered raw data comes into CERN’s Tier Zero data centre, but it is also sent out in almost real-time to eleven other large ‘Tier One’ data centres. These large data centres provide storage and long-term curation (back ups) of the data.

Other data products flow down to Tier Two centres. The Tier Twos provide CPU and disk for analysis and simulation. There are around 150 Tier Two data centres on the grid, made up of other labs, institutes and universities all over the globe.
cite 1
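Roughly, that tiered fan-out might be modeled like this — a toy sketch with invented site names and dataset names, not the real WLCG topology:

```python
# Toy model of the tiered distribution: raw data lands at Tier 0
# and is copied to every Tier 1; derived products flow down to
# Tier 2s. Site names are invented placeholders.

tier1_sites = [f"tier1-{i}" for i in range(11)]   # "eleven other large 'Tier One' data centres"
tier2_sites = [f"tier2-{i}" for i in range(150)]  # "around 150 Tier Two data centres"

storage = {site: [] for site in ["tier0"] + tier1_sites + tier2_sites}

def ingest_raw(dataset):
    """Filtered raw data enters Tier 0 and is replicated to all
    Tier 1s for storage and long-term curation (back ups)."""
    storage["tier0"].append(dataset)
    for site in tier1_sites:
        storage[site].append(dataset)

def distribute_derived(product, sites):
    """Other data products flow down to selected Tier 2s, which
    provide CPU and disk for analysis and simulation."""
    for site in sites:
        storage[site].append(product)

ingest_raw("run-001.raw")
distribute_derived("run-001.derived", tier2_sites[:10])
```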



What's this grid thing all about?

Quote

So what we have is the concept of a grid, where we tie together all the computing resources of collaborating institutes so that it essentially looks like one pool of infrastructure.

There is also always at least a second or third copy of the raw data distributed on the grid.

We have a distributed back-up system. You don’t have just one copy of the ATLAS data at the CERN site, that data is distributed at various sites that support the experiment. There is always one copy at CERN and a number of other copies elsewhere on the grid.

....


If I’m a physicist and I have some processing I want to do on some data, I submit that job to a physical machine that knows about the whole infrastructure. It knows at what site that data has been stored or replicated to on the grid, and it sends my job to one of those sites. It will be executed there against the data, and the results sent back to me. So jobs are flying all over the place.

What makes it a grid is that the user doesn’t see which server or storage system is being used, but rather submits a job and the system takes care of the execution of the job somewhere on a worldwide basis.
cite 1
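The job-brokering idea in that last bit — submit a job, a broker that knows the replica catalogue routes it to a site holding the data, results come back — could be sketched like so. All the names (catalogue, sites, datasets) are made up; this is just the shape of the idea, not actual grid middleware:

```python
# Minimal sketch of grid job brokering: the user never picks a
# server; a broker that knows where data is replicated sends the
# job to one of those sites and returns the result.

# Replica catalogue: which sites hold which dataset (invented).
replica_catalogue = {
    "atlas-run-42": ["cern", "tier1-de", "tier1-us"],
}

def fetch_local(site, dataset):
    """Stand-in for reading the locally stored replica at a site."""
    return list(range(10))

def execute_at(site, dataset, work):
    """Run the job 'against the data' at the chosen site and send
    the results back to the submitter."""
    data = fetch_local(site, dataset)
    return {"site": site, "result": work(data)}

def submit_job(dataset, work):
    """User-facing entry point: look up replicas, pick a site,
    execute there."""
    sites = replica_catalogue.get(dataset)
    if not sites:
        raise LookupError(f"no replicas of {dataset} on the grid")
    site = sites[0]  # a real broker would weigh load, queue depth, etc.
    return execute_at(site, dataset, work)

outcome = submit_job("atlas-run-42", work=sum)
print(outcome)  # {'site': 'cern', 'result': 45}
```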


There's also an interesting discussion about why they're not using cloud computing, and how most of this grid software is under the Apache Software License v2.0 (ASLv2).


Even more information on the new Budapest data center.
