RAID report
Post: #1

I'm Chandan from APSCE. I'm presenting a seminar on RAID technology.
Can anyone please help me out with the report and slides?
my email id is veeruuster[at] and rchandan27[at]
Post: #2

RAID - Redundant Array of Inexpensive Disks for Data-Intensive Scalable Computing

Presented By:
Kanwar Rajinder Pal Singh
B.B.S.B.E.C [ F.G.S]
Punjab Technical University
Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. Alternatively, high-performance computing, which has comparable scale, and smaller-scale enterprise storage systems get similar tolerance for multiple failures from lower-overhead erasure coding, or RAID, organizations. DiskReduce is a modification of the Hadoop Distributed File System (HDFS) enabling asynchronous compression of initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.
RAID, an acronym for Redundant Arrays of Inexpensive Disks, is a way to virtualize multiple, independent hard disk drives into one or more arrays to improve performance, capacity and reliability (availability). The total array capacity depends on the type of RAID array you build and the number and size of disk drives, and is independent of whether you use software or hardware RAID. The following sections look at the different implementations, their strengths and weaknesses, and their impact on system performance and effectiveness in enhancing data availability.
The Google File System (GFS) [11] and the Hadoop Distributed File System (HDFS) [5] are representative data-intensive file systems. They provide reliable storage and access to large-scale data by parallel applications, typically through the Map/Reduce programming framework [10]. To tolerate frequent failures, each data block is triplicated and therefore capable of recovering from two simultaneous node failures. Though simple, a triplication policy comes with a high overhead cost in terms of disk space: 200%. The goal of this work is to reduce the storage overhead significantly while retaining double-node-failure tolerance and the performance advantage of multiple copies. We present DiskReduce, an application of RAID in HDFS to save storage capacity. In this paper, we will elaborate and investigate the following key ideas:
A framework is proposed and prototyped for HDFS to accommodate different double-failure-tolerant encoding schemes, including a simple "RAID 5 and mirror" encoding combination and a "RAID 6" encoding. The framework is extensible by replacing the encoding/decoding module with other double-failure-tolerant codes, and could easily be extended to higher failure tolerance.
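The "RAID 5 and mirror" idea can be sketched in a few lines of Python. This is an illustrative toy, not the DiskReduce code, and all function names are invented: one mirror copy of every block is kept, plus a single XOR parity block over a RAID set of N blocks, which is only consulted when both copies of a block are lost.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks into a single parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def encode_raid5_mirror(raid_set):
    """Replace triplicated blocks with: primary copy + mirror + one shared parity."""
    parity = xor_blocks(raid_set)
    # After encoding, each block keeps two copies; the third is dropped and the
    # single parity block protects the whole set, giving 1 + 1/N overhead.
    return {"primary": list(raid_set), "mirror": list(raid_set), "parity": parity}

def recover_block(surviving_blocks, parity):
    """Rebuild one lost block when both of its copies are gone."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Swapping the XOR routine for a RAID 6 code (two parity blocks) is the kind of module replacement the framework description above has in mind.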
Asynchronous and delayed encoding, based on a trace of the Yahoo! M45 cluster [2], enables most applications to attain the performance benefit of multiple copies with minimal storage overhead. With M45 usage as an example, delaying encoding by as little as an hour can allow almost all accesses to choose among three copies of the blocks being read.
A simple way to describe software RAID is that the RAID task runs on the CPU of your computer system. Some software RAID implementations include a piece of hardware, which might make the implementation seem like a hardware RAID implementation at first glance. Therefore, it is important to understand that software RAID code utilizes the CPU's calculating power. The code that provides the RAID features runs on the system CPU, sharing computing power with the operating system and all the associated applications.
Software RAID Implementations
Software RAID can be implemented in a variety of ways: 1) as a pure software solution, or 2) as a hybrid solution that includes some hardware designed to increase performance and reduce system CPU overhead.
1) Pure Software Model - Operating System Software RAID
In this case, the RAID implementation is an application running on the host without any additional hardware. This type of software RAID uses hard disk drives which are attached to the computer system via a built-in I/O interface or a processor-less host bus adapter (HBA). The RAID becomes active as soon as the operating system has loaded the RAID driver software. Such pure software RAID solutions often come integrated into the server OS and usually are free of additional cost for the user. Low cost is the primary advantage of this solution.
Hardware RAID
A hardware RAID solution has its own processor and memory to run the RAID application. In this implementation, the RAID system is an independent small computer system dedicated to the RAID application, offloading this task from the host system.
Hardware RAID can be found as an integral part of the solution (e.g. integrated in the motherboard) or as an add-in card. If the necessary hardware is already integrated in the system solution, then hardware RAID might become a software upgrade to your existing system. So like software RAID, hardware RAID might not be identified as such at first glance.
The simplest way to identify whether a solution is software or hardware RAID is to read the technical specification or data sheet of the RAID solution. If the solution includes a microprocessor (usually called an I/O processor, or sometimes ROC, which means 'RAID on Chip'), then the solution is a hardware RAID solution. If there is no processor, it is a software RAID solution. This is important for your selection because of the system impacts of the software RAID vs. hardware RAID implementation. These impacts include:
- CPU utilization and performance when other applications are running
- Scalability of disk drives that can be added to a system
- Ease of recovery after a data loss
- Capability for advanced data management/monitoring
- Ability to manage disk drives consistently across different operating systems
- Ability to add a battery backup option that allows enabling write caching on the controller to enhance write performance of the system
Hardware RAID Implementations
Hardware RAID can be implemented in a variety of ways:
1) As a discrete RAID Controller Card
2) As integrated hardware based on RAID-on-Chip technology.
Almost all enterprise and high-performance computing storage systems protect data against disk failures using a variant of the erasure protection scheme known as Redundant Arrays of Inexpensive Disks [16]. Presented originally as a single-disk-failure-tolerant scheme, RAID was soon enhanced by various double-disk-failure-tolerant encodings, collectively known as RAID 6, including two-dimensional parity [12], P+Q Reed-Solomon codes [20, 8], XOR-based EvenOdd [3], and NetApp's variant Row-Diagonal Parity [9]. Lately research has turned to greater reliability through codes that protect against some, but not all, sets of more than two disk failures [13], and the careful evaluation of the tradeoffs between codes and their implementations [17]. Networked RAID has also been explored, initially as a block storage scheme [15], then later for symmetric multi-server logs [14], Redundant Arrays of Independent Nodes [4], and peer-to-peer file systems [22], and is in use today in the PanFS supercomputer storage clusters [23]. This paper explores similar techniques, specialized to the characteristics of large-scale data-intensive distributed file systems. Deferred encoding for compression, a technique we use to recover capacity without losing the benefits of multiple copies for read bandwidth, is similar to two-level caching-and-compression in file systems [7], delayed parity updates in RAID systems [21], and alternative mirror or RAID 5 representation schemes [24].
Finally, our basic approach of adding erasure coding to data-intensive distributed file systems has been introduced into the Google File System [19] and, as a result of an early version of this work, into the Hadoop Distributed File System [6]. This paper studies the advantages of deferring the act of encoding.
In this section we introduce DiskReduce, a modification of HDFS [5].
3.1 Hadoop Distributed File System
HDFS [5] is the native file system in Hadoop [1], an open-source Map/Reduce parallel programming environment, and is highly similar to GFS [11]. HDFS supports write-once, read-many semantics on files. Each HDFS cluster consists of a single metadata node and a usually large number of data nodes. The metadata node manages the namespace, file layout information and permissions. To handle failures, HDFS replicates files three times. In HDFS, all files are immutable once closed. Files are divided into blocks, typically 64 MB, each stored on a data node. Each data node manages all file data stored on its persistent storage. It handles read and write requests from clients and performs "make a replica" requests from the metadata node. There is a background process in HDFS that periodically checks for missing blocks and, if found, assigns a data node to replicate the block having too few copies.
3.2 Disk Reduce Basics
One principle of the DiskReduce design is to minimize the change to original HDFS logic. Specifically, DiskReduce takes advantage of the following two important features of HDFS:
(1) files are immutable after they are written to the system, and
(2) all blocks in a file are triplicated initially. DiskReduce makes no change to HDFS when files are committed and triplicated.
Then DiskReduce exploits the background re-replication in HDFS, but in a different way: in HDFS the background process looks for blocks with an insufficient number of copies, while in DiskReduce it looks for blocks with high overhead (i.e., triplicated blocks) that can be turned into blocks with lower overhead (i.e., a RAID encoding). Redundant blocks are not deleted before the encoding is done, to ensure data reliability during the encoding phase. Since this process is inherently asynchronous, DiskReduce can further delay encoding, when space allows, facilitating temporally local accesses to choose among multiple copies.
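A minimal sketch of that background pass (names invented, not DiskReduce's actual code): instead of hunting for under-replicated blocks as HDFS does, it hunts for triplicated blocks old enough to encode, builds the parity first, and only then drops the redundant copies.

```python
DELAY_SECONDS = 3600  # delayed encoding: young blocks keep all three copies

def background_pass(blocks, now, encode_fn):
    """blocks: dicts with a 'created' timestamp and a 'copies' count."""
    for block in blocks:
        if block["copies"] == 3 and now - block["created"] >= DELAY_SECONDS:
            encode_fn(block)       # parity exists before any copy is deleted
            block["copies"] = 2    # e.g. RAID 5 + mirror keeps two copies
    return blocks
```

Because redundancy is never reduced before `encode_fn` completes, the data stays double-failure-tolerant throughout the pass.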
3.3 Encoding
Files are written initially as three copies on three different data nodes. We later compress the capacity used by encoding redundancy and deleting the excess copies. In our prototype, we have implemented two codes:
RAID 5 and Mirror: As shown in Figure 1(b), we maintain both a mirror of all data and a RAID 5 encoding. The RAID 5 encoding is only needed if both copies of one block are lost. In this way, the storage overhead is reduced to 1 + 1/N, where N is the number of blocks in the parity's RAID set.
RAID 6: DiskReduce also implements the leading scheme for double-disk-failure protection, as shown in Figure 1(c). The storage overhead is 2/N, where N is the number of data blocks in a RAID set.
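A back-of-the-envelope check of these overhead formulas, using the RAID set size N = 8 from the experiment reported later (the set size is the only assumption here):

```python
def overhead_raid5_mirror(n):
    return 1 + 1 / n   # a full mirror copy plus one parity block per N blocks

def overhead_raid6(n):
    return 2 / n       # two parity blocks per N data blocks

n = 8
print("triplication:     200.0% overhead")
print(f"RAID 5 + mirror:  {overhead_raid5_mirror(n):.1%}")
print(f"RAID 6:           {overhead_raid6(n):.1%}")
```

With N = 8 these give 112.5% (close to the 113% reported in the experiment below) and 25%, versus 200% for plain triplication.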
Based on a talk about our previous DiskReduce work [6], a user-space RAID 5 and mirror encoding scheme has been implemented on top of HDFS and may appear in the next HDFS release. In that implementation, only blocks from the same file will be grouped together. Alternatively, Figure 2 shows the capacity overhead derived from a file size distribution from the Yahoo! M45 cluster for two encoding schemes in which blocks are grouped for encoding within a file or across files. M45 is a cluster with approximately 4,000 processors, three terabytes of memory, and 1.5 petabytes of disk. We can see that grouping across files may result in a 40% reduction in capacity overhead for RAID 5 and mirror, and 70% for RAID 6, because files on M45 are typically small relative to 64 MB blocks, and users often split data sets into many files. Our prototype explores this difference by grouping consecutively created blocks regardless of file boundaries on each node.
We have prototyped DiskReduce as a modification to the Hadoop Distributed File System (HDFS) version 0.20.0. Currently, the DiskReduce prototype supports only two encoding schemes: RAID 6, and RAID 5 and mirror. The RAID 6 encoding scheme uses the open-source Blaum-Roth RAID 6 code released in the erasure coding library developed by the University of Tennessee [18]. The RAID 5 and mirror encoding scheme uses a simple block XOR operation written in Java to replace one of the three data copies of N blocks with a single-block RAID parity. Without comments, our prototype implementation consists of less than 2,000 lines of Java code, along with the erasure coding library, which is itself about 7,000 lines of C code. Our prototype runs in a cluster of 63 nodes, each containing two quad-core 2.83 GHz Xeon processors, 16 GB of memory, and four 7200 rpm SATA 1 TB Seagate Barracuda ES.2 disks. Nodes are interconnected by 10 Gigabit Ethernet. All nodes run the Linux kernel and use the ext3 file system to store HDFS blocks.
While our prototype is not fully functional (for example, some reconstruction cases are still a work in progress), it functions enough for preliminary testing. To get a feel for its basic encoding function, we set up a 32-node partition and had each node write a file of 16 GB into a DiskReduce-modified HDFS, spread over the same 32 nodes using RAID groups of 8 data blocks. Figure 3 shows the storage capacity recovered for this 512 GB of user data after it has been written three times.
While this experiment is simple, it shows the encoding process recovering 400GB and 900GB for the RAID 5 and mirror and RAID 6 schemes, respectively, bringing overhead down from 200% to 113% and 25%, respectively.
The background re-replication process in HDFS and GFS makes it easy to shift in time when data is encoded. It is generally believed that having multiple copies can improve read performance. In this section we bound the performance degradation that might result from decreasing copies
with encoding. If we reduce the number of data copies in HDFS, there are several reasons that Map/Reduce applications might suffer a performance penalty:
Backup tasks: In Map/Reduce, a backup task is the redundant execution of a task if it fails or runs slowly. Backup tasks run on a different node, preferably one with a local copy of the data the original node was reading; otherwise more network traffic is needed to support the backup task. More data copies give the scheduler more choices for assigning backup tasks to a node with a local copy.
Disk bandwidth: Popular, small datasets may be read by many jobs at the same time, making the total number of disk spindles holding the data a bottleneck. Copies of data may increase the number of spindles with the desired data, increasing total bandwidth. GFS may make more than three copies for such "hot spots" [11].
Load balance: When N blocks are spread at random over M nodes, they will not be perfectly evenly distributed, and the total work to be done by the most loaded node may determine the completion time. With more data copies, the job tracker has more flexibility to assign tasks to nodes with a local data copy, leading to better load balance.
The impact of these three factors is dependent on the encoding, the frequency of failures, slow nodes, hot small files, and the ratio of disk to network bandwidth. For this study we are looking to bound the impact of a simple delaying scheme, so we will model the penalty as a factor r, corresponding to a 100(1 − r)% degradation. Our strategy is to exploit locality in the accesses of data, delaying encoding until there is a small impact on overall performance regardless of r. We obtained a data access trace from Yahoo!'s M45 cluster, recording the HDFS activity from December 2007 to July 2009. We count all block access requests and calculate the "block age" when blocks are read. The cumulative distribution function (CDF) of the age of blocks at access time is shown in Figure 4.
From this figure, one can observe that most data accesses happen a short time after a block's creation. For instance, more than 99% of data accesses happen within the first hour of a data block's life.
If we delay the encoding by t seconds, i.e., the background process will not encode data blocks until they have lived for at least t seconds, we can obtain the full performance of having three copies from a block's creation until it is t seconds old. The expected performance achieved by delaying encoding by t seconds can be bounded as: w(t) + r(1 − w(t)),
where w is the CDF of block accesses with regard to block age, derived from the trace and shown in Figure 4. Different values of r give different expected performance bounds, as shown in Figure 5.
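The bound can be checked with a couple of lines: accesses younger than the delay t run at full speed (weight w(t)), and the remainder run at least at factor r. The w(t) value below is made up for illustration, not taken from the M45 trace.

```python
def performance_bound(w_t, r):
    """Lower bound on expected performance: w(t) + r * (1 - w(t))."""
    return w_t + r * (1 - w_t)

# If 99% of accesses fall inside the delay window and a single copy gives
# only a third of the performance of three copies:
bound = performance_bound(0.99, 1 / 3)   # under a 1% expected penalty
```

Note the bound is insensitive to r once w(t) is close to 1, which is why delaying by an hour works regardless of the actual penalty factor.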
As we can see in Figure 5, even if one copy achieves only 1/3 of the performance of three copies, by delaying the encoding for one hour there is very little system performance penalty. Delaying encoding delays recovering the disk capacity used by copies. Consider a bound on the disk capacity consumed by delaying the encoding by one hour. Each disk cannot write faster than about 100 MB/s, and is unlikely to sustain more than 25 MB/s through a file system today, because file systems cannot generally avoid fragmentation altogether. With disk capacities now 1-2 TB, a workload of continuous writing at 25 MB/s per disk for one hour would consume 6-12% of total capacity. Combining these two bounds, regardless of the performance degradation a workload might suffer from not having extra copies, M45 usage suggests that at a capacity overhead of less than 6-12%, overall performance degradation will be negligible if all encoding is delayed for an hour.
Data-intensive file systems are part of the core of data-intensive computing paradigms like Map/Reduce. We envision a large increase in the use of large-scale parallel programming tools for science analysis applications applied to massive data sets such as astronomy surveys, protein folding, public information data mining, machine translation, etc. But current data-intensive file systems protect data against disk and node failure with high-overhead triplication schemes, undesirable when data sets are massive and resources are shared over many users, each with their own massive datasets. DiskReduce is a modification of the Hadoop Distributed File System (HDFS) to asynchronously replace multiple copies of data blocks with RAID 5 and RAID 6 encodings. Because this replacement is asynchronous, it can be delayed wherever spare capacity is available. If encoding is delayed long enough, most read accesses will occur while multiple copies are available, preserving all potential performance benefits achievable with multiple copies. By examining a trace of block creation and use times on the Yahoo! M45 Hadoop cluster, we find that 99% of accesses are made to blocks younger than one hour, and that far less than 12% of disk capacity is needed to delay encoding for an hour. We conclude that delaying encoding by about one hour is likely to be a good rule of thumb for balancing the capacity overhead and performance benefits of multiple copies. We are completing our implementation of DiskReduce, as well as exploring the costs of "cleaning" partially deleted RAID sets, the reliability differences between our encodings, and different encoding schemes than were presented in this paper.
[1] Hadoop.
[2] Yahoo! reaches for the stars with M45 supercomputing project.
[3] M. Blaum, J. Brady, J. Bruck, and J. Menon. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Comput., 44(2):192-202, 1995.
[4] V. Bohossian, C. C. Fan, P. S. LeMahieu, M. D. Riedel, L. Xu, and J. Bruck. Computing in the RAIN: A reliable array of independent nodes. IEEE Trans. Parallel and Distributed Systems, (2), 2001.
[5] D. Borthakur. The Hadoop distributed file system: Architecture and design, 2009.
[6] D. Borthakur. HDFS and erasure codes, Aug. 2009.
[7] V. Cate and T. Gross. Combining the concepts of compression and caching for a two-level file system. In ACM ASPLOS-IV, pages 200-211, April 1991.
[8] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: High-performance, reliable secondary storage. In ACM Computing Surveys, 1994.
[9] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. Row-diagonal parity for double disk failure correction. In USENIX FAST, pages 1-14, 2004.
[10] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In USENIX OSDI '04, Berkeley, CA, USA, 2004.
[11] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. ACM SIGOPS Oper. Syst. Rev., 37(5):29-43, 2003.
[12] G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. ACM ASPLOS, pages 123-132, 1989.
[13] J. L. Hafner. WEAVER codes: Highly fault tolerant erasure codes for storage systems. In USENIX FAST 2005.
[14] J. Hartman and J. Ousterhout. The Zebra striped network file system. In Proc. 14th ACM SOSP, 1994.
Post: #3
RAID report

for more info about raid technology

RAID, an acronym for redundant array of inexpensive disks or redundant array of independent disks, is a technology that allows high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy. This concept was first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 as redundant array of inexpensive disks.[1] Marketers representing industry RAID manufacturers later reinvented the term to describe a redundant array of independent disks as a means of dissociating a low-cost expectation from RAID technology.
Post: #4
send me a detailed ppt and full report of raid technology

send me a complete ppt and detailed report of raid email is priyankaa.chow[at]
Post: #6
go through the below thread for ppt and some other details on RAID.
Post: #7


Submitted by:
Sandeep Kumar
Anirudh Kumar
Rakesh Ranjan
Shiv kuar singh

Submitted to:
Mr. Ajay Kaul

What is RAID???

RAID stands for “Redundant array of inexpensive disks” or “Redundant array of independent disks”.


RAID is a way of storing the same data in different places on multiple hard disks.
By placing data on multiple disks, I/O operations can overlap across the disks, improving performance.
A RAID appears to the operating system to be a single logical hard disk.


To mitigate the problem of MTBF (mean time between failures).
To increase fault tolerance.
To increase I/O performance.
To improve storage reliability through redundancy.
To increase capacity.
To provide higher availability in case of disk failure.

RAID components


Array of disks.

RAID controller.

Array of disks:-
It contains two or more disks.

This looks like one very fast, very reliable, very large disk to the host computer.

Disks are enclosed in smaller enclosures.

RAID controller:-

RAID controller is an electronic device.
It provides the interface between the host computer and array of disks.
The RAID controller presents the array of disks as a single logical disk to the operating system.

It overcomes single-disk performance limitations by striping the data across the array of disks.
It uses parallel data paths for I/O operations.
The data transfer rate can be as high as 35 MB/s, compared with about 10 MB/s for a single disk.

It handles following tasks:
Management and control of disk aggregation.

Translation of I/O request between logical disks and physical disks.

Data regeneration if disk failure occurs.

RAID concepts

RAID uses three main concepts: mirroring, striping, and parity.

Mirroring: multiple disks contain identical data.
Ex: used in RAID level 1.

Post: #8
Submitted By:
Mr. Arun yadav

RAID is the term used to describe a storage system built from multiple disks.
RAID is a technology that provides increased storage functions and reliability through redundancy.
RAID stands for: Redundant Array of Inexpensive or Independent Disks.
Increased performance
Increased capacity
fault tolerance, Redundancy
RAID Techniques/Methods
Mirroring Technique Advantages
The easiest way to get high availability of data.
Higher read performance.
Striping Technique Advantages
Higher performance
Distributed across disks
Work in parallel
Parity Technique Advantages
Improve the availability
Less waste of space
There are five basic RAID levels; the other levels are combinations of these five levels.
Striping Technique
Lower availability of data.
No Fault tolerance
Mirroring Technique
High availability
RAID 2, 3, 4
Parity Technique
Disk striping
High performance
High availability
Parity with striping
Higher performance than levels 1-4
High availability of data.
Most used RAID level
Combinations of RAID levels
Combine any two levels and get the advantages from both levels.
Examples: 0+1, 1+0, 0+3, 3+0, 0+5, 5+0, 1+5, and 5+1.
Post: #9
What is RAID
Redundant Array of Independent (Inexpensive) Disks
A set of disk drives treated as one logical drive
Data are distributed over the drives
Redundant capacity is used for parity allowing for data repair
Levels of RAID
6 levels of RAID (0-5) have been accepted by industry
Other kinds have been proposed in literature
Levels 2 and 4 are not commercially available; they are included for clarity
All data (user and system) are distributed over the disks so that there is a reasonable chance for parallelism
Disk is logically a set of strips (blocks, sectors, …). Strips are numbered and assigned consecutively to the disks.
Raid 0 (No redundancy)
Data mapping Level 0
Performance depends highly on the request patterns
High data transfer rates are reached if
Integral data path is fast (internal controllers, I/O bus of host system, I/O adapters and host memory busses)
Application generates efficient usage of the disk array by requests that span many consecutive strips
If response time is important (transactions) more I/O requests can be handled in parallel
Raid 1 (mirrored)
RAID 1 does not use parity, it simply mirrors the data to obtain reliability
A read request can be served by either of the two disks containing the requested data (minimum seek time)
A write request can be performed in parallel on the two disks: no "writing penalty"
Recovery from error is easy, just copy the data from the correct disk
Price for disks is doubled
Will only be used for system critical data that must be available at all times
RAID 1 can reach high transfer rates and fast response times (~2*RAID 0) if most of the requests are reading requests. In case most requests are writing requests, RAID 1 is not much faster than RAID 0.
Raid 2 (redundancy through Hamming code)
Small strips, one byte or one word
Synchronized disks, each I/O operation is performed in a parallel way
Error correction code (Hamming code) allows for correction of a single bit error
Controller can correct without additional delay
Is still expensive, only used in case many frequent errors can be expected
Hamming code
RAID 3 (bit-interleaved parity)
Level 2 needs log2(number of disks) parity disks
Level 3 needs only one, for one parity bit
In case one disk crashes, the data can still be reconstructed, even online ("reduced mode"), and written back (X1-X4 data, P parity; "+" denotes bitwise XOR):
P = X1 + X2 + X3 + X4
RAID 2-3 have high data transfer rates, but perform only one I/O at a time, so response times in transaction-oriented environments are not so good
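The parity relation above can be checked with small integer words in Python; since "+" in the slide denotes bitwise XOR, a crashed disk's data is recoverable by XOR-ing the parity with the surviving strips.

```python
x1, x2, x3, x4 = 0b1010, 0b0111, 0b1100, 0b0001
p = x1 ^ x2 ^ x3 ^ x4        # parity strip written to the parity disk

# Disk 1 crashes: rebuild its strip online from the survivors and the parity.
x1_rebuilt = p ^ x2 ^ x3 ^ x4
assert x1_rebuilt == x1
```

The same relation holds per bit across the whole strip, which is why a single parity disk suffices for any one-disk failure.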
RAID 4 (block-level parity)
Larger strips and one parity disk
Blocks are kept on one disk, allowing for parallel access by multiple I/O requests
Writing penalty: when a block is written, the parity disk must be adjusted (e.g., writing on X1, where "+" denotes XOR):
P_new = P_old + X1_old + X1_new
Parity disk may be a bottleneck
Good response times, less good transfer rates
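The parity adjustment can be done without reading the whole stripe: new parity is old parity XOR old data XOR new data, i.e. two reads and two writes per small write, which is the "writing penalty" above.

```python
def update_parity(old_parity, old_block, new_block):
    """Small-write parity update: P_new = P_old XOR D_old XOR D_new."""
    return old_parity ^ old_block ^ new_block

x1, x2, x3, x4 = 5, 9, 12, 3
p = x1 ^ x2 ^ x3 ^ x4           # initial parity over the stripe
x1_new = 7                      # overwrite block X1
p_new = update_parity(p, x1, x1_new)
assert p_new == x1_new ^ x2 ^ x3 ^ x4   # matches recomputing from scratch
```

Every such update hits the single parity disk, which is why it becomes the bottleneck that RAID 5 removes by distributing parity.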
RAID 5 (block-level distributed parity)
Distribution of the parity strips to avoid the parity-disk bottleneck.
Can use round robin:
Parity disk = (−⌊block number / 4⌋) mod 5
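The round-robin formula can be sketched for a 5-disk array with 4 data strips per stripe; each stripe's parity lands on a different disk, rotating backwards through the array (an illustrative sketch of the slide's formula, not any particular controller's layout):

```python
DISKS, DATA_PER_STRIPE = 5, 4

def parity_disk(block_number):
    """Parity placement: (-(stripe index)) mod number of disks."""
    stripe = block_number // DATA_PER_STRIPE
    return (-stripe) % DISKS

rotation = [parity_disk(s * DATA_PER_STRIPE) for s in range(5)]
# rotation visits disks 0, 4, 3, 2, 1 before repeating, so parity
# traffic is spread evenly over all five disks
```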
Post: #10
To get information about the topic "RAID full report ppt" and related topics, refer to the link below.
