Study Of Chunking Algorithm In Data Deduplication

A. Venish, K. S. Sankar
Published 2016 · Computer Science

Data deduplication is an emerging technology that reduces storage utilization and offers an efficient way of handling data replication in backup environments. In cloud data storage, deduplication plays a major role in virtual machine frameworks, data-sharing networks, the handling of structured and unstructured data by social media, and disaster recovery. In deduplication, data are broken down into multiple pieces called “chunks”, and every chunk is identified by a unique hash identifier. These identifiers are compared with those of previously stored chunks to check for duplication. Because chunking is the first step toward an efficient deduplication ratio and throughput, the chunking algorithm is very important in the deduplication scenario. In this paper, we discuss different chunking models and algorithms and compare their performance.
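The chunk-and-fingerprint idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' method: it assumes fixed-size chunking (one of the simplest chunking models) and SHA-256 as the hash identifier, and the names `fixed_size_chunks` and `ChunkStore` are hypothetical.

```python
import hashlib


def fixed_size_chunks(data: bytes, chunk_size: int = 8):
    """Yield consecutive chunks of at most chunk_size bytes (assumed model)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


class ChunkStore:
    """Hypothetical in-memory store that keeps each distinct chunk once."""

    def __init__(self):
        self.chunks = {}  # fingerprint (hex digest) -> chunk bytes

    def put(self, data: bytes, chunk_size: int = 8):
        """Store data and return its recipe: the ordered list of fingerprints."""
        recipe = []
        for chunk in fixed_size_chunks(data, chunk_size):
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only chunks whose fingerprint is not yet known are stored.
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = chunk
            recipe.append(fingerprint)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)


store = ChunkStore()
data = b"ABCDEFGH" * 3 + b"12345678"
recipe = store.put(data, chunk_size=8)
print(len(recipe), "chunks referenced,", len(store.chunks), "chunks stored")
```

With fixed-size chunks, inserting a single byte near the start of the data shifts every later chunk boundary, so previously stored chunks no longer match. That sensitivity is one reason the choice of chunking algorithm affects the deduplication ratio.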