VIJ Digital library

Text Compression Using the Shannon-Fano, Huffman, and Half-Byte Algorithms

Eko Priyono
Informatics Engineering, Universitas Muhammadiyah Purwokerto
Hindayati Mustafidah
Informatics Engineering, Universitas Muhammadiyah Purwokerto

Submission to VIJ 2024-09-05

Keywords

  • compression ratio
  • text data
  • effectiveness in compressing

Abstract

Background and Objectives: File sizes increase as technology advances. Large files require more storage and longer transfer times. Data compression transforms an input (original) data stream into an output (compressed) data stream that is smaller in size. Existing compression techniques include the Huffman, Shannon-Fano, and Half-Byte algorithms. Like other algorithms in computer science, each of these three has advantages and disadvantages. Testing is therefore needed to determine which algorithm is most effective for data compression, especially for text data.

Methods: The Huffman, Shannon-Fano, and Half-Byte algorithms were applied to text data to test their effectiveness in compressing it. The research sample is text-file data containing abstracts of research articles published in scientific publications, randomly selected from 100 journals. The abstracts used as data are in Indonesian.
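The abstract does not state how the compression ratio is computed. As a minimal sketch of how such a test could be scored, the following assumes the ratio is the percentage of space saved relative to the original size, so that a higher figure is better. The function names and the sample sizes are hypothetical and are not taken from the study.

```python
from statistics import mean


def compression_ratio_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of space saved (assumed definition, not confirmed by the source)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


def average_ratio(sizes: list[tuple[int, int]]) -> float:
    """Average the per-file ratio over (original_size, compressed_size) pairs."""
    return mean(compression_ratio_percent(o, c) for o, c in sizes)


if __name__ == "__main__":
    # Made-up sizes in bytes, for illustration only.
    print(average_ratio([(200, 100), (400, 300)]))
```

Averaging the per-file ratios, rather than dividing total compressed size by total original size, is also an assumption; the two can differ when file sizes vary.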

Results: Based on the test findings, the Huffman algorithm outperforms the Shannon-Fano and Half-Byte algorithms in terms of compression ratio. The Half-Byte algorithm has the lowest compression ratio of the three. Half-Byte compression relies on seven consecutive characters sharing the same first four bits, whereas the Huffman and Shannon-Fano algorithms rely on the number of times each character appears. With an average compression ratio of 46.05%, compared with 40.36% for Shannon-Fano and 5.04% for Half-Byte, the Huffman method can be considered for compressing Indonesian-language text data.
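The difference between the two approaches can be illustrated with a short sketch. It is not the authors' implementation. It assumes 8 bits per character in the original text, ignores the cost of storing the code table, treats "the first four bits" as the upper nibble of each byte, and uses hypothetical function names. The first function derives Huffman code lengths from character counts; the second only locates the runs a Half-Byte style scheme could exploit, without encoding them.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return {character: code length in bits} for a Huffman code built from counts."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {character: depth so far}).
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every character in them one level deeper.
        merged = {ch: d + 1 for ch, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_space_saving_percent(text: str) -> float:
    """Space saved versus 8 bits per character (assumed baseline), header ignored."""
    if not text:
        return 0.0
    lengths = huffman_code_lengths(text)
    encoded_bits = sum(lengths[ch] for ch in text)
    original_bits = 8 * len(text)
    return 100.0 * (original_bits - encoded_bits) / original_bits


def high_nibble_runs(data: bytes, min_run: int = 7) -> list[tuple[int, int]]:
    """Return (start, length) of each maximal run of bytes sharing the same upper four bits."""
    runs = []
    start = 0
    for i in range(1, len(data) + 1):
        if i == len(data) or (data[i] >> 4) != (data[start] >> 4):
            if i - start >= min_run:
                runs.append((start, i - start))
            start = i
    return runs


if __name__ == "__main__":
    print(huffman_code_lengths("aaaabbc"))
    print(high_nibble_runs(b"abcdefg hi"))
```

In ASCII text, spaces and punctuation (upper nibble 0x2) interrupt letters (upper nibbles 0x6 and 0x7), so qualifying runs of seven characters are uncommon in ordinary prose. This is one plausible reason a run-based scheme would save little on text, while a frequency-based code benefits from every repeated character.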
