During test automation you will often download files and then want to verify that the downloads are complete. MD5 checksum verification validates the integrity of a file, so we can ensure that the received file is exactly the same as the one at the source. This blog explains what an MD5 checksum is and how we can use it in test automation.

Let’s understand why MD5 checksum verification is needed for a file.

Files of many kinds are distributed over networks or on storage media to different destinations. During transfer, a file can be corrupted by a few missing or altered bits. Factors that can cause this include:

  •    An interruption to the Internet or network connection.
  •    Storage or space problems, including hard drive-related problems.
  •    A corrupted disk or corrupted file.
  •    A third party interfering with the transfer of data.

After receiving a file, we need to check whether it arrived correctly. We therefore apply some sort of test to verify that our copy matches the original and does not contain any errors.

For this file integrity test we use a special key string known as a checksum. A checksum is also referred to as a hash sum, hash value, hash code, or simply a hash. One of the most widely used checksum techniques is the MD5 checksum.

The MD5 digest, or hash checksum, has been used extensively in the software world to provide some assurance that a transferred file arrived intact from its source.

What is MD5?

MD5 (Message-Digest Algorithm 5) is a cryptographic hash function that is widely used to verify that a file has not been altered. Because MD5 is no longer considered collision-resistant, it is best suited to detecting accidental corruption rather than deliberate tampering.

MD5 hashes are 128 bits long and are usually displayed as 32 hexadecimal digits, no matter how large or small the file or text is.
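To make the fixed length concrete, here is a minimal sketch that hashes an empty string and a 100,000-character string and prints the length of each hex digest. The class name Md5LengthDemo and the helper Md5Hex are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5LengthDemo
{
    // Hypothetical helper: hashes the UTF-8 bytes of a string and
    // returns the digest as uppercase hex without dashes.
    public static string Md5Hex(string text)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        string emptyInput = string.Empty;
        string largeInput = new string('a', 100000);

        // Both digests are 16 bytes, so both hex strings have 32 characters.
        Console.WriteLine(Md5Hex(emptyInput).Length); // 32
        Console.WriteLine(Md5Hex(largeInput).Length); // 32

        // The MD5 of empty input is a well-known constant.
        Console.WriteLine(Md5Hex(emptyInput)); // D41D8CD98F00B204E9800998ECF8427E
    }
}
```

The input size changes only how long the hash takes to compute, not the size of the result.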

For reference: https://www.intel.com/content/www/us/en/support/programmable/articles/000078103.html

What is the MD5 checksum of a file?

As we have seen, it is a 32-character hexadecimal number computed from the contents of a file. Various checksum programs generate this string from a file, and the result is then compared with the original checksum shared by the file server. File servers often publish a pre-computed MD5 so that users can compare the checksum of their downloaded file against it. If the two checksums match, there is a high probability that the two files are the same; if they differ, the files are definitely different.
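The comparison idea can be sketched in a few lines: an identical copy yields the same checksum, while a copy with a single flipped bit yields a different one. The class name CorruptionDemo, the helper Md5Hex, and the sample bytes are all made up for this illustration.

```csharp
using System;
using System.Security.Cryptography;

public static class CorruptionDemo
{
    // Hypothetical helper: MD5 of a byte array as uppercase hex without dashes.
    public static string Md5Hex(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(data)).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        byte[] original = { 1, 2, 3, 4, 5 };
        byte[] identicalCopy = (byte[])original.Clone();
        byte[] corruptedCopy = (byte[])original.Clone();
        corruptedCopy[2] ^= 0x01; // simulate a single flipped bit during transfer

        Console.WriteLine(Md5Hex(original) == Md5Hex(identicalCopy)); // True
        Console.WriteLine(Md5Hex(original) == Md5Hex(corruptedCopy)); // False
    }
}
```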

How to calculate MD5 checksum for a file:

1. Calculate MD5 checksum for a file on Windows:

You can generate a checksum for a file with the command prompt, with PowerShell, or with third-party applications like Hash Generator or MD5 Checksum Utility.

a. With the command prompt:

Windows 10 ships with a built-in command-line tool called “CertUtil”, which is part of Certificate Services.

CertUtil offers a “-hashfile” switch that generates a hash string for a file using the algorithm we specify, such as MD5:

certutil -hashfile <file> <algorithm>

certutil -hashfile Example.txt MD5

This prints the MD5 checksum of the file in the command prompt.

b. With PowerShell:

Because no coding is required, this is the easiest method. PowerShell 4.0 and later include a cmdlet for this purpose.

This cmdlet is called “Get-FileHash”, and with it we can generate file hashes easily:

Get-FileHash -Path <file> -Algorithm <name>

Get-FileHash -Path explorer.exe -Algorithm MD5

Get-FileHash returns the checksum as a hexadecimal string in the Hash property of its output.

2. Calculate MD5 checksum for a file using some Third Party Tools:

Some of these tools can also be launched from the right-click menu of a file. The following are some tools we can generally use:

  •    Hash Generator
  •    MD5 & SHA Checksum Utility
  •    HashMyFiles

3. Calculate MD5 checksum for a file through automation using C#:

You can also calculate the checksum programmatically using .NET, Java, Python, etc.

To calculate it for a file in C#, .NET provides built-in functionality for these hash functions through the class shown below. It is part of the base class library in .NET Framework and current .NET releases, so a using directive is usually enough. The project behind this post also referenced the following NuGet package: https://www.nuget.org/packages/Security.Cryptography

System.Security.Cryptography.MD5

First, we instantiate the message digest object with MD5.Create().

The ComputeHash method of the instance returns the computed hash of the file when we pass it the file stream.

Next, we convert the hash to hex. BitConverter.ToString represents the bytes as a string of hex pairs separated by dashes.

Below is example code showing a method, GetMD5HashFromFile().

It accepts the file name along with its path, generates the checksum, converts it into a hex string, and removes the dashes. This is the typical format.

The hash string returned by this method can be compared with the one provided by the file server to check whether the file has been altered.

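using System;
using System.IO;
using System.Security.Cryptography;

// Returns the file's MD5 checksum as an uppercase hex string without dashes.
private string GetMD5HashFromFile(string fileName)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(fileName))
        {
            // ComputeHash reads the stream and returns the 16-byte digest.
            var hash = md5.ComputeHash(stream);
            // BitConverter yields "A1-52-..."; strip the dashes.
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }
}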

You can call the above function wherever you need to get the checksum. For example:

// The expected value is the checksum published for the file, in uppercase hex.
void verifyChecksum()
{
    string filePath = "../../../Resources/test.pdf";
    string hash = GetMD5HashFromFile(filePath);
    Assert.Equal("A152F13B6EE1EA3D047A6AB99D12A1A1", hash);
}
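
One practical wrinkle: BitConverter produces uppercase hex, while many sites publish checksums in lowercase, so an exact string comparison can fail even when the file is intact. The sketch below shows one possible way to handle this with a case-insensitive comparison. The class ChecksumVerifier and its methods ComputeMd5Hex and Md5Matches are hypothetical names, and the temporary file containing "hello" is only there to make the example self-contained.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ChecksumVerifier
{
    // Same approach as GetMD5HashFromFile: hash the file stream, format as hex.
    public static string ComputeMd5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", string.Empty);
        }
    }

    // Hypothetical helper: compares checksums ignoring case and surrounding whitespace.
    public static bool Md5Matches(string path, string expectedHex)
    {
        return string.Equals(ComputeMd5Hex(path), expectedHex.Trim(),
                             StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, Encoding.ASCII.GetBytes("hello"));
            // Well-known MD5 of the ASCII bytes "hello", written in lowercase.
            Console.WriteLine(Md5Matches(path, "5d41402abc4b2a76b9719d911017c592")); // True
            Console.WriteLine(Md5Matches(path, "00000000000000000000000000000000")); // False
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }
}
```

In a test, the result of Md5Matches could be passed to a boolean assertion in place of the exact string comparison shown earlier.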

Conclusion:

Having implemented and tested this in test case automation, I believe this is one of the simplest ways to verify the MD5 checksum of a file in automation using C#.
