
Blockchain Analysis

Understand how the Chia blockchain works and optimize the process.

The result was a performance improvement in creating Chia plots and a decrease in how often SSDs need to be replaced.

Medium.com article

My medium.com article is here.

Chia plotting and disk usage

Chia is a blockchain technology based upon an innovative consensus algorithm, Proof of Space and Time, that leverages vast amounts of over-allocated hard drive space.

The Proof of Space and Time algorithm requires minimal computational power but large amounts of hard drive space.

Goal

My goal was to determine if a second SSD sped up Chia plot creation.

I cloned the Chia Proof of Space repository and changed the source code to NOT remove temporary files at the end of each phase. Currently, each phase cleans up after itself and removes temporary files.

Caveats

Some temporary files are truncated, meaning the contents of the file are removed while the filename is left intact. Those files appear to have zero bytes. I did not change any truncate commands in the source code due to unknown secondary effects. Any file with zero bytes is not counted in my summary.

I presume each file is written once and read once. If a file is read randomly, the amount of data read might be less than what I observed.

Analysis

I modified these source code files and commented out the fs::remove() statements.

  • src/plotter_disk.hpp

  • src/sort_manager.hpp

Then I built the ProofOfSpace binary.

I created a plot file with this command.

Create a plot file
./ProofOfSpace \
-b 4096 \
-k 32 \
-r 4 \
-u 128 \
-t /media/temp_ssd_001 \
-2 /media/temp_ssd_002 \
-d /media/dest_hdd_001

This is an explanation of the arguments.

ProofOfSpace arguments
-b 4096 - use 4,096 MB (4 GB) of memory
-k 32 - plot size is 32, yielding a 101 GB final file
-r 4 - use 4 threads in the first phase
-u 128 - use 128 buckets, which determines how many files are read/written in each phase
-t - read and write temporary files to an SSD
-2 - read and write secondary temporary files to an SSD
-d - write the final file to a hard disk

After the plot was created I captured the files in the temporary (-t option), secondary (-2 option), and destination (-d option) directories.
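The capture step can be reproduced with a short script. This is a minimal sketch, not the author's actual tooling: `dir_size_bytes` and `to_gb` are hypothetical helpers that walk a directory, skip zero-byte (truncated) files per the caveat above, and convert totals to GB.

```python
import os

def dir_size_bytes(path):
    """Total size in bytes of all files under path, skipping
    zero-byte files left behind by truncation."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            if size > 0:  # truncated (zero-byte) files are not counted
                total += size
    return total

def to_gb(num_bytes):
    """Convert bytes to GB (1 GB = 1024**3 bytes), rounded to the nearest GB."""
    return round(num_bytes / 1024**3)
```

For example, `to_gb(dir_size_bytes("/media/temp_ssd_001"))` reports the temporary directory's footprint in GB.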

This is a summary of each phase and where files are read and written for temporary, secondary, and destination disks.

| Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
|-------|----------------------|-----------------------|----------------------|----------------------|-------------------------|
| 1     |                      | write                 |                      |                      |                         |
| 2     | read                 | write                 |                      |                      |                         |
| 3     | read                 | write                 |                      |                      |                         |
| 4     | read                 |                       |                      | write                |                         |
| Copy  |                      |                       | read                 |                      | write                   |

Here is the same table, as above, with the amount of data read and written in each phase to specific disks.

All numbers are in GB (gigabytes), where 1 GB is 1024 * 1024 * 1024 bytes.

Numbers are rounded to the nearest GB.

| Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
|-------|----------------------|-----------------------|----------------------|----------------------|-------------------------|
| 1     |                      | 476 GB                |                      |                      |                         |
| 2     | 476 GB               | 164 GB                |                      |                      |                         |
| 3.1   | 164 GB               | 244 GB                |                      |                      |                         |
| 3.2   | 244 GB               | 167 GB                |                      |                      |                         |
| 4     | 167 GB               |                       |                      | 101 GB               |                         |
| Copy  |                      |                       | 101 GB               |                      | 101 GB                  |
| TOTAL | 1,051 GB             | 1,051 GB              | 101 GB               | 101 GB               | 101 GB                  |

The temporary disk (the -t option) bears 91% of all temporary and secondary SSD activity.

This means that using two SSDs as temporary drives gains you very little performance benefit while ensuring that the primary SSD wears out long before the secondary SSD (the -2 option) does.
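The 91% figure follows directly from the totals in the table above:

```python
# Totals from the measurement table, in GB.
temp_activity = 1051 + 1051   # temporary SSD: reads + writes
sec_activity = 101 + 101      # secondary SSD: reads + writes

temp_share = temp_activity / (temp_activity + sec_activity)
print(f"temporary SSD share: {temp_share:.0%}")  # 91%
```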

Alternate solution 1

Alternate solution 1 distributes reads and writes more equitably over two SSDs. The temporary / secondary SSD activity is split 55% / 45%.

This requires a small modification to phases 3 and 4.

| Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
|-------|----------------------|-----------------------|----------------------|----------------------|-------------------------|
| 1     |                      | 476 GB                |                      |                      |                         |
| 2     | 476 GB               | 164 GB                |                      |                      |                         |
| 3.1   | 164 GB               |                       |                      | 244 GB               |                         |
| 3.2   |                      |                       | 244 GB               | 167 GB               |                         |
| 4     |                      |                       | 167 GB               | 101 GB               |                         |
| Copy  |                      |                       | 101 GB               |                      | 101 GB                  |
| TOTAL | 640 GB               | 640 GB                | 512 GB               | 512 GB               | 101 GB                  |
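The roughly 55% / 45% split can be checked from the table totals:

```python
# Per-disk totals from the alternate solution 1 table, in GB.
temp_total = 640 + 640   # temporary SSD: reads + writes
sec_total = 512 + 512    # secondary SSD: reads + writes

temp_share = temp_total / (temp_total + sec_total)
print(f"split: {temp_share:.1%} / {1 - temp_share:.1%}")
```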

Alternate solution 2

We can optimize the 4th phase and eliminate the "copy" phase to reduce the workload by 202 GB.

Phase 4 would read the output from phase 3.2 and write the final file to the destination disk (-d option).

| Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
|-------|----------------------|-----------------------|----------------------|----------------------|-------------------------|
| 1     |                      | 476 GB                |                      |                      |                         |
| 2     | 476 GB               | 164 GB                |                      |                      |                         |
| 3.1   | 164 GB               |                       |                      | 244 GB               |                         |
| 3.2   |                      |                       | 244 GB               | 167 GB               |                         |
| 4     |                      |                       | 167 GB               |                      | 101 GB                  |
| Copy  |                      |                       |                      |                      |                         |
| TOTAL | 640 GB               | 640 GB                | 411 GB               | 411 GB               | 101 GB                  |

The temporary / secondary SSD activity is split 61% / 39%.
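Both the 202 GB savings and the 61% / 39% split follow from the table totals:

```python
# Secondary SSD totals (reads + writes), in GB, before and after
# eliminating the copy phase.
sec_total_alt1 = 512 + 512   # alternate solution 1
sec_total_alt2 = 411 + 411   # alternate solution 2
savings = sec_total_alt1 - sec_total_alt2
print(f"reduction: {savings} GB")  # 202 GB

temp_total = 640 + 640
temp_share = temp_total / (temp_total + sec_total_alt2)
print(f"temporary SSD share: {temp_share:.0%}")  # 61%
```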

Flip the SSD usage with each plot and you'll equalize the load across multiple plots.

Summary

  • With the existing ProofOfSpace code, there is almost no performance gain when using two SSDs as temporary storage (-t and -2 options).

  • The cost of a secondary SSD may be better spent on destination disks (hard disk drives).

  • You would need to replace the temporary SSD 10 times more often than the secondary SSD with the current 90% / 10% disk activity split.

A better option would be a second SSD used as a temporary disk (-t option), splitting all of your plot creations across temporary SSDs. For example:

# first plot uses SSD 1 of 2
$ plot create -b 4096 -k 32 -r 4 -u 128 \
-t /media/temp_ssd_001 \
-d /media/dest_hdd_001

# second plot uses SSD 2 of 2
$ plot create -b 4096 -k 32 -r 4 -u 128 \
-t /media/temp_ssd_002 \
-d /media/dest_hdd_001

Chia is an emerging technology and there are opportunities for performance improvements.

Thank you and happy plotting.

Note: I intend to open an issue, make changes to the chiapos source code, and submit a pull request to implement my recommendations.

Technology

The chiapos plotter is written in C++ (the modified files are C++ headers) with Python bindings; my analysis was done in Python.