Blockchain Analysis
Understand how the Chia blockchain works and optimize the process.
The result: proposed changes that improve Chia plotting performance and reduce how often SSDs need to be replaced.
Medium.com article
My medium.com article is here.
Chia plotting and disk usage
Chia is a blockchain technology based on an innovative consensus algorithm, Proof of Space and Time, that leverages the vast amount of over-allocated hard drive space in the world.
The algorithm requires minimal computational power but benefits from as much hard drive space as you can dedicate to it.
Goal
My goal was to determine if a second SSD sped up Chia plot creation.
I cloned the Chia Proof of Space repository and changed the source code to NOT remove temporary files at the end of each phase. Currently, each phase cleans up after itself and removes temporary files.
Caveats
Some temporary files are truncated, meaning the contents of the file are removed while the filename is left intact. Those files appear to have zero bytes. I did not change any truncate calls in the source code due to unknown side effects. Files with zero bytes are not counted in my summary.
I presume each file is written once and read once. If a file is read randomly, the amount of data actually read might be less than what I observed.
Analysis
I modified these source code files and commented out the fs::remove() statements:
- src/plotter_disk.hpp
- src/sort_manager.hpp
Then I built the ProofOfSpace command.
I created a plot file with this command.
```
./ProofOfSpace \
  -b 4096 \
  -k 32 \
  -r 4 \
  -u 128 \
  -t /media/temp_ssd_001 \
  -2 /media/temp_ssd_002 \
  -d /media/dest_hdd_001
```
This is an explanation of the arguments.
- `-b 4096`: use 4,096 MB (4 GB) of memory
- `-k 32`: plot size is 32, yielding a 101 GB final file
- `-r 4`: use 4 threads in the first phase
- `-u 128`: use 128 buckets, which determines how many files are read/written in each phase
- `-t`: read and write temporary files on an SSD
- `-2`: read and write secondary files on an SSD
- `-d`: write the final file to a hard disk
After the plot was created, I captured the files and their sizes in the temporary (-t option), secondary (-2 option), and destination (-d option) directories.
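For reference, here is a minimal sketch (in Python) of how such a tally can be computed, assuming the patched ProofOfSpace run leaves its files in place; the paths are the ones from the command above, and zero-byte (truncated) files are skipped per the caveats:

```python
import os

def directory_usage_gb(path):
    """Sum the sizes of all files in a directory, in GB (1 GB = 1024**3 bytes).

    Zero-byte files are skipped: truncated files keep their names but
    hold no data (see Caveats above).
    """
    total_bytes = 0
    for entry in os.scandir(path):
        if entry.is_file() and entry.stat().st_size > 0:
            total_bytes += entry.stat().st_size
    return round(total_bytes / 1024**3)

for label, path in [("temporary", "/media/temp_ssd_001"),
                    ("secondary", "/media/temp_ssd_002"),
                    ("destination", "/media/dest_hdd_001")]:
    print(f"{label}: {directory_usage_gb(path)} GB")
```

This only yields per-directory totals; attributing the bytes to individual phases requires inspecting which files each phase produces.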
This is a summary of each phase and where files are read and written for temporary, secondary, and destination disks.
Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
---|---|---|---|---|---|
1 | | write | | | |
2 | read | write | | | |
3 | read | write | | | |
4 | read | | | write | |
Copy | | | read | | write |
Here is the same table as above, with the amount of data read and written in each phase on each disk.
All numbers are in GB (gigabytes), where 1 GB is 1024 × 1024 × 1024 bytes.
Numbers are rounded to the nearest GB.
Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
---|---|---|---|---|---|
1 | | 476 GB | | | |
2 | 476 GB | 164 GB | | | |
3.1 | 164 GB | 244 GB | | | |
3.2 | 244 GB | 167 GB | | | |
4 | 167 GB | | | 101 GB | |
Copy | | | 101 GB | | 101 GB |
TOTAL | 1,051 GB | 1,051 GB | 101 GB | 101 GB | 101 GB |
The temporary disk (the -t option) bears 91% of all temporary and secondary SSD activity.
This means that using two SSDs as temporary drives gains you very little performance while guaranteeing that the primary SSD wears out long before the secondary SSD (the -2 option) does.
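The 91% figure follows directly from the TOTAL row above; a quick check:

```python
temp_gb = 1051 + 1051       # temporary SSD: reads + writes
secondary_gb = 101 + 101    # secondary SSD: reads + writes
total_gb = temp_gb + secondary_gb

print(f"temporary SSD share: {temp_gb / total_gb:.0%}")       # -> 91%
print(f"secondary SSD share: {secondary_gb / total_gb:.0%}")  # -> 9%
```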
Alternate solution 1
Alternate solution 1 distributes reads and writes more equitably over the two SSDs: the temporary / secondary SSD activity is split 55% / 45%.
This requires a small modification to phases 3 and 4.
Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
---|---|---|---|---|---|
1 | | 476 GB | | | |
2 | 476 GB | 164 GB | | | |
3.1 | 164 GB | | | 244 GB | |
3.2 | | | 244 GB | 167 GB | |
4 | | | 167 GB | 101 GB | |
Copy | | | 101 GB | | 101 GB |
TOTAL | 640 GB | 640 GB | 512 GB | 512 GB | 101 GB |
Alternate solution 2
We can optimize the fourth phase and eliminate the "copy" phase, reducing the workload by 202 GB: phase 4 no longer writes 101 GB to the secondary SSD, and the copy phase no longer reads it back.
Phase 4 would read the output from phase 3.2 and write the final file to the destination disk (-d option).
Phase | Temporary read (SSD) | Temporary write (SSD) | Secondary read (SSD) | Secondary write (SSD) | Destination write (HDD) |
---|---|---|---|---|---|
1 | | 476 GB | | | |
2 | 476 GB | 164 GB | | | |
3.1 | 164 GB | | | 244 GB | |
3.2 | | | 244 GB | 167 GB | |
4 | | | 167 GB | | 101 GB |
Copy | | | | | |
TOTAL | 640 GB | 640 GB | 411 GB | 411 GB | 101 GB |
The temporary / secondary SSD activity is split 61% / 39%.
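The same check applied to both alternate layouts, using their TOTAL rows:

```python
# Reads + writes per SSD, in GB, taken from the two TOTAL rows above.
layouts = {
    "alternate 1": (640 + 640, 512 + 512),
    "alternate 2": (640 + 640, 411 + 411),
}

for name, (temp_gb, secondary_gb) in layouts.items():
    total_gb = temp_gb + secondary_gb
    print(f"{name}: {temp_gb / total_gb:.1%} temporary / "
          f"{secondary_gb / total_gb:.1%} secondary")
# alternate 1: 55.6% temporary / 44.4% secondary
# alternate 2: 60.9% temporary / 39.1% secondary
```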
Flip the SSD roles (-t and -2) with each plot and you will equalize the wear across multiple plots.
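Here is a sketch of what that flip could look like as a simple wrapper; the plot count and loop structure are illustrative assumptions, and ./ProofOfSpace is invoked with the same arguments as the command shown earlier:

```python
import subprocess

# Hypothetical wrapper: alternate which SSD plays the -t (temporary)
# and -2 (secondary) roles so that wear evens out across plots.
ssds = ["/media/temp_ssd_001", "/media/temp_ssd_002"]

for plot_number in range(4):  # illustrative: create 4 plots
    temp = ssds[plot_number % 2]
    secondary = ssds[(plot_number + 1) % 2]
    subprocess.run(
        ["./ProofOfSpace",
         "-b", "4096", "-k", "32", "-r", "4", "-u", "128",
         "-t", temp,
         "-2", secondary,
         "-d", "/media/dest_hdd_001"],
        check=True,
    )
```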
Summary
- With the existing ProofOfSpace code, there is almost no performance gain from using two SSDs as temporary storage (the -t and -2 options).
- The cost of a secondary SSD may be better spent on destination disks (hard disk drives).
- With the current roughly 90% / 10% disk activity split, you would need to replace the temporary SSD about 10 times more often than the secondary SSD; a rough wear estimate follows below.
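To make the wear gap concrete, here is a back-of-the-envelope estimate, assuming a hypothetical SSD rated for 600 TBW (terabytes written); the rating is an illustrative assumption, not a measurement from this analysis:

```python
# Hypothetical endurance rating: 600 TB written, expressed in GB.
TBW_RATING_GB = 600 * 1000

# GB written per plot, from the measurements above (current code).
temp_writes_per_plot = 1051      # -t SSD
secondary_writes_per_plot = 101  # -2 SSD

print(f"plots before the temporary SSD wears out: "
      f"{TBW_RATING_GB // temp_writes_per_plot}")       # ~570
print(f"plots before the secondary SSD wears out: "
      f"{TBW_RATING_GB // secondary_writes_per_plot}")  # ~5,940
# The temporary SSD hits its endurance limit roughly 10x sooner.
```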
A better option would be a second SSD used as another temporary disk (-t option), splitting all of your plot creations across the temporary SSDs. For example:
```
# first plot uses SSD 1 of 2
$ plot create -b 4096 -k 32 -r 4 -u 128 \
    -t /media/temp_ssd_001 \
    -d /media/dest_hdd_001

# second plot uses SSD 2 of 2
$ plot create -b 4096 -k 32 -r 4 -u 128 \
    -t /media/temp_ssd_002 \
    -d /media/dest_hdd_001
```
Chia is an emerging technology and there are opportunities for performance improvements.
Thank you and happy plotting.
Note: I intend to open an issue, make changes to the chiapos source code, and submit a pull request to implement my recommendations.
Technology
The software was written in Python.