Day 13 — Back to Tutorials #2

September 24th, 2020

Hey, all! Welcome back to CryptoCL.

Today I continued my work on the Hands-On OpenCL tutorial. I didn’t make much headway, since I was juggling these tutorials with another project that needed more urgent attention.

Last time, I was having trouble with Exercise 06. I figured out that issue and solved it: the code that was provided was causing the problem. Perhaps it was written with multi-device systems in mind? Simply commenting out that code fixed the issue, and I replaced it by defining a DEVICE variable that sets the device type to the default. After that fix, the program ran the sequential calculation correctly, but then failed on the OpenCL computation. I was stuck on this for an hour, until I noticed that the kernel source code did not have an implementation. I wrote one, but found it wasn’t correct. The Hands-On OpenCL git repository actually provides an implementation in a presentation slideshow file. Once that was in place, the code correctly computed the matrix multiplication!
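
Exercise 06 is matrix multiplication, and the usual first kernel assigns one work-item to each element C[i][j]. The CPU sketch below (not the tutorial’s kernel) emulates that by running a hypothetical per-element function `mmul_element` over every (i, j) pair, the way an N x N NDRange would, and compares the result with a plain sequential triple loop. Square, row-major float matrices are assumed.

```c
#include <stddef.h>

/* Body of a hypothetical per-element kernel. In OpenCL, i and j would come
   from get_global_id(0) and get_global_id(1). Matrices are N x N, row-major. */
static float mmul_element(size_t N, size_t i, size_t j,
                          const float *A, const float *B) {
    float sum = 0.0f;
    for (size_t k = 0; k < N; k++)
        sum += A[i * N + k] * B[k * N + j];
    return sum;
}

/* Emulates an N x N NDRange: one "work-item" per element of C. */
void mmul_ndrange(size_t N, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i * N + j] = mmul_element(N, i, j, A, B);
}

/* Sequential reference used to check the "parallel" result. */
void mmul_sequential(size_t N, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Counts elements of X and Y that differ by more than tol. */
size_t count_mismatches(size_t n_elems, const float *X, const float *Y, float tol) {
    size_t bad = 0;
    for (size_t e = 0; e < n_elems; e++) {
        float d = X[e] - Y[e];
        if (d < 0) d = -d;
        if (d > tol) bad++;
    }
    return bad;
}
```

Because each (i, j) only reads A and B and writes its own element of C, the work-items need no coordination with each other.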

I moved on to Exercise 07, where the goal is to compute the matrix multiplication one row at a time: each work-item calculates a full row of the result instead of a single element at row x, column y. I edited the kernel source, but was unable to get the exercise working.
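
A CPU sketch of the row-per-work-item decomposition, again assuming square row-major matrices. `mmul_row` is a hypothetical stand-in for the kernel body; in OpenCL, `i` would come from `get_global_id(0)` and only N work-items would be launched instead of N x N.

```c
#include <stddef.h>

/* One "work-item": computes every element of row i of C = A * B. */
static void mmul_row(size_t N, size_t i, const float *A, const float *B, float *C) {
    for (size_t j = 0; j < N; j++) {
        float sum = 0.0f;
        for (size_t k = 0; k < N; k++)
            sum += A[i * N + k] * B[k * N + j];
        C[i * N + j] = sum;
    }
}

/* Emulates a 1-D NDRange of N work-items, one per row. */
void mmul_by_rows(size_t N, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < N; i++)
        mmul_row(N, i, A, B, C);
}
```

The trade-off is fewer work-items, each doing N times as much work.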

Next time, I will continue working on my exercises. Dr. Marmorstein has been sick this week, so we may or may not meet tomorrow. Until next time!

Kyle Jenkins.

Time spent today: 3 hours
Total Time: 19 hours

Categories: Uncategorized Tags: ,

Day 12 — Tutorials, Again #1

September 20th, 2020

Hey, all! Welcome to CryptoCL.

Today, I did a few of the Exercises in the Hands-On OpenCL tutorial mentioned yesterday. I completed every exercise up to Exercise #6, where I stopped.

The first two exercises mostly check whether your system is able to run OpenCL. Exercise #3 had you analyze a given program and understand what is happening within it: the program adds two vectors together, reports how many results passed, and prints any sums that were wrong.
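
The checking step in Exercise #3 comes down to recomputing each sum on the host and counting the elements that match. A minimal sketch of that idea; the function name `check_vadd` and the tolerance parameter are assumptions, not the tutorial’s code.

```c
#include <stddef.h>
#include <stdio.h>

/* Compares C against A + B element by element. Prints each mismatch and
   returns how many elements are within tol of the expected sum. */
size_t check_vadd(size_t n, const float *A, const float *B, const float *C, float tol) {
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        float expected = A[i] + B[i];
        float diff = C[i] - expected;
        if (diff < 0) diff = -diff;
        if (diff <= tol)
            correct++;
        else
            printf("mismatch at %zu: %f + %f = %f, got %f\n",
                   i, A[i], B[i], expected, C[i]);
    }
    return correct;
}
```

A tolerance is used rather than exact equality because the device’s floating-point results are not guaranteed to match the host’s bit for bit.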

The fourth one is where things got tricky. It used the same program, but asked you to chain the vector addition three times using new vectors D, E, F, and G: first C = A + B, then D = C + E, and finally F = D + G. I ran the same function three times, swapping in the inputs each step needed, and changed the C and D buffers to read/write. Eventually, I got the program to work after also changing the places where it returned C values so that they return F values instead.
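
The chain in Exercise #4 can be pictured with a plain C stand-in for the kernel. This is a CPU sketch with a hypothetical `vadd` helper. On the GPU, C and D are the buffers that need to be read/write, because each one is written by one launch and read by the next.

```c
#include <stddef.h>

/* Stand-in for the vadd kernel: out[i] = a[i] + b[i]. */
void vadd(size_t n, const float *a, const float *b, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Three launches in sequence: C = A + B, D = C + E, F = D + G.
   C and D are intermediates: written by one step, read by the next. */
void vadd_chain(size_t n, const float *A, const float *B, const float *E,
                const float *G, float *C, float *D, float *F) {
    vadd(n, A, B, C);
    vadd(n, C, E, D);
    vadd(n, D, G, F);
}
```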

The fifth one, I feel, was easier than the fourth. In the fifth exercise, you had to change the program and kernel to add an additional vector, which I named D. Nothing about it was complicated: the only thing I needed to do was add the D vector wherever it was needed.
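
For Exercise #5 the kernel itself grows by one input. Below is a sketch of what such a kernel could look like, stored as a C string, next to a host-side reference for checking results. The kernel name `vadd3`, the argument order, and the `count` guard are assumptions rather than the tutorial’s exact code.

```c
#include <stddef.h>

/* OpenCL C source for a three-input vector add (sketch). Only the output
   buffer d is written, so the three inputs can be declared const. */
static const char *vadd3_src =
    "__kernel void vadd3(__global const float *a,\n"
    "                    __global const float *b,\n"
    "                    __global const float *c,\n"
    "                    __global float *d,\n"
    "                    const unsigned int count) {\n"
    "    int i = get_global_id(0);\n"
    "    if (i < count) d[i] = a[i] + b[i] + c[i];\n"
    "}\n";

/* Host-side reference used to verify the kernel's output. */
void vadd3_reference(size_t n, const float *a, const float *b,
                     const float *c, float *d) {
    for (size_t i = 0; i < n; i++)
        d[i] = a[i] + b[i] + c[i];
}
```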

I stopped at the sixth exercise, which requires creating an OpenCL program from scratch to multiply matrices. However, using the previous programs as a base, I think this exercise will be easy.

Next time, I’ll talk more about the upcoming Exercises in the tutorial. Stay tuned!

Kyle Jenkins.

Time spent today: 1 hour 15 minutes
Total Time: 16 hours

Day 11 — Back to Tutorials

September 19th, 2020

Hey, all. Welcome to CryptoCL!

I met with Dr. Marmorstein yesterday. We discussed our progress toward an OpenCL implementation of BLAKE2. Dr. Marmorstein was quite a bit further along than I was with his BLAKE2b implementation, but he also ran into an issue: the input data he was passing in was still too small. It was larger than mine, which was 100,000, but still too small for the operations we were trying to accomplish.

We made the decision to go back to doing tutorials. Earlier in the project, Dr. Marmorstein found a tutorial called “Hands On OpenCL“, written by Simon McIntosh-Smith and Tom Deakin. The goal of the tutorial is to teach how OpenCL works through a series of exercises. After taking a look at the files, it does seem like a very useful and effective way to learn OpenCL. We decided to spend the week working on these exercises.

This week will definitely be much more tame than the last week or so of development time, but will be essential in exercising and testing our understanding of the OpenCL standard. Expect the next few posts to be about the tutorial.

Thank you, and see you next time!

Kyle Jenkins.

Time spent today: 1 hour
Total Time: 14 hours 45 minutes

Day 10 — 10 Days of CryptoCL, and Ongoing Implementation

September 17th, 2020

Hello, all! Welcome to CryptoCL.

Firstly, thank you for joining me on Day 10 of CryptoCL. Every day I work on CryptoCL, I log my progress and thoughts, and I’m glad you all are joining me on this ongoing project. Thank you!

Back on topic, implementation of OpenCL in BLAKE2s continues. I found a slight workaround for the issue I stopped on last time, which is to simply set the second argument to 0. In the original code, that argument ranges over 0-7 for the 8 calls to G, but I’m only going to work on one instance until things work. I store the result of the kernel operation in “part” variables, p1-p4, for v[0], v[4], v[8], and v[12], respectively. Then, I free the memory these cl objects take up within ROUND and main once I’m done.
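
To see why fixing that argument at 0 lines up with the four “part” results, here is the BLAKE2s mixing function G as specified in RFC 7693, Section 3.1 (rotation constants 16, 12, 8, 7), written as a plain C function rather than the repository’s macro; the name `g_mix` is hypothetical. The call for i = 0 mixes only v[0], v[4], v[8], and v[12], which are exactly the four values stored in p1-p4.

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

/* BLAKE2s G (RFC 7693, Section 3.1): mixes v[a], v[b], v[c], v[d] with two
   message words x and y. No other entry of v is read or written. */
void g_mix(uint32_t v[16], int a, int b, int c, int d, uint32_t x, uint32_t y) {
    v[a] = v[a] + v[b] + x;
    v[d] = rotr32(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];
    v[b] = rotr32(v[b] ^ v[c], 12);
    v[a] = v[a] + v[b] + y;
    v[d] = rotr32(v[d] ^ v[a], 8);
    v[c] = v[c] + v[d];
    v[b] = rotr32(v[b] ^ v[c], 7);
}
```

For i = 0 the call is `g_mix(v, 0, 4, 8, 12, x, y)`, with x and y taken from the message words selected by the round’s permutation.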

Today’s issue was needing to make new definitions of some blake2s functions. The function blake2s_compress is easy to handle, as it is declared and defined within the new blake2s-ref-driver.c file. However, functions like blake2s_init_key, blake2s_update, blake2s_final, and blake2s are declared in the blake2.h file. I could redefine them, but that would break other files in the directory that rely on those functions. Instead, I copied those functions and made new definitions, which differ from the originals by taking the OpenCL objects (like cl_context and cl_kernel) as parameters and by carrying a “_driver” suffix.

I also ran into an issue while compiling, where the compiler couldn’t recognize the OpenCL functions, but then I remembered I had run into this issue before, and added the OpenCL library to my makefile.

The build still has an issue at the moment: there’s an undefined reference to main. I don’t use makefiles often, and this seems to be a problem with producing the output file. I will take a look tomorrow.

Next time, it will be time to finish the implementation and start bug hunting.

Thank you again for 10 days! See you next time!

Kyle Jenkins.

Time spent today: 1 hour 45 minutes
Total Time: 13 hours 45 minutes

Day 9 — The Plan, and OpenCL Implementation into BLAKE2s

September 15th, 2020

Hey all, welcome to CryptoCL.

After talking with Dr. Marmorstein, we decided that each of us will implement one version of BLAKE2 using OpenCL. I am tasked with implementing BLAKE2s.

However, I was a bit confused after our last meeting: I mistook my task to be implementing the ROUND function as an OpenCL kernel, when I actually needed to implement the G function as the kernel. After a quick fix, I began adding various OpenCL calls to the BLAKE2s implementation.

I had to choose whether to create the OpenCL objects in the main function or later on in the ROUND function. Doing the former means the program will exit quickly if there is an issue building the kernel, but these objects have to be passed from main all the way down to wherever ROUND is called. Doing the latter means everything is packed concisely into ROUND, but the program will already have done a lot of work by the time it gets there, and managing the memory safely is harder. Weighing the options, I opted for the former, and decided to create the OpenCL objects within the main function.

Memory objects that were to be used by the kernel, however, will be created within the ROUND function, and disposed of at the end of the ROUND function.

After looking over my work for a long while, and since I have other obligations, I stopped partway through, at the point where the kernel arguments are being set. It was a good place to stop, too: the G function takes an unsigned int i, which ranges over 0-7 within the ROUND function, and I’m not quite sure how to pass this as an argument to the kernel. I might need to understand more of how OpenCL works, so I’ll do some more research and continue next time.
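
One way to think about that unsigned int i is that it only selects which four state words and which two message words a given G call uses. In the reference ROUND, i = 0..3 are the column steps and i = 4..7 are the diagonal steps (RFC 7693, Section 3.2), so a kernel could look up its indices from i. The value of i could either be passed as a scalar argument (set on the host with `clSetKernelArg(kernel, arg_index, sizeof(cl_uint), &i)`) or be derived from `get_global_id(0)`. The table and the `g_args` helper below are a hypothetical plain-C sketch of that lookup.

```c
#include <stdint.h>

/* State-word indices (a, b, c, d) for G call i within one BLAKE2s round:
   i = 0..3 mix columns, i = 4..7 mix diagonals (RFC 7693, Section 3.2). */
static const int G_INDEX[8][4] = {
    {0, 4,  8, 12}, {1, 5,  9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15},
    {0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7,  8, 13}, {3, 4,  9, 14},
};

/* Hypothetical helper: given one row s of the sigma permutation table and
   the call number i, report which v entries and message words G uses. */
void g_args(const uint8_t s[16], unsigned i, int idx[4], int *mx, int *my) {
    for (int k = 0; k < 4; k++)
        idx[k] = G_INDEX[i][k];
    *mx = s[2 * i];      /* message word index for x */
    *my = s[2 * i + 1];  /* message word index for y */
}
```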

Next time, I should be able to finish the OpenCL version of BLAKE2s, and then it will be time to fix errors or bugs (in an ideal world, the program has no errors or bugs and I can move on to testing, but that is not the likely outcome!)

See you next time!

Kyle Jenkins.

Time spent today: 3 hours
Total Time: 12 hours

Day 8 — BLAKE2 Implementation #2, A New Direction?

September 12th, 2020

Hello, all! Welcome to CryptoCL.

Today, I finished implementing BLAKE2b and BLAKE2s. Although I can’t tell exactly what is happening when the programs run, they do appear to work.

I met with Dr. Marmorstein yesterday to discuss our next move. We explored the code and ran into an issue: a core function of BLAKE2, known as function G, runs sequentially. In an ordinary program, this is not a problem. For OpenCL, however, this is grave news. As discussed earlier, OpenCL is used to allow for parallelism. But because G updates values and then uses those updated values later in the same call, it cannot simply be run in parallel: values would not be updated properly, or would even be overwritten.

We decided to brainstorm a new plan of action. In the meantime, I worked on implementing the ROUND function as OpenCL kernels. ROUND calls the function G eight times; the first four calls are independent of one another, and so are the last four. Ergo, we believe that if we run ROUND as two kernels, one for each half, we can achieve at least some form of parallelism.
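
Why the halves split cleanly: within the column half, the four G calls touch disjoint sets of state words, and the same holds within the diagonal half, so no call in a half can overwrite another call’s inputs. The diagonal half does read what the column half wrote, which is why the two kernels have to run one after the other. The hypothetical check below uses the same (a, b, c, d) index table as the reference ROUND to confirm the disjointness.

```c
#include <stdbool.h>

/* (a, b, c, d) for the eight G calls in one round:
   calls 0-3 mix columns, calls 4-7 mix diagonals. */
static const int G_INDEX[8][4] = {
    {0, 4,  8, 12}, {1, 5,  9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15},
    {0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7,  8, 13}, {3, 4,  9, 14},
};

/* True if G calls first..first+3 share no state words, so they could run
   concurrently. Expects first in 0..4. */
bool calls_are_independent(int first) {
    int uses[16] = {0};
    for (int i = first; i < first + 4; i++)
        for (int k = 0; k < 4; k++)
            uses[G_INDEX[i][k]]++;
    for (int w = 0; w < 16; w++)
        if (uses[w] > 1) return false;
    return true;
}
```

`calls_are_independent(0)` and `calls_are_independent(4)` both hold, while a group that straddles the two halves, such as calls 2 through 5, does not.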

For both BLAKE2b and BLAKE2s, I created two files each, named “blake2?_round(x)”, where ? is either b or s, and x is 1 or 2 for the top or bottom half of the ROUND function, respectively. These functions are basically unchanged from the original; the function now has the __kernel qualifier and all variables have the __global qualifier.

Next time, I will meet with Dr. Marmorstein to discuss where to go next with this project. Chances are, we will design our own primitive cryptographic hashing algorithm that is able to run in parallel. One idea is to run operations on pieces of the input in parallel, combining the results until we get the final hash.
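
The “hash pieces in parallel, then combine” idea is essentially tree hashing. The sketch below is a toy illustration only, with no security properties at all: it swaps in 32-bit FNV-1a, a non-cryptographic hash, purely for brevity. Each fixed-size chunk gets its own leaf hash (each leaf could be a separate work-item), and the leaf values are then folded together in order. The names and the 64-byte chunk size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 64  /* hypothetical chunk size in bytes */

/* 32-bit FNV-1a over a byte range, continuing from state h. */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Leaf step: each chunk is hashed independently (the parallel part). */
static uint32_t leaf(const uint8_t *chunk, size_t n) {
    return fnv1a(2166136261u, chunk, n);
}

/* Toy tree hash: hash every chunk, then combine the leaf values in order. */
uint32_t toy_tree_hash(const uint8_t *msg, size_t len) {
    uint32_t root = 2166136261u;
    size_t offset = 0;
    do {
        size_t n = (len - offset < (size_t)CHUNK) ? len - offset : (size_t)CHUNK;
        uint32_t h = leaf(msg + offset, n);
        uint8_t bytes[4] = {(uint8_t)h, (uint8_t)(h >> 8),
                            (uint8_t)(h >> 16), (uint8_t)(h >> 24)};
        root = fnv1a(root, bytes, 4);  /* combine step is sequential */
        offset += n;
    } while (offset < len);
    return root;
}
```

Only the leaf step parallelizes here; a real design would also need the combine step to be secure, which is the hard part.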

Until then, have a good night!

Kyle Jenkins.

Time spent today: 1 hour 45 minutes
Total Time: 9 hours

Day 7 — BLAKE2 Implementation

September 10th, 2020

Hi, all! Welcome to CryptoCL.

Firstly, an update on the issue with the Rob Farber tutorial: After testing it on the lab system, the program successfully ran and passed all the tests. The lab systems have support for OpenCL, while my system at home does not. Ergo, using the lab systems to handle programs that deal with OpenCL will be crucial.

Next, BLAKE2 implementation has begun! We are starting from the BLAKE2b and BLAKE2s (abbreviated B2b and B2s hereafter) implementations provided in the BLAKE2 git repository. After browsing the code, I compiled both versions of B2b and B2s (one version is optimized for speed, while the other favors portability and simplicity, as stated in the README). They all compiled and ran successfully, at least as far as I can tell: looking through the code shows that the program prints “ok” when it runs without errors.

The problem is that the code doesn’t give any other information. When testing cryptographic functions, I would usually feed in a known input and compare the output to the expected result. However, the provided implementation gives no information beyond whether the program ran successfully.
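
A known-answer test is the usual way around this: hash a fixed input and compare against a published digest. RFC 7693 (Appendix B) gives BLAKE2s-256("abc") = 508c5e8c327c14e2e1a72ba34eeb452f37458b209ed63a294d999b4c86675982. Below is a compact, unkeyed, one-shot BLAKE2s-256 written from the RFC as a sketch (not the repository’s blake2s-ref.c; `blake2s256` is a hypothetical name). Its output can be compared byte for byte against that vector or against the reference code’s output for the same input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IV and message schedule from RFC 7693, Sections 2.6 and 2.7. */
static const uint32_t B2S_IV[8] = {
    0x6A09E667u, 0xBB67AE85u, 0x3C6EF372u, 0xA54FF53Au,
    0x510E527Fu, 0x9B05688Cu, 0x1F83D9ABu, 0x5BE0CD19u};
static const uint8_t B2S_SIGMA[10][16] = {
    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
    {14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
    {11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
    {7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
    {9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
    {2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9},
    {12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11},
    {13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10},
    {6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5},
    {10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0}};

static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Mixing function G (Section 3.1). */
static void b2s_g(uint32_t v[16], int a, int b, int c, int d, uint32_t x, uint32_t y) {
    v[a] += v[b] + x; v[d] = rotr32(v[d] ^ v[a], 16);
    v[c] += v[d];     v[b] = rotr32(v[b] ^ v[c], 12);
    v[a] += v[b] + y; v[d] = rotr32(v[d] ^ v[a], 8);
    v[c] += v[d];     v[b] = rotr32(v[b] ^ v[c], 7);
}

/* Compression function F (Section 3.2): t counts bytes so far, last flags the final block. */
static void b2s_compress(uint32_t h[8], const uint8_t block[64], uint64_t t, int last) {
    uint32_t v[16], m[16];
    for (int i = 0; i < 16; i++)  /* message words are little-endian */
        m[i] = (uint32_t)block[4 * i] | ((uint32_t)block[4 * i + 1] << 8) |
               ((uint32_t)block[4 * i + 2] << 16) | ((uint32_t)block[4 * i + 3] << 24);
    for (int i = 0; i < 8; i++) { v[i] = h[i]; v[i + 8] = B2S_IV[i]; }
    v[12] ^= (uint32_t)t;
    v[13] ^= (uint32_t)(t >> 32);
    if (last) v[14] = ~v[14];
    for (int r = 0; r < 10; r++) {
        const uint8_t *s = B2S_SIGMA[r];
        b2s_g(v, 0, 4, 8, 12, m[s[0]], m[s[1]]);    /* columns */
        b2s_g(v, 1, 5, 9, 13, m[s[2]], m[s[3]]);
        b2s_g(v, 2, 6, 10, 14, m[s[4]], m[s[5]]);
        b2s_g(v, 3, 7, 11, 15, m[s[6]], m[s[7]]);
        b2s_g(v, 0, 5, 10, 15, m[s[8]], m[s[9]]);   /* diagonals */
        b2s_g(v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
        b2s_g(v, 2, 7, 8, 13, m[s[12]], m[s[13]]);
        b2s_g(v, 3, 4, 9, 14, m[s[14]], m[s[15]]);
    }
    for (int i = 0; i < 8; i++) h[i] ^= v[i] ^ v[i + 8];
}

/* Unkeyed, one-shot BLAKE2s with a 32-byte digest. */
void blake2s256(const uint8_t *in, size_t len, uint8_t out[32]) {
    uint32_t h[8];
    uint8_t block[64] = {0};
    uint64_t t = 0;
    memcpy(h, B2S_IV, sizeof h);
    h[0] ^= 0x01010000u ^ 32u;  /* depth 1, fanout 1, no key, 32-byte output */
    while (len > 64) {          /* every block except the last */
        t += 64;
        b2s_compress(h, in, t, 0);
        in += 64;
        len -= 64;
    }
    memcpy(block, in, len);     /* final block, zero-padded */
    t += len;
    b2s_compress(h, block, t, 1);
    for (int i = 0; i < 32; i++) out[i] = (uint8_t)(h[i / 4] >> (8 * (i % 4)));
}
```

Once a sequential version like this matches the test vector, the same inputs can be fed to an OpenCL version and the two digests compared.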

Tomorrow, I will continue looking through the code, both to study how to build the OpenCL-compatible version and to see how I can test the code accurately.

Have a good day!

Kyle Jenkins.

Time spent today: 1 hour
Total Time: 7 hours 15 minutes

Day 6 — Tutorial #2, Decisions, and Name Change

September 9th, 2020

Hi, all. Welcome to CryptoCL.

Firstly, the name change. I think the name change was only a matter of time. I’m a bit disappointed it’s not a complete acronym, but I think there will be… fewer problems with this new one, so that should definitely outweigh the cons.

Now for the actual content of the blog post: today I began implementing the Rob Farber tutorial. An interesting difference from the Erik Smistad tutorial from previous posts is that, besides Farber’s being written in C++ and Smistad’s in C, Farber chose to store his kernel source code as a constant character array rather than in its own file. I think I prefer Smistad’s method, however, as it keeps the kernel files separate from the main files and easier to find.
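
The two styles differ only in where the kernel text comes from before it is handed to `clCreateProgramWithSource`, which takes an array of C strings (plus optional lengths). A minimal sketch of the separate-file approach, with an embedded string shown for comparison; `load_kernel_source` and the `square` kernel are hypothetical examples, not code from either tutorial.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole kernel file into a NUL-terminated heap buffer.
   Returns NULL on failure; the caller frees the result. */
char *load_kernel_source(const char *path, size_t *len) {
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);
    char *src = malloc((size_t)size + 1);
    if (!src) { fclose(fp); return NULL; }
    size_t got = fread(src, 1, (size_t)size, fp);
    fclose(fp);
    src[got] = '\0';
    if (len) *len = got;
    return src;
}

/* The embedded alternative: the kernel text lives in the host source. */
static const char *embedded_src =
    "__kernel void square(__global const float *in, __global float *out) {\n"
    "    int i = get_global_id(0);\n"
    "    out[i] = in[i] * in[i];\n"
    "}\n";
```

Either string ends up in the same place. The file version lets the kernel be edited without recompiling the host program, at the cost of needing the file present at run time.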

The same deprecated clCreateCommandQueue call from Smistad’s tutorial was also present in Farber’s tutorial, and I fixed it the same way as last time: by adding “WithProperties” to the end of the function name. I also ran into a simple bug where I forgot to include the stdc++ library in my compile command.

The program now compiles and runs. However, the results are not what I expected them to be. This may just be an issue with the remote access, and it needs to be tested in person on a lab machine. I double-checked by running Smistad’s test remotely, which also didn’t work correctly.

By the next blog post, I will run the program to make sure things are running smoothly. I will also begin the actual implementation of BLAKE. We are deciding which version of BLAKE2 to implement: BLAKE2b or BLAKE2s. BLAKE2b is optimized for 64-bit platforms, while BLAKE2s is optimized for 8- to 32-bit platforms, as explained in Section 1 (Introduction and Terminology) of the BLAKE2 RFC. We may even implement both!

See you next time, and thank you for reading!

Kyle Jenkins.

Time spent today: 1 hour 15 minutes
Total Time: 6 hours 15 minutes

Day 5 — Tutorial #1 Complete and Research Findings

September 6th, 2020

Hi, all. Welcome back to ICOC.

Today was a somewhat slow day. I decided to do a little bit of work towards the project, but not much — mostly some research.

However, Dr. Marmorstein contacted me about the driver update. The drivers have finished updating, but the error that caused the Erik Smistad tutorial to fail turned out not to be a driver issue at all: it was a bug in the program. The output vector, C, had accidentally been given the const qualifier. After that was fixed, the program ran correctly. Each element A[i] counts up by 1 starting from 0, and each element B[i] counts down by 1 starting from 1024, so every printed sum came out to 1024.
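
The fix is easiest to see in the kernel signature. The two inputs can be const, but the output cannot be, because the kernel writes to it, and OpenCL C (like C) does not allow assignment through a pointer to const. Below is a sketch of a correct signature for this kind of kernel, plus a CPU run of the same data pattern. It is a reconstruction for illustration, not Smistad’s exact code, and `run_vector_add_cpu` is a hypothetical name.

```c
#include <stddef.h>

/* OpenCL C kernel: inputs A and B are const, the output C is not. */
static const char *vector_add_src =
    "__kernel void vector_add(__global const int *A,\n"
    "                         __global const int *B,\n"
    "                         __global int *C) {\n"
    "    int i = get_global_id(0);\n"
    "    C[i] = A[i] + B[i];\n"
    "}\n";

/* Fills A with 0, 1, 2, ... and B with n, n-1, ..., 1, then adds them the
   way each work-item would; every C[i] should equal n. */
void run_vector_add_cpu(size_t n, int *A, int *B, int *C) {
    for (size_t i = 0; i < n; i++) {
        A[i] = (int)i;
        B[i] = (int)(n - i);
    }
    for (size_t i = 0; i < n; i++)  /* one iteration per work-item */
        C[i] = A[i] + B[i];
}
```

With n = 1024, every C[i] is 1024, matching the output described above.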

I had begun looking into the other tutorial from Rob Farber, when Dr. Marmorstein shared with me a GitHub repository from a user whr, entitled “clblake.” It appears to be a very similar project to the one being conducted here, except rather than using BLAKE2, whr chose to use BLAKE256, instead. Regardless, given how closely the project resembles ours, we’re adding whr’s clblake repository as a “Previous Work” credit.

That was everything done today. Again, a shorter day, but next time, I will be implementing the Rob Farber tutorial as my second and final tutorial. Then, it’s time to start with the real project!

Thank you, and see you next time!

Kyle Jenkins.

Time spent today: 1 hour
Total Time: 5 hours

Day 4 — Tutorial and Driver Update

September 5th, 2020

Hi, all, welcome back to ICOC.

On the agenda today was scouring the web for tutorials to get familiar with the OpenCL standard, and so far, I’ve found two tutorials that I want to try. One was written by a member of the Khronos Group, Rob Farber (which we’ll talk about later), while the other was written by Erik Smistad.

The first tutorial I decided to implement was the one written by Erik Smistad, titled “Getting started with OpenCL and GPU Computing.” This was an example that Dr. Marmorstein highlighted as we were forming the research project. The tutorial adds the contents of two vectors together. While this can be done easily with a simple for loop, Smistad implements it using the OpenCL standard to allow for GPU computing. Instead of a single loop that takes linear, or O(n), time with n being the size of the vectors, the additions are spread across the processor’s cores, so the running time depends on how many cores can work at once, which can speed up the computation.

Following Smistad’s guide, I was able to replicate this vector addition program, creating a main program and a kernel program in which the computation occurs on the GPU. The only issues to address were that the OpenCL function “clCreateCommandQueue” is deprecated and needed to be replaced with “clCreateCommandQueueWithProperties”, and that the lab system I was using did not have an updated driver. Dr. Marmorstein is updating the drivers on all of the lab systems, which should fix that problem. Thankfully, one of the lab systems had the driver partially installed, so a remote login allowed me to compile the program.

When I ran the program, another problem appeared. The program prints the items at index i in vectors A and B, adds them together, and prints the result C on one line. However, all of the sums are wrong: they are either zero or ridiculously high numbers. At the moment, this does not appear to be user error, as Dr. Marmorstein was able to confirm the error as well. Here’s hoping that the driver update will help with this error…

That being said, that is all of the progress today. Tomorrow will be a day to work on more research and make sure that this tutorial is completed successfully.

Thank you, and see you next time!

Kyle Jenkins.

Time spent today: 1 hour 15 minutes
Total Time: 4 hours