cid icon indicating copy to clipboard operation
cid copied to clipboard

Use ex_multihash instead of :crypto.hash(:sha512, input)

Open nelsonic opened this issue 7 years ago • 11 comments

At present, the Erlang :crypto.hash(:sha512, input) is called directly when creating a cid: https://github.com/dwyl/cid/blob/3e8eb8267a7045910381dbe16bcce87f97922694/lib/cid.ex#L25

In light of the fact that https://github.com/ipld/cid achieves a pretty similar goal, I propose that we use multihash instead of directly creating a :sha512 hash. This will ensure forward compatibility and also mean that we can use the JS cid function on the client.

Todo

  • [x] figure out how to use https://github.com/multiformats/ex_multihash to hash content.
  • [x] At present the ex_multihash/test/multihash_test.exs is only doing a doctest ... 🙄 https://github.com/multiformats/ex_multihash/blob/dd3ce7e39d053049dbd7219c2aba2dbde9940bb2/test/multihash_test.exs#L3
    • [x] Create tests for https://github.com/multiformats/ex_multihash 🆕
  • [x] Submit a Pull Request with the tests.
  • [x] See if it gets accepted! 🤞

nelsonic avatar Dec 10 '18 17:12 nelsonic

If you need more context on the IPFS/IPLD CID function, see: https://github.com/nelsonic/learn-ipfs/issues/8 and https://github.com/nelsonic/learn-ipfs/issues/5

nelsonic avatar Dec 10 '18 17:12 nelsonic

@nelsonic I forked the repo and installed excoveralls to check the code coverage.

I know that they currently only have doctests but I figured if those doctests actually worked then all should be okay with them (unless I am misunderstanding why you might want separate tests)

image

COV    FILE                                        LINES RELEVANT   MISSED
 95.8% lib/multihash.ex                              261       24        1
  0.0% lib/util.ex                                    23        5        5
[TOTAL]  79.3%

There are no tests in the util file. There are only 2 function definitions however and one is a private function that is called by the other function. (What I am trying to say (UNLCEARLY) is that we would only need to test one function here)

There is only one line inmultihash.ex that is not being.

      do: encode(function, digest, length)

It is in the following function

  @spec encode(binary, binary, integer_default) :: on_encode
  def encode(<<_hash_code>> = hash_code, digest, length) when is_binary(digest) do
    with {:ok, function} <- get_hash_function(hash_code),
      do: encode(function, digest, length)
  end

I have called some of the functions in an iex session and have been able to replicate the results from the documentation. I haven't been able to test the function above though.

I can call it but I am not sure what argument to pass in to get past the following line

    with {:ok, function} <- get_hash_function(hash_code) do

as the get_hash_function function is keeps retuning an error for me.

With regards to this point

figure out how to use https://github.com/multiformats/ex_multihash to hash content.

I wouldn't say that I fully understand everything about hashing (right now) but the elixir code in this module is very well written and fairly straightforward to understand. (Plus the examples are pretty great)

RobStallion avatar Dec 11 '18 20:12 RobStallion

@RobStallion doctests are only testing the signature of the functions. I ran the doctests locally too: image Warnings galore.

Did you push a commit to GitHub with your addition of excoveralls?

nelsonic avatar Dec 11 '18 21:12 nelsonic

@nelsonic the doc tests do appear to be testing the functions

iex> Multihash.encode(:sha1, :crypto.hash(:sha, "Hello"))
{:ok, <<17, 20, 247, 255, 158, 139, 123, 178, 224, 155, 112, 147, 90, 93, 120, 94, 12, 197, 217, 208, 171, 240>>}

This calls the functions with the above arguments and expects the result below. These results can be replicated in your iex shell as well.

Also, if you change the returned binary in anyway the test fails.

The warnings all appear to be of the following... @doc attribute is always discarded for private functions/macros/types

The only reason this appears to be a thing is because the devs decided to put documentation on their private helper functions.

RobStallion avatar Dec 12 '18 09:12 RobStallion

Yeah, Docs in private functions are a good thing. I don't know why Elixir considers them a warning. 🙄 I'm quite confident in the doctest for most of these functions, I think we should consider writing a test for Util.sum for completeness. Creating a PR improving the test coverage of a project is more of a test of the "responsiveness" of the repo/package/module maintainer. i.e. do they accept PRs that don't change the code but simply improve it's overall reliability?

nelsonic avatar Dec 12 '18 10:12 nelsonic

@nelsonic Upon further inspection, the Multihas.Util/sum/2 function just ends up calling Multihash.encode/3 with a hash_type of :sha1, :sha2_256, :sha2_512.

This is probably the reason that they did not include the tests in the first place. (I should have noticed this on my first pass through 😞)

I can and have added tests for this now (in the same doc test format as the maintainers) and opened a pr here. Do you still want me to open a PR with them?

RobStallion avatar Dec 12 '18 12:12 RobStallion

@nelsonic

figure out how to use https://github.com/multiformats/ex_multihash to hash content.

The code below is a modified version of the current Cid.make/2.

It can be tidied and the variables can be given better names (didn't think this was important for proof of concept) but it shows that we can use ex_multihash and still allow our tests to pass.

Original

  def make(input, length \\ 32) do
    hash = :crypto.hash(:sha512, input)

    hash
    |> Base.encode64()
    |> String.replace(~r/[Il0oO=\/\+]/, "", global: true)
    |> String.slice(0..(length - 1))
  end

With ex_multihash

  def make(input, length \\ 32) do
    hash1 = :crypto.hash(:sha512, input)
    {:ok, <<_multihash_code, _length, hash2::binary>>} = Multihash.encode(:sha2_512, hash1)

    hash2
    |> Base.encode64()
    |> String.replace(~r/[Il0oO=\/\+]/, "", global: true)
    |> String.slice(0..(length - 1))
  end

Multihash takes a hash and prepends info onto it.

<hash-func-type><digest-length><digest-value>

In the example above I am simply pattern matching the first 2 values, doing nothing with them and using the hash2 value in what was the original function call.

This is not how we would use it in production I'm sure, but it does show what we can use the multihash_code or the length variables for

RobStallion avatar Dec 13 '18 15:12 RobStallion

@RobStallion this is a good start. thanks! 👍 What is your appetite for doing a "deep dive" into how the CID function works?

nelsonic avatar Dec 13 '18 15:12 nelsonic

@nelsonic I am up for diving deep 💯, but just want to be clear on what you mean by CID function.

Do you mean that this example in the readme shows? Essentially create the functionality that the readme talks about?

RobStallion avatar Dec 13 '18 16:12 RobStallion

@RobStallion using multihash is part of the puzzle but it's only the "four corners" of the puzzle. Take a look at https://proto.school https://github.com/nelsonic/learn-ipfs/issues/5

nelsonic avatar Dec 13 '18 16:12 nelsonic

GOTO: #11

nelsonic avatar Dec 13 '18 20:12 nelsonic