esqueleto icon indicating copy to clipboard operation
esqueleto copied to clipboard

Calling `sum_` on an `Int` column and trying to use the result as an `Int` leads to a marshal error

Open tysonzero opened this issue 8 years ago • 17 comments

It appears that (at least in Postgresql), summing a list of ints gives back a numeric. So when you call sum_ from esqueleto on an Int column and use the result as an Int you get a persist marshal error. IMO this should probably be solved on esqueleto's end, by either inserting an explicit cast or by restricting the return type of sum_ to ones which will not raise a marshal error.

tysonzero avatar Mar 23 '18 03:03 tysonzero

The type this should have could vary by database. This is very annoying, sorry for the hassle!

bitemyapp avatar Mar 23 '18 14:03 bitemyapp

@bitemyapp Can you think of any way to approach this to prevent this from happening? It would be really nice if Esqueleto could statically prevent all marshalling errors at compile time.

One idea I can think of is by having expr be parameterized by the database type, expr db, but have basically every function not touch it, such as not_ :: expr db (Value Bool) -> expr db (Value Bool), then for functions like sum_ you could have the type (PersistField a, PersistField b, ValidSum db a b) => expr db (Value a) -> expr db (Value (Maybe b)).

I realize that would be a very big change so I'm hoping there are other approaches that can be looked into first.

tysonzero avatar Mar 23 '18 18:03 tysonzero

It would be really nice if Esqueleto could statically prevent all marshalling errors at compile time.

Yes! I consider this totally unacceptable and I'd like to fix it!

One idea I can think of is by having expr be parameterized by the database type

That's mostly not possible with the way Esqueleto is currently structured. I have a placeholder project for a rewrite at https://github.com/bitemyapp/bluetick but until I get time carved out I can only keep the lights on here.

I actually tried to create different SqlExpr types and Esqueleto instances for each database for different but related reasons and it didn't go well.

bitemyapp avatar Mar 23 '18 19:03 bitemyapp

Yeah it would be a pretty drastic and structural change, and I'm sure it will have it's own downsides, for example I think building queries that work on multiple db's might lead to really large type signatures, as constraints would be added for any non-universal thing you do.

Perhaps for now explicit casts could be used for sum_?

tysonzero avatar Mar 23 '18 21:03 tysonzero

Perhaps for now explicit casts could be used for sum_?

I would need to see what you're proposing and how it would affect PostgreSQL, MySQL, and SQLite users.

bitemyapp avatar Mar 23 '18 21:03 bitemyapp

What I am proposing is that for queries that are less constrained like sum_, an explicit cast is added to the type you end up using it as. So if you use the result of sum_ as an Int, the generated SQL includes a cast to the SQL type equivalent to a Haskell Int.

As far as I am aware this should work in strictly more situations than without a cast.

tysonzero avatar Mar 23 '18 22:03 tysonzero

Could perhaps a cast function be created so you can explicitly cast whenever you want? It seems like floor_ doesn't fix this issue either, so for now I'll just change the Haskell type I Marshall the number into, and then do the conversion on the Haskell side, unless there is a better workaround.

tysonzero avatar Apr 10 '18 22:04 tysonzero

I think I am running into a related problem with sum_

I have a query which results in a 4-tuple, using (sum_, count, min_, max_) . The sum is over a Word64 field which i have constrained to be sqltype=uinteger where uinteger is specified using CREATE DOMAIN uinteger AS bigint CHECK (VALUE >= 0). The result of the above query gets inserted as a row in another table.

However, the query just hangs the whole program.If I turn on SQL logging, the generated SQL looks good, and I can run it successfully manually. This suggests that the query and the returned result is fine and that the failure is when esqueleto reads the result. The odd thing is that this just hangs. There is no compile time or run time error reported.

If I remove the sum_ operation, or replace it with count, the query completes successful in a tiny fraction of a second.

I have tried using Natural on the Haskell side, and specifying the result of the query to be inserted as a Natural (sqltype-numeric(32,0)) and that still hangs.

Hopefully I will figure out what is going on here and be able to report a fix or workaround.

erikd avatar Feb 10 '20 23:02 erikd

The hanging behavior has me suspecting the PersistField instance for the type you're using doing some infinite recursion. Can you post the code for that?

parsonsmatt avatar Feb 10 '20 23:02 parsonsmatt

I have not hand defined any PersistField instances. I am just using the defaults for the existing data types. The query as it is now is:

queryEpochEntry :: MonadIO m => Word64 -> ReaderT SqlBackend m (Either ExplorerNodeError Epoch)
queryEpochEntry epochNum = do
    res <- select . from $ \ (tx `InnerJoin` blk) -> do
              on (tx ^. TxBlock ==. blk ^. BlockId)
              where_ (blk ^. BlockEpochNo ==. just (val epochNum))
              pure $ (sum_ (tx ^. TxOutSum), count (tx ^. TxOutSum), min_ (blk ^. BlockTime), max_ (blk ^. BlockTime))
    case listToMaybe res of
      Nothing -> pure $ Left (ENEEpochLookup epochNum)
      Just x -> pure $ convert x
  where
    convert :: (ValMay Natural, Value Word64, ValMay UTCTime, ValMay UTCTime) -> Either ExplorerNodeError Epoch
    convert tuple =
      case tuple of
        (Value (Just outSum), Value txCount, Value (Just start), Value (Just end)) ->
            Right $ Epoch outSum txCount epochNum start end
        _other -> Left $ ENEEpochLookup epochNum

I am also pretty certain its not an infinite recursion, because it is not chewing up CPU time. Its not even listed in top.

erikd avatar Feb 10 '20 23:02 erikd

I made a test case to reproduce the issue:

    describe "sum bug?" $ do
      it "works" $ do
        [(Value msum', Value count', Value _minm, Value _maxm)] <- run $ do
          let
            insertBlock i = do
              now <- liftIO getCurrentTime
              insert (Block i now)

          [_, bid, _, bid2, _] <- traverse insertBlock [3, 7, 10, 7, 11]

          forM_ [1 .. 10] $ \i -> do
            insert_ $ Tx i bid

          forM_ [5 .. 15] $ \i ->
            insert_ $ Tx i bid2

          select $
            from $ \(t `InnerJoin` b) -> do
            on $ t ^. TxBlock ==. b ^. BlockId
            where_ $ b ^. BlockEpochNo ==. val 7
            pure
              ( sum_ $ t ^. TxOutSum
              , count $ t ^. TxOutSum
              , min_ $ b ^. BlockTime
              , max_ $ b ^. BlockTime
              )

        msum' `shouldBe` Just (sum [1..10 :: Natural] + sum [5 .. 15])
        count' `shouldBe` (21 :: Int)

I get a test failure on postgresql:

esqueleto         > Failures:
esqueleto         >
esqueleto         >   test/Common/Test.hs:2298:7:
esqueleto         >   1) Tests that are common to all backends, sum bug?, works
esqueleto         >        uncaught exception: PersistException
esqueleto         >        PersistMarshalError "Failed to parse Haskell type `Natural`; expect
ed integer from database, but received: PersistRational (165 % 1). Potential solution: Check t
hat your database schema matches your Persistent model definitions."
esqueleto         >
esqueleto         >   To rerun use: --match "/Tests that are common to all backends/sum bug?/w
orks/"
esqueleto         >
esqueleto         > Randomized with seed 1489286075
esqueleto         >
esqueleto         > Finished in 44.2547 seconds
esqueleto         > 213 examples, 1 failure, 2 pending
esqueleto         > Test suite postgresql failed

But no such failure on sqlite

esqueleto         > Test suite sqlite passed

I'm not sure what the issue is, exactly :\

parsonsmatt avatar Feb 11 '20 00:02 parsonsmatt

Hmm, i wonder if something in my app is catching and silently discarding exceptions :scream: .

And it seem a little odd to me that summing a column of Word64 results in a Rational.

erikd avatar Feb 11 '20 00:02 erikd

Hmm, Rational seems to be the key. Changing convert to:

    convert :: (ValMay Rational, Value Word64, ValMay UTCTime, ValMay UTCTime) -> Either ExplorerNodeError Epoch
     convert tuple =
       case tuple of
         (Value (Just outSum), Value txCount, Value (Just start), Value (Just end)) ->
            Right $ Epoch (fromIntegral $ numerator outSum) txCount epochNum start end
         _other -> Left $ ENEEpochLookup epochNum

gets me past the hang, only to hang in another place.

erikd avatar Feb 11 '20 00:02 erikd

Turns out that an incorrect usage of async was causing exceptions to be caught and held. Since the wait on the async was never called, the exception was never propagated/reported.

Onto the esqeleto issues, the sum_ was definitely resulting in a PersistRational field (which a denominator of 1) but the count also occasionally gets returned as a PersistRational. Why is the row count returned as a Rational ? Interestingly this seems to happen on an repsert action while the insert action works as expected.

PersistMarshalError "Couldn't parse field `txCount` from table `epoch`. Failed to parse 
    Haskell type `Word64`; expected integer from database, but received: PersistRational (1393 % 1). 
    Potential solution: Check that your database schema matches your Persistent model definitions."

erikd avatar Feb 11 '20 08:02 erikd

And then I repalced repsert with replace and it worked as expected. I suspect repsert is not quite right.

erikd avatar Feb 11 '20 08:02 erikd

Can you file another bug for the repsert issue? Plus an example query and models would be super helpful :smile:

Row count should always return bigint which should work fine with Word64. I would need to see more of your code to understand why it's getting coerced into a Rational.

parsonsmatt avatar Feb 11 '20 16:02 parsonsmatt

Restricting the type of sum_ is actually tricky - we need to put a CAST in for the output type, which may be a bit more involved than we'd like.

There's an upstream fix in persistent we can do.

parsonsmatt avatar Feb 06 '24 00:02 parsonsmatt