DistributedFactorGraphs.jl

Investigate splitting factor operational part from 'constant' part of factor

Open Affie opened this issue 2 years ago • 12 comments

Why?

  • Simplify serialization, pack/unpack factor only depends on the factor and does not need the full factor graph
  • Simplify PackedFunctionNodeData{T}, GenericFunctionNodeData{T} where T <: Union{<:AbstractPackedFactor, <:AbstractFactor, <:FactorOperationalMemory}

Questions to be answered. If the factor can be split so that the solver data contains only working memory, serialization becomes easy: the solver data is simply ignored when packing and rehydrated when unpacked for use by the solver, and there may be no need for a PackedFactor at all.

  • What fields are always constant?
    • DF: Almost everything in FactorNodeData is constant and can move to FactorDFG -- but we need to check how the CCW is built on top of FactorCompute.
  • Will it impact more complicated factor types? No, this should not touch the current usrfun field functionality.
  • Should we build the factor data (w.r.t. the compute part, FactorOperationalMemory/CCW) lazily from DFG and only "inflate" it when used in IIF? That is, move reconstFactorData out of DFG?
    • JT: I believe so; that way we don't need a factor graph to unpack a factor, and we don't need to calculate the CCW if it is never used (e.g. plotting).

'constant' in this case means not changed by the solver

In other words: should we move the non-FactorOperationalMemory fields to DFGFactor (maybe fnc, multihypo, nullhypo, inflation)? That way we split solverData <: FactorOperationalMemory from the constants.
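As a rough illustration of that split, here is a minimal Julia sketch. All names are hypothetical stand-ins, not the actual DFG/IIF types: the constants live with the stored factor, while the solver working memory sits in a separate object that can simply be dropped on serialization.

```julia
# Hypothetical stand-ins, not the actual DFG/IIF types.
abstract type AbstractFactorSketch end
abstract type OperationalMemorySketch end

struct PriorSketch <: AbstractFactorSketch   # stand-in user factor
    z::Float64
end

# 'constant' part: set at construction, never mutated by the solver,
# so it can be serialized as-is with the stored factor.
struct FactorConstants{T <: AbstractFactorSketch}
    fnc::T
    multihypo::Vector{Float64}
    nullhypo::Float64
    inflation::Float64
end

# operational part: solver working memory, rehydrated on demand and
# simply dropped when packing the factor.
mutable struct SolverMemory <: OperationalMemorySketch
    varidx::Int
    res::Vector{Float64}
end

consts = FactorConstants(PriorSketch(1.0), Float64[], 0.0, 5.0)
mem    = SolverMemory(1, zeros(1))   # built only when the solver runs
```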

Affie avatar Mar 23 '23 11:03 Affie

Note on CCW: currently only hyporecipe.certainhypo and usrfnc! are pulled out of the CCW to be persisted; the rest is thrown away. There is also a code TODO: improve _createCCW recovery of the hypotheses and certainhypo fields when reconstituting from stored data during deserialization.

Affie avatar Mar 03 '25 10:03 Affie

Factor Field Comparison

| Field                 | FactorDFG            | FactorCompute                             |
| --------------------- | -------------------- | ----------------------------------------- |
| id                    | Union{UUID, Nothing} | Union{UUID, Nothing}                      |
| label                 | Symbol               | Symbol                                    |
| tags                  | Vector{Symbol}       | Set{Symbol}                               |
| _variableOrderSymbols | Vector{Symbol}       | NTuple{N, Symbol}                         |
| timestamp             | ZonedDateTime        | ZonedDateTime                             |
| nstime                | String               | Nanosecond                                |
| solvable              | Int                  | Base.RefValue{Int}                        |
| metadata              | String               | smallData::Dict{Symbol, SmallDataTypes}   |
| solverData            | data::String         | Base.RefValue{GenericFunctionNodeData{T}} |
| fnctype               | String               | Not present                               |
| _version              | String               | Not present                               |
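The table implies a handful of mechanical conversions when moving between the two representations. A sketch of three of them, using only stdlib types (illustrative one-liners, not the actual DFG conversion code):

```julia
using Dates  # stdlib, for Nanosecond

# Field names follow the comparison table; these conversions are
# illustrative, not the actual DFG implementation.
tags_compute = Set([:FACTOR, :ODOMETRY])
tags_dfg     = sort!(collect(tags_compute))     # Set{Symbol} -> Vector{Symbol}

nstime_compute = Nanosecond(1234)
nstime_dfg     = string(nstime_compute.value)   # Nanosecond -> String

solvable_dfg     = 1
solvable_compute = Ref(solvable_dfg)            # Int -> Base.RefValue{Int}
```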

Affie avatar Mar 17 '25 16:03 Affie

```julia
@kwdef struct CommonConvWrapper{T <: AbstractFactor, VT <: Tuple, TP <: Base.RefValue{<:Tuple}, CT,  AM <: AbstractManifold, HR <: HypoRecipeCompute, MT, G} <: FactorOperationalMemory
  usrfnc!::T
  fullvariables::VT
  varValsAll::TP
  dummyCache::CT = nothing
  manifold::AM = getManifold(usrfnc!)
  partialDims::Vector{<:Integer} = collect(1:manifold_dimension(manifold))
  partial::Bool = false
  nullhypo::Float64 = 0.0
  inflation::Float64 = SolverParams().inflation
  hyporecipe::HR = HypoRecipeCompute(;activehypo=collect(1:length(varValsAll)))
  measurement::Vector{MT} = Vector(Vector{Float64}())
  varidx::Base.RefValue{Int} = Ref(1)
  particleidx::Base.RefValue{Int} = Ref(1)
  res::Vector{Float64} = zeros(manifold_dimension(manifold))
  _gradients::G = nothing
end

@kwdef mutable struct GenericFunctionNodeData{T <: Union{<:AbstractPackedFactor, <:AbstractFactor, <:FactorOperationalMemory},}
    eliminated::Bool = false
    potentialused::Bool = false
    edgeIDs::Vector{Int} = Int[]
    fnc::T
    multihypo::Vector{Float64} = Float64[]
    certainhypo::Vector{Int} = Int[]
    nullhypo::Float64 = 0.0
    solveInProgress::Int = 0
    inflation::Float64 = 0.0
end
```

Overlap: fnc (almost), nullhypo and inflation
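The overlap can be seen in a reduced sketch of the two structs, trimmed to only the shared fields (hypothetical names, not the actual definitions):

```julia
# Trimmed, hypothetical versions of the two structs above, keeping only
# the fields they share; not the actual definitions.
Base.@kwdef mutable struct NodeDataSketch{T}
    fnc::T
    nullhypo::Float64 = 0.0
    inflation::Float64 = 0.0
end

Base.@kwdef struct CCWOverlapSketch{T}
    usrfnc::T
    nullhypo::Float64 = 0.0
    inflation::Float64 = 0.0
end

# The compute wrapper duplicates the stored values field-for-field.
nd  = NodeDataSketch(; fnc = sin, nullhypo = 0.1)
ccw = CCWOverlapSketch(; usrfnc = nd.fnc, nullhypo = nd.nullhypo, inflation = nd.inflation)
```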

Affie avatar Mar 17 '25 16:03 Affie

It looks like the main differences can be summarized as:

| Field          | FactorDFG             | FactorCompute                 |
| -------------- | --------------------- | ----------------------------- |
| observationFnc | Serialized JSONObject | T <: ObservationFunction      |
| computeMemory  | Not Serialized        | C <: FactorOperationalMemory* |
  • *I think FactorOperationalMemory should be lazily initialized

Affie avatar Mar 17 '25 17:03 Affie

Also rename to _State as part of this refactor. #1121

Does Observation live next to or under State?

FactorCompute

  • state
  • observation
  • computeMemory

or

FactorCompute

  • state
    • observation
  • computeMemory

Note: some state and observation fields may be copied to computeMemory as one CCW object.

Affie avatar Mar 26 '25 18:03 Affie

I went through the code and the only thing that might not be ideal is we may need something like computeMem::Base.RefValue{FactorOperationalMemory}

Advantages:

  • unblocks #1077
  • unblocks IIFTypes and RoMETypes subprojects
  • We can do lazy inflation of OperationalMemory (CCW)
  • Simplify serialization, pack/unpack factor only depends on the factor and does not need the full factor graph
  • Simplify PackedFunctionNodeData{T}, GenericFunctionNodeData{T} where T <: Union{<:AbstractPackedFactor, <:AbstractFactor, <:FactorOperationalMemory}

The disadvantage is that we either keep dynamic dispatch or rebuild the entire FactorCompute when the ccw is needed.

  • There will still be dynamic dispatch on ccw, since FactorOperationalMemory is an abstract type.
  • Alternatively, we can just rebuild the entire factor once the concrete type is known. In that case the field can be Nothing while not yet inflated. That is similar to the current state, with the type still not known to the compiler.
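A minimal sketch of the lazy-inflation idea with a function barrier, using hypothetical stand-in types (this is not the IIF implementation): the ccw field stays Nothing until first use, and the dynamic-dispatch cost is paid once at the barrier call.

```julia
# Hypothetical stand-in types; not the actual IIF/DFG implementation.
abstract type FactorOperationalMemory end

mutable struct LazyFactorSketch
    label::Symbol
    ccw::Union{Nothing, FactorOperationalMemory}   # Nothing until inflated
end

struct InflatedCCW <: FactorOperationalMemory
    res::Vector{Float64}
end

# Inflate on first use only; plotting-style access never triggers this.
function getccw!(f::LazyFactorSketch)
    f.ccw === nothing && (f.ccw = InflatedCCW(ones(3)))
    return f.ccw
end

# Function barrier: once the concrete type crosses this boundary,
# the inner code compiles specifically for InflatedCCW.
solve_inner(ccw::InflatedCCW) = sum(ccw.res)

f = LazyFactorSketch(:x1x2f1, nothing)
total = solve_inner(getccw!(f))   # dynamic dispatch happens once, here
```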

Affie avatar Mar 28 '25 13:03 Affie

Does Observation live next to or under State?

Next to. The user only creates the observation; the computer creates the state.

DUPLICATION AT: #1119

  • [ ] add new field to FactorDFG.observFnc (side by side with old .data / .fnctype)

dehann avatar Apr 02 '25 02:04 dehann

From: #1118

I think we can probably close this and only discuss in issues #1105 and #992 to avoid duplication.

As a general philosophy, the following is bad and we should not have this anywhere: JSON.write({"label": JSON.write("namething")})
Please elaborate why it is so bad ~~in metadata and~~ FactorDFG.observation~~data~~? I don't see a way around it. The options there are currently "base64 encoded JSON string" or a "JSON string"

Agree -- it is unavoidable for any user-defined factor. See below for a list of strange serializations that have already happened in the past. We must not restrict user factor definitions to adhere to JSON types as nested structures. The lazy user would just store a factor as e.g. factor.observation = {"payload": __b46str__, "content": "application/octet-stream"}. Basically that is what I do all the time, and I have seen it around in many other places. So we are just conforming to this as a norm.
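The "lazy user" pattern above can be sketched with Julia's stdlib Base64 (the field names mirror the example in the comment; this is illustrative, not a DFG API):

```julia
using Base64  # stdlib

# Opaque user bytes stored as a base64 string inside an otherwise
# JSON-typed observation (illustrative; not a DFG API).
payload = UInt8[0xde, 0xad, 0xbe, 0xef]
observation = Dict(
    "payload" => base64encode(payload),
    "content" => "application/octet-stream",
)

# Round-trip: decode the blob back out of the JSON-friendly container.
roundtrip = base64decode(observation["payload"])
```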


Existing and Strange user factor serializations (the fun stuff)

List of more complicated factor types that need to be unaffected (tested downstream), i.e. those not covered by Distributions.jl:

  • [ ] IMU: https://github.com/JuliaRobotics/RoME.jl/blob/ca98e9c18f8837d92863cb798eb7f1ec5adcbbfc/src/factors/Inertial/IMUDeltaFactor.jl#L540-L568
  • [ ] LIDAR: https://github.com/NavAbility/NavAbilitySDK.jl/blob/4bddd83fd1d66cf88077f9a69871df3b4530621d/test/unit/testFactors.jl#L61-L68
    • Notes: serialization for that ScatterAlign factor uses preambleCache and works for either parched or unparched serialization modes. https://github.com/JuliaRobotics/Caesar.jl/blob/develop/test/testStashing_SAP.jl
  • [ ] APRILTAGS: https://github.com/JuliaRobotics/Caesar.jl/blob/develop/test/testPose2AprilTag4Corner.jl
  • [ ] SASBearing2D: https://github.com/JuliaRobotics/Caesar.jl/blob/9df2df54d4a1bc271afafd5730d35c95af4c67f5/src/beamforming/SASBearing2D.jl#L219-L313
  • [x] FLUX.jl NNFactors: https://github.com/JuliaRobotics/IncrementalInference.jl/blob/dab3ca94bfb937629aa3e2751d3c5bf2f84d75fe/test/testFluxModelsDistribution.jl#L66-L68
  • [ ] DEFactors (no serialization yet): https://github.com/JuliaRobotics/IncrementalInference.jl/blob/develop/test/testDERelative.jl

dehann avatar Apr 02 '25 02:04 dehann

I went through the code and the only thing that might not be ideal is we may need something like computeMem::Base.RefValue{FactorOperationalMemory}

Sure, just do it and we can revise again later. The advantages are basically all consolidation points. Very happy to see this more consolidated; if we have to do it through a Ref then so be it. Adding the decision label to help unblock.

dehann avatar Apr 02 '25 02:04 dehann

*I think FactorOperationalMemory should be lazily initialized

Sure, I think that should be fine, at least in the medium term. I think one day this will likely show up as a performance issue, but we are still a ways off at this time. We will improve and refactor for performance later.

Once upon a time we needed to expand the old unpack-during-deserialization functions. It became clear that a FactorOperationalMemory would be needed, so I worked under the assumption that it is only built on creation of a factor or when a factor is deserialized. I remember being a bit worried about this or that corner case.

dehann avatar Apr 02 '25 02:04 dehann

In other words: Should we move non FactorOperationalMemory to DFGFactor: Maybe: fnc, multihypo, nullhypo, inflation ? that way we split solverData <: FactorOperationalMemory and constants

All of these are addressed. Hi @Affie, you should be unblocked on this issue now?

dehann avatar Apr 02 '25 02:04 dehann

On observation serialization

  • Must have a user provided pack and unpack
  • The packed observation must be JSON3.writeable
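A hedged sketch of what such a user-provided pair could look like (MyBearing, PackedMyBearing, pack, and unpack are all hypothetical names): the packed counterpart holds only JSON-friendly fields.

```julia
# Hypothetical user factor with custom pack/unpack; all names illustrative.
struct MyBearing                   # observation as the solver uses it
    angles::Vector{Float64}
end

struct PackedMyBearing             # packed counterpart with only
    angles::Vector{Float64}        # JSON-friendly field types
end

pack(o::MyBearing)         = PackedMyBearing(o.angles)
unpack(p::PackedMyBearing) = MyBearing(p.angles)

o  = MyBearing([0.1, 0.2])
o2 = unpack(pack(o))               # lossless round-trip
```

Because PackedMyBearing is a unique concrete type, unpack dispatches on it unambiguously, which is the point made in the later comment about avoiding generic String/Dict packed types.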

Affie avatar Apr 08 '25 18:04 Affie

Update on https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/issues/992#issuecomment-2787285338

Here’s a summary of your observation serialization requirements:

  • Default Serialization:
    Observations use JSON.json and JSON.parse for round-tripping, with default pack(x) = x and unpack(x) = x.

  • Custom Serialization:
    If round-tripping fails, users must provide custom pack and unpack methods, or extend StructUtils.lower and StructUtils.lift.

    • The packed observation must have a unique, concrete type (not a generic String or Dict) to ensure correct dispatch for unpack, avoiding type piracy.
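A minimal sketch of the default path described above (function names hypothetical): identity pack/unpack suffice whenever the observation already round-trips through JSON-style types.

```julia
# Default pack/unpack are identity (hypothetical names for the sketch).
pack_default(x) = x
unpack_default(x) = x

# A JSON-friendly observation round-trips unchanged through the defaults;
# only types that fail this round-trip need custom pack/unpack methods.
obs = Dict("mu" => 0.0, "sigma" => 1.0)
rt  = unpack_default(pack_default(obs))
```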

Affie avatar Nov 11 '25 12:11 Affie