Untracked memory in OSL
Using valgrind's massif profiling tool, I've found several large blocks of memory that OSL's statistics don't account for, and the unaccounted memory dwarfs the accounted memory. The biggest gap is the ShadingContext memory, including all the constant pools. The vectors that point to the values for parameters and such are counted in stats like paramvals, but the actual values in the pools they point to are not. The shader objects and their execution space aren't counted either. The last big source of missing memory is what LLVM is using, but I can understand not counting that, since it's much harder.
Here's what OSL reports.
OSL Shading Current Memory 86,549,052
OSL Shading Peak Memory 98,972,980
osl_mem_master_current 6,469,424
osl_mem_master_peak 6,469,424
osl_mem_master_ops_current 3,348,864
osl_mem_master_ops_peak 3,348,864
osl_mem_master_args_current 431,864
osl_mem_master_args_peak 431,864
osl_mem_master_syms_current 2,660,352
osl_mem_master_syms_peak 2,660,352
osl_mem_master_defaults_current 12,372
osl_mem_master_defaults_peak 12,372
osl_mem_master_consts_current 6,996
osl_mem_master_consts_peak 6,996
osl_mem_inst_current 80,079,628
osl_mem_inst_peak 92,503,556
osl_mem_inst_syms_current 57,871,904
osl_mem_inst_syms_peak 70,255,264
osl_mem_inst_paramvals_current 6,811,456
osl_mem_inst_paramvals_peak 6,811,456
osl_mem_inst_connections_current 2,783,484
osl_mem_inst_connections_peak 3,484,800
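For anyone reading the current/peak pairs above, here is a minimal sketch of how such a pair is commonly maintained. `MemCounter` and its methods are hypothetical names for illustration only, not OSL's actual statistics code. It shows why a stat that only ever grows reports current equal to peak (as the master stats do here), while one that is also decremented can report a peak above current (as osl_mem_inst_syms does).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one current/peak pair such as
// osl_mem_inst_syms_current / osl_mem_inst_syms_peak.
class MemCounter {
public:
    // Record an allocation of `bytes` and raise the peak if needed.
    void add(std::size_t bytes) {
        std::size_t now  = m_current.fetch_add(bytes) + bytes;
        std::size_t peak = m_peak.load();
        // compare_exchange_weak reloads `peak` on failure, so loop until our
        // value is stored or another thread has recorded a larger one.
        while (now > peak && !m_peak.compare_exchange_weak(peak, now)) {
        }
    }

    // Record a deallocation; the peak never decreases.
    void sub(std::size_t bytes) {
        assert(bytes <= m_current.load());  // debug-only sanity check
        m_current.fetch_sub(bytes);
    }

    std::size_t current() const { return m_current.load(); }
    std::size_t peak() const { return m_peak.load(); }

private:
    std::atomic<std::size_t> m_current{0};
    std::atomic<std::size_t> m_peak{0};
};
```

Any allocation that never passes through a counter like this is invisible to the report, which is the situation the trace below points at.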
And here is a cleaned-up version of the valgrind output for the things I'm pretty sure aren't counted anywhere.
---- Shading context memory ----
->24.63% (301,596,672B) 0x871AC55: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
->07.70% (94,248,960B) 0x871ABD8: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->07.70% (94,248,960B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->07.70% (94,248,960B) 0x871ABD8: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->07.70% (94,248,960B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->00.41% (5,043,792B) 0x86C9F68: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->00.38% (4,712,448B) 0x871AA1C: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->00.38% (4,712,448B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
---- Actual value pools for strings, floats, ints, etc. ----
->01.10% (13,486,024B) 0x7B132E8: __gnu_cxx::new_allocator<OpenImageIO::v1_5::ustring>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->01.10% (13,486,024B) 0x7B14468: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_M_allocate(unsigned long) (stl_vector.h:140)
| ->00.65% (8,000,000B) 0x87B442A: OSL::pvt::RuntimeOptimizer::add_constant(OSL::pvt::TypeSpec const&, void const*, OpenImageIO::v1_5::TypeDesc)
| ->00.41% (5,035,640B) 0x7B14188: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_Vector_base(unsigned long, std::allocator<OpenImageIO::v1_5::ustring> const&) (stl_vector.h:113)
| | ->00.41% (5,035,640B) 0x7B24094: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_Vector_base(unsigned long, std::allocator<OpenImageIO::v1_5::ustring> const&) (stl_vector.h:110)
| | ->00.41% (5,035,640B) 0x7B147A8: std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::vector(std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > const&) (stl_vector.h:242)
| | ->00.41% (4,985,984B) 0x7B23608: OSL::OSLQuery::Parameter::Parameter(OSL::OSLQuery::Parameter const&) (oslquery.h:61)
| ->00.04% (450,384B) 0x7B152BD: OpenImageIO::v1_5::ustring* std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > > >(unsigned long, __gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > >, __gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > >) (stl_vector.h:963)
| ->00.04% (450,384B) 0x7B15D28: std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::operator=(std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > const&) (vector.tcc:164)
| ->00.04% (450,384B) 0x8737FD9: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)
->00.96% (11,764,220B) 0x65FD9EC: __gnu_cxx::new_allocator<float>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->00.96% (11,764,220B) 0x65FE928: std::_Vector_base<float, std::allocator<float> >::_M_allocate(unsigned long) (stl_vector.h:140)
| ->00.49% (6,023,576B) 0x68B6319: float* std::vector<float, std::allocator<float> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > > >(unsigned long, __gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > >, __gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > >) (stl_vector.h:963)
| | ->00.49% (6,023,576B) 0x68B6BD4: std::vector<float, std::allocator<float> >::operator=(std::vector<float, std::allocator<float> > const&) (vector.tcc:164)
| | ->00.49% (6,023,576B) 0x8737FBB: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)
| ->00.33% (4,000,000B) 0x68B62A1: float* std::vector<float, std::allocator<float> >::_M_allocate_and_copy<std::move_iterator<float*> >(unsigned long, std::move_iterator<float*>, std::move_iterator<float*>) (stl_vector.h:963)
| | ->00.33% (4,000,000B) 0x68B658A: std::vector<float, std::allocator<float> >::reserve(unsigned long) (vector.tcc:72)
| | ->00.33% (4,000,000B) 0x87C0D25: OSL::pvt::ShadingSystemImpl::alloc_float_constants(unsigned long)
| ->00.14% (1,705,484B) 0x68B5EE4: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (stl_vector.h:113)
| | ->00.14% (1,705,484B) 0x68B9788: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (stl_vector.h:110)
| | ->00.14% (1,705,484B) 0x6CA5504: std::vector<float, std::allocator<float> >::vector(std::vector<float, std::allocator<float> > const&) (stl_vector.h:242)
| | ->00.14% (1,705,484B) 0x7B235DF: OSL::OSLQuery::Parameter::Parameter(OSL::OSLQuery::Parameter const&) (oslquery.h:61)
| ->00.44% (5,404,828B) 0x65597EC: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (stl_vector.h:140)
| | ->00.33% (4,000,000B) 0x66A0411: int* std::vector<int, std::allocator<int> >::_M_allocate_and_copy<int*>(unsigned long, int*, int*) (stl_vector.h:963)
| | | ->00.33% (4,000,000B) 0x66A19D3: std::vector<int, std::allocator<int> >::reserve(unsigned long) (vector.tcc:72)
| | | ->00.33% (4,000,000B) 0x87B3D7F: OSL::pvt::RuntimeOptimizer::add_constant(OSL::pvt::TypeSpec const&, void const*, OpenImageIO::v1_5::TypeDesc)
| | ->00.07% (874,648B) 0x665847C: std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&) (vector.tcc:322)
| | | ->00.07% (874,608B) 0x877B2C7: OSL::pvt::OSOReaderToMaster::instruction_arg(char const*)
| | ->00.04% (455,452B) 0x6657455: int* std::vector<int, std::allocator<int> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(unsigned long, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) (stl_vector.h:963)
| | | ->00.04% (455,452B) 0x6657B50: std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) (vector.tcc:164)
| | | ->00.04% (455,452B) 0x8737F9D: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)
---- Actual shader objects ----
->01.05% (12,800,024B) 0x86D335C: OSL::pvt::ShadingSystemImpl::Shader(OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view)
| ->01.05% (12,800,024B) 0x86D38E9: OSL::ShadingSystem::Shader(OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view)
---- Execution scratch? ----
->00.43% (5,219,886B) 0x6CDA194: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->00.43% (5,219,886B) 0x7870E74: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (stl_vector.h:140)
| ->00.43% (5,219,886B) 0x78735FB: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:414)
| ->00.43% (5,219,044B) 0x871BC68: OSL::ShadingContext::execute(OSL::ShaderGroup&, OSL::ShaderGlobals&, bool)
---- LLVM ----
->01.71% (20,971,520B) 0x912397B: llvm::ValueHandleBase::AddToUseList()
| ->01.03% (12,582,912B) 0x8DAF743: llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >::operator[](llvm::Value const* const&)
| | ->00.69% (8,388,608B) 0x8E8E488: (anonymous namespace)::PruningFunctionCloner::CloneBlock(llvm::BasicBlock const*, std::vector<llvm::BasicBlock const*, std::allocator<llvm::BasicBlock const*> >&)
| | ->00.34% (4,194,304B) 0x8EE83C1: llvm::MapValue(llvm::Value const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| | | ->00.34% (4,194,304B) 0x8EE81C6: llvm::MapValue(llvm::Value const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| | | ->00.34% (4,194,304B) 0x8EE8517: llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| ->00.34% (4,194,304B) 0x8E8E4DC: (anonymous namespace)::PruningFunctionCloner::CloneBlock(llvm::BasicBlock const*, std::vector<llvm::BasicBlock const*, std::allocator<llvm::BasicBlock const*> >&)
| | ->00.34% (4,194,304B) 0x8E8EF79: llvm::CloneAndPruneFunctionInto(llvm::Function*, llvm::Function const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, bool, llvm::SmallVectorImpl<llvm::ReturnInst*>&, char const*, llvm::ClonedCodeInfo*, llvm::DataLayout const*, llvm::Instruction*)
| ->00.34% (4,194,304B) 0x8B42E96: llvm::BitcodeReaderValueList::push_back(llvm::Value*)
| | ->00.34% (4,194,304B) 0x8B3F5F3: llvm::BitcodeReader::ParseModule(bool)
->00.61% (7,425,672B) 0x9121D11: llvm::User::operator new(unsigned long, unsigned int)
| ->00.35% (4,253,040B) 0x9090BB6: llvm::ConstantInt::get(llvm::LLVMContext&, llvm::APInt const&)
| | ->00.19% (2,339,352B) 0x89B7C6C: llvm::SelectionDAG::getConstant(llvm::APInt const&, llvm::EVT, bool)
| | | ->00.16% (1,913,688B) 0x89B7D7E: llvm::SelectionDAG::getConstant(unsigned long, llvm::EVT, bool)
| | | ->00.03% (421,632B) 0x89B8D87: llvm::SelectionDAG::FoldConstantArithmetic(unsigned int, llvm::EVT, llvm::SDNode*, llvm::SDNode*)
| | ->00.09% (1,161,144B) 0x8770E51: OSL::pvt::LLVM_Util::constant(unsigned long)
| | | ->00.09% (1,161,144B) 0x87730AE: OSL::pvt::LLVM_Util::constant_ptr(void*, llvm::PointerType*)
| | | ->00.07% (893,304B) 0x8759E5E: OSL::pvt::BackendLLVM::llvm_assign_initial_value(OSL::pvt::Symbol const&)
| | ->00.06% (679,392B) 0x9090D66: llvm::ConstantInt::get(llvm::IntegerType*, unsigned long, bool)
| ->00.25% (3,018,640B) 0x9093F54: llvm::ConstantCreator<llvm::ConstantExpr, llvm::Type, llvm::ExprMapKeyType>::create(llvm::Type*, llvm::ExprMapKeyType const&, unsigned short)
->00.51% (6,249,363B) 0x91475DA: llvm::MallocSlabAllocator::Allocate(unsigned long)
| ->00.42% (5,193,728B) 0x9147382: llvm::BumpPtrAllocator::StartNewSlab()
| | ->00.42% (5,173,248B) 0x9147492: llvm::BumpPtrAllocator::Allocate(unsigned long, unsigned long)
| | | ->00.16% (2,015,232B) 0x911AF7F: llvm::StructType::setBody(llvm::ArrayRef<llvm::Type*>, bool)
| | | ->00.11% (1,310,720B) 0x911BCF0: llvm::PointerType::get(llvm::Type*, unsigned int)
| | | ->00.07% (892,928B) 0x911B64D: llvm::StructType::create(llvm::LLVMContext&, llvm::StringRef)
| | | ->00.07% (802,816B) 0x911DAC0: llvm::FunctionType::get(llvm::Type*, llvm::ArrayRef<llvm::Type*>, bool)
| | | ->00.01% (135,168B) 0x911C772: llvm::ArrayType::get(llvm::Type*, unsigned long)
| ->00.09% (1,055,635B) 0x91474C8: llvm::BumpPtrAllocator::Allocate(unsigned long, unsigned long)
| ->00.09% (1,055,635B) 0x911AF7F: llvm::StructType::setBody(llvm::ArrayRef<llvm::Type*>, bool)
| ->00.09% (1,055,635B) 0x911B6FE: llvm::StructType::create(llvm::LLVMContext&, llvm::ArrayRef<llvm::Type*>, llvm::StringRef, bool)
| ->00.09% (1,055,635B) 0x875AFC7: OSL::pvt::BackendLLVM::llvm_type_groupdata()
->00.37% (4,527,960B) 0x9094622: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_(std::_Rb_tree_node_base const*, std::_Rb_tree_node_base const*, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| ->00.37% (4,491,720B) 0x909482E: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_unique(std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| | ->00.37% (4,491,720B) 0x9094968: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_unique_(std::_Rb_tree_const_iterator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| | ->00.37% (4,491,720B) 0x9094C3A: llvm::ConstantUniqueMap<llvm::ExprMapKeyType, llvm::ExprMapKeyType const&, llvm::Type, llvm::ConstantExpr, false>::getOrCreate(llvm::Type*, llvm::ExprMapKeyType const&)
| | ->00.37% (4,491,720B) 0x9091CC7: llvm::ConstantExpr::getIntToPtr(llvm::Constant*, llvm::Type*)
->00.30% (3,670,016B) 0x9096FAE: llvm::DenseMapBase<llvm::DenseMap<llvm::DenseMapAPIntKeyInfo::KeyTy, llvm::ConstantInt*, llvm::DenseMapAPIntKeyInfo>, llvm::DenseMapAPIntKeyInfo::KeyTy, llvm::ConstantInt*, llvm::DenseMapAPIntKeyInfo>::grow(unsigned int)
| ->00.30% (3,670,016B) 0x9090C16: llvm::ConstantInt::get(llvm::LLVMContext&, llvm::APInt const&)
| ->00.30% (3,670,016B) 0x8770E51: OSL::pvt::LLVM_Util::constant(unsigned long)
I should also mention that this trace was from the 1.6.8 cut.
Ooh, I can believe that we forgot to count a couple of things. I'll get on that; it should be easy.
LLVM's internal data structures are going to be very hard to account for, but we should be able to get all the OSL-side stuff.
Are you only concerned that we aren't counting properly, or do you also suspect that we may be leaking or allocating more than we need?
So far it just looks like a counting problem. I don't see any sign of leaking, but I haven't dug in deep enough to see whether there are any good places to trim fat.
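To make the constant-pool part of the report concrete, here is a rough sketch of what counting the pool storage itself could look like. `ConstantPools`, `vector_heap_bytes`, and `reserve_floats` are hypothetical names, not OSL's real members, and `const char*` stands in for ustring. The point is that the count should use capacity() rather than size(): the trace shows `std::vector<float>::reserve` under `alloc_float_constants` holding 4,000,000 bytes, which is consistent with room for one million 4-byte floats being reserved up front, whether or not they are all used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes of heap storage owned by a vector's buffer. capacity() is used
// instead of size() because reserve() allocates the whole buffer up front.
template <typename T>
std::size_t vector_heap_bytes(const std::vector<T>& v) {
    return v.capacity() * sizeof(T);
}

// Hypothetical stand-in for the per-type constant pools.
struct ConstantPools {
    std::vector<int> ints;
    std::vector<float> floats;
    std::vector<const char*> strings;  // stand-in for ustring

    // Mirrors the up-front reserve() seen in the trace (name is made up).
    void reserve_floats(std::size_t n) {
        floats.reserve(n);
        assert(floats.capacity() >= n);  // reserve() guarantees at least n
    }

    // Total heap bytes held by the pool buffers, used or not.
    std::size_t heap_bytes() const {
        return vector_heap_bytes(ints) + vector_heap_bytes(floats)
               + vector_heap_bytes(strings);
    }
};
```

A value like `heap_bytes()` would then be added to whichever stat covers the pools, alongside the existing counts for the vectors that index into them.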