
Number of occupants - addition to loc file

Open stufraser1 opened this issue 4 years ago • 48 comments

Propose that OED has a field added to record the number of occupants at a location, to support use of OED to produce human-focussed outputs such as the number of people affected and the number of fatalities. Ideally, this addition would have supporting type fields to define the age_group, sex, ethnicity, and nighttime/daytime occupancy, to support a more detailed breakdown of risk to people, but this may lead to multiple rows per location to reflect the breakdown.

Accepting that most OASIS models may not yet support the estimation of fatalities or the number of people located in damaged buildings, there is demand for this, and at present the workaround is to place the number of occupants in BuildingTIV. OED should take a lead in enabling these data to be recorded in the exposure dataset, with the aim that this will enable these fields to be modelled explicitly rather than through the workaround.

stufraser1 avatar Jul 01 '21 11:07 stufraser1

@stufraser1 Can you be specific as to exactly what fields/data you think are required? The easy and temporary approach could be to use the current 'Flexi' fields in OED, but I think more effort and development will be required to accommodate humanitarian exposure data permanently in OED.

Depending on the number of fields you think are required, it may be worth considering a separate input file that could link to the loc file (possibly via a unique ID)? As you say, there are currently no models in Oasis that use this data, so how it's supported in OED depends on how you think it will be used.

Also, considering Parquet is the format we are moving to, it may be possible to include this in the one proposed 'package'?

@benhayes21 @johcarter any ideas?

MattDonovan82 avatar Jul 14 '21 10:07 MattDonovan82

I understand some human loss models like influenza pandemic and terrorism use mortality rate as the measure of damage. If we were to have a number of occupants field then the number of deaths could be computed as mortality rate * number of occupants, which has a symmetry with the economic damage/loss method for property. This would be a good start I think. As regards the other fields, we could add them when we understand what vulnerability characteristics are required by different types of human loss models.
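The symmetry described above can be written as a pair of formulas. This is a sketch only: `NumOccupants` is the proposed field name, d is the mean damage ratio and m is the modelled mortality rate.

```latex
\begin{aligned}
\text{Loss}   &= d \times \text{TIV} \\
\text{Deaths} &= m \times \text{NumOccupants}
\end{aligned}
```

For example, m = 0.02 applied to 150 occupants gives 0.02 × 150 = 3 expected deaths, in the same way that d = 0.25 applied to a TIV of 400,000 gives a loss of 100,000.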

johcarter avatar Jul 14 '21 10:07 johcarter

As @johcarter says - the most straightforward and highest priority would be to add a field e.g. 'numOccupants' so that number of occupants can be assigned to a location.

Models using OED would then need to define a vulnerability curve that relates mortality to intensity (as with structural vulnerability curves), or estimate mortality from the level of building damage at that location (as is done for seismic risk). Either way, having numOccupants in OED would support that development.
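A minimal sketch of the first option, a curve relating mortality to hazard intensity, assuming piecewise-linear interpolation between curve points. The curve values are placeholders for illustration and do not come from any real model; `NumOccupants` is the proposed field, and the function names are hypothetical.

```python
from bisect import bisect_right

# Placeholder mortality curve: (hazard intensity, mortality rate) pairs,
# sorted by intensity. Illustrative values only, not from any real model.
MORTALITY_CURVE = [(0.0, 0.0), (1.0, 0.001), (2.0, 0.01), (3.0, 0.05)]


def mortality_at(intensity: float, curve=MORTALITY_CURVE) -> float:
    """Linearly interpolate the mortality rate, clamping outside the curve."""
    xs = [x for x, _ in curve]
    if intensity <= xs[0]:
        return curve[0][1]
    if intensity >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, intensity)  # xs[i - 1] <= intensity < xs[i]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)


def expected_deaths(intensity: float, num_occupants: int) -> float:
    """Mortality rate at this intensity multiplied by NumOccupants."""
    return mortality_at(intensity) * num_occupants


print(expected_deaths(1.5, 200))
```

The second option (mortality estimated from building damage) would replace the intensity lookup with a lookup on the modelled damage state.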

@johcarter Does LMF allow vulnerability curves to be related to numOccupants, rather than being linked to TIV fields by default? This would be required.

If that single field was added without the identifiers (day/nighttime, sex, age, disability) then I expect a workaround for not having the identifiers could be to set up a different account for each group as required by certain analyses.
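The account-per-group workaround could be scripted roughly as follows. This is a hypothetical sketch: the `groups` mapping and the `AccNumber` suffix convention are assumptions for illustration, not part of OED. `PortNumber`, `AccNumber` and `LocNumber` are existing OED keys and `NumOccupants` is the proposed field.

```python
def split_by_group(locations):
    """Emit one location row per occupant group, each under its own account.

    Each input location carries a hypothetical `groups` dict, e.g.
    {"day": 80, "night": 120}; each group becomes a separate account whose
    AccNumber is the original one with the group name appended.
    """
    rows = []
    for loc in locations:
        for group, count in loc["groups"].items():
            rows.append({
                "PortNumber": loc["PortNumber"],
                "AccNumber": f"{loc['AccNumber']}_{group}",
                "LocNumber": loc["LocNumber"],
                "NumOccupants": count,
            })
    return rows


example = [{"PortNumber": "P1", "AccNumber": "A1", "LocNumber": "L1",
            "groups": {"day": 80, "night": 120}}]
for row in split_by_group(example):
    print(row)
```

Each group can then be analysed as its own account, at the cost of repeating every location once per group.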

stufraser1 avatar Jul 14 '21 13:07 stufraser1

There is a NumberOfEmployees field in OED. Does NumberOfOccupants mean the same thing, @stufraser1 @johcarter @MattDonovan82? If the meaning is the same but the new name is clearer, then from my perspective it can be updated in OED. This field was included with terrorism models, or other models that require mortality, in mind. The other identifiers can be added, but there would need to be a standard for how they are defined. E.g. Shift: is it just two values, day / night? Age: is that a range of pre-defined ages, or can an individual age be entered? Disability: is this "yes" or "no", or a list of pre-defined disabilities (which perhaps becomes similar to an occupancy code)? I am assuming that anyone using those mortality-linked fields will use very few fields from traditional property modelling, so there should not be an issue in defining those fields in the loc file.

aiste-kalinauskaite avatar Jul 14 '21 13:07 aiste-kalinauskaite

Agree numberOfOccupants is clearer. I think the demographics of the occupants should likely be left out in terms of cat modelling, but combining this field with the Occupancy Code should allow modellers to make good approximations of whether a building is full at night or during the day. I think there are too many potential contributing factors across occupancy types (especially for Industrial) to capture a value that would be meaningful.

Perhaps there could also be a policy-side component (MaximumAnyOneLife) in combination with numOccupants?

dsokol avatar Jul 14 '21 13:07 dsokol

> Does LMF allow vulnerability curves to be related to numOccupants, rather than being linked to TIV fields by default? This would be required.

Not at present, @stufraser1. The TIV fields are used to produce loss by multiplying by damage, and this is hardcoded in OasisLMF. This part would need to be developed, but it is not difficult.

Note that no core LMF development would be needed to match exposure data to a mortality vulnerability curve, as the model developer decides which OED fields define vulnerability and writes the key service code to plug into the system.
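To illustrate the development mentioned above, here is a sketch of making the exposure field configurable rather than hardcoded to TIV. This is not OasisLMF code; the function name and the `exposure_field` parameter are hypothetical.

```python
from typing import Mapping


def ground_up_value(location: Mapping[str, float], damage_factor: float,
                    exposure_field: str = "BuildingTIV") -> float:
    """Multiply a damage factor by a configurable exposure field.

    With the default, this is the usual property calculation (damage ratio
    x TIV). With exposure_field="NumOccupants" and a mortality rate as the
    damage factor, the same step yields expected deaths.
    """
    if exposure_field not in location:
        raise KeyError(f"location has no {exposure_field!r} value")
    return damage_factor * location[exposure_field]


loc = {"BuildingTIV": 500_000.0, "NumOccupants": 40}
print(ground_up_value(loc, 0.1))                   # property loss
print(ground_up_value(loc, 0.05, "NumOccupants"))  # expected deaths
```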

johcarter avatar Jul 14 '21 13:07 johcarter

@aiste-kalinauskaite you make a great point -- The field NumberOfEmployees does the same thing, but the meaning is clearer with 'occupants' because most modelling of human impact outside of the industry uses the nighttime resident population (derived from census data), not only employees in commercial properties. A change in name would mean the field is recognised as appropriate for both cases.

Modelling population in different age ranges / disability groups is not yet performed regularly enough to be sure which values would be most useful. For age, ranges would be most appropriate based on my experience, but those ranges could change in different contexts. For population I have generally seen day versus night (but some 'edge cases' exist where summer versus winter scenarios might be used, e.g. in tourist hotspots). Disability: I've seen able-bodied / disabled without further breakdown, but again not enough to know what useful values might emerge in this area.

Very happy to see the team recognises a way forward on this, and that it doesn't seem too difficult to achieve with what has already been coded.

stufraser1 avatar Jul 14 '21 13:07 stufraser1

hi @stufraser1 - sorry, just picking this up again. So are we all agreed that, for a quick resolution on this, a 'NumOccupants' field should be added to OED with a supporting description? We can include this in v2.0.0 unless there is any pushback?

The inclusion of the extra fields for demographic data for humanitarian uses is a larger and separate topic.

MattDonovan82 avatar Sep 02 '21 10:09 MattDonovan82

Yes, that would be a good first step, though I think the additions should follow not too far behind. Several research projects looking to make use of additional demographic factors in OED, and possibly in the LMF modelling environment, are coming up.


stufraser1 avatar Sep 03 '21 16:09 stufraser1

@stufraser1 It makes sense that we add the additional fields in v2.0.0 as well, then. Can you please send over the fields you think need to be included, and we will look at getting them added to the OED schema.

thanks.

MattDonovan82 avatar Sep 06 '21 08:09 MattDonovan82

@stufraser1 after some discussion, we thought it might be better to get all the required data fields for humanitarian uses together in a separate input file? This might be cleaner and easier to use for those models/tools you mention rather than adding to the existing loc file in OED.

It would also be ok to repeat any fields needed that already exist in OED such as street, city, state etc used for location. Are you ok to put this file together for review and include all the fields you think will be required? We can then name it accordingly and house it in the ODS repo with some commentary.

Matt

MattDonovan82 avatar Sep 07 '21 11:09 MattDonovan82

The initial proposal for review is contained in the attached xlsx. It comprises the addition of an occupantPeriod field in the Location table to support NumOccupants, accompanying code values for occupantPeriod, and a new table to contain the optional, more detailed breakdown of occupant numbers. A readme is also included in the workbook.

HumanImpact_OEDAdditions_Sept2021.xlsx

stufraser1 avatar Sep 07 '21 11:09 stufraser1

Small but crucial update in this version 2: additional row UID required. HumanImpact_OEDAdditions_Sept2021_v2.xlsx

stufraser1 avatar Sep 09 '21 05:09 stufraser1

the addition being the new 'LocPopNumber'
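A sketch of a `LocPopulation` row keyed with the new `LocPopNumber`, assuming it is unique within each location, so that the full key is (`PortNumber`, `AccNumber`, `LocNumber`, `LocPopNumber`). The breakdown fields `NumUnder5` and `NumOver65` are hypothetical stand-ins for whatever the attached workbook defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocPopulationRow:
    PortNumber: str
    AccNumber: str
    LocNumber: str
    LocPopNumber: int      # row UID within a location (assumed scope)
    NumOccupants: int
    OccupantPeriod: int    # coded value, e.g. 1 = Day, 2 = Night (illustrative)
    NumUnder5: int = 0     # hypothetical breakdown field
    NumOver65: int = 0     # hypothetical breakdown field

    def key(self) -> tuple:
        return (self.PortNumber, self.AccNumber, self.LocNumber,
                self.LocPopNumber)


def duplicate_keys(rows):
    """Return every key that appears more than once (the UID must be unique)."""
    seen, duplicates = set(), []
    for row in rows:
        k = row.key()
        if k in seen:
            duplicates.append(k)
        seen.add(k)
    return duplicates


rows = [
    LocPopulationRow("P1", "A1", "L1", 1, 100, 2, NumUnder5=20, NumOver65=30),
    LocPopulationRow("P1", "A1", "L1", 2, 60, 1),
]
print(duplicate_keys(rows))  # []
```

Without the row UID, two rows for the same location would share the same key and could not be told apart.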

johcarter avatar Sep 09 '21 08:09 johcarter

@stufraser1 apologies as I've just read your notes. To confirm, you're proposing to add one new field to the current loc file ('OccupantPeriod') and changing 'NumEmployees' to 'NumOccupants'? All other new fields will go into the new 'LocPopulation' file?

We may need the Steer Co to review the change of 'NumEmployees' to 'NumOccupants' as this may possibly cause issues to those already using OED, although unlikely.
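One way tooling could soften the rename for existing users is to accept the old column name as an alias when reading a loc file. This is a minimal sketch using Python's standard `csv` module; the alias table is an assumption, not an agreed OED mechanism. The thread refers to the existing field as both `NumberOfEmployees` and `NumEmployees`; the sketch uses the former.

```python
import csv
import io

# Hypothetical alias table: legacy OED header -> proposed header.
LEGACY_ALIASES = {"NumberOfEmployees": "NumOccupants"}


def read_loc_rows(text: str) -> list:
    """Read loc CSV text, renaming legacy headers to their new names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            new_name = LEGACY_ALIASES.get(name, name)
            # If both the old and new columns are present, they must agree.
            if new_name in row and row[new_name] != value:
                raise ValueError(f"conflicting values for {new_name}")
            row[new_name] = value
        rows.append(row)
    return rows


print(read_loc_rows("LocNumber,NumberOfEmployees\n1,25\n"))
# [{'LocNumber': '1', 'NumOccupants': '25'}]
```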

Is there a reason why the new 'OccupantPeriod' will only be in the 'Loc' file and not in the 'LocPopulation' file as well, like 'NumOccupants' will be?

MattDonovan82 avatar Sep 09 '21 09:09 MattDonovan82

I think 'OccupantPeriod' should be in the same file as 'NumOccupants' because then you can define the number of occupants, and whether they are daytime employees or night-time residents with only the one location file. This seems the most likely use case for most insurance users.

The LocPopulation file then only comes into play when there is a more detailed breakdown required. For completeness you're probably correct the 'OccupantPeriod' could be repeated in LocPopulation, but I would not omit it from Location, since then it would make LocPopulation a requirement just to define the OccupantPeriod.

A consideration in repeating those one or two fields in LocPopulation: does the number of fields repeated in both files put an extra burden on users to make sure they match? Does the model validate that the numbers match in both, or does one file take precedence over the other if there is a mismatch? Equally, how should we handle the potential issue of the total population per location not equalling the sum of all classification fields for that location?
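A sketch of the checks raised above, under stated assumptions that are not agreed OED behaviour: the Location row is treated as authoritative for its `OccupantPeriod`, LocPopulation rows for the same period should sum to it, and the breakdown fields (hypothetical `NumUnder5`, `NumOver65`) describe disjoint groups.

```python
def validate_location(loc, pop_rows, group_fields=("NumUnder5", "NumOver65")):
    """Return human-readable problems; an empty list means consistent.

    loc: the Location row (dict with NumOccupants and OccupantPeriod).
    pop_rows: that location's LocPopulation rows (dicts).
    group_fields: hypothetical breakdown fields describing disjoint groups.
    """
    problems = []
    # Only compare rows for the same period: day and night counts describe
    # the same place at different times and must not be added together.
    same_period = [r for r in pop_rows
                   if r["OccupantPeriod"] == loc["OccupantPeriod"]]
    if same_period:
        pop_total = sum(r["NumOccupants"] for r in same_period)
        if pop_total != loc["NumOccupants"]:
            problems.append(f"NumOccupants mismatch: Location={loc['NumOccupants']}, "
                            f"LocPopulation={pop_total}")
    # Disjoint groups may leave people uncounted (e.g. ages 5 to 65), so
    # their sum can be below the row total but never above it.
    for r in pop_rows:
        subtotal = sum(r.get(f, 0) for f in group_fields)
        if subtotal > r["NumOccupants"]:
            problems.append(f"LocPopNumber {r['LocPopNumber']}: groups sum to "
                            f"{subtotal}, above NumOccupants {r['NumOccupants']}")
    return problems


loc = {"NumOccupants": 100, "OccupantPeriod": 2}
rows = [{"LocPopNumber": 1, "OccupantPeriod": 2, "NumOccupants": 100,
         "NumUnder5": 20, "NumOver65": 30}]
print(validate_location(loc, rows))  # []
```

Checking "not above" rather than "equal to" answers the last question for disjoint groups: the classification fields need not sum to the total.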

stufraser1 avatar Sep 09 '21 10:09 stufraser1

@stufraser1 I would add 'OccupantPeriod' and 'NumOccupants' to both files.

Repeating these in both files shouldn't be a problem, but in what case do you foresee the user needing both files? Please correct me if I have not understood this properly, but if a user is only populating and using the 'NumOccupants' and 'OccupantPeriod' fields, then they will only need the 'loc' file. If they are doing a more detailed analysis using the additional fields for population breakdown, then they will only need the LocPopulation file. Is it one or the other, not both?

For property modelling, the portfolio and account numbers appear in both the 'loc' and the 'account' files and need to be identical for Oasis to know what account details correspond to the correct exposures.
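By analogy, a LocPopulation file would be linked to the loc file through the same identifying keys. A minimal sketch of that join, assuming `PortNumber`, `AccNumber` and `LocNumber` as the shared keys; the function name is hypothetical, and rows that fail to match are returned rather than silently dropped.

```python
KEYS = ("PortNumber", "AccNumber", "LocNumber")


def attach_population(loc_rows, pop_rows, keys=KEYS):
    """Group LocPopulation rows under their Location row by matching keys.

    Returns (locations, orphans): each location dict gains a "population"
    list; orphans are population rows with no matching location.
    """
    index = {tuple(r[k] for k in keys): {**r, "population": []}
             for r in loc_rows}
    orphans = []
    for p in pop_rows:
        key = tuple(p[k] for k in keys)
        if key in index:
            index[key]["population"].append(p)
        else:
            orphans.append(p)
    return list(index.values()), orphans


locs = [{"PortNumber": "P1", "AccNumber": "A1", "LocNumber": "L1",
         "NumOccupants": 100}]
pops = [{"PortNumber": "P1", "AccNumber": "A1", "LocNumber": "L1", "LocPopNumber": 1},
        {"PortNumber": "P1", "AccNumber": "A2", "LocNumber": "L9", "LocPopNumber": 1}]
merged, orphans = attach_population(locs, pops)
print(len(merged[0]["population"]), len(orphans))  # 1 1
```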

MattDonovan82 avatar Sep 09 '21 10:09 MattDonovan82

Could you please clarify why there is a need to have two files rather than putting all fields in Loc file? Currently any OED field if it's not needed, doesn't have to be present in the file. Wouldn't that work for having all fields in Loc file only?

aiste-kalinauskaite avatar Sep 09 '21 10:09 aiste-kalinauskaite

@aiste-kalinauskaite we thought having a separate file for population info/humanitarian use cases would be cleaner than overloading the current loc file with more fields? Oasis currently will not be using this information, as there are no models that would utilise these data.

Do you not agree?

MattDonovan82 avatar Sep 09 '21 10:09 MattDonovan82

@aiste-kalinauskaite we will discuss this at the next Steering committee on 20th Sept.

MattDonovan82 avatar Sep 09 '21 11:09 MattDonovan82

After more discussion, it makes sense to change the thinking around what OED is. OED should cover "all" exposure data and so can include several input files, i.e. the current 'loc' file is the property OED file, the LocPopulation file is another OED file, any other line of business (such as liability) is another OED file, etc.

MattDonovan82 avatar Sep 09 '21 11:09 MattDonovan82

> Could you please clarify why there is a need to have two files rather than putting all fields in Loc file? Currently any OED field if it's not needed, doesn't have to be present in the file. Wouldn't that work for having all fields in Loc file only?

Good question -- I tried to explain it in the readme, as I considered the case where one location might have multiple locPopulation rows, which could make location files unacceptably large (not clear how frequently it would occur though). Actually, the way the fields are set up doesn't require this at this point in time (but may become an issue in future).

Take the case that I have one location with 100 people. 20 are under 5, 30 are over 65. This would be described in a single row, in multiple fields, since the available fields define these classifications. Similarly, if I have one location with 100 people. 20 are classified as having a disability. This would be described in a single row, in multiple fields. However, if at some point we wanted to add a code for disability, we might want to describe 10 people with mobility related disability, and 10 people with mental disability. Then we would have two rows for one location.
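The two shapes in the example above, sketched as data. The field names `NumUnder5`, `NumOver65`, `NumDisabled` and `DisabilityCode`, and the code values, are hypothetical; the point is that a coded breakdown needs one row per code value, each with its own `LocPopNumber`.

```python
# Wide form: one row per location, one field per classification
# (field names hypothetical).
wide_row = {"LocNumber": "L1", "NumOccupants": 100,
            "NumUnder5": 20, "NumOver65": 30, "NumDisabled": 20}


def disability_rows(loc_number, counts_by_code):
    """Long form: one LocPopulation row per (hypothetical) DisabilityCode."""
    return [
        {"LocNumber": loc_number, "LocPopNumber": i,
         "DisabilityCode": code, "NumOccupants": n}
        for i, (code, n) in enumerate(counts_by_code.items(), start=1)
    ]


long_rows = disability_rows("L1", {"Mobility": 10, "Mental": 10})
print(len(long_rows))  # 2
```

Each additional coded dimension multiplies the rows per location by the number of codes used, which is the file-size concern that motivates keeping this out of the loc file.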

I acknowledge though that this level of data may be some way off in time and won't be the majority of use cases for a long time. Hopefully that makes clear the thinking for ODS to consider the best implementation.

stufraser1 avatar Sep 09 '21 11:09 stufraser1

> After more discussion, it makes sense to change the thinking around what OED is. OED should cover "all" exposure data and so can include several input files, i.e. the current 'loc' file is the property OED file, the LocPopulation file is another OED file, any other line of business (such as liability) is another OED file, etc.

So separate loc files could be included for transport infrastructure, energy infrastructure and communications infrastructure? Once the mechanism is defined, IDF RMSG could assist in developing these with the help of public sector partners, for instance by leveraging previous development-sector work on exposure standards (Risk Data Library, GED4ALL) and promoting interoperability with those.

stufraser1 avatar Sep 09 '21 11:09 stufraser1

Yes, I think so. For example, the liability standard is nearly ready and this will be a separate OED file. I foresee these files being interoperable in the future if required, and all under 'OED'. We will of course put this to the SC.

MattDonovan82 avatar Sep 09 '21 11:09 MattDonovan82

@stufraser1 what do you suggest for data type and details when capturing OccupantPeriod?

'day/night'? 'time of day'?

MattDonovan82 avatar Oct 12 '21 13:10 MattDonovan82

Day/night would be the main use case yes. Most analysis uses nighttime population, given by census statistics.

Potential for further description is useful longer term: I have previously used data describing tourist numbers (high, low, shoulder season; or summer/winter), but that is perhaps a more limited use case. Defining day/night as the only allowed values may be too restrictive long-term. Could a string definition be provided to allow more options, thinking also beyond OED data being used in cat models?

stufraser1 avatar Oct 12 '21 17:10 stufraser1

Perhaps it's worth treating the field in the same way as many secondary modifiers are, i.e. defining a numeric value and assigning what it describes. In the simplest case that would be, say, 0 = No, 1 = Yes. Having a numeric value with a predefined list of meanings would avoid typos in the text and questions about whether lower/upper case is accepted.
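A minimal sketch of the coded approach using Python's `IntEnum`. The two codes shown are placeholders pending the agreed list, and the function name is hypothetical; the point is that only the listed integers are accepted, so typos and case variants of free text cannot slip in.

```python
from enum import IntEnum


class OccupantPeriod(IntEnum):
    """Illustrative code list; the actual values are still under discussion."""
    DAY = 1
    NIGHT = 2


def parse_occupant_period(raw: str) -> OccupantPeriod:
    """Accept only the numeric codes; free text like 'Day' is rejected."""
    try:
        return OccupantPeriod(int(raw.strip()))
    except ValueError:
        raise ValueError(f"invalid OccupantPeriod code: {raw!r}") from None


print(parse_occupant_period(" 2 ").name)  # NIGHT
```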

aiste-kalinauskaite avatar Oct 13 '21 08:10 aiste-kalinauskaite

This seems like a sensible approach and would cover most near and long term use cases.

@stufraser1 do you want to come up with a list of descriptions this week and then we can get this into v2 which is being released on 1st Nov?

MattDonovan82 avatar Oct 18 '21 08:10 MattDonovan82

How about [Day, Night, Peak Season, Off-peak Season] (not using winter/summer, because for some locations peak will be summer, for others it will be winter)

stufraser1 avatar Oct 19 '21 16:10 stufraser1

@stufraser1 do the options for OccupantPeriod below make sense? Do options 5-8 make sense or is this detail you wouldn't require?

1 - Day
2 - Night
3 - Peak Season
4 - Off-peak Season
5 - Day - Peak season
6 - Day - Off-peak season
7 - Night - Peak season
8 - Night - Off-peak season
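The eight options above can be read as every combination of time of day (Day, Night, or unspecified) with season (Peak, Off-peak, or unspecified), excluding the case where both are unspecified: 3 × 3 − 1 = 8. A sketch of that mapping follows, with labels taken from the list and `None` standing for unspecified. This is one possible reading, not an agreed schema, and `code_for` is a hypothetical helper.

```python
# Proposed OccupantPeriod codes -> (time of day, season); None = unspecified.
OCCUPANT_PERIODS = {
    1: ("Day", None),
    2: ("Night", None),
    3: (None, "Peak"),
    4: (None, "Off-peak"),
    5: ("Day", "Peak"),
    6: ("Day", "Off-peak"),
    7: ("Night", "Peak"),
    8: ("Night", "Off-peak"),
}


def code_for(time_of_day=None, season=None):
    """Return the code for a (time of day, season) pair, or raise ValueError."""
    for code, combo in OCCUPANT_PERIODS.items():
        if combo == (time_of_day, season):
            return code
    raise ValueError(f"no OccupantPeriod code for {(time_of_day, season)!r}")


print(code_for("Night", "Off-peak"))  # 8
```

Keeping the combined codes in one field means a single column per row; the alternative would be two separate coded fields, one for time of day and one for season.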

MattDonovan82 avatar Oct 20 '21 15:10 MattDonovan82