makepbcore.py - add instantiationDate
So far it's just instantiationDate_mo in the PBCore document, which is fine for the three Loopline formats, as it's the same thing. But we'll need both going forward for the other formats.
Hey, I re-read the Google Docs conversation and I realise that we might not have been on the same page in terms of 'repeatability'.
Were you thinking that when we were only documenting date modified, that we would have the same value in instantiationDate and instantiationDate_mo?
I was thinking that more discussion was needed as my proposed solution was:
instantiationDate_mo - is always used for every format, and it is always just the date modified value.
It would get serialised in xml as
<instantiationDate dateType="file modification">2010-09-01T12:02:15Z</instantiationDate>
The dateType attribute is key there, as it tells you which type of date we are discussing.
I was thinking that the instantiationDate field in the PBCore CSV would require an accompanying instantiationDateType field in order to avoid the awkwardness of hardcoding certain date types in standalone fields. For example a reproduction date field, or a DCP Issue Date field. I think this route gives us a lot of flexibility going forward.
We could instead say in the PBCore/tech record:
instantiationDate=2018-05-01T12:02:15Z
instantiationDateType='Date of reproduction'
^ Which would also be noted in 'Date of Acquisition'
or in the instance of a DCP:
instantiationDate=2018-05-01T12:02:15Z
instantiationDateType='DCP Issue Date'
I think we had agreed that we would always include the date modified value, so I think it makes sense to have it as a standalone field, as it's the most meaningful of the file system dates.
But in terms of serialising as XML, we can use as many instantiationDate values as we want:
<instantiationDate dateType="file modification">2010-09-01T23:03:11Z</instantiationDate>
<instantiationDate dateType="Date of reproduction">2017-09-01T15:02:15Z</instantiationDate>
<instantiationDate dateType="DCP Issue Date">2018-05-01T12:02:15Z</instantiationDate>
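A rough sketch of how repeated instantiationDate elements like the three above could be written from Python, using only the standard library. The function name `add_instantiation_dates` and the bare `pbcoreInstantiation` wrapper (no namespace) are assumptions for illustration, not what makepbcore.py currently does.

```python
import xml.etree.ElementTree as ET


def add_instantiation_dates(parent, dates):
    """Append one <instantiationDate> per (date_type, value) pair."""
    for date_type, value in dates:
        # The dateType attribute says which kind of date this element holds.
        element = ET.SubElement(parent, 'instantiationDate', dateType=date_type)
        element.text = value
    return parent


# Hypothetical wrapper element; a real PBCore document would also carry a namespace.
instantiation = ET.Element('pbcoreInstantiation')
add_instantiation_dates(instantiation, [
    ('file modification', '2010-09-01T23:03:11Z'),
    ('Date of reproduction', '2017-09-01T15:02:15Z'),
    ('DCP Issue Date', '2018-05-01T12:02:15Z'),
])
xml_text = ET.tostring(instantiation, encoding='unicode')
print(xml_text)
```

Because each date carries its own dateType, the element can be repeated as often as needed without adding a new hardcoded field per date type.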
I think this also becomes even more valuable when the 'date modified' values are fundamentally misleading. Like there is some embedded metadata to suggest that the file was actually created in 2010, but the date modified says '2017'. It would be useful in this instance to be able to use instantiationDate to note the more accurate date, and we could have a controlled vocab of terms to use with instantiationDateType.
What do you think?
Hi
Yeah, I think we were actually on the same page, or at least pretty close. When I said repeatability, I meant that the field can be repeated within PBCore according to the PBCore information. And I guess if we are repeating the field, it would follow that we'd need to be more specific, so the dateType attribute works for that.
I had meant that in the case of the 3 Loopline formats we are working with for this batch of accessioning, the instantiationDate and the instantiationDate modified would have the same values, as I'd understood from the notes on the PBCore template doc that this was the case for these formats but not for DCP. So my understanding was that the values happened to be the same for these examples, but we needed to make the difference clear at this point and include the repeated fields, as these values won't be the same moving forward, DCP being the most obvious example. Although I remember the discussion at the DCM when we were working on instantiationDate, I didn't really understand it. I think it was Felix who had suggested that date modified 'made the most sense' for the Loopline formats, but I didn't understand why.
Anyway, moving forward, yes, I absolutely think that we should repeat the field, with dateType attributes added, and that there should be a controlled vocab for the instantiationDateType field. But it's the controlled vocab we need to discuss, I think. I'm not sure that 'date of reproduction' should be part of the controlled vocab. 'Date of creation' should cover most things. The fact that it is a reproduction is explicit in other parts of the different records and doesn't need to be made explicit here. It might be possible to break this vocab down to two values: 'date of modification' and 'date of creation'. What do you think? We should perhaps put it on the agenda for tomorrow's DCM, as the first item for discussion, to get it out of the way. That gives you a chance to work on it after.
R
I agree that we should put it on the agenda for tomorrow. I think we are in agreement overall. Is this a correct summary?
- instantiationDate_mo field is always used by harvesting date modified
- it makes sense to add a new instantiationDate_Type field
- instantiationDate_Type is mandatory IF there is a value used in the instantiationDate field
- if there is no reason to add a value to instantiationDate (such as date of creation/reproduction) then we just add the date modified date, and include 'date of modification' in instantiationDate_Type? I was initially thinking that we would leave instantiationDate as n/a if it would contain duplicate information from instantiationDate_mo, as there's no need for duplication.
- As for the vocab, definitely a topic for discussion. I'd be in favour of having as specific a vocab as possible. The definition from PBCore says :
dateType – optional. The dateType attribute classifies by named type the date-related data of the element e.g., created, broadcast, dateAvailableStart. Used to clarify how the instantiationDate is related to the instantiation. Date Created may be the most common, but the element could also be used to describe the Date Accessioned, Date Deaccessioned, or Date Transcoded, for example. Has a PBCore controlled vocabulary (recommended).
Here's the vocab btw - http://metadataregistry.org/concept/list/vocabulary_id/162.html Looks like mediainfo is ignoring it :)
So I think we could stick to the PBCore vocab as much as possible, but I think that there is a value in being explicit about the source. For example, if we're harvesting the DCP IssueDate value from the CPL/XML, then I think that it's really useful to be able to state that we harvested that info from there...
We could conceivably use it over and over again in the future: date migrated, date acquired, date deaccessioned, etc.
I'd still be for not being so specific about the source of the value, as I'm not sure how important that info is on the database. We might start getting into another situation where we keep adding to the list of validated fields to cater to the particular differences in a growing number of AV formats that we'll be taking in. It might be pedantic, a bit like how that type of acquisition field is now, if all the information we need about the source of the value is retrievable elsewhere anyhow. That's my thinking anyway
- but looking forward to discussing with the team tomorrow to see what others think.
Re your bullet points for the summary: yes to all, except that I see no harm in repeating the value in the two fields in the case that they are the same. Yes, it's a duplication, but they mean two separate things that are getting their value from the same source. The fact that the value is repeated in both fields gives us information too, in a sense.
R
I'm cool with the repetition/duplication of the value, it should be consistent and I think easy enough to implement.
I also take your point on relaxing the specificity - maybe we can just keep it to the PBCore vocab for now. Sure we can revisit this thread tomorrow anyhow at the meeting - I think we're quite close to a solution.
OK, thanks for that debate. It was really useful to chat it out, I think. I'm now going to do a 180 on what I said and suggest that you continue in the way that you (I think) had originally intended, which is the F7 method for all the different datetypes of dateInstantiation, either in one field or two. So what I mean is:
- not splitting datetype modified into a separate field to datetype created
- apart from the date acquired field, which collects date acquired and date reproduced dates, there is only one dateInstantiation field where all the different date values are kept over years, separated by F7
- I don't mind how you want to do this, - putting all the info for one date value in one field. i.e. date & datetype or splitting up the date and the datetype into two different fields. I feel that splitting into two fields could leave an opening for human error (as you described), so would be in favour of the later method. But whatever you think is best, as you need to keep in mind what is most easily migrated to genie plus.
cheers R
Cheers Raelene. I'll have a think on the best way forward. Just to confirm, we will still have a separate instantiationDate_mo field that will just take in the file system date modified?
As for this 'flexible' instantiationDate field - what values are put in here? I could see it being a place for the DCP Issue Date - and some kind of 'date that an object first existed' type field, which would be a duplication of Date Acquired in the case of a Reproduction.
Is this to use date modified values when there are no other relevant dates to record, or just n/a?
In the short term, we'll need to know what goes in here for
- XDCAM EX
- Converted
- Concatenated
We will already have date modified values in instantiationDate_mo, but I can't think of any other relevant dates here, unless I start rooting around in the XML of the XDCAM EX cards for some dates.
hi Kieran
do you have a minute to talk about this? i can come up to you if you want. I'd understood it a little differently, - but better to chat it out quickly rather than back and forth by mail. i'm happy to go with what you think is best, - just want to make sure i understand it R
instantiationDate_other is to be the name on the back end. It must be paired with the attribute field instantiationDate_type. This field must be added to the database. For the moment we leave the values in those two fields blank; we don't put in n/a or anything else. instantiationDate_mo stays as is: we see the dateType in the PBCore IFI XML, but not on the database. R
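A minimal sketch of how that decision might translate when turning one CSV row into XML, assuming the row is a plain dict keyed by the three field names above. The function name, the `pbcoreInstantiation` wrapper, and the choice to raise an error on an unpaired value are all assumptions for illustration; the 'file modification' label is taken from the earlier example in this thread.

```python
import xml.etree.ElementTree as ET


def instantiation_dates_from_row(row):
    """Build <instantiationDate> elements from one CSV row (a dict)."""
    # Hypothetical wrapper element, without the PBCore namespace.
    parent = ET.Element('pbcoreInstantiation')

    # Date modified is always present; its dateType exists only in the XML,
    # not as a database field.
    modified = ET.SubElement(parent, 'instantiationDate', dateType='file modification')
    modified.text = row['instantiationDate_mo']

    # The flexible pair is left blank for now, so blank means "emit nothing".
    other = row.get('instantiationDate_other', '').strip()
    other_type = row.get('instantiationDate_type', '').strip()
    if bool(other) != bool(other_type):
        raise ValueError('instantiationDate_other and instantiationDate_type must be filled in together')
    if other:
        extra = ET.SubElement(parent, 'instantiationDate', dateType=other_type)
        extra.text = other
    return parent


# Current situation: only the date modified value is recorded.
row = {
    'instantiationDate_mo': '2010-09-01T12:02:15Z',
    'instantiationDate_other': '',
    'instantiationDate_type': '',
}
print(ET.tostring(instantiation_dates_from_row(row), encoding='unicode'))
```

Treating blank as "no element" keeps n/a out of both the database and the XML, and the paired check guards against a date being entered without its type.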