icews

Some text fields include quotes, e.g. ""Fight"" instead of "Fight"

Open · andybega opened this issue 5 years ago · 0 comments

Some text field values are wrapped in an extra pair of double quotes, e.g.:

> query_icews("select * from events where event_id = 25326166;")
  event_id event_date    source_name                                  source_sectors
1 25326166   20170101 Women (Turkey) "Social,General Population / Civilian / Social"
  source_country                                            event_text cameo_code
1         Turkey "Conduct suicide, car, or other non-military bombing"       <NA>
  intensity target_name target_sectors target_country story_id sentence_number
1       -10      Turkey           NULL         Turkey 43113964               6
                   publisher   city district province country latitude longitude year
1 Associated Press Newswires Ankara     NULL   Ankara  Turkey  39.9199   32.8543 2017
  yearmonth              source_file
1    201701 Events.2017.20201006.tab

The "source_sectors" and "event_text" values are wrapped in quotes, which they shouldn't be. This is the proper format:

> query_icews("select * from events limit 1;")
  event_id event_date        source_name
1   926685   19950101 Extremist (Russia)
                                     source_sectors     source_country        event_text
1 Radicals / Extremists / Fundamentalists,Dissident Russian Federation Praise or endorse
  cameo_code intensity   target_name                              target_sectors
1        051       3.4 Boris Yeltsin Elite,Executive,Executive Office,Government
      target_country story_id sentence_number        publisher   city district province
1 Russian Federation 28235806               5 The Toronto Star Moscow     <NA>   Moskva
             country latitude longitude year yearmonth                    source_file
1 Russian Federation  55.7522   37.6156 1995    199501 events.1995.20150313082510.tab
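Until the source data is fixed, affected values could be cleaned after loading by removing exactly one pair of enclosing double quotes. The sketch below is a minimal base-R version; `strip_outer_quotes()` and `clean_quoted_columns()` are hypothetical helpers (not part of the icews package), and the sample rows reuse values from the two examples above. It assumes that no legitimate value is meant to start and end with a double quote, since such a value would also be stripped.

```r
# Hypothetical helper, not part of the icews package: drop one pair of
# enclosing double quotes and leave every other value unchanged.
strip_outer_quotes <- function(x) {
  # ^"(.*)"$ only matches strings that both start and end with a double
  # quote; the capture group keeps what lies between them. NA stays NA.
  sub('^"(.*)"$', "\\1", x)
}

# Hypothetical helper: apply the fix to the affected text columns.
clean_quoted_columns <- function(events,
                                 cols = c("event_text", "source_sectors", "target_sectors")) {
  for (col in intersect(cols, names(events))) {
    events[[col]] <- strip_outer_quotes(events[[col]])
  }
  events
}

# Values taken from the two example rows above.
events <- data.frame(
  event_text = c('"Conduct suicide, car, or other non-military bombing"',
                 "Praise or endorse"),
  source_sectors = c('"Social,General Population / Civilian / Social"',
                     "Radicals / Extremists / Fundamentalists,Dissident"),
  target_sectors = c("NULL", "Elite,Executive,Executive Office,Government"),
  stringsAsFactors = FALSE
)
cleaned <- clean_quoted_columns(events)
cleaned$event_text
```

Values that were never quoted, such as "Praise or endorse" or the literal "NULL", pass through unchanged.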

Is this in the raw data files or a package error?

Some of these come from Events.2017, and a manual check confirms that the quotes are indeed present in the tab-delimited raw data files.

[Screenshot, 2020-10-26 13:37:52: the raw data file]
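The same check on a raw file can also be done in R rather than by eye. A minimal sketch, assuming the raw file is read with base R's `read.delim()`: `quote = ""` is the important part, because with the default `quote = "\""` the reader treats the quotes as delimiters and strips them, hiding the problem. `count_quoted_fields()` is a hypothetical helper, and the inline text below is an illustrative stand-in for a few lines of a raw file (the header names are assumptions).

```r
# Hypothetical helper: count, per column, the values wrapped in literal
# double quotes. Assumes `raw` was read with quote = "" so the quotes
# are kept as part of the data.
count_quoted_fields <- function(raw) {
  vapply(raw, function(col) sum(grepl('^".*"$', col)), integer(1))
}

# Illustrative stand-in for a raw tab file. For a real file one would pass
# its path instead, e.g.
# read.delim("Events.2017.20201006.tab", quote = "", colClasses = "character")
raw_text <- paste(
  "Source Sectors\tEvent Text",
  '"Social,General Population / Civilian / Social"\t"Conduct suicide, car, or other non-military bombing"',
  "Radicals / Extremists / Fundamentalists,Dissident\tPraise or endorse",
  sep = "\n"
)
# colClasses = "character" keeps every column as text (no type guessing).
raw <- read.delim(text = raw_text, quote = "", colClasses = "character")
count_quoted_fields(raw)
```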

Which files are affected?

Checking a few of the fields to see which source file(s) the quoted values come from:

"event_text"

> query_icews("select source_file, count(*) as N from events where event_text like '\"%' group by source_file;")
               source_file     N
1 Events.2017.20201006.tab 59000

"source_sectors"

> query_icews("select source_file, count(*) as N from events where source_sectors like '\"%' group by source_file;")
               source_file      N
1 Events.2017.20201006.tab 512730

"target_sectors"

> query_icews("select source_file, count(*) as N from events where target_sectors like '\"%' group by source_file;")
               source_file      N
1 Events.2017.20201006.tab 425417

As suspected, all of them come from "Events.2017....tab".
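For events already pulled into an R data frame (for instance with `query_icews()`), the three per-column queries can be collapsed into one pass. A minimal base-R sketch, mirroring the `like '"%'` condition (value starts with a double quote); `quoted_counts_by_file()` is a hypothetical helper, and the two toy rows reuse values and file names from the examples above. For the full database, the SQL queries remain the cheaper option since they avoid loading every row into memory.

```r
# Hypothetical helper: for each source file, count the rows whose value in
# each text column starts with a double quote (like the SQL `like '"%'`).
quoted_counts_by_file <- function(events,
                                  cols = c("event_text", "source_sectors", "target_sectors")) {
  files <- sort(unique(events$source_file))
  out <- matrix(0L, nrow = length(files), ncol = length(cols),
                dimnames = list(files, cols))
  for (col in cols) {
    x <- events[[col]]
    # NA values count as "not quoted".
    starts_quoted <- !is.na(x) & startsWith(x, '"')
    for (f in files) {
      out[f, col] <- sum(starts_quoted & events$source_file == f, na.rm = TRUE)
    }
  }
  out
}

# Toy rows built from the two examples in this issue.
events <- data.frame(
  source_file = c("Events.2017.20201006.tab", "events.1995.20150313082510.tab"),
  event_text = c('"Conduct suicide, car, or other non-military bombing"',
                 "Praise or endorse"),
  source_sectors = c('"Social,General Population / Civilian / Social"',
                     "Radicals / Extremists / Fundamentalists,Dissident"),
  target_sectors = c("NULL", "Elite,Executive,Executive Office,Government"),
  stringsAsFactors = FALSE
)
quoted_counts_by_file(events)
```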

andybega, Oct 26 '20 11:10