Jin Sun
Monitoring the scheduled db-solr-sync job:
10/12: 0 packages need to be removed from Solr; 1 packages need to be updated/added to Solr
10/13: 1 packages need to be removed from...
10/17: 0 packages need to be removed from Solr; 1 packages need to be updated/added to Solr
10/18: 0 packages need to be removed from Solr; 1 packages need to...
The following SQL script picks up the duplicates (from https://github.com/GSA/data.gov/issues/3567) `SELECT "group".name, COUNT(*) FROM package JOIN "group" ON package.owner_org = "group".id LEFT JOIN harvest_object ON package.id = harvest_object.package_id WHERE package.state='active'...
Will do the following cleanup today:
```
10-18-2022
doc-gov | 23425
ca-gov | 12455
noaa-gov | 10869
```
10-24-2022 There is one new duplicate in dhs-gov today.
Just cleaned up the duplicates for ca-gov. It took only about 4 minutes for 12455 records with the new deletion method (deferring the commit to the end).
The new delete function uses a single Solr connection for all deletions, whereas adding/updating opens a new connection for each call.
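A minimal sketch of the deferred-commit pattern described above, not the actual code used for the cleanup. `RecordingSolrClient` and `delete_all` are hypothetical names; the client is an in-memory stand-in that only mimics a `delete(id=..., commit=False)` / `commit()` style interface, so the example runs without a Solr server.

```python
from typing import Iterable, List


class RecordingSolrClient:
    """Hypothetical stand-in for a Solr client; records calls in memory."""

    def __init__(self) -> None:
        self.pending: List[str] = []  # deletions sent but not yet committed
        self.deleted: List[str] = []  # deletions made visible by a commit
        self.commits = 0              # number of commits issued

    def delete(self, id: str, commit: bool = True) -> None:
        self.pending.append(id)
        if commit:
            self.commit()

    def commit(self) -> None:
        self.deleted.extend(self.pending)
        self.pending.clear()
        self.commits += 1


def delete_all(client: RecordingSolrClient, package_ids: Iterable[str]) -> int:
    """Delete every id over one shared client and commit once at the end."""
    count = 0
    for package_id in package_ids:
        # commit=False defers the expensive commit instead of paying it per record
        client.delete(id=package_id, commit=False)
        count += 1
    if count:
        client.commit()  # single commit covering the whole batch
    return count
```

With this shape, deleting N records costs one connection and one commit rather than N of each, which is consistent with the short runtime reported for ca-gov.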
The following duplicates have also been cleared: doc-gov 23425, noaa-gov 10869. So there are no duplicates in the DB as of today. Will continue to monitor for a couple of days to see...
Checked for duplicates today; no new items returned. `SELECT "group".name, COUNT(*) FROM package JOIN "group" ON package.owner_org = "group".id LEFT JOIN harvest_object ON package.id = harvest_object.package_id WHERE package.state='active' AND package.type='dataset'...
Removed this broken link from the search results page:  > .html