Jin Sun

Results 57 comments of Jin Sun

Monitoring the scheduled db-solr-sync job:
- 10/12: 0 packages need to be removed from Solr, 1 packages need to be updated/added to Solr
- 10/13: 1 packages need to be removed from...

- 10/17: 0 packages need to be removed from Solr, 1 packages need to be updated/added to Solr
- 10/18: 0 packages need to be removed from Solr, 1 packages need to...
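A rough sketch of the kind of comparison that could produce the two counts above. It assumes, hypothetically, that the job compares package ids and `metadata_modified` values between the DB and Solr; the real db-solr-sync logic may differ, and `plan_sync` is an illustrative name only.

```python
from typing import Dict, Set, Tuple


def plan_sync(db: Dict[str, str], solr: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare DB and Solr snapshots; both map package id -> metadata_modified.

    Returns (ids to remove from Solr, ids to add/update in Solr).
    """
    # Indexed in Solr but not active in the DB: stale entries to delete.
    to_remove = set(solr) - set(db)
    # Active in the DB but missing from Solr, or indexed with a different timestamp.
    to_upsert = {pid for pid, modified in db.items() if solr.get(pid) != modified}
    return to_remove, to_upsert


if __name__ == "__main__":
    db = {"a": "2022-10-12T01:00", "b": "2022-10-12T02:00"}
    solr = {"a": "2022-10-12T01:00", "b": "2022-10-11T09:00", "c": "2022-10-01T00:00"}
    remove, upsert = plan_sync(db, solr)
    print(f"{len(remove)} packages need to be removed from Solr")
    print(f"{len(upsert)} packages need to be updated/added to Solr")
```

Here `c` is only in Solr (removal) and `b` has a stale timestamp (update), so both counts come out as 1.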

The following SQL query finds the duplicates (from https://github.com/GSA/data.gov/issues/3567): `SELECT "group".name, COUNT(*) FROM package JOIN "group" ON package.owner_org = "group".id LEFT JOIN harvest_object ON package.id = harvest_object.package_id WHERE package.state='active'...`
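The query above is cut off, so its exact duplicate criterion isn't shown. As a minimal sketch only, assuming a "duplicate" means several active datasets in one organization whose current harvest objects share the same `guid`, a per-organization count could look like this. It runs against SQLite with a simplified, hypothetical schema (the real CKAN tables have many more columns), and it uses an inner join rather than the original's LEFT JOIN, whose filter is elided. `find_duplicates` and `build_demo_db` are illustrative names.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical criterion: extra copies per org = matching rows - distinct GUIDs.
DUPLICATE_QUERY = """
SELECT g.name, COUNT(*) - COUNT(DISTINCT ho.guid) AS extra_copies
FROM package p
JOIN "group" g ON p.owner_org = g.id
JOIN harvest_object ho ON p.id = ho.package_id
WHERE p.state = 'active' AND p.type = 'dataset' AND ho."current" = 1
GROUP BY g.name
HAVING COUNT(*) - COUNT(DISTINCT ho.guid) > 0
ORDER BY g.name
"""


def find_duplicates(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Return (organization name, extra copies) for organizations with duplicates."""
    return [(name, count) for name, count in conn.execute(DUPLICATE_QUERY)]


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database using the simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "group" (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE package (id TEXT PRIMARY KEY, owner_org TEXT, state TEXT, type TEXT);
        CREATE TABLE harvest_object (package_id TEXT, guid TEXT, "current" INTEGER);
    """)
    conn.executemany('INSERT INTO "group" VALUES (?, ?)',
                     [("o1", "doc-gov"), ("o2", "ca-gov"), ("o3", "noaa-gov")])
    packages = [
        ("p1", "o1", "active", "dataset"), ("p2", "o1", "active", "dataset"),
        ("p3", "o2", "active", "dataset"), ("p4", "o2", "active", "dataset"),
        ("p5", "o2", "active", "dataset"), ("p6", "o3", "active", "dataset"),
        ("p7", "o3", "deleted", "dataset"),
    ]
    conn.executemany("INSERT INTO package VALUES (?, ?, ?, ?)", packages)
    harvest = [("p1", "g1", 1), ("p2", "g1", 1), ("p3", "g2", 1), ("p4", "g2", 1),
               ("p5", "g2", 1), ("p6", "g3", 1), ("p7", "g3", 1)]
    conn.executemany("INSERT INTO harvest_object VALUES (?, ?, ?)", harvest)
    return conn


if __name__ == "__main__":
    print(find_duplicates(build_demo_db()))
```

In the demo data, noaa-gov drops out because its second copy (`p7`) is already deleted, so only ca-gov and doc-gov are reported.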

Will do the following cleanup today:

```
10-18-2022
doc-gov  | 23425
ca-gov   | 12455
noaa-gov | 10869
```

10-24-2022: There is one new duplicate in dhs-gov today.

Just cleaned up the duplicates for ca-gov. With the new deletion method (deferring the commit to the end), it took only about 4 minutes for 12455 records.
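A minimal sketch of the "defer the commit to the end" idea, assuming a client with pysolr-style `delete(id=..., commit=...)` and `commit()` methods. `RecordingSolr` is a hypothetical in-memory stand-in that only counts calls, so nothing here talks to a real Solr, and the function names are illustrative rather than the actual deletion code.

```python
from typing import Iterable, List, Protocol


class SolrLike(Protocol):
    """Minimal client interface assumed here (pysolr.Solr exposes similar methods)."""

    def delete(self, id: str, commit: bool) -> None: ...

    def commit(self) -> None: ...


def delete_per_commit(client: SolrLike, ids: Iterable[str]) -> None:
    """Slow pattern: every delete triggers its own commit."""
    for package_id in ids:
        client.delete(id=package_id, commit=True)


def delete_deferred_commit(client: SolrLike, ids: Iterable[str]) -> None:
    """Deferred pattern: queue all deletes, then commit once at the end."""
    for package_id in ids:
        client.delete(id=package_id, commit=False)
    client.commit()


class RecordingSolr:
    """Hypothetical stand-in that records deletes and counts commits."""

    def __init__(self) -> None:
        self.deleted: List[str] = []
        self.commits = 0

    def delete(self, id: str, commit: bool) -> None:
        self.deleted.append(id)
        if commit:
            self.commits += 1

    def commit(self) -> None:
        self.commits += 1
```

For 12455 ids, `delete_per_commit` would issue 12455 commits versus one for `delete_deferred_commit`. Since a Solr hard commit typically flushes index segments and may reopen a searcher, doing it once is likely where most of the time saving comes from.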

The new delete function uses a single Solr connection for all deletions, whereas adding/updating opens a new connection for each call.
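To illustrate the connection difference, here is a sketch with a hypothetical `ConnectionFactory` that counts how many connections get opened. `FakeConnection`, `upsert_each_call`, and `delete_all_shared` are illustrative names, not the actual ckan/ckanext functions, and no real Solr client is involved.

```python
from typing import Callable, Iterable, List


class FakeConnection:
    """Hypothetical stand-in for a Solr connection; records what it sends."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, doc_id: str) -> None:
        self.sent.append(doc_id)


class ConnectionFactory:
    """Opens FakeConnections and counts how many were created."""

    def __init__(self) -> None:
        self.opened = 0

    def __call__(self) -> FakeConnection:
        self.opened += 1
        return FakeConnection()


def upsert_each_call(ids: Iterable[str], connect: Callable[[], FakeConnection]) -> None:
    """Add/update style: a fresh connection for every document."""
    for doc_id in ids:
        connect().send(doc_id)


def delete_all_shared(ids: Iterable[str], connect: Callable[[], FakeConnection]) -> FakeConnection:
    """Delete style: one connection reused for every document."""
    conn = connect()
    for doc_id in ids:
        conn.send(doc_id)
    return conn
```

Reusing one connection avoids paying the setup cost once per document, which adds up over thousands of deletions.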

The following duplicates have also been cleared: doc-gov (23425) and noaa-gov (10869). So there are no duplicates in the DB as of today. Will continue monitoring for a couple of days to see...

Checked for duplicates today; no new items were returned by `SELECT "group".name, COUNT(*) FROM package JOIN "group" ON package.owner_org = "group".id LEFT JOIN harvest_object ON package.id = harvest_object.package_id WHERE package.state='active' AND package.type='dataset'...`

Removed this broken link from the search result page: ![Image](https://user-images.githubusercontent.com/104456257/198128062-d6a08f26-4e40-43f8-855e-2071c8230cb4.png) > .html