O+M 2024-02-19

Open rshewitt opened this issue 1 year ago • 1 comments

As part of the day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you are unavailable, please note in Slack when you will be unavailable and ask someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Note: For Catalog Auto Tasks, you will need to update the chart values manually. Click the Action link in each issue, grab the values from the monitor task output, and check the runtime.

Weekly Checklist

Monthly Checklist

Ad-hoc Checklist

  • [ ] Audit/review applications on Cloud Foundry and determine what can be stopped and/or deleted.

Reference

rshewitt, Feb 20 '24 15:02

Recent bot activity caused catalog and nginx to crash. The activity has stopped for now and both are back up. We've identified the IP ranges and will monitor future activity. The request URLs are often malicious: they either look for .env files or bypass our cache by querying random text. We have 2 options to prevent this from happening in the future.
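
For monitoring, a minimal sketch of how requests matching the pattern described above could be flagged. This is an illustration only, not what was deployed: the function names are hypothetical, and the IP ranges are RFC 5737 documentation placeholders rather than the ranges identified in the incident. It covers the .env probes and the watched ranges; detecting cache-bypassing random query text is not attempted here.

```python
import ipaddress
from urllib.parse import urlsplit

# Placeholder ranges (RFC 5737 documentation blocks), NOT the real incident ranges.
WATCHED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_env_probe(url: str) -> bool:
    """Return True if the last path segment looks like a dotenv file."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1].lower()
    return last_segment == ".env" or last_segment.startswith(".env.")


def in_watched_range(ip: str) -> bool:
    """Return True if the client IP falls inside any watched network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WATCHED_RANGES)


def flag_request(ip: str, url: str) -> list[str]:
    """Collect the reasons (possibly none) a request looks suspicious."""
    reasons = []
    if in_watched_range(ip):
        reasons.append("watched-ip")
    if is_env_probe(url):
        reasons.append("env-probe")
    return reasons
```

Feeding each access-log entry's client IP and request URL through `flag_request` gives a list of reasons that can be counted per day to see whether the activity has resumed.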

rshewitt, Feb 22 '24 16:02