Normalize Database
Changes to the schema
- remove Nicety:
starredand User:autosave_timeoutandautosave_enabledThis will address Issues #80 and #34 - Connected
UsertoProfileusingUser.profile_id - Nicety:
target_idnow points toProfile.id. Added backnamealong with other attributes that were normally being called thorough the API. - Stint: Added in
profile_idfor the foreign key connection - Profile: removed
facultysince that's determined bystint['type']
To use new database:
-
createdb <database_name> - change the
DATABASE_URLenvironment variable in your.envfile to update the database, then runsource .envon your terminal -
flask db upgrade - Run
python update-data.pyon the project level directory. The script logs eachProfilebeing added so feel free to comment out lines 78 and 83.
Here is a link to edit the schema directly

Originally we talked about creating one migration script which includes the schema updates as well as the new models to populate from the RC API. Here is actually the approach I took (so far):
In order to normalize the database with profile and stint, I needed to
- create and migrate those models
- run the database populating script
- create another migration to move over missing people from
usertoprofile.
I couldn't combine steps 1 and 3 into one migration script since the database has to be populated with profiles after the profile and stint tables are created but before the missing profiles from user can be added in. Here are the steps to run them on the production database.
-
flask db downgraderun this command twice to downgrade to the latest migration script used in production -
flask db upgrade b99a55eb8e08to addProfileandStinttables -
python update-data.pyto add the 2070 profiles -
flask db upgrade 27d86222146dto migrate missing users (not pulled from RC API) over toProfile. Total profiles should now be 2074.
If we don't want two migrations, then I imagine 2 ways to create a single migration:
- creating
profileandstinttables and moving allusers over toprofileso thatprofilenow has 451users. Then running theupdate-data.pyto pull all RC API profiles and overwriting the 451 ids that already exist inprofilebut not overwriting the missing ones. - Merge the two migrations above into one and call the
update-data.pyscript in between the two migration steps. When I do update the schema foruserandnicety, I can combine that migration script with themigrations/versions/27d86222146d_migrate_missing_users_into_profile.pyso we will still have a total of 2 migration scripts.
I also ran into a separate issue: apparently there several niceties written for target_id=1932 but id=1932 doesn't exist in user or profile. Which means I'm unable to change the schema to connect the nicety.target_id to profile.id (author_ids are fine btw). I ran a quick query to find the other missing ids:
SELECT
DISTINCT target_id
FROM
nicety
WHERE
target_id NOT IN (
SELECT
id
FROM
profile
);
which returned the following 4 ids: 1932, 3292, 2218, 2481
Not sure why this is? Until then, I can't update the schema for nicety. I just messaged James from the RC staff to ask about this - just talked to him last week about the stint ids.
I still made changes to the user and nicety models without adding them to any migration. If this is confusing, I can change them back to their original state until they are ready to be migrated.
Let me know if this is unclear or the wrong approach to take. Thanks!
Just asked James about those missing ids and he said they are no longer part of the RC community. I can speak more to this at our Wednesday meeting.
We will need an access token for the nightly script. @mjec, since the application is using your client credentials, I think it makes the most sense to also have you generate an access token - does that sound right to you? If so, could you please add an environment variable to Heroku named RC_API_ACCESS_TOKEN?
I think the command line would be something like:
heroku config:set RC_API_ACCESS_TOKEN=my_token
RC_API_ACCESS_TOKEN has been created in Heroku and should now be available for use @jasonaowen
So quick - thank you so much, @mjec!
I believe this captures everything we've talked about thus far. Run the following lines on the terminal to upgrade and downgrade the migration:
flask db upgrade
flask db downgrade
@tas09009 and I paired on restructuring these changes. So far we've pulled two commits out, 448a9c3 and e0b4921.
e0b4921 is worth calling out, I think: I reviewed this data-only migration by examining the database by hand.
Nicety ID 221 is one of the lost niceties described in #10: the target user changed their end date, and the author created a new nicety (ID 791) that is very similar but not identical, and with the new end date.
Relevant queries:
SELECT id, author_id, target_id, date_updated, CONVERT_FROM(DECODE(text, 'base64'), 'utf8')
FROM nicety WHERE id IN (221, 791);
SELECT * FROM nicety WHERE author_id = ? AND target_id = ?;
Nicety ID 3634 is more complicated. It has a correct end date, but it is the replacement for a previously-written nicety (ID 2956) that was lost due to the target recurser changing their end date. The text of the old nicety has much more detail than the new one, but it also has an incorrect end date. I believe the intent is to clean that up in the subsequent migration that assigns stint_ids to all niceties that have an end date that matches no known stints. Also, @tas09009, is this one of the niceties people you emailed the author about?
Hi @jasonaowen, I didn't send an email for this one because of what you mentioned when comparing the two niceties. But I certainly can if you'd like!
So last we talked, the migrations were in pretty good shape, and the remaining work on this PR is to update the API to use the new database structure. Thanks again for working on this, @tas09009 - I know it's a big change!
Hi @jasonaowen,
While working on the API changes, I realized they could only happen after the database migration since some of the changes we're implementing - switching from batches to stints and removing caching - have to wait until that last migration. You can see some of my WIP in the 3rd to last commit.
Do you think we should split this PR up again into two parts? It'll be easier to digest too:
- Database + schema migration
- API changes to incorporate migrations
It will also make it easier to push changes to the two PRs as unanticipated code changes come up, such as when we discover that a user's stint changes their date and their stint_id.