Scrub binary files from git history
Before: Receiving objects: 100% (6743/6743), 121.52 MiB | 210.00 KiB/s, done.
After: Receiving objects: 100% (6421/6421), 36.37 MiB | 210.00 KiB/s, done.
This change has to be force-pushed. Merging does not do the trick, because a merge keeps the old commits, and the binaries in them, reachable from the branch. I am including the exact commands I ran to do this. It might be best if you just run the commands yourself.
Fixes #101
List all files ever in the repository
# https://git-scm.com/docs/git-log
# http://stackoverflow.com/a/13547351/1047788
git log --name-only --pretty=format: | sort | uniq
List all deleted files ever in the repository
# http://stackoverflow.com/a/21871377/1047788
git log --name-only --diff-filter=D --pretty=format: | sort | uniq
Get changelog
git log --name-status > changelog.txt
Decide what to scrub
# http://www.tldp.org/LDP/abs/html/here-docs.html
cat << EOF > filenamestoscrub.txt
contigs.fasta
google-genomics-dataflow.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140604/dataflow-sdk-1.0.140604.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140709/dataflow-sdk-1.0.140709.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140801/dataflow-sdk-1.0.140801.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140808/dataflow-sdk-1.0.140808.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140818/dataflow-sdk-1.0.140818.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140828/dataflow-sdk-1.0.140828.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140915/dataflow-sdk-1.0.140915.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.140924/dataflow-sdk-1.0.140924.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013-javadoc.jar.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013-javadoc.jar.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.pom.md5
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141013/dataflow-sdk-1.0.141013.pom.sha1
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141027/dataflow-sdk-1.0.141027.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141027/dataflow-sdk-1.0.141027.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141120/dataflow-sdk-1.0.141120.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141120/dataflow-sdk-1.0.141120-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141120/dataflow-sdk-1.0.141120.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141120/dataflow-sdk-1.0.141120-sources.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141206/dataflow-sdk-1.0.141206.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141206/dataflow-sdk-1.0.141206-javadoc.jar
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141206/dataflow-sdk-1.0.141206.pom
jars/com/google/cloud/dataflow/dataflow-sdk/1.0.141206/dataflow-sdk-1.0.141206-sources.jar
jars/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml
jars/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml.md5
jars/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml.sha1
jars/org/broadinstitute/sting/gatk/gatk/3.1-1/gatk-3.1-1.jar
jars/org/sf/picard/picard/1.115/picard-1.115.jar
lib/bwa-0.7.9a/bamlite.c
lib/bwa-0.7.9a/bamlite.h
lib/bwa-0.7.9a/bntseq.c
lib/bwa-0.7.9a/bntseq.h
lib/bwa-0.7.9a/bwa.1
lib/bwa-0.7.9a/bwa.c
lib/bwa-0.7.9a/bwa.h
lib/bwa-0.7.9a/bwa-helper.js
lib/bwa-0.7.9a/bwamem.c
lib/bwa-0.7.9a/bwamem_extra.c
lib/bwa-0.7.9a/bwamem.h
lib/bwa-0.7.9a/bwamem_pair.c
lib/bwa-0.7.9a/bwape.c
lib/bwa-0.7.9a/bwase.c
lib/bwa-0.7.9a/bwase.h
lib/bwa-0.7.9a/bwaseqio.c
lib/bwa-0.7.9a/bwtaln.c
lib/bwa-0.7.9a/bwtaln.h
lib/bwa-0.7.9a/bwt.c
lib/bwa-0.7.9a/bwtgap.c
lib/bwa-0.7.9a/bwtgap.h
lib/bwa-0.7.9a/bwt_gen.c
lib/bwa-0.7.9a/bwt.h
lib/bwa-0.7.9a/bwtindex.c
lib/bwa-0.7.9a/bwt_lite.c
lib/bwa-0.7.9a/bwt_lite.h
lib/bwa-0.7.9a/bwtsw2_aux.c
lib/bwa-0.7.9a/bwtsw2_chain.c
lib/bwa-0.7.9a/bwtsw2_core.c
lib/bwa-0.7.9a/bwtsw2.h
lib/bwa-0.7.9a/bwtsw2_main.c
lib/bwa-0.7.9a/bwtsw2_pair.c
lib/bwa-0.7.9a/ChangeLog
lib/bwa-0.7.9a/COPYING
lib/bwa-0.7.9a/example.c
lib/bwa-0.7.9a/fastmap.c
lib/bwa-0.7.9a/is.c
lib/bwa-0.7.9a/kbtree.h
lib/bwa-0.7.9a/khash.h
lib/bwa-0.7.9a/kopen.c
lib/bwa-0.7.9a/kseq.h
lib/bwa-0.7.9a/ksort.h
lib/bwa-0.7.9a/kstring.c
lib/bwa-0.7.9a/kstring.h
lib/bwa-0.7.9a/ksw.c
lib/bwa-0.7.9a/ksw.h
lib/bwa-0.7.9a/kthread.c
lib/bwa-0.7.9a/kvec.h
lib/bwa-0.7.9a/main.c
lib/bwa-0.7.9a/Makefile
lib/bwa-0.7.9a/malloc_wrap.c
lib/bwa-0.7.9a/malloc_wrap.h
lib/bwa-0.7.9a/NEWS.md
lib/bwa-0.7.9a/pemerge.c
lib/bwa-0.7.9a/QSufSort.c
lib/bwa-0.7.9a/QSufSort.h
lib/bwa-0.7.9a/qualfa2fq.pl
lib/bwa-0.7.9a/README.md
lib/bwa-0.7.9a/utils.c
lib/bwa-0.7.9a/utils.h
lib/bwa-0.7.9a/xa2multi.pl
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar.md5
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.jar.sha1
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar.md5
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617-javadoc.jar.sha1
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom.md5
lib/com/google/cloud/dataflow/dataflow-sdk/1.0.140617/dataflow-sdk-1.0.140617.pom.sha1
lib/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml
lib/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml.md5
lib/com/google/cloud/dataflow/dataflow-sdk/maven-metadata-local.xml.sha1
lib/org/broadinstitute/sting/gatk/gatk/3.1-1/gatk-3.1-1.jar
lib/org/sf/picard/picard/1.115/picard-1.115.jar
README.md~
EOF
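One way to find candidates for a list like the one above is to rank every blob in history by size. The following is only a sketch and was not part of the commands run for this pull request; `largest_blobs` is a hypothetical helper name, and it assumes its input comes from `git cat-file --batch-check` using the format string shown in the comment.

```shell
# Hypothetical helper: reads "type sha size path" lines on stdin and prints
# the N largest blobs (default 20) as "size<TAB>path", biggest first.
#
# Intended use, from inside the repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 20
largest_blobs() {
  awk '$1 == "blob" {
         size = $3
         # Strip the first three fields so paths containing spaces survive
         sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
         print size "\t" $0
       }' |
    sort -rn | head -n "${1:-20}"
}
```

Sizes are the uncompressed object sizes in bytes, so they indicate which paths are worth scrubbing rather than the exact saving in the packed repository.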
Scrub the files from history
# DO NOT DO THIS
# http://stackoverflow.com/a/1521498/1047788
while read filename; do
# https://help.github.com/articles/remove-sensitive-data/
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch $filename" \
--prune-empty --tag-name-filter cat -- --all
done < filenamestoscrub.txt
Wait for this to complete. It takes a very long time, because each iteration rewrites the whole history again; scrubbing the files one by one was a bad idea.
# DO THIS INSTEAD: a single rewrite that removes every listed path at once
# http://stackoverflow.com/a/4229151/1047788
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch -- $(tr '\n' ' ' < filenamestoscrub.txt)" \
--prune-empty --tag-name-filter cat -- --all
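The single-pass command splices the file list into the filter string without quoting, so it relies on no path containing whitespace or shell metacharacters. That holds for the list above, but a pre-flight check along these lines could confirm it for a different list. This is a sketch; `unsafe_paths` is a hypothetical helper, and the set of allowed characters is a deliberately conservative assumption.

```shell
# Hypothetical helper: print (with line numbers) every line of the list file
# that contains a character outside letters, digits, and . _ / ~ + -
# Returns 0 if the list is clean, 1 if offending lines were printed,
# and 2 if grep itself failed (for example, the file does not exist).
unsafe_paths() {
  grep -n '[^A-Za-z0-9._/~+-]' "$1"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

For example, `unsafe_paths filenamestoscrub.txt && echo "list looks safe"` prints the message only when no line was flagged.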
Review and push the result
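# Build to check that the project still compiles without the scrubbed files
mvn package
# Overwrite all branches and tags on the remote with the rewritten history
git push origin --force --all
git push origin --force --tags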
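A check that could be run before the force-push is to confirm that none of the listed paths still appears in the rewritten branches and tags. The sketch below uses a hypothetical helper, `still_in_history`, and was not part of the original commands. Note that it is meant to be fed from `git log --branches --tags` rather than `git log --all`, since `--all` would also walk the `refs/original` backups that `git filter-branch` keeps.

```shell
# Hypothetical helper: reads path names on stdin and prints those that are
# still listed in the scrub file given as $1 (exact, whole-line matches).
# No output means none of the listed paths remain.
#
# Intended use, from inside the rewritten repository:
#   git log --branches --tags --name-only --pretty=format: |
#     still_in_history filenamestoscrub.txt
still_in_history() {
  sort -u | grep -Fx -f "$1"
}
```

The exit status follows `grep`: non-zero when nothing matched, which here is the desired outcome.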
Local clones
Do steps 8 and 9 from https://help.github.com/articles/remove-sensitive-data/ on each local clone you have.
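For reference, the sequence below is a commonly used way to drop the backup refs left by `git filter-branch` and prune the now-unreachable objects in a clone whose branches already point at the rewritten history. It is a sketch: the function name `purge_old_history` is hypothetical, and whether these commands match the linked steps exactly is an assumption, since that article may have changed.

```shell
# Hypothetical wrapper around three standard git commands. Run it inside a
# clone only after confirming the rewritten history is what you want,
# because it makes the old objects unrecoverable in that clone.
purge_old_history() {
  # Delete the backup refs that git filter-branch leaves under refs/original
  git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin
  # Expire every reflog entry so nothing keeps the old commits alive
  git reflog expire --expire=now --all
  # Repack and drop all objects that are no longer reachable
  git gc --prune=now
}
```

After this, `git count-objects -vH` should report a noticeably smaller `size-pack` if the scrubbed files were the bulk of the repository.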
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
@jirkadanek Thanks so much for these detailed instructions!!! We will make it so.