
Perform Copy Number Variation analysis on C gigas samples

Open sr320 opened this issue 2 years ago • 27 comments

For @ggoetznoaa; @mgavery can advise.

TLDR: Nightingales Folder - F05/F14 prefix <- raw data

Short-read WGS, 30x coverage, 32 samples, 2 families, 8 samples per ploidy/family. See notebook post for more information.

WGS results received from Azenta. Stored on nightingales. Initial QC and trimming done on Raven. See notebook post.


Pertinent documents

repo

Manuscript Proposal
Tissue Sample List
https://github.com/RobertsLab/resources/issues/1304
DNA extraction protocol
DNA extraction results
Gannet Folder
Nightingales Folder - F05/F14 prefix <- raw data

sr320 avatar Jun 13 '23 17:06 sr320

@sr320 I see the files but I can't download them. It just sorta hangs when I try either via the web browser or via command line program. Do I need a login/password?

ggoetznoaa avatar Jun 21 '23 20:06 ggoetznoaa

@sr320 Oh, and I never got an email saying you tagged me. Not sure if it's something on my end that needs to be changed or your end. I usually don't use GitHub for this sort of stuff.

ggoetznoaa avatar Jun 21 '23 20:06 ggoetznoaa

Hmmm, sorry you're having trouble! All the files are publicly accessible, so no need for a password.

I just tested downloading a file and it proceeded without issue:

wget https://owl.fish.washington.edu/nightingales/C_gigas/F142n01_R1_001.fastq.gz

Obviously, this isn't too much help, since it doesn't solve your problem...

kubu4 avatar Jun 21 '23 20:06 kubu4

It could be something on NOAA's side (firewall settings, etc.). I was able to get your command to run, but when I tried the following, it failed.

wget http://owl.fish.washington.edu/nightingales/C_gigas/0501_R1_001.fastq.gz

ggoetznoaa avatar Jun 21 '23 20:06 ggoetznoaa

Ah, I think I see the issue. The link in the doc doesn't have https, so Firefox went with http, and that doesn't seem to be working.

ggoetznoaa avatar Jun 21 '23 20:06 ggoetznoaa

Yep, it worked as soon as I added https.
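For anyone scripting downloads from a list of links, here is a minimal Python sketch of the same fix: rewrite any `http` link to `https` before fetching it. The helper name `force_https` is hypothetical, and the sketch assumes the server only answers over https, as the failed and successful attempts above suggest.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return the URL with an http scheme upgraded to https; other URLs pass through."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# The link that failed in this thread, rewritten to use https.
fixed = force_https(
    "http://owl.fish.washington.edu/nightingales/C_gigas/0501_R1_001.fastq.gz"
)
print(fixed)
```

The rewritten URL can then be passed to `wget` or any other download tool.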

ggoetznoaa avatar Jun 21 '23 20:06 ggoetznoaa

Ok I've downloaded the files. I'm assuming the files I've downloaded are the raw files and not the trimmed/QC'd files Matt mentions in his lab doc. I see all 32 samples (16 F0 and 16 F14).
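A quick way to double-check the sample count after a bulk download is to group the file names by family prefix. The sketch below is illustrative only: it assumes paired-end names of the form `<sample>_R1_001.fastq.gz` / `<sample>_R2_001.fastq.gz` and the F05/F14 prefixes mentioned at the top of this issue, and the file list is made up apart from the one name that appears in this thread.

```python
import re
from collections import defaultdict

# Assumed paired-end naming: <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz
READ_SUFFIX = re.compile(r"_R[12]_001\.fastq\.gz$")


def samples_by_family(filenames, family_prefixes=("F05", "F14")):
    """Map each family prefix to the set of distinct sample IDs found."""
    families = defaultdict(set)
    for name in filenames:
        sample = READ_SUFFIX.sub("", name)
        if sample == name:
            continue  # not a read file under the assumed naming
        for prefix in family_prefixes:
            if sample.startswith(prefix):
                families[prefix].add(sample)
                break
    return {prefix: families[prefix] for prefix in family_prefixes}


# Hypothetical directory listing (only F142n01 appears in this thread).
files = [
    "F142n01_R1_001.fastq.gz",
    "F142n01_R2_001.fastq.gz",
    "F052n01_R1_001.fastq.gz",
    "F052n01_R2_001.fastq.gz",
    "md5sums.txt",
]
counts = {family: len(ids) for family, ids in samples_by_family(files).items()}
print(counts)  # {'F05': 1, 'F14': 1}
```

With the full download, each family would be expected to show 16 samples.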

ggoetznoaa avatar Jun 22 '23 15:06 ggoetznoaa

Correct

sr320 avatar Jun 22 '23 15:06 sr320

@ggoetznoaa just checking in to see where this is at?

sr320 avatar Jul 11 '23 16:07 sr320

It's almost ready to move to my plate! Giles has done the mapping and pulled coverage for single copy genes, mito and ribo using bedtools. He is going to make tables that consolidate coverage for all samples into a single table and then I'll analyze in R.
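The consolidation step can be sketched roughly as follows. This is not Giles's actual script: it assumes each sample's bedtools output has already been reduced to a mapping of feature name to coverage, and the sample names, feature names, and values are invented for illustration.

```python
import csv
import io


def consolidate(per_sample):
    """Turn {sample: {feature: coverage}} into a wide table (header, rows)."""
    samples = sorted(per_sample)
    features = sorted({f for cov in per_sample.values() for f in cov})
    header = ["feature"] + samples
    # Features missing from a sample are written as "NA" so R reads them as missing.
    rows = [[f] + [per_sample[s].get(f, "NA") for s in samples] for f in features]
    return header, rows


def to_csv(header, rows):
    """Serialize the wide table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical per-sample coverage values.
per_sample = {
    "sampleA": {"gene1": 30.5, "mito": 2100.0},
    "sampleB": {"gene1": 28.0},
}
header, rows = consolidate(per_sample)
print(to_csv(header, rows))
```

One row per feature and one column per sample keeps the table easy to load in R with `read.csv`.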

mgavery avatar Jul 11 '23 16:07 mgavery

Can I get access to bam files?

sr320 avatar Jul 11 '23 16:07 sr320

@mgavery yep, just finished making the three different table files.

@sr320 I still have the BAM files; there are 32 files totaling 172 GB. I can put them up in a Google Drive for you to download them from, unless you have another place you want me to put them.
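To get a feel for what moving these files involves, here is a back-of-the-envelope sketch. The 172 GB and 32 files come from this comment; the 100 Mbit/s sustained rate is purely an assumption, and GB is taken as 10**9 bytes.

```python
def mean_file_size_gb(total_gb, n_files):
    """Average size per file in GB."""
    return total_gb / n_files


def transfer_hours(total_gb, mbit_per_s):
    """Hours to move total_gb (decimal GB) at a sustained rate in Mbit/s."""
    total_bits = total_gb * 1e9 * 8
    seconds = total_bits / (mbit_per_s * 1e6)
    return seconds / 3600


print(mean_file_size_gb(172, 32))          # 5.375
print(round(transfer_hours(172, 100), 2))  # 3.82 (assumed 100 Mbit/s link)
```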

ggoetznoaa avatar Jul 11 '23 16:07 ggoetznoaa

@ggoetznoaa can you put them in the mox scrubbed directory?

sr320 avatar Jul 12 '23 15:07 sr320

@sr320 I have no idea what that is or where that is. Is it a server you have? If so, I don't think I have access to it.

ggoetznoaa avatar Jul 12 '23 16:07 ggoetznoaa

Mox is part of Hyak: mox.hyak.uw.edu. You have an account.

Another suggestion is fine, but Google Docs?? I cannot do anything with files that big in Google Docs.

sr320 avatar Jul 12 '23 16:07 sr320

I don't have access to hyak anymore or at least I haven't tried to access it in a while. I can take a look though.

As for Google, I basically would put the files up in a folder on Google Drive (not Docs). I would then send a link to you for that folder and then you can just download the files. But that would require you to have the hard drive space on your computer and a decent internet connection. Currently I use a command line tool called rclone to move large files up and down from Google Drive.

I'm open to other ideas if you've got any.

G.

ggoetznoaa avatar Jul 12 '23 16:07 ggoetznoaa

Yep, I don't have access to Hyak anymore. I just followed the instructions here.

https://wiki.cac.washington.edu/display/hyakusers/Logging+In

And I'm not seeing the option to activate Hyak.

[image]

I would need someone to sponsor an account for me.

ggoetznoaa avatar Jul 12 '23 16:07 ggoetznoaa

@sr320 I have an acct on gannet. I can put them there. Does that work?

mgavery avatar Jul 12 '23 16:07 mgavery

@mgavery yes that would be great. thanks!

sr320 avatar Jul 12 '23 16:07 sr320

@mgavery the sorted/indexed BAM files are located in

/share/nwfsc/ggoetz/202306-c.gigas-cnv/bowtie2/cgigas_ref

On Sedna

ggoetznoaa avatar Jul 12 '23 16:07 ggoetznoaa

@sr320 bams are here: /var/services/homes/charlie/Cgigas_WGS_bams

mgavery avatar Jul 13 '23 14:07 mgavery

[image: Cursor_and_Plot_Zoom]

mgavery avatar Jul 13 '23 22:07 mgavery

First pass at looking at coverage per sample for ~5000 single-copy genes (top left), and then average copy number of mito and ribo genes.
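One common way to turn coverage into a copy-number estimate is to divide the coverage of the target (mito or ribo) by the median coverage of the single-copy genes in the same sample. The sketch below shows that normalization under stated assumptions; it is not necessarily the exact calculation behind these plots, the function name and numbers are hypothetical, and the optional `ploidy` scaling (copies per cell rather than per single-copy locus) is likewise an assumption.

```python
from statistics import median


def relative_copy_number(target_coverage, single_copy_coverages, ploidy=1):
    """Estimate copy number as target coverage over the single-copy median.

    With ploidy=1 the result is copies per single-copy locus; passing the
    sample's ploidy scales it to an approximate per-cell count.
    """
    baseline = median(single_copy_coverages)
    if baseline <= 0:
        raise ValueError("single-copy baseline coverage must be positive")
    return ploidy * target_coverage / baseline


# Hypothetical coverages for one sample; the median resists the 45x outlier.
single_copy = [29.0, 30.0, 31.0, 30.0, 45.0]
print(relative_copy_number(3000.0, single_copy))            # 100.0
print(relative_copy_number(3000.0, single_copy, ploidy=3))  # 300.0
```

Using the median rather than the mean keeps a few collapsed or duplicated genes from shifting the baseline.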

mgavery avatar Jul 13 '23 22:07 mgavery

This is a plot to see individual variation. Color indicates family and number on each bar indicates ploidy.

[image: Plot_Zoom]

mgavery avatar Aug 10 '23 19:08 mgavery

Amazing the variation within family/ploidy. Some oysters have x3!

mattgeorgephd avatar Aug 10 '23 19:08 mattgeorgephd

Dip_v_trip_mito_ribo_copynum.csv

This is the file the plots were generated from. I haven't done any stats yet.

mgavery avatar Aug 16 '23 20:08 mgavery

Here is what I have for global analysis... https://rpubs.com/sr320/1070681

sr320 avatar Aug 17 '23 04:08 sr320