log

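2025-6-25: TCGA cohorts fully downloaded so far: CHOL, STAD, ESCA, LIHC
2025-6-26: Continued downloading: PAAD finished; then UVM, also finished.
2025-6-27: Continued downloading: PCPG finished; now downloading COAD.
2025-6-29: Continued downloading: LYNODE, DLBC, THYM.
2025-6-30: Continued downloading: the Tumor sets of the three above, once COAD finishes.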
2025-7-1: Moving on to LUAD, since COAD has finished anyway. Main command (COAD shown):
nohup ./gdc-client download -d /home/databank/gxxu/TCGA/COAD/Source/Tumor -m /home/databank/gxxu/tool/Manifest/gdc_manifest.Tumor_coad.txt --log-file /home/databank/gxxu/TCGA/COAD/Source/Tumor/COAD_Tumor.log >COAD_T.txt 2>&1 &
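Since the same download command gets retyped for every cohort, here is a minimal sketch of a helper that builds it. The function name build_gdc_download is hypothetical; the directory layout and manifest naming are taken from the COAD command above and assumed to hold for other cohorts. Only the -d, -m and --log-file options already used in this log appear.

```python
from pathlib import PurePosixPath

# Assumed locations, copied from the COAD command in this log.
TCGA_ROOT = PurePosixPath("/home/databank/gxxu/TCGA")
MANIFEST_DIR = PurePosixPath("/home/databank/gxxu/tool/Manifest")


def build_gdc_download(project: str, kind: str) -> list[str]:
    """Return the argv list for one gdc-client download run.

    project: cohort code such as "COAD"; kind: "Tumor" or "Norm".
    The manifest file name pattern is an assumption based on
    gdc_manifest.Tumor_coad.txt.
    """
    out_dir = TCGA_ROOT / project / "Source" / kind
    manifest = MANIFEST_DIR / f"gdc_manifest.{kind}_{project.lower()}.txt"
    log_file = out_dir / f"{project}_{kind}.log"
    return [
        "./gdc-client", "download",
        "-d", str(out_dir),
        "-m", str(manifest),
        "--log-file", str(log_file),
    ]


coad_cmd = build_gdc_download("COAD", "Tumor")
print(" ".join(coad_cmd))
```

The list can be handed to subprocess.Popen as is; wrapping it in nohup and redirecting output stays a shell concern.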
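2025-7-2: Started downloading DLBC, but the DLBC files do not seem to download... Tried LYNODE, which also stalled. Tried THYM, and that one works.
2025-7-3: THYM finished. Back to downloading LUAD, which is not done yet.
2025-7-7: LUAD finished; BLCA next.
2025-7-8: Time to sort out the plan. What needs doing: classify the files, then standardize them, update the files, and generate the database entry report. No real problems. Should I also do some ER analysis?? Feeling a bit lazy.
Thinking through the steps in detail:
1) Should the data be split by primary site? Probably yes. Basic folder logic: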
colorectal|
  • |Tumor
    • |cat1 |cat2 |cat3
  • |Norm
    • |cat1 |cat2 |cat3
lung…………
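I forgot one important thing: check the current state of the files for anything missing. It turns out some are indeed missing. checkfiles filters by file; the missing entries are then copied out (cp) and the download resumed. Later I also need to check whether anything was downloaded to the wrong path.
Also, one thing that can be confirmed up front: svs files are always pathology slides.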
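The missing-file check described above can be sketched as follows. This is not the actual checkfiles script; it assumes the standard GDC manifest layout (tab-separated, with an "id" column) and that gdc-client places each file in a directory named after its file id. The name missing_file_ids is hypothetical.

```python
import csv
import io


def missing_file_ids(manifest_text: str, downloaded_dirs: set[str]) -> list[str]:
    """Return manifest ids that have no matching download directory.

    manifest_text: contents of a GDC manifest (TSV with an "id" column).
    downloaded_dirs: names of the per-file directories found on disk.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [row["id"] for row in reader if row["id"] not in downloaded_dirs]


# Tiny made-up manifest for illustration only.
demo_manifest = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "aaa\tslide.svs\t0\t1\treleased\n"
    "bbb\tsample.seg.txt\t0\t1\treleased\n"
)
demo_missing = missing_file_ids(demo_manifest, {"aaa"})
print(demo_missing)
```

The rows for the returned ids can be written to a smaller manifest and passed back to gdc-client to resume the download.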
2025-7-9: Started processing files, using CHOL as the example. First copy CHOL/Source/Tumor to Tumor_cp and work inside the copy. extract.py simply puts all the files into a single directory.
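A minimal sketch of what a flattening step like extract.py could look like. The real script is not shown in this log, so the function name flatten and the choice to skip (rather than rename) name collisions are assumptions.

```python
import shutil
from pathlib import Path


def flatten(src: Path, dest: Path) -> int:
    """Move every file found under src into the single directory dest.

    Returns the number of files moved. A file whose name already exists
    in dest is left where it is, so nothing is overwritten.
    """
    dest.mkdir(parents=True, exist_ok=True)
    moved = 0
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src.rglob("*")):
        if not path.is_file() or dest in path.parents:
            continue
        target = dest / path.name
        if target.exists():
            continue
        shutil.move(str(path), str(target))
        moved += 1
    return moved
```

Usage would be something like flatten(Path("Tumor_cp"), Path("Tumor_cp/all_files")), run on the copy and never on the original download directory.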
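2025-7-10: Congratulations to the host on surviving another day; work hard today too. The current pipeline is extract.py -> classify.py. It ran well on Tumor (CHOL); I then tried Norm and that worked well too. What remains is the classification itself, which needs redoing, and then standardizing into new files. A first rough summary is done, but it still has to start from the actual files.
2025-7-11: Another summary: classification and preprocessing plan
Current categories:
Annotation: ignore.
Clinical: also not very important.
{Ascat2.allelic_specific.seg / Ascat3.allelic_specific.seg}: essentially the same, only the algorithm version differs
Gdc_realn.cr.igv.reheader.seg: same as the above, also a CNV file
Gene_level_copy_number: the files under it really are all one type, regardless of the ASCAT algorithm
Wgs.ASCAT.copy_number_variation
Wgs.ASCAT.gene_level.copy_number_variation: still a CNV file
Grch38: SNP array data???
Methylation_array.sesame.level3betas: methylation data, no doubt
Noid_Grn/Red: these are methylation data as well
*quantification: all RNA quantification data. Mirbase21 vs Mirnaseq is a difference in technique; mirnas vs isoforms is a difference in what is measured
gene_counts: RNA sequencing data
Rppa: proteomics (reverse phase protein array)
svs: imaging data, needless to say
Wxs.aliquot_ensemble_masked: exome mutation data
Cel: microarray files; awkward to handle
Sorting it out once more: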
RNA_Seq/Star_Gene_Counts<Rna_seq.augmented_star_gene_counts
miRNA/Mature_Quant<Mirbase21.mirnas.quantification&Mirnaseq.mirnas.quantification (do these need to be kept separate?)
miRNA/Isoform_Quant<Mirbase21.isoforms.quantification&Mirnaseq.isoforms.quantification
Methylation/Sesame_Betas<Methylation_array.sesame.level3betas
Methylation/IDAT_Raw<Noid_Grn&Noid_Red
WXS_Somatic<Wxs.aliquot_ensemble_masked
Pathology/SVS<svs
Rppa<RPPA
Cel<Cel
ASCAT_CNV/Allelic_Segments<Ascat2.allelic_specific.seg&Ascat3.allelic_specific.seg&Gdc_realn.cr.igv.reheader.seg&Wgs.ASCAT.copy_number_variation&Wgs.ASCAT.gene_level.copy_number_variation&Wholegenome.rp-1765.cr.igv.reheader.seg&Wgs.rp-1657.cr.igv.reheader
SNP_Array/GRCh38_Segments<Grch38.seg
Gene_level_copy_number<Gene_level_copy_number
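Classify this way for now. Next, preprocess some of the content; well, much of it needs no preprocessing, so match data to samples first with Match.py. OK, the matching worked.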
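The target<source mapping above can be written down as an ordered rule table. This is a sketch only, not the actual classify.py: it assumes the source patterns appear as case-insensitive substrings of the file names, and the names RULES, SUFFIX_RULES and classify are hypothetical.

```python
# (substring, target folder); first match wins.
RULES = [
    ("rna_seq.augmented_star_gene_counts", "RNA_Seq/Star_Gene_Counts"),
    ("mirnas.quantification", "miRNA/Mature_Quant"),
    ("isoforms.quantification", "miRNA/Isoform_Quant"),
    ("methylation_array.sesame.level3betas", "Methylation/Sesame_Betas"),
    ("noid_grn", "Methylation/IDAT_Raw"),
    ("noid_red", "Methylation/IDAT_Raw"),
    ("wxs.aliquot_ensemble_masked", "WXS_Somatic"),
    ("rppa", "Rppa"),
    ("grch38.seg", "SNP_Array/GRCh38_Segments"),
    ("gene_level_copy_number", "Gene_level_copy_number"),
    ("allelic_specific.seg", "ASCAT_CNV/Allelic_Segments"),
    ("cr.igv.reheader", "ASCAT_CNV/Allelic_Segments"),
    ("wgs.ascat", "ASCAT_CNV/Allelic_Segments"),
]

# Categories recognized by file extension alone.
SUFFIX_RULES = {".svs": "Pathology/SVS", ".cel": "Cel"}


def classify(file_name: str) -> str:
    """Return the target folder for a file name, or "Unclassified"."""
    lowered = file_name.lower()
    for suffix, target in SUFFIX_RULES.items():
        if lowered.endswith(suffix):
            return target
    for pattern, target in RULES:
        if pattern in lowered:
            return target
    return "Unclassified"
```

Anything that lands in "Unclassified" is worth inspecting by hand before the rule table is extended.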
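2025-7-14: A new week; sorting out the plan again. First, the submission materials:
1. The core material is the data folder. Split it further: pull the raw data out, so that everything finally submitted is processed data
2. Documentation: sample-to-file description; file composition description; overall file organization description
3. Validation and statistics notes
Next: one issue is that the files contain redundancy; there are files with a _1 suffix
The reordered pipeline is: filter -> validation -> extract -> classify -> match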
python filter.py --tcga-root /home/databank/gxxu/TCGA/ --manifest /home/databank/gxxu/tool/Manifest/ > filter/20240714.txt
python validate.py ../PAAD/Source/Norm/ -o validate/validate_paad_norm.csv > ./validate/paad.txt
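What validate.py checks is not recorded in this log. One check a validation step commonly performs is comparing each file's MD5 against the md5 column of the GDC manifest; the sketch below assumes that is wanted, and the names md5sum and is_intact are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """True when the file exists and its MD5 matches the expected value."""
    return path.is_file() and md5sum(path) == expected_md5.lower()
```

Reading in chunks matters here because svs slides can be very large.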
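2025-7-15: Hurry up and get organizing
This round's submission covers PAAD, LIHC, LUAD, STAD, COAD, ESCA, CHOL
2025-7-16: It looks like something is failing to download
2025-7-17: If it will not download, leave it; organize according to the current plan.
Should files be organized by sample or by cancer type?
By cancer type, with the relationships declared in the submission materials
2025-7-18: Starting the pipeline
filter: deal with the extra files later
validate: all the target cohorts pass
  • Starting extract: CHOL, PAAD, (cl is the classify record file) ESCA, LIHC, COAD, STAD; LUAD has to wait a bit
LUAD done
match: chol, paad, lihc, coad, stad, esca, luad (this one does not work well)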
python match.py -d /home/databank/gxxu/TCGA/CHOL/Source/Tumor -o ./match/chol_tumor_ma.csv -s /home/databank/gxxu/TCGA/Samplesheet/gdc_sample_sheet.Tumor_chol.tsv
match:chol,
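A minimal sketch of the sample-to-file matching step. The real match.py is not shown here; this version assumes the GDC sample sheet is tab-separated with "File Name", "Case ID", "Sample ID" and "Sample Type" columns, and the name match_files is hypothetical.

```python
import csv
import io


def match_files(sample_sheet_text: str, present_files: set[str]) -> list[dict]:
    """Pair each sample sheet row with whether its file exists on disk.

    sample_sheet_text: contents of a GDC sample sheet (TSV).
    present_files: file names found in the data directory.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text), delimiter="\t")
    matches = []
    for row in reader:
        matches.append({
            "file_name": row["File Name"],
            "case_id": row["Case ID"],
            "sample_id": row["Sample ID"],
            "sample_type": row["Sample Type"],
            "found": row["File Name"] in present_files,
        })
    return matches


# Made-up rows for illustration only.
demo_sheet = (
    "File ID\tFile Name\tCase ID\tSample ID\tSample Type\n"
    "f1\ta.svs\tTCGA-XX-0001\tTCGA-XX-0001-01A\tPrimary Tumor\n"
    "f2\tb.seg.txt\tTCGA-XX-0002\tTCGA-XX-0002-01A\tPrimary Tumor\n"
)
demo_matches = match_files(demo_sheet, {"a.svs"})
```

Rows with found set to False point at files that are either missing or were renamed during extraction.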
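preprocess has three processing modes: methylation, CNV, and somatic mutation. They correspond to Meth.., CNV & Gene_level_copy_number, and WXS_Somatic respectively; a prefix is added after processing
CNV processing: chol, esca, paad, lihc, coad, stad, luad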
python cnv_preprocess.py --input ../../CHOL/Source/Tumor/ASCAT_CNV/ --output ../../CHOL/Source/Tumor/ASCAT_CNV_processed
python methylation_preprocess.py --input ../../CHOL/Source/Tumor/Methylation/IDAT_Raw/ --output ../../CHOL/Source/Tumor/Methylation/IDAT_Raw_processed
methylation processing: chol, esca, paad, coad, stad, lihc, luad
python methylation_preprocess.py --input ../../CHOL/Source/Tumor/Methylation/Sesame_Betas/ --output ../../CHOL/Source/Tumor/Methylation/Sesame_Betas_processed
2025-7-19:
WXS processing: chol, esca, paad, lihc, coad, stad, luad
nohup python mutation_preprocess.py --input ../../CHOL/Source/Tumor/WXS_Somatic/ --output ../../CHOL/Source/Tumor/WXS_Somatic_processed > mut/chol_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../CHOL/Source/Norm/WXS_Somatic/ --output ../../CHOL/Source/Norm/WXS_Somatic_processed > mut/chol_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../ESCA/Source/Norm/WXS_Somatic/ --output ../../ESCA/Source/Norm/WXS_Somatic_processed > mut/esca_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../ESCA/Source/Tumor/WXS_Somatic/ --output ../../ESCA/Source/Tumor/WXS_Somatic_processed > mut/esca_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../PAAD/Source/Tumor/WXS_Somatic/ --output ../../PAAD/Source/Tumor/WXS_Somatic_processed > mut/paad_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../PAAD/Source/Norm/WXS_Somatic/ --output ../../PAAD/Source/Norm/WXS_Somatic_processed > mut/paad_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../LIHC/Source/Norm/WXS_Somatic/ --output ../../LIHC/Source/Norm/WXS_Somatic_processed > mut/lihc_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../LIHC/Source/Tumor/WXS_Somatic/ --output ../../LIHC/Source/Tumor/WXS_Somatic_processed > mut/lihc_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../COAD/Source/Tumor/WXS_Somatic/ --output ../../COAD/Source/Tumor/WXS_Somatic_processed > mut/coad_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../COAD/Source/Norm/WXS_Somatic/ --output ../../COAD/Source/Norm/WXS_Somatic_processed > mut/coad_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../STAD/Source/Norm/WXS_Somatic/ --output ../../STAD/Source/Norm/WXS_Somatic_processed > mut/stad_norm_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../STAD/Source/Tumor/WXS_Somatic/ --output ../../STAD/Source/Tumor/WXS_Somatic_processed > mut/stad_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../LUAD/Source/Tumor/WXS_Somatic/ --output ../../LUAD/Source/Tumor/WXS_Somatic_processed > mut/luad_tumor_mut.txt 2>&1 &
nohup python mutation_preprocess.py --input ../../LUAD/Source/Norm/WXS_Somatic/ --output ../../LUAD/Source/Norm/WXS_Somatic_processed > mut/luad_norm_mut.txt 2>&1 &
python methylation_preprocess.py -i ../../CHOL/Source/Tumor/Methylation/Sesame_Betas/ -o ../../CHOL/Source/Tumor/Methylation/Sesame_Betas_processed
python methylation_preprocess.py -i ../../CHOL/Source/Norm/Methylation/Sesame_Betas/ -o ../../CHOL/Source/Norm/Methylation/Sesame_Betas_processed
python methylation_preprocess.py -i ../../ESCA/Source/Norm/Methylation/Sesame_Betas/ -o ../../ESCA/Source/Norm/Methylation/Sesame_Betas_processed
python methylation_preprocess.py -i ../../ESCA/Source/Tumor/Methylation/Sesame_Betas/ -o ../../ESCA/Source/Tumor/Methylation/Sesame_Betas_processed
nohup python methylation_preprocess.py -i ../../PAAD/Source/Norm/Methylation/Sesame_Betas/ -o ../../PAAD/Source/Norm/Methylation/Sesame_Betas_processed > methyl/paad_norm_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../PAAD/Source/Tumor/Methylation/Sesame_Betas/ -o ../../PAAD/Source/Tumor/Methylation/Sesame_Betas_processed > methyl/paad_tumor_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../COAD/Source/Tumor/Methylation/Sesame_Betas/ -o ../../COAD/Source/Tumor/Methylation/Sesame_Betas_processed > methyl/coad_tumor_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../COAD/Source/Norm/Methylation/Sesame_Betas/ -o ../../COAD/Source/Norm/Methylation/Sesame_Betas_processed > methyl/coad_norm_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../STAD/Source/Norm/Methylation/Sesame_Betas/ -o ../../STAD/Source/Norm/Methylation/Sesame_Betas_processed > methyl/stad_norm_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../STAD/Source/Tumor/Methylation/Sesame_Betas/ -o ../../STAD/Source/Tumor/Methylation/Sesame_Betas_processed > methyl/stad_tumor_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../LIHC/Source/Tumor/Methylation/Sesame_Betas/ -o ../../LIHC/Source/Tumor/Methylation/Sesame_Betas_processed > methyl/lihc_tumor_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../LIHC/Source/Norm/Methylation/Sesame_Betas/ -o ../../LIHC/Source/Norm/Methylation/Sesame_Betas_processed > methyl/lihc_norm_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../LUAD/Source/Norm/Methylation/Sesame_Betas/ -o ../../LUAD/Source/Norm/Methylation/Sesame_Betas_processed > methyl/luad_norm_meth.txt 2>&1 &
nohup python methylation_preprocess.py -i ../../LUAD/Source/Tumor/Methylation/Sesame_Betas/ -o ../../LUAD/Source/Tumor/Methylation/Sesame_Betas_processed > methyl/luad_tumor_meth.txt 2>&1 &
python cnv_preprocess.py --input ../../ESCA/Source/Tumor/ASCAT_CNV/ --output ../../ESCA/Source/Tumor/ASCAT_CNV_processed > ./cnv/esca_tumor_cnv.txt
python cnv_preprocess.py --input ../../ESCA/Source/Norm/ASCAT_CNV/ --output ../../ESCA/Source/Norm/ASCAT_CNV_processed > ./cnv/esca_norm_cnv.txt
python cnv_preprocess.py --input ../../PAAD/Source/Norm/ASCAT_CNV/ --output ../../PAAD/Source/Norm/ASCAT_CNV_processed > ./cnv/paad_norm_cnv.txt
python cnv_preprocess.py --input ../../PAAD/Source/Tumor/ASCAT_CNV/ --output ../../PAAD/Source/Tumor/ASCAT_CNV_processed > ./cnv/paad_tumor_cnv.txt
python cnv_preprocess.py --input ../../LIHC/Source/Tumor/ASCAT_CNV/ --output ../../LIHC/Source/Tumor/ASCAT_CNV_processed > ./cnv/lihc_tumor_cnv.txt
python cnv_preprocess.py --input ../../LIHC/Source/Norm/ASCAT_CNV/ --output ../../LIHC/Source/Norm/ASCAT_CNV_processed > ./cnv/lihc_norm_cnv.txt
python cnv_preprocess.py --input ../../COAD/Source/Norm/ASCAT_CNV/ --output ../../COAD/Source/Norm/ASCAT_CNV_processed > ./cnv/coad_norm_cnv.txt
python cnv_preprocess.py --input ../../COAD/Source/Tumor/ASCAT_CNV/ --output ../../COAD/Source/Tumor/ASCAT_CNV_processed > ./cnv/coad_tumor_cnv.txt
python cnv_preprocess.py --input ../../STAD/Source/Tumor/ASCAT_CNV/ --output ../../STAD/Source/Tumor/ASCAT_CNV_processed > ./cnv/stad_tumor_cnv.txt
python cnv_preprocess.py --input ../../STAD/Source/Norm/ASCAT_CNV/ --output ../../STAD/Source/Norm/ASCAT_CNV_processed > ./cnv/stad_norm_cnv.txt
python cnv_preprocess.py --input ../../LUAD/Source/Tumor/ASCAT_CNV/ --output ../../LUAD/Source/Tumor/ASCAT_CNV_processed > ./cnv/luad_tumor_cnv.txt
python cnv_preprocess.py --input ../../LUAD/Source/Norm/ASCAT_CNV/ --output ../../LUAD/Source/Norm/ASCAT_CNV_processed > ./cnv/luad_norm_cnv.txt
Done.
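The commands above differ only in cohort, Tumor/Norm, and step, so they could be generated instead of typed, which also avoids copy-paste slips in the output and log paths. A sketch under the assumption that every step accepts --input/--output and that the log naming seen above (for example mut/chol_tumor_mut.txt) is kept; STEPS and build_command are hypothetical names.

```python
from itertools import product

COHORTS = ["CHOL", "ESCA", "PAAD", "LIHC", "COAD", "STAD", "LUAD"]
KINDS = ["Tumor", "Norm"]

# step tag -> (script, input subfolder, log folder), as used in this log.
STEPS = {
    "cnv": ("cnv_preprocess.py", "ASCAT_CNV", "cnv"),
    "meth": ("methylation_preprocess.py", "Methylation/Sesame_Betas", "methyl"),
    "mut": ("mutation_preprocess.py", "WXS_Somatic", "mut"),
}


def build_command(step: str, cohort: str, kind: str) -> str:
    """Build one preprocessing command line for a cohort and sample kind."""
    script, subdir, log_dir = STEPS[step]
    base = f"../../{cohort}/Source/{kind}/{subdir}"
    log = f"{log_dir}/{cohort.lower()}_{kind.lower()}_{step}.txt"
    return f"python {script} --input {base}/ --output {base}_processed > {log} 2>&1"


commands = [build_command(s, c, k) for s, c, k in product(STEPS, COHORTS, KINDS)]
for line in commands:
    print(line)
```

The printed lines can be reviewed and then piped to a shell, or each prefixed with nohup and suffixed with & for background runs.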
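2025-7-21: BRCA Tumor seems to have finished downloading
2025-7-22: Continuing with KICH Tumor
2025-7-23/24: KICH still not finished.
2025-7-28: There are some problems with the copy number variation and exome mutation results.
2025-7-30: These observations are normal. The specific reasons:
Segment_Mean = 0:
Formula: Segment_Mean = log2(Copy_Number/2)
When Copy_Number = 2 (normal diploid), the result is 0
This means the region has no copy number variation (neutral)
p_value/q_value = 1:
p_value calculation: 2*pnorm(-abs(Segment_Mean), mean=0, sd=0.5)
When Segment_Mean = 0, p_value = 2*0.5 = 1
q_value is obtained by correcting p_value, so it is also 1
This means the variation is not statistically significant
Empty gistic_peak:
GISTIC criteria:
Amplification: Segment_Mean > 0.3 and q_value < 0.25
Deletion: Segment_Mean < -0.3 and q_value < 0.25
Variations that do not meet these conditions are marked NA
This means the region was not identified as a significant amplification/deletion
Biological meaning:
These results indicate that what was detected is:
Neutral copy number regions (normal diploid)
Variations without statistical significance
Non-significant copy number changes
In cancer genomes, such results usually indicate:
Normal somatic copy number regions
Variation within the range of technical noise
Copy number changes without clinical significance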
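The arithmetic above can be checked directly. The sketch below re-implements the three rules from this entry in Python: pnorm is replaced by the normal CDF written with math.erf, sd=0.5 and the 0.3/0.25 thresholds are the values quoted above, and q_value is taken as an input because the correction method is not recorded here. Function names are hypothetical.

```python
import math


def segment_mean(copy_number: float) -> float:
    """Segment_Mean = log2(Copy_Number / 2); copy_number must be > 0."""
    return math.log2(copy_number / 2)


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def p_value(seg_mean: float, sd: float = 0.5) -> float:
    """Two-sided p-value: 2 * pnorm(-abs(Segment_Mean), mean=0, sd=sd)."""
    return 2.0 * normal_cdf(-abs(seg_mean), 0.0, sd)


def gistic_peak(seg_mean: float, q_value: float):
    """Label a segment by the thresholds in this log; None stands for NA."""
    if seg_mean > 0.3 and q_value < 0.25:
        return "Amplification"
    if seg_mean < -0.3 and q_value < 0.25:
        return "Deletion"
    return None


# A diploid segment: Segment_Mean is 0, so the p-value is 1 and no peak is set.
diploid_mean = segment_mean(2)
diploid_p = p_value(diploid_mean)
diploid_peak = gistic_peak(diploid_mean, 1.0)
print(diploid_mean, diploid_p, diploid_peak)
```

So a table full of Segment_Mean = 0, p_value = 1 and empty gistic_peak rows is what these formulas produce for copy-neutral segments, not a sign of a broken run.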
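2025-7-31: It works, it works: the CNV annotation is in... That was not easy. Next, build a methylation annotation as well
Started processing CHOL, ESCA, LUAD, PAAD, COAD, STAD, LIHC. Finally submitted
Started downloading THCA_tumor
2025-8-12: Hey, yours truly (Gao) is back and downloading again. Run filter, then tally which cancer types have low sample counts.
Already done: CHOL, ESCA, LUAD, PAAD, COAD, STAD, LIHC
Now downloading UCEC_tumor, which is hard to download; PRAD_tumor also seems to stall
2025-8-13: Still downloading PRAD_tumor
2025-8-14: Continuing to download PRAD_tumor
2025-8-16: PRAD_tumor finished; now downloading SKCM
2025-8-17: SKCM_tumor finished; next is CMDC_tumor
2025-8-19: CMDC_tumor finished; next is UCEC_tumor
2025-8-26: UCEC_tumor has stalled; switching to the next one: AMIL
2025-8-27: AMIL will not download at all
2025-8-28: Downloading HDNNK_tumor
2025-9-03: HDNNK_tumor finished; now on AMILI_Tumor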