

PLINK 2.0 alpha was developed by Christopher Chang, with support from GRAIL, Inc. and Human Longevity, Inc., and substantial input from Stanford's Department of Biomedical Data Science. (More detailed credits.) (Usage questions should be sent to the plink2-users Google group, not Christopher's email.)
Binary downloads

Operating system        Development build (20 Oct)   Alpha 2.3 final build (24 Jan)
Linux AVX2 Intel [1]    download                     download
Linux 64-bit Intel [1]  download                     download
Linux 32-bit            download                     download
macOS AVX2              download                     download
macOS 64-bit            download                     download
Windows AVX2            download                     download
Windows 64-bit          download                     download
Windows 32-bit          download                     download
1: These builds can still run on AMD processors, but they're statically linked to Intel MKL, so some linear algebra operations will be slow. We will try to provide an AMD Zen-optimized build as soon as supporting libraries are available.
Source code and build instructions are available on GitHub. (Here's another copy of the source code.)

Recent version history
20 Oct 2020: --fst Weir-Cockerham method implemented. --fst ids= and chrX bugfixes. --fst variant-report OBS_CT is now specific to population pair.
19 Oct: Linux binaries should now yield reproducible results across machines unless --native is specified (previously, Intel MKL could select processor-dependent code paths with different floating-point rounding behavior). --fst Hudson method implemented. Categories within categorical phenotypes are now reported in natural-sorted order. --variant-score MISSING_CT/OBS_CT bugfix.
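To illustrate what "natural-sorted order" means for category names, here is a minimal Python sketch. It is not PLINK 2's implementation; the function name `natural_key` and the tie-breaking between digit and non-digit runs are assumptions made for illustration only.

```python
import re

def natural_key(name):
    """Sort key that compares runs of ASCII digits numerically (illustrative only)."""
    key = []
    for tok in re.findall(r"[0-9]+|[^0-9]+", name):
        if tok[0] in "0123456789":
            # Digit run: compare by numeric value, so "2" sorts before "10".
            key.append((0, int(tok), ""))
        else:
            # Non-digit run: compare as plain text.
            key.append((1, 0, tok))
    return key

categories = ["batch10", "batch2", "batch1"]
print(sorted(categories))                    # plain string sort
print(sorted(categories, key=natural_key))   # natural sort
```

A plain string sort places "batch10" before "batch2"; the natural-sort key orders the names by the embedded number instead.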
23 Sep: --update-ids no-FID bugfix.
14 Sep: --glm + --parameters chrX/chrY bugfix.
31 Aug: --data/--sample now supports QCTOOLv2's .sample dialect. --export 'sample-v2' exports it.
27 Jul: --glm 'cc-residualize' implemented. Note that these approximations are not recommended if you have a significant number of missing genotypes.
25 Jul: --glm 'firth-residualize' modifier added. This implements the fast Firth approximation introduced in Mbatchou J et al. (2020) Computationally efficient whole genome regression for quantitative and binary traits.
6 Jul: --af-pseudocount flag implemented; this lets you specify a pseudocount other than 0 or 1 for allele frequency estimation.
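As a rough illustration of how a pseudocount changes an allele frequency estimate, the Python sketch below uses standard additive smoothing. Treat the formula as an assumption: the changelog does not state the exact expression plink2 uses, and `allele_freqs` is a hypothetical helper, not part of any PLINK API.

```python
def allele_freqs(counts, pseudocount=0.0):
    """Additive-smoothing frequency estimate (assumed form, for illustration)."""
    total = sum(counts) + pseudocount * len(counts)
    if total == 0:
        raise ValueError("no observations and no pseudocount")
    return [(c + pseudocount) / total for c in counts]

# An allele that was never observed gets frequency 0 without a pseudocount,
# and a small positive frequency with one.
print(allele_freqs([0, 10]))
print(allele_freqs([0, 10], pseudocount=0.5))
```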
1 Jul: --make-[b]pgen 'fill-missing-from-dosage' modifier implemented, to support algorithms that require no missing hardcalls.
27 Jun: --hardy/--hwe chrX multiallelic-variant handling bugfixes.
25 Jun: Replaced a misleading 'No such file or directory' file-read error message.
20 Jun: --het implemented.
15 Jun: --glm local-covar= no longer errors out on long RFMix2 header lines, as long as ID lengths are reasonable.
31 May: Added single-precision --variant-score mode.
11 May: Fixed --glm segfault that occurred when categorical covariates were present, but none had more than 2 categories.
9 Apr: Firth regression implementation now uses the same maxit=25 value as R logistf(). 'UNFINISHED' error code added to flag logistic/Firth regression results which would change with even more iterations.
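The idea of flagging results that "would change with even more iterations" can be sketched generically in Python. This is not the Firth regression code; `iterate_until_converged`, the tolerance, and the "OK" label are hypothetical, and only the maxit=25 default and the 'UNFINISHED' label come from the entry above.

```python
def iterate_until_converged(step, x0, maxit=25, tol=1e-8):
    """Apply `step` repeatedly; report 'UNFINISHED' if maxit is hit first."""
    x = x0
    for _ in range(maxit):
        x_new = step(x)
        if abs(x_new - x) < tol:
            return x_new, "OK"
        x = x_new
    # The iteration cap was reached before the update became small.
    return x, "UNFINISHED"

# Newton's method for sqrt(2) as a stand-in for a regression update.
newton_step = lambda x: 0.5 * (x + 2.0 / x)
print(iterate_until_converged(newton_step, 1.0))
print(iterate_until_converged(newton_step, 1.0, maxit=2))
```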
28 Mar: Fixed --glm bug in 21 Mar build that caused segfaults when zero-MAF biallelic variants were present. --glm now errors out when no covariate file is specified, unless the 'allow-no-covars' modifier is specified.
21 Mar: Fixed --glm multiallelic-variant handling bugs that could occur when 'genotypic', 'hethom', 'dominant', 'recessive', 'interaction', or --tests was specified, and corrected 'dominant'/'recessive' documentation. It is no longer necessary to trim zero- (or other-constant-) dosage alleles from multiallelic variants to get --glm results for the other alleles.
14 Mar: --make-pgen/--make-just-pvar 'vcfheader' column set added (this makes it possible to directly generate a valid sites-only VCF). Bgzipping of the .pvar file is not directly supported, but you can use a named pipe to accomplish that with low overhead.
11 Mar: Fixed --glm segfault that could occur when no covariates were specified. VCF/BCF importers now default to compressing the temporary .pvar file, so that files with lots of INFO field content don't require a disproportionally large amount of free disk space to work with. --keep-autoconv now has a 'vzs' modifier to request compression of the .pvar file (and conversely, when --vcf/--bcf is used with bare --keep-autoconv, the .pvar is not compressed).
10 Mar: Fixed --make-pgen segfault that occurred when phased dosages were present without any phased hardcalls.
8 Mar: '--export bcf' implemented. VCF-export multiallelic HDS-force bugfixes. Added missing FILTER:fa header line to whole-genome 1000 Genomes phase 3 annotated .pvar files on Resources page.
25 Feb: --ld multiallelic-phased data handling bugfix.
22 Feb: --bcf n_allele=1 (ALT='.') bugfix.
19 Feb: --bcf GQ/DP-filtering bugfixes. --vcf and --bcf now enforce VCF contig naming restrictions.
17 Feb: --bcf implemented.
11 Feb: '--vcf-half-call reference' works properly again (it was behaving like '--vcf-half-call error' in recent builds).
8 Feb: BGZF-compressed text files should now work properly with all commands that make multiple passes over the file (previously they worked with --vcf, but almost no other commands of this type). Named-pipe input to these commands should now consistently result in an error message in a reasonable amount of time; previously this could hang forever.
3 Feb: --missing-code now works properly with --haps.
24 Jan: Fixed --extract/--exclude bug that could occur when another variant filter was applied earlier in the order of operations (e.g. --snps-only, --max-alleles, --extract-if-info). This bugfix has been backported to alpha 2.
23 Jan: Added --bed-border-bp/--bed-border-kb flags for extending all '--extract range'/'--exclude range' intervals.
21 Jan: '--extract range' and '--exclude range' no longer error out when their input files contain a chromosome code absent from the current dataset.
16 Jan: --pca allele/variant weight multithreading bugfix.
14 Jan: --make-king-table rel-check bugfix.
3 Jan 2020: Fixed --extract-if-info/--exclude-if-info numeric-argument bug introduced in late October.
30 Dec 2019 (alpha 3): This makes the following potentially compatibility-breaking changes:
--write-snplist and --indep-pairwise require all variant IDs to be unique. For --write-snplist, this can be overridden by adding the 'allow-dups' modifier.
.bgen/.gen import commands require the REF/ALT mode to be explicitly declared.
--glm defaults to 'firth-fallback' mode for binary phenotypes. The old behavior can be requested with the 'no-firth' modifier.
--glm errors out, instead of just skipping the phenotype and printing a warning, when there's a linear dependency between the phenotype and the covariates. The old behavior can be requested with the 'skip' modifier.
--pca's 'var-wts' subcommand has been replaced with 'allele-wts', which handles multiallelic variants properly. For datasets that contain only biallelic variants, the old output format can still be requested with 'biallelic-var-wts'.
PLINK 2 now errors out when you request an LD computation on a dataset with less than 50 founders. This can be overridden with --bad-ld.
--score's old NMISS_ALLELE_CT column (nonmissing allele count) has been renamed to ALLELE_CT, and the column set renamed accordingly, since in other contexts, 'nmiss' refers to the number of missing values, which is essentially the opposite.
--make-king-table's ID1,2 columns have been renamed to IID1,2, for consistency with other PLINK 2 commands.
In addition, the GRM computation (along with '--pca approx' and '--score variance-standardize') now handles multiallelic variants properly, instead of just collapsing all minor alleles together; --score allows each allele in a multiallelic variant to be assigned its own score; and --glm handles categorical covariates in a manner that's less likely to cause VIF overflow.
The final alpha 2 build has been tagged in GitHub, and will remain downloadable from here for the next several months.
29 Dec: Fixed a bug which affected processing of some heterozygous-double-ALT multiallelic variants, and a bug that caused ALT2/ALT3/etc. allele frequencies to not be properly initialized in some circumstances.
13 Dec: Fixed bug introduced in 22 Nov build which caused some reported dosages/counts (such as --freq's OBS_CT column) to be doubled. --loop-cats bugfixes.
28 Nov: Fixed a VCF half-call handling bug introduced last month.
26 Nov: Fixed recent bug which caused a segfault when no-duplicate-allowed variant ID lookup was performed with more than 16 threads.
25 Nov: Fixed bug that caused --sort-vars to segfault when the number of contigs was a multiple of 16. --keep-fcol and --extract-fcol were judged to be poopy names, and have been renamed to --keep-col-match and --extract-col-cond respectively (the old names will still work in this build).
The online documentation is now almost complete. The sidebar search box works.
22 Nov: Firth regression speed improvement. '--freq counts' now exports dosages with enough precision for --read-freq to perfectly reconstruct the original allele frequencies from the .acount file, and --read-freq has been modified to do that.
20 Nov: Fixed --make-king[-table] + --parallel bug.
15 Nov: Fixed '--glm cols=+err' bug that could cause garbage output when 'hide-covar' was not specified. --covar-number retired (previously it was being incorrectly converted to --covar-col-nums, which does not have the same semantics).
12 Nov: All-vs.-all --make-king[-table] runs now handle MAF 1 variants much more efficiently. --no-input-missing-phenotype option added. --variant-score now supports binary output.
10 Nov: Fixed bug introduced in 29 Oct build that caused a segfault when a 'NA'/'nan' phenotype or covariate value was encountered.
9 Nov: --variant-score (transpose of --score) implemented.
4 Nov: Restored '--export vcf' invalid-allele-code warning.
31 Oct: --split-cat-pheno 'omit-most' modifier implemented; it works better with --glm's built-in variance-inflation-factor check than 'omit-last', and --glm will switch to handling categorical covariates in this manner in alpha 3.
30 Oct: Fixed bug that caused --covar-col-nums and --covar iid-only to get mixed up. Stricter blank-line policy for most text input files: they're allowed at the end (since this happens every once in a while with manually edited files), but they're no longer allowed elsewhere. Removing the FILTER and/or INFO columns when generating a .pvar file (with e.g. 'pvar-cols=-info') now removes the corresponding header lines.
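The blank-line policy described above (blank lines tolerated at the end of a file, rejected elsewhere) can be sketched in Python as follows. `check_blank_lines` is a hypothetical helper written for illustration; it does not reproduce plink2's parser or its error messages.

```python
def check_blank_lines(lines):
    """Drop trailing blank lines; raise if a blank line appears before content."""
    kept = list(lines)
    while kept and not kept[-1].strip():
        kept.pop()  # blank lines at the end are tolerated
    for line_no, line in enumerate(kept, start=1):
        if not line.strip():
            raise ValueError(f"line {line_no} is blank but is not at the end of the file")
    return kept

print(check_blank_lines(["#IID\tPHENO\n", "s1\t1.5\n", "\n", "\n"]))
```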
29 Oct: --q-score-range implemented. Strings which start with a number but contain nonnumeric content (e.g. '-123.4abc') now trigger an error when a floating-point number is expected; the example string was previously just parsed as -123.4.
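The difference between the old and new handling of a string such as '-123.4abc' can be shown with a small Python sketch. Both functions are hypothetical illustrations of prefix parsing versus whole-string parsing; the regular expression is an assumption and does not claim to match plink2's accepted number syntax.

```python
import re

_NUMBER = r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"

def parse_float_lenient(text):
    """Old-style behavior: parse the numeric prefix and ignore what follows."""
    match = re.match(_NUMBER, text)
    if match is None:
        raise ValueError(f"no numeric prefix in {text!r}")
    return float(match.group(0))

def parse_float_strict(text):
    """New-style behavior: the entire string must be a number."""
    if re.fullmatch(_NUMBER, text) is None:
        raise ValueError(f"invalid floating-point value {text!r}")
    return float(text)

print(parse_float_lenient("-123.4abc"))  # prefix is accepted
print(parse_float_strict("-123.4"))      # whole string is a number
```

Calling `parse_float_strict("-123.4abc")` raises `ValueError` rather than silently dropping the trailing characters.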
25 Oct: --make-king-table 'rel-check' modifier added; this has the same effect as it did for PLINK 1.9 --genome. --pca 'var-wts' modifier deprecated: switch to 'biallelic-var-wts' when your data contains only biallelic variants and you want to continue generating only one weight per variant. (Alpha 3 will introduce an 'allele-wts' modifier which generates one weight per allele instead; this is necessary to support multiallelic variants in an analytically sound manner.)
22 Oct: --recover-var-ids implemented. (This is designed to reverse --set-all-var-ids.)
20 Oct: --sample-counts implemented; this provides the main (non-indel) sample counts reported by 'bcftools stats's -s flag, and is 100x as fast for plink2-formatted large datasets. --extract-fcol extended to support substring matches.
15 Oct: Fixed bug in 12 Oct Linux builds that caused plink2 to hang on --extract/--exclude/--snps and similar variant ID filters. Implemented --extract-fcol, which filters variants based on a TSV column (this is an extension of PLINK 1.x --qual-scores).
12 Oct: '--hwe 0' no longer removes a small number of very-low-HWE-p-value variants.
9 Oct: --pheno/--covar 'iid-only' modifier added, supporting headerless files with a single ID column. Windows BGZF compression is now multithreaded. Improved read-error messages.
6 Oct: Windows --silent bugfix. Source code now supports dynamic linking with libzstd (though performance may suffer if you don't build the multithreaded version of that library).
4 Oct: --king-table-subset + --parallel bugfix. Automatic Zstd text-file decompression was broken for a few commands by the 28 Sep build; that should work properly now.
3 Oct: Fixed BGZF decompression bugs in 28 Sep build. (This did not affect VCF-to-.bed/.pgen conversion, though some rarer use cases were affected.) SID-loading bugfix.
28 Sep: Mixed-provisional-reference bugfixes. --ref-allele/--alt1-allele/--update-map/--update-name skip-count bugfix. --glm local-covar line-skipping bugfix. Automatic-rename when an input filename matches an output filename should work properly again instead of erroring out (though it should still be avoided).
10 Sep: --glm joint test p-value bug fix. (This bug only affected runs where --tests was invoked with 4 or more predictors.)
26 Aug: --read-freq now prints a warning, instead of segfaulting or entering an infinite loop, when all variants have already been filtered out.
21 Aug: Fixed --ref-from-fa/--ref-allele + VCF export interaction that caused spurious 'PR' INFO flags to be reported.
10 Aug: Open-fail and write-fail error messages now include a more detailed explanation of what went wrong. --bgen, --data, and --gen now have a 'ref-unknown' modifier for explicitly specifying that neither the first nor last allele is consistently REF.
31 Jul: --score prints an error message instead of segfaulting when an input-file line is truncated. Fixed rare --glm bug that could cause all results to be reported as 'NA' when exactly one covariate is defined. .log files print '--out' and '--d' properly again (this was broken by the 24 Jul build). --glm now has an optional output column ('err') which reports the reason for each 'NA' coefficient.
24 Jul: --d implemented.
8 Jul: --rm-dup/--sample-diff/--ld multiallelic variant bugfix.
5 Jul: --read-freq moved before usual allele frequency/count computation in order of operations. Loaded allele frequencies are not recomputed any more.
28 Jun: --king-table-subset should work properly again.
26 Jun: Fixed --glm multiallelic-variant bug that could cause one allele to be reported twice and one covariate test to be unreported, when neither 'hide-covar' nor 'intercept' was specified. Fixed issue that could cause --glm genotypic/hethom to segfault with no covariates.
17 Jun: Fixed rare underflow in --glm p-value computation which could cause an assertion failure.
27 May: Unbroke --adjust-file. '--export ind-major-bed' performance improvement.
12 May: Fixed --glm linear regression phenotype-batch handling bug that could cause a crash (or, on .bed-formatted data, generate incorrect results) on batches of size 240.
29 Apr: BGEN 1.2/1.3 phased-dosage import bugfixes. --make-pgen + --dosage-erase-threshold without --hard-call-threshold no longer crashes.
28 Apr: PLINK 2-specific extensions to --update-ids and --update-parents simplified. --id-delim/--sample-diff 'sid' modifier for specifying that single-delimiter sample IDs should be interpreted as IID-SID changed to --iid-sid flag.
27 Apr: --haps bugfix for sample counts congruent to 17..31 (mod 32). This only affected the last few samples of the file, but if you used --haps with an earlier build, we strongly recommend rerunning it. --glm logistic regression 'SE' column renamed to LOG(OR)_SE when reporting odds ratio, to make it more obvious that the reported standard error does not use odds ratio units. --update-parents implemented.
2 Apr: Fixed --hwe bug that could cause chrY and MT variants to be improperly filtered. --glm 'pheno-ids' now works for groups of quantitative phenotypes.
1 Apr: --glm without --adjust now detects groups of quantitative phenotypes with the same 'missingness pattern', and processes them together (with a large speed increase; but be careful re: disk space, you probably want to use the 'hide-covar' modifier, 'zs' and/or --pfilter might also be useful). --glm linear regression local-covar= bugfix.
26 Mar: Minimac3-r2 computation bugfix. --glm no longer generates .id files listing all samples used for each phenotype, unless the 'pheno-ids' modifier is added. --update-ids implemented.
23 Mar: Fixed multiallelic-variant writer bug that could affect files where the largest number of alleles is 6 or 18. --minimac3-r2-filter and --freq minimac3r2 column implemented.
18 Mar: --write-covar can now be used when no covariates are loaded, if at least one phenotype is loaded and phenotype output was requested.
9 Mar: plink2 --version and --help no longer return nonzero exit codes.
A draft PGEN specification is now available.
6 Mar: Fixed allele frequency computation bug that could cause a spurious 'Malformed .pgen file' error when a variant filter was active.
5 Mar: Multithreaded --extract/--exclude.
4 Mar: --tests linear-regression output bugfix.
3 Mar: Fix --glm odds-ratio printing bug introduced on 1 Mar.
2 Mar: More help text cleanup (now including online documentation).
1 Mar: --recode-allele implemented (and renamed to --export-allele for consistency). VCF import now errors out when a space-containing INFO value is imported. Brackets in command-line help text are now used in a manner more similar to other tools.
21 Feb: --glm joint tests are now based on F-statistics, for better small-sample accuracy.
20 Feb: --import-dosage-certainty now always produces a missing call, instead of falling back on the VCF GT field, when dosage certainty is inadequate. --extract-intersect flag added.
19 Feb: --glm works properly again with no covariates (it was exiting with a spurious 'out of memory' error). --import-dosage-certainty now has the expected effect on single-valued dosages, instead of just genotype-probability triplets.
18 Feb: Fixed a bug that could cause --missing to crash on dosage data.
14 Feb: Command-line integer parameters can now use scientific notation.
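One way to accept scientific notation for an integer parameter is sketched below in Python. `parse_int_arg` is a hypothetical helper; which spellings plink2 actually accepts (for example, whether a mantissa with a decimal point is allowed) is not stated in the entry above, so the acceptance rules here are assumptions.

```python
from decimal import Decimal, InvalidOperation

def parse_int_arg(text):
    """Parse an integer argument, allowing forms such as '1e6' (illustrative)."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}")
    # Reject NaN/infinity and anything with a fractional part.
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError(f"not an integer: {text!r}")
    return int(value)

print(parse_int_arg("1e6"))
print(parse_int_arg("250000"))
```

Using `Decimal` rather than `float` keeps the conversion exact for large values.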
12 Feb: Phased-dosage import bugfix.
2 Feb: --tests + --parameters bugfix.
31 Jan: --pca approx now errors out instead of reporting inaccurate results when the number of variants is too small relative to the number of PCs. --pca approx eigenvalue bugfix.
30 Jan: --glm covariate-scale error is now propagated properly, instead of producing a mysterious out-of-memory error message.
27 Jan: --tests implemented.
22 Jan: --glm now errors out and recommends adding --covar-variance-standardize when covariates vary enough in scale for numeric instability to be a major concern.
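Variance standardization rescales each covariate to mean 0 and variance 1, which removes large scale differences between covariates. The Python sketch below shows the idea; `variance_standardize` is a hypothetical helper, and the use of the sample (n-1) standard deviation is an assumption rather than a statement about plink2's computation.

```python
from statistics import mean, stdev

def variance_standardize(values):
    """Rescale values to mean 0 and unit sample variance (illustrative)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("constant covariate cannot be standardized")
    return [(v - m) / s for v in values]

# Two covariates on very different scales end up comparable after rescaling.
print(variance_standardize([1, 2, 3]))
print(variance_standardize([100000.0, 200000.0, 300000.0]))
```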
2 Jan 2019: Phased-dosage import bugfix.
27 Dec 2018: --ref-allele/--alt1-allele skipchar was broken for the past few months; it should work properly again. Fixed a bug which occurred when importing an all-noninteger-dosage variant.
28 Oct: --keep-fam/--remove-fam bugfix.
2 Oct: Fixed bug that could occur when loading very long text lines (e.g. VCF lines longer than 5 MB).
22 Sep: Fixed rare bug that could occur when processing variants out of order. --sample-diff command implemented.
12 Sep: --normalize 'list' modifier added.
11 Sep: --rm-dup 'list' modifier added, for listing all duplicated variant IDs. (This can be run as a standalone command.)
9 Sep: Fixed rare race condition in text decompressor that could cause input lines to be skipped. (We believe this was the cause of the VCF-import 'File read failure' crashes reported over the last few months.)
8 Sep: Fixed VCF-export bug that could occur when extra contig header lines were present. --sort-vars bugfix. --normalize now detects when post-normalization variants are no longer in sorted order, and prints a warning in that case.
7 Sep: --ld bugfix for phased multiallelic variants. --rm-dup flag added (removes duplicate-ID variants, can check for genotype/INFO/etc. equality).
4 Sep: Fixed A1_CASE_FREQ and related columns in --glm output broken by recent multiallelic update. Cleaned up a few column names in --geno-counts and --hardy output.
31 Aug: Fixed --glm bug with handling constant and all-constant-but-1 covariates.
30 Aug: AVX2 and 32-bit --export bgen-1.2/1.3 bugfixes (mainly affects missing genotypes). '--export vcf-4.2' mode added for compatibility with programs (e.g. SNPTEST) which reject VCF 4.3 files. Exported VCFs should now have more appropriate contig headers when PAR1 and/or PAR2 are present in the input. Left-normalization (--normalize) flag added.
26 Aug: Last column of --pca .eigenvec header line is no longer omitted.
21 Aug: Fixed --mac/--max-mac 'nref' and 'alt1' mode bugs in yesterday's build.
20 Aug: Fixed '--vcf dosage=GP' bug introduced on 7 May; if you used any build from the last three-and-a-half months to import VCF FORMAT:GP data, rerun with a newer build. '--vcf dosage=GP' now errors out with a suitable message when the file also contains a FORMAT:DS field, and a 'dosage=GP-force' option has been added to cover the rare cases where importing the GP field might still be worthwhile. --maf/--max-maf/--mac/--max-mac now let you filter on nonmajor (default), non-reference, alt1, or minor allele frequencies/counts; you can use bcftools notation for this (e.g. '--min-af 0.01:minor'), but keep the different default in mind.
18 Aug: plink2-formatted 1000 Genomes phase 3 files, with phased haplotypes and annotations included, and a few corrections to the official pedigree (determined via KING-robust analysis), can now be downloaded from the Resources page. --king-cutoff can now handle sample ID files containing a header line.
16 Aug: --glm logistic regression now supports multiallelic variants. Fixed --glm linear-regression dosage handling bug in yesterday's build.
15 Aug: --glm linear regression now supports multiallelic variants. --ld bugfix. --parameters + '--glm interaction' now works properly when a covariate is only involved as part of an interaction.
9 Aug: --make-king[-table] singleton/monomorphic-variant optimization implemented.
7 Aug: GRM construction and --missing no longer break with multiallelic data.
6 Aug: VCF multiallelic(-phased) import and export implemented. --hwe now tests each allele separately for multiallelic variants. --min-alleles/--max-alleles filtering flags added.
(--glm doesn't support multiallelic variants yet; that update is planned for next week.)
30 Jul: --vcf-max-dp flag added.
26 Jul: --vcf-half-call should now work properly on unphased data.
25 Jul: Fixed --sort-vars/low-memory-make-pgen dosage-handling bug that could trigger unwanted hardcall thresholding. If you used a build from 14 Apr - 19 Jul 2018 to work with dosage data, the hardcalls may not have been thresholded correctly. Unfiltered dosage datasets imported by an affected build can be corrected by running --make-pgen + explicit --hard-call-threshold. Hardcall-based filters such as --geno/--mind should be rerun (after the hardcalls have been corrected).
19 Jul: --update-alleles implemented.
16 Jul: Added more multithreaded-VCF-parse debug logging code.
13 Jul: Fixed chrX/Y/MT autoremoval bug in --make-king/--make-grm/--pca.
12 Jul: Unbroke --mach-r2-filter.
3 Jul: .fam/.psam files now load properly when only the IID column is requested or present.
29 Jun: .bim/.pvar files with more than 134 million variants load properly again (given sufficient memory).
25 Jun: Fixed a few odd-sample-count export cases which were broken around 30 May.
22 Jun: Fixed a few log messages which were broken in the 19-20 Jun builds. Added debug-print code to support an ongoing multithread-VCF-dosage-import bug investigation (if you are encountering mysterious 'File read failure' errors during VCF import or 'Malformed .pgen' errors when reading the result, adding '--threads 1' to your VCF-import command will probably solve your immediate problem, but if you can also send me a .log file from the failing multithreaded run (or even better, test data) that would be very helpful).
20 Jun: Fix GRM/PCA/score-computation bug introduced on 30 May. If you used the 30 May or an early June build for GRM/--pca/--score, you should repeat the operation(s) with this build; apologies for the error.
19 Jun: Fixed rare --ref-allele/--alt1-allele corner case which could occur when a missing allele was replaced with a very long allele.
5 Jun: VCF import uninitialized-variable bugfix. --score 'ignore-dup-ids' modifier added.
30 May: '--export haps[legend]' bugfixes and bgzip support. '--export vcf vcf-dosage=DS' no longer exports undeclared HDS values when phase information is present. Unbreak --import-dosage + --map, for real this time.
21 May: --pgen-info command added (displays basic information about a .pgen file, such as whether it has any phase or dosage data).
17 May: --import-dosage and .gen import were broken for the last several weeks; this should be fixed now. A1 column added to --adjust output in preparation for multiallelic variants. --glm 'a0-ref' modifier renamed to 'omit-ref'.
15 May: Fixed chrX allele frequency computation bug when dosages are present. --ld modified to be based on major instead of reference alleles, to play better with multiallelic variants. --hardy header line and allele columns changed in preparation for multiallelic variant support.
8 May: --vcf dosage=HDS should now handle files with no DS field properly.
7 May: Fixed rare I/O deadlock. Improved VCF-import parallelism.
4 May: Fixed --bgen import/export when dosage precision bits isn't a multiple of 8 (previously misinterpreted the spec for those cases, sorry about that).
3 May: --bgen can now import variant records with up to 28 bits of dosage precision (though only 15 bits will survive). '--export vcf-dosage=HDS-force' bugfix.
2 May: --vcf dosage= import no longer requires GT field to be present. Fixed potential --vcf dosage=HDS buffer overflow.
28 Apr: Fixed a --glm bug which occurred when autosomes and sex chromosome(s) were both present, or both chrX and chrY were present. If you performed a whole-genome --glm run with the 9 Feb 2018 build or later, you should rerun with the latest build. However, single-chromosome and autosome-only --glm runs were unaffected by the bug.
24 Apr: VCF phased-dosage import ('--vcf dosage=HDS') and export ('--export vcf vcf-dosage=HDS'). --pca and GRM computation now use correct variance for all-haploid genomes.
22 Apr: --export bgen-1.2/bgen-1.3 should now work for chrX/chrY/chrM; also fixed import bugs for those chromosomes.
16 Apr: --ref-from-fa contig line parsing bugfix.
14 Apr: --export bgen-1.2/bgen-1.3 implemented for autosomal diploid data. Operations like --pca which require decent allele frequencies now error out when frequencies are being estimated from less than 50 samples, unless you add the --bad-freqs flag. Phased dosage support implemented. Sample missingness rate in exported .sample files is now based on dosages rather than hardcalls. Non-AVX2 phase subsetting bugfix. --vcf + --psam bugfix. --vcf dosage= now ignores the hardcall when a dosage is present; instead, it's regenerated under --hard-call-threshold 0.1 (unless you specified a different threshold). --bgen 'ref-second' modifier renamed to 'ref-last', to generalize properly to multiallelic variants.
31 Mar: --export haps[legend] should now work properly when --ref-allele/--ref-from-fa/etc. flips some alleles in the same run.
29 Mar: --set-missing-var-ids/--set-all-var-ids non-AVX2 bugfix. --pheno/--covar autonaming bugfix.
28 Mar: --bgen 1-bit phased haplotype import implemented.
26 Mar: --make-bed + --indiv-sort bugfix.
23 Mar: Windows builds should work properly again (the 20-21 Mar Windows builds were badly broken). --glm now supports log-pvalue output (add the 'log10' modifier), and these remain accurate below the double-precision floating point limit of p=5e-324.
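Why log-scale output helps below p=5e-324 can be seen in a short Python sketch: a very small p-value underflows to 0.0 as a double, while its logarithm is an ordinary number. The function below uses the first terms of the standard asymptotic series for the normal tail, which is an approximation chosen for illustration and is not claimed to be the method plink2 uses; `log10_two_sided_p` is a hypothetical name.

```python
import math

def log10_two_sided_p(z):
    """Approximate log10 of the two-sided normal p-value, for |z| >= 5."""
    z = abs(z)
    if z < 5:
        raise ValueError("this sketch only covers |z| >= 5")
    # Natural log of the standard normal density at z.
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Leading terms of the tail series: Q(z) ~ phi(z)/z * (1 - 1/z^2 + 3/z^4).
    series = 1.0 - 1.0 / z**2 + 3.0 / z**4
    log_q = log_phi - math.log(z) + math.log(series)
    # Two-sided p = 2 * Q(z); convert from natural log to log10.
    return (math.log(2.0) + log_q) / math.log(10.0)

print(math.erfc(40 / math.sqrt(2)))  # direct p-value underflows
print(log10_two_sided_p(40))         # log10(p) is still representable
```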
21 Mar: 3-column .sample file loading works properly again. Fixed a file-reading race condition.
20 Mar: Fix possible deadlock in recent builds when loading very long lines.
19 Mar: Fix --sample segfault in recent builds. .bgen import/export speed improvement. --oxford-single-chr wasn't extended correctly in the 4 Mar build; this should be fixed now.
11 Mar: Fix --pheno segfault in last week's builds that could occur when the file didn't have a header line.
9 Mar: Fix 'File write failure' bug that occurred when a single write operation was larger than 2 GB (this could occur when running --make-bed with more than 128k samples). Reduced --make-bed memory requirement.
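A common way to avoid problems with very large single writes is to split the buffer into bounded chunks. The Python sketch below shows that general pattern only; the entry above does not say how plink2 fixed the bug, and `write_in_chunks` and its 1 GiB default are hypothetical. It assumes a blocking stream whose `write` returns the number of bytes written.

```python
import io

def write_in_chunks(stream, data, max_chunk=1 << 30):
    """Write `data` using calls of at most `max_chunk` bytes; return bytes written."""
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        # write() may accept fewer bytes than offered, so advance by its result.
        written = stream.write(view[offset:offset + max_chunk])
        offset += written
    return offset

buf = io.BytesIO()
print(write_in_chunks(buf, b"abcdefgh", max_chunk=3))
```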
7 Mar: Fixed potential file-reading deadlock in recent builds (23 Feb or later).
5 Mar: --glm local-covar= should work properly again.
4 Mar: --oxford-single-chr can now be used on .bgen files. --make-pgen partially-phased data handling bugfix.
26 Feb: --keep/--remove/etc. should work properly now on IID-only files with no header line.
23 Feb: Fixed alpha 2 --vcf + --id-delim bug. Improved parsing speed for compressed VCF and .pvar files.
20 Feb: '--xchr-model 1' should work properly now.
16 Feb 2018 (alpha 2): This makes the following potentially compatibility-breaking changes:
FID is now an optional field: if it isn't in the input .psam file, it's omitted from several output files by default (these now have 'maybefid' and 'fid' column sets, where the default set includes 'maybefid'), and treated as always-'0' by any operation which requires FID values (such as --make-bed). When exporting genomic data files, 'maybefid' also treats the column as missing if all remaining values are '0'.
Relatedly, when importing sample IDs from a VCF or .bgen file, the default mode is now '--const-fid 0', and no FID column will be written to disk at all. --keep, --remove, and similar commands also now have '--const-fid 0' semantics when an input line contains only one token. You can now act as if IID is the only sample ID component, if that's what makes the most sense for your workflow. Conversely, it is now necessary to explicitly use --id-delim when you want to split the VCF/.bgen sample IDs into multiple components.
MT is treated as a haploid chromosome again. In PLINK 1.9 and earlier plink2 builds, MT was treated as diploid-ish to avoid throwing away information about heteroplasmic mutations; as a consequence, the --glm(/--linear/--logistic) genotype column and commands like '--freq counts' used a 0..2 scale. Now that plink2 has proper support for dosages, this kludge is no longer necessary.
--glm's 't' column set has been renamed to 'tz', to reflect it being a T-statistic for linear regression but a Wald Z-score for logistic/Firth. The corresponding column in .glm.logistic[.hybrid] and .glm.firth files now has 'Z_STAT' in the header line.
Also, --glm now defaults to regressing on minor instead of ALT allele dosages (this can be overridden with 'a0-ref').
The final alpha 1 build has been tagged in GitHub, and will remain downloadable from here for the next few months.
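The alpha 2 sample-ID behavior described above can be pictured with a small sketch. This is an illustration only, not PLINK's parser: the helper name `split_sample_id` is hypothetical, and only the two-part FID/IID case is modeled.

```python
from typing import Optional, Tuple


def split_sample_id(sample_id: str, id_delim: Optional[str] = None) -> Tuple[str, str]:
    """Return (FID, IID) for a sample ID read from a VCF or .bgen file.

    Hypothetical helper. With no delimiter, the whole ID becomes the IID and
    FID is treated as '0' (the '--const-fid 0' default described above).
    With a delimiter, this sketch requires exactly two parts.
    """
    if id_delim is None:
        return ("0", sample_id)
    parts = sample_id.split(id_delim)
    if len(parts) != 2:
        raise ValueError(f"expected FID{id_delim}IID, got {sample_id!r}")
    return (parts[0], parts[1])
```

Without an explicit delimiter, an ID such as `FAM1_NA12878` stays whole and becomes the IID, which mirrors the note that --id-delim must now be requested explicitly.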
11 Feb: .king.cutoff.in/.king.cutoff.out files now end in .id, for consistency with other output files with sample IDs and no other information. Similarly, --mind's output file now has the extension .mindrem.id and defaults to having a header line. You can now use --no-id-header to suppress the header line (and force the columns to be FID/IID) in all .id output files.
10 Feb: --update-sex 'male0' option added, and custom column selection interface changed (now 'col-num='). --glm 'gcountcc' column names updated (now 'CASE_NON_A1_CT', 'CASE_HET_A1_CT', etc.) in preparation for switch to A1=major allele. --make-just-pvar + --ref-allele/--ref-from-fa no longer treats all initial reference alleles as provisional when the input .pvar has a header line.
9 Feb: Forcing .pvar QUAL/FILTER output when no such values are loaded no longer causes a segfault.
5 Feb: AVX2 phase-subsetting bugfix.
3 Feb: --score 'dominant' and 'recessive' modifiers added.
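As a rough picture of what 'dominant' and 'recessive' scoring could mean, the sketch below transforms an allele dosage before it is multiplied by the score weight. The exact semantics are an assumption here (dominant caps the dosage at 1, recessive counts only the second copy), and `effective_dosage` is a hypothetical name, not a PLINK function.

```python
def effective_dosage(dosage: float, mode: str = "additive") -> float:
    """Transform a diploid allele dosage in [0, 2] before weighting.

    Assumed semantics, for illustration only:
      additive  -> dosage unchanged
      dominant  -> min(dosage, 1): one copy counts as much as two
      recessive -> max(dosage - 1, 0): only the second copy counts
    """
    if mode == "additive":
        return dosage
    if mode == "dominant":
        return min(dosage, 1.0)
    if mode == "recessive":
        return max(dosage - 1.0, 0.0)
    raise ValueError(f"unknown mode: {mode!r}")
```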
30 Jan: Fix .pgen writing bug which occurred when the number of variants was a multiple of 64 and the number of samples was large.
24 Jan: '--export oxford' now supports bgzipped output.
21 Jan: --glm now always reports an additional 'A1' column, indicating which allele(s) correspond to positive genotype column values. --glm column sets have been changed to revolve around A1 instead of ALT, so minor script modifications may be necessary when switching to this build.
In this build, A1 and ALT are still synonymous. This will change in alpha 2: A1 will default to the minor allele(s) to reduce multicollinearity (imitating PLINK 1.x's behavior in the absence of --keep-allele-order), though you will still have the option of forcing A1=ALT.
12 Jan: Fixed '--glm interaction' bug that occurred when multiple consecutive variants had no missing calls. We recommend redoing all --glm runs with the 'interaction' modifier which were performed with a build produced between 27 Nov 2017 and 10 Jan 2018 inclusive.
10 Jan: --adjust-file implemented (performs --adjust's multiple-testing correction on any association analysis file).
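For readers unfamiliar with multiple-testing correction, here is a minimal sketch of two standard adjustments, Bonferroni and Benjamini-Hochberg. Which corrections --adjust actually reports is not stated in this entry; these two are used only as familiar examples, and the function names are hypothetical.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```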
9 Jan: Added 'no-idheader' modifiers to a few commands, and made that the default for --make-grm-bin/--make-grm-list to avoid breaking interoperability.
7 Jan: --vcf can now be given a sites-only VCF when the run doesn't require genotype data. Sample ID files, such as those produced by --write-samples, now include a header line by default; this will be necessary to distinguish between FID-IID and IID-SID output in the future. (With --write-samples, you can suppress the header line by adding the 'noheader' modifier.)
5 Jan: --pheno-col-nums/--covar-col-nums implemented.
2 Jan 2018: --keep-fcol (equivalent to PLINK1.x --filter) implemented.
19 Dec 2017: --adjust implemented. --zst-level implemented (lets you control Zstd compression level). Un-broke --rerun.
18 Dec: --extract/--exclude can now be used directly on UCSC interval-BED files (ok for coordinates to be 0-based or for no 4th column to be present). '--output-chr 26' now causes PAR1/PAR2 to be rendered as '25' (for humans), to restore interoperability with programs like ADMIXTURE which can't handle alphabetic chromosome codes. --merge-x implemented (usually needs to be combined with --sort-vars now). --pvar can usually handle 'sites-only' VCF files (e.g. those released by the gnomAD project) now. --thin, --thin-count, --thin-indiv, and --thin-indiv-count implemented.
16 Dec: Multithreaded zstd compression implemented (on Linux and macOS). --make-grm-gz renamed to --make-grm-list, and gzip mode removed.
15 Dec: Fixed --extract-if-info and --exclude-if-info's behavior for non-numeric values which start with a number. Existence-checking flags renamed to --require-info and --require-no-info for naming consistency.
13 Dec: --extract-if-info and --exclude-if-info flags added, for simple filtering on INFO key/value pairs or key existence.
11 Dec: --king-table-subset flag added. This makes it straightforward to perform two-stage relationship/duplicate detection: start with --make-king-table on a small number of higher-MAF variants scattered across the genome, and then rerun it with --king-table-subset on an appropriate subset of candidate sample pairs from the first stage. --bp-space implemented (useful for the first stage above).
The two-stage workflow was first implemented by Wei-Min Chen in a recent version of KING; contact him for citation information.
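The first stage relies on variants "scattered across the genome". A greedy spacing filter is one simple way to picture that kind of thinning. This is a sketch under assumptions, not PLINK's --bp-space implementation: the name `thin_by_bp_space` is hypothetical, input is assumed sorted by chromosome and position, and whether a gap exactly equal to the minimum is kept is a choice made here.

```python
from typing import List, Tuple

Variant = Tuple[str, int]  # (chromosome code, base-pair position)


def thin_by_bp_space(variants: List[Variant], min_bp: int) -> List[Variant]:
    """Keep a variant only if it is at least min_bp from the last kept one.

    Spacing is tracked per chromosome; the first variant on each chromosome
    is always kept.
    """
    kept: List[Variant] = []
    last_chrom = None
    last_pos = 0
    for chrom, pos in variants:
        if chrom != last_chrom or pos - last_pos >= min_bp:
            kept.append((chrom, pos))
            last_chrom, last_pos = chrom, pos
    return kept
```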
7 Dec: Fixed bug which could occur when filtering samples from a phased dataset. Windows AVX2 build now available.
28 Nov: --import-dosage 'format=infer' (this is now the default) and 'id-delim=' (needed for reimport of '--export A-transpose' data) options added. Fixed --import-dosage bug that caused it to error out on missing genotypes under format=1. --no-psam-pheno (or --no-pheno/--no-fam-pheno) can now be used to ignore all phenotypes in the sample file, while keeping the phenotype(s) in the --pheno file if one was specified.
27 Nov: Implemented fast path for --glm no-missing-genotype case (mainly affects linear regression). --make-king[-table] can now automatically handle matrices too large to fit in memory without explicit use of --parallel. AVX2 sample filtering performance improvement. --validate bugfix.
19 Nov: Fix VCF FORMAT:GT header line parsing bug introduced in 14 Nov build.
18 Nov: --make-king[-table] performance improvements.
16 Nov: Fixed bug in 14 Nov build that broke chrSet header line parsing.
14 Nov: Fixed bug that caused --export A,AD to hang when the number of variants was between 65 and about a thousand.
4 Nov: Linux and macOS prebuilt AVX2 binaries now available; these should work well on most machines built within the last 4 years. Fixed another Firth regression spurious NA bug. Fixed --score bug that occurred when sample filter(s) were applied simultaneously. Fixed a --ld phased-hardcall handling bug. Array-popcount upgrade in progress (thanks to recent work by Wojciech Muła, Nathan Kurz, Daniel Lemire, and Kim Walisch).
3 Nov: Fixed multipass --export A,AD bug. --dummy dosage-freq= now fills in hardcalls with the default --hard-call-threshold cutoff of 0.1 when --hard-call-threshold is not explicitly specified.
2 Nov: --export A,AD implemented (with dosage support). --dummy dosage-freq= modifier now works properly for dosage frequencies above 0.75.
16 Oct: --ref-from-fa flag implemented, to set reference alleles from a FASTA file. (Note that this may be unable to determine which allele is reference when length changes are involved, but it should always work for SNPs and multi-nucleotide polymorphisms.) --update-name implemented. Fixed column-set parsing bug in 13 Oct build.
13 Oct: Fixed --glm logistic/Firth regression bug which could produce spurious NA results.
9 Oct: Fixed --ld's handling of some dosage and haploid cases. Fixed bug which could cause --make-pgen to discard phase/dosage information when extracting a small variant subset. --geno-counts no longer double-reports chrY counts.
8 Oct: --ld implemented, with support for phased genotypes and dosages (try '--ld var1 var2 dosage'). Fixed tiny bgen-1.1 import bug that triggered when the number of threads exceeded the number of variants. Allele frequency computation no longer crashes on chrX when dosages are present but only hardcalls are needed.
1 Oct: Fixed GRM computation bug which sometimes caused segfaults when both dosages and missing values were present. --glm is now a bit faster when many covariates are present.
20 Sep: Firth regression Hessian matrix inversion step raised to double-precision, after last week's builds revealed that single-precision inversion could be unreliable.
15 Sep: --vif/--max-corr per-variant checks are now working. These are no longer skipped during logistic regression.
8 Sep: Alternative VCF INFO:PR fields are now tolerated. Removed debug code that slowed down yesterday's --make-pgen.
7 Sep: --score uninitialized memory bugfix. Partially-phased data handling bugfix.
6 Sep: Fix macOS stack size issue (could cause --pca and some other commands to crash in recent builds; 1 Sep build had an incomplete workaround).
4 Sep: --[covar-]variance-standardize missing value handling bugfix. --ref-allele/--alt1-allele implemented (--a2-allele and --a1-allele are treated as aliases).
1 Sep: --pheno-quantile-normalize/--covar-quantile-normalize missing-phenotype handling bugfix.
29 Aug: --glm 'gcountcc' column set option added (reports genotype hardcall counts, stratified by case/control status). --write-samples command added (analogous to --write-snplist).
2 Aug: --sort-vars implemented.
25 Jul: --loop-cats now works properly with genotype-based variant filters.
24 Jul: Fixed '--pca approx' allele frequency handling bug introduced in 4 Jun build; we recommend redoing any '--pca approx' runs performed with an affected build. (Regular --pca was not affected.) --loop-cats implemented (similar to PLINK 1.x --loop-assoc, except it's not restricted to association tests). VCF export now supports 'vcf-dosage=DS-force' mode. --dummy multithread + dosage bugfix.
17 Jul: BGEN v1.2/1.3 importer memory allocation bugfix. Size of failed allocation is now logged on most out-of-memory errors.
2 Jul: Improved multithreading in BGEN v1.2/1.3 importer. Python writer can now be called with multiple variants at a time.
25 Jun: Basic BGEN v1.2/1.3 import (unphased biallelic dosages; suffices for main UK Biobank data release). --warning-errcode flag added (causes an error code to be returned to the OS on exit when at least one warning is printed).
20 Jun: --condition-list + variant filter bugfix.
5 Jun: --make-pgen memory requirement greatly reduced. End time now printed to console in most situations.
4 Jun: --hwe no longer causes a segfault when chrX is present and no gender information is available. Fixed --dummy bug.
29 May: --import-dosage format=1 bugfix.
26 May: --glm 'standard-beta' modifier replaced with --variance-standardize flag. --quantile-normalize function added. Fixed a missing-sex allele counting bug.
25 May: --hardy/--hwe works properly again when chrX is present but not at the beginning of the dataset.
22 May: Fixed major dosage data + sample-filter bug; we recommend rerunning any operations involving both dosage data and sample filtering performed with earlier plink2 builds. --score 'list-variants' modifier added.
19 May: Fixed a bug with allele frequency computation on dosage data when sample filter(s) are applied.
18 May: Many categorical phenotype-handling flags (--within, --keep-cats, --split-cat-pheno, ...) implemented. Basic phenotype-based filtering implemented (e.g. "--remove-if PHENO1 '>' 2.5"; note that unnamed phenotypes are assigned the names 'PHENO1', 'PHENO2', etc., and that the '<' and '>' characters must be quoted in most shells). --write-covar implemented. --mach-r2-filter implemented, and raw MaCH r² values can be dumped with '--freq cols=+machr2'.
11 May: --condition[-list] + --covar bugfix.
8 May: Fix quantitative phenotype/covariate loading bug introduced in 6 May build.
7 May: --import-dosage implemented.
6 May: Fixed bug which caused '0' to be treated as control instead of missing for binary phenotypes. Minor change to --glm's column headers, in preparation for multiallelic data.
2 May: --score bugfix. --maj-ref bugfix. --vcf-min-dp and '--export A-transpose' implemented.
1 May: VCF dosage import/export, --vcf-min-gq, and --read-freq implemented. --score can now work with standard errors. --autosome[-par] now works properly. SNPHWE2 and SNPHWEX functions relicensed as GPL-2+, to enable inclusion in the HardyWeinberg R package.
20 April: .sample export bugfix (didn't work if file was over 256 KB and no phenotypes were present). --dummy implemented (can now generate dosages).
19 April: --hardy/--hwe chrX bugfix (thanks to Jan Graffelman for catching the problem and validating the fix). --new-id-max-allele-len now has three modes ('error', 'missing', and 'truncate'), and the default mode is now 'error' (i.e. --set-missing-var-ids and --set-all-var-ids now error out when an allele code longer than 23 characters is encountered, instead of silently truncating). --score implemented, and extended to support variance-normalization and multiple score columns (these two features provide a simple way to project new samples onto previously computed principal components).
11 April: --pca var-wts bugfix, and --pca eigenvalue ordering bugfix. --glm linear regression and --condition[-list] support added. --geno/--mind/--missing/--genotyping-rate can now refer to missing dosages instead of just missing hardcalls (note that, when importing dosage data, dosages in (0.1, 0.9) and (1.1, 1.9) are saved but there usually won't be associated hardcalls).
20 March 2017: Initial public release.
What's new?
Preservation of reference alleles (without requiring constant use of --keep-allele-order), phase information, and the VCF QUAL, FILTER, and INFO fields. Use --make-pgen instead of --make-bed when importing a VCF; the fileset can then be referenced with --pfile.
The new .pgen file format incorporates SNPack-style genotype compression, frequently reducing file sizes by 80+% with negligible computational cost. Note that this captures some major patterns that are missed by the usual general-purpose compression algorithms: our 1000 Genomes phase 3 downloads are 70+% smaller than the gzipped originals (and remain 45+% smaller after .pgen un-archiving), without throwing away any relevant information.
To allow users to take advantage of genotype compression without sacrificing compatibility with scripts expecting old-style .bim and .fam text files, PLINK 2.0 also supports a hybrid .pgen + .bim + .fam usage mode (--make-bpgen/--bpfile). We've also provided a Python library for reading and writing .pgen files, to simplify migration to the new format. (PLINK 1 .bed files are valid .pgen files, so code written on top of the library is backward-compatible.)
Firth regression. Standard logistic regression fails to converge, yielding 'NA' or nonsense results, when the 2x2 allele/phenotype contingency table has an empty cell ('quasi-complete separation'); this is common, and especially likely to happen with the strongest associations. Firth regression can prevent you from missing these associations. --glm's default 'firth-fallback' mode (only use Firth regression when there's either an empty contingency table cell or regular-logistic-regression convergence failure) gets you most of the benefit for a fraction of the computational cost.
Relatedly, --glm now provides a reason for each 'NA' result.
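The 'firth-fallback' rule described above reduces to a small decision function. The sketch below is illustrative and not PLINK's code: the names are hypothetical, the table is a 2x2 list of counts (alleles by case/control status), and the convergence flag is assumed to come from an ordinary logistic regression attempt.

```python
from typing import Sequence


def has_empty_cell(table: Sequence[Sequence[int]]) -> bool:
    """True if any cell of the allele/phenotype contingency table is zero."""
    return any(count == 0 for row in table for count in row)


def choose_regression(table: Sequence[Sequence[int]], logistic_converged: bool) -> str:
    """Pick 'firth' only when plain logistic regression is expected to fail
    (empty cell) or has actually failed to converge; otherwise 'logistic'."""
    if has_empty_cell(table) or not logistic_converged:
        return "firth"
    return "logistic"
```

Because most variants have no empty cell and converge normally, the more expensive Firth path runs only for the minority of variants that need it.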
--glm linear regression is often hundreds of times as fast as PLINK 1.9 --linear. When multiple phenotypes with the same 'missingness pattern' are provided, the speedup can exceed 1000x, especially when imputation has been used to replace missing genotypes with dosages. (Note that mean-imputation of missing genotypes is deliberately not supported by --glm and many other plink2 functions: when filling in missing values can be justified at all, the dosage should come from your variant caller or modern imputation software.)
'--pca approx' (equivalent to EIGENSOFT 6+ fastmode with default parameters). If you have more than ten thousand samples, only need the top principal components, and can tolerate 1% error in the last PC, this can save you a ton of compute time.
The 64-bit Linux build can handle linear algebra on matrices with more than 2^31 elements (so regular --pca is no longer limited to 46000 samples), as long as your system has enough memory.
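The "46000 samples" figure is consistent with a square sample-by-sample matrix capped at 2^31 elements. That the relevant matrix is n x n is an assumption made for this back-of-the-envelope check.

```python
import math

MAX_ELEMENTS = 2 ** 31  # element-count ceiling mentioned above

# Largest n such that an n x n matrix stays within the ceiling.
max_samples = math.isqrt(MAX_ELEMENTS)

print(max_samples)                       # 46340
print(max_samples ** 2 <= MAX_ELEMENTS)  # True
print((max_samples + 1) ** 2)            # 2147488281, which exceeds 2**31
```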
KING-robust kinship coefficients (--make-king, --make-king-table, --king-cutoff). These remain accurate when good population allele frequency estimates are unavailable. It still has limitations, but we have found --king-cutoff to be much more reliable than the PLINK 1.9 --rel-cutoff flag for general-purpose removal of close relations.
Proper support for dosages (decimal allele count expected values). When .gen/.bgen files are imported, hardcalls and dosages are saved to the .pgen. Operations which naturally extend to decimals (e.g. --pca, --glm, --freq, --maf/--mac) use the dosage information when it's present, while methods that can only make use of hardcalls (e.g. KING-robust, Hardy-Weinberg exact test) simply ignore the dosages. --hard-call-threshold can now be used to change the saved hardcalls without changing the dosages.
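The relationship between a stored dosage and its hardcall can be sketched as follows, using the default --hard-call-threshold of 0.1 mentioned in the changelog. The function name is hypothetical, and handling of values exactly on the boundary is approximate because of floating-point rounding.

```python
from typing import Optional


def hardcall_from_dosage(dosage: float, threshold: float = 0.1) -> Optional[int]:
    """Return 0, 1, or 2 if the dosage is within `threshold` of an integer
    allele count, otherwise None (missing hardcall).

    With the default threshold, dosages inside (0.1, 0.9) or (1.1, 1.9)
    yield no hardcall, while the dosage itself is still kept.
    """
    nearest = round(dosage)
    if abs(dosage - nearest) <= threshold:
        return int(nearest)
    return None
```

Changing the threshold changes only which dosages receive a hardcall; the dosage values are untouched, which matches the note that --hard-call-threshold can alter saved hardcalls without changing dosages.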
Much more multithreaded code.
Most commands let you control which columns appear in the main output file(s).
Broad support for both gzipped and Zstd-compressed text input files.
Graffelman and Weir's extended chrX Hardy-Weinberg exact test, which takes male allele frequencies into account. We've found that this tends to identify quite a few obviously-miscalled chrX variants which were not caught by the usual QC filters.
Oxford-style haplotype filesets can now be imported and exported (--haps, '--export haps[legend]').
Sample-major PLINK binary files can now be efficiently exported ('--export ind-major-bed'). This is 1000x as fast as the previous implementation (PLINK 1.07 --make-bed + --ind-major).
The relationship matrix (GRM) computation (as well as '--pca approx') now handles multiallelic variants properly, instead of just collapsing all minor alleles together.
--score allows each allele in a multiallelic variant to be assigned its own score.
Coming next
Variant split/join.
Multiallelic dosage support.
Merge. (Once this is operational, a stable version of the .pgen specification will be provided, and PLINK 2.0 beta testing will begin.)