Arabidopsis Bay-0 × Sha SFP Haplotype Map



Supplemental Data to the paper:
Marilyn A.L. West, Hans van Leeuwen, Alexander Kozik, Daniel J. Kliebenstein, R. W. Doerge, Dina A. St.Clair, Richard W. Michelmore 2006.
High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Research 16: 787-795.



Summary

We detected 1257 Single Feature Polymorphisms (SFPs) between the Arabidopsis thaliana accessions Bay-0 and Shahdara, using the microarray data described in our paper (West et al. 2006). These SFPs were used to genotype 148 recombinant inbred lines (RILs) from the Bay-0 × Shahdara population. The 1257 SFPs were detected in 968 genes. We selected one SFP marker per gene and then performed a clustering analysis to obtain a set of non-redundant SFP markers. The final SFP map contains 599 SFP markers plus the 38 microsatellite core markers from the Bay-0 × Shahdara RIL population (Loudet et al. 2002).





Methods and Results


The general outline of the SFP detection and genotyping process is illustrated in Figure 1.

Figure 1. SFP detection and genotyping method.


Part A. SFP detection, genotyping of the RIL population and filtering of the SFP markers.

The Perl script SFPscanV10.pl (http://elp.ucdavis.edu/data/analysis/sfp_map/SFPscan.html) was used to detect SFPs between Bay-0 and Shahdara, and to genotype the 148 RILs, using the default options and the following input files:

1. A conversion file with the Affymetrix probeset IDs and the AGI gene codes (ATH1_conv-probeset-genecode.txt).

2. The files with the RMA background-corrected microarray data for the individual PM probes. These files are created with R and Bioconductor packages, using the R script BioC_getBGPM.r, the input file chips.txt, and the original .CEL files.

The obtained files should have a format similar to:
"471.CEL"
"244901_at1",363.582352941176
"244901_at2",124.111764705882
 ...etc

3. A file with the experimental setup: the first column is the RIL code, and the following columns hold the chip codes of the replicates (RILexp_setup_148_SW.txt).

4. A file with a list of chip files that will be used in this analysis. This file should contain both the RIL chip file names and the parental chip file names (chips_148_SW.txt).

5. Two files with the chip file names of the parental chips: parental_A_chips.txt, parental_B_chips.txt.
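The per-chip file format shown under item 2 can be read with a few lines of Python. The following is a minimal sketch, not part of the SFPscan script; the function name `read_pm_file` is hypothetical, and it assumes every file looks exactly like the three-line example above (a quoted chip name followed by quoted-ID, intensity pairs).

```python
import csv
import io


def read_pm_file(handle):
    """Read one background-corrected PM file.

    Assumed layout: the first line holds the quoted chip name, and each
    following line holds a quoted probe ID and its intensity.
    """
    reader = csv.reader(handle)
    chip_name = next(reader)[0]
    intensities = {}
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        probe_id, value = row
        intensities[probe_id] = float(value)
    return chip_name, intensities


# The example lines from item 2, wrapped in an in-memory file.
example = io.StringIO(
    '"471.CEL"\n'
    '"244901_at1",363.582352941176\n'
    '"244901_at2",124.111764705882\n'
)
chip, pm = read_pm_file(example)
print(chip, len(pm))
```

Using the `csv` module rather than splitting on commas handles the quoted fields without extra string processing.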


A more detailed explanation of the SFPscan script is available at: http://elp.ucdavis.edu/data/analysis/sfp_map/SFPscan.html.


The 1258 detected SFP markers were evaluated based on their genotyping scores for the 148 RILs (SFPmap_1258_markers.txt). An SFP marker was discarded if 1) more than 10% of the RILs could not be scored (missing data), or 2) the allele frequency distortion exceeded 4 to 1. This filtering step was performed automatically by the SFPscanV10 script and resulted in 1257 filtered SFP markers (SFPmap_1257_markers.txt). Next, for each probeset/gene with multiple SFP markers, we selected the marker with the lowest percentage of missing data, resulting in a set of 968 SFP markers (SFPmap_968_markers.txt).
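The two filtering criteria and the one-marker-per-gene selection can be sketched as follows. This is an illustration under stated assumptions, not the SFPscanV10 code: genotype calls are assumed to be coded "A", "B" and "-" (missing), the distortion is assumed to be the ratio of the more frequent to the less frequent allele among scored RILs, and ties in missing data are resolved by keeping the first marker seen. All function names and the marker and gene IDs are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of RILs without a genotype call ('-')."""
    return calls.count("-") / len(calls)


def passes_filter(calls, max_missing=0.10, max_ratio=4.0):
    """Return True if a marker survives both filtering criteria."""
    if len(calls) == 0:
        return False
    if missing_fraction(calls) > max_missing:
        return False  # criterion 1: too many unscored RILs
    a, b = calls.count("A"), calls.count("B")
    if min(a, b) == 0:
        return False  # only one allele observed: maximal distortion
    return max(a, b) / min(a, b) <= max_ratio  # criterion 2


def best_marker_per_gene(markers):
    """markers maps marker ID -> (gene, calls); keep the marker with the
    lowest missing fraction for each gene (first one wins on ties)."""
    best = {}
    for marker_id, (gene, calls) in markers.items():
        frac = missing_fraction(calls)
        if gene not in best or frac < best[gene][1]:
            best[gene] = (marker_id, frac)
    return {gene: marker_id for gene, (marker_id, _) in best.items()}


# Hypothetical marker and gene IDs, for illustration only.
demo = {
    "m1": ("geneX", "AB-AB"),
    "m2": ("geneX", "ABAAB"),
    "m3": ("geneY", "AABB-"),
}
print(best_marker_per_gene(demo))
```

Note that a marker at exactly 4 to 1 passes in this sketch, because the text discards only distortions higher than 4 to 1.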



Part B. Clustering of the markers and selection of framework markers.

The set of 968 SFP markers (SFPmap_968_markers.txt) was analysed for redundancy by clustering with the MadMapper software (http://cgpdb.ucdavis.edu/XLinkage/MadMapper/, Python_MadMapper_V248_RECBIT_007.py). For this analysis the 38 microsatellite markers (Loudet et al. 2002; MS_markers.txt) were added to the 968 SFP markers. The output file with the x_tree_clust extension (SFPmap_968_SFPmarkers_plus_38_MSmarkers.out.x_tree_clust) was evaluated to determine the number of non-redundant bins. Because of missing data, the exact number of bins cannot always be determined; such cases are marked as linked groups. By manually inspecting the allele calls in each linked group we determined its minimum number of bins and used that number in our calculations. The clustering analysis showed 563 bins of non-redundant markers.

Next, from each bin containing multiple markers, the two markers with the lowest percentage of missing data were selected as framework markers (if a bin contained a marker with 0% missing data, only that marker was selected). The microsatellite markers were always selected. This resulted in a set of 599 SFP markers (SFPmap_599_markers.txt). The integrated map with the 599 SFP markers and the 38 microsatellite markers contains 637 markers (SFPmap_637_markers.txt).
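The framework-marker selection rule for a single bin can be written down compactly. The sketch below is one possible reading of the rule, not the procedure actually used: it assumes that microsatellite markers are kept in addition to the selected SFP markers rather than counting toward the two, and that ties in missing data are resolved by input order. The function name `select_framework` and the marker IDs are hypothetical.

```python
def select_framework(bin_markers, microsatellites=frozenset()):
    """Pick framework markers from one bin.

    bin_markers: list of (marker_id, missing_fraction) pairs.
    microsatellites: IDs that are always retained.
    """
    kept = [m for m, _ in bin_markers if m in microsatellites]
    # Rank the SFP markers by missing data; sorted() is stable, so ties
    # keep their input order.
    sfps = sorted(
        (pair for pair in bin_markers if pair[0] not in microsatellites),
        key=lambda pair: pair[1],
    )
    if not sfps:
        return kept
    if sfps[0][1] == 0.0:
        chosen = [sfps[0][0]]  # a complete marker is enough on its own
    else:
        chosen = [m for m, _ in sfps[:2]]  # otherwise keep the best two
    return kept + chosen


# Hypothetical bins, for illustration only.
print(select_framework([("a", 0.02), ("b", 0.0), ("c", 0.01)]))
print(select_framework([("a", 0.05), ("b", 0.01), ("c", 0.03)]))
```

Keeping a second marker when no marker in the bin is complete gives a backup call for RILs where the first marker is missing, which is presumably why the rule distinguishes the two cases.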


Part C. Calculation of genetic distances.

The integrated set of 637 markers was analysed with JoinMap 3.0 (Van Ooijen and Voorrips, 2001) to group the markers into linkage groups and calculate genetic distances between the markers. The following input file was used: SFPmap_637_markers_jm.loc. The default JoinMap options were used. The result can be seen in Table 1.


Table 1. Characteristics of an integrated genetic map derived from the Bay-0 × Shahdara RIL population (148 RILs).
                     Markers¹        Size (cM)   Marker density (cM)   Maximum gap (cM)
Linkage Group 1      153 (144+9)      94.6       0.62                  4.22
Linkage Group 2       92 (85+7)       67.2       0.74                  4.12
Linkage Group 3      131 (125+6)      73.2       0.56                  4.25
Linkage Group 4      110 (102+8)      74.0       0.68                  4.77
Linkage Group 5      151 (143+8)     100.4       0.67                  7.21
All linkage groups   637 (599+38)    409.4       0.64                  7.21

¹ In parentheses: the number of our SFP markers and the number of microsatellite markers (Loudet et al. 2002), respectively.
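The marker density column is not defined in the text. The tabulated values are consistent with the map size divided by the number of intervals between adjacent markers (markers minus one); this is an inference from the numbers, not a statement from the source. The short check below recomputes the column under that assumption; the function name `mean_interval` is hypothetical.

```python
# (number of markers, map size in cM) for each row of Table 1
table_1 = {
    "Linkage Group 1": (153, 94.6),
    "Linkage Group 2": (92, 67.2),
    "Linkage Group 3": (131, 73.2),
    "Linkage Group 4": (110, 74.0),
    "Linkage Group 5": (151, 100.4),
    "All linkage groups": (637, 409.4),
}


def mean_interval(n_markers, size_cm):
    """Average distance between adjacent markers, assuming density is
    map length divided by the number of intervals (markers - 1)."""
    return size_cm / (n_markers - 1)


for name, (n_markers, size_cm) in table_1.items():
    print(f"{name}: {mean_interval(n_markers, size_cm):.2f} cM")
```

Dividing by the number of markers instead would give 0.73 for Linkage Group 2 and 0.66 for Linkage Group 5, which do not match the table.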



Part D. Data Visualization

The genotype scores of the 148 RILs for the five linkage groups were used to calculate pairwise distances between markers using the MadMapper software (http://cgpdb.ucdavis.edu/XLinkage/MadMapper/, Python_MadMapper_V248_RECBIT_007.py). The CheckMatrix software (http://cgpdb.ucdavis.edu/XLinkage/MadMapper/, py_matrix_2D_V248_RECBIT.py) was then used to create a graphical genotyping map and a heat map of linkage values.
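To illustrate what a pairwise distance between two markers can look like, the sketch below computes the fraction of RILs in which two markers carry different parental alleles, ignoring RILs with missing data at either marker. This is a generic measure chosen for illustration; it is an assumption, not a description of the scoring that MadMapper performs. The function name `recombinant_fraction` is hypothetical, and calls are assumed to be coded "A", "B" and "-".

```python
def recombinant_fraction(calls_1, calls_2):
    """Fraction of informative RILs whose alleles differ between two markers.

    Returns None when no RIL has a call at both markers.
    """
    informative = [
        (a, b) for a, b in zip(calls_1, calls_2) if a != "-" and b != "-"
    ]
    if not informative:
        return None
    differing = sum(a != b for a, b in informative)
    return differing / len(informative)


# Two hypothetical markers scored on five RILs; the last RIL is missing.
print(recombinant_fraction("AABB-", "AABA-"))
```

Markers in the same bin would score 0 under this measure, and unlinked markers would be expected to score near 0.5.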

For the visualization of all five linkage groups together, physical positions of the genes corresponding to the SFPs were obtained from the Arabidopsis annotation Version 4, TIGR release May 2003 (positions per chromosome, positions in genome). Physical positions of the 38 reference microsatellite markers previously mapped in this RIL population by Loudet et al. (2002) were obtained by BLASTing the PCR primer sequences of each microsatellite marker against the Arabidopsis annotation Version 4, TIGR release May 2003 (positions per chromosome, positions in genome). For the visualization of the individual linkage groups, genetic distances were obtained as described above in Part C.


Order of the markers according to genetic maps constructed with JoinMap:
[Image table not reproduced. For each of Linkage Groups 1 to 5, three panels are provided: a CheckMatrix heat map of linkage values, a graphical genotype map of the RILs¹, and a circular map of linkage between markers².]
¹ The last four columns are two Sha and two Bay-0 controls, respectively.
² For the circular maps a linkage cutoff value of 0.9 was used.



Order of the markers according to the physical order of the genes in the Columbia genome:
[Image table not reproduced. For all five linkage groups together, three panels are provided: a CheckMatrix heat map of linkage values, a graphical genotype map of the RILs¹, and a circular map of linkage between markers².]
¹ The last four columns are two Sha and two Bay-0 controls, respectively.
² For the circular maps a linkage cutoff value of 0.9 was used.



Part E. Data files

Table 2. Data files of haplotypes and genetic distances of the high-resolution map of the Bay-0 × Shahdara RIL population (148 RILs).
                     Haplotypes¹                     Genetic Distances (cM)²
Linkage Group 1      SFPmap_637_At1_genotypes.txt    SFPmap_637_At1_map.txt
Linkage Group 2      SFPmap_637_At2_genotypes.txt    SFPmap_637_At2_map.txt
Linkage Group 3      SFPmap_637_At3_genotypes.txt    SFPmap_637_At3_map.txt
Linkage Group 4      SFPmap_637_At4_genotypes.txt    SFPmap_637_At4_map.txt
Linkage Group 5      SFPmap_637_At5_genotypes.txt    SFPmap_637_At5_map.txt
All linkage groups   SFPmap_637_genotypes.txt        SFPmap_637_map.txt³

¹ The first column contains the marker IDs. The first row contains column numbers and the second row contains Loudet's RIL numbers. "A" is the Shahdara genotype, "B" is the Bay-0 genotype, and "-" is missing data.
² Genetic distances calculated with JoinMap 3.0; see details in Part C. The first column contains the marker IDs and the second column contains the genetic distances (cM).
³ Physical distances; see details in Part D. The first column contains the marker IDs and the second column contains the physical distances (Mb).
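The haplotype file layout described in the first footnote can be loaded as follows. This is a minimal sketch: the field separator is not stated on this page, so a tab is assumed here, and the function name `read_genotypes`, the marker IDs and the RIL numbers in the example are hypothetical.

```python
import csv
import io


def read_genotypes(handle, delimiter="\t"):
    """Read a haplotype file.

    Assumed layout: row 1 holds column numbers, row 2 holds the RIL
    numbers, and each later row holds a marker ID followed by one call
    per RIL ("A" = Shahdara, "B" = Bay-0, "-" = missing).
    """
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    ril_numbers = rows[1][1:]
    genotypes = {row[0]: row[1:] for row in rows[2:]}
    return ril_numbers, genotypes


# A made-up three-RIL file in the assumed tab-delimited layout.
example = io.StringIO(
    "col\t1\t2\t3\n"
    "RIL\t5\t8\t12\n"
    "marker_1\tA\tB\t-\n"
    "marker_2\tA\tA\tB\n"
)
rils, calls = read_genotypes(example)
print(rils, calls["marker_1"])
```

If the files turn out to be space-delimited, passing `delimiter=" "` would be the only change needed in this sketch.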






References

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.

Loudet, O., Chaillou, S., Camilleri, C., Bouchez, D. & Daniel-Vedele, F. 2002. Bay-0 x Shahdara recombinant inbred lines population: a powerful tool for the genetic dissection of complex traits in Arabidopsis. Theoretical and Applied Genetics 104 (6-7): 1173-1184. http://www.inra.fr/qtlat/BayxSha/index.htm.

Van Ooijen, J.W. & Voorrips, R.E. 2001. JoinMap® 3.0, Software for the calculation of genetic linkage maps. Plant Research International, Wageningen, the Netherlands. http://www.kyazma.nl/index.php/mc.JoinMap.

West, M.A.L., van Leeuwen, H., Kozik, A., Kliebenstein, D.J., Doerge, R.W., St.Clair, D.A. & Michelmore, R.W. 2006. High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Research 16: 787-795.



Last modified: June 9, 2006.

email to: hvanleeuwen@ucdavis.edu Hans van Leeuwen (Data analysis, SFP detection and marker development, Web design)

email to: mlwest@ucdavis.edu Marilyn West (microarray experiments)

email to: akozik@atgc.org Alexander Kozik (MadMapper and CheckMatrix software)