E. coli classification
STEC O157 strains of bovine origin (n = 102) that varied by source, and epidemiologically unrelated human clinical STEC O157 strains (n = 91), were used for polymorphism discovery (Additional data file 1). Each strain was characterized as STEC O157 by an enzyme-linked immunosorbent assay using an O157 monoclonal antibody and by multiplex PCR for stx1, stx2, eae, hlyA, rfb157 and fliC7. Additionally, each strain was genotyped for a polymorphism residing within the translocated intimin receptor gene (tir 255 T>A). A total of 261 STEC O157 strains, 164 isolated from cattle and 97 isolated from humans, were targeted for genotyping of tir 255 T>A and the 178 polymorphisms identified in this study, and for PFGE (Additional data file 1).
DNA isolation
Genomic DNA was extracted from STEC O157 strains using Qiagen Genomic-tip 100/G columns (Valencia, CA, USA) and a modified version of the manufacturer's protocol. Following overnight growth in 5 ml of Luria broth, bacteria were pelleted by centrifugation at 5,000 × g for 15 minutes, re-suspended in Qiagen buffer B1 containing RNase A (0.2 mg/ml), and vortexed per the manufacturer's instructions. Importantly, the samples were then incubated at 70°C for 10 minutes, vortexed, and equilibrated at 37°C (omitting the 70°C step frequently resulted in plugged columns and/or a significant decrease in DNA yield). Following the addition of 80 μl lysozyme (100 mg/ml) and 100 μl proteinase K (Qiagen), and a 37°C incubation for 30 minutes, the DNAs were extracted and air dried per the manufacturer's protocol. Purified DNAs were suspended in 500 μl TE (10 mM Tris pH 8.0, 0.1 mM EDTA) and incubated for 2 hours at 50°C, followed by an overnight incubation at room temperature with gentle mixing. Strain DNA preparations were assessed by the ratio of absorbance at 260 nm to that at 280 nm, determined with a NanoDrop Technologies ND-1000 spectrophotometer (Wilmington, DE, USA), and by gel electrophoresis.
STEC O157 DNA pools, GS FLX sequencing, and polymorphism identification
Three STEC O157 DNA pools were created for GS FLX sequencing and polymorphism discovery. The first consisted of DNAs from 51 STEC O157 strains (3 μg/strain), all of cattle origin and all with the tir 255 T>A A allele. The second consisted of DNAs from 51 STEC O157 strains (3 μg/strain), all of cattle origin and all with the tir 255 T>A T allele. The third consisted of DNAs from 91 STEC O157 strains (3 μg/strain) originating from clinically ill humans, all with the tir 255 T>A T allele. Genomic libraries were prepared from each of the three DNA pools for Roche 454 GS FLX shotgun sequencing according to the manufacturer's protocol (Nutley, NJ, USA). A total of 11 emulsion-based PCRs and sequencing runs were performed: three for the cattle-origin pool with the tir 255 T>A A allele, three for the cattle-origin pool with the tir 255 T>A T allele, and five for the human-origin pool. SNPs were mapped to a reference sequence of STEC O157 (Sakai strain) and identified with Roche GS Reference Mapper Software (version 1.1.03).
Polymorphism genotyping
A file containing all targeted polymorphisms was prepared for assay design and multiplexing by MassARRAY® assay design software as recommended by the manufacturer (Sequenom, Inc., San Diego, CA, USA). The design target was set to a maximum of 36 and a minimum of 21 polymorphisms per multiplex, with default settings for all other parameters. Seven multiplexes containing 225 polymorphisms were designed (average 32 polymorphisms per multiplex, range 21 to 36). Assays were performed using iPLEX Gold® chemistry on a MassARRAY® genotyping system as recommended by the manufacturer (Sequenom, Inc.). Genotypes designated as high confidence by the Genotyper® software were accepted as correct; those with lower confidence (marked 'aggressive' in the software) were manually inspected. Replicate iPLEX assays and/or Sanger sequencing were used to verify genotypes.
Polymorphism-derived genotype analyses
The alleles of 178 polymorphisms were concatenated by physical order along the STEC O157 genome for 261 STEC O157 strains and aligned using Clustal X (version 1.83). Redundant polymorphism-derived genotypes were identified using Tree-Puzzle (version 5.2) and removed from the Clustal X alignments. Neighbor-joining and parsimony phylogenetic trees were generated using a collection of software programs in PHYLIP (version 3.65; Consense, DnaDist, DnaPars, Neighbor, Retree, Seqboot). To construct a neighbor-joining tree, a distance matrix was first produced in DnaDist using an F84 distance model of substitution and a transition/transversion ratio of 2. The output of DnaDist was used to construct a neighbor-joining tree in Neighbor, which was mid-point rooted using Retree. Neighbor-joining bootstraps (1,000) were determined with Seqboot, DnaDist, Neighbor, and Consense. A parsimony tree with 1,000 bootstraps was generated with Seqboot, DnaPars (best tree thorough search) and Consense. Maximum-likelihood trees were generated in Tree-Puzzle (version 5.2) with 10,000 puzzling steps and an HKY model of substitution. Neighbor-joining, parsimony, and maximum-likelihood trees were all viewed in TreeView (version 1.6.6).
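The concatenation and redundancy-removal steps described above can be illustrated with a minimal Python sketch. It is not the Clustal X or Tree-Puzzle workflow used in the study; the function names, the input layout (a dictionary of allele calls per strain plus a dictionary of genome coordinates), the use of "N" for a missing call, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def concatenate_genotypes(calls, positions):
    """Join each strain's alleles into one string, ordered by genome coordinate.

    calls:     {strain: {snp_id: allele}}   (hypothetical input layout)
    positions: {snp_id: genome coordinate}  (hypothetical input layout)
    A missing call is written as "N" (an assumption of this sketch).
    """
    ordered_snps = sorted(positions, key=positions.get)
    return {
        strain: "".join(alleles.get(snp, "N") for snp in ordered_snps)
        for strain, alleles in calls.items()
    }


def collapse_redundant(genotypes):
    """Group strains that share an identical concatenated genotype."""
    groups = defaultdict(list)
    for strain, sequence in genotypes.items():
        groups[sequence].append(strain)
    return dict(groups)


# Hypothetical example: three strains typed at three polymorphisms.
positions = {"snpA": 300, "snpB": 100, "snpC": 200}
calls = {
    "strain1": {"snpA": "A", "snpB": "C", "snpC": "G"},
    "strain2": {"snpA": "A", "snpB": "C", "snpC": "G"},
    "strain3": {"snpA": "T", "snpB": "C", "snpC": "G"},
}
genotypes = concatenate_genotypes(calls, positions)
unique = collapse_redundant(genotypes)
print(unique)  # {'CGA': ['strain1', 'strain2'], 'CGT': ['strain3']}
```

Only the keys of `unique` (one sequence per distinct genotype) would then be passed on to tree construction, with the strain lists retained to map tree tips back to isolates.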
Haploview (version 4.1) was used to identify a minimal set of polymorphisms (tagging polymorphisms) that distinguish each of the unique polymorphism-derived genotypes observed in this study. All 178 polymorphism genotypes were used to infer STEC O157 haplotypes in Haploview at a haplotype frequency threshold of 0% or higher. Neighbor-joining, parsimony, and maximum-likelihood trees were generated from concatenated tagging polymorphism genotypes using model assumptions identical to those used for the full genotype data sets. Additionally, a median-joining network was constructed in Network (version 4.5.0.2) for the concatenated tagging polymorphism genotypes.
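The idea of a tagging set, a subset of polymorphisms that still separates every unique genotype, can be sketched with a simple greedy selection. This is not Haploview's algorithm, and a greedy choice is not guaranteed to find the smallest possible set; the function name and the example genotypes are hypothetical.

```python
from itertools import combinations


def greedy_tagging_sites(genotypes):
    """Pick site indices that together distinguish all distinct genotypes.

    genotypes: equal-length allele strings, one per genotype.
    Greedy heuristic: at each step take the site that separates the most
    genotype pairs that are still indistinguishable.
    """
    seqs = sorted(set(genotypes))
    n_sites = len(seqs[0]) if seqs else 0
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        # Distinct sequences differ at some site, so the best site always
        # separates at least one pair and the loop terminates.
        best = max(
            range(n_sites),
            key=lambda site: sum(seqs[i][site] != seqs[j][site] for i, j in unresolved),
        )
        unresolved = {(i, j) for i, j in unresolved if seqs[i][best] == seqs[j][best]}
        chosen.append(best)
    return sorted(chosen)


# Hypothetical example: the middle site is invariant and is never selected.
example = ["AAC", "AAT", "GAC", "GAT"]
tags = greedy_tagging_sites(example)
print(tags)  # [0, 2]
```

Projecting each genotype onto the selected sites (`"".join(g[i] for i in tags)`) yields reduced genotypes that remain pairwise distinct, which is the property the tagging polymorphisms are meant to preserve.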
Pulsed field gel electrophoresis
The standardized PFGE method was performed on the 261 STEC O157 strains that were also targeted for SNP genotyping (Additional data file 1). Gel images were analyzed using Bionumerics (Applied Maths, Sint-Martens-Latem, Belgium), and banding patterns were clustered using the unweighted pair-group method with arithmetic mean (UPGMA) algorithm and a band-based Dice coefficient. Default tolerance settings were used. No restriction enzymes other than XbaI were used. Strains were assigned to the same PFGE group only if their XbaI banding patterns were indistinguishable.
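The band-based Dice coefficient used for clustering is 2 × (shared bands) / (bands in pattern A + bands in pattern B). The Python sketch below shows that calculation under stated assumptions: bands are given as numeric positions or fragment sizes, and two bands match when they lie within an absolute `tolerance` of each other. Bionumerics applies its own band-matching and tolerance rules, which are not reproduced here; the function name and example values are hypothetical.

```python
def dice_coefficient(bands_a, bands_b, tolerance=0.0):
    """Dice similarity between two banding patterns.

    bands_a, bands_b: band positions or fragment sizes (numbers).
    tolerance: maximum absolute difference for two bands to count as shared
               (an assumption of this sketch, not the Bionumerics setting).
    """
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    # Walk both sorted lists, pairing each band with at most one partner.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    total = len(a) + len(b)
    return 2 * shared / total if total else 1.0


# Hypothetical patterns that share three of four bands each.
pattern_1 = [50, 100, 200, 400]
pattern_2 = [50, 100, 250, 400]
print(dice_coefficient(pattern_1, pattern_2))  # 0.75
print(dice_coefficient(pattern_1, pattern_1))  # 1.0
```

Under the grouping rule stated above, only pairs with a coefficient of 1.0 (indistinguishable patterns) would fall in the same PFGE group; lower values feed the UPGMA clustering.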