Text to CAS xml
Latest revision as of 10:40, 17 August 2009 (last edited by Physchim62)
Quick'n'dirty converter script I hacked up:
<pre>
#!/usr/bin/perl
use strict;
use warnings;

if (@ARGV != 1) {
    die <<"EODIE";
Usage: $0 filename.txt
       $0 filename.txt > filename.xml

"filename.txt" is a tab-delimited text file, perhaps exported from Excel.
The file is converted to XML format and printed on STDOUT.

NB: the XML DTD is a guess based on commonChemMerge.10012008.xml

The expected column layout is:
  1: (ignored)
  2: Name
  3: CAS Number
  4: Molecular Formula
Any further columns are also ignored.

Any row whose CAS Number field does not match the usual
CAS Registry Number format (digits-digits-digits) is omitted.
EODIE
}

# Escape the characters that are special in XML text and attribute values,
# so a name containing "&" or "<" still gives well-formed output.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my $datafile = shift;
open my $datafile_FH, '<', $datafile
    or die "Could not read $datafile: $!\n";

# Slurp the whole file, then split it into rows on any line ending:
# some Excel exports (notably on the Mac) end rows with a bare "\r",
# other tools use "\n" or "\r\n".
my $content = do { local $/; <$datafile_FH> };
close $datafile_FH;
my @lines = split /\r\n|\r|\n/, $content;

print <<'EOXML';
<?xml version="1.0" encoding="UTF-8"?>
<CommonChemistryRecords>
EOXML

my $row = 0;
for my $line (@lines) {
    $row++;
    my %entry;
    @entry{qw/ x name cas mf /} = split /\t/, $line;

    if (!defined $entry{cas}) {
        warn "Skip row $row: no CAS# defined\n";
        next;
    }
    elsif ($entry{cas} !~ /^\d+-\d+-\d+$/) {
        warn "Skip row $row: '$entry{cas}' not valid CAS# format\n";
        next;
    }

    print "<CommonChemistryRecord registryNumber=\"$entry{cas}\">\n";
    print "<MolecularFormula>", xml_escape($entry{mf}), "</MolecularFormula>\n"
        if defined $entry{mf};
    print "<NT1Name>", xml_escape($entry{name}), "</NT1Name>\n"
        if defined $entry{name};
    print "</CommonChemistryRecord>\n";
}

print <<"EOXML";
</CommonChemistryRecords>
EOXML
</pre>
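The script only checks the shape of the CAS Number (`/^\d+-\d+-\d+$/`), so a typo such as a transposed digit still passes. A CAS Registry Number has two to seven digits, a hyphen, two digits, a hyphen, and a single check digit. The check digit equals the sum of the preceding digits, each multiplied by its position counted from the right (rightmost = 1), modulo 10. For water, 7732-18-5: 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5. Below is a minimal sketch of a stricter test; `valid_cas` is a hypothetical helper name, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $cas looks like a CAS Registry Number
# (2-7 digits, 2 digits, 1 check digit) and its check digit is correct,
# otherwise 0.
sub valid_cas {
    my ($cas) = @_;
    return 0 unless defined $cas;

    # Capture the three hyphen-separated parts; no match means bad shape.
    my ($head, $mid, $check) = $cas =~ /^(\d{2,7})-(\d{2})-(\d)$/
        or return 0;

    # Weight each digit by its position counted from the right, starting at 1.
    my @digits = reverse split //, $head . $mid;
    my $sum = 0;
    for my $i (0 .. $#digits) {
        $sum += $digits[$i] * ($i + 1);
    }
    return $sum % 10 == $check ? 1 : 0;
}
```

To use it, the `elsif` test in the loop could become `elsif (!valid_cas($entry{cas}))`; rows with a wrong check digit would then be skipped with the same warning as badly formatted ones.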
This page is currently licensed under both the Creative Commons Attribution–Share Alike 3.0 Unported license and the GNU Free Documentation License version 1.3 and any later versions of that license.