Originally from http://www.openvpms.org/forum/dataload-bug-drops-species-when-there-are-large-number from Tim Gething:
Attached to this post are two setup xml files (renamed with .txt extensions so that they can be uploaded). They differ only in the order of the species data. setup-bad.xml has the species data in breed/species order (it starts at line 934 and ends at 2321). setup-OK.xml has the data in the same order, but with the L to Z part moved before the A to L part (again the data starts at line 934 and ends at 2322, with the order change at line 1588).
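For anyone wanting to confirm that the two attachments really do differ only in line order, a minimal sketch along these lines should do it. The function names are made up for illustration, and it assumes each data entry sits on its own line (which the line numbers above suggest but do not guarantee). Blank lines and leading/trailing whitespace are ignored.

```python
from collections import Counter


def line_differences(text_a, text_b):
    """Return two Counters: lines found only in text_a and only in text_b.

    Order is ignored; blank lines and surrounding whitespace are dropped.
    """
    count_a = Counter(s.strip() for s in text_a.splitlines() if s.strip())
    count_b = Counter(s.strip() for s in text_b.splitlines() if s.strip())
    # Counter subtraction keeps only the positive counts
    return count_a - count_b, count_b - count_a


def differ_only_in_order(text_a, text_b):
    """True if both texts contain the same lines, possibly in a different order."""
    only_a, only_b = line_differences(text_a, text_b)
    return not only_a and not only_b
```

To use it, read the contents of setup-bad.xml and setup-OK.xml into two strings and pass them to differ_only_in_order. If it returns False, line_differences shows which lines are present in one file but not the other.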
If you dataload the setup-bad.xml file, then the Breed lookup table contains the full data set. However, if you create or edit a patient and set the species to Dog or Cat, then the pull-down list of breeds is missing the A to G breeds. The other species, which have far fewer breeds, are OK.
If you use the setup-OK.xml file then all is good for the dogs - all 469 dog breeds are there. However, the cat breeds go from A to K - those after K are missing from the dropdown list. So my order swap did not do a full fix.
Note that before doing the dataload there is a full reset, i.e. the openvpms database is dropped and then recreated, and the archetypes loaded. Similarly, the Tomcat service is restarted after the dataload and before logging into OpenVPMS.
When you look at the xml files you will see that they contain most of the setup data required for the practice. I have not played with stripping the species/breed data into a separate file so that it can be tested independently of the other information.
I have also attached the log of the dataload - this does not change from the OK to the bad case.
After further playing & testing, I understand what is going wrong. It's true that all the breeds are added to the lookup list. However, the problem is that some of the species targets are missing, i.e. the cats and dogs 'missing' from the pull-down breed selection list have no species set for them.
That is, the dataload program appears to be screwing up when a large number of lines refer to the same "source", i.e. when you have many breeds of the same species, each with entries like: