Hi All
I am embarking on a Data Migration project which involves migrating data from one I-Series database application to a different I-series database application. I guess as with all migrations the data does not excatly match in both appplications. I have identified that i need to populate Excel files with all the relevant data from our legacy system, once this has been done then the users of our application will be adding data to these excel files which is required by our new application. Once these Excel files have been fully populated i then need to upload to the new application,
I guess my question is have any of you gone through this kind of process, and if so how did you manage it.
One problem i can see is when i bring the data down to excel from legacy system, Excel does not keep the formats of each field, this would mean i need to specify the format for eac field again in Excel before uploading to new application.
Thanks in advance for any help
Software/Hardware used:
ASKED:
September 5, 2006 4:02 AM
UPDATED:
September 11, 2006 2:18 PM
When you export or import data to and or from the iSeries, a file is created that ends with the extension .FDF. this is a definition of the Database that exists on the iSeries. Another file is also created if you request it that is either a .dtf from iSeries) or a .dtt (To iSeries) file. These files make sure that the data in the file that your are importing/exporting to/from the iSeries makes sure that the data is in the fight format for the database that you are using. I sould suggest looking into iSeries Client Access Expres as it explains this process in more detail there.
That is also my problem in putting data to excel. Most especially if the data type is date. It becomes a number if you will put to excel. What I did is converting all date data type to string or text before putting data to excel and it works.
Im not sure if this strategy is also applicable to your problem.
I’m curious why the database format you’re moving your data to needs to be in excel. Excel is not a database, it’s a spreadsheet. I personally have never experienced a excel application which , when analyzed, could not have actually used a real database as the backend. Sorry, I know this doesn’t solve your problem but I’ve been designing and implementing applications for over 15 years and I think your application designers should reconsider. My $0.02.
My 2 cents worth.
1. Excell has a row limit. If you exceed it users will not be able to add data to the worksheet.
2. I have done several splits/ merges with other companies as many odf you probably have.
I would move the new files into a test area.
I would then have the users create cross references in excell that have the knowlege of what extra data that is needed in the current systems files.
I would upload the cross references.
You then write a program that reads the new files and adds the new data, from cross references, into a copy of your current systems file in a test area.
Test the data and the system in test.
IF all goes well copy add the new file to your current system.
I know i have simplified this but this is how I have done this in the past.
Hope this helps
I highly recommend that you make at least one change to your strategy:
instead of following the route:
-old datastructure – production platform (iSeries)
-data entry tool and database (Excel or hopefully something better)
-data entry of new and changed data
-new datastructure (back to iSeries)
that you take this route:
- old datastructure – production platform (iSeries)
-programmatic ETL of data
- new datastructure – production platform (iSeries) in a temporary
workspace
- transfer only necessary data to separate database for data
entry
- data entry tool and database (Excel or hopefully something better)
- transfer the new data from the data entry tool database to
the respective columns in the new struction
- new datastructure – production platform (iSeries) in a temporary
workspace
- now contains data from old datastructure and new/
transformed data through data-entry.
- procedure to move the new datastructure, when verified, from
temporary to production space.
This looks like more steps and more work, but if you add the I.T. work and
the Data Entry work, and the time you will save by preserving as much of the
original formatting and data as possible, and using programs to do as much
of the transformation as possible, it will, in the end, save time.
It is also: -saveable and repeatable
-safer
-provides a guide and tools for future such migrations
My experience in database migration is with Oracle, but, except for the
specific tools involved, the basic steps should be similar.
1. MAPPTING
Map the current data structure to the new data structure.
a. Specify the correspondence between old and new tables
b. Specify the columns to be transfered
i. Columns that remain the same, data and format
` ii. Columns that whose data remains the same, but format
differs
iii. Columns whose data and format is altered
iv. Columns that are new
c. The first 2 (b.i. and b.ii.) can all be transfered to the new
structure programmatically. Of the 2nd 2 (b.iii. and b.iv.) you
need to categorize them into:
i. Those which can be altered or assigned programmatically
(i.e. there is enough data for a program to figure out
what data should be added for each record.)
ii. Those which can be partially assigned programmatically
(i.e. there is enough data for a program to figure out
what data should be added for either
-MOST RECORDS or
-MOST OF THE DATA in a record.)
iii. The data which must be input by a user.
2. STRATEGY AND PREPARATION
a. Strategy
i. Decide how the migration is going to occur:
A) How much time do you have: how much time can the
production data be off-line.
B) Work with your Data Administrator to find out how often
and how easily you can get extracts of current data
C) Save all your mapping, extracting, transormation and
loading programs, scripts and settings so you can do any
number of practice runs before actually loading the data
into the new, live, production structure.
D) CREATE NEW STRUCTURE IN TEMPORARY WORKSPACE
Create the new data structure with production tools
(in your case, an iSeries database) but in a temporary
location.
ii. Use your new structure in your temporary workspace on the steps
below until all the kinks are worked out. This is essential as it
gives you the luxury of “redoing” as often as needed, as well as
the PROTECTION of production data against errors.
3. ETL -Extract, transform and load
Use export/import tools along with programming tools (data modification or
ETL tools) to take the data from current production and import it into the
new structure including all the new/changed columns that can be changed/
inserted/loaded or partially filled as analyzed in MAPPING, 1.c.i and ii.
a. Create/assign new data programmatically
- Example of programmatic assigning (1.c.i): if there is a new field
ZIP CODE, and the zip code is currently part of another field, your
programming script or ETL tool can look for the zip code in the other
field and insert it into the new field
b. Alter/reformat new data programmatically
- Example of programmatic altering (1.c.i): if there is a new format
on a field, often exporting the old value (a number for example)
and simply loading it into the new format (a different number
datatype, for example) is sufficient. In those cases where the new
format is not compatible with the old data, programmatic alteration
of the data can be used.
c. Partially alter/assign new data
- Example of partial programmatic loading (1.c.ii): If there is
NOTHING in the CURRENT data that you ETL tool or data alteration
program can use to figure out the new data, IN SOME CASES, you can
still insert some data programmatically when:
i. Much of the data is repeated. For example, if you have a
new field “city” that you can’t derive from existing data,
but say 90% of your addresses are in “San Antonio”, you
can insert “San Antonio” into all the records, and make
note of the field as one that requires user intervention.
ii. Part, but not all, of the data can be assigned
programmatically, or the program can make a good guess.
Examples of these are:
A) Incomplete data is either known or exists in the
current database. Say the area code and PBX is
known, but not the extension. The column could be
populated with (XXX) XXX-, and a note made for the
data-entry phase to fill these out.
B) Addresses: Programs can parse old address columns
to fit new address columns but it is a complex
enough issue that it is unlikely that the
transformation will be 100% accurate without spending
far too much time on the parsing code. Again, that
column would be added to the list of those requiring
data-entry intervention.
4. Data-entry
Once the programmatic transfer of data from the old datastructure to the new
datastructure is complete, set up the tools necessary for the data entry of
new/transformed data that could not be completely done programmatically
(3.c. above)
a. Load the data from your temporary workspace created in step 3 into
temporary tables in whatever the best tool for data-entry you have is.
If it is Excel, so be it, but if you have forms or tools that use
iSeries databases on their back-end natively, that would be ideal.
MS Access is also better, as it has ODBC capability that can access
your tables in their native format directly.
b. Create forms or format the spreadsheets so that the data-entry
personnel can only edit/add the data that is required of them.
c. Also provide as strict an edit (typing or formatting or otherwise
creating edit rules for the columns to be entered)
d. Once data entry has been completed, load the data back up into the
iSeries columns in the new structure in your temporary work space.
5. Production
Once the data in your temporary workspace is verified, it can be moved into
production. Not that the best thing about this process is that all the steps
completed thus far can be repeated, reprogrammed, revised etc. as many times
as necessary to get the data right, BEFORE any production data is affected.
Data formatting is NEVER LOST, as the data always remains mapped to a column
in your new data structure in your temporary iSeries work space, whether it
is in there or whether it is awaiting up load from a data-entry source.
I hope this at least gives you an idea or two that will be helpful.
If there’s anything I’ve mentioned that you have questions on, please ask and
I will be glad to clarify.
If you have any IBM/iSeries/DB2-specific questions, those are best directed to
others, as my experience is with UNIX/Oracle.
Best wishes in your project.