0

I am working on a data integration project, where I need to extract data from oracle source and load it to XML file. The requirement is to get the list of customers and foreach customer create an xml file with customer data and its associated products and the product related data.The way I have designed it as follows.

  1. Using execute sql task get list of customers and hold it in object variable.

  2. Using foreach loop container loop through each customer.

  3. Inside the foreach loop, I have as script task to connect to oracle using odp.net and get the required data and write it to xml using xmlwriter class. The code inside the script task is below

-

 try
            {
                using (XmlWriter writer = XmlWriter.Create("FilePath" + customer + "_" + serviceType + ".xml", settings))
                {
                    //XML Header
                    writer.WriteStartDocument();
                    writer.WriteStartElement("ATOBTO"); //end
                    writer.WriteStartElement("ATOBTOSSet"); //end
                    writer.WriteStartElement("ATOBTHEADER"); //end
                    writer.WriteStartElement("ACTIONID");
                    writer.WriteString(DateTime.Now.ToString("yyyyMMddHHmmss"));
                    writer.WriteEndElement();
                    writer.WriteStartElement("ATOBTSOURCE");
                    writer.WriteString("MOI");
                    writer.WriteEndElement();

                    //Connect To Oracle DB using ODP.net Driver this driver buffers data hence requires minimum RAM usage
                    con = new OracleConnection(moiConnection);
                    con.Open();

                    // Execute PL/SQL For Customer---Only one row will be returned
                    OracleCommand cmdCustomer = con.CreateCommand();
                    cmdCustomer.CommandText = sqlCustomer;
                    OracleDataReader readerCustomer = cmdCustomer.ExecuteReader();

                    while (readerCustomer.Read())
                    {
                        writer.WriteStartElement("ATOBTCUSTOMER");

                        WriteFullElementString(writer, "CUSTOMERID", readerCustomer["CUSTOMERID"].ToString());
                        WriteFullElementString(writer, "CUSTOMERNUMBER", readerCustomer["CUSTOMERNUMBER"].ToString());
                        WriteFullElementString(writer, "CUSTOMERNAME", readerCustomer["CUSTOMERNAME"].ToString());
                        WriteFullElementString(writer, "CUSTOMERSINCE", readerCustomer["CUSTOMERSINCE"].ToString());
                        WriteFullElementString(writer, "CUSTOMERSOURCE", readerCustomer["CUSTOMERESOURCE"].ToString());
                        WriteFullElementString(writer, "SERVICEGROUP", readerCustomer["SERVICEGROUP"].ToString());
                        WriteSpec(writer, "ATOOBTCUSTOMERSPEC", "ATSPECIFICATIONNAME","SERVICECATEGORY", readerCustomer["SERVICECATEGORY"].ToString());
                        WriteSpec(writer, "ATOOBTCUSTOMERSPEC", "ATSPECIFICATIONNAME", "PORTALNUMBER", readerCustomer["PORTALNUMBER"].ToString());
                        WriteSpec(writer, "ATOOBTCUSTOMERSPEC", "ATSPECIFICATIONNAME", "SERVICE_CODE", readerCustomer["SERVICECODE"].ToString());
                        WriteSpec(writer, "ATOOBTCUSTOMERSPEC", "ATSPECIFICATIONNAME", "SERVICE_DESCRIPTION", readerCustomer["SERVICEGROUP"].ToString());

                        writer.WriteEndElement(); //ATOBTCUSTOMER
                    }
                    // Clean up CustomerCommand
                    readerCustomer.Dispose();
                    cmdCustomer.Dispose();


                    // Execute PL/SQL For CircuitCI --  100 to 1000's of rows can be returned
                    OracleCommand cmdCircuitCI = con.CreateCommand();
                    cmdCircuitCI.CommandText = sqlCircuitCi;
                    cmdCircuitCI.InitialLOBFetchSize = -1;
                    OracleDataReader readerCircuitCI = cmdCircuitCI.ExecuteReader();
                    readerCircuitCI.FetchSize = cmdCircuitCI.RowSize * 100;

                    //Write CircuitCI XML
                    while (readerCircuitCI.Read())
                    {
                        writer.WriteStartElement("ATOBTCI"); //end
                        WriteFullElementString(writer, "CICLASSIFICATION", "6103");
                        WriteFullElementString(writer,"CINUM", readerCircuitCI["CINUM"].ToString());

                        //CLOB DATA--Call method to clean up invalid xml charecters
                        string notes = CleanInvalidXmlChars(readerCircuitCI["notes"].ToString());
                        WriteFullElementString(writer, "CIDESCRIPTION_LONGDESCRIPTION", notes);
                        WriteFullElementString(writer, "CUSTDEV", readerCircuitCI["CUSTDEV"].ToString());
                        WriteFullElementString(writer, "INSTALLEDDATE", readerCircuitCI["INSTALLEDDATE"].ToString());
                        WriteFullElementString(writer, "STATUS", "OPERATING"); 

                        // Execute PL/SQL For ServiceAttributes -- Less than 10 to 30 rows for each serviceid 
                        OracleCommand cmdServAttr = con.CreateCommand();
                        cmdServAttr.CommandText = sqlServiceAttr + " AND SERVICEID=" + readerCircuitCI["SERVICEID"].ToString();
                        OracleDataReader readerServAttr = cmdServAttr.ExecuteReader();
                        readerServAttr.FetchSize = cmdServAttr.RowSize * 100;

                        //Write ServiceAttributes XML
                        while (readerServAttr.Read())
                        {
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", readerServAttr["Name"].ToString(), readerServAttr["Value"].ToString());
                        }

                        // Clean up ServAttrCommand
                        readerServAttr.Dispose();
                        cmdServAttr.Dispose();

                        // Execute PL/SQL For CircuitAttributes  Less than 10 to 30 rows for each circuitid
                        OracleCommand cmdCircAttr = con.CreateCommand();
                        cmdCircAttr.CommandText = sqlCircuitAttr + " AND CIRCUITID=" + readerCircuitCI["CIRCUITID"].ToString();
                        OracleDataReader readerCircAttr = cmdCircAttr.ExecuteReader();
                        readerCircAttr.FetchSize = cmdCircAttr.RowSize * 100;
                        //Write CircAttr XML
                        while (readerCircAttr.Read())
                        {
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", readerCircAttr["Name"].ToString(), readerCircAttr["Value"].ToString());
                        }

                        // Clean up CircAttrCommand
                        readerCircAttr.Dispose();
                        cmdCircAttr.Dispose();

                        // Execute PL/SQL For ContractSLA -- One row for each serviceid
                        OracleCommand cmdContractSLA = con.CreateCommand();
                        cmdContractSLA.CommandText = sqlContractSla + " AND SERVICEID=" + readerCircuitCI["SERVICEID"].ToString();
                        OracleDataReader readerContractSLA = cmdContractSLA.ExecuteReader();
                        readerContractSLA.FetchSize = cmdContractSLA.RowSize * 100;
                        //Write ContractSLA XML
                        while (readerContractSLA.Read())
                        {
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "HDCONTROL", readerContractSLA["HDCONTROL"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "MTTRTARGET", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "RURALMTTR", readerContractSLA["RURALMTTR"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "SLAINFOURL", readerContractSLA["SLAINFOURL"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "SLASIGNED", readerContractSLA["SLASIGNED"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "UPDATEINTERVALMINUTES", readerContractSLA["URBANMTTR"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "URBANMTTR", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "SEGMENT", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "STATUS", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "CONTRACTTYPEKEY", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "DESCRIPTION", readerContractSLA["MTTRTARGET"].ToString());
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "CONTRACTNUMBER__CONTRACT_NAME_", readerContractSLA["MTTRTARGET"].ToString()); //CONTRACTCOVERAGE
                            WriteSpec(writer, "ATSPECIFICATION", "ATSPECIFICATIONNAME", "CONTRACTCOVERAGE", readerContractSLA["MTTRTARGET"].ToString());
                        }

                        // Clean up ContractSLA Command
                        readerContractSLA.Dispose();
                        cmdContractSLA.Dispose();

                        writer.WriteEndElement();  //Close Root XML Element
                    }

                    // Clean up CircuitCI Command
                    readerCircuitCI.Dispose();
                    cmdCircuitCI.Dispose();

                } // Dispose XMLWriter
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
                Dts.Events.FireError(18, "Error in Creating XML", ex.Message, "", 0);
            }

            finally { con.Dispose(); } //Close DB Connection

            Dts.TaskResult = (int)ScriptResults.Success;
        }

            public static void WriteFullElementString(XmlWriter writer, string localName, string value)
            {
                if (value != "" && value!=null)
                {
                    writer.WriteStartElement(localName);
                    writer.WriteString(value);
                    writer.WriteFullEndElement();
                }
            }

            public static void WriteSpec(XmlWriter writer, string specName, string localName, string localValue, string value)
            {
                if (value != "" && value != null)
                {
                    writer.WriteStartElement(specName);
                    writer.WriteElementString(localName, localValue);
                    writer.WriteElementString("VALUE", value);
                    writer.WriteFullEndElement();
                }
            }



        //Method to clean Invalid XML Charecters
        public static string CleanInvalidXmlChars(string text)
        {
            string regex = @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u10000-\u10FFFF]";
            return Regex.Replace(text, regex, "");
        }

The problem I am facing with this approach is when the customer data is huge(about 200mb), the script task is very very slow, potentially takes more than 5 hours.

I have tried to optimize the code as much as possible with little to no improvement in performance. Is this the right approach for this project or is there any other efficient solution for this?

Ewan
  • 70,664
  • 5
  • 76
  • 161
sab
  • 109
  • 1

1 Answers1

0

Anytime you loop though a datareader and within that loop you run another select and lop through that. You are not being optimal with your performance.

However, SSIS is probably also not helping you here. have you got another SSIS task before this which selects all the customers and runs this once for each?

Your best bet is to take this code and turn it into a simple stand alone program, lets just say a console app for simplicity.

Make it connect and get all the info for a single customer back from the database. Once it has the data, disconnect from the db and write the file.

Now you just need an extra table to keep track of which customers have been processed and which are still to be worked on. Use this to allow you to run your program on two computers at the same time. Add a bit of code at the start to grab a customer than hasn't been processed yet and a bit at the end to mark it as complete and to send the file to some central ftp site.

Once you have that you can now run this program on every computer you have access to. Your database will run at 100% but it will only be answering queries, not writing XML and that's what it does best.

The load of looping through the data and writing XML files will be spread over all those computers. You can do some tests to see how many instances is the best number to run on a single computer.

Ewan
  • 70,664
  • 5
  • 76
  • 161
  • thanks for your reply.Yes, there is a forachloop contaier in SSIS and the sctipt task is inside the foreach loop. Are you saying if I use the same code in a console app it will perform better? – sab Sep 16 '18 at 17:02
  • it will be more scaleable. ie. you will be able to run it on more computers at the same time. – Ewan Sep 16 '18 at 17:49
  • How exactly? I will have a console app deployed in one server which will do all the operations isnt it?Could you please elaborate what you mean by one or more computers at the same time? sorry if it didn't get it right – sab Sep 16 '18 at 18:00
  • read through my answer again more slowly "Use this to allow you to run your program on two computers at the same time" – Ewan Sep 16 '18 at 18:11
  • so my understanding is, deploy the console app in as many machines as possible and have a control table to identify which customer has been processed and which customers not. Am I correct, is this what you meant? – sab Sep 16 '18 at 18:26
  • yes. obviously its a simplistic approach, you could improve things by making it a service and introducing a MQ. Also you are ignoring the part about reading all the data first rather than having selects in loops – Ewan Sep 16 '18 at 18:48