Using LINQ to XML in poly-hierarchical tree structures: A programmatic approach
Jonathan Sexton, Data & Services Developer, UK Data Archive
17 April 2014, V1.0
UK Data Archive
• based at the University of Essex since 1967
• curator of the UK’s largest collection of digital data in the social sciences
• currently holds nearly 6,000 data collections for research and teaching, both quantitative and qualitative
• certified to ISO 27001, the international information security standard
• makes these available via the new UK Data Service
Website: www.data-archive.ac.uk

UK Data Service
• the UK Data Service indexes all data collections in the Archive: all are catalogued at thematic level, and many are indexed at variable level
• also harvests metadata from other sources
• all are available for download via the Discover search-and-browse catalogue: discover.ukdataservice.ac.uk

UK Data Service
• led by experts at the University of Essex along with colleagues at Manchester, Leeds, Southampton, Edinburgh and UCL
• also provides access to UK Census data (1971 to 2011)
• source of guidance, training, and support for data users in the UK and around the world
• currently serves approx. 24,000 registered users
• newly funded to coordinate the Administrative Data Research Network, part of the UK’s Big Data strategy
Websites: ukdataservice.ac.uk, census.ukdataservice.ac.uk
Lesson Objectives
To discuss the method used to navigate a poly-hierarchical tree structure, in XML, using C# and LINQ to XML, in order to provide the calling application with the data it requires, given an XML object to navigate in and a valid identifier within it.

Scenario
• An application allows a user to add a ‘suggestion’ to a tree-view-based structure, in this particular case a Thesaurus for the Social Sciences (ELSST).
• The tree view is poly-hierarchical, in that identical terms may appear in several places in the structure.
• The data for the tree view is held as XML in the user’s session.
• The application and its supporting architecture are written in .NET v4.5 (C#).
The XML (a snippet of the data)
[Figure: XML snippet with callouts labelling the Top/Broader Term, the Identifier, the Attributes for Term, and a Narrower Term]
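The XML itself did not survive in this transcription, but the queries on the later slides imply its rough shape. The following is only a sketch of that shape: the element and attribute names (element, data, attr, class, BTs, BT, BTLex, BTID) are taken from the method's code, while the terms, the short identifiers, and the ThesaurusSample class are invented for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ThesaurusSample
{
    // HYPOTHETICAL sample data: the names mirror the queries used by the
    // method, but the terms and the short ids are made up.
    public const string Xml = @"
<element>
  <data>ABILITY</data>
  <attr><class>class-A1</class></attr>
  <element>
    <data>COMMUNICATION SKILLS</data>
    <attr>
      <class>class-C1</class>
      <BTs><BT BTLex=""ABILITY"" BTID=""A1"" /></BTs>
    </attr>
    <element>
      <data>LISTENING</data>
      <attr><class>class-L1</class></attr>
    </element>
  </element>
</element>";

    public static XElement Load()
    {
        return XElement.Parse(Xml);
    }

    // Counts the nodes whose class identifier matches cid, ignoring case.
    public static int CountMatches(XElement root, string cid)
    {
        return root.DescendantsAndSelf("element")
            .Count(e => string.Equals(
                (string)e.Element("attr").Element("class"),
                cid, StringComparison.OrdinalIgnoreCase));
    }
}
```

Note that, judging by the code, the class element already carries the "class-" prefix whereas the BTID attribute holds the bare identifier.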
The tree view in action in the application
A user has performed a search on ‘communication skills’ as a Preferred Term in the thesaurus, and the results are as shown. We can clearly see two instances in the hierarchy, under different ‘Top Terms’, ‘Ability’ and ‘Communication’.

Requirement
• When the user is making a ‘suggestion’ they may wish to alter the position in the hierarchy at which the term in scope appears.
• To do this, three dropdown lists are presented to them: valid ‘broader’, ‘narrower’ and ‘related’ terms, each relating to the term in scope.
• Given the full XML data and the identifier of the term in scope, calculate the content of these lists.

Design
• We need a method that, given the full XML (as a System.Xml.Linq.XElement object) and the identifier (as a string), will populate three lists (representing valid ‘broader’, ‘narrower’ and ‘related’ terms) and return them as an IEnumerable<List<string>> (in practice, a List of Lists of strings).
• We will need to use the System.Xml.Linq library to enable us to easily navigate through the passed XElement object.
• We will need to implement our method in an existing UK Data Archive DLL, namely our UKDA.Utility.Library component, in the DataAccess/XmlFileReader class.
Implementation (1 of 12)
• OK, our new method is going to be:

    public static IEnumerable<List<string>> GetRelevantTermsLists(
        XElement xElementOriginal, string cid)
    {

NOTE: XElement is a reference type, so the method receives a reference to the caller's object and any change we made to it would be visible to the caller. We therefore need to make a copy before we do any manipulation on it. The identifier (‘cid’) is safe to use directly because strings are immutable.

• And let’s get something to work with:

        const string Delimiter = "|";
        const string ClassString = "class-";

        // Work on a deep copy so the caller's XML is left untouched.
        var xElement = new XElement(xElementOriginal);

        // Every node whose class identifier matches 'cid' (case-insensitive).
        var el = xElement.DescendantsAndSelf("element").Where(
            nm => string.Equals(nm.Element("attr").Element("class").Value,
                                cid, StringComparison.CurrentCultureIgnoreCase));

        var btList = new List<string>();
        var ntList = new List<string>();
        var rtList = new List<string>();
        var xElements = el as IList<XElement> ?? el.ToList();
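To see why the copy matters, here is a minimal sketch contrasting the XElement copy constructor with a plain assignment. The CopyDemo class and its method names are made up for this illustration; only the new XElement(other) constructor comes from the slide.

```csharp
using System.Xml.Linq;

public static class CopyDemo
{
    // Edits a deep copy and returns the ORIGINAL's <data> value afterwards.
    public static string OriginalAfterEditingCopy()
    {
        var original = new XElement("element", new XElement("data", "ABILITY"));
        var copy = new XElement(original);        // copy constructor: deep copy
        copy.Element("data").Value = "CHANGED";   // only the copy is altered
        return original.Element("data").Value;
    }

    // Edits through a second reference to the same object instead.
    public static string OriginalAfterEditingAlias()
    {
        var original = new XElement("element", new XElement("data", "ABILITY"));
        var alias = original;                     // same object, no copy made
        alias.Element("data").Value = "CHANGED";  // the original is altered too
        return original.Element("data").Value;
    }
}
```

The first method leaves the original reading "ABILITY"; the second changes it, which is the situation the copy on this slide is guarding against.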
Implementation (2 of 12)
• So now let’s put some meat on the bones: firstly we should get the ‘broader’, ‘narrower’ and ‘related’ terms in our xElements object and store them in our declared lists…

        foreach (var e in xElements)
        {
            var bts = e.DescendantsAndSelf("element")
                       .First()
                       .Elements("attr")
                       .Elements("BTs")
                       .Elements("BT");

            // Store each broader term as "LABEL|class-ID".
            btList.AddRange(
                bts.Select(bt => bt.Attribute("BTLex").Value + Delimiter +
                                 ClassString + bt.Attribute("BTID").Value));

            // ...
        }

• And so on for the ‘narrower’ and ‘related’ terms too.
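The slide only shows the ‘broader’ case. One way to avoid repeating the query three times is a small helper parameterised on the kind of term. This is a sketch under a stated assumption: the names NTs/NT/NTLex/NTID and RTs/RT/RTLex/RTID are NOT shown anywhere in the slides and are simply guessed from the BT pattern; the TermLinks class is likewise hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TermLinks
{
    // kind is "BT", "NT" or "RT". Only the BT names appear in the slides;
    // the NT and RT names are ASSUMED to follow the same pattern.
    public static IEnumerable<string> Keys(XElement element, string kind)
    {
        return element.Elements("attr")
            .Elements(kind + "s")      // e.g. <BTs>
            .Elements(kind)            // e.g. <BT BTLex="..." BTID="..." />
            .Select(x => (string)x.Attribute(kind + "Lex") + "|" + "class-" +
                         (string)x.Attribute(kind + "ID"));
    }
}
```

If the real attribute names differ, only the string concatenation inside the helper would need to change.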
Implementation (3 of 12)
• So far so good, all pretty easy stuff, but now let’s crank up the gameplay a little.
• We’re now going to have to ‘walk’ up our XElement object, looking for parent nodes to add to an exclusions list (because we’re not going to want these to be returned from the method).
• We’re going to need two, and only two, iterations. The first iteration will get us all the instances of ‘self’, whereas the second will bring back all the ancestors.
• The following code snippet will hopefully shed some light on what we’re trying to achieve here.
Implementation (4 of 12)

        var exclusionsList = new List<string>();

        // Only TWO iterations are required: 'self' first, then all ancestors.
        for (var i = 0; i < 2; i++)
        {
            var pt = el.Elements("data");
            var ptId = el.Elements("attr").Elements("class");
            var elements = pt as IList<XElement> ?? pt.ToList();
            var ptArray = pt as XElement[] ?? elements.ToArray();
            var id = ptId as IList<XElement> ?? ptId.ToList();
            var ptIdArray = ptId as XElement[] ?? id.ToArray();

            // Add our data into our 'ptArray' variable in the form of:
            // "DATA ANALYSIS|class-E72A058B-8A32-E311-93C1-000BDB5CC6D5".
            for (var j = 0; j < ptArray.Length; j++)
            {
                if (!ptArray[j].Value.Contains(Delimiter))
                {
                    ptArray[j].Value += Delimiter + ptIdArray[j].Value;
                }
            }

            var xElements1 = pt as IList<XElement> ?? ptArray.ToList();

            // Break out of the outer loop if we encounter an empty list.
            if (!xElements1.Any())
            {
                break;
            }

            // Loop around and only put in items to exclude if they don't
            // already exist.
            foreach (var p in xElements1.Where(p => !exclusionsList.Contains(p.Value)))
            {
                exclusionsList.Add(p.Value);
            }

            el = el.Ancestors();
        }
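For comparison, the same set of ‘self plus ancestors’ keys can be gathered in a single query with AncestorsAndSelf. This is an alternative sketch, not the method from the slides: it leaves the tree unmodified, it returns the keys in a different order, and it assumes every ancestor of interest is named "element" and has both a data child and an attr/class child. The AncestorWalk class is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AncestorWalk
{
    // Returns "LABEL|class-id" for each node matching cid and for each of
    // its <element> ancestors, without modifying the tree.
    public static List<string> SelfAndAncestorKeys(XElement root, string cid)
    {
        return root.DescendantsAndSelf("element")
            .Where(e => string.Equals(
                (string)e.Element("attr").Element("class"),
                cid, StringComparison.OrdinalIgnoreCase))
            .SelectMany(e => e.AncestorsAndSelf("element"))
            .Select(e => (string)e.Element("data") + "|" +
                         (string)e.Element("attr").Element("class"))
            .Distinct()   // a term under two parents is only listed once
            .ToList();
    }
}
```

Because the structure is poly-hierarchical, the same term can match in several places; Distinct() plays the role of the "only add if not already present" check in the loop above.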
Implementation (5 of 12)
• Still with us? OK, so now we should consider getting the ‘child’ nodes for removal, thus:

        var elementsToRemove =
            from a in xElement.DescendantsAndSelf("element").Where(
                s => string.Equals(
                    s.Element("attr").Element("class").Value,
                    cid, StringComparison.CurrentCultureIgnoreCase))
            select a;

Implementation (6 of 12)
• Let’s now add this data into our exclusions list, thus:

        foreach (var element in elementsToRemove)
        {
            var v = from child in element.Descendants("element")
                    select new
                    {
                        DataToExclude = child.Element("data").Value + Delimiter +
                                        child.Element("attr").Element("class").Value
                    };
            exclusionsList.AddRange(v.Select(x => x.DataToExclude));
        }

Implementation (7 of 12)
• We now need to jump back to our original XElement, the one we were originally passed, thus:

        var allEles = xElementOriginal.DescendantsAndSelf("element");

• And using this object get all the remaining nodes into a big list of strings, thus:

        var bigList = (from ele in allEles
                       select ele.Element("data").Value + Delimiter +
                              ele.Element("attr").Element("class").Value
                       into itemToAdd
                       let badFlag = exclusionsList.Any(badEle => badEle == itemToAdd)
                       where !badFlag
                       select itemToAdd).ToList();
Implementation (8 of 12)
• Let’s now get rid of any duplicates, thus:

        var distinctBigList = bigList.Distinct().ToList();
        btList = btList.Distinct().ToList();
        ntList = ntList.Distinct().ToList();
        rtList = rtList.Distinct().ToList();

• Pretty simple. We’re now ready to populate our lists that we’ll be returning…
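The exclusion filter and the de-duplication can also be expressed together. The sketch below is a variation rather than the slides' code: it swaps the List-based exclusionsList.Any(...) check for a HashSet<string>, so each membership test is a hash lookup instead of a scan of the whole exclusions list. The ExclusionFilter class and the Remaining method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExclusionFilter
{
    // Returns the distinct items of 'all' that do not appear in 'exclusions'.
    public static List<string> Remaining(
        IEnumerable<string> all, IEnumerable<string> exclusions)
    {
        var excluded = new HashSet<string>(exclusions);
        return all.Where(item => !excluded.Contains(item))
                  .Distinct()
                  .ToList();
    }
}
```

For a thesaurus of a few thousand terms the difference is unlikely to be noticeable, so this is mainly a readability choice.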
Implementation (9 of 12)
• We will add the items to our lists, for ‘broader’, ‘narrower’ and ‘related’ terms, thus:

        // 'broader' terms.
        foreach (var item in distinctBigList.Where(item => !btList.Contains(item)))
        {
            btList.Add(item);
        }

        // 'narrower' terms.
        foreach (var item in distinctBigList.Where(item => !ntList.Contains(item)))
        {
            ntList.Add(item);
        }

        // 'related' terms.
        foreach (var item in distinctBigList.Where(item => !rtList.Contains(item)))
        {
            rtList.Add(item);
        }

Implementation (10 of 12)
• Nearly there. We must now remove the item that was clicked on by the user in the application (we won’t be needing this), thus:

        // 'broader' terms, there'll only be one so once we've removed it,
        // exit the loop.
        foreach (var bt in btList.Where(bt => bt.Contains(cid)))
        {
            btList.Remove(bt);
            break;
        }

        // 'narrower' terms, there'll only be one so once we've removed it,
        // exit the loop.
        foreach (var nt in ntList.Where(nt => nt.Contains(cid)))
        {
            ntList.Remove(nt);
            break;
        }

        // 'related' terms, there'll only be one so once we've removed it,
        // exit the loop.
        foreach (var rt in rtList.Where(rt => rt.Contains(cid)))
        {
            rtList.Remove(rt);
            break;
        }
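The ‘remove the first match, then stop’ pattern can also be written without a loop, using List<T>.FindIndex and RemoveAt. This is a sketch only; the TermRemoval class is a made-up name, and the case-insensitive comparison is an assumption added here to mirror the case-insensitive lookup at the start of the method (the slide's Contains call is case-sensitive).

```csharp
using System;
using System.Collections.Generic;

public static class TermRemoval
{
    // Removes the first entry containing cid; returns true if one was found.
    public static bool RemoveFirstContaining(List<string> list, string cid)
    {
        var index = list.FindIndex(
            s => s.IndexOf(cid, StringComparison.OrdinalIgnoreCase) >= 0);
        if (index < 0)
        {
            return false;   // nothing to remove
        }
        list.RemoveAt(index);
        return true;
    }
}
```

It would be called once per list, for example TermRemoval.RemoveFirstContaining(btList, cid).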
Implementation (11 of 12)
• Finally, populate a container object with our nice new lists and return it.

        var containerList = new List<List<string>>();
        containerList.Add(btList);
        containerList.Add(ntList);
        containerList.Add(rtList);
        return containerList;
    }

End of the method, yay!

Implementation (12 of 12)
• And of course, to make all of the code in the previous slides actually work, we’re going to need to include the following .NET namespaces in the class our method sits in:
  • System
  • System.Collections.Generic
  • System.Globalization
  • System.IO *
  • System.Linq
  • System.Text *
  • System.Xml
  • System.Xml.Linq
* Used for the WriteOutList method, used for testing (next slide).
Testing (1 of 2)
• It was found to be quite useful to employ a simple private method that could be called to write the content of a list to a text file, for checking and comparison purposes.

    private static void WriteOutList(string newFileLoc,
        IEnumerable<string> dataList, string type = null, string cid = null)
    {
        // The using block flushes and closes the file when it ends.
        using (var writeText = new StreamWriter(newFileLoc))
        {
            // Optional header, written only when a term type is supplied.
            if (!string.IsNullOrEmpty(type))
            {
                var sb = new StringBuilder();
                sb.Append("Valid ");
                sb.Append(type);
                sb.Append("'s for class id:'");
                sb.Append(cid);
                sb.Append("'.");
                writeText.WriteLine(sb.ToString());
                writeText.WriteLine("----------------------------------------------------------------------");
            }

            foreach (var line in dataList)
            {
                writeText.WriteLine(line);
            }
        }
    }

Testing (2 of 2)
• Why the optional parameters (‘type’ and ‘cid’) then?
• For ‘broader’, ‘narrower’ and ‘related’ terms that are associated with an identifier, simply call the method thus:

    WriteOutList([file path], [data as a list of strings], ["BT", "NT" or "RT"], [identifier]);

• For data in the ‘big list’ simply call the method thus:

    WriteOutList([file path], [data as a list of strings]);

• This saves on the need for overloading, so one size fits all.
Lesson Summary
We have discussed the method used to navigate a poly-hierarchical tree structure, in XML, using C# and LINQ to XML, in order to provide the calling application with the data it requires, given an XML object to navigate in and a valid identifier within it.

Questions?

Find Out More
• UK Data Archive: data-archive.ac.uk
• UK Data Service: ukdataservice.ac.uk

Contact Information
Jonathan Sexton
UK Data Archive
Wivenhoe Park, University of Essex
Colchester CO4 3SQ
E-mail: jpsexton@essex.ac.uk