Herman Code 🚀

XPath contains(text(), 'some string') doesn't work when used with a node with more than one Text subnode

February 20, 2025

📂 Categories: Programming

XPath is a powerful language for navigating XML and HTML documents, allowing developers to pinpoint specific elements and attributes. However, a common stumbling block arises when using the contains(text(), 'some string') expression on nodes with multiple text subnodes. This seemingly simple expression can behave unexpectedly, leading to frustration and debugging headaches. Understanding why this happens and how to work around it is essential for effective XPath use.

The Problem with Multiple Text Nodes

The text() node test in XPath doesn't always behave as one might initially expect. Applied to a node with a single text subnode, it selects that node's text content. However, when a node contains multiple text subnodes, often because child elements, whitespace, or formatting split the text in the HTML or XML, text() returns a node-set of all these individual text nodes. In XPath 1.0, contains() expects a string, so the node-set is converted to the string-value of its first node in document order, and only the first text node is checked. Matches therefore fail even when the desired string exists in a subsequent text node.

For example, consider this HTML snippet: <p>Hello<br/>World</p>. Applying //p[contains(text(), 'World')] wouldn't match because 'World' is in the second text node, not the first ('Hello').

This issue frequently surfaces when dealing with dynamically generated content or content pulled from external sources where the structure isn't strictly controlled.
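The behavior can be reproduced without an XPath engine. The following is a minimal Python sketch, assuming the standard-library xml.dom.minidom parser. The helper names text_children and first_text_contains are illustrative (they are not part of any library), and the helper emulates the XPath 1.0 node-set-to-string rule rather than evaluating the expression itself.

```python
from xml.dom import Node, minidom


def text_children(element):
    """Direct text-node children of an element, like the XPath step text()."""
    return [n for n in element.childNodes if n.nodeType == Node.TEXT_NODE]


def first_text_contains(element, needle):
    """Emulate contains(text(), needle) under XPath 1.0: only the first
    text node of the node-set is converted to a string and searched."""
    nodes = text_children(element)
    first = nodes[0].data if nodes else ""
    return needle in first


p = minidom.parseString("<p>Hello<br/>World</p>").documentElement
texts = [n.data for n in text_children(p)]

print(texts)                            # ['Hello', 'World']
print(first_text_contains(p, "Hello"))  # True
print(first_text_contains(p, "World"))  # False
```

The <br/> element splits the paragraph into two text nodes, and only the first one takes part in the comparison.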

Solutions for Targeting Text within Multiple Text Nodes

Fortunately, there are several workarounds for this issue. One of the most reliable approaches involves using the string() function. Instead of contains(text(), 'some string'), use contains(string(), 'some string'), which is equivalent to contains(., 'some string'). The string() function concatenates all the text nodes within a given element, including those of its descendant elements, into a single string, ensuring that contains() checks the entire text content. Note that every ancestor of a matching element matches too, because its string-value includes the same text.
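As a rough illustration of what string() computes, here is a minimal Python sketch using the standard-library xml.dom.minidom parser. The function string_value is a hypothetical helper that emulates the XPath string-value of a node, and the <div> wrapper is an assumption added to show the ancestor effect.

```python
from xml.dom import Node, minidom


def string_value(node):
    """Emulate XPath string(): concatenate every descendant text node in
    document order (comments and processing instructions are skipped)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_NODE):
        return "".join(string_value(child) for child in node.childNodes)
    return ""


div = minidom.parseString("<div><p>Hello<br/>World</p></div>").documentElement
p = div.getElementsByTagName("p")[0]

print(string_value(p))               # HelloWorld
print("World" in string_value(p))    # True
print("World" in string_value(div))  # True: the ancestor matches as well
```

Because the string-value spans descendants, a query such as //*[contains(string(), 'World')] would select the wrapping element along with the paragraph.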

Another effective strategy is to use the normalize-space() function. This function removes leading and trailing whitespace and replaces sequences of whitespace characters with a single space. Called without an argument, it operates on the string-value of the context node, as in contains(normalize-space(), 'some string'). While not always appropriate, it's helpful when whitespace variations might be interfering with the contains() function.

Finally, a more specific approach is to test each text node individually, as in //*[text()[contains(., 'some string')]], though this is more verbose and often unnecessary when a match anywhere in the element's text is good enough.
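The effect of whitespace can be sketched in Python as follows. This is an emulation under stated assumptions: string_value and normalize_space are hypothetical helpers that mimic the XPath functions string() and normalize-space(), and the sample markup is invented for illustration.

```python
import re
from xml.dom import Node, minidom


def string_value(node):
    """Emulate XPath string(): join all descendant text nodes."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_NODE):
        return "".join(string_value(child) for child in node.childNodes)
    return ""


def normalize_space(text):
    """Emulate XPath normalize-space(): collapse runs of XPath whitespace
    (space, tab, CR, LF) into single spaces and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


p = minidom.parseString(
    "<p>\n  Hello\n  <br/>\n  big   World\n</p>"
).documentElement

raw = string_value(p)
clean = normalize_space(raw)

print("big World" in raw)    # False: three spaces separate the words
print(clean)                 # Hello big World
print("big World" in clean)  # True
```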

Practical Examples and Case Studies

Imagine scraping product descriptions from a website where formatting introduces multiple text nodes. Using contains(text(), 'keyword') might fail to capture relevant products because only the first text node is checked. Switching to contains(string(), 'keyword') would ensure accurate identification.

In XML processing, similar issues can arise when elements contain mixed content with interspersed text nodes. Again, string() proves invaluable in these situations.


Best Practices and Considerations

Understanding the nuances of XPath functions is essential for efficient and accurate XML and HTML processing. By using the correct functions and being aware of potential pitfalls like the multiple text node issue, developers can avoid time-consuming debugging and ensure the reliability of their code.

Consider these best practices:

  • Always test your XPath expressions thoroughly.
  • Familiarize yourself with XPath functions like string(), normalize-space(), and concat().

Following these guidelines will contribute to cleaner, more robust code.

FAQ

Q: What is the difference between text() and string() in XPath?

A: text() returns a node-set of the text nodes that are direct children of a given element, while string() returns a single string representing the concatenated text content of that element and its descendants.

  1. Identify the target element.
  2. Use contains(string(), 'your string') or normalize-space() to account for multiple text nodes.
  3. Test your XPath expression.

Navigating the complexities of XPath can be tricky, but understanding how contains() interacts with multiple text nodes is a crucial step toward mastering this powerful language. By implementing the solutions outlined above, developers can write more robust and accurate code for processing XML and HTML documents. For further insights into XPath optimization, explore resources like the W3Schools XPath Tutorial and MDN Web Docs: XPath. Also, consider checking out this related article on XPath functions: Advanced XPath Techniques. Investing in a deeper understanding of XPath will streamline your web development workflow.

Question & Answer:
I have a small problem with XPath contains with dom4j ...

Let's say my XML is

<Location>
    <Addr>
        <Thoroughfare>ABC</Thoroughfare>
        <Figure>5</Figure>
        <Remark>BLAH BLAH BLAH <br/><br/>ABC</Remark>
    </Addr>
</Location>

Let's say I want to find all the nodes that have ABC in the text, given the root element ...

So the XPath that I would need to write would be

//*[contains(text(),'ABC')]

However, this is not what dom4j returns ... Is this a dom4j problem or my understanding of how XPath works, since that query returns only the Thoroughfare element and not the Remark element?

The DOM makes the Remark element a composite element with four child nodes:

[Text = 'XYZ'][BR][BR][Text = 'ABC']

I would assume that the query should still return the element, since it should find the element and run contains on it, but it doesn't ...

The following query returns the element, but it returns far more than just the element: it returns the parent elements as well, which is undesirable for this problem.

//*[contains(text(),'ABC')]

Does anyone know the XPath query that would return just the elements <Thoroughfare/> and <Remark/>?

The <Remark> tag contains two text nodes and two <br> nodes as children.

Your XPath expression was

//*[contains(text(),'ABC')]

To break this down,

  1. * is a selector that matches any element (i.e. tag); it returns a node-set.
  2. The [] are a conditional that operates on each individual node in that node-set. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
  3. text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
  4. contains is a function that operates on a string. If it is passed a node-set, the node-set is converted into a string by returning the string-value of the node in the node-set that is first in document order. Hence, it can match only the first text node in your <Remark> element, namely BLAH BLAH BLAH. Since that doesn't match, you don't get a <Remark> in your results.
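The four steps above can be traced with a short Python sketch. It assumes the standard-library xml.dom.minidom parser and the single-line XML from the question. The helper first_text is a hypothetical name that emulates the XPath 1.0 conversion of the text() node-set to a string; it is not dom4j code.

```python
from xml.dom import Node, minidom

XML = (
    "<Location> <Addr> <Thoroughfare>ABC</Thoroughfare> <Figure>5</Figure> "
    "<Remark>BLAH BLAH BLAH <br/><br/>ABC</Remark> </Addr> </Location>"
)


def first_text(element):
    """String-value of the first text-node child, or '' if there is none.
    This is what contains() receives when it is passed text()."""
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            return child.data
    return ""


doc = minidom.parseString(XML)
remark = doc.getElementsByTagName("Remark")[0]

# Emulates //*[contains(text(),'ABC')] over every element in the document.
matches = [e.tagName for e in doc.getElementsByTagName("*")
           if "ABC" in first_text(e)]

print(repr(first_text(remark)))  # 'BLAH BLAH BLAH '
print(matches)                   # ['Thoroughfare']
```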

You need to change this to

//*[text()[contains(.,'ABC')]]

  1. * is a selector that matches any element (i.e. tag); it returns a node-set.
  2. The outer [] are a conditional that operates on each individual node in that node-set; here it operates on each element in the document.
  3. text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
  4. The inner [] are a conditional that operates on each node in that node-set; here, each individual text node. Each individual text node is the starting point for any path in the brackets, and can also be referred to explicitly as . within the brackets. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
  5. contains is a function that operates on a string. Here it is passed an individual text node (.). Since it is passed the second text node in the <Remark> tag individually, it will see the 'ABC' string and be able to match it.
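A minimal Python sketch of the corrected predicate, again assuming the standard-library xml.dom.minidom parser and the XML from the question. The helpers any_text_contains and string_value are hypothetical names: the first emulates text()[contains(., ...)], the second emulates the string-value used by contains(., ...), shown here only for comparison.

```python
from xml.dom import Node, minidom

XML = (
    "<Location> <Addr> <Thoroughfare>ABC</Thoroughfare> <Figure>5</Figure> "
    "<Remark>BLAH BLAH BLAH <br/><br/>ABC</Remark> </Addr> </Location>"
)


def any_text_contains(element, needle):
    """Emulate [text()[contains(., needle)]]: test each direct text node."""
    return any(
        child.nodeType == Node.TEXT_NODE and needle in child.data
        for child in element.childNodes
    )


def string_value(node):
    """Emulate XPath string(): join all descendant text nodes."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_NODE):
        return "".join(string_value(child) for child in node.childNodes)
    return ""


doc = minidom.parseString(XML)
elements = doc.getElementsByTagName("*")

per_text_node = [e.tagName for e in elements if any_text_contains(e, "ABC")]
whole_string = [e.tagName for e in elements if "ABC" in string_value(e)]

print(per_text_node)  # ['Thoroughfare', 'Remark']
print(whole_string)   # ['Location', 'Addr', 'Thoroughfare', 'Remark']
```

Testing each text node individually selects only the two elements that directly contain the string, whereas matching on the whole string-value also pulls in their ancestors.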