All Questions
Tagged with html-parsing xml-parsing
143
questions
0
votes
0
answers
59
views
How to fix the error "The markup in the document preceding the root element must be well-formed"?
I'm trying to parse HTML with a parsing library for Google Apps Script from GitHub, but I get an error in my code.
My code:
function test() {
options = {
'method': 'post',
'contentType': '...
0
votes
1
answer
104
views
How to extract HTML elements from inside the "content:encoded" part of an RSS feed?
I am trying to generate a newsletter which, among other stuff, includes news entries which are present on the website as well. The website is built with WordPress and has an RSS feed, which is not ...
0
votes
0
answers
131
views
Parsing a table using Html Agility Pack in Unity?
I am not a programmer, so I have no experience. The issue is that I managed to parse some information from this website's text, but the table at the bottom that has the Announcement Date and the rest of ...
0
votes
2
answers
97
views
Convert HTML to JSON using rdd.map
I have an HTML file which I want to parse in PySpark.
Example:
<MainStruct Rank="1">
<Struct Name="A">
<Struct Name="AA">
<Struct Name="...
0
votes
1
answer
34
views
Unable to catch web page content with an HTML object from VBA
I'm using this URL: https://www.morningstar.com/stocks/xtks/1407/dividends
and the table with the upcoming dividends is displayed in my browser.
I inspect the page and try to catch the content of the ...
0
votes
0
answers
189
views
C# Unity - How to parse a Steam news page to extract an IMG URL? The data received from the GET request seems incomplete. Is it XHTML?
Problem:
I am having trouble parsing this page for the enclosure XML tags that contain image links.
https://store.steampowered.com/feeds/news/app/1348750/?cc=US&l=english&snr=...
0
votes
1
answer
322
views
How to parse information from the same class using BeautifulSoup?
Suppose I have the following HTML
html_doc = """
<html>
<head>
<title>Page Title</title>
</head>
<body>
<div class = ...
1
vote
1
answer
179
views
How can I crawl/scrape (using R) the non-table EPA CompTox Dashboard?
The EPA CompTox Chemical Dashboard received an update, and my old code is no longer able to scrape the Boiling Point for chemicals. Is anyone able to help me scrape the Experimental Average Boiling ...
0
votes
3
answers
588
views
How can I remove HTML attributes from an XML file in PHP?
I have an XML file that has HTML in it. Within the HTML tags there are attributes I'd like to remove, yet I need to keep all of the tags. An example would be:
<description><![CDATA[<div>&...
0
votes
1
answer
626
views
How to dynamically populate an HTML Table with XML Data?
I have an XML like below:
<?xml version="1.0" encoding="utf-8"?>
<TestRun id="abc-jhhg-yuh" name="Demo" runUser="Admin" xmlns="http://...
0
votes
1
answer
80
views
How to preserve HTML content?
I'm trying to preserve HTML content that is generated at a specific location by PowerMTA.
Below is a snippet of the HTML content (Content-1).
<html>=0A<body>=0A<table style=3D"max-width:...
1
vote
1
answer
1k
views
XPath node that doesn't contain a child
I'm trying to access a certain element by using XML, but I just can't seem to get it, and I don't quite understand why.
<ul class="test1" id="content">
&...
1
vote
1
answer
98
views
PHP return XML string with values added to attributes missing values
I have to parse HTML and "HTML" from emails. I've already managed to create a function that cleans most of the errors such as improper nesting of elements.
I'm trying to determine how best ...
2
votes
2
answers
1k
views
What does LIBXML_NOBLANKS do, exactly?
What is the difference between
$domd=new DOMDocument();
$domd->loadHTML($html, LIBXML_NOBLANKS);
and
$domd=new DOMDocument();
$domd->loadHTML($html, 0);
?
edit: just in case someone wants to ...
0
votes
3
answers
50
views
Extract text and (hlStart and hlEnd) tags from simple html
I have the following part of an HTML/XML file:
<p><hlstart ana="#ann224094"></<hlstart>Przed<hlend ana="#ann224094"></hlend> <hlstart ana="#ann224160"></hlstart&...