All Questions

1 vote
1 answer
54 views

How to handle regex in BeautifulSoup / CSS selector?

I'm looking for a way to use regex in BeautifulSoup to find elements that may contain the text "HO #", allowing for possible spaces and ignoring case. check_ho_number3 = soup.select_one('td:-...
Tonin thomas
0 votes
2 answers
511 views

Extract Text from Unstructured HTML using Python and Beautiful Soup

For the HTML code below, how do I extract the content below aaa, bbb after the tag using regular expressions and Beautiful Soup with the Python Requests library? <html> <head></head> ...
ZASE • 66
0 votes
2 answers
114 views

How to find main price and discounted price in a webpage using selenium and python?

I am trying to find both the main price and the discounted price in a webpage, but I can get only one of them. I need a good pattern or method to extract all prices and discounted prices from ...
Alireza Mirhabibi - IRAN
0 votes
1 answer
51 views

regex code to find email address within HTML script webscraping

I am trying to extract phone, address, and email from a couple of corporate websites through web scraping. My code for that is as follows: l = 'https://www.zimmermanfinancialgroup.com/about' address_t = [] ...
anonymous13
-3 votes
1 answer
43 views

Manipulate string in python

I am scraping web content with BeautifulSoup in Python, and I would like to manipulate the following strings: 'Induktora 28" 36V/14 Ah | 16.5" Bordo' 'Induktora 28" 36V/14 Ah | 18" ...
Bohumír Mäsiar
-1 votes
2 answers
464 views

BeautifulSoup: Search and replace in the text parts of HTML

I want to do a search and replace on the textual part of the content of the HTML elements. E.g., replacing foo with <b>bar</b> in <div id="foo">foo <i>foo</i> ...
HappyFace • 4,007
0 votes
1 answer
63 views

Python - Beautifulsoup - parse multiple span elements

I am trying to extract the title from 'span' elements. Using the code below as an example, the output I am looking for is 6536 and 9319, which are part of 'title', as seen below: span aria-label="6536 users ...
JJH • 9
0 votes
1 answer
70 views

What's the proper way to exclude uppercase word/s in regex python

Let's say I've scraped this from a website. PARIS - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (2015). Ut enim ad minim ...
pixsay • 27
0 votes
1 answer
411 views

Regex pattern with accented characters

I am trying to get the words that start with a capital letter, regardless of whether the word contains special characters. Currently, my pattern only matches capital letters without accents. I ...
1 vote
2 answers
58 views

How to extract key info from <script> tag

I'm trying to extract the user id from this link https://www.instagram.com/design.kaf/ using bs4 and regex. I found a JSON key called "profile_id" inside a script tag, but I can't even search that ...
Hossam Hassan
1 vote
1 answer
35 views

Find string between two sets of characters or 3rd and 4th quotation marks

I have been playing with BeautifulSoup and re to collect only the links I need from a webpage. I was able to cut the page content down to a <class 'bs4.element.ResultSet'>. This dataset contains the ...
Blackwidow
0 votes
1 answer
478 views

api request parameters are ignored

This code works as expected and shows the 3 most recent Wikipedia editors. If I uncomment the second URL line, I should get Urmi27 three times, or None if the user is not listed. But I get ...
shantanuo • 32.2k
1 vote
2 answers
51 views

Extracting RegEx pattern across list excluding other html code

I've written a script to pull a list of report URL extensions from a page that are available for text extraction. I've used parsing and BeautifulSoup to extract the reference area for the latest report ...
Pryore • 520
-2 votes
2 answers
51 views

How to extract text from html in Python with BeautifulSoup4

I am trying to extract the text, i.e. only the filename, from the HTML tags below. In the end I would like output like this: BeforeStructure.PNG AfterStructure.PNG. Can you please guide me on how I ...
Deepali • 97
1 vote
2 answers
687 views

Adding line breaks after times in parentheses

I'm trying to clean up some data from web scraping. This is an example of the information I'm working with: Best Time Adam Jones (w/ help) (6:34)Best Time Kenny Gobbin (a) (2:38)Personal Best Matt ...
SpingoTakagi
