創(chuàng)新互聯(lián)www.cdcxhl.cn八線動(dòng)態(tài)BGP香港云服務(wù)器提供商,新人活動(dòng)買(mǎi)多久送多久,劃算不套路!
10年積累的做網(wǎng)站、成都網(wǎng)站制作經(jīng)驗(yàn),可以快速應(yīng)對(duì)客戶(hù)對(duì)網(wǎng)站的新想法和需求。提供各種問(wèn)題對(duì)應(yīng)的解決方案。讓選擇我們的客戶(hù)得到更好、更有力的網(wǎng)絡(luò)服務(wù)。我雖然不認(rèn)識(shí)你,你也不認(rèn)識(shí)我。但先網(wǎng)站設(shè)計(jì)后付款的網(wǎng)站建設(shè)流程,更有競(jìng)秀免費(fèi)網(wǎng)站建設(shè)讓你可以放心的選擇與我們合作。Python爬蟲(chóng)如何使用CSS選擇器?針對(duì)這個(gè)問(wèn)題,這篇文章詳細(xì)介紹了相對(duì)應(yīng)的分析和解答,希望可以幫助更多想解決這個(gè)問(wèn)題的小伙伴找到更簡(jiǎn)單易行的方法。
CSS選擇器
這是另一種與find_all()方法有異曲同工的查找方法,寫(xiě)CSS時(shí),標(biāo)簽名不加任何修飾,類(lèi)名前加.,id名前加#。
在這里我們也可以利用類(lèi)似的方法來(lái)篩選元素,用到的方法是soup.select(),返回的類(lèi)型是list。
(1)通過(guò)標(biāo)簽名查找
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("title")) print(soup.select("b")) print(soup.select("a"))
運(yùn)行結(jié)果
[<title>The Dormouse's story</title>] [<b>The Dormouse's story</b>] [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
(2)通過(guò)類(lèi)名查找
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select(".title"))
運(yùn)行結(jié)果
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]
(3)通過(guò)id名查找
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("#link1"))
運(yùn)行結(jié)果
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]
(4)組合查找
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("p #link1"))
運(yùn)行結(jié)果
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
(5)屬性查找
查找時(shí)還可以加入屬性元素,屬性需要用中括號(hào)括起來(lái),注意屬性和標(biāo)簽屬于同一節(jié)點(diǎn),所以中間不能加空格,否則會(huì)無(wú)法匹配到。
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("a[class='sister']"))
運(yùn)行結(jié)果
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
同樣,屬性仍然可以與上述查找方式組合,不在同一節(jié)點(diǎn)的空格隔開(kāi),同一節(jié)點(diǎn)的不加空格。
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("p a[class='sister']"))
運(yùn)行結(jié)果
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
(6)獲取內(nèi)容
以上的select()方法返回的結(jié)果都是列表形式,可以遍歷形式輸出,然后用get_text()方法來(lái)獲取它的內(nèi)容。
#!/usr/bin/python3 # -*- coding:utf-8 -*- from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """ # 創(chuàng)建 Beautiful Soup 對(duì)象,指定lxml解析器 soup = BeautifulSoup(html, "lxml") print(soup.select("p a[class='sister']")) for item in soup.select("p a[class='sister']"): print(item.get_text())
運(yùn)行結(jié)果
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3"> Tillie</a>] Lacie Tillie
注意:<!-- Elsie -->為注釋內(nèi)容,未輸出
關(guān)于Python爬蟲(chóng)如何使用CSS選擇器問(wèn)題的解答就分享到這里了,希望以上內(nèi)容可以對(duì)大家有一定的幫助,如果你還有很多疑惑沒(méi)有解開(kāi),可以關(guān)注創(chuàng)新互聯(lián)-成都網(wǎng)站建設(shè)公司行業(yè)資訊頻道了解更多相關(guān)知識(shí)。
網(wǎng)站欄目:Python爬蟲(chóng)如何使用CSS選擇器-創(chuàng)新互聯(lián)
鏈接分享:http://www.rwnh.cn/article38/dgscsp.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供自適應(yīng)網(wǎng)站、電子商務(wù)、網(wǎng)站建設(shè)、網(wǎng)站收錄、ChatGPT、網(wǎng)站內(nèi)鏈
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶(hù)投稿、用戶(hù)轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話(huà):028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來(lái)源: 創(chuàng)新互聯(lián)