Easily fetch a document's HTML DOM with PHP

구퍼 2008.08.14 10:50 Views: 4020

http://simplehtmldom.sourceforge.net/

Easy Screen Scraping in PHP with the Simple HTML DOM Library

This is a library that makes it easy to fetch the HTML DOM of a given web document from PHP.

Online documentation: http://simplehtmldom.sourceforge.net/manual.htm

Documents are loaded as follows.

There are three ways to do it: directly from an HTML string, from a URL, or from a local file.

// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from an HTML file
$html = file_get_html('test.htm');

Then:

// Find all anchors, returns an array of element objects
$ret = $html->find('a');

// Find the (N)th anchor, returns an element object or null if not found (zero-based)
$ret = $html->find('a', 0);

// Find all <div> elements whose id attribute is foo
$ret = $html->find('div[id=foo]');

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all elements that have an id attribute
$ret = $html->find('[id]');

This is how you find specific elements.
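As a small sketch of what you might do with the result of find(): the snippet below loops over every anchor in an inline HTML string and collects its href and text. It assumes simple_html_dom.php from the download sits in the same directory. The sample markup and the $links variable are made up for illustration, and plaintext is the text-content property described in the online manual.

```php
<?php
// Assumption: simple_html_dom.php (from the library download) sits next to this script.
require_once 'simple_html_dom.php';

// Made-up sample markup, used only for illustration.
$html = str_get_html('<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>');

// find('a') returns an array of element objects, so it can be iterated directly.
$links = array();
foreach ($html->find('a') as $a) {
    $links[] = array('href' => $a->href, 'text' => $a->plaintext);
}

// Print one "text -> href" line per anchor.
foreach ($links as $link) {
    echo $link['text'] . ' -> ' . $link['href'] . "\n";
}
```

To run the same loop against a live page, swap str_get_html() for file_get_html() with a URL.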

To read an element's attributes, do this:

// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;

// Set an attribute (for a non-value attribute such as checked or selected, set its value to true or false)
$e->href = 'my link';

// Remove an attribute by setting its value to null
$e->href = null;

// Determine whether an attribute exists
if (isset($e->href))
    echo 'href exists!';

You can also walk around the DOM tree as much as you like.

// If you are not so familiar with HTML DOM, check this link to learn more...

// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

Pretty easy, right?

Above all, it looks like this could be used to easily publish an unofficial RSS feed for posts on sites that do not provide one.
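Here is a rough sketch of that RSS idea, not a finished tool. It treats every anchor on a page as a post and wraps the links in a minimal RSS 2.0 document. The helper names xml_escape() and links_to_rss(), the sample markup, and the example.com URL are all hypothetical. A real board would need a narrower selector than 'a' and probably dates and descriptions per item.

```php
<?php
// Assumption: simple_html_dom.php (from the library download) sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical helper: escape a plain string for use inside an XML element.
function xml_escape($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: build a minimal RSS 2.0 document from every link in $pageHtml.
function links_to_rss($pageHtml, $feedTitle, $siteUrl) {
    $html = str_get_html($pageHtml);

    $items = '';
    foreach ($html->find('a') as $a) {
        // Scraped text may contain HTML entities; decode them before XML-escaping.
        $title = html_entity_decode($a->plaintext, ENT_QUOTES, 'UTF-8');
        $link  = $siteUrl . html_entity_decode($a->href, ENT_QUOTES, 'UTF-8');

        $items .= "    <item>\n"
                . '      <title>' . xml_escape($title) . "</title>\n"
                . '      <link>' . xml_escape($link) . "</link>\n"
                . "    </item>\n";
    }

    // RSS 2.0 requires title, link and description on the channel.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . "<rss version=\"2.0\">\n  <channel>\n"
         . '    <title>' . xml_escape($feedTitle) . "</title>\n"
         . '    <link>' . xml_escape($siteUrl) . "</link>\n"
         . '    <description>' . xml_escape($feedTitle) . "</description>\n"
         . $items
         . "  </channel>\n</rss>\n";
}

// Made-up board markup and site URL, used only for illustration.
$page = '<ul><li><a href="/post/1">Hello world</a></li><li><a href="/post/2">Second post</a></li></ul>';
$rss = links_to_rss($page, 'Unofficial feed', 'http://example.com');
echo $rss;
```

For a live site, you would fetch the page with file_get_html() instead of passing a string, and serve the output with an XML content type.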

simplehtmldom_0_99.zip [File Size:39.8KB/Download:100]
