Googlesearch Documentation - Mario Vilas - Jul 22, 2019 - Read the Docs
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
googlesearch Documentation Mario Vilas Jul 22, 2019
Contents 1 Indices and tables 1 2 Reference 3 Python Module Index 11 Index 13 i
ii
CHAPTER 1 Indices and tables • genindex • modindex • search 1
googlesearch Documentation 2 Chapter 1. Indices and tables
CHAPTER 2 Reference googlesearch.search(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, extra_params={}, tpe=”, user_agent=None) Search the given query string using Google. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. 3
googlesearch Documentation • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.search_images(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, pause=2.0, domains=None, only_standard=False, ex- tra_params={}) Shortcut to search images. Note Beware, this does not return the image link. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. 4 Chapter 2. Reference
googlesearch Documentation googlesearch.search_news(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, ex- tra_params={}) Shortcut to search news. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.search_videos(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, ex- tra_params={}) Shortcut to search videos. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. 5
googlesearch Documentation • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.search_shop(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, ex- tra_params={}) Shortcut to search shop. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. 6 Chapter 2. Reference
googlesearch Documentation • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.search_books(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, ex- tra_params={}) Shortcut to search books. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar 7
googlesearch Documentation results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.search_apps(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, num=10, start=0, stop=None, domains=None, pause=2.0, only_standard=False, ex- tra_params={}) Shortcut to search apps. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type generator of str 8 Chapter 2. Reference
googlesearch Documentation Returns Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever. googlesearch.lucky(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, only_standard=False, ex- tra_params={}, tpe=”) Shortcut to single-item search. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type str Returns URL found by Google. googlesearch.hits(query, tld=’com’, lang=’en’, tbs=’0’, safe=’off’, domains=None, extra_params={}, tpe=”, user_agent=None) Search the given query string using Google and return the number of hits. Note This is the number reported by Google itself, NOT by scraping. Parameters • query (str) – Query string. Must NOT be url-encoded. • tld (str) – Top level domain. • lang (str) – Language. 9
googlesearch Documentation • tbs (str) – Time limits (i.e “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month). • safe (str) – Safe search. • num (int) – Number of results per page. • start (int) – First result to retrieve. • or None stop (int) – Last result to retrieve. Use None to keep searching forever. • of str or None domains (list) – A list of web domains to constrain the search. • pause (float) – Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary! • only_standard (bool) – If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module. • of str to str extra_params (dict) – A dictionary of extra HTTP GET param- eters, which must be URL encoded. For example if you don’t want Google to filter similar results you can set the extra_params to {‘filter’: ‘0’} which will append ‘&filter=0’ to every query. • tpe (str) – Search type (images, videos, news, shopping, books, apps) Use the follow- ing values {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’} • or None user_agent (str) – User agent for the HTTP requests. Use None for the default. Return type int Returns Number of Google hits for the given search query. googlesearch.ngd(term1, term2) Return the Normalized Google distance between words. For more info, refer to: https://en.wikipedia.org/wiki/Normalized_Google_distance Parameters • term1 (str) – First term to compare. • term2 (str) – Second term to compare. Return type float Returns Normalized Google distance between words. googlesearch.get_random_user_agent() Get a random user agent string. Return type str Returns Random user agent string. 10 Chapter 2. Reference
Python Module Index g googlesearch, 3 11
googlesearch Documentation 12 Python Module Index
Index G get_random_user_agent() (in module google- search), 10 googlesearch (module), 3 H hits() (in module googlesearch), 9 L lucky() (in module googlesearch), 9 N ngd() (in module googlesearch), 10 S search() (in module googlesearch), 3 search_apps() (in module googlesearch), 8 search_books() (in module googlesearch), 7 search_images() (in module googlesearch), 4 search_news() (in module googlesearch), 4 search_shop() (in module googlesearch), 6 search_videos() (in module googlesearch), 5 13
You can also read