# Tornado

[Tornado](http://www.tornadoweb.org/) is a Python web framework and asynchronous networking library, originally developed at [FriendFeed](https://en.wikipedia.org/wiki/FriendFeed). By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for [long polling](http://en.wikipedia.org/wiki/Push_technology#Long_polling), [WebSockets](http://en.wikipedia.org/wiki/WebSocket), and other applications that require a long-lived connection to each user.

Tornado can be roughly divided into three major components:

- A web framework (including [`RequestHandler`](https://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler "tornado.web.RequestHandler"), which is subclassed to create web applications, and various supporting classes).
- Client- and server-side implementations of HTTP ([`HTTPServer`](https://www.tornadoweb.org/en/stable/httpserver.html#tornado.httpserver.HTTPServer "tornado.httpserver.HTTPServer") and [`AsyncHTTPClient`](https://www.tornadoweb.org/en/stable/httpclient.html#tornado.httpclient.AsyncHTTPClient "tornado.httpclient.AsyncHTTPClient")).
- An asynchronous networking library including the classes [`IOLoop`](https://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop "tornado.ioloop.IOLoop") and [`IOStream`](https://www.tornadoweb.org/en/stable/iostream.html#tornado.iostream.IOStream "tornado.iostream.IOStream"), which serve as the building blocks for the HTTP components and can also be used to implement other protocols.

The Tornado web framework and HTTP server together offer a full-stack alternative to [WSGI](http://www.python.org/dev/peps/pep-3333/).
While it is possible to use the Tornado HTTP server as a container for other WSGI frameworks ([`WSGIContainer`](https://www.tornadoweb.org/en/stable/wsgi.html#tornado.wsgi.WSGIContainer "tornado.wsgi.WSGIContainer")), this combination has limitations. To take full advantage of Tornado, you will need to use Tornado’s web framework and HTTP server together.

----

Example of a concurrent web spider using `tornado.queues`:

```
#!/usr/bin/env python3

import asyncio
import time
from datetime import timedelta

from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

from tornado import gen, httpclient, queues

base_url = "http://www.tornadoweb.org/en/stable/"
concurrency = 10


async def get_links_from_url(url):
    """Download the page at `url` and parse it for links.

    Returned links have had the fragment after `#` removed, and have been made
    absolute so, e.g. the URL 'gen.html#tornado.gen.coroutine' becomes
    'http://www.tornadoweb.org/en/stable/gen.html'.
    """
    response = await httpclient.AsyncHTTPClient().fetch(url)
    print("fetched %s" % url)

    html = response.body.decode(errors="ignore")
    return [urljoin(url, remove_fragment(new_url)) for new_url in get_links(html)]


def remove_fragment(url):
    pure_url, frag = urldefrag(url)
    return pure_url


def get_links(html):
    class URLSeeker(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.urls = []

        def handle_starttag(self, tag, attrs):
            href = dict(attrs).get("href")
            if href and tag == "a":
                self.urls.append(href)

    url_seeker = URLSeeker()
    url_seeker.feed(html)
    return url_seeker.urls


async def main():
    q = queues.Queue()
    start = time.time()
    fetching, fetched, dead = set(), set(), set()

    async def fetch_url(current_url):
        if current_url in fetching:
            return

        print("fetching %s" % current_url)
        fetching.add(current_url)
        urls = await get_links_from_url(current_url)
        fetched.add(current_url)

        for new_url in urls:
            # Only follow links beneath the base URL
            if new_url.startswith(base_url):
                await q.put(new_url)

    async def worker():
        async for url in q:
            if url is None:
                return
            try:
                await fetch_url(url)
            except Exception as e:
                print("Exception: %s %s" % (e, url))
                dead.add(url)
            finally:
                q.task_done()

    await q.put(base_url)

    # Start workers, then wait for the work queue to be empty.
    workers = gen.multi([worker() for _ in range(concurrency)])
    await q.join(timeout=timedelta(seconds=300))
    assert fetching == (fetched | dead)
    print("Done in %d seconds, fetched %s URLs." % (time.time() - start, len(fetched)))
    print("Unable to fetch %s URLS." % len(dead))

    # Signal all the workers to exit.
    for _ in range(concurrency):
        await q.put(None)
    await workers


if __name__ == "__main__":
    asyncio.run(main())
```
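To see what the spider's link handling does without touching the network, the following minimal sketch runs the same standard-library steps (`HTMLParser`, `urldefrag`, `urljoin`) on a hard-coded HTML snippet. The names `AnchorCollector`, `extract_links`, and `sample_html` are hypothetical and exist only for this illustration; they are not part of Tornado or of the spider above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href and tag == "a":
            self.urls.append(href)


def extract_links(page_url, html):
    """Return the links in `html`, made absolute and stripped of fragments."""
    collector = AnchorCollector()
    collector.feed(html)
    # urldefrag() returns (url_without_fragment, fragment); keep the first part.
    return [urljoin(page_url, urldefrag(href)[0]) for href in collector.urls]


# Assumed sample input: one relative link, one external link, one non-anchor tag.
sample_html = (
    '<a href="gen.html#tornado.gen.coroutine">gen</a>'
    '<a href="https://example.com/">external</a>'
    '<link href="style.css">'
)
base = "http://www.tornadoweb.org/en/stable/"

links = extract_links(base, sample_html)
# Same filter the spider applies before queueing a URL.
in_scope = [u for u in links if u.startswith(base)]

print(links)
print(in_scope)
```

Only `<a>` tags are collected, so the `<link>` element is ignored, and the `startswith` filter is what keeps the crawl from leaving the base URL.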