The piratepie.org

From Medialab Prado


What real damage does piracy do to the music and film industries? Do pirates have better taste than legal consumers? What is pirated the most? The Pirate Bay, on the verge of disappearing after legal action by a Swedish court, recently published its 20 GB of torrents. This is a unique occasion to pay tribute to the great worldwide hub of piracy.

During Visualizar09, our project will have three phases. First, a brainstorming session on which questions this data can answer visually. Second, we will prepare the code needed to build prototypes of the visualizations that most convince the working group and that answer as many of the proposed questions as possible. Finally, we will decide which questions have the most elegant, impressive, surprising, and beautiful answers, and we will build clean, clear results. The results will, of course, be the visualizations, whether interactive, for print, animated, or in any other format; they will be delivered at the end of Visualizar and also published on ThePiratePie.org.


Author and collaborators

PROJECT BY

Mar Canet, Jaume Nualart and David Stolarsky

COLLABORATORS

Inma L.H. (INLOHO)

Caetano Carvalho

Carles Gutiérrez

Dan Pleck

Jesús Rodríguez

Travis Kirton

Juan Galván

Project Goal

Create a piracy monitor (website) that describes more clearly how, where, when, and perhaps even why internet piracy occurs. The website will contain interactive (and non-interactive) views into current and historic piracy trends. The Pirate Bay will serve as the only source of data at first, but eventually we can expand to account for a greater portion of internet piracy.

Inspiration, background, context, references

Short Term Goal

A useful, functional, stable, attractive website and a (Pirate Bay only) torrent activity logger, both of which stay current without much programmer intervention.

Long Term Goal

Continued torrent activity logging and up-to-date visualization; expansion into additional torrent sets and potentially other piracy modes.

Data / Ownership of Data

The core data is the set of torrent files hosted on The Pirate Bay (and their states over time). Some of our ideas will use other data sources, such as IMDB, copyright agencies, and the Amazon/iTunes Store.

Data --> Experience

Our torrent logger will sample torrent activity in a sensible/targeted way; the samples and some metadata will reside in our database.
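
A minimal sketch of what storing one sample could look like, assuming an SQLite database for illustration. The `activity` table and its `tpb_id`, `seeders`, and `leechers` columns are taken from the queries further down this page; the `sampled_at` column and the helper names `create_schema` and `log_sample` are our own assumptions, not part of the project's schema.

```python
import sqlite3
import time


def create_schema(conn):
    # Hypothetical minimal table: only tpb_id, seeders and leechers appear
    # in the page's queries; sampled_at is an assumed timestamp column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity ("
        "tpb_id INTEGER, seeders INTEGER, leechers INTEGER, sampled_at INTEGER)"
    )


def log_sample(conn, tpb_id, seeders, leechers, sampled_at=None):
    """Store one seeders/leechers sample for a torrent."""
    if sampled_at is None:
        sampled_at = int(time.time())  # default to the current Unix time
    conn.execute(
        "INSERT INTO activity (tpb_id, seeders, leechers, sampled_at) "
        "VALUES (?, ?, ?, ?)",
        (tpb_id, seeders, leechers, sampled_at),
    )
    conn.commit()


# Made-up sample values, only to show the call shape.
conn = sqlite3.connect(":memory:")
create_schema(conn)
log_sample(conn, 1001, 120, 45, sampled_at=1257000000)
log_sample(conn, 1001, 130, 40, sampled_at=1257003600)
```

Keeping a timestamp on every sample is what would let later queries filter by time of observation rather than by torrent upload date.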

Assumptions / Hypotheses

We have scattered visualization plans, and similarly scattered hypotheses:

  • Pirate tastes: pirates have better taste than legal consumers (as measured by critical acclaim) AND/OR pirate taste demographics reflect younger age groups who have less money to spend on music and movies and software
  • Most of the content on The Pirate Bay is copyrighted
  • Porn is downloaded at night
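
One way the time-of-day hypothesis could be checked is to bucket activity samples by hour. The sketch below assumes samples arrive as (Unix timestamp, leechers) pairs and treats 00:00 to 05:59 UTC as "night"; both the input shape and the night window are assumptions for illustration, and a real test would need each downloader's local time, which requires geographic data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def leechers_by_hour(samples):
    """Sum leechers per UTC hour of day from (unix_timestamp, leechers) pairs."""
    totals = defaultdict(int)
    for ts, leechers in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        totals[hour] += leechers
    return dict(totals)


def night_share(totals, night_hours=range(0, 6)):
    """Fraction of all leechers seen during the assumed night hours."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return sum(totals.get(h, 0) for h in night_hours) / total
```

Comparing `night_share` between the porn category and the other categories would give a first, rough reading of the hypothesis.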

We also conducted a very scientific survey to gauge people's assumptions/hypotheses about The Pirate Bay.

List of visualization ideas

  • Particle system showing realtime activity (seeders, leechers)
  • Baby name wizard-style, searchable stacked graph of torrent popularity (seeders, leechers)
  • Geo plot torrents by category, by time (with day/night overlay)
  • Birth of individual torrent (Geo...)
  • Comparison with IMDB Box office totals, critical ratings
  • Statistics & quantitative visualizations (funny, interesting, notable)
  • Digg-like collection of user-generated SQL (or otherwise) queries, perhaps with visualization of common result types


Early Stage Database Schema

Piratepie.sql.png


SQL QUERIES

> for the treemap (add time filters)

SELECT torrentinfo.cat, COUNT( * ) , SUM( torrentinfo.size ), cat.title FROM torrentinfo LEFT JOIN cat ON torrentinfo.cat=cat.id GROUP BY torrentinfo.cat


SELECT t1.title, SUM( t2.size ) FROM cat AS t1 JOIN torrentinfo AS t2 ON t2.cat = t1.id GROUP BY t1.id
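
The time filter mentioned for the treemap could be added as in the sketch below. It assumes `torrentinfo.date` holds a Unix timestamp (as the `FROM_UNIXTIME` calls on this page suggest) and uses an in-memory SQLite database with made-up rows; the constant `TREEMAP_SQL` and the function `treemap_rows` are hypothetical names.

```python
import sqlite3

# Per-category torrent count and total size, limited to a time window.
TREEMAP_SQL = """
SELECT cat.title, COUNT(*) AS torrents, SUM(torrentinfo.size) AS total_size
FROM torrentinfo
JOIN cat ON torrentinfo.cat = cat.id
WHERE torrentinfo.date >= ? AND torrentinfo.date < ?
GROUP BY cat.id, cat.title
ORDER BY total_size DESC
"""


def treemap_rows(conn, start_ts, end_ts):
    """Return (title, torrent count, total size) per category in [start_ts, end_ts)."""
    return conn.execute(TREEMAP_SQL, (start_ts, end_ts)).fetchall()


# Made-up sample data; the real schema has more columns than shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cat (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE torrentinfo (id INTEGER PRIMARY KEY, cat INTEGER, size INTEGER, date INTEGER);
INSERT INTO cat VALUES (101, 'Music'), (201, 'Movies');
INSERT INTO torrentinfo VALUES
    (1, 201, 700, 100), (2, 201, 1400, 200), (3, 101, 50, 150), (4, 101, 60, 900);
""")
```

Passing the window bounds as parameters keeps one query usable for any period the treemap needs to show.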

> Otras cosas

// for new torrents, their activity over the last 24 hours // this crosses torrents less than 24 hours old with the activity information recorded across those 24 hours (at maximum resolution)

def getLast24Activity(self):
    # "SHOW TABLES" is a placeholder until the final 24-hour query is in place.
    sql = "SHOW TABLES"
    ar = self.callSql(sql)
    return ar
  • LAST HOUR

SELECT SUM(leechers),SUM(seeders),FROM_UNIXTIME(t1.date ) FROM torrentinfo AS t1 JOIN activity AS t2 ON t2.tpb_id=t1.id WHERE (FROM_UNIXTIME( t1.date ) > DATE_ADD( NOW( ) , INTERVAL -1 HOUR) AND cat=201) GROUP BY cat

  • HOUR INTERVAL

SELECT SUM(leechers),SUM(seeders),FROM_UNIXTIME(t1.date ) FROM torrentinfo AS t1 JOIN activity AS t2 ON t2.tpb_id=t1.id WHERE (FROM_UNIXTIME( t1.date ) > DATE_ADD( NOW( ) , INTERVAL -5 HOUR) AND cat=201 AND FROM_UNIXTIME( t1.date ) < DATE_ADD( NOW( ) , INTERVAL -4 HOUR)) GROUP BY cat
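
Both the last-hour and the hour-interval queries can be seen as one query with two bounds. The sketch below computes the bounds in Unix time in Python and compares them directly against `torrentinfo.date`, assuming that column is a Unix timestamp. As in the queries above, the filter is on the torrent's date, not on when the activity was sampled. It runs against an in-memory SQLite database with made-up rows; `WINDOW_SQL` and `activity_in_window` are hypothetical names.

```python
import sqlite3

WINDOW_SQL = """
SELECT SUM(t2.leechers), SUM(t2.seeders)
FROM torrentinfo AS t1
JOIN activity AS t2 ON t2.tpb_id = t1.id
WHERE t1.cat = ? AND t1.date >= ? AND t1.date < ?
"""


def activity_in_window(conn, cat_id, now_ts, hours_from, hours_to):
    """Sum (leechers, seeders) for torrents dated between
    hours_from and hours_to hours before now_ts."""
    start = now_ts - hours_from * 3600
    end = now_ts - hours_to * 3600
    leechers, seeders = conn.execute(WINDOW_SQL, (cat_id, start, end)).fetchone()
    # SUM over zero rows yields NULL, so fall back to 0.
    return (leechers or 0, seeders or 0)


# Made-up sample data relative to a fixed "now".
NOW = 100000
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE torrentinfo (id INTEGER PRIMARY KEY, cat INTEGER, date INTEGER);
CREATE TABLE activity (tpb_id INTEGER, seeders INTEGER, leechers INTEGER);
INSERT INTO torrentinfo VALUES (1, 201, 83800), (2, 201, 98200), (3, 101, 83800);
INSERT INTO activity VALUES (1, 10, 5), (1, 20, 7), (2, 100, 50), (3, 1, 1);
""")
```

With this shape, the last hour is `activity_in_window(conn, 201, NOW, 1, 0)` and the 5-to-4-hours-ago interval is `activity_in_window(conn, 201, NOW, 5, 4)`.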


// torrent treemap of activity: activity per category

def getAllActivity(self):
    # "SHOW TABLES" is a placeholder until the per-category activity query is wired in.
    sql = "SHOW TABLES"
    ar = self.callSql(sql)
    return ar

SELECT t1.id, SUM(t3.seeders), SUM(t3.leechers) FROM cat AS t1 JOIN torrentinfo AS t2 ON t2.cat=t1.id JOIN activity AS t3 ON t3.tpb_id=t2.id GROUP BY t1.id
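
As a rough sketch, the per-category query could be plugged into `getAllActivity` like this. Only the names `getAllActivity` and `callSql` come from this page; the `TorrentStats` class, the SQLite connection, and the sample rows are assumptions for illustration, and the real `callSql` may behave differently.

```python
import sqlite3


class TorrentStats:
    """Hypothetical wrapper around a database connection."""

    def __init__(self, conn):
        self.conn = conn

    def callSql(self, sql, params=()):
        # Assumed behaviour: run the query and return all rows.
        return self.conn.execute(sql, params).fetchall()

    def getAllActivity(self):
        # Total seeders and leechers per category, for the activity treemap.
        sql = (
            "SELECT t1.id, SUM(t3.seeders), SUM(t3.leechers) "
            "FROM cat AS t1 "
            "JOIN torrentinfo AS t2 ON t2.cat = t1.id "
            "JOIN activity AS t3 ON t3.tpb_id = t2.id "
            "GROUP BY t1.id ORDER BY t1.id"
        )
        return self.callSql(sql)


# Made-up sample data to exercise the method.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cat (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE torrentinfo (id INTEGER PRIMARY KEY, cat INTEGER);
CREATE TABLE activity (tpb_id INTEGER, seeders INTEGER, leechers INTEGER);
INSERT INTO cat VALUES (101, 'Music'), (201, 'Movies');
INSERT INTO torrentinfo VALUES (1, 201), (2, 101);
INSERT INTO activity VALUES (1, 10, 5), (1, 20, 7), (2, 3, 4);
""")
stats = TorrentStats(conn)
```

Each returned row is (category id, total seeders, total leechers), which maps directly onto one treemap cell per category.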

>>>>> Movies (for J)

(missing; I still have to find it)

NOTE: This is unfinished; the time/timestamp part is still missing.