Download tweets to a .csv file using Python

By: Isabel Yepes

We explain how to access a user's public tweets with Python and store them in a .csv file that can later be loaded into a DataFrame.

You need to install the tweepy library. The pip package manager must already be installed; to install pip, see how Here.

pip3 install tweepy

Get access tokens to connect to Twitter

  1. Go to the Twitter Application Management page and sign in.
  2. Use the “Create New App” button.
  3. Fill in the mandatory name, description and website fields. The website does not need to be an active page, since our connection will be read-only.
  4. Accept the terms and conditions and click “Create your Twitter application”.
  5. Once the application is created, go to the “Permissions” tab and change them to “Read only”. This is important because we will only use the app to download data, not to post anything on your account.
  6. You will get a notice that you must wait for the permissions to be updated. Once they are, go to the “Keys and Access Tokens” tab.
  7. Click “Create my access token” to generate the credentials the application will use. These credentials are private: anyone who has them can connect to Twitter on behalf of your application.
  8. Use the code below and save it in a Python script file named tweets.py
import tweepy  # https://github.com/tweepy/tweepy
import csv

# Twitter API credentials
consumer_key = "Add Consumer Key"
consumer_secret = "Add Consumer Secret"
access_key = "Add Access Key"
access_secret = "Add Access Secret"

# Replace line breaks with spaces and drop characters above U+FFFF
# (outside the Basic Multilingual Plane, e.g. most emoji)
def strip_undesired_chars(tweet):
    stripped_tweet = tweet.replace('\n', ' ').replace('\r', '')
    return ''.join(c for c in stripped_tweet if ord(c) < 65536)

def get_all_tweets(screen_name):
    # Twitter only returns up to the ~3,200 most recent tweets of a user with this method
    # During testing, set a number between 200 and 3200 here
    limit_number = 3200

    # Authorize with Twitter and initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # List that stores the tweets downloaded by tweepy
    alltweets = []

    # Initial request for the 200 most recent tweets (200 is the maximum per request)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # Save the most recent tweets
    alltweets.extend(new_tweets)

    # Keep requesting older tweets until none are left or the limit is reached
    # (the loop is skipped if the account has no tweets, avoiding an IndexError)
    while len(new_tweets) > 0 and len(alltweets) < limit_number:
        # ID of the oldest tweet stored so far, minus 1
        oldest = alltweets[-1].id - 1
        print("getting tweets before " + str(oldest))

        # In every following request use max_id to avoid duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # Save the downloaded tweets
        alltweets.extend(new_tweets)

        # Report progress in the console
        print(str(len(alltweets)) + " tweets downloaded so far")

    # Turn the tweepy tweets into a 2D list of rows for the csv
    outtweets = [(tweet.id_str, tweet.created_at, strip_undesired_chars(tweet.text),
                  tweet.retweet_count, tweet.favorite_count) for tweet in alltweets]

    # Write the csv as UTF-8 so accented characters are saved correctly on any OS
    with open('%s_tweets.csv' % screen_name, "w", newline='', encoding='utf-8') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(['id', 'created_at', 'text', 'retweet_count', 'favorite_count'])
        writer.writerows(outtweets)

if __name__ == '__main__':
    # Screen name of the account whose tweets will be downloaded
    get_all_tweets("TwitterUser")
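Since the keys from step 7 are private, one common alternative to pasting them into tweets.py is reading them from environment variables, so the script can be shared without exposing them. The sketch below is a minimal example under that assumption; the TWITTER_* variable names and the load_credentials helper are hypothetical, not part of tweepy or of the original code.

```python
import os

# Hypothetical variable names: use whatever names you export in your shell
CREDENTIAL_VARS = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                   "TWITTER_ACCESS_KEY", "TWITTER_ACCESS_SECRET")


def load_credentials(env=None):
    """Return the four credentials from the environment, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

In tweets.py, the four hardcoded assignments could then become `consumer_key, consumer_secret, access_key, access_secret = load_credentials()`.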

Run the script as shown below; this will create a file named TwitterUser_tweets.csv

python3 tweets.py
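The download loop relies on max_id pagination: each new request asks only for tweets whose ID is at most one less than the oldest ID already stored, so pages never overlap. That logic can be tried offline, without credentials. In this sketch a hypothetical in-memory fetch_page function stands in for api.user_timeline and tweets are reduced to their integer IDs; collect_ids and make_fake_timeline are illustrative names.

```python
def make_fake_timeline(ids):
    """Hypothetical stand-in for api.user_timeline over an in-memory list of tweet ids."""
    timeline = sorted(ids, reverse=True)  # newest (largest id) first, like Twitter

    def fetch_page(count, max_id=None):
        # Return up to `count` ids that are <= max_id (all ids when max_id is None)
        eligible = [i for i in timeline if max_id is None or i <= max_id]
        return eligible[:count]

    return fetch_page


def collect_ids(fetch_page, limit=3200, page_size=200):
    """Walk a timeline backwards with max_id, the same way get_all_tweets does."""
    collected = []
    page = fetch_page(page_size)  # first request: the newest tweets
    collected.extend(page)
    while page and len(collected) < limit:
        # Only ask for tweets strictly older than the oldest one stored so far
        oldest = collected[-1] - 1
        page = fetch_page(page_size, max_id=oldest)
        collected.extend(page)
    return collected


ids = collect_ids(make_fake_timeline(range(1, 1001)))
print(len(ids), ids[0], ids[-1])  # 1000 1000 1
```

The loop stops either when a request comes back empty or when the limit is reached, which is also why an account with no tweets simply yields an empty list.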

Original source of the code Here. Changes were made for Python 3 compatibility, to make sure every column is enclosed in quotes (“”), and to remove line breaks and emoji from the tweets, since they can cause problems when reading the resulting .csv file from Python.
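To see what the cleaning and the QUOTE_ALL setting do, the following sketch writes one hypothetical tweet containing a comma, double quotes, a line break and an emoji to an in-memory CSV and reads it back with the standard csv module. Note that this filter only removes characters above U+FFFF; symbols inside the Basic Multilingual Plane, such as ☺ (U+263A), are kept.

```python
import csv
import io


def strip_undesired_chars(tweet):
    # Same cleaning as in tweets.py: line breaks become spaces and characters
    # above U+FFFF (outside the Basic Multilingual Plane) are dropped
    stripped_tweet = tweet.replace('\n', ' ').replace('\r', '')
    return ''.join(c for c in stripped_tweet if ord(c) < 65536)


# Hypothetical tweet text with a comma, double quotes, a line break and an emoji
raw_text = 'Hola, "Medellín"\r\n¡Nos vemos! 🙂'
clean_text = strip_undesired_chars(raw_text)

# Write to an in-memory file; newline='' leaves the csv module's line endings untouched
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(['id', 'text'])
writer.writerow(['123', clean_text])
print(buffer.getvalue())

# Reading it back returns the cleaned text intact, commas and quotes included
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)
```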

To load the .csv file into a Python DataFrame

import pandas as pd

# read_csv already returns a DataFrame; index_col=0 uses the "id" column as the index
tweetsDF = pd.read_csv("path/username_tweets.csv", index_col=0)
print(tweetsDF)
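Since read_csv can also convert columns while loading, the created_at column can be parsed as dates directly, which makes time-based analysis easier. A minimal sketch, assuming the column layout written by tweets.py and using a small hypothetical in-memory CSV instead of a real file, that counts tweets per month:

```python
import io

import pandas as pd

# Hypothetical CSV content with the same layout tweets.py writes (QUOTE_ALL, id first)
csv_text = (
    '"id","created_at","text","retweet_count","favorite_count"\n'
    '"3","2018-05-20 14:00:00","Third tweet","2","5"\n'
    '"2","2018-05-02 09:30:00","Second, with a comma","0","1"\n'
    '"1","2018-04-15 18:45:00","First tweet","1","0"\n'
)

# Quoted numbers are still inferred as integers; created_at becomes a datetime column
tweetsDF = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["created_at"])

# Group the dates by calendar month and count how many tweets fall in each one
tweets_per_month = tweetsDF["created_at"].dt.to_period("M").value_counts().sort_index()
print(tweets_per_month)
```

With a real file, only the first argument of read_csv changes to the path of the generated .csv.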

More information in How to extract Twitter tweets data and followers to Excel

For other features that can be extracted from tweets, see How to Download Twitter data in JSON – Twitter API Python examples

And finally, a video explaining the same code we presented

Women Who Code Medellín – Python, random data generation – May 2018

We continue with the Data Science theme for the Women Who Code Medellín Meetups. This month we covered random data generation using Python's NumPy library.

These are the rules of the game presented as an example, which we solve by simulating random data to answer the question: What is the probability of winning this game?

RULES

  • The game is played with one die, so the values range from 1 to 6
  • If we roll 3 or less, we pay 1 peso back to the game
  • If we roll more than 3 and up to 5, we receive 1 peso
  • If we roll a 6, we roll the die again and receive as many pesos as it shows
  • We play with coins, so there are no negative peso amounts
  • A player's turn consists of rolling the die 100 times
  • The game is won if the player has more than 50 pesos at the end of the turn
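Before simulating, a quick expected-value calculation gives a sense of the answer. Ignoring the rule that coins cannot go negative (which can only raise the real average), each roll is worth 5/12 of a coin on average, about 41.7 coins over a 100-roll turn, so a winning turn has to beat this rough average. A sketch of the arithmetic with Python's fractions module:

```python
from fractions import Fraction

# Probability of each face of a fair die
p = Fraction(1, 6)

# Faces 1-3: pay 1 coin; faces 4-5: win 1 coin; face 6: win the value of a second roll
mean_second_roll = sum(range(1, 7)) * p  # 21/6 = 7/2
expected_per_roll = 3 * p * (-1) + 2 * p * 1 + p * mean_second_roll

print(expected_per_roll)  # 5/12
print(float(100 * expected_per_roll))  # rough expected total for a 100-roll turn
```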

Here is the code we used in the presentation.

import numpy as np
import matplotlib.pyplot as plt
# Set the seed so the data are the same every time the algorithm runs
np.random.seed(204)
todos_turnos = []
# Number of times the simulation is run
muestras = 600
for _ in range(muestras):
    # Start the turn with no coins
    monedas = 0
    turno_aleatorio = [0]
    for _ in range(100):
        dado = np.random.randint(1, 7)
        if dado <= 3:
            # Pay 1 coin, without going below zero
            monedas = max(0, monedas - 1)
        elif dado < 6:
            monedas = monedas + 1
        else:
            # A 6: roll again and receive that many coins
            monedas = monedas + np.random.randint(1, 7)
        # Record how many coins we have after each roll
        turno_aleatorio.append(monedas)
    # Save the results of the turn
    todos_turnos.append(turno_aleatorio)

# Convert the list into a numpy array (one row per turn)
np_todos_turnos = np.array(todos_turnos)

# Transpose rows and columns to fit the plot (one column per turn)
np_todos_turnos_t = np.transpose(np_todos_turnos)

# Take the last row: the final result of every turn
ultimos = np_todos_turnos_t[-1, :]

# Estimate the probability of winning by counting the final values
# greater than 50 (the winning rule) and dividing by the number of turns
print('The probability of winning the game is ' + str(round(100 * (ultimos > 50).sum() / muestras, 2)) + '%')

# Prepare the plot of how every turn evolved
plt.figure(200)
plt.xlabel('Number of die rolls')
plt.ylabel('Coins')
plt.title('Evolution of ' + str(muestras) + ' turns')
plt.plot(np_todos_turnos_t)

# Prepare the plot of the distribution of final results
plt.figure(300)
plt.xlabel('Total coins at the end of the turn')
plt.ylabel('Number of turns in each range')
plt.title('Histogram for ' + str(muestras) + ' turns')
plt.hist(ultimos)

# Show the plots
plt.show()
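The nested loop above runs 60,000 Python iterations. One possible alternative, sketched here assuming NumPy 1.17 or later (for np.random.default_rng), advances all turns together one roll at a time. It also reports the standard error of the estimated probability, which indicates how much a 600-turn estimate can vary. Because it uses a different random generator, its numbers will not match the original script exactly; simulate_turns is an illustrative name.

```python
import numpy as np


def simulate_turns(n_turns=600, n_rolls=100, seed=204):
    """Vectorized version of the game: every turn advances one roll per iteration."""
    rng = np.random.default_rng(seed)
    coins = np.zeros(n_turns, dtype=int)
    for _ in range(n_rolls):
        die = rng.integers(1, 7, size=n_turns)    # one roll per turn
        bonus = rng.integers(1, 7, size=n_turns)  # second roll, only used when die == 6
        step = np.where(die <= 3, -1, np.where(die < 6, 1, bonus))
        coins = np.maximum(0, coins + step)       # coins can never be negative
    return coins


final_coins = simulate_turns()
p_win = (final_coins > 50).mean()
# Standard error of a proportion estimated from n independent turns
std_error = np.sqrt(p_win * (1 - p_win) / final_coins.size)
print(f"P(win) ~ {100 * p_win:.2f}% +/- {100 * std_error:.2f}%")
```

Increasing n_turns shrinks the standard error roughly with the square root of the number of turns.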

Women Who Code Medellín, in-person Meetup – Open Data Workshop – November 12, 2016

This November, Women Who Code Medellín will hold an in-person Meetup, so you can get together and take part in an open data workshop on Saturday, November 12, 2016, from 1:00 pm to 5:00 pm at the ViveLab of RutaN Medellín.

Register to attend at https://www.meetup.com/Women-Who-Code-Medellin/events/235267313/

We will be talking about open data (OpenData). Bring your laptop so you can take part in a workshop where we will build web applications that consume government open data, and we will talk about the open call for entrepreneurship projects using open data.

It will be great to see you again 🙂

Women Who Code, virtual Meetup, October 15, 2016

This October, Women Who Code Medellín will hold a virtual Meetup, so you can connect from wherever you are through YouTube Live, on Saturday, October 15 at 2:00 pm.

Register to attend at https://www.meetup.com/es-ES/Women-Who-Code-Medellin/events/234281652/

We will be talking about OpenData. We will explain what open data is, mention some platforms available for consuming data and the Socrata clients (we will also tell you what Socrata is), and talk about an open call for entrepreneurship projects using open data.

Here is the recorded version of the stream.

Reasons why we need more women in technology

It was 1992. A young girl was starting her career as an electronics engineer; she came from a girls' school and had entered a public university. She was used to being among girls and being the majority, and that year it changed forever. She was one of only 7 girls admitted that semester in a group of 52 students and, to her surprise, that was the largest number in years; including them, there were only 35 girls in the whole program, which had about 500 students. In her mind that was abnormal. We are going to discuss why we have become used to these abnormal numbers, and what we can do to reach a gender share in the profession that statistically matches that of the population: 50/50.

  • Is the wrong question going to give us the right answers?

We are well informed about the lack of women in STEM; you just have to look around a tech workplace, where women are usually concentrated in administrative and design roles. We have received tons of explanations for it, from a lack of ambition in women, to life choices like maternity, to a lack of interest. Others talk about hostile environments or closed boys' clubs.

One of the problems with focusing on the diagnosis of the obstacles is the appearance of deniers: people intent on saying that the problem doesn't exist, who are basically dedicated to debating (most of the time without a real research basis) in order to “demonstrate” that nothing is wrong and everything works as it should. But as our young student quickly realized: no, it's abnormal, and no, there's no reason for things to remain like that.

Also, these explanations do not necessarily offer women solutions to their personal struggles when pursuing a career in tech; statistics work well for groups of people but don't give parents tools to encourage girls to be curious about science. We need to go beyond diagnosis and start building a constructive discourse that offers solutions to real scenarios where women face restrictions in accessing or staying in a tech career.

  • Why do we need more women in tech?

According to McKinsey, reaching gender equality by 2025 would add $4.3 trillion in annual GDP to the US economy (6). Is there any good reason to leave that money off the table? I don't think so. In the specific case of tech, maintaining the status quo is practically unaffordable.

The other important reason is that women should pursue the most lucrative careers and, consequently, the most lucrative jobs. As the “Graduating to a pay gap” study mentions: “Graduates who earned degrees in female-dominated majors tend to get jobs that pay less than the jobs held by graduates who earned degrees in male-dominated majors. For example, one year after graduation, the average full-time-employed female social science major earned just 66 percent of what the average full-time-employed female engineering or engineering technology major earned ($31,924 compared with $48,493).” (18) So, if women miss the opportunity to participate in tech careers and jobs, they will also miss the possibility of a better income.
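The 66 percent figure follows directly from the two average salaries in the quote:

```latex
\frac{\$31{,}924}{\$48{,}493} \approx 0.658 \approx 66\%, \qquad \$48{,}493 - \$31{,}924 = \$16{,}569 \text{ per year}
```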

  • Let’s ask the right question: Why do women succeed in tech?

Now let's set aside the diagnosis of the handicaps and take a look at the women who have actually overcome them, to identify patterns and ways to encourage more women to participate in technology.

I had the opportunity to conduct qualitative research with a partner about the reasons why women choose a career in STEM (10). Those women told us their life stories and why they chose their careers. The interviewees ranged from recent graduates to company owners. These are some of their experiences:

– Attracted to biology in high school, she didn't find a university offering it near her hometown, so she decided to pursue Systems Engineering on the advice of a former schoolmate. She liked engineering very much and finished her studies. She still works in IT, but now in more administrative roles.

– She wanted to study Mining Engineering, but her family opposed it because they thought it was not a suitable career for a woman. She ignored them and started at the faculty. She was used to male environments, and that was not an issue for her. After graduating, she learned more about the environmental and social impact of mining, which led her to study science communication, and now she is the manager of a science museum that encourages young people and families to experiment with science.

– She grew up in a family dedicated to commerce. She wanted to pursue Systems Engineering and found support from her family, who had taught her the value of hard work. After being trained in testing and becoming very successful at it, she decided to found her own testing company. She is widely recognized as an entrepreneur and businesswoman. She says she never felt bias against her gender; she just focuses on working hard, but over the years she has also learned to appreciate calm and the value of sharing time with her loved ones.

Even though qualitative research doesn't give us a statistical sample, we are talking about a social phenomenon, and it has to be understood through the voices of the people embedded in it. The research allowed us to find some common patterns among them:

– Early exposure to science in general awakened their curiosity to pursue a tech career later, even if that exposure was in a different branch of science from the one they eventually chose. As we concluded, girls should be exposed to a wide range of options to explore and learn, and science and technology must be part of that range.

– A family environment that encourages exploration, such as building and taking apart toys, also awakens curiosity for science.

– Family support helps confirm the career choice; however, when the family opposes it, a well-developed self-esteem gives girls the tools to overcome the hostile environment and pursue their aspirations.

  • The role of role models

There are many people who want to be the next Mark Zuckerberg or Steve Jobs; it's time to put the spotlight on women in tech who have made remarkable contributions, so girls can easily identify with them. We need to make them more familiar to the general public. Here is a short list.

Ada Lovelace: Known as the first programmer, she wrote the first algorithm intended to run on Babbage's proposed calculating machine. Ada Lovelace Day is the second Tuesday of October and “is an international celebration of the achievements of women in science, technology, engineering and maths (STEM).” (11)

Hedy Lamarr: Usually remembered for her beauty as a film actress, she patented, together with composer George Antheil, a method of hopping a signal among frequencies in a seemingly random pattern to prevent its interception. That concept was later developed into the spread spectrum techniques that allow modern wireless systems to operate with lower power consumption and greater efficiency. (12)

ENIAC team: First noticed in ENIAC photographs and initially presented as “Refrigerator Ladies”, they were actually the ENIAC programmers who wrote all of its code. ENIAC was the first general-purpose electronic computer in history. The team: Frances “Betty” Snyder Holberton, Betty “Jean” Jennings Bartik, Kathleen McNulty Mauchly Antonelli, Marlyn Wescoff Meltzer, Ruth Lichterman Teitelbaum, and Frances Bilas Spence. (13)

Dr. Grace Hopper: A US Navy Rear Admiral, she created the first compiler. She was also one of the first programmers of the Harvard Mark I computer and was recognized for her work on the Mark II and Mark III. (14) Today the “Grace Hopper Celebration of Women in Computing” is the world's largest yearly gathering of women technologists. (15)

Anita Borg: She developed Unix systems at Digital and Xerox PARC, and she founded the “Institute for Women and Technology”, dedicated to creating programs, partnerships, and initiatives to include women in all aspects of technology. The institute organizes the “Grace Hopper” event. (16)

Margaret Hamilton: She was the lead software engineer for the Apollo project; she designed the software that allowed men to go to the Moon and return safely. In her time, programming was actually considered women's work, because it was seen as mere keypunching, similar to typing; however, programmers usually received the requirements directly and devised the calculation approach themselves. (17)

As you have just seen, in its beginnings programming was actually women's work, and there's no reason for it not to be now.

  • Here comes the gender salary gap

Women are still not paid as much as men. As the Time article “Gender Wage Gap” puts it: “Among full-time workers, women earn 77% of what men earn. Even after accounting for the fact that women often work in different occupations and industries than men, as well as differences in work experience, union status, education and race, 41% of that gap is still unexplained. When social scientists control for every employment factor that could possibly explain the disparity, women still earn 91% of what men earn for doing the same job.” (19)
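As a quick consistency check of the figures in the quote (the percentages may come from different analyses, so the match is only approximate), the unexplained part of the gap corresponds closely to the 91% figure:

```latex
\begin{aligned}
\text{raw gap} &= 100\% - 77\% = 23\% \\
\text{unexplained gap} &\approx 0.41 \times 23\% \approx 9.4\% \\
\text{adjusted pay ratio} &\approx 100\% - 9.4\% \approx 91\%
\end{aligned}
```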

Sadly, that trend is no different in tech. Here you can see data from “Graduating to a pay gap”, a study conducted by the AAUW.

Graduating to a pay gap, Average Annual Earnings One Year after College Graduation, by Undergraduate Major and Gender

Source: “Graduating to a pay gap” study by Christianne Corbett, M.A., and Catherine Hill, Ph.D., for AAUW, page 14, figure 5

The bad news is that you will find a gender gap if you get a job in tech; the good news is that you can found your own company and get rid of it 🙂

  • What can women do about it?

Women helping women is one of the best actions to take, because they already know what it is like to participate in tech environments, the known issues and, even better, the known solutions for them. Here is a list of resources.

To spread the word about the need to close the gender gap, I recommend:

CODE: Debugging the Gender Gap: “CODE documentary exposes the dearth of American female and minority software engineers and explores the reasons for this gender gap and digital divide.” It also gives clues about how to close the gap.

There are organizations dedicated to promoting the participation of women in tech. Many of them offer meetups, mentoring, and what I call “safe environments”: places where you can learn, fail, and get it right without gender pressure.

She's Coding: “She’s Coding was inspired by CODE: Debugging the Gender Gap documentary filmmakers, and was initiated by a team of engineers, marketers, and designers at JOLT Labs in Seattle, WA. JOLT Labs cares deeply about diversifying the workplace, not just because it’s the right thing to do…but because it’s the smart thing to do. Diverse teams produce better results!”

Women Who Code: “A U.S. based 501(c)(3) non-profit dedicated to inspiring women to excel in technology careers. We connect amazing women with other like minded amazing women around the globe who unite under one simple notion – the world of technology is much better with women in it.” I personally lead the Medellín network; take a look at the long list of more than 50 cities in 15 countries around the world.

A Mighty Girl: “A Mighty Girl is the world’s largest collection of books, toys, movies, and music for parents, teachers, and others dedicated to raising smart, confident, and courageous girls and, of course, for girls themselves!” An amazing resource for parents who want their daughters involved in science from an early age, basically under the premise of offering girls all the available options so they can choose later.

  • What can men do about it?

One of the uncomfortable things women in tech know is that they need to work harder than men to be considered at least equal; some men actually consider that statement offensive and quickly dismiss it as untrue. We need to stop this game. Take one example: a study about GitHub shows that code written by women is more likely to be accepted, but only if they don't show their gender explicitly in their profiles (8).

It is very frustrating to see educated, professional men acting like climate change deniers, saying that the gender gap and gender bias don't exist, even in the face of evidence. Corporations facing class action lawsuits for gender discrimination have a hard time understanding the issue too (7), and they keep “boys' clubs”: social events where women are not invited, where socializing with managers takes place and promotion opportunities appear.

As the UN initiative HeForShe advocates, we need men involved to solve this problem, and the first step is to acknowledge that it exists. As with addiction, recovery starts only after accepting the problem. If a man has doubts about what is happening, please read the available documentation; this post offers resources about it.

The second phase is to let women take part: let them propose and create, and encourage them to express their opinions. Culture has taught women to start every sentence in a group meeting with “Excuse me”; make them comfortable participating, not to please them but because their opinions will enrich the discussion.

Stop judging women's abilities by their appearance: whether she wears all pink or dresses like a bro has nothing to do with her cognitive ability (9). Use the same rule for everyone, women and men: assume the other person knows the subject as well as you do.

  • Parenting

The USA is the only large economy where paid parental leave is not required by law for all workers, which puts pressure on an aspect of life that should be normal. After having a child, people have to decide what kind of care they want for the baby: paying someone to provide it or taking time off work. Women are more likely than men to leave their jobs after having a baby, because they are expected to be caregivers, not only by society but also by their husbands. Men interviewed answered that they considered their careers more important than their wives' (4).

However, simple actions can reverse these numbers: Google cut the attrition rate of new mothers by 50% after increasing maternity leave from 3 to 5 months and from partially to fully paid (3).

Surprisingly, having a child is a bonus for men: employers hire more fathers because they are perceived as more stable and committed, while mothers are expected to be less productive and easily distracted. In terms of salary, having a child implies an average 6% increase for men and a 4% decrease for women (2).

We have been looking at these subjects from the point of view of the workforce, but who speaks on behalf of families? Why aren't we asking about the men who lose the opportunity to share time with their children? Why does society expect men to sacrifice their families in favor of their careers?

Recently, the president of the biggest bank in Colombia decided to leave his successful position in order to take care of his family and health (5). Even though some people praised him, others criticized his decision; society needs to understand more about the benefits of children having both parents with enough time for them. This is a family issue, not just a women's one.

Conclusion

It’s time to start taking action to increase the participation of women in technology, and the first steps are acknowledging that the problem exists and committing to do something about it. This is not only a women’s issue but a societal one: it costs productivity and deprives the whole industry of important contributions from women who give up at a specific point in their careers. Creating a more welcoming workplace benefits everyone, and better consideration of parenting helps both men and women and their personal satisfaction. Increasing the participation of women in tech is a win-win: more available workforce, more people creating new companies, more business for everyone.

Sources:

  1. The Simple Truth about the Gender Pay Gap (Spring 2016) http://www.aauw.org/resource/the-simple-truth-about-the-gender-pay-gap/
  2. The Motherhood Penalty vs. the Fatherhood Bonus http://www.nytimes.com/2014/09/07/upshot/a-child-helps-your-career-if-youre-a-man.html
  3. In Google’s Inner Circle, a Falling Number of Women http://www.nytimes.com/2012/08/23/technology/in-googles-inner-circle-a-falling-number-of-women.html
  4. Why U.S. Women Are Leaving Jobs Behind http://www.nytimes.com/2014/12/14/upshot/us-employment-women-not-working.html
  5. La carta que hizo renunciar al presidente de Bancolombia http://www.eltiempo.com/economia/empresas/renuncia-de-presidente-de-bancolombia-y-carta-de-su-hija/16531722
  6. The power of parity: Advancing women’s equality in the United States http://www.mckinsey.com/global-themes/employment-and-growth/The-power-of-parity-Advancing-womens-equality-in-the-United-States
  7. In denial: Corporate America’s blindness to gender discrimination http://fortune.com/2013/05/24/in-denial-corporate-americas-blindness-to-gender-discrimination/
  8. Women considered better coders – but only if they hide their gender https://www.theguardian.com/technology/2016/feb/12/women-considered-better-coders-hide-gender-github
  9. Coding Like a Girl https://medium.com/@sailorhg/coding-like-a-girl-595b90791cce#.ggm4tkcvz
  10. Las voces del SI http://www.latinity.info/detailed-program/#ST18
  11. Finding Ada http://findingada.com
  12. Hedy Lammar http://www.women-inventors.com/Hedy-Lammar.asp
  13. ENIAC Programmers http://eniacprogrammers.org
  14. Grace Hopper biography http://www.cs.yale.edu/homes/tap/Files/hopper-story.html
  15. Grace Hopper Event http://ghc.anitaborg.org
  16. Anita Borg http://anitaborg.org/about-us/about-anita-borg/
  17. Margaret Hamilton https://medium.com/@3fingeredfox/margaret-hamilton-lead-software-engineer-project-apollo-158754170da8#.7n5z7vxze
  18. Graduating to a pay gap, the earnings of women and men one year after college graduation http://www.aauw.org/files/2013/02/graduating-to-a-pay-gap-the-earnings-of-women-and-men-one-year-after-college-graduation.pdf?_ga=1.7578036.722397424.1379578621
  19. Gender Wage gap http://time.com/105292/gender-wage-gap/

 

Women Who Code, virtual Meetup, February 20, 2016

We are starting the year at Women Who Code Medellín with a virtual Meetup, so you can join from wherever you are through the following Google Hangouts On Air event, on Saturday, February 20 at 1:00 pm.

Register to attend at http://meetu.ps/2RT7Kg

We want to hear what you would like to know and learn during 2016; your questions and comments during the hangout are welcome. We also want other women to get involved in organizing the events, and your participation is important.

On the technical side, we will talk about rapid application prototyping with InVision. You can follow along and participate from your computer anywhere. To send questions and participate, use Hangouts On Air; questions can be sent even before the event starts https://plus.google.com/events/c8mq0nd7sfmic4dlrjg7ucnrk0s

To watch it without interacting, you can follow it on YouTube.