Downloading newspaper articles via API

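library(tidyverse)   # data wrangling plus read_rds(), write_rds(), write_csv()
library(newsanchor)  # R client for News API
library(printr)      # table-style printing of data frames in knitr output
library(httr)        # GET() for raw HTTP requests
library(jsonlite)    # fromJSON() for parsing the JSON response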

News API

There is a service, News API, that lets you download news (articles, headlines) from sources around the world through a JSON API.

Is there an R package for it?

Yes: newsanchor.

Setup

First you need to get an API key from the site; it is free for development purposes. A convenient approach is to store the key in the R environment file (.Renviron).

One way to do this is:

newsanchor::set_api_key(path = "~/.Renviron")

This is how you then retrieve the API key:

my_api_key <- Sys.getenv("NEWS_API_KEY")
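Sys.getenv() returns an empty string when the variable is missing, so a forgotten key would only surface later as a failed request. A minimal sketch of an early check, assuming a hypothetical helper named require_env (it is not part of newsanchor):

```r
# Hypothetical helper: return an environment variable or stop with a clear message.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", name,
         "' is not set; check ~/.Renviron and restart R.")
  }
  value
}

# Demonstration with a dummy variable so the sketch runs without a real key
Sys.setenv(DEMO_API_KEY = "abc123")
demo_key <- require_env("DEMO_API_KEY")
```

With a real key in place, require_env("NEWS_API_KEY") could stand in for the plain Sys.getenv() call.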

Query

results <- get_everything(query = "Corona",
                          language = "de")

results_main <- results$results_df

Write to a file:

saveRDS(results, file = "newsanchor1.rds")

And load it again:

results <- read_rds("newsanchor1.rds")
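The save-and-reload step can be tried without calling the API. The sketch below uses a toy list whose structure is only assumed to resemble the object returned by get_everything(), and it uses base R's saveRDS() and readRDS(); readr's read_rds() is a thin wrapper around readRDS(), so mixing the two as above is fine.

```r
# Toy stand-in for the query result (structure assumed for illustration)
toy_results <- list(
  metadata   = data.frame(total_results = 2L, status_code = 200L),
  results_df = data.frame(title = c("A", "B"), stringsAsFactors = FALSE)
)

# Write to a temporary file and read it back
tmp <- tempfile(fileext = ".rds")
saveRDS(toy_results, file = tmp)
restored <- readRDS(tmp)

# TRUE if the object survived the round trip unchanged
round_trip_ok <- identical(toy_results, restored)
round_trip_ok
```

Caching the result this way avoids repeating the same request every time the document is rendered.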

Query 2

# Query English-language articles on "Covid"
results_covid <- get_everything(query = "Covid",
                                language = "en")

# Cache the result on disk and read it back
write_rds(results_covid,
          file = "results_covid1.rds")
results_covid <- read_rds(file = "results_covid1.rds")

Metadata:

results_covid$metadata
total_results:  175517
status_code:    200
request_date:   2021-07-07 10:04:28
request_url:    https://newsapi.org/v2/everything?q=Covid&language=en&sortBy=publishedAt&pageSize=100&page=1
code:           (empty)
message:        (empty)
page:           1
page_size:      100
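The metadata shows that this request returned page 1 with a page size of 100, out of 175517 hits in total. A small worked calculation of how many pages the full result set would span (whether a given API plan permits fetching all of them is a separate question not covered here):

```r
# Values taken from the metadata shown above
total_results <- 175517
page_size     <- 100

# Number of pages needed to cover all hits
n_pages <- ceiling(total_results / page_size)
n_pages
# 1756
```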

Results:

results_covid$results_df %>% 
  slice(1:2)
Row 1
  author:        Colm Quinn
  title:         Saudi Arabia’s Hushed Washington Visit
  description:   The visit of Khalid bin Salman has been met with little fanfare and proves Saudi Arabia is far from the “pariah” of Biden’s campaign pledge.
  url:           https://foreignpolicy.com/2021/07/07/saudi-arabia-washington-khalid-bin-salman/
  url_to_image:  https://foreignpolicy.com/wp-content/uploads/2021/07/saudi-arabia-GettyImages-1170949600.jpg?w=1000
  published_at:  2021-07-07 09:53:28
  content:       Here is todays Foreign Policy brief: Prince Khalid bin Salman continues Washington visit, the United Nations calls for talks on Ethiopian dam dispute, and Iran takes steps to produce enriched uranium… [+6709 chars]
  id:            NA
  name:          Foreign Policy

Row 2
  author:        NA
  title:         District in central Israel labeled ‘Red’ due to high infection rate
  description:   Israel’s Ministry of Health declared the city of Kfar Yona in the center of the country as a “Red” - or highly contagious - city due to a recent spike in local COVID cases.The Hod Hasharon district received the second worst “Orange” label, as 20 new patients …
  url:           https://www.israelnationalnews.com/News/Flash.aspx/546065
  url_to_image:  http://u.a7.org/pictures/799/799624.jpg
  published_at:  2021-07-07 09:52:57
  content:       Israel’s Ministry of Health declared the city of Kfar Yona in the center of the country as a “Red” - or highly contagious - city due to a recent spike in local COVID cases. The Hod Hasharon district… [+189 chars]
  id:            NA
  name:          Israelnationalnews.com

Preparing the data

results_covid_short <- 
  results_covid$results_df %>% 
  select(author, title, content, published_at, name) %>%
  drop_na()

results_covid_short %>% 
  slice(1:2)
Row 1
  author:        Colm Quinn
  title:         Saudi Arabia’s Hushed Washington Visit
  content:       Here is todays Foreign Policy brief: Prince Khalid bin Salman continues Washington visit, the United Nations calls for talks on Ethiopian dam dispute, and Iran takes steps to produce enriched uranium… [+6709 chars]
  published_at:  2021-07-07 09:53:28
  name:          Foreign Policy

Row 2
  author:        soyacincau
  title:         1 Utama: WhatsApp message claiming over 50 people infected with Delta variant is fake news
  content:       The management of 1 Utama Shopping Centre today denied that any of its personnel or patrons have become infected with the highly contagious new strain of COVID-19 known identified as Delta. In a Fac… [+814 chars]
  published_at:  2021-07-07 09:51:45
  name:          SoyaCincau.com

write_csv(results_covid_short,
          file = "news_covid_short.csv")
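The second row of the shortened table differs from the second row of the raw results because drop_na() removes every row that has a missing value in any remaining column, and that article had no author. Selecting columns first also matters: id was NA in the rows shown earlier, so calling drop_na() on all columns would discard those rows too. A minimal sketch on made-up toy data (the values are shortened and purely illustrative):

```r
library(dplyr)
library(tidyr)

# Toy data mimicking two articles, one of them without an author
toy <- tibble::tibble(
  author = c("Colm Quinn", NA),
  title  = c("Saudi Arabia's Hushed Washington Visit",
             "District in central Israel labeled 'Red'"),
  id     = c(NA, NA)
)

# Without selecting first, the all-NA id column removes every row
nrow(drop_na(toy))
# 0

# Selecting the columns of interest first keeps the complete article
toy_short <- toy %>%
  select(author, title) %>%
  drop_na()

nrow(toy_short)
# 1
```

If articles without an author should be kept, drop_na() also accepts column names, for example drop_na(title, content).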

Sources

Here is a list of the supported sources:

terms_sources
sources
abc-news
abc-news-au
aftenposten
al-jazeera-english
ansa
argaam
ars-technica
ary-news
associated-press
australian-financial-review
axios
bbc-news
bbc-sport
bild
blasting-news-br
bleacher-report
bloomberg
breitbart-news
business-insider
business-insider-uk
buzzfeed
cbc-news
cbs-news
cnbc
cnn
cnn-es
crypto-coins-news
daily-mail
der-tagesspiegel
die-zeit
el-mundo
engadget
entertainment-weekly
espn
espn-cric-info
financial-post
financial-times
focus
football-italia
fortune
four-four-two
fox-news
fox-sports
globo
google-news
google-news-ar
google-news-au
google-news-br
google-news-ca
google-news-fr
google-news-in
google-news-is
google-news-it
google-news-ru
google-news-sa
google-news-uk
goteborgs-posten
gruenderszene
hacker-news
handelsblatt
ign
il-sole-24-ore
independent
infobae
info-money
la-gaceta
la-nacion
la-repubblica
le-monde
lenta
lequipe
les-echos
liberation
marca
mashable
medical-news-today
metro
mirror
msnbc
mtv-news
mtv-news-uk
national-geographic
national-review
nbc-news
news24
new-scientist
news-com-au
newsweek
new-york-magazine
next-big-future
nfl-news
nhl-news
nrk
politico
polygon
rbc
recode
reddit-r-all
reuters
rt
rte
rtl-nieuws
sabq
spiegel-online
svenska-dagbladet
t3n
talksport
techcrunch
techcrunch-cn
techradar
the-american-conservative
the-economist
the-globe-and-mail
the-guardian-au
the-guardian-uk
the-hill
the-hindu
the-huffington-post
the-irish-times
the-jerusalem-post
the-lad-bible
the-new-york-times
the-next-web
the-sport-bible
the-telegraph
the-times-of-india
the-verge
the-wall-street-journal
the-washington-post
the-washington-times
time
usa-today
vice-news
wired
wired-de
wirtschafts-woche
xinhua-net
ynet
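When a query is restricted to particular outlets, it can help to check the requested ids against this list first. The sketch below is only an illustration: check_sources is a hypothetical helper, the reference vector is a hand-copied subset of the list above, and it assumes the ids are later handed to the query function through an argument for sources.

```r
# A few ids copied from the list above; with newsanchor loaded,
# terms_sources$sources could serve as the full reference vector.
known_sources <- c("bild", "der-tagesspiegel", "die-zeit", "focus",
                   "handelsblatt", "spiegel-online", "wired-de")

# Hypothetical helper: split requested ids into supported and unsupported ones
check_sources <- function(requested, known) {
  list(valid   = intersect(requested, known),
       invalid = setdiff(requested, known))
}

checked <- check_sources(c("die-zeit", "spiegel-online", "sueddeutsche"),
                         known_sources)
checked$valid
checked$invalid
```

Here "sueddeutsche" ends up in checked$invalid because no such id appears in the list.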

Using a GET request

This is one way to assemble the request:

Example: all articles about Tesla from the last month, sorted with the newest first

apiurl <- 
  "https://newsapi.org/v2/everything?"

Search term:

term <- "q=tesla"

Restrict the time range:

from_to <- "&from=2021-06-07"

Sort:

sortby <- "&sortBy=publishedAt"

API key:

apikey <- paste0("&apiKey=", my_api_key)

All together:

GET_term <- paste0(apiurl, 
                   term,
                   from_to,
                   sortby,
                   apikey
                   )
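Pasting the pieces together by hand works for simple terms, but a search term containing spaces or special characters would need to be URL-encoded first. As an alternative sketch (not the approach used in this post), httr's modify_url() builds the same kind of URL from a named list and takes care of the encoding. The key below is a dummy value so the example runs without credentials.

```r
library(httr)

# Dummy key for illustration; in practice my_api_key would go here
demo_key <- "DEMO_KEY"

# Build the request URL from named query parameters
url_built <- modify_url(
  "https://newsapi.org/v2/everything",
  query = list(
    q      = "tesla",
    from   = "2021-06-07",
    sortBy = "publishedAt",
    apiKey = demo_key
  )
)
url_built
```

GET() accepts the same list directly through its query argument, so the URL does not have to be built as a separate step.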
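# Send the request
tesla_news <- 
  GET(url = GET_term)

# Cache the response on disk and read it back
write_rds(tesla_news, 
          file = "tesla_news.rds")
tesla_news <- read_rds("tesla_news.rds")

# HTTP status code of the response
tesla_news$status_code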

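See the list of HTTP status codes for details.

A status of 200 means success; codes from 400 upward signal an error (4xx for client-side problems such as an invalid API key, 5xx for server-side problems).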
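HTTP status codes are grouped into classes by their first digit. A minimal sketch of that grouping, using a hypothetical helper named status_class:

```r
# Hypothetical helper: classify an HTTP status code by its leading digit
status_class <- function(code) {
  switch(as.character(code %/% 100),
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",
    "5" = "server error",
    "unknown"
  )
}

status_class(200)
status_class(401)
status_class(503)
```

For a real response object, httr also offers http_error() and stop_for_status(), which test for codes of 400 and above without any hand-written logic.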

tesla_content <-
  fromJSON(rawToChar(tesla_news$content))
names(tesla_content)
## [1] "status"       "totalResults" "articles"
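The parsing step can be reproduced offline on a small, made-up JSON string with the same three top-level fields. The sketch assumes that each article carries a nested source object with id and name, as the id and name columns in the newsanchor results suggest; all values are invented.

```r
library(jsonlite)

# Made-up JSON with the same top-level fields as the response above
toy_json <- '{
  "status": "ok",
  "totalResults": 2,
  "articles": [
    {"source": {"id": null, "name": "Example News"},
     "author": "A. Writer", "title": "First headline"},
    {"source": {"id": "example", "name": "Example Daily"},
     "author": null, "title": "Second headline"}
  ]
}'

toy_content <- fromJSON(toy_json)

names(toy_content)
class(toy_content$articles)         # the array of articles becomes a data frame
class(toy_content$articles$source)  # the nested object becomes a data-frame column
```

Because source is itself a data frame nested inside articles, it does not print cleanly as a flat table, which is presumably why it is dropped before printing.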
tesla_content$articles %>% 
  select(-source)
Row 1
  author:       Wall Street Breakfast
  title:        Wall Street Breakfast: What Were They Thinking?
  description:  Listen on the go! A daily podcast of Wall Street Breakfast will be available by 8:00 a.m. on Seeking Alpha, iTunes, Stitcher and Spotify.
  url:          https://seekingalpha.com/article/4438128-wall-street-breakfast-what-thinking
  urlToImage:   https://static.seekingalpha.com/uploads/2021/7/2/si_water_2021_630x220_shaq_v1.jpg
  publishedAt:  2021-07-07T11:16:43Z
  content:      Listen on the go! A daily podcast of Wall Street Breakfast will be available by 8:00 a.m. on Seeking Alpha, iTunes, Stitcher and Spotify. What were they thinking? More details are coming together … [+8662 chars]

Row 2
  author:       Frank Holland
  title:        Visa says crypto linked card usage tops $1 billion in first half of 2021
  description:  “We are doing a lot to create an ecosystem that makes cryptocurrency more usable and more like any other currency,” Visa CFO Vasant Prabhu told CNBC.
  url:          https://www.cnbc.com/2021/07/07/visa-says-crypto-linked-card-usage-tops-1-billion-in-first-half-of-2021.html
  urlToImage:   https://image.cnbcfm.com/api/v1/image/105845579-1554971823404coinbase_card_-_21.jpg?v=1560405899
  publishedAt:  2021-07-07T11:05:01Z
  content:      Visa said Wednesday that more than $1 billion worth of cryptocurrency was spent by consumers globally on goods and services through their crypto-linked cards in the first six months of the year. By … [+2937 chars]

Row 3
  author:       Nicholas Pongratz
  title:        Synthetic Stock Trading Popularity Growing in DeFi Sector
  description:  The prevalence of trading synthetic versions of Tesla, Apple, Amazon, and other big stocks and ETFs has grown on decentralized finance (DeFi) projects like…
  url:          https://finance.yahoo.com/news/synthetic-stock-trading-popularity-growing-105549722.html
  urlToImage:   https://s.yimg.com/uu/api/res/1.2/LlRPJZUkBU9EYTCy8ZoP4Q--~B/aD03MjA7dz0xMjAwO2FwcGlkPXl0YWNoeW9u/https://media.zenfs.com/en/beincrypto_articles_718/e9207356d0ae5d7b932a71249438d98b
  publishedAt:  2021-07-07T10:55:49Z
  content:      The prevalence of trading synthetic versions of Tesla, Apple, Amazon, and other big stocks and ETFs has grown on decentralized finance (DeFi) projects like Mirror Protocol and Synthetix over the past… [+1942 chars]

Row 4
  author:       Oğuz Çakır
  title:        Elon Musk, Cybertruck’ın yeni özelliğini açıkladı
  description:  Tesla ve SpaceX CEO’su Elon Musk, bir süre önce tanıtılan Cybertruck için arka aks yönlendirme sistemi ekledi. İşte diğer yenilikler… Bu içerik ilk olarak Elon Musk, Cybertruck’ın yeni özelliğini açıkladı adresinde yayınlandı Teknoloji Haberleri - ShiftDele…
  url:          https://shiftdelete.net/elon-musk-cybertruckin-yeni-ozelligini-acikladi
  urlToImage:   https://shiftdelete.net/wp-content/uploads/2021/07/elon-musk-cybertruckin-yeni-ozelligini-acikladi.jpg
  publishedAt:  2021-07-07T10:54:56Z
  content:      Bir süre önce Elon Musk tarafndan tantlan Cybertruck, kullanclarn ilgisini çekmeyi baard. Rekor sayda ön sipari alan araç, özellikle ABD’de youn ilgiyle karlat. Ancak 2019’daki etkinlikten sonra sess… [+1405 chars]

Row 5
  author:       Deutsche Welle (www.dw.com)
  title:        Эксперты сообщают о росте заражений коронавирусом в связи с Евро-2020
  description:  В Европе выявлено более 2500 случаев ковидной инфекции в связи с матчами ЧЕ по футболу. При подготовке турнира не было учтено возможное распространение новых заразных вариантов коронавируса, отмечают эксперты.
  url:          https://www.dw.com/ru/jeksperty-soobshhajut-o-roste-zarazhenij-kovidom-v-svjazi-s-che-po-futbolu/a-58187964
  urlToImage:   https://static.dw.com/image/58183470_6.jpg
  publishedAt:  2021-07-07T10:54:00Z
  content:      .   (ECDC) -2020. 2500 , , . , , ECDC, - . SARS-CoV-2, , 1991. . , - , -2020 . -. ECDC: -2020 436 -   , , -. , , , . , . , , -, , . , ECDC. "“, ,” "  , , . :