Where are some corporate AI hubs?
Some Context: Digital Intelligence Index
The Digital Intelligence Index by the Fletcher School at Tufts University provides useful context for this little research project (data source: https://digitalintelligence.fletcher.tufts.edu/trajectory). They gathered a wide array of secondary data sources per country and aggregated them into clusters, components, drivers, and final scores rescaled from 0 to 100, estimating how digitally mature a nation's economy is overall.
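Their exact methodology is not reproduced here, but the 0-to-100 rescaling is presumably a per-indicator min-max normalization; a minimal sketch of that assumption:
# hypothetical min-max rescaling of a raw indicator onto a 0-100 score
rescale_100 <- function(x) {
  100 * (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
rescale_100(c(2.1, 3.7, 5.0)) # lowest value maps to 0, highest to 100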
In particular, their scoring for the scorecard component Innovation and Change is interesting, because AI startups feed into it.
Innovation Momentum might indicate the near future, whereas the overall innovation indicator signals the current state.
This is more a set of negative indicators, i.e. a low score means there is probably not a lot of AI startup activity to find in that location. Arguably, the high performers are more likely to have some AI business activity and a mature business ecosystem for AI companies, so this is where we will try to collect data.
pacman::p_load(tidyverse, tmap, openxlsx, sf, tidytable, data.table, tidytext, arrow, rvest, reclin, maps, mapview)
dindex <- openxlsx::read.xlsx("https://sites.tufts.edu/digitalplanet/digitalintelligence/DIIData2020", sheet = "Digital Evolution Main", startRow = 2)
data("World")
World2 <- left_join(World, dindex %>% select(Entity, `Overall Digital Evolution`=Innovation.and.Change.Zone, `Digital Evolution Rank`=Digital.Evolution.Score.Rank, `Digital Evolution Momentum (Rank)`=Innovation.and.Change.Momentum.Rank, `Innovativeness Rank` =Innovation.and.Change.Score.Rank, Innovation.and.Change.Score, `Innovation Momentum Indicator` = Innovation.and.Change.Momentum, iso_a3=ISO3C))
## Joining, by = "iso_a3"
World2 <- st_sf(World2)
tm_shape(World2) +
  tm_polygons(col = "Digital Evolution Momentum (Rank)",
              legend.hist = TRUE, palette = "seq", style = "order") +
  tm_layout(legend.outside = F, aes.palette = list(seq = "-RdBu"))
## Warning: Histogram not supported for styles "cont" or "order"
tm_shape(World2) +
  tm_polygons(col = "Innovativeness Rank",
              # style = "kmeans",
              legend.hist = TRUE, palette = "RdGy") +
  tm_layout(legend.outside = F)
Overall Digital Maturity and Potential
Example places where one might expect to find mature and thriving AI hubs:
- USA
- Germany, Poland, Ireland
- China
AI company locations
I scraped a few datasets about AI startup and AI company locations. Let's first find out where the AI companies cluster: because talent concentrates there, this narrows down our search for AI startup hubs.
ai_companies <- fread("/run/media/knut/HD/MLearningAlgoTests/aicompanies.csv")
ai_companies %>% select(company, website, location) %>% sample_n(10)
## company website
## 1: Cyber Surgery https://cyber-surgery.com
## 2: Oneclick.Ai http://www.oneclick.ai
## 3: Journi http://journiapp.com
## 4: Talus Bioscience https://www.talus.biohttp://talus.bio
## 5: Overwatch Imaging http://www.overwatchimaging.com
## 6: Augury (company) https://www.augury.com/https://augury.com
## 7: Synap https://synap.ac/
## 8: TripGuru http://tripguru.io
## 9: PrepFlash http://www.prepflash.com
## 10: Sticky Technology http://stickytechnology.net/spiralvortex
## location
## 1: San Sebastián
## 2: Bellevue, Washington
## 3: ViennaAustria
## 4: Seattle
## 5: Hood River, Oregon
## 6: New York CityNew York
## 7: Leeds
## 8: London
## 9: Austin, TexasCollege Station, Texas
## 10: Mexico City
The location column needs some work.
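The main trick below is a regex that inserts a space before every uppercase letter, which splits fused values like the "ViennaAustria" from the sample above into separate tokens. A quick check of that idea in isolation:
# the de-fusing regex applied to a raw value from the sample above
str_squish(str_replace_all("ViennaAustria", '([[:upper:]])', ' \\1'))
## [1] "Vienna Austria"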
source("/home/knut/Documents/clean.R")
ai_companies <- ai_companies %>% mutate(location=clean(str_squish(str_replace_all(location, '([[:upper:]])', ' \\1')))) %>% separate_rows(location, sep = "([[:upper:]])") %>% unnest_ngrams("ngram", "location", n_min = 1, drop = F)
cities <- fread("https://gist.githubusercontent.com/curran/13d30e855d48cdd6f22acdf0afe27286/raw/0635f14817ec634833bb904a47594cc2f5f9dbf8/worldcities_clean.csv") %>% mutate(location=tolower(city)) %>% select(location, country, population) %>% arrange(location, desc(population))
cities <- cities[!duplicated(location)] %>% select(-population)
ai_companies <- ai_companies %>% inner_join(cities) %>% select(-ngram) %>% distinct()
## Joining, by = "location"
print(c(nrow(ai_companies), nrow(fread("/run/media/knut/HD/MLearningAlgoTests/aicompanies.csv"))))
## [1] 3508 12487
After cleaning, 3.5k of the original 12k records remain. Let's take a look at which countries have the most AI companies according to this source:
ggcharts::bar_chart(ai_companies, country, top_n = 30)
There is more in here, but I'm forcing myself to do a quick and dirty job rather than exhaustive information extraction. It's just a blog post.
# knowledge-graph entity labels: split off the parenthesised disambiguation part
kg_names <- read_parquet("/run/media/knut/HD/MLearningAlgoTests/data/polar/w5mentities.parquet") %>% separate.(label, c("selector", "disambiguation"), sep = "[(]") %>% mutate.(disambiguation=str_remove(disambiguation, "[)]"))
# collect alternative spellings for each city by matching city names against entity labels
kg_names_locations <- kg_names %>% right_join.(cities %>% mutate(nchar=nchar(location)) %>% filter(nchar>3) %>% mutate.(selector=location)) %>% left_join.(kg_names %>% select.(wikientity, spelling=selector) %>% mutate.(spelling=str_squish(spelling))) %>% distinct.() %>% mutate(nchar2=nchar(spelling)) %>% filter(nchar2>3)
# cities without an alternative spelling keep their original name
kg_names_locations[is.na(kg_names_locations$spelling),"spelling"] <- kg_names_locations[is.na(kg_names_locations$spelling),"selector"]
# add Chinese-script spellings for Chinese cities from Wikipedia
chinese_cities_spellings <- read_html("https://en.wikipedia.org/wiki/List_of_cities_in_China") %>% html_node("table.selected_now") %>% html_table() %>% select(location=City, spelling=Chinese) %>% mutate(location=tolower(location), country="China")
kg_names_locations <- kg_names_locations %>% select.(location, country, spelling) %>% bind_rows.(chinese_cities_spellings)
kg_names_locations <- kg_names_locations %>% distinct.()
websites_html <- arrow::read_parquet("/run/media/knut/HD/MLearningAlgoTests/aistartup2") %>% na.omit()
websites_html2 <- arrow::read_parquet("/run/media/knut/HD/MLearningAlgoTests/aicompanies_websites2") %>% na.omit()
websites_html <- bind_rows.(websites_html, websites_html2)
# break the scraped page text into n-grams and match them against the known city spellings
websites_ngrams <- websites_html %>% unnest_ngrams("ngram", body, n_min = 1)
locations_quick <- websites_ngrams %>% mutate(ngram=tolower(ngram)) %>% inner_join.(kg_names_locations %>% rename(ngram=spelling)) %>% rename(spelling=ngram) %>% distinct.()
# urltools::suffix_extract() returns a data frame; its suffix column holds the top-level domain
suffix <- locations_quick %>% mutate(suffix=urltools::domain(source) %>% urltools::suffix_extract())
suffix <- suffix$suffix$suffix
locations_clean <- locations_quick %>% mutate(suffix=suffix)
# country-code TLD table; drop TLDs that are used generically rather than geographically
country_codes <- read_html("https://www.sitepoint.com/complete-list-country-code-top-level-domains/") %>% html_table("td", header = F, trim = T) %>% as.data.frame() %>% filter(X1%in%c(".ai", ".io", ".com")==F) %>% rename(suffix=X1, Country=X2) %>% mutate(suffix=str_remove(suffix, ".")) %>% mutate(Country=str_replace(Country, "People's Republic of China", "China")) %>% mutate(Country=str_replace(Country, "United States of America", "United States"))
## Warning: The `fill` argument of `html_table()` is deprecated as of rvest 1.0.0.
## An improved algorithm fills by default so it is no longer needed.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
locations_clean <- locations_clean %>% left_join.(country_codes)
# keep a match only if the detected city's country agrees with the domain's country; treat .io as US
locations_filter <- locations_clean %>% filter(suffix%in%c("co", "ai", "com", "net", "org")==F) %>% mutate(country=ifelse(suffix=="io", "United States", country)) %>% filter(country==Country)
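As an aside, urltools::suffix_extract() returns a data frame with host, subdomain, domain and suffix columns, which is why the code above digs the top-level domain out via suffix$suffix$suffix. A quick check on a hypothetical URL:
# suffix_extract() expects a bare domain, hence the domain() call first
urltools::suffix_extract(urltools::domain("https://www.example.co.uk"))
# expect one row with subdomain "www", domain "example" and suffix "co.uk"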
I am mostly interested in grabbing the address data from the small address boxes usually found at the bottom of a website. As a simple heuristic, I parse all the data with a libpostal docker container, which can classify location strings as cities, street names, etc. If the parsed city is from the same country as the website's domain, I add it to the data. I checked a sample by hand: more than 90% were correct, though there is some data loss, and some companies operate from multiple locations.
# docker run -d -p 8070:8080 clicksend/libpostal-rest
Here I use a knowledge graph and a few other tricks to squeeze more records out of the data. These measures yield ca. 3800 records, roughly 300 more than before, with many new records from India, as shown below (only the new records).
# send an address string to the libpostal REST parser and return its labelled components
parse_address <- function(address, source) {
  prep_query <- function(x) {
    paste0('{"query": "', x, '"}')
  }
  query <- prep_query(clean(address))
  query %>%
    purrr::map_dfr(~
      httr::POST(url = "localhost:8070/parser", body = .x) %>%
        httr::content("text", encoding = "UTF-8") %>%
        jsonlite::fromJSON() %>%
        mutate(source = source)
    )
}
# fall back to an empty table when a request fails
safeparse <- possibly(parse_address, otherwise = data.table(label=c(), value=c(), source=c()))
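A quick smoke test of the wrapper on a made-up address (the docker container from the comment above must be running; libpostal labels components as "house_number", "road", "city" and so on):
safeparse("10 Rue de Rivoli Paris", "https://example.com")
# expect label/value rows such as road = "rue de rivoli" and city = "paris";
# on any error, the empty fallback table is returned instead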
#adresses_extracted <-purrrgress::pro_map2_dfr(.x = websites_html$body, .y = websites_html$source, .f = safeparse)
adresses_extracted <- arrow::read_parquet("/run/media/knut/HD/MLearningAlgoTests/aicompanies_websites_adresses")
roads <- adresses_extracted %>% filter.(label=="road") %>% distinct.(source, value, label)
city <- adresses_extracted %>% filter.(label=="city") %>% distinct.(source, value, label)
city_confirmed <- city %>% mutate(value=tolower(value)) %>% inner_join.(kg_names_locations %>% rename(value=spelling)) %>% rename(spelling=value) %>% distinct.()
body <- websites_html%>% inner_join.(city_confirmed)
suffix <- body %>% mutate(suffix=urltools::domain(source) %>% urltools::suffix_extract())
suffix <- suffix$suffix$suffix
body <- body %>% mutate(suffix=suffix)
ai_companies_no_proc <- fread("/run/media/knut/HD/MLearningAlgoTests/aicompanies.csv")
body_city <- body %>% left_join.(country_codes) %>% filter(country==Country) %>% distinct.(-body) %>% inner_join.(ai_companies_no_proc %>% select(source=website, company))
ai_companies_more <- bind_rows.(locations_filter%>% inner_join.(ai_companies_no_proc %>% select(source=website, company)), body_city)
# drop manually checked false positives (ambiguous city words like "orange" and "mobile"), but keep Neurobotics' genuine Moscow record
ai_companies_more <- ai_companies_more %>% distinct(company, location, country, source) %>% as.data.frame() %>% filter(company%in%c("SIS Software GmbH", "JOBFIE", "Neurobotics", "Rai")==F, location%in%c("orange", "mobile")==F) %>% bind_rows.(ai_companies_more%>% filter(company%in%c("Neurobotics")==T) %>% filter(location=="moscow"))
ai_companies <- ai_companies %>% bind_rows.(ai_companies_more)%>% distinct(company, location, country) %>% as.data.frame()
ggcharts::bar_chart(ai_companies_more, country, top_n = 30)
#roads_cities <- roads %>% inner_join.(adresses_extracted)
Further datasets
I also collected two other datasets. Now we have to merge and deduplicate them.
ai_startups_europe <- openxlsx::read.xlsx("/run/media/knut/HD/MLearningAlgoTests/ai_startups.xlsx", sheet = 1)
ai_startups_world <- openxlsx::read.xlsx("/run/media/knut/HD/MLearningAlgoTests/ai_startups.xlsx", sheet = 2)
ai_startups_europe %>% sample_n(10)
## Name Logo Website Country
## 1 Bodyguard.ai NA https://www.bodyguard.ai/fr France
## 2 Mobius Labs NA https://www.mobiuslabs.com/ Germany
## 3 Truphysics NA http://truphysics.com/ Germany
## 4 Canotic NA http://canotic.com/ Germany
## 5 Deepsense NA http://www.thedeepsense.co/ France
## 6 Keen Eye NA https://www.keeneyetechnologies.com/en/ France
## 7 Hallidai AI Gaming NA http://hellofridai.com/ Germany
## 8 Loop Robots NA https://looprobots.com/ Netherlands
## 9 Toposens NA http://toposens.com/ Germany
## 10 Pricefx NA http://www.pricefx.com/ Germany
## City Founding.Year Enterprise.Intell.
## 1 Nice 2017 Computer Linguistics
## 2 Berlin 2018 Computer Vision
## 3 Stuttgart 2014 <NA>
## 4 Berlin 2018 <NA>
## 5 Paris 2018 Computer Vision
## 6 Paris 2015 <NA>
## 7 Berlin 2018 <NA>
## 8 Delft 2020 <NA>
## 9 Munich 2015 <NA>
## 10 Pfaffenhofen An Der Glonn 2011 Discovery
## Enterprise.Func. Industry Technology.Type
## 1 <NA> Other Service Activities <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> Manufacturing <NA>
## 4 <NA> <NA> Applications
## 5 IT & Security <NA> <NA>
## 6 <NA> Human Health & Social Work Activities <NA>
## 7 <NA> Arts, Entertainment & Recreation <NA>
## 8 <NA> Human Health & Social Work Activities <NA>
## 9 <NA> <NA> Infrastructure
## 10 Sales <NA> <NA>
ai_startups_world %>% sample_n(10)
## Country State City Name
## 1 United States Massachusetts Boston Adhark
## 2 Japan <NA> <NA> Kinpen
## 3 United States California Sunnyvale ThroughPut Inc.
## 4 United States New York New York Pymetrics
## 5 Israel <NA> Tel Aviv PhraseTech
## 6 Japan <NA> Fukuoka Next-System
## 7 Israel <NA> <NA> K.Y.C Int.
## 8 Canada Newfoundland and Labrador St. John's Afinin Labs
## 9 United Kingdom England London Behavox
## 10 China <NA> Shenzhen Prafly
## Category.-.Final
## 1 Computer Vision
## 2 Travel
## 3 Business Intelligence
## 4 Human Resources
## 5 Sales/Marketing
## 6 Software Development
## 7 Defense/Security
## 8 Core AI
## 9 Business Intelligence
## 10 Communication
## Description
## 1 Adhark is an image performance software company.
## 2 <NA>
## 3 Data Consultant Automation that helps supply chains run leaner.
## 4 Pymetrics is the next generation job marketplace.
## 5 <NA>
## 6 cutting-edge technology and utilizing the content, Provide the best service to society, the future society is rich in comfort and convenience for everyone and creating a corporate mission .
## 7 <NA>
## 8 Afinin Labs has a proprietary machine learning system that identifies and adapts to trends in the financial markets. The system generates buy/sell signals for targeted equities when it determines there is an opportunity for profit. Pioneered by experts in the field of machine learning and applied financial analysis, Afinin provides a trading algorithm unlike anything available in the marketplace. Backed by years of research excellence, the algorithm delivers a trading signal with a high level of prediction accuracy to the execution systems of hedge funds. Hedge funds realize an exceptional rate of return for investors by leveraging the algorithm for the management of a portion of their funds under management.
## 9 Behavox is an enterprise compliance software company which provides holistic employee surveillance solutions. The company's solutions allow Senior Management, Risk & Compliance Officers to detect cases of market abuse, insider threat, collusion and reckless behavior in real time.
## 10 <NA>
## Website
## 1 http://www.adhark.com/
## 2 https://info.kinpen.me/
## 3 https://throughput.world
## 4 https://www.pymetrics.com
## 5 https://www.phrasetech.com
## 6 http://www.next-system.com/eng/index.html
## 7 https://www.kycint.com/home
## 8 http://www.afinin.com
## 9 http://www.behavox.com
## 10 http://www.prafly.com/
I use the reclin package for record-linkage deduplication based on string similarity (Jaccard). For the first two datasets, there are only a few duplicates.
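For intuition, Jaccard similarity measures the overlap of two strings' sets of character n-grams rather than counting edits. A minimal sketch of the idea (an illustration, not reclin's internal implementation):
# illustration only: Jaccard similarity on sets of character bigrams
jaccard_sim <- function(a, b, q = 2) {
  grams <- function(s) {
    s <- tolower(s)
    unique(substring(s, 1:(nchar(s) - q + 1), q:nchar(s)))
  }
  length(intersect(grams(a), grams(b))) / length(union(grams(a), grams(b)))
}
jaccard_sim("EasyMile", "Easymile") # identical after lowercasing: 1
jaccard_sim("Scibids Technology", "Shift Technology") # shared bigrams, but well below 1
reclin's jaccard(0.9) comparator applies a similarity of this kind with a 0.9 cut-off, which is why near-identical spellings such as "EasyMile"/"Easymile" pair up in the output below.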
world_s <- ai_startups_world %>% select(company=Name, country=Country, location=City)%>% mutate(location=tolower(location))
europe <- ai_startups_europe %>% select(company=Name, country=Country, location=City)%>% mutate(location=tolower(location))
p <- pair_blocking(world_s, europe, large = FALSE)
p <- compare_pairs(p, by = c("company", "location", "country"))
p <- compare_pairs(p, by = c("company", "location", "country"),
                   default_comparator = jaccard(0.9), overwrite = TRUE)
p <- score_simsum(p, var = "simsum")
m <- problink_em(p)
## Warning: `group_by_()` was deprecated in dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
p <- score_problink(p, model = m, var = "weight")
p <- select_threshold(p, "weight", var = "threshold", threshold = 18.94455)
p <- add_from_x(p, id_x = "id")
linked_data_set <- link(p) %>% na.omit()
linked_data_set
## company.x country.x location.x company.y
## 1 Vyer Technologies Sweden stockholm LINKAI Technologies AB
## 2 SoftRobot Sweden uppsala SoftRobot Sweden AB
## 3 Aiir Innovations Netherlands amsterdam Aiir Innovations
## 4 Storyzy France paris Storyzy
## 5 Lili.ai France paris Lili.ai
## 6 LightOn France paris LightOn
## 7 Scibids Technology France paris Kili Technology
## 8 Shift Technology France paris Kili Technology
## 9 Calldesk France paris calldesk
## 10 Walnut Algorithms France paris Walnut Algorithms
## 11 Heuritech France paris Heuritech
## 12 Scibids Technology France paris Scibids Technology
## 13 Shift Technology France paris Scibids Technology
## 14 Doctrine France paris Doctrine
## 15 Beyable France paris Beyable
## 16 Scibids Technology France paris Shift Technology
## 17 Shift Technology France paris Shift Technology
## 18 Wiidii France bordeaux Wiidii
## 19 EasyMile France toulouse Easymile
## 20 Deepomatic France paris Deepomatic
## 21 Craft AI France paris Craft.AI
## 22 Sensewaves France paris Sensewaves
## 23 Clustaar France paris Clustaar
## 24 DreamQuark France paris DreamQuark
## 25 Ubiant France lyon Ubiant
## 26 Kayrros France paris Kayrros
## 27 SESAMm France metz SESAMm
## 28 Qynapse France paris Qynapse
## 29 Karos France paris Karos
## 30 Invenis France paris Invenis
## 31 Happyr Sweden stockholm Happyr AB
## 32 Unibap Sweden uppsala Unibap AB
## 33 Unibap Sweden uppsala Unibap AB
## 34 Greater Than Sweden stockholm Greater Than
## 35 Gleechi Sweden stockholm Gleechi AB
## 36 Univrses Sweden stockholm Univrses AB
## 37 Lexplore Sweden stockholm Lexplore
## 38 Imagimob Sweden stockholm Imagimob AB
## 39 Aaron Germany berlin Aaron
## 40 Ada Germany berlin Ada
## 41 parlamind Germany berlin Parlamind
## 42 Risk Ident Germany hamburg Risk Ident
## 43 Rasa Germany berlin Rasa
## 44 Bunch Germany berlin Bunch
## 45 Fraugster Germany berlin Fraugster
## 46 Inspirient Germany berlin Inspirient
## 47 Mapegy Germany berlin Mapegy
## 48 Lateral Germany berlin Lateral
## 49 Lateral Germany berlin Realrate
## 50 Cargonexx Germany hamburg Cargonexx
## 51 TwentyBN Germany berlin Twentybn
## 52 German Autolabs Germany berlin German Autolabs
## 53 Xbird Germany berlin Xbird
## 54 picsure Germany munich Picsure
## 55 Picsure Germany munich Picsure
## 56 micropsi industries Germany berlin Micropsi Industries
## 57 Micropsi Industries Germany berlin Micropsi Industries
## 58 Contiamo Germany berlin Contiamo
## 59 Explosion AI Germany berlin Explosion
## 60 SearchInk Germany berlin SiaSearch
## country.y location.y
## 1 Sweden stockholm
## 2 Sweden uppsala
## 3 Netherlands amsterdam
## 4 France paris
## 5 France paris
## 6 France paris
## 7 France paris
## 8 France paris
## 9 France paris
## 10 France paris
## 11 France paris
## 12 France paris
## 13 France paris
## 14 France paris
## 15 France paris
## 16 France paris
## 17 France paris
## 18 France bordeaux
## 19 France toulouse
## 20 France paris
## 21 France paris
## 22 France paris
## 23 France paris
## 24 France paris
## 25 France lyon
## 26 France paris
## 27 France metz
## 28 France paris
## 29 France paris
## 30 France paris
## 31 Sweden stockholm
## 32 Sweden uppsala
## 33 Sweden uppsala
## 34 Sweden stockholm
## 35 Sweden stockholm
## 36 Sweden stockholm
## 37 Sweden stockholm
## 38 Sweden stockholm
## 39 Germany berlin
## 40 Germany berlin
## 41 Germany berlin
## 42 Germany hamburg
## 43 Germany berlin
## 44 Germany berlin
## 45 Germany berlin
## 46 Germany berlin
## 47 Germany berlin
## 48 Germany berlin
## 49 Germany berlin
## 50 Germany hamburg
## 51 Germany berlin
## 52 Germany berlin
## 53 Germany berlin
## 54 Germany munich
## 55 Germany munich
## 56 Germany berlin
## 57 Germany berlin
## 58 Germany berlin
## 59 Germany berlin
## 60 Germany berlin
There were some duplicates in the scraped dataset. Getting rid of them here.
both <- world_s %>% bind_rows.(europe %>% filter(company%in%linked_data_set$company.y==F)) %>% distinct.() %>% na.omit()
p <- pair_blocking(ai_companies, ai_companies, large = FALSE)
p <- compare_pairs(p, by = c("company", "location", "country"))
p <- compare_pairs(p, by = c("company", "location", "country"),
                   default_comparator = jaccard(0.9), overwrite = TRUE)
p <- score_simsum(p, var = "simsum")
m <- problink_em(p)
p <- score_problink(p, model = m, var = "weight")
p <- select_threshold(p, "weight", var = "threshold", threshold = 10.95548)
p <- add_from_x(p, id_x = "id")
p <- p %>% filter(weight<16.40807)
linked_data_set <- link(p) %>% na.omit()
linked_data_set
## [1] company.x location.x country.x company.y location.y country.y
## <0 rows> (or 0-length row.names)
Nothing shows up here: the duplicates within this dataset are mostly literal matches, which the earlier distinct() calls already removed.
ai_companies <- ai_companies%>% filter(company%in%linked_data_set$company.y==F)
p <- pair_blocking(ai_companies, both, large = FALSE)
p <- compare_pairs(p, by = c("company", "location", "country"))
p <- compare_pairs(p, by = c("company", "location", "country"),
                   default_comparator = jaccard(0.9), overwrite = TRUE)
p <- score_simsum(p, var = "simsum")
m <- problink_em(p)
p <- score_problink(p, model = m, var = "weight")
p <- select_threshold(p, "weight", var = "threshold", threshold = 9.959988)
p <- add_from_x(p, id_x = "id")
p <- p %>% filter(weight<16.40807)
linked_data_set <- link(p) %>% na.omit()
linked_data_set
## company.x location.x country.x
## 1 Alitheia Technologies Inc. toronto Canada
## 2 Augmented Knowledge incheon Korea, South
## 3 Cochlear.ai seoul Korea, South
## 4 Dream Youngs seoul Korea, South
## 5 Driva - AI Dash Cam Driving Assistant shanghai China
## 6 Elektronik Virtual Asisten (EVA.id) bandung Indonesia
## 7 GoodAI prague Czechia
## 8 Granata Decision Systems toronto Canada
## 9 Internuncio Technologies Inc. vancouver Canada
## 10 MeasureChina seoul Korea, South
## 11 Moran Cognitive Technology beijing China
## 12 Omnious seoul Korea, South
## 13 Qualaris Healthcare Solutions pittsburgh United States
## 14 Riiid seoul Korea, South
## 15 SC5 (part of Nordcloud Group) helsinki Finland
## 16 Visual Camp seoul Korea, South
## company.y country.y location.y
## 1 Alitheia Technologies Inc. Canada toronto
## 2 Augmented Knowledge South Korea incheon
## 3 Cochlear.ai South Korea seoul
## 4 Dream Youngs South Korea seoul
## 5 Driva - AI Dash Cam Driving Assistant China shanghai
## 6 Elektronik Virtual Asisten (EVA.id) Indonesia bandung
## 7 GoodAI Czech Republic prague
## 8 Granata Decision Systems Canada toronto
## 9 Internuncio Technologies Inc. Canada vancouver
## 10 MeasureChina South Korea seoul
## 11 Moran Cognitive Technology China beijing
## 12 Omnious South Korea seoul
## 13 Qualaris Healthcare Solutions United States pittsburgh
## 14 Riiid South Korea seoul
## 15 SC5 (part of Nordcloud Group) Finland helsinki
## 16 Visual Camp South Korea seoul
Unsurprisingly, this dataset will have some bias, but it should work fine for showing AI hubs. Below is how many companies come from the top 30 countries. The dataset contains more than 7k AI companies.
ai_companies_all <- ai_companies %>% bind_rows.(both %>% filter(company%in%linked_data_set$company.y==F)) %>% distinct.()
ggcharts::bar_chart(ai_companies_all, country, top_n = 30)
AI hubs around the world
ai_companies_per_city <- ai_companies_all %>% group_by(location, country) %>% count(sort = T)%>% rename(`Companies in City`=n) %>% mutate(country=str_replace(country, "United States", "USA"), large=ifelse(`Companies in City`>9, location, NA))
ai_companies_per_country <- ai_companies_all %>% group_by(country) %>% count(sort = T) %>% rename(`Companies in Country`=n)
World3 <- left_join(World, ai_companies_per_country %>% rename(name=country)) %>% ungroup()
## Joining, by = "name"
World3 <- st_sf(World3)
ai_cities <- world.cities %>% mutate(location=tolower(name)) %>% inner_join.(ai_companies_per_city, by=c("location"="location", "country.etc"="country")) %>% filter(location!="mobile")
cities <- ai_cities %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>%
  st_cast("POINT")
tmap_mode("plot")
## tmap mode set to plotting
ai_hubs <- tm_shape(World3) +
  tm_polygons(col = "Companies in Country", style = "fisher", palette = "-RdGy", n = 10) +
  tm_layout(legend.outside = F) +
  tm_shape(cities) +
  tm_bubbles(size = "Companies in City", col = "Companies in City", palette = "Blues") +
  tmap::tm_style("grey") +
  tm_text("large", size = "Companies in City")
ai_hubs
## Note that tm_style("grey") resets all options set with tm_layout, tm_view, tm_format, or tm_legend. It is therefore recommended to place the tm_style element prior to the other tm_layout/tm_view/tm_format/tm_legend elements.
library(sf)
north_america <- st_bbox(cities %>% filter(country.etc %in% c("USA", "Canada", "Mexico")))
tm_shape(World3, bbox = north_america) +
  tm_polygons(col = "Companies in Country", style = "fisher", palette = "-RdGy", n = 10) +
  tm_layout(legend.outside = F) +
  tm_shape(cities) +
  tm_bubbles(size = "Companies in City", col = "Companies in City", palette = "Blues") +
  tmap::tm_style("grey") +
  tm_text("large", size = 0.5) +
  tm_legend(show = FALSE)
## Note that tm_style("grey") resets all options set with tm_layout, tm_view, tm_format, or tm_legend. It is therefore recommended to place the tm_style element prior to the other tm_layout/tm_view/tm_format/tm_legend elements.
Hubs, hubs almost everywhere
europe <- st_bbox(cities %>% filter(country.etc %in% c("Spain", "Finland", "Turkey", "Ireland")))
tm_shape(World3, bbox = europe) +
  tm_polygons(col = "Companies in Country", style = "fisher", palette = "-RdGy") +
  tm_layout(legend.show = F) +
  tm_shape(cities) +
  tm_bubbles(size = "Companies in City", col = "Companies in City", palette = "Blues") +
  tmap::tm_style("grey") +
  tm_text("large", size = 0.5) +
  tm_legend(show = FALSE)
## Note that tm_style("grey") resets all options set with tm_layout, tm_view, tm_format, or tm_legend. It is therefore recommended to place the tm_style element prior to the other tm_layout/tm_view/tm_format/tm_legend elements.
east_asia <- st_bbox(cities %>% filter(country.etc %in% c("China", "Japan")))
tm_shape(World3, bbox = east_asia) +
  tm_polygons(col = "Companies in Country", style = "fisher", palette = "-RdGy") +
  tm_layout(legend.show = F) +
  tm_shape(cities) +
  tm_bubbles(size = "Companies in City", col = "Companies in City", palette = "Blues") +
  tmap::tm_style("grey") +
  tm_text("large", size = 0.8) +
  tm_legend(show = FALSE)
## Note that tm_style("grey") resets all options set with tm_layout, tm_view, tm_format, or tm_legend. It is therefore recommended to place the tm_style element prior to the other tm_layout/tm_view/tm_format/tm_legend elements.
india <- st_bbox(cities %>% filter(country.etc %in% c("India")))
tm_shape(World3, bbox = india) +
  tm_polygons(col = "Companies in Country", style = "fisher", palette = "-RdGy") +
  tm_layout(legend.show = F) +
  tm_shape(cities) +
  tm_bubbles(size = "Companies in City", col = "Companies in City", palette = "Blues") +
  tmap::tm_style("grey") +
  tm_text("large", size = 0.8) +
  tm_legend(show = FALSE)
## Note that tm_style("grey") resets all options set with tm_layout, tm_view, tm_format, or tm_legend. It is therefore recommended to place the tm_style element prior to the other tm_layout/tm_view/tm_format/tm_legend elements.
And the table itself.
ai_companies_all
## # A tidytable: 7,357 × 3
## company location country
## <chr> <chr> <chr>
## 1 Hong Jing Drive hangzhou China
## 2 Gritworld shanghai China
## 3 Anruan Keji shenzhen China
## 4 Jiangsu Fant Technology hong kong Hong Kong
## 5 Hebin Intelligence hefei China
## 6 Lvzhou Technology hefei China
## 7 Fantai AI shanghai China
## 8 ccvui.com hangzhou China
## 9 Intengine Technology beijing China
## 10 CAIWA Service tokyo Japan
## # … with 7,347 more rows
Could be more, could be less ;)