Blog

Interpretable Word Embeddings from knowledge graph embeddings

Tuesday, Nov 22, 2022 by Knut Jägersberg

A while ago, I created interpretable word embeddings using polar opposites (via the POLAR jupyter notebook from https://github.com/Sandipan99/POLAR) applied to wikidata5m knowledge graph embeddings (from https://graphvite.io/docs/latest/pretrained_model.html). It resulted in a gigantic file of pretrained embeddings which sorts concepts along 700 semantic differentials, e.g. good/bad. However, the wikidata5m knowledge graph is huge: roughly 5 million concepts and 13 million spellings. A joined parquet file would probably take 100 GB of disk space.
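As a rough illustration of the POLAR idea (a minimal sketch, not the notebook's actual code), the transformation can be approximated by projecting each embedding onto direction vectors spanned by antonym pairs; the vocabulary and embeddings below are hypothetical placeholders:

```python
import numpy as np

def polar_projection(embeddings, vocab, antonym_pairs):
    """Score embeddings along semantic differentials defined by antonym pairs.

    Simplified: projects onto each normalized difference vector instead of
    POLAR's full change of basis.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    # Each row: direction from the negative to the positive pole.
    dirs = np.stack([embeddings[idx[pos]] - embeddings[idx[neg]]
                     for pos, neg in antonym_pairs])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return embeddings @ dirs.T  # shape: (n_concepts, n_differentials)

# Toy stand-in; in the post this would be the wikidata5m embeddings.
vocab = ["good", "bad", "hot", "cold", "python"]
emb = np.random.randn(len(vocab), 512)
scores = polar_projection(emb, vocab, [("good", "bad"), ("hot", "cold")])
print(scores.shape)  # (5, 2): each concept scored on 2 differentials
```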

Continue Reading

Cleaning data science engagement data

Wednesday, Nov 16, 2022 by Knut Jägersberg

Content Intelligence headline engagement. In this post, I'll mix together a bunch of headline datasets with engagement data that I discovered, and make them suitable for predicting engagement level from text in the domain of content intelligence. Data sources: tweets on data science, reddit posts, search keywords, blog posts, ML paper social shares. Content Intelligence Tweets: these tweets come from various topics across data science and content marketing.
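A minimal sketch of the kind of merging this involves, assuming hypothetical file and column names for each source:

```python
import pandas as pd

# Hypothetical file names; each source has its own engagement metric.
sources = {
    "tweets.csv": ("text", "likes"),
    "reddit.csv": ("title", "score"),
    "blogs.csv":  ("headline", "shares"),
}

frames = []
for path, (text_col, eng_col) in sources.items():
    df = pd.read_csv(path)
    frames.append(pd.DataFrame({
        "text": df[text_col],
        # Rank-normalize within each source so metrics are comparable.
        "engagement": df[eng_col].rank(pct=True),
        "source": path,
    }))

headlines = pd.concat(frames, ignore_index=True)
# Bin into low/medium/high engagement levels for text classification.
headlines["level"] = pd.qcut(headlines["engagement"], 3,
                             labels=["low", "medium", "high"])
```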

Continue Reading

What is influence engineering?

Monday, Oct 3, 2022 by Knut Jägersberg

Influence engineering refers to the practice of narrating strategic communication backed by content intelligence, applied influence science, design thinking and systems thinking. Influence engineering is an engineering practice because the design and craft of embedded StratCom is akin to building and maintaining a machine. Influence engineering is an application of narrative intelligence, the capacity to solve problems by telling stories. Influence engineering always aims at some form of social engineering, though that term has become too exclusively associated with cybersecurity problems.

Continue Reading

Mining Trends in Data Science Blog Headlines - Fractal Dimension Reduction for Topic Modeling

Thursday, Aug 25, 2022 by Knut Jägersberg

In this post, I will share an alternative approach to mining important topics from data science blog headlines. This approach does not use clustering, but tries to explain document embeddings along interpretable dimensions. Many reprojections using PCA and the like are possible. We want to use an approximation of the dataset's intrinsic dimension, the fractal dimension, to guide the reprojection.
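To make the intrinsic-dimension step concrete, here is a sketch using the Two-NN estimator (Facco et al., 2017) followed by PCA; this is one plausible estimator, not necessarily the exact method of the post, and the embeddings are random placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def two_nn_dimension(X):
    """Two-NN intrinsic dimension estimate (Facco et al., 2017)."""
    # Distances to the two nearest neighbors (index 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]        # ratio of 2nd to 1st NN distance
    return len(mu) / np.sum(np.log(mu))   # maximum-likelihood estimate

# Hypothetical document embeddings standing in for the headline vectors.
X = np.random.randn(1000, 384)
d = max(1, round(two_nn_dimension(X)))
reduced = PCA(n_components=d).fit_transform(X)
print(f"intrinsic dimension ~ {d}, reduced shape {reduced.shape}")
```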

Continue Reading

Mining Trends in Data Science Blog Headlines - Synonym Extraction with Shallow Transfer Learning from CommonSense Knowledge graph embeddings and Fasttext embeddings pretrained on 1.3 billion Google Queries

Monday, Aug 8, 2022 by Knut Jägersberg

Mining Trends in Data Science Blog Headlines - Spawn Selectors. We'll look at how to mine blog posts to learn about influence vectors of the data science bubble: a workflow for labeling text data in an ML-supported way. It is based on the same process used for writing dictionaries in dictionary-based labeling approaches and utilizes semantic folding. Semantic folding is the utilization of self-supervised learning and massive pretrained models to journey through semantic space at superlogical velocity.
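A minimal sketch of synonym extraction via nearest neighbors in an embedding space, assuming a hypothetical vector file name (the post's actual pipeline combines knowledge graph and fasttext embeddings):

```python
from gensim.models import KeyedVectors

# Hypothetical path; the post uses fasttext vectors pretrained on Google queries.
vectors = KeyedVectors.load_word2vec_format("query_fasttext.vec")

def expand_seed_terms(seeds, topn=10, threshold=0.6):
    """Grow a labeling dictionary by pulling in nearest neighbors of seed terms."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        expanded.update(w for w, sim in vectors.most_similar(seed, topn=topn)
                        if sim >= threshold)
    return sorted(expanded)

print(expand_seed_terms(["mlops", "deployment"]))
```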

Continue Reading

AI startup hubs

Thursday, Jun 2, 2022 by Knut Jägersberg Data Journalism

Where are some corporate AI hubs? Some context: the Digital Intelligence Index by the Fletcher School at Tufts University provides nice background for this little piece of research (data source: https://digitalintelligence.fletcher.tufts.edu/trajectory). They gathered a wide array of secondary data sources per country and aggregated them into clusters, components, drivers and final scores rescaled from 0 to 100, estimating how digitally mature a nation's economy is overall. In particular, their scoring for the scorecard component innovation and change is interesting, because AI startups are part of it.

Continue Reading

Thought vectors for text classification in R

Tuesday, Apr 26, 2022 by Knut Jägersberg Natural Language Processing, Text Mining, Topic Modelling, Text Classification

What are thought vectors? The term thought vector is often used as a synonym for (deep learning based) document embeddings. Originally, thought vectors were a generalization of skip-gram word2vec to capture meaning at the more abstract level of the "trains of thought" among text documents. Whilst the original skip-thought architecture uses a bi-directional recurrent neural network, the idea of thought vectors is simply to apply the same reasoning from word vectors to sentence representations: the distribution of the meaning of neighboring claims encodes the meaning of each claim.
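The post implements this in R; as an illustration of the underlying idea, here is a Python sketch using gensim's Doc2Vec, a simpler cousin of skip-thought that also learns document-level vectors from context:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus; the post works with real documents (and implements this in R).
docs = ["deep learning for text", "gradient boosting on tabular data",
        "transformers encode sentences", "random forests for churn prediction"]
tagged = [TaggedDocument(d.split(), [i]) for i, d in enumerate(docs)]

# Doc2Vec learns document-level vectors in the spirit of thought vectors:
# a document's meaning is encoded by the contexts it shares with others.
model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)
vec = model.infer_vector("sentence embeddings for classification".split())
print(model.dv.most_similar([vec], topn=2))
```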

Continue Reading

Finding long tail keywords by scraping Google Search results

Thursday, Apr 14, 2022 Webscraping, Coding, SEO

What are long tail keywords? Long tail keywords signal more specific information needs and are useful for content strategy and SEO. Long tail keyword phrases can be questions, verb phrases and entities of interest to your target audience, expressed as word ngrams. Why scrape search results? Search results are valuable data for digital marketing and stakeholder intelligence. The best way to understand what your stakeholders' problems are is to put yourself in their shoes.
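As a lightweight stand-in for scraping full result pages, here is a sketch that pulls long-tail candidates from Google's unofficial autocomplete endpoint (undocumented and subject to change; not the post's exact method):

```python
import requests

def long_tail_suggestions(seed, modifiers=("how", "why", "best", "vs")):
    """Collect long-tail keyword candidates from Google's (unofficial)
    autocomplete endpoint by prefixing the seed with question-style modifiers."""
    url = "https://suggestqueries.google.com/complete/search"
    keywords = set()
    for m in modifiers:
        resp = requests.get(url, params={"client": "firefox",
                                         "q": f"{m} {seed}"},
                            timeout=10)
        resp.raise_for_status()
        keywords.update(resp.json()[1])  # response: [query, [suggestions], ...]
    return sorted(keywords)

print(long_tail_suggestions("data science"))
```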

Continue Reading