Here's a great book on this: https://www.tidytextmining.com/. Incidentally, the author is joining RStudio next month. However, in my experience sklearn's feature extraction methods [1] are more straightforward, and NLP libraries like PyTorch/TensorFlow, spaCy, NLTK, Gensim and Snorkel are geared more toward Python as well.