Joachim Schork – Author & Founder of Statistics Globe


Hi, my name is Joachim Schork and I’m the guy behind Statistics Globe.

On this page, I’ll give you a brief overview of my background in statistics and explain why I started this platform.

I have spent a lot of time improving my skills in statistics and programming over the last 15 years. My academic career started with a Bachelor’s degree in Educational Science at the University of Tübingen, Germany.

During my Bachelor’s studies, I fell in love with statistics for the first time. I focused as much as possible on statistical research methods for monitoring students’ skills and assessing educational interventions.

Due to my increasing interest in statistics and the corresponding software tools such as R and Python, I decided to focus even more on this field in my Master’s studies. For that reason, I moved to Trier University, where I completed a Master of Survey Statistics and an EMOS Certificate in 2017.

 

Afterward, I started working as a Microdata Expert at STATEC, the national statistical institute of Luxembourg.

This position was definitely another boost for my statistical skills, since I was able to work on many different data sets such as the Statistics on Income and Living Conditions (SILC), the Labour Force Survey (LFS), and the Business and Leisure Tourism Survey.

In discussions with colleagues and at research conferences, I noticed how important it is to exchange ideas with other statisticians and with researchers from different fields. This was definitely one of the main reasons why I started Statistics Globe.

While working at STATEC, I also started a side business in online marketing and web design. This business was initially a part-time job, but at the end of 2019 I decided to quit my job at STATEC to dedicate all my time to my own business.

Since then, Statistics Globe has been not only a way to exchange ideas with other programmers and researchers, but also a way to combine both of my interests (statistics and web design).

Over the years, Statistics Globe has evolved from a one-person company into a thriving team of statistics and data science enthusiasts. Today, the platform attracts hundreds of thousands of visitors monthly, becoming a prominent resource in the field.

Furthermore, we’ve expanded our offerings to include comprehensive online courses and consulting services in addition to our free tutorials. These additions aim to empower individuals to enhance their own data science capabilities or support their projects with our expertise.

Social Media & Contact

As already mentioned, a major goal of this platform is to exchange ideas with other statisticians, data scientists, programmers, and researchers from any field. So please let me know if you have questions, topics you wish to discuss, or if you’re interested in any of our services!

In case you would like to follow me on social media or if you would like to contact me via email, you can find all the details below.

 



46 Comments

  • Hello Jo

    We like your portal “Statistics Globe”, and I am a frequent visitor.
    Could you do some tutorials on the debug functions in R? If you do, please include a layman’s explanation of how to interpret the output that the debug functions give, because it is equally technical.

    Thanks

    Reply
  • Hello Joachim, cool site. Is there a way to convert observations that contain results from 1% to 100% into a scale from 1 to 5? That is, every value in the percentage column that lies between 1% and 20% is converted to 1, everything between 21% and 40% becomes 2, and so on. I would send a tip via PayPal right away for this.

    Reply
    • Hi Daniele,

      Thank you very much for the kind words!

      You can convert your data as follows:

      # Create example data
      set.seed(287346)
      x <- paste0(round(runif(20, 1, 100)), "%")
      x
      #  [1] "32%" "35%" "64%" "70%" "13%" "23%" "70%" "94%" "95%" "68%" "42%" "32%" "39%" "15%" "54%" "57%" "22%" "34%" "61%" "71%"
       
      # Convert data into groups
      x_new <- rep(NA, length(x))
      x_new[x %in% paste0(1:20, "%")] <- 1
      x_new[x %in% paste0(21:40, "%")] <- 2
      x_new[x %in% paste0(41:60, "%")] <- 3
      x_new[x %in% paste0(61:80, "%")] <- 4
      x_new[x %in% paste0(81:100, "%")] <- 5
      x_new
      # [1] 2 2 4 4 1 2 4 5 5 4 3 2 2 1 3 3 2 2 4 4
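
A more compact sketch of the same grouping, assuming the values are percentages between 1 and 100 stored as strings such as "32%" (the example vector below is hypothetical). It strips the percent sign and bins the numbers with cut(), which also covers non-integer percentages:

```r
# Example percentages stored as character strings (hypothetical values)
x <- c("32%", "5%", "20%", "21%", "64%", "80%", "81%", "100%")

# Strip the percent sign and convert to numeric
x_num <- as.numeric(sub("%", "", x, fixed = TRUE))

# Bin into five groups: (0,20], (20,40], (40,60], (60,80], (80,100]
x_group <- cut(x_num, breaks = seq(0, 100, by = 20), labels = FALSE)
x_group
# [1] 2 1 1 2 4 4 5 5
```

Because the intervals are closed on the right, 20% falls into group 1 and 21% into group 2, matching the rule described in the question. A value of exactly 0% would become NA with these breaks.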

      I hope that helps!

      If you would like to support me, feel free to have a look at my Patreon account (completely voluntary, of course! 🙂 ): https://www.patreon.com/statisticsglobe

      Best regards and happy holidays!

      Joachim

      Reply
  • Carsten Grube
    January 30, 2021 2:49 pm

    Congratulations, Joachim, on your great work on statisticsglobe.com, and a huge ‘Thank you’, as it is really helpful in my studies of statistics and R programming. Very good and detailed explanations, and yet kept super simple. I have just seen your video on ‘Join Data with dplyr in R’. What a great way of explaining. This 9 min. video has saved me hours of reading, and your graphical explanations have made the topic much clearer to me. THANK YOU SO MUCH!

    Reply
    • Carsten,

      Thank you so much for this amazing feedback! I’m very happy to hear that you enjoy my content and that it helps to improve your statistics and R programming skills.

      Don’t hesitate to let me know in the comments in case you have any questions in your future learning progress.

      Regards,

      Joachim

      Reply
  • Dr. Kamal Nain Kapoor
    January 30, 2021 6:33 pm

    Dear Joachim,
    You are doing a great job. It’s really very helpful for all.

    Reply
  • Fuzzy clustering:
    I want to do fuzzy clustering of time-series data in R.

    I would like to use my own data instead of CharTraj in the code below, in the same format as CharTraj.

    data <-read.csv(file.choose(),sep = ',')

    library("dtwclust")
    data("uciCT")

    # Calculate autocorrelation up to 50th lag
    acf_fun <- function(series, ...) {
      lapply(series, function(x) {
        as.numeric(acf(x, lag.max = 50, plot = FALSE)$acf)
      })
    }

    # Fuzzy c-means
    fc <- tsclust(CharTraj[1:25], type = "f", k = 4L,
                  preproc = acf_fun, distance = "L2",
                  seed = 42)

    # Fuzzy membership matrix
    fc@fcluster

    example results

    ## cluster_1 cluster_2 cluster_3 cluster_4
    ## A.V1 0.944079794 0.010596054 0.020895926 0.0244282262
    ## A.V2 0.973024707 0.004558053 0.009814713 0.0126025278
    ## A.V3 0.910457782 0.013363454 0.026818391 0.0493603740
    ## A.V4 0.487954179 0.212700292 0.219111649 0.0802338802
    ## A.V5 0.557762811 0.172923239 0.188579412 0.0807345380
    ## B.V1 0.128665544 0.034803979 0.082738850 0.7537916278
    ## B.V2 0.010999524 0.002277317 0.004997756 0.9817254027
    ## B.V3 0.197222739 0.033052784 0.061935472 0.7077890056
    ## B.V4 0.166409909 0.031366546 0.050323544 0.7519000007
    ## B.V5 0.427121633 0.235092628 0.187510917 0.1502748225
    ## C.V1 0.311652169 0.047492672 0.197978128 0.4428770302
    ## C.V2 0.007458354 0.002748052 0.986187858 0.0036057365
    ## C.V3 0.075206881 0.051338895 0.840850637 0.0326035878
    ## C.V4 0.340863672 0.055549042 0.357239701 0.2463475850
    ## C.V5 0.015607418 0.006151640 0.970146090 0.0080948526
    ## D.V1 0.017714824 0.958605028 0.016256793 0.0074233544
    ## D.V2 0.047929862 0.903236104 0.030495920 0.0183381136
    ## D.V3 0.002225743 0.994942451 0.001865065 0.0009667418
    ## D.V4 0.004954758 0.988846881 0.004040801 0.0021575597
    ## D.V5 0.018867912 0.954708141 0.017683168 0.0087407796

    Reply
  • How can I do fuzzy clustering of time-series data when my data has 25 columns?

    In addition, which code converts the columns of my data into a list?
    or
    which code can I use to create a data frame where a column is a list (R Create Data Frame where a Column is a List | Different Variable Types)?

    Reply
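
One possible way to turn the columns of a data frame into a list is sketched below. It assumes each of the 25 columns holds one numeric time series; the data frame my_data is simulated and purely hypothetical. The result is a named list of numeric vectors, which appears to be the same general shape as CharTraj in the code above.

```r
# Hypothetical data frame: 25 columns, each column one time series of length 40
set.seed(42)
my_data <- as.data.frame(matrix(rnorm(40 * 25), nrow = 40, ncol = 25))

# as.list() on a data frame returns one list element per column
series_list <- as.list(my_data)

length(series_list)      # number of series
str(series_list[1:2])    # each element is a numeric vector
```

Under that assumption, series_list could be tried in place of CharTraj[1:25] in the tsclust() call, though this has not been checked against the commenter's actual data.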
  • Hi Joachim, your content is very helpful :). I have a large data set (with 11 variables) in a .txt file, and when I use read.table("data.txt", ….), I do not like how R reads the data. Please see head(data) below. How can I ensure that the data is read properly as a data frame or table with 11 columns/variables? Your ideas are greatly appreciated, thanks.

    combined head(data)
    SNP.Name.Sample.ID.Allele1...Forward.Allele2...Forward.Allele1...Top.Allele2...Top.Allele1...AB.Allele2...AB.GC.Score.X.Y
    1 1-65462706-C-G-rs43237859 JERGBRF000000051557 C C C C A A 0 2250 523
    2 1-69673871-C-T-rs209885271 JERGBRF000000051557 C C G G B B 0 639 2749
    3 1-69756947-C-G-rs469945562 JERGBRF000000051557 C C C C A A 0 1405 455.4
    4 1-69832336-G-A-rs209887380 JERGBRF000000051557 G G G G B B 0 1159 6483
    5 11-19037605-G-A-rs381309800 JERGBRF000000051557 A G A G A B 0 1878 2468
    6 11-19079043-C-T-rs208164936 JERGBRF000000051557 T C A G A B 0 1710 1730

    Reply
    • Hi again Elsie,

      Thank you for the kind words! 🙂

      Regarding your question: It seems like you have to change the separator, i.e. the sep argument within the read.table function.

      For example, you could try the following code:

      read.table("data.txt", sep = ",")

      or

      read.table("data.txt", sep = ";")
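
If the file turns out to be tab-delimited with a header line (an assumption, since the raw file is not shown), a sketch along the following lines may be closer to what is needed. The small example file and the object name snp_data are hypothetical and only mimic the layout of the output above:

```r
# Write a small tab-delimited example file (hypothetical data mimicking the layout above)
tmp <- tempfile(fileext = ".txt")
writeLines(c("SNP Name\tSample ID\tGC Score\tX\tY",
             "1-65462706-C-G-rs43237859\tJERGBRF000000051557\t0\t2250\t523",
             "1-69673871-C-T-rs209885271\tJERGBRF000000051557\t0\t639\t2749"),
           tmp)

# header = TRUE takes column names from the first line; sep = "\t" splits on tabs
snp_data <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

dim(snp_data)    # number of rows and columns
str(snp_data)    # column names and types
```

Checking dim() and str() right after reading is a quick way to confirm that the number of columns matches the number of variables expected.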

      Does this help?

      Joachim

      Reply
  • Mahesh Doshi
    May 19, 2021 10:35 am

    Hi Joachim, I want to make a time series plot of 8 lines and one scatter plot on the same graph. My data is in long format, i.e. 2 columns: one for the dates and the other for the output (lines and scatter plot points). How do I do this? I know how to plot the lines and how to draw a scatter plot separately, but how do I combine both in one plot?

    Reply
  • Hi Joachim. I am rolling two dice with faces 1:4. Initially the exercise was to compute the sum, sum(dice); now I’m being asked to modify the function to get the product. I’ve tried replacing sum with prod, but it does not seem to work. I get numbers like 5 and 7, which cannot possibly be the product of any two numbers from 1 through 4.

    Reply
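
A minimal sketch of a dice function that returns the product, assuming two fair four-sided dice. The function name roll_product and its arguments are hypothetical, since the original function is not shown. Results such as 5 or 7 would suggest that a sum is still being computed somewhere in the original code.

```r
set.seed(1)  # assumption: seed added only to make the illustration reproducible

# Roll n_dice dice with the given number of sides and return the product of the faces
roll_product <- function(n_dice = 2, sides = 4) {
  dice <- sample(1:sides, size = n_dice, replace = TRUE)
  prod(dice)
}

# Simulate many rolls and inspect which products occur
rolls <- replicate(1000, roll_product())
sort(unique(rolls))
```

With two dice showing 1 to 4, only the values 1, 2, 3, 4, 6, 8, 9, 12, and 16 can appear as products.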
  • A great job. It’s really very helpful for all. Keep it up!

    Reply
  • Hello Joachim Schork, I’m Andrés from Colombia (South America). Thank you so much for this wonderful and comprehensive work of yours. It truly is one of a kind and so helpful for statistics enthusiasts and students worldwide. I discovered your page during the pandemic, and it was one of the best things that happened during these trying times. You make me want to aspire to be like you. Thank you so much for the hard work that you have put into creating this site. I hope you know that you are loved and that we are very appreciative of your work and knowledge!

    Reply
    • Wow, thank you so much for this wonderful feedback and the very kind words Andres! It makes me very happy to hear that! 🙂

      I wish you the best of luck for your learning progress and just let me know in case you have any questions.

      Greetings from Germany to beautiful Colombia (I’ve been in Bogotá and Cartagena once, and it was a great experience!)

      Reply
  • Алексей Соловьев
    October 8, 2021 7:38 am

    Hi Joachim Schork. Once again, your site helped quickly find a solution to a problem. Your simple and understandable examples often help out. THANKS FOR YOUR WORK !!!
    If you are ever in Russia, come visit 🙂

    Reply
    • Hi mate,

      This is great to hear! Thank you so much for this wonderful feedback! 🙂

      I always wanted to travel in Russia, hopefully I can do this soon.

      Regards

      Joachim

      Reply
  • Muhammad Nafees
    December 2, 2021 4:27 pm

    Hi Joachim.
    I have created a heatmap, but the sequences of x-axis and y-axis variables are not organized like in the excel sheet. How can I organize the heatmap?
    thanks

    Reply
  • I need to solve this problem, but I can’t.

    Can you please help me out?
    Q. s1 = rnorm(100)
    Calculate the square root for the positive values and store it with index.

    Reply
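
One way to read this exercise is sketched below, assuming "store it with index" means keeping each square root together with the position of the value in s1. The seed and the object names pos_index and result are additions for illustration only.

```r
set.seed(123)  # assumption: seed added only to make the example reproducible
s1 <- rnorm(100)

# Positions of the strictly positive values
pos_index <- which(s1 > 0)

# Square roots of those values, stored alongside their original index
result <- data.frame(index = pos_index, sqrt_value = sqrt(s1[pos_index]))
head(result)
```

Filtering with which() before calling sqrt() avoids the NaN values and warnings that sqrt() would produce for negative inputs.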
  • Hi, Joachim

    I have been using RStudio to analyze climate data in .nc file format. However, I get the error "incorrect number of dimensions" when plotting the image. Could you help me? Here is the code:

    Ak = ncvar_get(rai, "pr", start = c(lonind[1], latind[1], 1), count = c(length(lonind), length(latind), -1))
    CLIMATE = image.plot(Ak[, , 1])

    Reply
    • Hey Twork,

      Is the error message returned after the first line or the second line of code? What is returned when you run head(Ak) and dim(Ak) ?

      Regards,
      Joachim

      Reply
  • Dear Joachim,

    I find your website very useful, but I’m stuck on an analysis in R. I have a data frame with a column of countries (150+), and each country has several rows for different years. In order to loop over the countries in later calculations, how can I index my data by country? I need one index per country, with the same index for all rows belonging to the same country.

    Thank you in advance !

    Best regards,
    Camille

    Reply
    • Hi Camille,

      First of all, thank you very much for the very kind words! Glad you find my tutorials helpful! 🙂

      Regarding your question, could you please illustrate with an example what the first few rows of your current data look like, and what the data should look like at the end? This would help me understand your question better.

      Regards,
      Joachim

      Reply
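
Pending the requested example, one common pattern for this kind of task is sketched here. It assumes a data frame with a country column and one row per country and year; the data and the column name country_id are hypothetical.

```r
# Hypothetical data: several rows (years) per country
df <- data.frame(country = c("France", "France", "Chile", "Chile", "Chile", "Japan"),
                 year    = c(2019, 2020, 2018, 2019, 2020, 2020))

# One integer index per country, in order of first appearance
df$country_id <- match(df$country, unique(df$country))

# Loop over countries via the index
for (i in unique(df$country_id)) {
  sub_df <- df[df$country_id == i, ]
  cat(sub_df$country[1], ":", nrow(sub_df), "rows\n")
}
```

All rows of the same country share one index value, so the index can drive a loop or be passed to functions such as split().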
  • Hi Joachim,

    Love how you are helping people out! 🙂

    I’m not really a statistician, but I do genetics and am now stuck. I need to do a 5-fold cross-validation with my model, which runs in BGLR. I do not code, so I’m really lost on how to figure this out.

    Thank you in advance!

    Regards,
    Kathrine

    Reply
    • Hey Kathrine,

      Thanks a lot for the very kind words, glad you like my website! 🙂

      Unfortunately, I’m not an expert on this topic. However, I have recently created a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: https://www.facebook.com/groups/statisticsglobe

      Regards,
      Joachim

      Reply
  • How are you, sir?
    I am looking for code that compares five dissimilarity measures using the adjusted Rand index on my data, to find the best one for my data.

    Reply
    • Hey Ahmed,

      Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?

      Regards,
      Joachim

      Reply
  • lm_info <- lm(t0.5 ~ AUC, data = data_all), (na.rm = TRUE)

    Above is my command in python R; it shows the error below. Can you suggest a solution?

    Error: unexpected ',' in "lm_info <- lm(t0.5 ~ AUC, data = data_all),"

    Reply
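
The error message points at the comma that follows the closing parenthesis of lm(): the extra argument sits outside the function call. In addition, lm() handles missing values through its na.action argument rather than na.rm. A minimal sketch with hypothetical data (only the column names t0.5 and AUC are taken from the comment):

```r
# Hypothetical example data with a missing value; the column names follow the comment
data_all <- data.frame(AUC  = c(1.2, 2.5, 3.1, NA, 5.0, 6.3),
                       t0.5 = c(2.0, 3.9, 5.2, 6.1, 8.4, 9.9))

# All arguments belong inside the parentheses of lm();
# rows with missing values are dropped via na.action
lm_info <- lm(t0.5 ~ AUC, data = data_all, na.action = na.omit)
summary(lm_info)
```

Since na.omit is typically the default setting for na.action, the argument could usually be left out; it is spelled out here only to show where it belongs.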
  • Hi Joachim,
    I really appreciate how you are helping people here with Statistics in R.

    I have a doubt with:

    GMAT scores of applicants to a Business Analytics course

    Interval    Corresponding Frequency
    501-550     25
    551-600     40
    601-650     42
    651-700     31
    701-750     26
    751-800     21

    What is the median class of the above GMAT scores?

    After the second round of applications, this data table was updated. 13 students were added to the interval 551-600 and 22 students were added to 701-750. What is the modal class of the new data?
    I am able to do this using the conventional method. But how would one do this in R?

    I am very new to R, so I would really appreciate it if you could help me with this. Thanks!

    Reply
    • Hello Sherry,

      First of all, thank you so much for your kind words. I am not very familiar with the concepts you mentioned. Based on a bit of research, I came up with the following solution.

      For the median class, you can do the following:

      intervals <- c("501-550", "551-600", "601-650", "651-700", "701-750", "751-800")
      frequencies <- c(25, 40, 42, 31, 26, 21) # define data
       
      cumulative_frequencies <- cumsum(frequencies) # calculate cumulative frequencies
      cumulative_frequencies
      # [1]  25  65 107 138 164 185
       
      median_position <- sum(frequencies) / 2 # find median position
      median_position
      # [1] 92.5
       
      median_class_index <- which(cumulative_frequencies >= median_position)[1] # position of median class
      median_class <- intervals[median_class_index] # retrieve respective interval 
      median_class
      # [1] "601-650"

      For modal class, you can do the following in R:

      #Update data
      frequencies[2] <- frequencies[2] + 13  # For 551-600
      frequencies[5] <- frequencies[5] + 22  # For 701-750 
       
      modal_class_index <- which.max(frequencies) # position of modal class
      modal_class <- intervals[modal_class_index] # retrieve respective interval
      modal_class
      # [1] "551-600"

      I hope I didn’t make a conceptual mistake. You can play with the code to correct it, if so. If you need something additional, let me know.

      Best,
      Cansu

      Reply
  • Adekunle Odunayo Ajiboye
    October 11, 2023 8:17 pm

    Hi. Thanks so much for the code you put up here. I don’t know why the data points are not showing on my PCA plots. I have 20 samples with 34034 gene counts each.

    Reply
  • stinkyfoot gopherson
    January 12, 2024 4:17 pm

    hi jo can i get a shout out in your next video?

    Reply
