Error in Constructing public MS2 database using metid #4

Open
liaochenlanruo opened this issue May 8, 2023 · 2 comments

Comments

@liaochenlanruo

Hi Shen,
When I try to construct a public MS2 database using metid, I get the error below (someone else ran into the same error; see "R package: metID (6): Metabolite identification"). Any idea how to deal with it? Thanks!

Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `Name` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace() 
<error/vctrs_error_subscript_oob>
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `Name` doesn't exist.
---
Backtrace:
1. ├─metid::construct_mona_database(...)
2. │ └─metabolite_info %>% ...
3. ├─dplyr::select(...)  
4. └─dplyr:::select.data.frame(...) 
Run rlang::last_trace(drop = FALSE) to see 18 hidden frames. 
> rlang::last_trace(drop = FALSE) 
<error/vctrs_error_subscript_oob> 
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `Name` doesn't exist.
--- 
Backtrace: 

1. ├─metid::construct_mona_database(...)  
2. │ └─metabolite_info %>% ...  
3. ├─dplyr::select(...)  
4. ├─dplyr:::select.data.frame(...)  
5. │ └─tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)  
6. │   └─tidyselect:::eval_select_impl(...)  
7. │     ├─tidyselect:::with_subscript_errors(...)  
8. │     │ └─rlang::try_fetch(...)  
9. │     │   └─base::withCallingHandlers(...)  
10. │     └─tidyselect:::vars_select_eval(...)  
11. │       └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)  
12. │         └─tidyselect:::eval_c(expr, data_mask, context_mask)  
13. │           └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)  
14. │             └─tidyselect:::walk_data_tree(new, data_mask, context_mask)  
15. │               └─tidyselect:::as_indices_sel_impl(...)  
16. │                 └─tidyselect:::as_indices_impl(...)  
17. │                   └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg) 
18. │                     └─vctrs::vec_as_location(...)  
19. └─vctrs (local) `<fn>`()  
20.  └─vctrs:::stop_subscript_oob(...)  
21.  └─vctrs:::stop_subscript(...)  
22.  └─rlang::abort(...)
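
The backtrace shows that the `dplyr::select()` call inside `construct_mona_database()` asks for a `Name` column that the parsed metadata table does not have. One quick way to confirm this kind of problem is to compare the columns a step needs with the columns that are actually present. The sketch below does that on a toy data frame in base R; `missing_columns()`, `toy_info`, and `required_cols` are hypothetical names used for illustration and are not part of metid.

```r
# Hypothetical helper: report which required columns are absent from a data frame.
missing_columns <- function(df, required_cols) {
  setdiff(required_cols, colnames(df))
}

# Toy stand-in for the parsed metadata (assumed values). The field names were
# lost here, so the table has default names instead of "Name", "ExactMass", ...
toy_info <- data.frame(V1 = "Caffeine", V2 = "194.0804", V3 = "C8H10N4O2",
                       stringsAsFactors = FALSE)

# Columns that the select() call in the error above expects to find.
required_cols <- c("Name", "ExactMass", "Formula", "DB#")
missing <- missing_columns(toy_info, required_cols)

if (length(missing) > 0) {
  message("Missing columns: ", paste(missing, collapse = ", "))
}
```

Running the same check on the real `metabolite_info` table just before the failing `select()` would show whether the field names were dropped during parsing.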
@jaspershen
Member

Please use massdatabase for the public database construction.
You can install massdatabase here: https://massdatabase.tidymass.org/index.html

data <- 
  massdatabase::read_msp_data("MoNA-export-CASMI_2016.msp", source = "mona")

massdatabase::convert_mona2metid(data = data, path = ".", threads = 3)

@ShawnWx2019

I encountered the same issue. After stepping through the parsing, I found that the problem may lie in the .mgf files provided by the MoNA database. For instance, some .mgf files do not provide a collision energy, and some positive and negative spectra are not labeled consistently (for example, there are spaces before the "P" label). As a result, metid reports errors when it parses and converts these files.
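
The two irregularities described above can be illustrated on toy data. This is a minimal base R sketch, assuming the metadata ends up in `Ion_mode` and `Collision_energy` columns; `normalise_ion_mode()` and `fill_collision_energy()` are hypothetical helper names, and the values in `toy` are made up for illustration.

```r
# Toy metadata with the kinds of irregularities described above (assumed values).
toy <- data.frame(
  Ion_mode = c("P", " p", "N", "negative", NA),
  Collision_energy = c("35 eV", "", NA, "20", "10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: after trimming whitespace, map labels that start with
# "p" or "n" (case-insensitive) to "P" or "N"; anything else becomes NA.
normalise_ion_mode <- function(x) {
  x <- trimws(x)
  out <- rep(NA_character_, length(x))
  out[grepl("^p", x, ignore.case = TRUE)] <- "P"
  out[grepl("^n", x, ignore.case = TRUE)] <- "N"
  out
}

# Hypothetical helper: replace missing or empty collision energies
# with the placeholder "not_available".
fill_collision_energy <- function(x) {
  x[is.na(x) | trimws(x) == ""] <- "not_available"
  x
}

toy$Ion_mode <- normalise_ion_mode(toy$Ion_mode)
toy$Collision_energy <- fill_collision_energy(toy$Collision_energy)
print(toy)
```

Records whose ion mode cannot be recognised stay `NA` here, so they would simply be left out of both the positive and negative spectra lists.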

Then, I modified the construct_mona_database() function to address these two issues.

construct_mona_database2 = function(
    file, only.remain.ms2 = TRUE, path = ".", version = "0.0.1", 
    source = "MoNA", link = "https://mona.fiehnlab.ucdavis.edu/", 
    creater = "Xiaotao Shen", email = "[email protected]", rt = FALSE, 
    threads = 5
    ) {
  mona_database = read_msp_mona(file = file)
  
  #> Issue1: set rownames of mona_database[[i]]$info
    mona_database = purrr::map(mona_database,function(x){
    x$info = data.frame(
      row.names = x$info$info,
      value = x$info$value
    )
    db = list(info = x$info,
              spec = x$spec)
    return(db)
    })
    
    all_metabolite_names = purrr::map(mona_database, function(x) {
        rownames(x$info)
    }) %>% unlist() %>% unique()
    metabolite_info = mona_database %>% purrr::map(function(x) {
        x = as.data.frame(x$info)
        new_x = x[, 1]
        names(new_x) = rownames(x)
        new_x = new_x[all_metabolite_names]
        names(new_x) = all_metabolite_names
        new_x
    }) %>% do.call(rbind, .) %>% as.data.frame()
    colnames(metabolite_info) = all_metabolite_names
    if (only.remain.ms2) {
        remain_idx = which(metabolite_info$Spectrum_type == "MS2")
        metabolite_info = metabolite_info[remain_idx, ]
        mona_database = mona_database[remain_idx]
    }
    metabolite_info = 
      metabolite_info %>% 
      dplyr::select(
        Compound.name = Name, 
        mz = ExactMass, Formula,
        MoNA.ID = `DB#`, 
        dplyr::everything()
        )
    metabolite_info = 
      metabolite_info %>% 
      dplyr::mutate(
        Lab.ID = paste("MoNA", seq_len(nrow(metabolite_info)), sep = "_"), 
        RT = NA, 
        CAS.ID = NA, 
        HMDB.ID = NA, 
        KEGG.ID = NA, 
        mz.pos = NA, 
        mz.neg = NA, 
        Submitter = "MoNA", 
        Family = NA, 
        Sub.pathway = NA, 
        Note = NA) %>% 
      dplyr::select(
        Lab.ID, 
        Compound.name, 
        mz, 
        RT, 
        CAS.ID, 
        HMDB.ID, 
        KEGG.ID, 
        Formula, 
        mz.pos, 
        mz.neg, 
        Submitter, 
        Family, 
        Sub.pathway, 
        Note, 
        dplyr::everything()
        )
    #> Issue2: Collision_energy
     if(!"Collision_energy"%in%colnames(metabolite_info)) {
       metabolite_info$Collision_energy = NA
       }
    
    metabolite_info$Collision_energy[is.na(metabolite_info$Collision_energy)] = "not_available"
    metabolite_info$Collision_energy[metabolite_info$Collision_energy == ""] = "not_available"
    
    #> Issue3: Ion_mode
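    # Namespaced calls so this works without attaching dplyr or stringr
    metabolite_info = 
      metabolite_info %>% 
      dplyr::mutate(Ion_mode = 
        dplyr::case_when(
          stringr::str_detect(Ion_mode, stringr::regex("P", ignore_case = TRUE)) ~ "P",
          stringr::str_detect(Ion_mode, stringr::regex("N", ignore_case = TRUE)) ~ "N"
        )
      )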
    
    positive_idx = which(metabolite_info$Ion_mode == "P")
    negative_idx = which(metabolite_info$Ion_mode == "N")
    Spectra.positive = mona_database[positive_idx]
    Spectra.negative = mona_database[negative_idx]
    names(Spectra.positive) = metabolite_info$Lab.ID[positive_idx]
    names(Spectra.negative) = metabolite_info$Lab.ID[negative_idx]
    Spectra.positive = purrr::map2(.x = Spectra.positive, .y = metabolite_info$Collision_energy[positive_idx], 
        .f = function(x, y) {
            x = x$spec
            x = list(x)
            names(x) = y
            x
        })
    Spectra.negative = purrr::map2(.x = Spectra.negative, .y = metabolite_info$Collision_energy[negative_idx], 
        .f = function(x, y) {
            x = x$spec
            x = list(x)
            names(x) = y
            x
        })
    database.info <- list(Version = version, Source = source, 
        Link = link, Creater = creater, Email = email, RT = rt)
    spectra.info <- as.data.frame(metabolite_info)
    rm(list = "metabolite_info")
    Spectra <- list(Spectra.positive = Spectra.positive, Spectra.negative = Spectra.negative)
    database <- new(Class = "databaseClass", database.info = database.info, 
        spectra.info = spectra.info, spectra.data = Spectra)
    database@database.info$RT <- ifelse(all(is.na(database@spectra.info$RT)), 
        FALSE, TRUE)
    message(crayon::bgRed("All done!\n"))
    return(database)
}
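
To see what the reshaping step in the function above is doing, here is a small base R sketch on toy records. It assumes, as the posted code implies, that each record's metadata is a two-column table with `info` (field name) and `value` columns; the records and their values are made up for illustration.

```r
# Two toy records in the long info/value layout (assumed field names and values).
records <- list(
  data.frame(info = c("Name", "ExactMass", "Ion_mode"),
             value = c("Caffeine", "194.0804", "P"),
             stringsAsFactors = FALSE),
  data.frame(info = c("Name", "Ion_mode", "Collision_energy"),
             value = c("Glucose", "N", "20"),
             stringsAsFactors = FALSE)
)

# Union of all field names, in order of first appearance.
all_fields <- unique(unlist(lapply(records, function(r) r$info)))

# One named character vector per record, aligned to all_fields;
# fields that a record lacks become NA.
rows <- lapply(records, function(r) {
  v <- stats::setNames(r$value, r$info)
  v <- v[all_fields]
  names(v) <- all_fields
  v
})

# Stack the aligned vectors into one wide table, one row per record.
wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
print(wide)
```

Indexing by field name is what makes the alignment work: if the field names are not attached to the values first, columns such as `Name` never appear in the wide table, which matches the error reported at the top of this issue.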
