Textmodel integration #250

Open · wants to merge 41 commits into master

Conversation

Chandu-4444 (Contributor):

Can do the following:

lm = FastText.LanguageModel(true)
classifier = FastText.TextClassifier(lm)
FastText.train_classifier!(classifier) # Would throw an error as I haven't fully enabled the model to work with FastAI's data container.

@lorenzoh (Member):

I think you've re-added some files from TextModels.jl that we don't need; could you remove those? 🙂

@Chandu-4444 (Contributor, Author):

Sure! Will clean up in the next commit.

@Chandu-4444 (Contributor, Author):

  • Deleted the unnecessary files from TextModels.jl.
  • Added a utility function for padding the batches for sequence data (implemented very naively, as a starting point).
julia> batches = FastText.load_batchseq(data, task)
julia> batches[1][1]
92-element Vector{Vector{Int64}}:
 [25000, 25000, 25000, 25000, 25000, 25000, 25000, 25000]
 [633779, 633779, 633779, 633779, 633779, 633779, 633779, 633779]
 [2731, 34, 315, 354, 2087, 2209, 70, 1307]
 [44047, 435, 633779, 633779, 6589, 633779, 633779, 205]
 ⋮
 [0, 0, 0, 0, 0, 213, 0, 0]
 [0, 0, 0, 0, 0, 25, 0, 0]
 [0, 0, 0, 0, 0, 1778, 0, 0]

julia> batches[1][2]
8-element Vector{Int64}:
 1
 1
 1
 1
 1
 0
 1
 1

@Chandu-4444 (Contributor, Author):

  • fastai's way of encoding data doesn't include stop-word removal (and Jeremy recommends this), so I removed the stop-word removal step.
  • I've added a vocab_size keyword argument to TextClassificationSingle.
  • Added <unk> and <pad> to the vocabulary.

Next:

  • A batch loader for text data that pads each batch to the length of its longest sentence (a minimal sketch follows below).
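
A minimal sketch of that kind of padding (not from this PR; it assumes a dedicated <pad> token id and that Flux.batchseq is used for the time-major batching):

julia> using Flux

julia> xs = [[2731, 44047, 213], [34, 435]]   # two tokenised sentences (toy ids)

julia> pad_id = 0                             # hypothetical id reserved for <pad>

julia> Flux.batchseq(xs, pad_id)              # one vector of batch entries per time step
3-element Vector{Vector{Int64}}:
 [2731, 34]
 [44047, 435]
 [213, 0]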

@ToucheSir (Member):

Should the vocab CSV files be checked in? I would've assumed they would be artifacts or DataDeps as well.
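
For reference, a hedged sketch of what registering the vocabulary as a DataDep could look like (the name, URL, and file name below are placeholders, not from this PR):

using DataDeps

register(DataDep(
    "FastText-vocab",                                  # hypothetical dependency name
    "Vocabulary for the pretrained language model",    # message shown before download
    "https://example.com/vocab.csv",                   # placeholder URL
))

vocab_path = joinpath(datadep"FastText-vocab", "vocab.csv")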

@ToucheSir (Member) left a comment:

Porting the TextModels code is going to be a big task, but for now the priority is just to make things work e2e :)

FastText/src/models/train_text_classifier.jl (outdated; thread resolved)
FastText/src/models/pretrain_lm.jl (outdated; thread resolved)
Chain(
    de,
    VarDrop(in_drop_prob),
    AWD_LSTM(embedding_size, hid_lstm_sz, hid_drop_prob; init = (dims...) -> init_weights(1/hid_lstm_sz, dims...)),
Member:

Likewise, I think this repeated lambda should be pulled out into a helper function of some sort.
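
One hedged way to do that (the helper name is hypothetical, not from the PR):

# Hypothetical helper so the closure over the hidden size isn't repeated per layer:
awd_init(hidden_size) = (dims...) -> init_weights(1 / hidden_size, dims...)

# usage:
AWD_LSTM(embedding_size, hid_lstm_sz, hid_drop_prob; init = awd_init(hid_lstm_sz))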

FastText/src/models/pretrain_lm.jl (outdated; thread resolved)
FastText/src/models/pretrain_lm.jl (outdated; thread resolved)
FastText/src/models/dataloader.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
Comment on lines 204 to 218
function asgd_step!(iter::Integer, layer::AWD_LSTM)
    if (iter >= layer.T) & (layer.T > 0)
        p = get_trainable_params([layer])
        avg_fact = 1/max(iter - layer.T + 1, 1)
        if avg_fact != 1
            layer.accum = layer.accum .+ p
            for (ps, accum) in zip(p, layer.accum)
                ps .= avg_fact*accum
            end
        else
            layer.accum = deepcopy(p)   # Accumulator for ASGD
        end
    end
    return
end
Member:

@lorenzoh I don't believe we have any utils for this in Flux or FastAI, right? Haven't kept up with the list of callbacks.

Member:

I don’t think so. This also seems very model-specific so it’s maybe best to keep it as part of the model.

FastText/src/models/custom_layers.jl (outdated; thread resolved)
@Chandu-4444 (Contributor, Author) commented Aug 8, 2022:

julia> data, blocks = load(datarecipes()["imdb"])
((mapobs(loadfile, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64})), mapobs(parentname, ObsView(::MLDatasets.FileDataset{typeof(identity), String}, ::Vector{Int64}))), (Paragraph(), Label{String}(["neg", "pos"])))

julia> task = TextClassificationSingle(blocks, data)
SupervisedTask(Paragraph -> Label{String})

julia> model = FastAI.taskmodel(task, FastText.LanguageModel(false, task))
#90 (generic function with 1 method)

julia> batches = FastText.load_batchseq(data, task)
WARNING: both Losses and NNlib export "ctc_loss"; uses of it in module Flux must be qualified
6250-element Vector{Tuple{Vector{Vector{Int64}}, Flux.OneHotArray{UInt32, 2, 1, 2, Vector{UInt32}}}}:
 ([[35, 35, 35, 35], [3, 3, 3, 9], [40, 18025, 15, 14], [224, 10, 3541, 3040], [737, 34, 24, 505], [49, 7, 809, 3], [4, 4, 221, 3836], [1927, 104, 4, 3], [7, 16, 629, 28440], [6, 351, 7, 17]  …  [2, 2, 2, 44], [2, 2, 2, 3], [2, 2, 2, 9839], [2, 2, 2, 17], [2, 2, 2, 1041], [2, 2, 2, 27], [2, 2, 2, 3], [2, 2, 2, 3836], [2, 2, 2, 3], [2, 2, 2, 28440]], [0 0 1 1; 1 1 0 0])

julia> using FluxTraining

julia> td, vd = splitobs(batches, at=0.9)

julia> using Flux

julia> learner = Learner(model, Flux.Losses.logitcrossentropy, callbacks=[Metrics(accuracy)]; data=(td, vd))
Learner()

julia> fit!(learner, 1)
Epoch 1 TrainingPhase():   0%||  ETA: 4 days, 3:35:31 

@darsnack (Member) left a comment:

Just reviewing the weight-dropped LSTM for now.

Comment on lines +51 to +52
h::S
c::S
Member:

These are now replaced by state0 and stored in Recur
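
For illustration, the pattern being referred to looks roughly like this (field and type names are assumed here; compare Flux's own LSTMCell):

struct WeightDroppedLSTMCell{A,V,S}   # hypothetical name
    Wi::A
    Wh::A
    b::V
    p::Float64
    state0::S                         # only the *initial* (h, c) lives on the cell
end

# Recur carries the running state between calls, so the cell needs no h/c fields:
# layer = Flux.Recur(cell, cell.state0); Flux.reset!(layer) restores state0.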

Member:

This comment is still outstanding.

FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
@darsnack (Member) left a comment:

More changes from the call

FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
true,
(Flux.zeros32(out, 1), Flux.zeros32(out, 1))
)
cell.b[gate(out, 2)] .= 1
Member:

Why does it only mask part of the bias @Chandu-4444?

@Chandu-4444 (Contributor, Author) commented Aug 26, 2022:

Since it is ported from TextModels.jl, I'm not completely sure about this. Even Flux.jl has this line.
It would be helpful if someone could explain the reason.

FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
testmode!(m::DroppedEmbeddings, mode=true) =
    (m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)

function reset_masks!(de::DroppedEmbeddings)
Member:

Overload Flux.reset! here
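
A hedged sketch of that suggestion (the DroppedEmbeddings field names are assumed, not confirmed by the PR):

import Flux

function Flux.reset!(de::DroppedEmbeddings)
    de.mask = drop_mask(de.emb, de.p)   # resample the embedding dropout mask
    return de
end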

Member:

Bumping this comment

FastText/src/models/custom_layers.jl (outdated; thread resolved)
FastText/src/models/custom_layers.jl (outdated; thread resolved)
Comment on lines 172 to 188
mutable struct VarDrop{F}
    p::F
    mask
    reset::Bool
    active::Bool
end

VarDrop(p::Float64=0.0) = VarDrop(p, Array{Float32, 2}(UndefInitializer(), 0, 0), true, true)

function (vd::VarDrop)(x)
    vd.active || return x
    if vd.reset
        vd.mask = drop_mask(x, vd.p)
        vd.reset = false
    end
    return (x .* vd.mask)
end
@ToucheSir (Member) commented Aug 23, 2022:

As promised, here's the proposal from our call today:

mutable struct VarDropCell{F}
    p::F
    active::Union{Bool, Nothing} # matches other norm layers
end
@functor VarDropCell

VarDropCell(p::Real=0.0) = VarDropCell(p, nothing)

function (vd::VarDropCell)((has_mask, mask), x)
    if Flux._isactive(vd)
        mask = has_mask ? mask : drop_mask(x, vd.p)
        return (true, mask), x .* mask
    elseif !has_mask
        return (has_mask, mask), x
    else
        error("Mask set but layer is in test mode. Call `reset!` to clear the mask.")
    end
end

# The single-element array keeps Recur happy.
# Limitation: typeof(p) must == typeof(<inputs>)
VarDrop(p::Real) = Recur(VarDropCell(p), ones(typeof(p), 1, 1))

Plus bits to make testmode!, _isactive and reset! work properly.
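
A usage sketch of the proposed layer (hypothetical; it assumes the Recur state is seeded so that the first call samples a mask):

vd = VarDrop(0.5f0)
y1 = vd(x)         # first call samples a dropout mask and caches it in the Recur state
y2 = vd(x)         # later calls reuse the cached mask (variational dropout across time steps)
Flux.reset!(vd)    # clears the state so the next sequence gets a fresh mask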

@Chandu-4444 (Contributor, Author):

The changes have been merged into #258.

@darsnack (Member) left a comment:

Try this

Comment on lines +51 to +52
h::S
c::S
Member:

This comment is still outstanding.

FastText/src/models/train_text_classifier.jl (outdated; thread resolved)
FastText/src/models/train_text_classifier.jl (thread resolved)
Comment on lines 204 to 218
function asgd_step!(iter::Integer, layer::AWD_LSTM)
    if (iter >= layer.T) & (layer.T > 0)
        p = get_trainable_params([layer])
        avg_fact = 1/max(iter - layer.T + 1, 1)
        if avg_fact != 1
            layer.accum = layer.accum .+ p
            for (ps, accum) in zip(p, layer.accum)
                ps .= avg_fact*accum
            end
        else
            layer.accum = deepcopy(p)   # Accumulator for ASGD
        end
    end
    return
end
Member:

I don’t think so. This also seems very model-specific so it’s maybe best to keep it as part of the model.

FastText/src/models/custom_layers.jl (outdated; thread resolved)
@darsnack (Member) left a comment:

Some of the issues that you posted on Zulip should be resolved by this

FastText/src/models/custom_layers.jl (outdated; thread resolved)
Comment on lines 102 to 103
maskWi = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wi, layer.cell.p)
maskWh = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wh, layer.cell.p)
Member:

Suggested change:
- maskWi = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wi, layer.cell.p)
- maskWh = Flux.dropout_mask(Flux.rng_from_array(), layer.cell.Wh, layer.cell.p)
+ maskWi = Flux.dropout_mask(Flux.rng_from_array(layer.cell.Wi), layer.cell.Wi, layer.cell.p)
+ maskWh = Flux.dropout_mask(Flux.rng_from_array(layer.cell.Wh), layer.cell.Wh, layer.cell.p)
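
(Presumably the point of passing the array is that Flux.rng_from_array picks an RNG appropriate to the array's device, e.g. a GPU RNG for CuArrays, rather than always using the CPU default.)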

FastText/src/models/custom_layers.jl (outdated; thread resolved)
testmode!(m::DroppedEmbeddings, mode=true) =
    (m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)

function reset_masks!(de::DroppedEmbeddings)
Member:

Bumping this comment
